2023-07-15

Notes on Python Machine Learning - Chapter 01

These are my notes on the first chapter of the book Python Machine Learning.
The chapter introduces three types of machine learning tasks and the preparatory steps they share.

1. Three Types of ML

Type	Characteristics	Tasks
`Supervised Learning`	Labeled data; direct feedback; predict outcome/future	Classification, regression
`Unsupervised Learning`	No labels; no feedback; find hidden data structure	Clustering, dimensionality reduction
`Reinforcement Learning`	Decision process; reward system; learn series of actions	Interactive problems

2. Pre-Steps of ML

2.1 Process Data

Extract meaningful features
Scale features to a comparable value range
Reduce dimensionality
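
As an illustration of the scaling step, here is a minimal sketch of two common approaches, min-max scaling and standardization, using only the Python standard library. The function names and the sample `ages` and `incomes` values are made up for this example and are not from the book.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to zero mean and unit (population) std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical features on very different scales.
ages = [20, 30, 40, 50]
incomes = [20_000, 40_000, 60_000, 80_000]

scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)
standardized_ages = standardize(ages)
```

After min-max scaling, both features span the same range, so neither dominates a distance-based or gradient-based learner purely because of its units.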

2.2 Split Subsets

Train
- used to fit the model’s parameters
- typically 70-80% of the data
Validation
- used to tune the model’s hyperparameters and to make decisions about the model’s structure
- typically 10-15% of the data
Test
- used to provide an unbiased evaluation of the final model
- typically 10-15% of the data
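
The three-way split can be sketched as below. This is a minimal illustration assuming a 70/15/15 split and a simple random shuffle; the helper `train_val_test_split` and its parameters are hypothetical names for this example, not a library API.

```python
import random


def train_val_test_split(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle samples and cut them into train, validation and test lists."""
    items = list(samples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


data = list(range(100))  # stand-in for 100 labeled samples
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))
```

Shuffling before cutting matters when the data is stored in some order (for example sorted by label); for heavily imbalanced classes a stratified split would be a safer choice than this plain random one.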

2.3 Choose Performance Metrics

Accuracy
- $\frac{TP + TN}{TP + TN + FP + FN}$
- the proportion of predictions that your model got right, among all predictions it made
- can be misleading when the classes are imbalanced: a model that always predicts the majority class still scores high
Precision
- $\frac{TP}{TP + FP}$
- the proportion of true positives among all positive predictions
- tends to be higher when the model makes fewer, more conservative positive predictions
- says nothing about the positive items the model failed to label as positive
- suitable when false positives are costly (e.g. spam filtering, where a legitimate email flagged as spam is a false positive)
Recall
- $\frac{TP}{TP + FN}$
- the proportion of true positive predictions among all actual positives
- tends to be higher when the model makes more positive predictions
- says nothing about the items the model falsely labeled as positive
- suitable when false negatives are costly (e.g. disease screening, where a missed case is a false negative)
F1 Score
- $2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$
- the harmonic mean of precision and recall, which balances the two
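
The four formulas above can be computed directly from the confusion-matrix counts. The sketch below is a minimal illustration: the function name and the sample counts are invented for this example, and returning 0.0 when a denominator is zero is one possible convention, not the only one.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Assumption: report 0.0 when the ratio is undefined (empty denominator).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced test set: 10 positives, 990 negatives.
# The model finds 5 of the positives and raises 5 false alarms.
metrics = classification_metrics(tp=5, tn=985, fp=5, fn=5)

# A model that always predicts "negative" on the same data.
majority = classification_metrics(tp=0, tn=990, fp=0, fn=10)
```

Both models reach 99% accuracy on this data, but the always-negative model has zero recall, which is why precision, recall and F1 are worth checking alongside accuracy when classes are imbalanced.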

ImHuWQ
