Notes on Python Machine Learning - Chapter 01

These are notes on the first chapter of the book Python Machine Learning.
The chapter introduces the three types of machine learning tasks and the preparatory steps they share.

1. Three Types of ML

Type                    Features                                                   Tasks
----------------------  ---------------------------------------------------------  ------------------------------------
Supervised Learning     Labeled data; direct feedback; predict outcome/future      Classification; regression
Unsupervised Learning   No labels; no feedback; find hidden data structure         Clustering; dimensionality reduction
Reinforcement Learning  Decision process; reward system; learn series of actions   Interactive problems

2. Pre-Steps of ML

2.1 Process Data

  • Extract meaningful features
  • Scale feature values to a comparable range
  • Dimensionality reduction
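
As an illustration of the scaling step, here is a minimal sketch of two common approaches, min-max scaling and standardization, using only the Python standard library. The function names and the sample values are hypothetical, chosen for this example only; in practice a library such as scikit-learn (`MinMaxScaler`, `StandardScaler`) would likely be used instead.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Same convention as above for a constant feature.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column, e.g. house sizes in square meters.
sizes = [50.0, 80.0, 120.0, 200.0]
scaled = min_max_scale(sizes)       # smallest value maps to 0.0, largest to 1.0
standardized = standardize(sizes)   # mean 0, standard deviation 1
```

Min-max scaling keeps the values in a fixed interval, while standardization centers them around zero, which is often less sensitive to a single extreme value.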

2.2 Split into Subsets

  • Train
    • used to fit the model's parameters
    • typically 70-80% of the data
  • Validation
    • used to tune the model’s hyperparameters and to make decisions about the model’s structure
    • typically 10-15% of the data
  • Test
    • used only once, at the end, to provide an unbiased evaluation of the final model
    • typically 10-15% of the data
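
The three-way split can be sketched as follows. This is a minimal example under stated assumptions: the function name `split_dataset`, the 70/15/15 default fractions, and the fixed seed are illustrative choices, not something prescribed by the book. scikit-learn's `train_test_split` is the usual tool for this and can be applied twice to obtain three subsets.

```python
import random


def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle samples and split them into train, validation, and test lists."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(samples)                 # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # seeded shuffle for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]        # everything that remains
    return train, val, test


data = list(range(100))  # stand-in for 100 labeled examples
train, val, test = split_dataset(data)
```

Shuffling before splitting avoids subsets that reflect the original ordering of the data; for imbalanced labels a stratified split would be a better fit, which this sketch does not attempt.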

2.3 Choose Performance Metrics

  • Accuracy
    • $\frac{TP + TN}{TP + TN + FP + FN}$
    • the proportion of predictions that your model got right, among all predictions it made
    • accuracy can be misleading when the classes are imbalanced
  • Precision
    • $\frac{TP}{TP + FP}$
    • the proportion of true positive predictions among all positive predictions
    • tends to be higher when the model makes fewer positive predictions, i.e., predicts positive only when it is confident
    • it says nothing about the items it failed to label as positive
    • suitable if you want to avoid false positives (e.g., identifying spam emails)
  • Recall
    • $\frac{TP}{TP + FN}$
    • the proportion of true positive predictions among all actual positives
    • tends to be higher when the model makes more positive predictions, i.e., predicts positive more liberally
    • it says nothing about the items it falsely labeled as positive
    • suitable if you want to avoid false negatives (e.g., identifying disease)
  • F1 Score
    • $2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$
    • the harmonic mean of precision and recall; it is high only when both are high
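
The four metrics above can be computed directly from the confusion counts. The sketch below is a minimal, standard-library-only illustration; the function names, the toy labels, and the convention of returning 0.0 when a denominator is zero are all assumptions made for this example. It also shows why accuracy misleads on imbalanced classes: a model that always predicts the negative class scores 95% accuracy on the toy data while finding no positives at all.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    # Assumed convention: report 0.0 when the metric is undefined.
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical imbalanced data: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# A model that always predicts the negative class.
baseline = classification_metrics(y_true, [0] * 100)

# A model that finds 4 of the 5 positives at the cost of 6 false positives.
y_pred = [1] * 6 + [0] * 89 + [1] * 4 + [0] * 1
better = classification_metrics(y_true, y_pred)
```

On this toy data the second model has slightly lower accuracy than the always-negative baseline but a much higher recall and F1, which is the kind of trade-off accuracy alone hides.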