Understanding Machine Learning

Chapter 1

Introduction

Pages 19–30

What is learning? Types, relations to other fields, and how to read this book.

Chapter 2

A Gentle Start

Pages 33–42

The statistical learning framework and empirical risk minimization.

Chapter 3

A Formal Learning Model

Pages 43–53

PAC learning and agnostic PAC learning formalized.

Chapter 4

Learning via Uniform Convergence

Pages 54–59

When uniform convergence guarantees learnability.

Chapter 5

The Bias-Complexity Tradeoff

Pages 60–66

No-Free-Lunch theorem and error decomposition.

Chapter 6

The VC-Dimension

Pages 67–82

Measuring hypothesis class complexity with VC-dimension.

Chapter 7

Nonuniform Learnability

Pages 83–99

SRM, MDL, Occam's Razor, and consistency.

Chapter 8

The Runtime of Learning

Pages 100–114

Computational complexity and implementing ERM.

Chapter 9

Linear Predictors

Pages 117–129

Halfspaces, linear regression, and logistic regression.

Chapter 10

Boosting

Pages 130–143

Weak learnability, AdaBoost, and face recognition.

Chapter 11

Model Selection and Validation

Pages 144–155

Cross-validation, hold-out sets, and model selection.

Chapter 12

Convex Learning Problems

Pages 156–170

Convexity, Lipschitzness, and surrogate loss functions.

Chapter 13

Regularization and Stability

Pages 171–183

Tikhonov regularization and the fitting-stability tradeoff.

Chapter 14

Stochastic Gradient Descent

Pages 184–201

GD, SGD, subgradients, and convergence analysis.

Chapter 15

Support Vector Machines

Pages 202–214

Hard-SVM, Soft-SVM, margin, and support vectors.

Chapter 16

Kernel Methods

Pages 215–226

Feature space embeddings and the kernel trick.

Chapter 17

Multiclass, Ranking, Complex Prediction

Pages 227–249

One-vs-All, structured output, and ranking.

Chapter 18

Decision Trees

Pages 250–257

Tree algorithms, gain measures, pruning, and random forests.

Chapter 19

Nearest Neighbor

Pages 258–267

k-NN, generalization bounds, and curse of dimensionality.

Chapter 20

Neural Networks

Pages 268–284

Feedforward networks, backpropagation, and expressive power.

Chapter 21

Online Learning

Pages 287–306

Online classification, weighted majority, and online convex optimization.

Chapter 22

Clustering

Pages 307–322

Linkage, k-means, spectral clustering, and information bottleneck.

Chapter 23

Dimensionality Reduction

Pages 323–341

PCA, random projections, and compressed sensing.

Chapter 24

Generative Models

Pages 342–356

MLE, Naive Bayes, LDA, EM algorithm, and Bayesian reasoning.

Chapter 25

Feature Selection and Generation

Pages 357–372

Filters, greedy selection, sparsity, and auto-encoders.

Chapter 26

Rademacher Complexities

Pages 375–387

Rademacher complexity, calculus, and generalization bounds.

Chapter 27

Covering Numbers

Pages 388–391

Covering and chaining for complexity analysis.

Chapter 28

Proof of Fundamental Theorem

Pages 392–401

Upper and lower bounds for PAC learning.

Chapter 29

Multiclass Learnability

Pages 402–409

Natarajan dimension and multiclass fundamental theorem.

Chapter 30

Compression Bounds

Pages 410–414

Compression-based generalization bounds.

Chapter 31

PAC-Bayes

Pages 415–418

PAC-Bayes bounds for learning theory.