4. Data

  • Samples are picked from some unknown distribution plus some noise: $y_i = g(x_i) + \epsilon_i$,

where $g$ is the unknown target function and $\epsilon_i$ is noise with expectation $\mathbb{E}[\epsilon_i] = 0$.
Bias and variance
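
A standard way to make these two terms precise (stated here for the sampling model above, with $\mathrm{Var}(\epsilon) = \sigma^2$): the expected squared error of a learned predictor $\hat{f}$ at a point $x$ decomposes as

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(g(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Simple models tend toward high bias and low variance; complex models toward low bias and high variance.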

K-fold Cross Validation

  • Split the training data into $k$ subsets (folds)
  • Train $k$ models: hold back one fold as the validation set, train on the others
  • Average the validation errors
  • Pick the hyperparameters with the lowest average validation error
  • Retrain on the entire training set
  • Evaluate on the held-out test data for an unbiased error estimate

Best: $k = n$ (leave-one-out), b/c each training set would be very similar to the full training data. Typical in practice: 5 or 10.
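The steps above can be sketched as follows. This is a minimal illustration, assuming a toy 1-D regression problem where the hyperparameter being tuned is the polynomial degree; the data-generating function `sin(3x)` and all names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training data (assumed setup): noisy samples of an unknown function.
n = 100
x = rng.uniform(-1, 1, size=n)
y = np.sin(3 * x) + 0.2 * rng.normal(size=n)

def kfold_cv_error(x, y, degree, k=5):
    """Average validation MSE over k folds for a polynomial fit of given degree."""
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]                                          # held-back fold
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train], y[train], degree)          # train on the rest
        pred = np.polyval(coefs, x[val])
        errors.append(np.mean((pred - y[val]) ** 2))            # validation error
    return np.mean(errors)                                      # average over folds

# Pick the hyperparameter (degree) with the lowest average validation error.
cv_errors = {d: kfold_cv_error(x, y, d) for d in range(1, 10)}
best_degree = min(cv_errors, key=cv_errors.get)

# Finally, retrain on the entire training set with the chosen hyperparameter.
final_coefs = np.polyfit(x, y, best_degree)
print("best degree:", best_degree)
```

The test-set evaluation from the last bullet is omitted here; it would use data never touched during the cross-validation loop.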

Limit Model Complexity

Limit the search space of possible functions by:

  • Restrict the polynomial degree
  • LASSO (L1 regularization): limit the L1 norm, $\|w\|_1 \le t$, for some (unknown) $t$
  • Ridge (L2 regularization): limit the L2 norm, $\|w\|_2 \le t$, for some (unknown) $t$

Key distinction: LASSO → sparse (feature selection), Ridge → small but nonzero weights.
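
A small experiment makes the distinction concrete. This is a sketch under assumed settings: a made-up dataset where only 2 of 6 features matter, ridge solved in closed form, and lasso solved by coordinate descent with soft-thresholding (one standard way to optimize the L1-penalized objective).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): only the first 2 of 6 features matter.
n, d = 200, 6
true_w = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0])
X = rng.normal(size=(n, d))
y = X @ true_w + 0.1 * rng.normal(size=n)

def ridge(X, y, lam):
    """Closed-form ridge: w = (X^T X + lam * I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def soft_threshold(z, t):
    """Shrink z toward 0 by t; exactly 0 inside [-t, t]."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso(X, y, lam, iters=200):
    """Lasso via coordinate descent on (1/2)||y - Xw||^2 + lam * ||w||_1."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        for j in range(X.shape[1]):
            r = y - X @ w + X[:, j] * w[j]      # residual excluding feature j
            w[j] = soft_threshold(X[:, j] @ r, lam) / (X[:, j] @ X[:, j])
    return w

w_ridge = ridge(X, y, lam=10.0)
w_lasso = lasso(X, y, lam=50.0)
print("ridge:", np.round(w_ridge, 2))   # all weights shrunk, but nonzero
print("lasso:", np.round(w_lasso, 2))   # irrelevant weights driven to exactly 0
```

Lasso sets the four irrelevant weights to exactly zero (feature selection), while ridge only shrinks them toward zero.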