4. Data
- Samples are drawn from some unknown distribution, plus noise: y = f(x) + ε, where E[ε] = 0 and Var(ε) = σ²
- Taking the expectation of the squared test error decomposes it into bias and variance:
  E[(y − f̂(x))²] = Bias[f̂(x)]² + Var[f̂(x)] + σ²
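The decomposition above can be checked empirically: repeatedly draw noisy datasets, fit the same model to each, and measure how far the average prediction is from the truth (bias) and how much predictions scatter across datasets (variance). A minimal sketch, where the true function sin(2πx), the noise level, and the degree-3 model are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup for this illustration: true function, noise level, model.
f = lambda x: np.sin(2 * np.pi * x)
sigma = 0.3          # noise standard deviation
n, trials = 30, 500  # samples per dataset, number of simulated datasets
degree = 3           # polynomial degree of the fitted model

x_test = 0.5                     # measure bias/variance at a single point
preds = np.empty(trials)
for t in range(trials):
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, sigma, n)   # y = f(x) + noise
    coef = np.polyfit(x, y, degree)      # fit a degree-3 polynomial
    preds[t] = np.polyval(coef, x_test)

bias_sq = (preds.mean() - f(x_test)) ** 2   # (avg prediction − truth)²
variance = preds.var()                      # spread of predictions
# Expected squared error at x_test ≈ bias² + variance + σ²
print(f"bias²={bias_sq:.4f}  variance={variance:.4f}  noise={sigma**2:.4f}")
```

Increasing `degree` shrinks the bias term while inflating the variance term, which is the tradeoff the decomposition makes explicit.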
K-fold Cross Validation
- Split training data into K subsets (folds)
- Train K models: hold back one fold as the validation set, train on the others
- Average validation error
- Pick hyperparameters with lowest validation error
- Train on entire dataset
- Evaluate on unbiased test data
Best: K = N (leave-one-out), b/c each training set would be very similar to the full dataset. Typical in practice: K = 5 or 10.
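The procedure above can be sketched from scratch with numpy. A toy example, where the noisy cubic data and the choice of polynomial degree as the hyperparameter are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: a noisy cubic (assumed just for this illustration).
x = rng.uniform(-1, 1, 60)
y = x**3 - x + rng.normal(0, 0.1, 60)

def kfold_cv_error(x, y, degree, k=5):
    """Average validation MSE of a degree-`degree` polynomial over k folds."""
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        val = folds[i]                                  # held-back fold
        trn = np.concatenate(folds[:i] + folds[i+1:])   # train on the rest
        coef = np.polyfit(x[trn], y[trn], degree)
        errs.append(np.mean((np.polyval(coef, x[val]) - y[val]) ** 2))
    return np.mean(errs)   # average validation error over folds

# Pick the hyperparameter (degree) with the lowest average validation error,
# then retrain on the entire dataset.
degrees = range(1, 8)
cv = {d: kfold_cv_error(x, y, d) for d in degrees}
best = min(cv, key=cv.get)
final_coef = np.polyfit(x, y, best)
print("best degree:", best)
```

Note that the final model is trained on all the data; the held-out test set (not shown here) is only touched once, at the very end.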
Limit Model Complexity
Limit the search space of possible functions by:
- Restrict polynomial degree
- LASSO (L1 regularization): constrain the L1 norm, ‖w‖₁ ≤ t, for some (unknown) t
- Ridge (L2 regularization): constrain the L2 norm, ‖w‖₂ ≤ t, for some (unknown) t
Key distinction: LASSO → sparse (feature selection), Ridge → small but nonzero weights.
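The sparse-vs-small distinction can be seen directly by fitting both on the same data. A sketch using the closed-form ridge solution and LASSO via cyclic coordinate descent with soft-thresholding; the toy data (only 2 of 10 features matter) and the λ value are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: only 2 of 10 features actually matter (assumed setup).
n, p = 100, 10
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[0], w_true[1] = 3.0, -2.0
y = X @ w_true + rng.normal(0, 0.1, n)

def ridge(X, y, lam):
    """Closed-form ridge regression: w = (XᵀX + λI)⁻¹ Xᵀy."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def lasso(X, y, lam, iters=200):
    """LASSO via cyclic coordinate descent with soft-thresholding."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        for j in range(X.shape[1]):
            r = y - X @ w + X[:, j] * w[j]     # residual excluding feature j
            rho = X[:, j] @ r
            z = X[:, j] @ X[:, j]
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0) / z  # soft-threshold
    return w

w_r = ridge(X, y, lam=10.0)
w_l = lasso(X, y, lam=10.0)
print("ridge nonzeros:", np.sum(np.abs(w_r) > 1e-6))  # small but all nonzero
print("lasso nonzeros:", np.sum(np.abs(w_l) > 1e-6))  # sparse
```

The L1 penalty zeroes out the irrelevant coordinates exactly (feature selection), while the L2 penalty only shrinks them toward zero.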