5. Classification
The magnitude of the weights is irrelevant b/c we only take the sign of the score: $\hat{y} = \operatorname{sign}(w^\top x)$.
We can trade off false pos/neg by setting the decision threshold to something other than 0.
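A minimal sketch of both points, assuming a linear score $w^\top x$ and toy data invented for illustration (the helper names `predict` and `false_pos_neg` are hypothetical, not from the lecture):

```python
def score(w, x):
    """Linear score w^T x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def predict(w, x, threshold=0.0):
    """Predict +1 if the score exceeds the threshold, else -1."""
    return 1 if score(w, x) > threshold else -1

def false_pos_neg(w, X, y, threshold=0.0):
    """Count false positives and false negatives at a given threshold."""
    preds = [predict(w, x, threshold) for x in X]
    fp = sum(1 for p, t in zip(preds, y) if p == 1 and t == -1)
    fn = sum(1 for p, t in zip(preds, y) if p == -1 and t == 1)
    return fp, fn

# Toy data (assumed values): one real feature plus a constant bias feature.
X = [(-2.0, 1.0), (-0.5, 1.0), (0.3, 1.0), (0.8, 1.0), (2.0, 1.0)]
y = [-1, -1, -1, 1, 1]
w = (1.0, 0.0)

# Scaling w leaves every prediction unchanged at threshold 0.
w_scaled = tuple(10.0 * wi for wi in w)
same = [predict(w, x) == predict(w_scaled, x) for x in X]

# Raising the threshold trades a false positive for a false negative.
fp_default, fn_default = false_pos_neg(w, X, y, threshold=0.0)
fp_strict, fn_strict = false_pos_neg(w, X, y, threshold=1.0)
```

Note that the scale invariance only holds at threshold 0; with a non-zero threshold the scale of the weights matters again.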
Max-Margin Solution
Too many perfect classifiers → choose the weights that maximize the distance between the boundary and the data points closest to it.
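Boundary to datapoint dist: $\frac{|w^\top x_i|}{\|w\|_2}$ (projecting $x_i$ onto $w$, then normalizing by $\|w\|_2$)
Since only $w$'s direction matters, we fix $\|w\|_2 = 1$
If correctly classified: $|w^\top x_i| = y_i w^\top x_i$. Since picking from perfect ones: $\text{margin}(w) = \min_i y_i w^\top x_i$
Max-margin classifier: $\hat{w} = \arg\max_{\|w\|_2 = 1} \min_i y_i w^\top x_i$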
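A small sketch of the margin computation, assuming a boundary through the origin and made-up 2-D data. It compares two perfect classifiers and shows that they can have different margins (function names are illustrative only):

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(w):
    return math.sqrt(dot(w, w))

def distance_to_boundary(w, x):
    """|w^T x| / ||w||: distance of x from the hyperplane w^T x = 0."""
    return abs(dot(w, x)) / norm(w)

def margin(w, X, y):
    """min_i y_i w^T x_i / ||w||; positive only if all points are classified correctly."""
    return min(yi * dot(w, xi) for xi, yi in zip(X, y)) / norm(w)

# Toy separable data (assumed values).
X = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
y = [1, 1, -1, -1]

d = distance_to_boundary((3.0, 4.0), (1.0, 1.0))  # 7 / 5
m_a = margin((1.0, 0.0), X, y)  # perfect classifier, margin 2
m_b = margin((1.0, 1.0), X, y)  # also perfect, but margin sqrt(2)
```

Both weight vectors classify every point correctly, yet `(1, 0)` keeps the closest points further from the boundary, so the max-margin criterion prefers it.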
Support Vector Machine (SVM)
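Since the max-min problem is difficult to solve → minimize the norm of $w$ subject to every point having margin at least 1: $\min_w \|w\|_2$ s.t. $y_i w^\top x_i \ge 1$ for all $i$. Every point is then at least $1/\|w\|_2$ away from the boundary.
Support vectors: the points with $y_i w^\top x_i = 1$, i.e. sitting exactly on the margin. If you removed any other point, the solution wouldn't change.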
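A sketch of how support vectors can be read off a solution, assuming the constraint form $y_i w^\top x_i \ge 1$ and toy data for which the minimum-norm solution can be derived by hand (no solver is used here; `w_hat` is worked out in the comment, and all names are illustrative):

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Toy separable data (assumed values), no bias term.
X = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
y = [1, 1, -1, -1]

# Hand-derived solution for this data: the constraint on (2, 0) forces
# w_1 >= 0.5, and w = (0.5, 0) already satisfies every other constraint,
# so it is the feasible point with the smallest norm.
w_hat = (0.5, 0.0)

def constraint_values(w, X, y):
    """y_i w^T x_i for every point; feasibility requires all values >= 1."""
    return [yi * dot(w, xi) for xi, yi in zip(X, y)]

def support_vectors(w, X, y, tol=1e-9):
    """Points whose constraint is active, i.e. y_i w^T x_i == 1."""
    return [xi for xi, m in zip(X, constraint_values(w, X, y)) if abs(m - 1.0) <= tol]

values = constraint_values(w_hat, X, y)
feasible = all(m >= 1.0 - 1e-9 for m in values)
sv = support_vectors(w_hat, X, y)
geometric_margin = 1.0 / dot(w_hat, w_hat) ** 0.5  # 1 / ||w||
```

Only `(2, 0)` and `(-2, 0)` have an active constraint; dropping `(3, 1)` or `(-3, -1)` leaves the argument for `w_hat` untouched, which is the sense in which non-support points do not affect the solution.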
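Implicit Bias of GD: on linearly separable data, GD w/ logistic loss converges in direction to the max-margin solution (recently proven): $\lim_{t \to \infty} \frac{w_t}{\|w_t\|_2} = \hat{w}$, with $\hat{w}$ the unit-norm max-margin direction.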
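A rough numerical illustration, assuming plain full-batch GD from zero initialization with a fixed step size on toy separable data whose max-margin direction is `(1, 0)` with margin 2. It only shows the trend; convergence in direction is known to be slow, so the normalized margin approaches 2 without reaching it:

```python
import math

# Toy separable data (assumed values), no bias term.
X = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
y = [1, 1, -1, -1]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def logistic_grad(w):
    """Gradient of sum_i log(1 + exp(-y_i w^T x_i)) with respect to w."""
    g = [0.0, 0.0]
    for xi, yi in zip(X, y):
        m = yi * dot(w, xi)
        s = 1.0 / (1.0 + math.exp(m))  # sigmoid(-m)
        for j in range(2):
            g[j] -= s * yi * xi[j]
    return g

def run_gd(steps, lr=0.1):
    """Full-batch gradient descent starting from w = 0."""
    w = [0.0, 0.0]
    for _ in range(steps):
        g = logistic_grad(w)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w

def normalized_margin(w):
    """min_i y_i w^T x_i / ||w||."""
    return min(yi * dot(w, xi) for xi, yi in zip(X, y)) / math.sqrt(dot(w, w))

w_early = run_gd(5)
w_late = run_gd(5000)
margin_early = normalized_margin(w_early)
margin_late = normalized_margin(w_late)
```

The weight norm keeps growing (the loss has no finite minimizer on separable data), while the normalized margin after 5000 steps is larger than after 5 steps and stays below the optimum of 2.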
Soft margin
Multi-class classification
Two options:
- Train one vs. rest:
- Computationally expensive
- Splitting one from rest may not result in best overall classification
- Train a single model via cross-entropy (softmax) loss
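A minimal sketch of the cross-entropy option, assuming each class $k$ produces a score (for example $w_k^\top x$) and the scores are turned into probabilities with a softmax. The score values are made up:

```python
import math

def softmax(scores):
    """Numerically stable softmax over per-class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(scores)[label])

def predict(scores):
    """Predicted class = index of the largest score."""
    return max(range(len(scores)), key=lambda k: scores[k])

scores = [2.0, 0.5, -1.0]  # assumed per-class scores for one input
probs = softmax(scores)
loss_if_true_class_0 = cross_entropy(scores, 0)
loss_if_true_class_2 = cross_entropy(scores, 2)
uniform_loss = cross_entropy([0.0, 0.0, 0.0], 1)  # equals log(3)
```

Unlike one vs. rest, where each binary classifier is fit separately, this loss couples all class scores through the shared normalization, so the classes are trained jointly.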
Error types (false positives vs. false negatives)
- Control via the decision boundary (shift it by thresholding at a value other than 0)
Distribution shifts
- A model trained on Australian crop disease data might not transfer to Europe because the underlying distributions differ
- Solution: reuse the feature extractor, retrain the classifier on data from the new distribution
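A toy sketch of that recipe. Everything here is an assumption for illustration: `extract_features` stands in for a frozen pretrained feature extractor, the data are invented, and the head is retrained with a simple perceptron (chosen for brevity, not because the lecture prescribes it):

```python
def extract_features(x):
    """Hypothetical frozen feature extractor (stand-in for a pretrained model)."""
    return (x[0] + x[1], x[0] - x[1], 1.0)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def predict(w, x):
    return 1 if dot(w, extract_features(x)) > 0 else -1

def accuracy(w, X, y):
    return sum(predict(w, x) == t for x, t in zip(X, y)) / len(X)

def retrain_head(X, y, max_epochs=1000):
    """Perceptron on the frozen features; only the linear head is learned."""
    w = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(X, y):
            f = extract_features(x)
            if t * dot(w, f) <= 0:  # misclassified (or on the boundary)
                w = [wj + t * fj for wj, fj in zip(w, f)]
                mistakes += 1
        if mistakes == 0:  # every point strictly on the correct side
            break
    return w

# Head fit on the source domain (assumed): positive iff x0 + x1 > 0.
w_source = (1.0, 0.0, 0.0)

# Target-domain data (assumed): the positive class now needs a larger x0 + x1.
X_target = [(0.5, 0.5), (1.0, 0.0), (0.0, 0.5), (2.0, 1.0),
            (1.5, 2.0), (3.0, 0.5), (-1.0, 0.0)]
y_target = [-1, -1, -1, 1, 1, 1, -1]

acc_before = accuracy(w_source, X_target, y_target)
w_target = retrain_head(X_target, y_target)
acc_after = accuracy(w_target, X_target, y_target)
```

The features stay fixed throughout; only the three head weights change. On this separable toy set the old head gets 4 of 7 target points right, while the retrained head fits all of them.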