5. Classification

The magnitude of the weights is irrelevant because the prediction uses only the sign of the score.

We can trade false positives against false negatives by setting the decision threshold to something other than 0.
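
To make the threshold trade-off concrete, here is a minimal sketch. The scores and labels are made-up values standing in for linear scores $w^\top x$; the helper names `predict` and `false_pos_neg` are illustrative, not taken from these notes.

```python
def predict(scores, threshold=0.0):
    """Label +1 when the score exceeds the threshold, else -1."""
    return [1 if s > threshold else -1 for s in scores]

def false_pos_neg(y_true, y_pred):
    """Count false positives and false negatives for labels in {-1, +1}."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    return fp, fn

# Hypothetical linear scores for six points and their true labels.
scores = [-2.0, -0.5, 0.3, 0.8, -0.2, 1.5]
y_true = [-1, -1, -1, 1, 1, 1]

for threshold in (-1.0, 0.0, 1.0):
    fp, fn = false_pos_neg(y_true, predict(scores, threshold))
    print(f"threshold={threshold:+.1f}  false_pos={fp}  false_neg={fn}")
```

Lowering the threshold removes false negatives at the cost of more false positives, and raising it does the opposite.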

Max-Margin Solution

Too many perfect classifiers → choose the weights that maximize the distance between the boundary and the data points closest to it.

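Boundary-to-datapoint distance: $\frac{|w^\top x_i|}{\|w\|}$ (projecting $x_i$ onto $w$, then normalizing by $\|w\|$)

Since only $w$'s direction matters, we fix $\|w\| = 1$.

If correctly classified: $|w^\top x_i| = y_i w^\top x_i$. Since we are picking from perfect classifiers, the margin is $\min_i y_i w^\top x_i$.

Max-margin classifier: $\hat{w} = \arg\max_{\|w\| = 1} \min_i y_i w^\top x_i$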
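
As a rough illustration of the quantity being maximized, the sketch below computes the margin of two perfect classifiers on a toy data set. It assumes a boundary through the origin (no intercept) and labels in {-1, +1}; the data, the weight vectors, and the function name `margin` are hypothetical.

```python
import math

def margin(w, X, y):
    """Smallest signed distance y_i * w^T x_i / ||w|| over the data set."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return min(yi * sum(wj * xj for wj, xj in zip(w, xi)) / norm
               for xi, yi in zip(X, y))

# Toy separable data (hypothetical), boundary through the origin.
X = [(1.0, 0.0), (0.0, 2.0), (-1.0, 0.0), (0.0, -2.0)]
y = [1, 1, -1, -1]

w_a = (1.0, 2.0)   # classifies every point correctly, smaller margin
w_b = (1.0, 0.5)   # classifies every point correctly, larger margin

print("margin of w_a:", margin(w_a, X, y))
print("margin of w_b:", margin(w_b, X, y))
```

Both weight vectors separate the data, but the max-margin criterion prefers `w_b` because its closest point lies further from the boundary.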

Support Vector Machine (SVM)

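Since the max-min problem is difficult to solve → minimize the norm of $w$ subject to every point scoring at least 1 on its correct side of the boundary (a geometric distance of at least $1/\|w\|$): $\min_w \|w\|$ subject to $y_i w^\top x_i \ge 1$ for all $i$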

Support vectors: the points sitting exactly on the margin, i.e. those with $y_i w^\top x_i = 1$. Removing any other point would not change the solution.
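
The sketch below checks the constraints and picks out the support vectors for a tiny problem whose minimum-norm solution can be worked out by hand. It assumes no intercept; the data and the helper names are hypothetical. With constraints $w_1 \ge 1$, $2 w_2 \ge 1$ and $3 w_1 + w_2 \ge 1$, the smallest-norm feasible vector is $w = (1, 0.5)$.

```python
import math

def functional_margins(w, X, y):
    """Return y_i * w^T x_i for every point."""
    return [yi * sum(wj * xj for wj, xj in zip(w, xi)) for xi, yi in zip(X, y)]

def support_vectors(w, X, y, tol=1e-9):
    """Indices of points whose constraint y_i * w^T x_i >= 1 is tight."""
    return [i for i, m in enumerate(functional_margins(w, X, y))
            if abs(m - 1.0) <= tol]

# Toy data (hypothetical); the third point lies well away from the boundary.
X = [(1.0, 0.0), (0.0, 2.0), (-3.0, -1.0)]
y = [1, 1, -1]
w = (1.0, 0.5)  # minimum-norm solution for this toy set, derived by hand

print("functional margins:", functional_margins(w, X, y))
print("support vector indices:", support_vectors(w, X, y))
print("geometric margin 1/||w||:", 1.0 / math.sqrt(sum(wj * wj for wj in w)))
```

Only the first two points have a tight constraint, so dropping the third point leaves the same two constraints and the same solution.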


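Implicit bias of GD: on linearly separable data, GD with logistic loss converges in direction to the max-margin solution (recently proven): $\frac{w_t}{\|w_t\|} \to w_{\text{MM}}$ as $t \to \infty$, where $w_{\text{MM}}$ is the unit-norm max-margin direction.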
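
A small experiment can make this plausible, though it is only an illustration and not a proof. The sketch assumes no intercept, a fixed step size, and initialization at zero; the data points $z_i = y_i x_i$ and all names are hypothetical. For this toy set the max-margin direction is proportional to $(1, 0.5)$.

```python
import math

def normalized_margin(w, Z):
    """min_i (w . z_i) / ||w||, where z_i = y_i * x_i."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return min(sum(wj * zj for wj, zj in zip(w, z)) for z in Z) / norm

def gd_logistic(Z, lr=0.5, steps=5000):
    """Plain gradient descent on sum_i log(1 + exp(-w . z_i)), starting at w = 0."""
    w = [0.0] * len(Z[0])
    margins = []
    for _ in range(steps):
        grad = [0.0] * len(w)
        for z in Z:
            m = sum(wj * zj for wj, zj in zip(w, z))
            coeff = -1.0 / (1.0 + math.exp(m))  # derivative of log(1 + exp(-m)) in m
            for j, zj in enumerate(z):
                grad[j] += coeff * zj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
        margins.append(normalized_margin(w, Z))
    return w, margins

# Toy separable data (hypothetical), already multiplied by the labels.
Z = [(1.0, 0.0), (0.0, 2.0)]
w, margins = gd_logistic(Z)
max_margin = 1.0 / math.sqrt(1.0 ** 2 + 0.5 ** 2)  # margin of the direction (1, 0.5)

print("normalized margin after first step:", margins[0])
print("normalized margin after last step: ", margins[-1])
print("maximum achievable margin:         ", max_margin)
```

The normalized margin climbs toward the maximum but does so slowly, which is why many iterations are needed to get close.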

Soft margin

Multi-class classification

Two options:

  • Train one vs. rest:
    • Computationally expensive (one binary classifier per class)
    • Splitting one from rest may not result in best overall classification
  • Train via cross entropy
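
For the cross-entropy option, a minimal sketch of the loss on a single example follows. It assumes one score per class (for instance $w_k^\top x$ for class $k$) turned into probabilities by a softmax; the scores and function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn K class scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(scores)[label])

# Hypothetical scores for a three-class problem.
scores = [2.0, 0.5, -1.0]
predicted = max(range(len(scores)), key=lambda k: scores[k])

print("class probabilities:", softmax(scores))
print("predicted class:", predicted)
print("loss if true class is 0:", cross_entropy(scores, 0))
print("loss if true class is 2:", cross_entropy(scores, 2))
```

Unlike one vs. rest, all class scores enter the same loss, so the classes are trained jointly instead of one split at a time.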

Errors first class

  • Control via decision boundary (can be shifted by choosing a threshold other than 0)

Distribution shifts

  • A model trained on Australian crop disease data might not hold up in Europe because the underlying distributions differ
  • Solution: reuse the feature extraction, retrain the classifier
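
The "reuse features, retrain classifier" idea can be sketched as follows. Everything here is an assumption made for illustration: `extract_features` is a hand-written stand-in for a pretrained feature extractor, `source_head` stands in for a classifier fit on the original domain, and the target data are made up. Only the linear head is refit, using gradient descent on the logistic loss.

```python
import math

def extract_features(x):
    """Hypothetical frozen feature extractor (stand-in for a pretrained model)."""
    return (x[0] - x[1], x[0] + x[1])

def predict(head, x):
    """Sign of the linear head applied to the frozen features."""
    score = sum(h * f for h, f in zip(head, extract_features(x)))
    return 1 if score > 0 else -1

def retrain_head(X, y, lr=0.1, steps=200):
    """Fit only the linear head with gradient descent on the logistic loss."""
    feats = [extract_features(x) for x in X]
    head = [0.0] * len(feats[0])
    for _ in range(steps):
        grad = [0.0] * len(head)
        for f, yi in zip(feats, y):
            m = yi * sum(h * fj for h, fj in zip(head, f))
            coeff = -yi / (1.0 + math.exp(m))
            for j, fj in enumerate(f):
                grad[j] += coeff * fj
        head = [h - lr * g for h, g in zip(head, grad)]
    return head

def accuracy(head, X, y):
    """Fraction of points whose predicted label matches the true label."""
    return sum(predict(head, x) == yi for x, yi in zip(X, y)) / len(X)

# Hypothetical target-domain data whose labels follow x0 + x1 > 0.
X_target = [(2.0, 1.0), (1.0, 2.0), (-1.0, -2.0), (-2.0, -1.0)]
y_target = [1, 1, -1, -1]

source_head = [1.0, 0.0]  # head assumed to have been fit on the source domain
target_head = retrain_head(X_target, y_target)

print("source head accuracy on target:   ", accuracy(source_head, X_target, y_target))
print("retrained head accuracy on target:", accuracy(target_head, X_target, y_target))
```

The feature extractor is never updated; only the small head is refit, which typically needs far less labeled data from the new domain than training the whole model again.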