5. Classification

The magnitude of the weights is irrelevant b/c we only take the sign of $w^{T} x$ (any positive rescaling of $w$ gives the same predictions).

We can control false pos/neg by setting the boundary to something other than 0.
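A minimal sketch of the two points above, assuming a linear score $w^{T} x$. The weights, the points, and the `classify` helper are made up for illustration only.

```python
def classify(w, x, threshold=0.0):
    """Return +1 if the score w^T x exceeds the threshold, else -1."""
    score = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if score > threshold else -1

# Hypothetical weights and points, chosen only for illustration.
w = [2.0, -1.0]
points = [[1.0, 0.5], [0.2, 0.1], [-1.0, 1.0]]

# Boundary at 0: the scores are roughly 1.5, 0.3, -3.0.
default = [classify(w, x) for x in points]                      # [1, 1, -1]

# Rescaling w by a positive factor leaves every sign unchanged.
scaled = [classify([10 * wj for wj in w], x) for x in points]   # [1, 1, -1]

# Raising the threshold trades false positives for false negatives.
strict = [classify(w, x, threshold=1.0) for x in points]        # [1, -1, -1]
```

Moving the threshold up makes the classifier more reluctant to predict +1, and moving it down does the opposite.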

Max-Margin Solution

Too many perfect classifiers → choose the weights with max distance between the boundary and the data points closest to it.

Boundary to datapoint dist: $\operatorname{dist}(w, x_{i}) = \frac{|w^{T} x_{i}|}{\|w\|_{2}}$ (projecting $x_{i}$ onto $w$, then normalizing by $\|w\|_{2}$)

Since only $w$'s direction matters, we fix $\|w\|_{2} = 1 \implies \operatorname{dist}(w, x_{i}) = |w^{T} x_{i}|$

If correctly classified: $y_{i} \cdot w^{T} x_{i} > 0$ ($y_{i} \in \{-1, 1\}$). Since we are picking from perfect classifiers: $\operatorname{dist}(w, x_{i}) = y_{i} \cdot w^{T} x_{i}$

Max-margin classifier: $w_{MM} = \arg\max_{\|w\|_{2} = 1} \operatorname{margin}(w) = \arg\max_{\|w\|_{2} = 1} \min_{1 \leq i \leq n} y_{i} \cdot w^{T} x_{i}$
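The margin of a candidate $w$ can be computed directly from the definition. The sketch below uses a hypothetical toy dataset and a helper named `margin` (not from the lecture), and compares two perfect classifiers.

```python
import math

def margin(w, X, y):
    """Smallest signed distance y_i * w^T x_i / ||w||_2 over the dataset."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return min(
        yi * sum(wj * xj for wj, xj in zip(w, xi)) / norm
        for xi, yi in zip(X, y)
    )

# Toy linearly separable data (hypothetical).
X = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]]
y = [1, 1, -1, -1]

# Both candidates classify every point correctly (positive margin),
# but the first keeps the closest points further from the boundary.
m_a = margin([1.0, 0.0], X, y)
m_b = margin([1.0, 1.0], X, y)
```

Dividing by the norm inside `margin` plays the same role as the constraint $\|w\|_{2} = 1$: it makes candidates with different scales comparable.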

Support Vector Machine (SVM)

Since the max-min problem is difficult to solve → minimize the norm of $w$ subject to every point having a score $y_{i} \cdot w^{T} x_{i}$ of at least 1 (i.e. geometric distance at least $1 / \|w\|_{2}$ from the boundary):

$\min_{w \in \mathbb{R}^{d}} \|w\|_{2} \quad \text{s.t.} \quad y_{i} \cdot w^{T} x_{i} \geq 1 \quad \forall i$

Support vectors: the points with $y_{i} \cdot w^{T} x_{i} = 1$, i.e. sitting exactly on the margin. If you removed any other point, the solution wouldn’t change.
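To make the constraint and the support vectors concrete, here is a small check on a hypothetical toy dataset. The vector `w_svm` was worked out by hand as the minimum-norm feasible solution for this particular data, not produced by a solver, and `functional_margins` is an illustrative helper.

```python
import math

def functional_margins(w, X, y):
    """Return y_i * w^T x_i for every data point."""
    return [yi * sum(wj * xj for wj, xj in zip(w, xi)) for xi, yi in zip(X, y)]

# Toy linearly separable data (hypothetical).
X = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]]
y = [1, 1, -1, -1]

# Assumed SVM solution for this data, derived by hand:
# the constraint on (2, 0) forces w_1 >= 0.5, and (0.5, 0) satisfies the rest.
w_svm = [0.5, 0.0]

margins = functional_margins(w_svm, X, y)           # [1.0, 1.5, 1.0, 1.5]
feasible = all(m >= 1.0 - 1e-12 for m in margins)   # every constraint holds

# Support vectors are the points whose constraint is tight.
support = [xi for xi, m in zip(X, margins) if abs(m - 1.0) < 1e-9]

# Geometric distance of the support vectors from the boundary.
geometric_margin = 1.0 / math.sqrt(sum(wj * wj for wj in w_svm))  # 2.0
```

Since the support vectors sit at distance $1 / \|w\|_{2}$ from the boundary, a smaller norm means a larger margin, which is why minimizing the norm recovers the max-margin direction.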

Implicit Bias of GD: GD w/ logistic loss on linearly separable data converges in direction to the max-margin solution (recently proven):

$\lim_{t \to \infty} \frac{w^{t}}{\|w^{t}\|_{2}} = w_{MM}$.
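The effect can be observed numerically on a small example. This is only a sketch under stated assumptions: the toy data, step size, and iteration counts are arbitrary choices, and for this data the max-margin direction is $(1, 0)$ (derived by hand). Convergence in direction is slow, so the result is close to, not equal to, $w_{MM}$.

```python
import math

def gd_logistic(X, y, steps, lr=0.1):
    """Plain gradient descent on the summed loss log(1 + exp(-y_i * w^T x_i))."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            s = yi * sum(wj * xj for wj, xj in zip(w, xi))
            coeff = -yi / (1.0 + math.exp(s))  # d loss / d (w^T x_i)
            for j, xj in enumerate(xi):
                grad[j] += coeff * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def normalize(w):
    """Scale w to unit Euclidean norm."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return [wj / norm for wj in w]

# Toy linearly separable data (hypothetical).
X = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]]
y = [1, 1, -1, -1]

w_early = gd_logistic(X, y, steps=1)
w_late = gd_logistic(X, y, steps=5000)

dir_early = normalize(w_early)
dir_late = normalize(w_late)  # first component approaches 1 as steps grow
```

The norm of the weights keeps growing during training, since the logistic loss on separable data has no finite minimizer; only the normalized direction settles.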

Soft margin

Multi-class classification

Two options:

Train one vs. rest:
- Computationally expensive
- Splitting one from rest may not result in best overall classification
Train via cross entropy
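For the cross-entropy option, a minimal sketch assuming one weight vector per class and a softmax over the class scores. The weight matrix `W`, the input `x`, and the helper names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(W, x):
    """One linear score w_k^T x per class k."""
    return [sum(wj * xj for wj, xj in zip(w_k, x)) for w_k in W]

def cross_entropy(W, x, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(class_scores(W, x))[label])

def predict(W, x):
    """Pick the class with the highest score."""
    scores = class_scores(W, x)
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical 3-class model on 2-d inputs.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [2.0, 0.5]

probs = softmax(class_scores(W, x))
pred = predict(W, x)                 # class 0 has the largest score
loss_right = cross_entropy(W, x, 0)  # small: the model favors class 0
loss_wrong = cross_entropy(W, x, 2)  # large: class 2 gets little probability
```

Unlike one vs. rest, all class weights are trained jointly here, since every score enters the softmax normalization.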

Errors first class

Control via decision boundary (can be shifted by choosing a threshold other than 0)

Distribution shifts

Model trained on Australian crop disease data might not hold in Europe because the underlying distributions are diff
Solution: Reuse feature extraction, retrain classifier
