Support Vector Machines (SVMs) are classifiers that can build a non-linear decision boundary, much like multinomial logistic regression, but without having to manually choose transformations of the features (e.g. $x_1^2$, $x_1 x_2$, $\sqrt{x_1}$, etc.).
The main concepts behind how an SVM works are Optimal Margin Classifiers and Kernels.
Optimal Margin Classifier
For now we assume that the dataset is separable, meaning there exists a linear boundary that separates the examples.
While at its core the SVM is a linear classifier, it can become a non-linear classifier via the kernel trick: it still finds a linear separator, but in a higher-dimensional transformed space, which allows the model to capture non-linear relationships in the original space.
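As a rough, self-contained illustration of this idea (my own toy example, not from the original note): with a hand-picked explicit feature map $\phi$, data that is not linearly separable in the original space becomes linearly separable in a higher-dimensional space. Kernels, introduced later, aim to get this effect without ever computing $\phi$ explicitly.

```python
import numpy as np

# Toy data: class +1 inside a circle of radius 1, class -1 outside.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.where(np.sum(X**2, axis=1) < 1.0, 1, -1)

# No straight line in R^2 separates these classes, but after the
# (hand-picked) feature map phi(x) = (x1, x2, x1^2 + x2^2) the classes
# are separated by the linear boundary z3 = 1 in R^3.
def phi(X):
    return np.column_stack([X[:, 0], X[:, 1], np.sum(X**2, axis=1)])

Z = phi(X)
w, b = np.array([0.0, 0.0, 1.0]), -1.0        # linear separator in feature space
predictions = np.where(Z @ w + b < 0, 1, -1)  # inside the circle -> +1
print("separable in feature space:", np.all(predictions == y))
```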
Functional Margin
Recap on how binary classification works in logistic regression
The classifier predicts $1$ if $h_\theta(x) \geq 0.5$, and $0$ otherwise. This is because $h_\theta(x) = g(\theta^T x)$, and $g(\theta^T x) \geq 0.5$ exactly when $\theta^T x \geq 0$.
In other words, this means that if $y^{(i)} = 1$, we hope that $\theta^T x^{(i)} \gg 0$; if $y^{(i)} = 0$, we hope that $\theta^T x^{(i)} \ll 0$.
Functional margin of the hyperplane defined by $(w, b)$ with respect to $(x^{(i)}, y^{(i)})$
The parameters $(w, b)$ define a hyperplane, which is just the higher-dimensional analogue of a line, that separates the samples. From here on the labels are taken as $y^{(i)} \in \{-1, +1\}$.
We define the functional margin as follows:
$$\hat{\gamma}^{(i)} = y^{(i)} \left( w^T x^{(i)} + b \right)$$
As before for logistic regression:
- If $y^{(i)} = 1$, we want $w^T x^{(i)} + b > 0$ and hope that $w^T x^{(i)} + b \gg 0$;
- If $y^{(i)} = -1$, we want $w^T x^{(i)} + b < 0$ and hope that $w^T x^{(i)} + b \ll 0$.

And this means that we want $y^{(i)} \left( w^T x^{(i)} + b \right) > 0$ and hope that $\hat{\gamma}^{(i)} = y^{(i)} \left( w^T x^{(i)} + b \right) \gg 0$.
We can define the functional margin with respect to the entire training set as:
$$\hat{\gamma} = \min_{i=1,\dots,m} \hat{\gamma}^{(i)}$$
There is a problem here: it is possible to “cheat” and change the functional margin, without influencing the decision boundary, by just multiplying both parameters $w$ and $b$ by the same positive factor. For example, replacing $(w, b)$ with $(2w, 2b)$ doubles every $\hat{\gamma}^{(i)}$ but leaves the hyperplane $w^T x + b = 0$ unchanged.
A way to solve this problem is to normalize the length of the parameters, for example by forcing the magnitude of the vector $w$ to be $1$ ($\|w\| = 1$), or by replacing $(w, b)$ with $\left( \frac{w}{\|w\|}, \frac{b}{\|w\|} \right)$.
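A small numeric sketch of the scaling “cheat” and of the normalization fix (toy data and parameter values chosen only for illustration):

```python
import numpy as np

# Toy separable data with labels in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

def functional_margin(w, b, X, y):
    # Per-sample functional margins y_i * (w^T x_i + b); the margin of the
    # whole training set is the smallest one.
    return np.min(y * (X @ w + b))

w, b = np.array([1.0, 1.0]), 0.0
print(functional_margin(w, b, X, y))            # 3.0
print(functional_margin(10 * w, 10 * b, X, y))  # 10x larger, same boundary

# Normalizing (w, b) by ||w|| makes the margin scale-invariant
# (this is exactly the geometric margin defined below).
def normalized_margin(w, b, X, y):
    norm = np.linalg.norm(w)
    return functional_margin(w / norm, b / norm, X, y)

print(normalized_margin(w, b, X, y))
print(normalized_margin(10 * w, 10 * b, X, y))  # unchanged
```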
Geometric Margin
The geometric margin describes how far the decision boundary stands apart from the examples. A decision boundary that separates the points well, meaning it doesn’t come very close to any training sample, has a larger geometric margin and is therefore better.
More formally, the geometric margin is the Euclidean distance between a training sample’s coordinates and the line (or hyperplane) that represents the decision boundary.
We formally define the geometric margin with respect to a particular training sample as:
$$\gamma^{(i)} = y^{(i)} \left( \left( \frac{w}{\|w\|} \right)^T x^{(i)} + \frac{b}{\|w\|} \right)$$
We can define the geometric margin with respect to the entire training set as the worst geometric margin w.r.t. a particular training sample:
$$\gamma = \min_{i=1,\dots,m} \gamma^{(i)}$$
We can now see that the geometric margin and the functional margin are related:
$$\gamma^{(i)} = \frac{\hat{\gamma}^{(i)}}{\|w\|}$$
An optimal margin classifier aims to find the parameters $(w, b)$ that maximise the geometric margin $\gamma$.
In other words, this means to find:
$$\max_{\gamma, w, b} \ \gamma \quad \text{s.t.} \quad y^{(i)} \left( w^T x^{(i)} + b \right) \geq \gamma \ \ \forall i, \qquad \|w\| = 1$$
This problem, written like this, cannot be solved with algorithms like gradient descent or similar (the constraint $\|w\| = 1$ is not convex), but it can be rewritten as an equivalent convex problem that standard optimizers can handle:
$$\min_{w, b} \ \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y^{(i)} \left( w^T x^{(i)} + b \right) \geq 1 \ \ \forall i$$
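As a minimal sketch of the rewritten problem in practice, assuming the cvxpy convex-optimization library and the same kind of toy separable data as above (my own illustration; any QP solver would do):

```python
import numpy as np
import cvxpy as cp

# Toy separable data with labels in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

n_features = X.shape[1]
w = cp.Variable(n_features)
b = cp.Variable()

# min (1/2) ||w||^2  s.t.  y_i * (w^T x_i + b) >= 1 for every training sample.
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)
print("geometric margin =", 1 / np.linalg.norm(w.value))
```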
Kernels
Let’s suppose that the parameter $w$ can be expressed as a linear combination of the training examples:
$$w = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}$$
The representer theorem proves that this is possible without losing any performance.
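To make this concrete, here is a tiny check (with toy data and placeholder coefficients of my own, not values from an actual SVM training run) that once $w$ has this form, the decision value $w^T x + b$ depends on the inputs only through inner products between examples, which is exactly what a kernel can replace:

```python
import numpy as np

# With w = sum_i alpha_i * y_i * x_i, the decision value w^T x + b only
# needs inner products <x_i, x>.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
alpha = np.array([0.1, 0.2, 0.1, 0.2])   # hypothetical coefficients
b = 0.0

x_new = np.array([1.0, 0.5])

# Direct computation with the explicit w...
w = (alpha * y) @ X
direct = w @ x_new + b

# ...equals the computation that only uses inner products between examples.
via_inner_products = np.sum(alpha * y * (X @ x_new)) + b
print(np.isclose(direct, via_inner_products))   # True
```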
tags:#ai/machine-learning