Linear Regression

The most basic machine learning case is linear regression.

The model is a linear function plus a bias. Strictly speaking, once the bias is included the model is affine rather than linear in the input $x$: homogeneity and additivity no longer hold.

The function is the equation of a line:

f_{Θ} (x_{i}) = a x_{i} + b \approx y_{i}

where $Θ = (a, b)$ collects the slope $a$ and the bias $b$.

The $n$ pairs of data points $(x_{i}, y_{i})$ are given. The $x_{i}$ are called regressors.

The goal is to find the parameters $Θ$ that, when plugged into the equation, give the line of best fit.

A good choice for a loss function, in this case, is the mean squared error (MSE) between the ground-truth targets $y_{i}$ and the predicted outputs $f_{Θ} (x_{i})$.

ϵ = \min_{a, b \in \mathbb{R}} \frac{1}{n} \sum_{i = 1}^{n} (y_{i} - f_{Θ} (x_{i}))^{2}

(Note that we can ignore the $\frac{1}{n}$ factor since it only scales the sum by a positive constant: the minimum value changes, but the parameters that attain it do not.)
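As a minimal sketch of the loss above, the snippet below evaluates the MSE of two candidate lines on a toy dataset. The data, the helper names `line` and `mse`, and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np

def line(theta, x):
    """Evaluate f_Theta(x) = a * x + b for Theta = (a, b)."""
    a, b = theta
    return a * x + b

def mse(theta, x, y):
    """Mean squared error between the targets y and the predictions f_Theta(x)."""
    residuals = y - line(theta, x)
    return np.mean(residuals ** 2)

# Hypothetical toy data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

loss_exact = mse((2.0, 1.0), x, y)  # the parameters that generated the data
loss_off = mse((1.0, 0.0), x, y)    # a worse candidate line
```

Here `loss_exact` is zero because the first candidate reproduces the data, while the second candidate leaves residuals 1, 2, 3, 4 and therefore a loss of 7.5.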

When $f_{Θ}$ is linear, the solution of this problem is called the least-squares approximation.

We can rewrite the function as:

ϵ = \min_{Θ} ℓ_{Θ} (\{x_{i}, y_{i}\})

or more generally:

ϵ = \min_{Θ} ℓ (Θ)

Note that the loss is defined on the entire dataset and not only a single data point. Also, note that $f_{Θ}$ is linear but $ℓ_{Θ}$ is quadratic.
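One way to see the quadratic behaviour numerically is to fix the bias, sweep the slope over evenly spaced values, and check that the second finite difference of the loss is constant, which is what a parabola produces. This is only an illustrative sketch: the data, the fixed bias of 1.0, and the helper `loss` are assumptions.

```python
import numpy as np

# Hypothetical data points (not taken from the notes).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

def loss(a, b):
    """MSE over the whole dataset for the line f(x) = a * x + b."""
    return np.mean((y - (a * x + b)) ** 2)

# Sweep the slope a in steps of 1.0 while keeping the bias fixed at 1.0.
a_values = np.linspace(-1.0, 5.0, 7)
losses = np.array([loss(a, 1.0) for a in a_values])

# For a quadratic function the second finite difference is constant;
# here it equals 2 * mean(x^2) * step^2.
second_diff = np.diff(losses, n=2)
expected = 2.0 * np.mean(x ** 2) * 1.0 ** 2
```

Every entry of `second_diff` should match `expected` up to floating-point error, regardless of the targets chosen.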

Today there is no unified theory that allows us to do constrained learning, where a constraint is placed on the parameters (for example, requiring them to be positive). We will mostly deal with unconstrained problems.

Another constraint is to make the model output a probability distribution, meaning that the entries of the output vector are non-negative and sum to 1.

Closed form solution for Linear regression

A closed form solution means that we have a formula that solves the problem directly. For most real-world problems this is not possible.

Starting from the MSE problem, we can set the gradient of the loss to zero and solve for the parameters in order to find a closed form solution. Remember that the loss is convex so we are guaranteed to find the global minimum.
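A short sketch of that calculation, under the assumption that the data points are stacked as the rows $[x_{i}, 1]$ of a matrix $X$, the targets are collected in a vector $y$, and the $\frac{1}{n}$ factor is dropped:

ℓ (Θ) = \| y - X Θ \|^{2}

\nabla_{Θ} ℓ (Θ) = - 2 X^{T} (y - X Θ) = 0

X^{T} X Θ = X^{T} y

These are the normal equations. Solving them for $Θ$ requires $X^{T} X$ to be invertible.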

After some calculations that can be found on the slides (Lecture 4 - 63 to 68) we find the closed form solution:

Θ = (X^{T} X)^{- 1} X^{T} y
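The formula can be tried out in a few lines of NumPy. This is a sketch under stated assumptions, not the lecture's implementation: each row of `X` is taken to be `[x_i, 1]` so that the bias is absorbed into the parameter vector, and the noisy data is made up for the example.

```python
import numpy as np

# Hypothetical noisy samples around the line y = 2x + 1.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(x.shape)

# Design matrix: one row [x_i, 1] per data point (assumed convention).
X = np.column_stack([x, np.ones_like(x)])

# Solve (X^T X) Theta = X^T y rather than forming the inverse explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ y)
a, b = theta

# At the solution the gradient of the squared error vanishes.
gradient = -2.0 * X.T @ (y - X @ theta)
```

Using `np.linalg.solve` gives the same result as the explicit inverse in the formula but is generally better behaved numerically. The recovered `a` and `b` should land close to 2 and 1.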

Equivalent for more dimensions

Until now we have seen the case in which each data point is just one number, meaning it is one-dimensional.

In the more general case, the data points are vectors in $\mathbb{R}^{d}$ where $d$ is the dimensionality.

The linear regression formula can be expressed as:

y_{i} = A x_{i} + b

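The closed form solution, assuming the data points are stacked as the columns of $X$ and $Y$ (with a constant 1 appended to each $x_{i}$ so that $Θ$ contains both $A$ and $b$), is:

Θ = (X X^{T})^{- 1} X Y^{T}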
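A minimal NumPy sketch of the multi-dimensional case follows. It assumes that the data points are the columns of `X` and `Y`, that a row of ones is appended to `X` to absorb the bias, and that `Θ` therefore stacks the transpose of `A` on top of `b`. The dimensions and the noise-free data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 3, 2, 200  # input dim, output dim, number of points (hypothetical)

# Ground-truth parameters used only to generate the toy data.
A_true = rng.standard_normal((m, d))
b_true = rng.standard_normal(m)

X_raw = rng.standard_normal((d, n))    # columns are the x_i
Y = A_true @ X_raw + b_true[:, None]   # columns are the y_i (noise-free)

# Append a row of ones so the bias becomes part of Theta.
X = np.vstack([X_raw, np.ones((1, n))])    # shape (d + 1, n)

# Theta = (X X^T)^{-1} X Y^T, computed with a linear solve.
Theta = np.linalg.solve(X @ X.T, X @ Y.T)  # shape (d + 1, m)

A_hat = Theta[:d].T  # recovered A, shape (m, d)
b_hat = Theta[d]     # recovered b, shape (m,)
```

Because the toy data contains no noise, `A_hat` and `b_hat` should match `A_true` and `b_true` up to floating-point error.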

👨🏽‍💻 Domiziano's Notes
