
I'm going through the Stanford AI course on Coursera (like many others with similar questions). For univariate linear regression, supposedly there is only one (global) minimum for the cost function (squared error). I assume this can be proven mathematically and is also apparent based on the graph of the cost function, but I have a doubt about it intuitively.

Imagine that your data takes the shape of a symmetrical "X." In this case, wouldn't there be two optimal solutions/parameters (e.g. for slope): one positive and one negative?

Cheers!

1 Answer


If you have doubts intuitively, you can think of it as follows. Take a univariate regression $y_{t}=\beta x_{t}+\epsilon_{t}$ for $t=1,\ldots,T$. Now consider, for each observation $t$, an equation of the form $y_{t}=\beta x_{t}$. You then have a system of $T$ equations in a single unknown $\beta$: you must find the $\beta$ that simultaneously solves all $T$ equations (one for each observation of $y$ and $x$ in the sample of length $T$). This is impossible unless $y$ is an exact linear function of $x$ (for example $y=3x$ in the population, in which case the solution exists and is unique, $\beta = 3$, and the sum of squared residuals is 0).

The best proxy for a solution of such a system of $T$ equations is the $\beta$ that minimizes the sum of squared errors (or, equivalently, the squared norm of the vector $\epsilon$). That solution is the unique solution to the OLS problem, and in this case it can be written as $(x^{T}x)^{-1}x^{T}y$, where $x$ is a vector. Notice that when $X$ is a matrix (with full column rank), this is the product of the Moore-Penrose pseudoinverse of $X$, given by $(X^{T}X)^{-1}X^{T}$, and the vector of observations of the dependent variable $y$. This is unique because the Moore-Penrose pseudoinverse is uniquely defined (and clearly $y$ is unique as well).
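To make this concrete for the symmetric "X" case from the question, here is a minimal sketch in Python/NumPy (the data and slope values $\pm 2$ are made up for illustration; nothing like them appears in the post). It builds an X-shaped sample, computes the OLS slope from the closed-form formula $(x^{T}x)^{-1}x^{T}y$, and scans the cost over a grid of slopes:

```python
import numpy as np

# Hypothetical symmetric "X"-shaped data: two arms, y = 2x and y = -2x,
# sampled on the same x grid (slopes chosen only for illustration).
x_grid = np.linspace(-1.0, 1.0, 50)
x = np.tile(x_grid, 2)
y = np.concatenate([2.0 * x_grid, -2.0 * x_grid])

# Closed-form OLS solution for the no-intercept model y = beta * x:
# beta_hat = (x'x)^{-1} x'y, the unique minimizer of the squared error.
beta_hat = (x @ y) / (x @ x)
print(beta_hat)  # ~0.0 for perfectly symmetric data

# The cost J(beta) = sum((y - beta * x)^2) is a parabola in beta with
# positive leading coefficient sum(x^2), so it has a single global minimum.
betas = np.linspace(-3.0, 3.0, 201)
costs = np.array([np.sum((y - b * x) ** 2) for b in betas])
print(betas[np.argmin(costs)])  # grid minimum agrees with beta_hat (~0)
```

The point of the sketch: the two arms of the "X" do not give two separate minima at slopes $+2$ and $-2$. Because the cost is a strictly convex quadratic in $\beta$, the two arms pull the fit in opposite directions and the single minimizer is $\beta = 0$, with a large (but still minimal) squared error.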

Fr1