
Let's say that we have X of shape (2, 5) and y of shape (2,).

This works: np.linalg.lstsq(X, y)
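
Concretely, a minimal reproduction (the data values here are arbitrary random numbers):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))  # 2 observations, 5 features
y = rng.standard_normal(2)

w, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(w.shape)  # (5,) -- we get 5 weights back
print(rank)     # 2   -- the effective rank of X
```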

We would expect this to work only if X were of shape (N, 5) with N >= 5. But why, and how, does it work?

We do get back 5 weights, as expected, but how is this problem solved?

Isn't this a case of 2 equations and 5 unknowns?
How can numpy solve it?
Does it do something like interpolation, creating additional artificial equations?

1 Answer


My understanding is that numpy.linalg.lstsq relies on the LAPACK routine dgelsd.

The problem is to solve:

$$ \text{minimize} (\text{over} \; \mathbf{x}) \quad \| A\mathbf{x} - \mathbf{b} \|_2$$

Of course, this does not have a unique solution when the rank of matrix $A$ is less than the number of columns of $A$ (i.e. the length of $\mathbf{x}$). For an underdetermined system, dgelsd provides a solution $\mathbf{z}$ such that:

  • $A\mathbf{z} = \mathbf{b}$
  • $\| \mathbf{z} \|_2 \leq \|\mathbf{x} \|_2$ for all $\mathbf{x}$ that satisfy $A\mathbf{x} = \mathbf{b}$ (i.e. $\mathbf{z}$ is the minimum norm solution to the underdetermined system).

For example, if the system is $x + y = 1$, numpy.linalg.lstsq would return $x = 0.5$, $y = 0.5$.
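
You can check this directly (a minimal sketch of that one-equation system):

```python
import numpy as np

# The system x + y = 1 written as a 1x2 matrix equation.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

z, *_ = np.linalg.lstsq(A, b, rcond=None)
print(z)                  # [0.5 0.5]
print(np.linalg.norm(z))  # ~0.707, smaller than e.g. ||(1, 0)|| = 1
```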

How does dgelsd work?

The routine dgelsd computes the singular value decomposition (SVD) of A.

I'll just sketch the idea behind using an SVD to solve a linear system. The singular value decomposition is a factorization $A = U \Sigma V'$ where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a (rectangular) diagonal matrix whose diagonal entries are known as the singular values.

The effective rank of matrix $A$ is the number of singular values that are effectively non-zero (i.e. sufficiently far from zero relative to machine precision). Let $S$ be the diagonal matrix of the non-zero singular values. The SVD is thus:

$$ A = U \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} V' $$
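
As a sketch of this step in NumPy (the rank tolerance below mirrors the default used by numpy.linalg.matrix_rank; dgelsd's internal cutoff may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 5))  # a "fat" matrix, rank 2 almost surely

U, s, Vt = np.linalg.svd(A)  # U is 2x2, s has length 2, Vt is 5x5

# Effective rank: count the singular values that are "effectively non-zero".
tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
print((s > tol).sum())  # 2
```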

The pseudo-inverse of $A$ is given by:

$$ A^\dagger = V \begin{bmatrix} S^{-1} & 0 \\ 0 & 0 \end{bmatrix} U' $$
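
A minimal sketch of building $A^\dagger$ from the thin SVD, assuming $A$ has full row rank (so every retained singular value is non-zero):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 5))

# Thin SVD: U is 2x2, s has length 2, Vt is 2x5.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Invert only the non-zero singular values.
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True
```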

Consider the solution $\mathbf{x} = A^\dagger \mathbf{b}$. Then:

\begin{align*} A\mathbf{x} - \mathbf{b} &= U \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} V' V \begin{bmatrix} S^{-1} & 0 \\ 0 & 0 \end{bmatrix} U' \mathbf{b} - \mathbf{b} \\ &= U \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix} U' \mathbf{b} - \mathbf{b}\\ \end{align*}

There are basically two cases here:

  1. The number of non-zero singular values (i.e. the size of matrix $I$) is less than the length of $\mathbf{b}$. The solution here won't be exact; we'll solve the linear system in the least squares sense.
  2. The number of non-zero singular values equals the length of $\mathbf{b}$ (i.e. $A$ has full row rank). Then the middle block matrix is the identity, the product reduces to $U U' \mathbf{b} - \mathbf{b} = \mathbf{0}$, and the system is solved exactly: $A\mathbf{x} - \mathbf{b} = \mathbf{0}$.

This last part is a bit tricky: you need to keep track of the matrix dimensions and use the fact that $U$ is an orthogonal matrix (so $U U' = I$).
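
A quick numeric check of both cases, using numpy.linalg.pinv to form $\mathbf{x} = A^\dagger \mathbf{b}$ (the shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Case 2: fat matrix with full row rank -> the solution is exact.
A = rng.standard_normal((2, 5))
b = rng.standard_normal(2)
x = np.linalg.pinv(A) @ b
print(np.allclose(A @ x, b))  # True

# Case 1: tall matrix, more equations than unknowns -> least-squares fit.
A = rng.standard_normal((5, 2))
b = rng.standard_normal(5)
x = np.linalg.pinv(A) @ b
print(np.allclose(A @ x, b))  # False in general; x minimizes ||Ax - b||_2
```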

Equivalent formulas for the pseudo-inverse

When $A$ has linearly independent rows (e.g. a fat matrix), then: $$A^\dagger = A'\left(AA' \right)^{-1}$$

For an underdetermined system, you can show that the pseudo-inverse gives you the minimum norm solution.

When $A$ has linearly independent columns (e.g. a skinny matrix), then: $$A^\dagger = \left(A'A \right)^{-1}A'$$
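
Both identities are easy to verify numerically (a sketch with random matrices, which have full rank almost surely):

```python
import numpy as np

rng = np.random.default_rng(3)

A = rng.standard_normal((2, 5))  # linearly independent rows (fat)
print(np.allclose(np.linalg.pinv(A), A.T @ np.linalg.inv(A @ A.T)))  # True

B = rng.standard_normal((5, 2))  # linearly independent columns (skinny)
print(np.allclose(np.linalg.pinv(B), np.linalg.inv(B.T @ B) @ B.T))  # True
```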

Matthew Gunn