Why the inverse for unknown coefficients vector?

Question

From my understanding, this formula is used for least-squares when we're interested in minimizing the distance between a point and some space we are projecting on.

Somebody can correct me if this is not accurate but that is not what the question is about.

My question is about the inverse aspect of the formula. I'm not understanding the purpose of the inverse. What does it actually do? I get the purpose of transpose (to make the matrix multiplication work) and I get that A represents the input values and B the observed values.

I have worked on some problems so I get how to do the actual math but it is difficult to imagine.

Thanks

In a handwaving sense, you want something like $\mathbf y = \mathbf X \boldsymbol{\beta}$, though you will not get it because of errors and you actually model $\mathbf y = \mathbf X \boldsymbol{\beta} + \boldsymbol{\varepsilon}$. Even ignoring that point, you would like to be able to have $\boldsymbol{\beta} = \mathbf X^{-1} \mathbf y$. But that raises a further problem: $\mathbf X$ is unlikely to be invertible because it is typically not a square matrix ... — Henry, Oct 09 '21 at 01:10
... But you could say $\mathbf y = \mathbf X \boldsymbol{\beta}$ would lead to $\mathbf X^T\mathbf y = \mathbf X^T\mathbf X \boldsymbol{\beta}$ and that in turn would suggest $(\mathbf X^T \mathbf X)^{-1}\mathbf X^T \mathbf y = \boldsymbol{\beta}$. This usually works because $\mathbf X^T \mathbf X$ is square, and it is invertible whenever the columns of $\mathbf X$ are linearly independent; in that case $(\mathbf X^T \mathbf X)^{-1}\mathbf X^T$ is the [Moore–Penrose pseudoinverse](https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse) of $\mathbf X$. That part ignores the errors, but the effect is a least-squares minimisation of the residuals. — Henry, Oct 09 '21 at 01:10
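To make the comment above concrete, here is a minimal NumPy sketch. The data are made up for illustration (the sample size, the true coefficients, and the noise level are all assumptions, not anything from the question), and the variable names simply mirror the $\mathbf X$, $\mathbf y$, $\boldsymbol{\beta}$ notation used in the comments. It shows that $\mathbf X$ itself cannot be inverted because it is not square, while $\mathbf X^T \mathbf X$ can, and that the resulting coefficients agree with the pseudoinverse and with a standard least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 observations, an intercept column plus 2 predictors.
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -3.0])        # assumed "true" coefficients
y = X @ beta_true + 0.1 * rng.normal(size=n)  # observations with small errors

# X is 50x3, so it has no ordinary inverse (np.linalg.inv(X) would raise).
# X^T X is 3x3 and invertible here because the columns of X are independent.
beta_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# The same coefficients via the Moore-Penrose pseudoinverse of X ...
beta_pinv = np.linalg.pinv(X) @ y

# ... and via NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The residual is orthogonal to every column of X, which is the
# projection picture: X @ beta_normal is the closest point to y
# in the column space of X.
residual = y - X @ beta_normal
orthogonality = X.T @ residual  # numerically close to the zero vector
```

In this reading, the inverse is what "undoes" the multiplication by $\mathbf X^T \mathbf X$ so that $\boldsymbol{\beta}$ is isolated, much like dividing both sides of a scalar equation. In practice, `np.linalg.lstsq` or a QR-based solver is usually preferred over forming the inverse explicitly, since it tends to be more numerically stable.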
