Let's say I have a target vector $$\mathbf{d} = (d_1, ..., d_N)^T \text{ and a design matrix } \mathbf{X} \in \mathbb{R}^{N\times (M+1)}.$$ Likewise, I have the model $\mathbf{y} = \mathbf{X}\mathbf{w}$ where $\mathbf{w} = (w_0, ..., w_M)^T$. I'm representing the SSE as $$E(\mathbf{w}) = ||\mathbf{Xw}-\mathbf{d}||_2^2 = (\mathbf{Xw}-\mathbf{d})^T(\mathbf{Xw}-\mathbf{d})$$
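For reference, expanding that quadratic gives $$E(\mathbf{w}) = \mathbf{w}^T\mathbf{X}^T\mathbf{X}\mathbf{w} - 2\,\mathbf{d}^T\mathbf{X}\mathbf{w} + \mathbf{d}^T\mathbf{d},$$ which is the form the note below gets applied to.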
My first (minor) question is the following. What does it mean when people bluntly write $$\text{Note: } \nabla_{\mathbf{w}}\, \mathbf{w}^T\mathbf{A}\mathbf{w} = (\mathbf{A}+\mathbf{A}^T)\mathbf{w}$$ I don't quite understand what $\mathbf{A}$ generally stands for in the domain of curve fitting.
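I can at least verify the identity itself componentwise: $$\frac{\partial}{\partial w_k}\,\mathbf{w}^T\mathbf{A}\mathbf{w} = \frac{\partial}{\partial w_k}\sum_{i,j} A_{ij}\, w_i w_j = \sum_j A_{kj} w_j + \sum_i A_{ik} w_i = \big[(\mathbf{A}+\mathbf{A}^T)\mathbf{w}\big]_k,$$ but that derivation still doesn't tell me what role $\mathbf{A}$ plays in the fitting problem.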
Secondly, the following statement was made: $$\nabla E(\mathbf{w}) = 0$$ $$\mathbf{X}^T(\mathbf{X}\mathbf{w}-\mathbf{d})=0$$ If we have $N \geq M + 1$ distinct $x_i$, the solution is unique: $\mathbf{w}^* = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{d}$. Otherwise, there are infinitely many solutions.
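To make that closed form concrete for myself, here is a minimal NumPy sketch (the degree-2 polynomial and the data are made up), cross-checked against `np.linalg.lstsq`:

```python
import numpy as np

# Made-up example: N = 6 distinct x_i, degree-2 polynomial, so M + 1 = 3
# and the design matrix X is N x (M + 1).
rng = np.random.default_rng(0)
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
d = 1.0 + 2.0 * x - 0.5 * x**2 + 0.1 * rng.standard_normal(x.size)

X = np.vander(x, 3, increasing=True)  # columns: 1, x, x^2

# Closed form w* = (X^T X)^{-1} X^T d, solved without forming the inverse
w_star = np.linalg.solve(X.T @ X, X.T @ d)

# Cross-check against NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, d, rcond=None)
print(np.allclose(w_star, w_lstsq))  # True
```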
Why exactly does that condition determine whether the solution is unique or there are infinitely many, and what does the asterisk on $\mathbf{w}$ represent?
My attempt at understanding the conditional statement is the following: if there are at least as many training samples as unknown weights, the samples give enough independent equations to pin the weights down uniquely. However, if there are fewer samples than weights, the system is underdetermined: multiple combinations of weights can map to the same predictions, because $\mathbf{X}$ then has a nontrivial null space (this is based on the properties of matrices).
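To check this intuition numerically, here is a sketch with made-up dimensions ($N = 3$ samples but $M + 1 = 5$ weights) showing that two different weight vectors can give exactly the same fit:

```python
import numpy as np

# Made-up underdetermined case: N = 3 samples but M + 1 = 5 weights.
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 5))
d = rng.standard_normal(3)

# The right singular vectors for the zero singular values span the
# null space of X; with rank 3 < 5 the last rows of Vt qualify.
_, _, Vt = np.linalg.svd(X)
n = Vt[-1]
print(np.allclose(X @ n, 0.0))  # True: X maps n to zero

# One particular solution (minimum-norm), then shift it along the null space:
w = np.linalg.pinv(X) @ d
print(np.allclose(X @ w, X @ (w + n)))  # True: same predictions, different weights
```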