
Assume one deals with polynomial regression, i.e. $$ y_{i} = \beta_{0} + \beta_{1}x_{i} + \beta_{2}x_{i}^{2}+ \dots + \beta_{m}x_{i}^{m} + \varepsilon_{i}, $$ where $i = 1, \dots, n$, with $m < n$. Then the OLS solution is given by $$ \hat{\beta} = (X^{T}X)^{-1}X^{T}y. $$

The design matrix $X$ is a Vandermonde matrix. Assume that all $x_{i}$ are distinct. In this case a solution exists, but because the Vandermonde matrix is ill-conditioned, the solution is not "stable". Therefore, the point estimate of the covariance matrix of the coefficients $\beta$ will not be trustworthy.
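To see the ill-conditioning concretely, here is a minimal sketch (the grid of $x_i$ and the degrees are illustrative choices, not from the question) showing how the condition number of $X^{T}X$ for a Vandermonde design explodes as the degree $m$ grows:

```python
import numpy as np

# Illustrative sketch: condition number of X^T X for a Vandermonde design.
x = np.linspace(0, 10, 50)                       # hypothetical distinct x_i
conds = []
for m in (2, 5, 10):
    X = np.vander(x, N=m + 1, increasing=True)   # columns: 1, x, x^2, ..., x^m
    conds.append(np.linalg.cond(X.T @ X))
    print(f"degree {m}: cond(X^T X) = {conds[-1]:.3e}")
```

Already at moderate degrees the condition number exceeds the reciprocal of machine precision, so naive inversion of $X^{T}X$ loses essentially all accuracy.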

The question: would the bootstrap be a better approach to estimating the variance of the estimates of $\beta$, provided that the size of the data set, $n$, is moderately large?

P.S. The question is not only about OLS for polynomial regression. It is more about the use of sampling methods as a (possible) improvement for problems which suffer from computational instability.
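To make the proposal concrete, here is a minimal sketch of the case-resampling bootstrap under discussion (all data simulated, all names hypothetical). Note that every bootstrap fit solves a least-squares problem whose rows come from the same Vandermonde design, so each resample inherits the original conditioning problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: cubic polynomial regression, moderately large n.
n, m = 200, 3
x = rng.uniform(0, 10, n)
X = np.vander(x, N=m + 1, increasing=True)       # columns: 1, x, x^2, x^3
beta = np.array([1.0, -2.0, 0.5, 0.1])
y = X @ beta + rng.normal(0.0, 1.0, n)

def ols(X, y):
    # lstsq avoids forming (X^T X)^{-1} explicitly
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Case-resampling bootstrap of the coefficient estimates.
B = 500
boot = np.empty((B, m + 1))
for b in range(B):
    idx = rng.integers(0, n, n)                  # resample rows with replacement
    boot[b] = ols(X[idx], y[idx])

se_boot = boot.std(axis=0, ddof=1)               # bootstrap standard errors
print("bootstrap SEs:", se_boot)
```

Whether these bootstrap standard errors are any more reliable than the analytic ones is exactly the question being asked.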

ABK
  • Even if you do bootstrap, the rows of $X$ will still be like $1, a, a^2, a^3, \dots$, so I don't see how it would help. However, $X$ is not square and isn't what we invert. Does $X^TX$ have the numerical instability you describe? – Dave Feb 11 '20 at 11:43
  • Dear @Dave, yes, I think the inversion of $X^{T}X$ is also problematic, since $(AB)^{-1} = B^{-1}A^{-1}$. The idea is that bootstrap is less risky than the estimate based on the whole dataset, because bootstrap provides averaging in some sense. – ABK Feb 11 '20 at 12:13
  • $X$ isn't square. – Dave Feb 11 '20 at 12:15
  • Dear @Dave, could you, please, clarify about "the square"? – ABK Feb 11 '20 at 12:17
  • $X$ is not a square matrix. How do you invert a $300 \times 4$ matrix? – Dave Feb 11 '20 at 12:22
  • Dear @Dave, ok, I see. Still, it is well known that polynomial regression suffers from this poor invertibility. – ABK Feb 11 '20 at 12:23
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/104316/discussion-between-dave-and-abk). – Dave Feb 11 '20 at 12:26
  • The standard solution to the underlying regression problem uses orthogonal polynomials. This essentially is a change of basis of $X;$ when that change can be precisely computed, the numerical instability in the OLS solution goes away. In other words, the numerical computation issues are shunted over to the problem of computing orthogonal polynomials, isolating it from all the other issues in applying and interpreting OLS. In particular, that (strongly) suggests bootstrapping will accomplish nothing. – whuber Feb 11 '20 at 16:49
  • Dear @whuber, thank you for the comment. I have modified question. – ABK Feb 11 '20 at 18:27
  • I see two sources of instability. The first is doing a matrix inversion on a computer. The second is that the parameter estimates will have high variance, even if the computer perfectly handles the matrix inversion. Which do you mean? – Dave Feb 11 '20 at 18:48
  • Dear @Dave, why will we have high variance (true variance) here? I was concerned with numerical thing. – ABK Feb 11 '20 at 18:51
  • The two problems are very closely related. One unifying concept is that of the *condition number* of $X:$ in OLS it can be related to high variances in parameter estimates, but its technical definition is in terms of the amount by which small errors in the data are magnified by the calculations, which measures the "numerical instability." – whuber Feb 11 '20 at 19:40
  • Dear @whuber, thank you! I think it is clearer for me now. Why wouldn't you sum it up and write it as an answer? – ABK Feb 12 '20 at 11:56
  • See https://stats.stackexchange.com/questions/168259. – whuber Feb 15 '20 at 22:14
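Following up on whuber's comments, a small sketch (illustrative, not from the thread) of how a change of basis removes the ill-conditioning. A QR decomposition of the Vandermonde matrix is one way to obtain an orthogonal basis for the same column space; the classical route is a basis of orthogonal polynomials:

```python
import numpy as np

# Sketch of the change-of-basis idea: the columns of Q from a QR
# decomposition span the same space as the Vandermonde columns of X,
# but are orthonormal, so the conditioning problem disappears.
x = np.linspace(0, 10, 50)
X = np.vander(x, N=11, increasing=True)          # degree-10 Vandermonde design
Q, R = np.linalg.qr(X)                           # Q: orthonormal basis, same span

print("cond(X^T X):", np.linalg.cond(X.T @ X))   # astronomically large
print("cond(Q^T Q):", np.linalg.cond(Q.T @ Q))   # ~1: perfectly conditioned
```

Fitted values and predictions are identical in either basis; only the parameterization of the coefficients changes, which is why the instability is a property of the basis rather than of the regression problem itself.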

0 Answers