
Assume one deals with polynomial regression, i.e. $$ y_{i} = \beta_{0} + \beta_{1}x_{i} + \beta_{2}x_{i}^{2}+ \dots + \beta_{m}x_{i}^{m} + \varepsilon_{i}, $$ where $i = 1, \dots, n$, with $m < n$. Then the OLS solution is given by $$ \hat{\beta} = (X^{T}X)^{-1}X^{T}y. $$

The design matrix $X$ is a Vandermonde matrix. Assume that all $x_{i}$ are distinct. In this case a solution exists, but because the Vandermonde matrix is ill-conditioned, the solution is not "stable". Therefore, the point estimate of the covariance matrix of the coefficients $\beta$ will not be trustworthy.
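To see the ill-conditioning concretely, here is a minimal sketch (the grid of $x_i$ and the degrees are illustrative choices, not from the question) showing how the condition number of $X^{T}X$ for a Vandermonde design explodes as the degree $m$ grows:

```python
import numpy as np

# Illustrative sketch: condition number of X^T X for a Vandermonde design.
x = np.linspace(0, 10, 50)                       # hypothetical distinct x_i
conds = []
for m in (2, 5, 10):
    X = np.vander(x, N=m + 1, increasing=True)   # columns: 1, x, x^2, ..., x^m
    conds.append(np.linalg.cond(X.T @ X))
    print(f"degree {m}: cond(X^T X) = {conds[-1]:.3e}")
```

Already at moderate degrees the condition number exceeds the reciprocal of machine precision, so naive inversion of $X^{T}X$ loses essentially all accuracy.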

The question: would the bootstrap be a better approach to estimating the variance of the estimates of $\beta$, provided that the size of the data set, $n$, is moderately large?

P.S. The question is not only about OLS for polynomial regression. It is more about the use of sampling methods as a (possible) improvement for problems which suffer from computational instability.
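To make the proposal concrete, here is a minimal sketch of the case-resampling bootstrap under discussion (all data simulated, all names hypothetical). Note that every bootstrap fit solves a least-squares problem whose rows come from the same Vandermonde design, so each resample inherits the original conditioning problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: cubic polynomial regression, moderately large n.
n, m = 200, 3
x = rng.uniform(0, 10, n)
X = np.vander(x, N=m + 1, increasing=True)       # columns: 1, x, x^2, x^3
beta = np.array([1.0, -2.0, 0.5, 0.1])
y = X @ beta + rng.normal(0.0, 1.0, n)

def ols(X, y):
    # lstsq avoids forming (X^T X)^{-1} explicitly
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Case-resampling bootstrap of the coefficient estimates.
B = 500
boot = np.empty((B, m + 1))
for b in range(B):
    idx = rng.integers(0, n, n)                  # resample rows with replacement
    boot[b] = ols(X[idx], y[idx])

se_boot = boot.std(axis=0, ddof=1)               # bootstrap standard errors
print("bootstrap SEs:", se_boot)
```

Whether these bootstrap standard errors are any more reliable than the analytic ones is exactly the question being asked.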

ABK
  • Even if you do bootstrap, the rows of $X$ will still be like $1, a, a^2, a^3, \dots$, so I don't see how it would help. However, $X$ is not square and isn't what we invert. Does $X^TX$ have the numerical instability you describe? – Dave Feb 11 '20 at 11:43
  • Dear @Dave, yes, I think the inversion of $X^{T}X$ is also problematic, since $(AB)^{-1} = B^{-1}A^{-1}$. The idea is that bootstrap is less risky than the estimate based on the whole dataset, because bootstrap provides averaging in some sense. – ABK Feb 11 '20 at 12:13
  • $X$ isn't square. – Dave Feb 11 '20 at 12:15
  • Dear @Dave, could you, please, clarify about "the square"? – ABK Feb 11 '20 at 12:17
  • $X$ is not a square matrix. How do you invert a $300 \times 4$ matrix? – Dave Feb 11 '20 at 12:22
  • Dear @Dave, ok, I see. Still, it is well known that polynomial regression suffers from this poor invertibility. – ABK Feb 11 '20 at 12:23
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/104316/discussion-between-dave-and-abk). – Dave Feb 11 '20 at 12:26
  • The standard solution to the underlying regression problem uses orthogonal polynomials. This essentially is a change of basis of $X;$ when that change can be precisely computed, the numerical instability in the OLS solution goes away. In other words, the numerical computation issues are shunted over to the problem of computing orthogonal polynomials, isolating it from all the other issues in applying and interpreting OLS. In particular, that (strongly) suggests bootstrapping will accomplish nothing. – whuber Feb 11 '20 at 16:49
  • Dear @whuber, thank you for the comment. I have modified question. – ABK Feb 11 '20 at 18:27
  • I see two sources of instability. The first is doing a matrix inversion on a computer. The second is that the parameter estimates will have high variance, even if the computer perfectly handles the matrix inversion. Which do you mean? – Dave Feb 11 '20 at 18:48
  • Dear @Dave, why will we have high variance (true variance) here? I was concerned with numerical thing. – ABK Feb 11 '20 at 18:51
  • The two problems are very closely related. One unifying concept is that of the *condition number* of $X:$ in OLS it can be related to high variances in parameter estimates, but its technical definition is in terms of the amount by which small errors in the data are magnified by the calculations, which measures the "numerical instability." – whuber Feb 11 '20 at 19:40
  • Dear @whuber, thank you! I think it is clearer for me now. Why wouldn't you sum it up and write it as an answer? – ABK Feb 12 '20 at 11:56
  • See https://stats.stackexchange.com/questions/168259. – whuber Feb 15 '20 at 22:14
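Following up on whuber's comments, a small sketch (illustrative, not from the thread) of how a change of basis removes the ill-conditioning. A QR decomposition of the Vandermonde matrix is one way to obtain an orthogonal basis for the same column space; the classical route is a basis of orthogonal polynomials:

```python
import numpy as np

# Sketch of the change-of-basis idea: the columns of Q from a QR
# decomposition span the same space as the Vandermonde columns of X,
# but are orthonormal, so the conditioning problem disappears.
x = np.linspace(0, 10, 50)
X = np.vander(x, N=11, increasing=True)          # degree-10 Vandermonde design
Q, R = np.linalg.qr(X)                           # Q: orthonormal basis, same span

print("cond(X^T X):", np.linalg.cond(X.T @ X))   # astronomically large
print("cond(Q^T Q):", np.linalg.cond(Q.T @ Q))   # ~1: perfectly conditioned
```

Fitted values and predictions are identical in either basis; only the parameterization of the coefficients changes, which is why the instability is a property of the basis rather than of the regression problem itself.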

0 Answers