I'm trying to figure out why adding a cubic term to the model doesn't guarantee perfect multicollinearity. If $X$ is known, then $X^3$ is known in both magnitude and sign, and vice versa. That is not necessarily the case between $X$ and $X^2$: knowing $X^2$ does not determine the sign of $X$.

- Multicollinearity usually refers to a **linear** relation between two variables. – nope Oct 11 '19 at 10:44
- A standard tool in mathematics is based on such considerations: if a nontrivial linear relation $c_0+c_1X+c_2X^2+c_3X^3=0$ holds, that means every component of $X$ is a root of the polynomial $c_0+c_1x+c_2x^2+c_3x^3,$ whence (by the Fundamental Theorem of Algebra) there are at most three distinct possible values for the components of $X.$ Since that's not generally true--many datasets have many more distinct values of their variables than that--it cannot be generally true that $1,X,X^2,X^3$ are collinear. This idea appears in my analysis at https://stats.stackexchange.com/a/408855/919, *e.g.* – whuber Oct 11 '19 at 14:50
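A quick sketch in R of the point in the comment above (my own illustration, not from the thread): give $X$ only three distinct values and the relation $X^3-X=0$ holds in every row, so lm() flags the aliased term.

X <- rep(c(-1, 0, 1), times = 4)       # only three distinct values
lm(rnorm(12) ~ X + I(X^2) + I(X^3))    # I(X^3) is reported NA: X^3 - X = 0 in every row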
2 Answers
Multicollinearity refers to the situation in which the regressor matrix $Z$ does not have full column rank $k$.
This is the case if it is possible to linearly combine the columns $z_1,\ldots,z_k$ into the zero vector with a vector $a=(a_1,\ldots,a_k)'$ other than the trivial zero vector $0$, i.e., $$ a_1z_1+\ldots+a_kz_k=0 $$ for $a\neq0$. If, say, $z_1\equiv X=(-1,0,1,2)'$, then $z_2\equiv X^3=(-1,0,1,8)'$. You will not find values $a_1,a_2$ other than zeros that produce $$ a_1\begin{pmatrix}-1\\0\\1\\2\end{pmatrix}+a_2\begin{pmatrix}-1\\0\\1\\8\end{pmatrix}=0. $$ If $z_2$ were some multiple or fraction of $z_1$, it would be possible, so that we would have multicollinearity.
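To double-check this numerically, here is a small sketch (my addition) confirming that the two columns from the example have full column rank:

X <- c(-1, 0, 1, 2)
qr(cbind(X, X^3))$rank    # 2: full column rank, so no perfect multicollinearity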
As an aside, if your regressor $X$ is a dummy variable, we do have multicollinearity with powers of $X$, as powers of $0$ and $1$ are of course also $0$ and $1$.
Try, e.g.,
X <- -1:2                                  # the regressor (-1, 0, 1, 2)' from above
lm(rnorm(4) ~ X + I(X^3) - 1)              # both coefficients estimated: no multicollinearity
X <- sample(c(0, 1), 10, replace = TRUE)   # a dummy regressor
lm(rnorm(10) ~ X + I(X^3) - 1)             # I(X^3) comes back NA: perfectly collinear with X

You will often run into a multicollinearity issue with cubes, just not the perfect kind. In your case, perfect multicollinearity would mean $\alpha x+\beta x^3=c$ for every observation, which cannot hold by definition once $x$ takes more than three distinct values. However, when $|x|\ll1$ the entire $x^3$ column is nearly zero, and more generally, when the values of $x$ lie in a narrow range, $x^3$ is locally almost a linear function of $x$. In such cases you may get a perfect-multicollinearity warning, or a warning that the condition number of the design matrix is too big, due to rounding, but this will not happen with every data set.
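A short R sketch of this effect (my own illustration): squeeze the $x$ values into a narrow range and the correlation between $x$ and $x^3$ approaches 1, while the condition number of the design matrix blows up, even though the collinearity is never perfect.

set.seed(1)
x <- runif(100, 10, 10.1)    # narrow range: x^3 is locally almost linear in x
cor(x, x^3)                  # essentially 1
kappa(cbind(x, x^3))         # very large condition number: near, not perfect, collinearity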
