I'm trying to figure out why adding a cubic term to the model doesn't guarantee perfect multicollinearity. If $X$ is known, then $X^3$ is known in both magnitude and sign, and vice versa. That is not necessarily the case between $X$ and $X^2$: knowing $X^2$ does not determine the sign of $X$.

- Multicollinearity usually refers to a **linear** relation between two variables. – nope Oct 11 '19 at 10:44
- A standard tool in mathematics is based on such considerations: if a nontrivial linear relation $c_0+c_1X+c_2X^2+c_3X^3=0$ holds, that means every component of $X$ is a root of the polynomial $c_0+c_1x+c_2x^2+c_3x^3,$ whence (by the Fundamental Theorem of Algebra) there are at most three distinct possible values for the components of $X.$ Since that's not generally true--many datasets have many more distinct values of their variables than that--it cannot be generally true that $1,X,X^2,X^3$ are collinear. This idea appears in my analysis at https://stats.stackexchange.com/a/408855/919, *e.g.* – whuber Oct 11 '19 at 14:50
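A quick sketch in R of the point in the comment above (my own illustration, not from the thread): give $X$ only three distinct values and the relation $X^3-X=0$ holds in every row, so lm() flags the aliased term.

X <- rep(c(-1, 0, 1), times = 4)       # only three distinct values
lm(rnorm(12) ~ X + I(X^2) + I(X^3))    # I(X^3) is reported NA: X^3 - X = 0 in every row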
2 Answers
Multicollinearity refers to the situation in which the regressor matrix $Z$ does not have full column rank $k$.
This is the case if it is possible to linearly combine the columns $z_1,\ldots,z_k$ into the zero vector with a vector $a=(a_1,\ldots,a_k)'$ other than the trivial zero vector $0$, i.e., $$ a_1z_1+\ldots+a_kz_k=0 $$ for $a\neq0$. If, say, $z_1\equiv X=(-1,0,1,2)'$, then $z_2\equiv X^3=(-1,0,1,8)'$. You will not find values $a_1,a_2$ other than zeros that produce $$ a_1\begin{pmatrix}-1\\0\\1\\2\end{pmatrix}+a_2\begin{pmatrix}-1\\0\\1\\8\end{pmatrix}=0. $$ If $z_2$ were some multiple or fraction of $z_1$, it would be possible, so that we would have multicollinearity.
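To double-check this numerically, here is a small sketch (my addition) confirming that the two columns from the example have full column rank:

X <- c(-1, 0, 1, 2)
qr(cbind(X, X^3))$rank    # 2: full column rank, so no perfect multicollinearity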
As an aside, if your regressor $X$ is a dummy variable, we do have multicollinearity with powers of $X$, as powers of $0$ and $1$ are of course also $0$ and $1$.
Try, e.g.,
X <- -1:2                                  # the regressor (-1, 0, 1, 2)' from above
lm(rnorm(4) ~ X + I(X^3) - 1)              # both coefficients estimated: no multicollinearity
X <- sample(c(0, 1), 10, replace = TRUE)   # a dummy regressor
lm(rnorm(10) ~ X + I(X^3) - 1)             # I(X^3) comes back NA: perfectly collinear with X

You will often run into a multicollinearity issue with cubes, just not the perfect kind. In your case, perfect multicollinearity would mean $\alpha x+\beta x^3=c$ for every observation, which cannot hold by definition once $x$ takes more than three distinct values. However, when $|x|\ll1$ the entire $x^3$ column is nearly zero, and more generally, when the values of $x$ lie in a narrow range, $x^3$ is locally almost a linear function of $x$. In such cases you may get a perfect-multicollinearity warning, or a warning that the condition number of the design matrix is too big, due to rounding, but this will not happen with every data set.
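A short R sketch of this effect (my own illustration): squeeze the $x$ values into a narrow range and the correlation between $x$ and $x^3$ approaches 1, while the condition number of the design matrix blows up, even though the collinearity is never perfect.

set.seed(1)
x <- runif(100, 10, 10.1)    # narrow range: x^3 is locally almost linear in x
cor(x, x^3)                  # essentially 1
kappa(cbind(x, x^3))         # very large condition number: near, not perfect, collinearity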
