3

In OLS, if I have a design matrix $X$ (an $N \times K$ matrix of full column rank) and I add a constant, such as 2, to every entry of $X$, how does that change my estimators?

Let's denote $\tilde{X} = X + 2$.

I can't compute the OLS estimator $\tilde\beta_{OLS} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'y$ because $\tilde{X}$ doesn't have full column rank (or does it? If it does, I cannot prove it).

My guess is that the intercept term will change while the other coefficients will not, but I'm having trouble proving it.

FWL
  • To be precise, are you asking about (i) adding a constant 2 to every entry of matrix $X$, (ii) appending a row to $X$ where every entry in the new row is 2, or (iii) appending a column to $X$ where every entry in the new column is 2? – Matthew Gunn May 16 '18 at 22:00
  • There's no reason why adding 2 to every element of a full-rank $X$ should, in general, make $\tilde X$ be less than full rank. – shadowtalker May 16 '18 at 22:12
  • If the design matrix contains a column for an intercept term then this will cancel out the added constant. – Sextus Empiricus May 16 '18 at 22:26
  • @shadowtalker That's not true for all possible matrices. Consider $X=-2$. $\operatorname{rank}(X)=1$ but $\operatorname{rank}(X+2) = 0$. Consider $A = \begin{bmatrix} -1 & 0 \\ 0 & 2 \end{bmatrix}$ then $\operatorname{rank}(A)=2$ but $\operatorname{rank}(A+2) = 1$. – Matthew Gunn May 16 '18 at 22:31
  • I meant (i) adding a constant 2 to every entry of matrix $X$. – FWL May 16 '18 at 23:22

2 Answers

5

Rank

When one of the columns is constant (an intercept term) then you can use: https://math.stackexchange.com/questions/676333/prove-that-if-ranka-n-then-rankab-rankb

For $X_{m \times n}$ and $Z_{n \times k}$, where $Z$ is of rank $n$, we have

$$\operatorname{rank}(XZ) = \operatorname{rank}(X)$$

Adding the constant can be expressed as multiplying $X$ by an $n \times n$ matrix $Z$ of rank $n$. To build $Z$, take the identity matrix and add the constant, such as $x=2$, to every entry of the row that corresponds to the column $i$ holding the intercept. The constant $x$ cannot be $-1$, because then the diagonal entry $1+x$ in that row is zero and $Z$ is singular. $$Z = I + C, \qquad \text{with $c_{jk}=x$ if $j=i$ and $c_{jk}=0$ otherwise }$$

For instance:

$$\small\begin{bmatrix}1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \\ 1 & 3 & 9 & 27 \\ 1 & 4 & 16 & 64 \\ 1 & 5 & 25 & 125 \\ 1 & 6 & 36 & 216 \\ \end{bmatrix} \times \begin{bmatrix}3 & 2 & 2 & 2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} = \begin{bmatrix}1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \\ 1 & 3 & 9 & 27 \\ 1 & 4 & 16 & 64 \\ 1 & 5 & 25 & 125 \\ 1 & 6 & 36 & 216 \\ \end{bmatrix} + \begin{bmatrix}2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 \\ \end{bmatrix}$$
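The identity above can be checked mechanically. The following is only a sketch in plain Python; the helper `matmul` and the variable names are illustrative choices, and it assumes the intercept is the first column, so that $Z$ is upper triangular and its determinant is the product of its diagonal.

```python
# Rebuild the worked example: X has an intercept column followed by t, t^2, t^3.
X = [[1, t, t**2, t**3] for t in range(1, 7)]
x = 2                      # the constant added to every entry
K = len(X[0])

# Z = I + C, where C carries x in the row matching the intercept column (row 0).
Z = [[(1 if j == k else 0) + (x if j == 0 else 0) for k in range(K)]
     for j in range(K)]

def matmul(A, B):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

XZ = matmul(X, Z)
X_plus = [[v + x for v in row] for row in X]

# Z is upper triangular here, so det(Z) is the product of its diagonal: 1 + x.
det_Z = 1
for j in range(K):
    det_Z *= Z[j][j]

print(XZ == X_plus)  # True
print(det_Z)         # 3
```

Because $\det Z = 1 + x$, the same construction fails exactly when $x = -1$, which matches the restriction stated above.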


Estimators change

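You can view OLS as a projection of the observations $y$ onto the span of the columns of $X$. Adding the constant does not change that span, provided $X$ contains an intercept term (and the constant is not $-1$), so the fitted values are the same: $\tilde{y}_{OLS}=y_{OLS}$

You can use the same matrix $Z$ to show how the coefficients change. Since $\tilde X = XZ$ and the fitted values agree, $XZ\tilde\beta_{OLS} = X\beta_{OLS}$, and full column rank of $X$ gives $Z \tilde\beta_{OLS} = \beta_{OLS}$. Because $Z$ differs from the identity only in the row of the intercept, all the coefficients stay the same except the one related to the intercept.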
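To see both claims on numbers, here is a rough sketch in plain Python with exact fractions. The data set and the helpers `solve`, `ols` and `fitted` are made up for illustration; the intercept is assumed to be the first column and the added constant is 2, so $Z = \begin{bmatrix}3 & 2 \\ 0 & 1\end{bmatrix}$.

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A v = b by Gauss-Jordan elimination (exact).
    Raises StopIteration if A is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for i in range(n):
            if i != col:
                f = M[i][col]
                M[i] = [a - f * p for a, p in zip(M[i], M[col])]
    return [row[-1] for row in M]

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    cols = list(zip(*X))
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    Xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    return solve(XtX, Xty)

def fitted(X, beta):
    """Fitted values X beta."""
    return [sum(v * b for v, b in zip(row, beta)) for row in X]

# Made-up data: intercept column first, one regressor.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 2, 5]
X_tilde = [[v + 2 for v in row] for row in X]

beta = ols(X, y)               # [11/10, 11/10]
beta_tilde = ols(X_tilde, y)   # [-11/30, 11/10]

# Z beta_tilde should reproduce beta; only the first (intercept) entry differs.
Z = [[3, 2], [0, 1]]
Z_beta_tilde = [sum(z * b for z, b in zip(row, beta_tilde)) for row in Z]

print(Z_beta_tilde == beta)                               # True
print(fitted(X, beta) == fitted(X_tilde, beta_tilde))     # True
```

The slope is $11/10$ in both fits, while the intercept moves from $11/10$ to $-11/30$, which is what $3\tilde\beta_1 + 2\tilde\beta_2 = \beta_1$ predicts.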

Sextus Empiricus
  • The dimension of $Z$ should be $K \times K$ in this case, but otherwise, clear answer. Great idea of making $\tilde{X}$ with a linear transformation rather than adding 2's everywhere like I tried. I didn't think of utilizing the first column of ones. – FWL May 17 '18 at 02:36
  • @FWL I may have switched some letters. I used an $m \times n$ design matrix instead of $n \times k$. I did this switch to match the linked question notation. – Sextus Empiricus May 17 '18 at 06:20
3

$\newcommand{\one}{\mathbf 1}$Others have discussed the effect on the estimator (and +1 to Martijn) but I want to more carefully address the effect of adding a constant to $X$ on the rank of $\tilde X$. For the rank of $\tilde X$, it's not the presence of an intercept by itself that matters but whether the constant column is in the column space of $X$.

Let $\one_k$ be the column vector of $k$ $1$s, and write $X$ as an $n \times p$ matrix. Then adding a constant $c$ to every element of $X$ can be done by $$ \tilde X = X + c\one_n\one_p^T $$ so this is a rank 1 update to $X$. It is indeed possible for this to result in $\tilde X$ becoming reduced rank. For instance, if $c=2$ and the first column of $X$ is all $-2$ then we'll get a column of $0$s in $\tilde X$, which means the rank will be at most $p-1$. I'll let $\mathcal C(X)$ denote the column space of $X$ and I'll assume throughout that $c \neq 0$.


Result 1: If $\one \notin \mathcal C(X)$ then $\tilde X$ is always full rank, or in other words $\one \in \mathcal C(X)$ is a necessary condition for $\tilde X$ to be reduced rank.

Pf: (by contrapositive) we will suppose $\tilde X$ is reduced rank and will show $\one \in \mathcal C(X)$. So if $\tilde X$ is reduced rank there must be some nonzero $\alpha \in \mathbb R^p$ such that $$ 0 = \tilde X\alpha = X\alpha + c(\one_p^T\alpha)\one_n. $$ Note that if $\alpha^T\one_p = 0$ then we have $X\alpha = 0 \implies \alpha=0$ by $X$ being full column rank, but that's a contradiction, so we must have $\alpha^T\one_p \neq 0$. This means $$ X\alpha = -c(\one_p^T\alpha)\one_n \implies X\left(\frac{-\alpha}{c\alpha^T\one_p}\right) = \one_n $$ so there exists a vector $\gamma \in \mathbb R^p$ such that $X\gamma = \one_n$, i.e. $\one \in \mathcal C(X)$.

$\square$
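As a small illustration of Result 1 (the matrix here is made up for this purpose), take $$ X = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}. $$ Every vector in $\mathcal C(X)$ has third entry $0$, so $\one \notin \mathcal C(X)$. Adding $c$ everywhere gives $$ \tilde X = \begin{pmatrix} 1+c & c \\ c & 1+c \\ c & c \end{pmatrix}, \qquad \det\begin{pmatrix} 1+c & c \\ c & c \end{pmatrix} = (1+c)c - c^2 = c, $$ where the $2 \times 2$ block uses rows 1 and 3. For any $c \neq 0$ this minor is nonzero, so $\tilde X$ keeps rank $2$, in line with the result.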


Result 2: if $\one \in \mathcal C(X)$ then there is at most one $c$ such that $\tilde X$ is reduced rank.

Pf: if $\one_n \in \mathcal C(X)$ then there is some non-zero $\alpha \in \mathbb R^p$ with $X\alpha = \one_n$. By $X$ being full rank this $\alpha$ is unique.

Case I: $\alpha^T\one_p \neq 0$. This lets us do $$ X\alpha - \one = X\alpha + \left(\frac{-1}{\alpha^T \one_p}\right)\one_n \one_p^T\alpha= (X + c\one_n\one_p^T)\alpha = 0 $$ for $c = \frac{-1}{\alpha^T \one_p}$.

Now for uniqueness: if we are to have any chance of making $\tilde X$ reduced rank we need a $\gamma$ with $X\gamma \propto \one$, otherwise the rank 1 term can't cancel it. We can produce a $\gamma$ such that $X\gamma = d\one$ for any $d \in \mathbb R$ (we take $d\neq 0$, since $d = 0$ forces $\gamma=0$). If we do this, then the corresponding calculation for $c$ is $$ X\gamma - d\one = X\gamma + \left(\frac{-d}{\gamma^T\one}\right)\one_n\one_p^T\gamma = 0 $$ so $c =\frac{-d}{\gamma^T\one}$. But $X\gamma = d\one=d(X\alpha) \implies \gamma = d\alpha$, so $c = \frac{-d}{d\,\alpha^T\one} = \frac{-1}{\alpha^T\one}$ and there is just a single $c$ that works. Thus if $\one \in \mathcal C(X)$ we can find a $c$ that makes $\tilde X$ low rank, but there's just one such $c$, so a "random" $c$ is very unlikely to make this happen.

Case II: $\alpha^T\one_p = 0$. Again we'll try to find a $\gamma$ with $\tilde X\gamma=0$, so as before we'll have to take $\gamma = d\alpha$ for some $d \neq 0$. Assuming we have such a $\gamma$, then $$ \tilde X\gamma = X\gamma + c\one_n\one_p^T\gamma = dX\alpha + cd\one_n\one_p^T\alpha = d\one \neq 0 $$ because the second term vanishes when $\one_p^T\alpha = 0$. So in this special case there is no way to make $\tilde X$ reduced rank.

$\square$


So ultimately it's all about the column space rather than the individual vectors in $X$. If $\one \in \mathcal C(X)$ it's possible to get $\tilde X$ reduced rank, like in my example at the beginning with $c=2$, but in that case this is in fact the only such $c$ that works, so if $c$ is not carefully chosen we probably don't need to worry.
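A quick way to probe this numerically is to compute the candidate $c = -1/(\alpha^T\one)$ and then scan a grid of constants for rank drops. The sketch below uses plain Python with exact fractions. The two matrices, the grid, and the helper names `rank`, `critical_c` and `shifted` are assumptions made for illustration, and $\alpha$ is supplied by hand and only verified, not solved for.

```python
from fractions import Fraction

def rank(M):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def critical_c(X, alpha):
    """Given alpha with X alpha = 1, return -1 / (alpha^T 1),
    or None when alpha^T 1 = 0 (Case II: no constant lowers the rank)."""
    assert all(sum(a * v for a, v in zip(alpha, row)) == 1 for row in X)
    s = sum(alpha)
    return None if s == 0 else -1 / Fraction(s)

def shifted(X, c):
    """X with the constant c added to every entry."""
    return [[v + c for v in row] for row in X]

grid = [Fraction(k, 2) for k in range(-8, 9)]   # -4, -3.5, ..., 4

# First column all -2, as in the example at the start of the answer.
X1 = [[-2, 1], [-2, 3], [-2, 4]]
c1 = critical_c(X1, [Fraction(-1, 2), 0])
drops1 = [c for c in grid if rank(shifted(X1, c)) < 2]

# No constant column, but 1 is still in the column space.
X3 = [[3, 0], [0, 3], [1, 2]]
c3 = critical_c(X3, [Fraction(1, 3), Fraction(1, 3)])
drops3 = [c for c in grid if rank(shifted(X3, c)) < 2]

print(c1, drops1 == [c1])   # 2 True
print(c3, drops3 == [c3])   # -3/2 True
```

The second matrix has no constant column at all, yet it still has exactly one rank-dropping constant, which is the point about the column space mattering rather than the individual columns.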

Here's an example where there's no such $c$: take $$ X = \left(\begin{array}{cc} 1&0 \\ 1&0 \\ 0&-1 \\ 0&-1\end{array}\right) $$ and note how $\one \in \mathcal C(X)$ and the way to get it is $X\alpha$ with $\alpha = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$. Thus $\alpha^T\one = 0$. There's no way to make this matrix low rank by adding a constant to it. If we add $-1$ then we eliminate the top half of the first column, but its lower half becomes nonzero and the rank is preserved. The same happens for any other constant.
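One way to confirm this for every $c$ at once, offered as a side check rather than as part of the proof, is to look at a single $2 \times 2$ minor of the shifted matrix: $$ X + c\one_4\one_2^T = \begin{pmatrix} 1+c & c \\ 1+c & c \\ c & c-1 \\ c & c-1 \end{pmatrix}, \qquad \det\begin{pmatrix} 1+c & c \\ c & c-1 \end{pmatrix} = (1+c)(c-1) - c^2 = -1. $$ The minor built from rows 1 and 3 equals $-1$ regardless of $c$, so the shifted matrix always has rank $2$.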

jld
  • Thanks for adding to the discussion. So in the case of full rank $X$ with the column of 1's (and thus $\mathbf{1} \in \mathcal{C}(X)$), there is only one $c$ that would make the transformation $X + c$ reduced rank? $c= -1$, in particular. – FWL May 17 '18 at 02:52
  • @FWL yeah (unless I made a mistake, although i went through that proof a couple of times so hopefully not), if the first column for example is all ones then $\alpha=e_1$ is the coordinate of $\mathbf 1$ in $\mathcal C(X)$ so $c=-1/e_1^T\mathbf 1 = -1$ is the only way to drop a rank – jld May 17 '18 at 02:57
  • @MartijnWeterings i think if $\mathbf 1$ is a column then we'd have $\alpha=e_1$ so $c$ would have to be negative since $\alpha^T\mathbf 1 > 0$, so i think even though $\mathbf 1 \in \mathcal C(X)$ you are still correct that adding a positive constant can't decrease the rank in this case. – jld May 18 '18 at 01:48