
The multivariate linear regression model is given by $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ and $\mathbf{X}$ is $n \times p$. We may want to learn a mean for our data in addition to the effect that each of the covariates in the columns of $\mathbf{X}$ has on $\mathbf{y}$; in this case, we would augment $\mathbf{X}$ with a column of $1$s. Is this an identifiable model, and if not, could someone please explain why?
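For concreteness, here is a minimal NumPy sketch (all values invented) of the augmented design; the rank check hints at where identifiability can fail when $\mathbf{X}$ itself contains a constant column:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = np.full(n, 3.0)             # constant covariate, c = 3
x2 = rng.normal(size=n)          # ordinary covariate
X = np.column_stack([x1, x2])

X_aug = np.column_stack([np.ones(n), X])   # prepend the column of 1s

print(np.linalg.matrix_rank(X))      # 2: full column rank on its own
print(np.linalg.matrix_rank(X_aug))  # still 2 < 3: rank-deficient

# Two different coefficient vectors give identical fitted values:
b1 = np.array([1.0, 0.0, 2.0])        # intercept 1, no weight on x1
b2 = np.array([0.0, 1.0 / 3.0, 2.0])  # intercept 0, weight 1/3 on x1
print(np.allclose(X_aug @ b1, X_aug @ b2))   # True
```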

Similarly, I am looking at a time series autoregression model of order $p$ (page 34 of Prado and West, 2010) in which the data have a nonzero mean. This is given by $y_t = \mu + (\mathbf{f}_t - \mu\mathbf{1})'\boldsymbol{\phi} + \epsilon_t$, where $\epsilon_t \sim \mathcal{N}(0, v)$. Here, $\mathbf{f}_t = [y_{t-1}, \ldots, y_{t-p}]'$ and $\boldsymbol{\phi} = [\phi_1, \ldots, \phi_p]'$. Could someone please explain why we need to subtract $\mu\mathbf{1}$ from $\mathbf{f}_t$ in order to fit this model?
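For reference, here is a small simulation sketch of this mean-centered form (NumPy, with invented AR(2) parameters), showing the series fluctuating around $\mu$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, v = 10.0, 1.0
phi = np.array([0.5, 0.3])       # AR coefficients (sum < 1: stationary)
p, T = len(phi), 5000

y = np.full(T, mu)               # initialize the first p values at mu
for t in range(p, T):
    f_t = y[t - p:t][::-1]       # f_t = [y_{t-1}, ..., y_{t-p}]'
    y[t] = mu + (f_t - mu) @ phi + rng.normal(scale=np.sqrt(v))

print(y.mean())                  # close to mu = 10
```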

Vivek Subramanian
  • The first model would not be identifiable if $\mathbf{X}$ contains, say, a constant column: imagine $\mathbf{X}$ contains a column of $1$s ($\mathbf{x}_1$, say) and a column $\mathbf{x}_2$ whose entries all equal some constant $c$. If we picture $\mathbf{y}$, $\mathbf{x}_1$, and $\mathbf{x}_2$ in $\mathbb{R}^3$, then all $n$ triples $(y, x_1, x_2) = (y, 1, c)$ lie on a line parallel to the $y$-axis, regardless of the $n$ values of $\mathbf{y}$. Thus no information can be obtained about what happens to $\mathbf{y}$ as $\mathbf{x}_1$ and $\mathbf{x}_2$ change. – dandar Oct 03 '14 at 17:02
  • Formally, all of the covariate pairs $(x_1, x_2)$ lie on a 1-dimensional hyperplane (i.e., a line) in $\mathbb{R}^2$, and in general this model would not be identifiable if all your $n$ covariate points $(x_1, \ldots, x_p)$ lie on the same $(p-1)$-dimensional hyperplane. Going back to the two-column example, if just one of the entries in $\mathbf{x}_2$ were not equal to $c$, then this would yield information on the covariate-response relationship, and the model *may* be identifiable. See "Identifiability of Models for Clusterwise Linear Regression", Hennig (2000), for an in-depth look at this. – dandar Oct 03 '14 at 17:14
  • Finally, the first model is guaranteed to be identifiable if the minimum number of hyperplanes needed to cover the covariate points is 2 or greater – see Hennig, Theorem 2.2. – dandar Oct 03 '14 at 17:20
  • Thanks, @dandar, for your comments. Would you mind explaining the difference between identifiability and estimability? For example, in the first model, would it be possible to estimate the mean and the effect of each of the covariates? – Vivek Subramanian Oct 03 '14 at 17:33
  • Sorry @vman049 for not replying sooner. It is easier to think first about unbiasedness and consistency, which concern, respectively, whether the expected value of the estimator equals the true value and whether the estimator is close to the true value with probability tending to one. You can have one without the other (see http://stats.stackexchange.com/questions/31036/what-is-the-difference-between-a-consistent-estimator-and-an-unbiased-estimator). Using the "unbiased but not consistent" example in the first answer there, the probability that $X_1$ is close to $\mu$ never gets closer to 1 as $n$ gets larger (it does not depend on $n$), even though $X_1$ is unbiased. – dandar Oct 09 '14 at 17:29
  • Thus with $X_1$ as the estimator, we cannot increase the sample size to be sure we will estimate something close to the truth (given one sample of data), but for the mean of the normally distributed variables we can (that estimator is unbiased and consistent); a small simulation illustrating this appears after these comments. Thus consistency is more "useful" when we want to be sure we are close to the truth for a single dataset (albeit usually with a large sample size needed). Finally, it is well known that an estimator cannot be consistent if the model is not identifiable, so identifiability is required if we want any estimator to be consistent. – dandar Oct 09 '14 at 17:34
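A quick Monte Carlo sketch of dandar's point (NumPy, invented values): the first observation $X_1$ is an unbiased but inconsistent estimator of $\mu$, while the sample mean is both unbiased and consistent:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, reps = 5.0, 10_000

for n in (10, 1_000):
    samples = rng.normal(loc=mu, size=(reps, n))
    x1 = samples[:, 0]            # estimator: the first observation
    xbar = samples.mean(axis=1)   # estimator: the sample mean
    # Both average out near mu (unbiased), but only the sample mean's
    # spread shrinks as n grows (consistency):
    print(n, round(x1.mean(), 3), round(x1.std(), 3),
          round(xbar.mean(), 3), round(xbar.std(), 3))
```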

1 Answer


The first model is identifiable iff the augmented $\mathbf{X}$ matrix has full column rank.

For the second model, it sounds like you are suggesting an alternative model of the form: $$ y_t = \mu' + \mathbf{f}_t' \boldsymbol{\phi} + \epsilon_t $$ where the parameter $\mu'$ is related to the original model via the transformation: $$ \mu' = \mu - \mu\,\mathbf{1}'\boldsymbol{\phi} = \mu\Bigl(1 - \sum_{j=1}^p \phi_j\Bigr). $$ This approach works as long as $t > p$, but it does not give the same results for $t \le p$.
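A numeric spot-check (NumPy, with arbitrary made-up values for $\mu$, $\boldsymbol{\phi}$, and $\mathbf{f}_t$) that the two parameterizations give the same conditional mean:

```python
import numpy as np

mu = 10.0
phi = np.array([0.5, 0.3])
f_t = np.array([9.2, 11.1])          # past values [y_{t-1}, y_{t-2}]'

centered = mu + (f_t - mu) @ phi     # original mean-centered form
mu_prime = mu * (1.0 - phi.sum())    # mu' = mu(1 - 1'phi)
uncentered = mu_prime + f_t @ phi    # reparameterized form
print(np.isclose(centered, uncentered))   # True
```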

Tom Minka