9

I always struggle to grasp the true essence of identification in econometrics. I know that we say a parameter (say $\theta$) is identified if, by simply looking at the (joint) distribution of the observables, we can infer the value of the parameter. In the simple case of $y=b_1X+u$, where $E[u]=0$ and $E[u|X]=0$, we can state that $b_1$ is identified if we know that $Var(X)>0$. But what if $E[u|X]=a$, where $a$ is an unknown parameter? Can $a$ and $b_1$ both be identified?

If I expand the model to $Y=b_0+b_1X+b_2XD+u$, where $D\in\{0,1\}$ and $E[u|X,D]=0$, then to show that $b_0,b_1,b_2$ are identified, do I simply have to restate that the variance for all three parameters is greater than zero?
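To make the dummy-variable model concrete, here is a small simulation sketch (the parameter values $b_0=1$, $b_1=2$, $b_2=-0.5$ are my own illustrative choices, not from any source). It checks the condition mentioned in the comments below: that $X'X$ is invertible, i.e. $\det(X'X)\neq 0$, and that OLS then recovers the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
b0, b1, b2 = 1.0, 2.0, -0.5           # true parameters (illustrative values)

x = rng.normal(size=n)
d = rng.integers(0, 2, size=n)        # dummy D in {0, 1}
u = rng.normal(size=n)                # satisfies E[u | X, D] = 0
y = b0 + b1 * x + b2 * x * d + u

# Design matrix [1, X, XD]; identification requires X'X to be invertible,
# i.e. det(X'X) != 0 (no perfect collinearity among the regressors).
X = np.column_stack([np.ones(n), x, x * d])
XtX = X.T @ X
assert np.linalg.det(XtX) != 0

beta_hat = np.linalg.solve(XtX, X.T @ y)   # OLS: (X'X)^{-1} X'y
print(beta_hat)                            # should be close to (1.0, 2.0, -0.5)
```

If $X$ and $XD$ were perfectly collinear (e.g. if $D$ were always 1), the determinant would be zero and the three coefficients could not be separately recovered.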

I appreciate any help in clearing my mind concerning identification.

CharlesM
  • I was told that for the model with the dummy variable I simply have to show that $[X'X]^{-1}$ exists... meaning that the determinant of this matrix is not equal to 0. Correct? – CharlesM Oct 04 '12 at 03:09
  • I also posted the question on Math Stack Exchange and got nothing... – CharlesM Oct 05 '12 at 00:31
  • Does this help or just more of what you already know? [UChicago course notes](http://home.uchicago.edu/~amshaikh/webfiles/ident.pdf) – kirk May 06 '13 at 08:48

1 Answer

3

Let's first define the following objects. In a statistical model $M$ that is used to model $Y$ as a function of $X$, there are $p$ parameters, collected in the vector $\theta$. These parameters are allowed to vary within the parameter space $\Theta \subset \mathbb{R}^p$. We are not interested in estimating all of these parameters, but only a certain subset, say $q \leq p$ of them, which we denote $\theta^0$ and which varies within the parameter space $\Theta^0 \subset \mathbb{R}^q$. In the model $M$, the variables $X$ and the parameters $\theta$ are mapped so as to explain $Y$; this mapping is defined by $M$ and the parameters.

Within this setting, identifiability is a statement about observational equivalence. In particular, if the parameters $\theta^0$ are identifiable w.r.t. $M$, then it holds that $\nexists\, \theta^1 \in \Theta^0: \theta^1 \neq \theta^0,\ M(\theta^0) = M(\theta^1)$. In words: there does not exist a different parameter vector $\theta^1$ that would induce the same data generating process, given our model specification $M$. To make these concepts more concrete, I give two examples.

Example 1: Define for $\theta = (a,b)$; $X\sim N(\mu, \sigma^2I_{n})$; $\varepsilon \sim N(0, \sigma_e^2 I_{n})$ the simple statistical model $M$: \begin{align} Y = a+Xb+\varepsilon \end{align} and suppose that $(a,b) \in \mathbb{R}^2$ (so $\Theta = \mathbb{R}^2$). It is clear that whether $\theta^0 = (a,b)$ or $\theta^0 = a$, it will always hold that $\theta^0$ is identifiable: the process generating $Y$ from $X$ has a $1:1$ relationship with the parameters $a$ and $b$. Fixing $(a,b)$, it will not be possible to find a second tuple in $\mathbb{R}^2$ describing the same data generating process.
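A quick way to see the $1:1$ relationship in Example 1 is that $(a,b)$ is pinned down by moments of the joint distribution of $(X,Y)$: $b = \mathrm{Cov}(X,Y)/\mathrm{Var}(X)$ and $a = E[Y] - b\,E[X]$. A minimal simulation sketch (the values $a=0.5$, $b=1.5$ are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
a, b = 0.5, 1.5                       # true parameters (illustrative values)

x = rng.normal(loc=1.0, scale=2.0, size=n)
y = a + b * x + rng.normal(size=n)    # the model M: Y = a + Xb + eps

# With Var(X) > 0, the parameters are recovered uniquely from moments
# of the joint distribution of (X, Y):
b_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # Cov(X,Y) / Var(X)
a_hat = y.mean() - b_hat * x.mean()              # E[Y] - b * E[X]
print(a_hat, b_hat)                              # close to 0.5 and 1.5
```

Since these moment equations have exactly one solution, no second pair $(a',b')$ can reproduce the same joint distribution.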

Example 2: Define for $\theta = (a,b,c)$; $X\sim N(\mu, \sigma^2I_{n})$; $\varepsilon \sim N(0, \sigma_e^2 I_{n})$ the more tricky statistical model $M'$: \begin{align} Y = a+X\left(\frac{b}{c}\right)+\varepsilon \end{align} and suppose that $(a,b) \in \mathbb{R}^2$ and $c \in \mathbb{R}\setminus\{0\}$ (so $\Theta = \mathbb{R}^3\setminus\{(l,m,0) \mid (l,m) \in \mathbb{R}^2\}$). While for $\theta^0 = a$ this would be an identifiable statistical model, identifiability fails as soon as one includes another parameter (i.e., $b$ or $c$) in $\theta^0$. Why? Because for any pair $(b,c)$, there exist infinitely many other pairs in the set $B:=\{(x,y) \mid x/y = b/c,\ x\in\mathbb{R},\ y\in\mathbb{R}\setminus\{0\}\}$. The obvious solution to the problem in this case would be to introduce a new parameter $d = b/c$ replacing the fraction, which identifies the model. However, one might be interested in $b$ and $c$ as separate parameters for theoretical reasons - the parameters could correspond to parameters of interest in an (economic) theory sense. (E.g., $b$ could be 'propensity to consume' and $c$ could be 'confidence', and you might want to estimate these two quantities from your regression model. Unfortunately, this would not be possible.)
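The failure of identification in Example 2 can be demonstrated in a few lines: two different $(b,c)$ pairs with the same ratio generate *exactly* the same data, so no amount of data can distinguish them, while the reparametrised $d = b/c$ is recoverable. (All numerical values below are my own illustrative choices.)

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000
x = rng.normal(size=n)
eps = rng.normal(size=n)
a = 1.0

# Two observationally equivalent parameter vectors:
# (b, c) = (4, 2) and (b, c) = (8, 4) both have b/c = 2.
y1 = a + x * (4.0 / 2.0) + eps
y2 = a + x * (8.0 / 4.0) + eps
assert np.array_equal(y1, y2)         # identical data generating process

# Reparametrising with d = b/c restores identification: d is just the slope.
d_hat = np.cov(x, y1)[0, 1] / np.var(x, ddof=1)
print(d_hat)                          # close to 2.0; b and c stay unrecoverable
```

The data pin down only the slope $d$; any point on the ray $\{(b,c) : b/c = d\}$ fits equally well.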

Jeremias K
  • "There does not exist a different parameter vector $\theta^1$ that would generate the same data" doesn't sound quite right, unless you mean something unusual by "generate." Perhaps that needs spelling out or perhaps your meaning of "statistical model" needs to be made explicit. In most models, including those you use in your illustrations, *any* set of data could have been produced by *any* of the possible parameters. – whuber Feb 05 '16 at 21:36
  • @whuber that is a good point. What I should have said is that "There does not... that would *induce* the same *data generating process*". I changed this now :) – Jeremias K Feb 05 '16 at 22:32