Revisiting the definition of identification.
Although your definition of consistency is fine, I think you're defining identification in a somewhat odd way, especially the phrase "if we have enough data."
Although we can loosely think about identification as having "infinite data," I'd suggest you instead consider a scenario where you know the true distribution of the observed data.
In this sense, let $P$ denote the true distribution of observed data where $P \in \mathcal{P} \equiv \{P_{\theta} : \theta \in \Theta\}$. We are interested in $\theta$ or some function $f(\theta)$.
Since $P \in \mathcal{P}$, we know that there exists some $\theta \in \Theta$ such that $P = P_{\theta}$. However, given $P$, we cannot distinguish $\theta$ from any other $\theta'$ such that $P = P_{\theta'}$. In words, this is saying that given $P$, we may not 'know' enough about $\theta$ to uniquely pin it down.
To illustrate when this can happen, suppose the observed data are drawn from $P = N(a+b,\sigma^2)$, so that $\theta = (a,b,\sigma^2)$, and we are interested in $f(\theta) = (a,b)$. Given $P$, I cannot uniquely pin down $f(\theta)$: even though I know $a+b$, any pair $(a',b')$ with $a' + b' = a+b$ is equally consistent with $P$. To make things super concrete, suppose $a = -1,b=1$ so that $a+b = 0$. Then both $(-1,1)$ and $(0,0)$ are consistent with $P$. Hence, even fully knowing the distribution $P$ does not give me enough to pin down the 'true' value of $(a,b)$.
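If it helps, here is a minimal simulation sketch of this point (the seed, sample size, and variable names are arbitrary choices of mine): with the same random seed, the two candidate parameter values generate literally identical data, so no procedure applied to the data could ever tell them apart.

```python
import numpy as np

# Two different values of (a, b) with the same sum a + b = 0 imply
# exactly the same distribution N(a + b, sigma^2) for the data.
rng1 = np.random.default_rng(0)
rng2 = np.random.default_rng(0)

sigma = 1.0
a1, b1 = -1.0, 1.0   # one candidate parameter value
a2, b2 = 0.0, 0.0    # another candidate, with the same sum

x1 = rng1.normal(loc=a1 + b1, scale=sigma, size=100_000)
x2 = rng2.normal(loc=a2 + b2, scale=sigma, size=100_000)

# Same seed and same (a + b, sigma^2) => bit-for-bit identical samples.
print(np.array_equal(x1, x2))  # True
```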
In general, given $P$ and $\mathcal{P}$, the best we can say about $\theta$ is that $\theta \in \Theta^*(P)$ where
$$\Theta^*(P) \equiv \{\theta \in \Theta : P_\theta = P\}.$$
This is simply defining $\Theta^*(P)$ to be the set of all $\theta$ that agree with the observed distribution $P$. We call this the identified set. We then say that $\theta$ is identified if $\Theta^*(P)$ is a singleton for all $P \in \mathcal{P}$, where by singleton we mean that $\Theta^*(P)$ contains exactly one element, so that given $P$ we can uniquely pin down $\theta$. These terms are defined analogously for $f(\theta)$.
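As a toy illustration of $\Theta^*(P)$ in the running example (a sketch, assuming that knowing $P$ amounts to knowing $\mu = a+b$, and searching a grid of my own choosing), we can trace out the identified set for $(a,b)$ numerically:

```python
import numpy as np

# The identified set for (a, b) is the line {(a, b) : a + b = mu},
# here intersected with a coarse grid (step 0.5) for illustration.
mu = 0.0                       # the feature of P that (a, b) must match
grid = np.linspace(-2, 2, 9)   # -2.0, -1.5, ..., 2.0
identified_set = [(a, b) for a in grid for b in grid
                  if np.isclose(a + b, mu)]

print(len(identified_set))            # 9 pairs -- not a singleton
print((-1.0, 1.0) in identified_set)  # True
print((0.0, 0.0) in identified_set)   # True
```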
Relationship between identification and consistency.
It should hopefully be somewhat clear from the above that identification and consistency are closely related.
In particular, if $\theta$ is not identified, then consistent estimators for $\theta$ cannot exist. Why? Well, suppose we had a consistent estimator $\hat{\theta}$. Since $\theta$ is not identified, there are values $\tilde{\theta},\bar{\theta} \in \Theta^*(P)$ with $\tilde{\theta} \neq \bar{\theta}$, and by construction both generate exactly the same distribution of data, $P$. Because $\hat{\theta}$ is a function of the data alone, its behavior depends only on $P$, so consistency would require it to converge to $\tilde{\theta}$ and to $\bar{\theta}$ simultaneously. But $\hat{\theta}$ cannot converge to two distinct values!
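Here is a small sketch of that argument in the running example (the plug-in estimator $(\bar{x}_n, 0)$ is a hypothetical choice of mine, just to make the point concrete):

```python
import numpy as np

# Under both 'truths' (-1, 1) and (0, 0), the data are N(0, 1), so any
# estimator -- being a function of the data alone -- behaves identically.
def estimator(x):
    # hypothetical plug-in estimator of (a, b): (sample mean, 0)
    return (x.mean(), 0.0)

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=1_000_000)  # same data under both truths
print(estimator(x))  # approx (0.0, 0.0) -- it cannot also converge to (-1, 1)
```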
Conversely, if $\theta$ is identified, then consistent estimators may exist, but they need not. Most of the time one will exist (by appealing to the law of large numbers, the continuous mapping theorem, and so on), but there are exceptions (e.g., the mean of a Pareto distribution with $\alpha < 1$: the mean is infinite, so no finite-valued estimator, such as the sample mean, can converge to it).
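To see the Pareto caveat numerically (a rough sketch; I use a classic Pareto with scale $1$, which numpy generates as a Lomax draw plus one, and the sample sizes are arbitrary), compare the running sample mean for $\alpha = 0.5$ against a well-behaved case like $\alpha = 3$:

```python
import numpy as np

# For alpha < 1 the population mean is infinite, so the running sample
# mean never settles down; for alpha = 3 it converges to alpha/(alpha-1) = 1.5.
rng = np.random.default_rng(2)

for alpha in (0.5, 3.0):
    draws = rng.pareto(alpha, size=10**6) + 1.0  # classic Pareto, x_m = 1
    running_means = np.cumsum(draws) / np.arange(1, draws.size + 1)
    # inspect the running mean at a few sample sizes
    print(alpha, [round(running_means[n - 1], 2) for n in (10**2, 10**4, 10**6)])
```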