
A latent variable model involving a binomial observed variable $Y$ can be constructed such that $Y$ is related to the latent variable $Y^*$ via

$ Y = \begin{cases} 0, & \mbox{if }Y^*>0 \\ 1, & \mbox{if }Y^*<0. \end{cases} $

The latent variable $Y^*$ is then related to a set of regression variables $X$ by the model $Y^* = X\beta + \varepsilon$. This results in a binomial regression model.
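As a rough illustration of the construction above, the following sketch simulates the latent variable and applies the question's coding rule ($Y = 1$ when $Y^* < 0$). The function name `simulate_binary`, the coefficient values, and the choice of a standard normal error are assumptions made only for this example.

```python
import random

def simulate_binary(x_values, b0, b1, draw_error, seed=0):
    """Draw one Y per x from Y* = b0 + b1*x + eps, coding Y = 1 if Y* < 0, else 0."""
    rng = random.Random(seed)
    ys = []
    for x in x_values:
        y_star = b0 + b1 * x + draw_error(rng)  # latent variable, never observed
        ys.append(1 if y_star < 0 else 0)       # observed binary outcome
    return ys

# Hypothetical inputs: four regressor values and a standard normal error.
xs = [-1.0, 0.0, 1.0, 2.0]
ys = simulate_binary(xs, b0=0.5, b1=-1.0, draw_error=lambda rng: rng.gauss(0.0, 1.0))
print(list(zip(xs, ys)))
```

Swapping `draw_error` for a different distribution changes only the implied link between $X$ and $P(Y = 1 \mid X)$; the coding rule stays the same.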

The variance of $\varepsilon$ cannot be identified and, when it is not of interest, is often assumed to be equal to one. If $\varepsilon$ is normally distributed, then a probit is the appropriate model, and if $\varepsilon$ is log-Weibull distributed, then a logit is appropriate. If $\varepsilon$ is uniformly distributed, then a linear probability model is appropriate.

Maarten Buis
user37115

1 Answer


Let's try to validate the claim that if the error term of the underlying latent variable model is assumed uniformly distributed, then a Linear Probability model is appropriate.

The underlying latent variable model is (using a single regressor for simplicity; this does not change the argument)

$$Y^* = b_0+ b_1X + \epsilon,\;\; \epsilon\mid X\sim U(-a,a)$$

where the limits of the uniform distribution are chosen symmetric about zero so that the error term has zero expected value, conditional on the regressors. The cumulative distribution function here is $F_{\epsilon|X}(\epsilon\mid X) = \frac {\epsilon + a}{2a}$ for $-a \le \epsilon \le a$,

and the observed model is (given how the question defines $Y$ as a function of $Y^*$)

$$P(Y =1\mid X ) = P(Y^*<0\mid X) = P(b_0+ b_1X + \epsilon<0\mid X) = P(\epsilon < - b_0- b_1X\mid X)$$
$$=F_{\epsilon|X}(- b_0- b_1X\mid X) = \frac {- b_0- b_1X + a}{2a} = \frac {- b_0+a}{2a}+\frac {- b_1}{2a}X$$

$$\Rightarrow P(Y =1\mid X )= \beta_0 + \beta_1X$$

which is the Linear Probability model with the mapping

$$\beta_0 = \frac {- b_0+a}{2a},\;\; \beta_1=\frac{- b_1}{2a}$$
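The mapping can be checked numerically. The sketch below compares $\beta_0 + \beta_1 X$ with a Monte Carlo estimate of $P(Y^* < 0 \mid X)$ under $\epsilon \sim U(-a, a)$. The helper names and the values of $b_0$, $b_1$, $a$ and $X$ are assumptions chosen for illustration, picked so that $-b_0 - b_1 X$ stays inside $[-a, a]$, which is where the uniform CDF is linear; outside that range the probability is clipped at 0 or 1.

```python
import random

def lpm_coefficients(b0, b1, a):
    """Map latent-model parameters to the linear probability coefficients."""
    beta0 = (-b0 + a) / (2 * a)
    beta1 = -b1 / (2 * a)
    return beta0, beta1

def simulated_prob(x, b0, b1, a, n=200_000, seed=1):
    """Monte Carlo estimate of P(Y = 1 | X = x) = P(b0 + b1*x + eps < 0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        eps = rng.uniform(-a, a)   # uniform error, zero mean
        if b0 + b1 * x + eps < 0:  # the question's rule: Y = 1 when Y* < 0
            hits += 1
    return hits / n

# Hypothetical parameter values; -b0 - b1*x lies in [-a, a] for every x used.
b0, b1, a = 0.2, -0.5, 2.0
beta0, beta1 = lpm_coefficients(b0, b1, a)
for x in (-1.0, 0.0, 1.0, 2.0):
    print(x, beta0 + beta1 * x, simulated_prob(x, b0, b1, a))
```

The two printed probabilities in each row should agree up to simulation noise.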

Scortchi - Reinstate Monica
Alecos Papadopoulos