Here is a simplified version of a more complicated problem that I have.
Imagine a hidden Markov model where the state is $X_t\sim N(\mu,\sigma^2)$. The observed variable is $Y_t\sim Bin(N, p_t)$ where $N$ is fixed and $p_t$ is the logistic transformation of the state, so $$p_t = \frac{e^{X_t}}{1+e^{X_t}}$$
I want to use the data observed each period, $y_t$ for $t = 1, \ldots, T$, to estimate the parameters of the underlying state, i.e. $\mu$ and $\sigma^2$.
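To make the setup concrete, here is a minimal R sketch of the data-generating process. The parameter values are arbitrary, and for simplicity $X_t$ is drawn i.i.d. rather than as a persistent Markov state.

```r
## Minimal sketch of the data-generating process (illustrative parameter values;
## X_t drawn i.i.d. here rather than as a persistent Markov state)
set.seed(1)
T     <- 200     # number of periods
N     <- 20      # binomial size, fixed
mu    <- 0.5     # state mean
sigma <- 1.0     # state standard deviation

x <- rnorm(T, mean = mu, sd = sigma)   # latent state X_t
p <- plogis(x)                         # p_t = exp(X_t) / (1 + exp(X_t))
y <- rbinom(T, size = N, prob = p)     # observed counts Y_t ~ Bin(N, p_t)
```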
But there's a problem with this -- the MLE for $\sigma^2$ will always be zero, even if the true $\sigma^2>0$. Here's why...
When we compute the conditional log-likelihood of the observation (dropping the binomial coefficient, which does not depend on the parameters), we get $$ l(\mu, \sigma^2 | X_t, y_t) = y_t X_t - N H(X_t) $$ where $H(X_t)=\log (1+e^{X_t})$.
Integrating over the unobserved $X_t$, and using $\int X_t \, dF = \mu$, gives the unconditional log-likelihood for observation $t$: $$ l(\mu, \sigma^2 | y_t) = y_t \mu - N \int H(X_t) \, dF(X_t|\mu, \sigma^2) $$ where $F(\cdot|\mu, \sigma^2)$ is the $N(\mu,\sigma^2)$ distribution.
Summing over the observations: $$ l(\mu, \sigma^2 | y_1, \ldots, y_T) = \mu \sum_{t=1}^T y_t - N T \int_{X} H(X) dF(X|\mu, \sigma^2) $$
So the MLE for $\mu$ will be interior. The first term is increasing in $\mu$ with slope $\sum_t y_t < NT$. The second term is decreasing in $\mu$, with a slope that goes from $0$ (as $\mu \to -\infty$) to $-NT$ (as $\mu \to +\infty$), so the two terms balance at an interior maximum.
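To see the slope claim concretely: differentiating under the integral, the derivative of the integral term with respect to $\mu$ is $N T \int H'(X)\, dF$, and $H'(X) = e^{X}/(1+e^{X}) \in (0,1)$. A quick quadrature sketch (arbitrary values of $N$, $T$, and $\sigma$) shows this slope running from near $0$ to near $NT$:

```r
## Slope of N*T*E[H(X)] in mu, i.e. N*T*E[H'(X)] with H'(x) = plogis(x),
## X ~ N(mu, sigma^2), evaluated by quadrature (illustrative N, T, sigma)
N <- 20; T <- 200; sigma <- 1
slope <- function(mu) {
  N * T * integrate(function(z) plogis(mu + sigma * z) * dnorm(z),
                    lower = -10, upper = 10)$value
}
sapply(c(-10, -2, 0, 2, 10), slope)   # rises from near 0 toward N*T = 4000
```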
But the MLE for $\sigma^2$ is always zero. The first term in the log-likelihood does not involve $\sigma$. And because $H(X)$ is convex, Jensen's inequality gives $\int H(X) \, dF(X|\mu,\sigma^2) \ge H(\mu)$, with the integral shrinking toward $H(\mu)$ as $\sigma$ decreases; so any decrease in $\sigma$ reduces the integral and therefore increases the likelihood. Thus the likelihood is maximized at $\sigma=0$, no matter the true $\sigma$.
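The same point can be checked numerically: evaluating $\int H(X)\, dF(X|\mu,\sigma^2)$ by quadrature for a fixed (arbitrary) $\mu$ shows it shrinking toward the Jensen bound $H(\mu)$ as $\sigma$ decreases.

```r
## E[H(X)] for X ~ N(mu, sigma^2), with H(x) = log(1 + exp(x)), by quadrature.
## By Jensen this is >= H(mu), and it shrinks toward H(mu) as sigma -> 0.
H <- function(x) ifelse(x > 30, x, log1p(exp(x)))   # overflow-safe log(1 + exp(x))
expected_H <- function(mu, sigma) {
  integrate(function(z) H(mu + sigma * z) * dnorm(z), lower = -10, upper = 10)$value
}

mu <- 0.5
sapply(c(2, 1, 0.5, 0.1), function(s) expected_H(mu, s))   # decreasing in sigma
H(mu)                                                      # the Jensen lower bound
```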
The full problem has more bells and whistles -- $X_t$ is multivariate and persistent -- but the basic problem is the same. The logistic conversion of the continuous state to the bounded probabilities $p_t$ means that the MLE will be biased and inconsistent. I've checked numerically using the pomp package in R, and the fit always drives $\sigma$ to zero, consistent with the argument above.
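For anyone who wants to reproduce the effect without pomp, here is a self-contained sketch (arbitrary parameter values) that simulates data and maximizes the likelihood expression derived above with optim, so the fitted $\sigma$ can be compared against the true value.

```r
## Self-contained sketch: simulate data, then maximize the likelihood expression
## derived above over (mu, log sigma). All parameter values are illustrative.
set.seed(1)
T <- 200; N <- 20; mu_true <- 0.5; sigma_true <- 1.0
x <- rnorm(T, mu_true, sigma_true)
y <- rbinom(T, size = N, prob = plogis(x))

H <- function(x) ifelse(x > 30, x, log1p(exp(x)))   # overflow-safe log(1 + exp(x))
expected_H <- function(mu, sigma) {
  # E[H(X)] for X ~ N(mu, sigma^2), by quadrature over z = (x - mu) / sigma
  integrate(function(z) H(mu + sigma * z) * dnorm(z), lower = -10, upper = 10)$value
}

## l(mu, sigma | y_1, ..., y_T) = mu * sum(y) - N * T * E[H(X)]
negloglik <- function(par) {
  mu <- par[1]; sigma <- exp(par[2])   # parameterize by log sigma to keep sigma > 0
  -(mu * sum(y) - N * T * expected_H(mu, sigma))
}

fit <- optim(c(0, 0), negloglik)
c(mu_hat = fit$par[1], sigma_hat = exp(fit$par[2]))
```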
So, here are my questions.
- Why is this happening? What is the intuition, beyond the convexity of $H(X)$?
- Is estimation of the other parameter, $\mu$, still consistent? Is it biased?
- Is there an alternative way to convert a continuous state $X_t$ to a probability $p_t$ for the observation equation? Preferably one for which estimation of $\sigma^2$ is consistent?