Setup
We are interested in estimating a model for the following setup:
$Y_t=\beta_0 + \beta_1^{'}X^{'}_t + \epsilon_t$
$\mathrm{Cov}(X^{'}_t,\, Y_{t-1,t-2,\ldots,1} \mid X_{t-1,t-2,\ldots,1}) = 0$
Where $\epsilon_t$ is iid normal. In other words: what would the effect of $X_t$ be if it were set “randomly” (independently of past $Y_t$s) at each timestep?
Unfortunately, what we actually have in front of us is:
$Y_t=\beta_0 + \beta_1X_t + \epsilon_t$
$X_{t}=\phi(Y_{t-1,t-2,...,1})$
Where $\phi$ is an unknown function. In other words, $X_{t}$ is correlated with past values of $Y$.
Just to make this more concrete, let's imagine a situation where this might arise: we are measuring customer behavior (spending) in response to some marketing action (coupons). We'd like to understand the impact of the coupon amount $X_t$ on customer spending $Y_t$. Unfortunately, the coupon amount given at $t$ is determined by taking a fraction of the average of past spending, i.e.:
$X_t = \phi(Y_{t-1},\ldots,Y_1) = \frac{m_t}{t-1} \sum_{i=1}^{t-1} Y_i$
Where $m_t$ is the fraction (let's say it's drawn from $U(.10,.20)$).
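To make the data-generating process explicit, here is a minimal R sketch for a single customer. The baseline, effect size, noise level, and first-period coupon are illustrative assumptions on my part, not values fixed by the setup above.

```r
# Minimal sketch of the data-generating process for one customer.
# beta0, beta1, sigma, and the first-period coupon are assumed values.
set.seed(1)

T_steps <- 50
beta0   <- 10    # baseline spending (assumed)
beta1   <- 2     # true effect of coupon amount (assumed)
sigma   <- 1     # sd of the iid normal noise (assumed)

y <- numeric(T_steps)
x <- numeric(T_steps)

# First period: no past spending yet, so start with an arbitrary coupon.
x[1] <- 1
y[1] <- beta0 + beta1 * x[1] + rnorm(1, 0, sigma)

for (t in 2:T_steps) {
  m_t  <- runif(1, 0.10, 0.20)        # fraction m_t drawn from U(.10, .20)
  x[t] <- m_t * mean(y[1:(t - 1)])    # coupon = fraction of average past spending
  y[t] <- beta0 + beta1 * x[t] + rnorm(1, 0, sigma)
}
```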
Goal/Questions
We'd like to recover $\beta_1^{'}$ (i.e., the effect of coupons on spending if the coupons had been given randomly).
Problem: If we simply fit an OLS regression model $Y_t=\beta_0 + \beta_1X_t + \epsilon_t$ for each customer, we will not get the correct value (see link to simulation below).
Questions:
Is there a name for a situation such as this? I have had no luck finding this situation while reviewing the literature on autocorrelation, exogenous/endogenous predictors, and propensity scoring/causal inference.
What are some approaches to modeling data in a situation like this?
UPDATE:
Here is some R code that shows a simulation of the process described above.
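A minimal sketch of such a simulation is below. The number of customers, the customer-level random intercepts, the specific parameter values, and the choice of `lme4::lmer` for the mixed-effects fit are all assumptions of mine rather than details fixed by the setup above.

```r
# Sketch: simulate many customers under the coupon process above, then compare
# per-customer OLS slopes with a random-intercept mixed model.
# Parameter values and the random-intercept structure are illustrative assumptions.
library(lme4)

set.seed(2)
n_customers <- 200
T_steps     <- 50
beta1       <- 2     # true coupon effect (assumed)
sigma       <- 1     # residual sd (assumed)

sim_customer <- function(id) {
  beta0 <- rnorm(1, mean = 10, sd = 2)   # customer-specific baseline (assumed)
  y <- numeric(T_steps)
  x <- numeric(T_steps)
  x[1] <- 1                              # arbitrary first-period coupon
  y[1] <- beta0 + beta1 * x[1] + rnorm(1, 0, sigma)
  for (t in 2:T_steps) {
    m_t  <- runif(1, 0.10, 0.20)         # m_t drawn from U(.10, .20)
    x[t] <- m_t * mean(y[1:(t - 1)])     # coupon based on average past spending
    y[t] <- beta0 + beta1 * x[t] + rnorm(1, 0, sigma)
  }
  data.frame(id = id, t = seq_len(T_steps), x = x, y = y)
}

dat <- do.call(rbind, lapply(seq_len(n_customers), sim_customer))

# Per-customer OLS slopes (reported above to be negatively biased)
ols_slopes <- sapply(split(dat, dat$id),
                     function(d) coef(lm(y ~ x, data = d))[["x"]])
mean(ols_slopes)

# Random-intercept mixed model pooling all customers
fit_lmm <- lmer(y ~ x + (1 | id), data = dat)
fixef(fit_lmm)["x"]   # compare with the true beta1
```

Comparing `mean(ols_slopes)` and `fixef(fit_lmm)["x"]` against the true `beta1` reproduces the comparison discussed below, at least under these assumed settings.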
In cleaning up the code for this post, I realized that (at least as I'm simulating it) a mixed-effects model can actually handle the situation described above -- i.e., it recovers the simulated coefficients without bias. (I had previously, and incorrectly, concluded that it could not; my reasons are described in the code.)
So I suppose I'd like to add a third question:
- Why is the mixed-effects approach doing so well here? OLS on each group produces a negative bias in $\beta_1$; it's not clear to me how or why the machinery of mixed-effects models avoids that bias. Does this success generalize (i.e., to other $\phi$s)? I had thought that including a fixed effect for $X_t$ that is so heavily correlated with the random intercepts would be problematic (e.g. Mundlak, 1978).
My concern is that I’ve inadvertently created a simulation where everything works out OK; but since I don’t actually understand why things worked out OK, I risk trying the same approach on a real-world dataset with a different $\phi$ and succumbing to bias.