Including a variable in a regression but NOT estimating its coefficient

Question

I came across a curious problem while trying to replicate a paper. The results in the paper were estimated using EViews, which I am not familiar with. I noticed that the author specified (by formula) and estimated an equation (using 2SLS) as follows:

Y = C(1) + X*Z + C(2)*H + ...

I have never encountered such a thing before. No coefficient is estimated for the interaction X*Z, but it is included in the equation. I do not fully understand how this works econometrically. I have successfully replicated the above estimation using EViews, and I am trying to do the same in Stata, but Stata automatically estimates a coefficient for every variable included in the regression. I think I am missing something crucial here. What am I looking at, and how could I do the equivalent in Stata?

Here's the paper I am trying to replicate: https://mitcre.mit.edu/wp-content/uploads/2014/03/The-Quarterly-Journal-of-Economics-2010-Saiz-1253-96.pdf

Specifically, we're estimating Equation 3. As is seen, the interaction term includes no coefficient. This is because, according to the author, the interaction term is "known and calibrated in the model".

Did you mean to write $X*Y$ and therefore have the response variable on both sides of the equation? — Dave, Mar 10 '20 at 15:29
... and if the paper really does mean $XY,$ then it's positing that the coefficient of the interaction is $1.$ Equivalently, the model is $Y(1-X) = C(1) + C(2)Z + \cdots$ and there are no conceptual or computational difficulties in estimating its coefficients (although it does make one a little concerned about the implicit homoscedasticity assumption). — whuber, Mar 10 '20 at 15:35
@Dave My bad. No, the response variable does not appear on both sides of the equation. X*Z are two other variables interacted. I have edited my question accordingly. — OhHiClark, Mar 10 '20 at 15:38
This is a special case of the (multiple regression version) of https://stats.stackexchange.com/questions/50447, where "$p(x)$" is replaced by $x*z$ in my answer there. — whuber, Mar 10 '20 at 15:41
Equation 3 is a solution of a differential equation with logarithms, this is really important imo. — Firebug, Mar 10 '20 at 17:07
@Firebug Thank you very much! I took a closer look at the model setup and you're right. This clears things up. — OhHiClark, Mar 11 '20 at 08:43

Dave · Answer 1 · 2020-03-10T15:59:26.010

Let's consider a linear model under the usual assumptions (Gauss-Markov, normal error term, etc):

$$y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \epsilon_i$$

We obtain the OLS estimate of $\beta = (\beta_0,\beta_1,\beta_2)^T$ by solving a multivariate minimization problem in which we minimize the squared loss over all $n$ observations.

$$L(y,\hat{y}) = \sum_{i=1}^n(y_i - \hat{y}_i)^2 = \sum_{i=1}^n(y_i - (\beta_0 + \beta_1x_{i1} + \beta_2x_{i2}))^2$$

We do the usual calculus to find $\beta$ that minimizes $L$.

But maybe we want to set $\beta_1=2$. No problem! We just modify the loss function.

$$L(y,\hat{y}) = \sum_{i=1}^n(y_i - \hat{y}_i)^2 = \sum_{i=1}^n(y_i - (\beta_0 + 2x_{i1} + \beta_2x_{i2}))^2$$

The $\beta = (\beta_0,\beta_2)^T$ that minimizes this loss function is whatever the calculus says minimizes the loss.

Therefore, there is no inherent problem with specifying a coefficient.

However, we want to do more than just say that the minimum is achieved at $\underset{\beta}{\text{argmin}} \sum_{i=1}^n(y_i - (\beta_0 + 2x_{i1} + \beta_2x_{i2}))^2$. We want to calculate the coefficients.
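For this two-parameter case the calculus can be written out explicitly. As a sketch, write $z_i = y_i - 2x_{i1}$ as shorthand for the response with the known term removed, and let $\bar{z}$ and $\bar{x}_2$ denote sample means (notation assumed here for illustration). Setting the partial derivatives of $L$ to zero gives

$$\frac{\partial L}{\partial \beta_0} = -2\sum_{i=1}^n(z_i - \beta_0 - \beta_2x_{i2}) = 0, \qquad \frac{\partial L}{\partial \beta_2} = -2\sum_{i=1}^n x_{i2}(z_i - \beta_0 - \beta_2x_{i2}) = 0$$

and, provided the $x_{i2}$ are not all equal, solving the two equations yields

$$\hat{\beta}_2 = \frac{\sum_{i=1}^n(x_{i2} - \bar{x}_2)(z_i - \bar{z})}{\sum_{i=1}^n(x_{i2} - \bar{x}_2)^2}, \qquad \hat{\beta}_0 = \bar{z} - \hat{\beta}_2\bar{x}_2$$

These are the familiar simple-regression formulas with $z$ in the role of the response, so fixing $\beta_1$ changes what is being regressed but not how the remaining coefficients are computed.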

Yes, I would say to follow Nick Cox's advice and minimize $L(z,\hat{z}) = \sum_{i=1}^n(z_i - (\beta_0 + \beta_2x_{i2}))^2$ for $z_i = y_i - 2x_{i1}$. (In other words, make a new response variable.) I am not sure, however, how parameter inference would go in this situation.
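As a rough illustration of the "new response variable" idea, here is a minimal Python sketch using NumPy on simulated data. The function name `fit_with_fixed_coefficient`, the simulated coefficients, and the sample size are all made up for the example; it covers the plain least-squares case only, not the 2SLS setup from the question, and it says nothing about standard errors.

```python
import numpy as np


def fit_with_fixed_coefficient(y, x_fixed, x_free, fixed_coef):
    """Least squares with the coefficient on x_fixed held at fixed_coef.

    Returns the estimated intercept followed by the free slope(s).
    (Hypothetical helper written for this illustration.)
    """
    # Move the known term to the left-hand side: z = y - fixed_coef * x_fixed.
    z = y - fixed_coef * x_fixed
    # Regress z on an intercept and the remaining regressor(s).
    design = np.column_stack([np.ones(len(z)), x_free])
    coefs, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coefs


# Simulated data in which the true coefficient on x1 really is 2.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(scale=0.1, size=n)

beta0_hat, beta2_hat = fit_with_fixed_coefficient(y, x1, x2, fixed_coef=2.0)
print(beta0_hat, beta2_hat)  # should land near the simulated values 1.0 and 0.5


def constrained_loss(b0, b2):
    """Squared loss from the answer with beta_1 fixed at 2."""
    return np.sum((y - (b0 + 2.0 * x1 + b2 * x2)) ** 2)
```

Evaluating `constrained_loss` at the fitted values and at nearby points is one way to check that the subtract-then-regress estimates do minimize the constrained loss.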

Alternatively just subtract to get $y - 2 x_1$ and feed that to any regression routine. — Nick Cox, Mar 10 '20 at 15:46
In my specific case, is the beta associated with the interaction therefore equal to one? And for estimation using Stata, I suppose I could use @NickCox's suggestion to estimate the model: simply subtract the interaction product from the dependent variable and estimate the model without the interaction term on the right-hand side. Would that be correct? — OhHiClark, Mar 10 '20 at 15:51
