which model is better when conducting linear regression

Question

I am doing a regression on two variables, x and y, and I have two approaches:

1. Assume the mean of y has the form $g(x;\alpha,\beta)=\alpha \exp(\beta x)$, then find the parameters by least squares.
2. Regress log(y) on x directly and find the least-squares estimates of the intercept and slope.

Can anyone tell me which approach is better in general, or is it a YMMV case? Thanks!
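
For concreteness, here is a minimal sketch of the two approaches on simulated data. Everything in it is an assumption made for illustration rather than something taken from the question: the true values 2 and 1.5, the multiplicative log-normal noise, and the helper names `fit_log_ols`, `raw_sse` and `fit_nls`. The nonlinear fit uses a plain Gauss-Newton iteration with step halving, started from the log-scale estimates; in practice a library nonlinear least-squares routine would normally be used instead.

```python
import numpy as np


def fit_log_ols(x, y):
    """Approach 2: OLS of log(y) on x; returns (alpha, beta) with alpha = exp(intercept)."""
    slope, intercept = np.polyfit(x, np.log(y), 1)
    return np.exp(intercept), slope


def raw_sse(alpha, beta, x, y):
    """Sum of squared errors on the original scale of y."""
    return float(np.sum((y - alpha * np.exp(beta * x)) ** 2))


def fit_nls(x, y, alpha0, beta0, n_iter=100, tol=1e-10):
    """Approach 1: minimise raw_sse by Gauss-Newton with step halving."""
    theta = np.array([alpha0, beta0], dtype=float)
    for _ in range(n_iter):
        alpha, beta = theta
        e = np.exp(beta * x)
        resid = y - alpha * e
        # Jacobian of the mean function with respect to (alpha, beta).
        jac = np.column_stack([e, alpha * x * e])
        step, *_ = np.linalg.lstsq(jac, resid, rcond=None)
        # Halve the step until the raw-scale SSE does not increase.
        current = raw_sse(alpha, beta, x, y)
        t = 1.0
        while t > 1e-8 and not (raw_sse(*(theta + t * step), x, y) <= current):
            t /= 2.0
        if t <= 1e-8:
            break  # no improving step found; keep the current estimate
        theta = theta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return theta[0], theta[1]


# Simulated data (assumed values): mean-type curve 2*exp(1.5x) with multiplicative noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 200)
y = 2.0 * np.exp(1.5 * x) * np.exp(rng.normal(0.0, 0.2, x.size))

alpha_ols, beta_ols = fit_log_ols(x, y)
alpha_nls, beta_nls = fit_nls(x, y, alpha_ols, beta_ols)

print("log-scale OLS:", alpha_ols, beta_ols, raw_sse(alpha_ols, beta_ols, x, y))
print("NLS:          ", alpha_nls, beta_nls, raw_sse(alpha_nls, beta_nls, x, y))
```

The two fits minimise different criteria, so neither dominates: the NLS fit has the smaller sum of squares on the original scale, while the log-scale OLS fit has the smaller sum of squares on the log scale.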

I don't think there is a general answer to this. The "better" model is the one which fits the data best, and which makes theoretical sense for the relationship between x and y you hypothesized. — rocinante, Jan 23 '16 at 07:42
The first equation is non-linear in $\beta$ so I would opt for the second one, i.e. $\ln(y)=\beta_0+\beta_1x + \epsilon$. — , Jan 23 '16 at 07:54
I agree. Nonlinear models require NLS, which relies on numerical approximations and does not provide closed-form solutions like OLS. — Christoph Hanck, Jan 23 '16 at 08:45
@fcop The problem is that the OP is discussing a model for the *mean* of y. If you just exponentiate that log-fit, it's not predicting the mean of $y$. There are some issues to be dealt with there. *Personally*, I'd be slightly more inclined to fit a GLM with log link, where the model really is for the mean. — Glen_b, Jan 23 '16 at 09:37
@Glen_b: I see your point with predicting the mean and I agree with you on that. But the OP's question was whether he could use OLS, and if you advise a GLM then implicitly you also say that OLS is not OK in the first case, or do I see it wrong? — , Jan 23 '16 at 20:31
@fcop It would be premature to advise the use of one model over another without first ascertaining the nature of the residuals and understanding the intended use of the model. The two approaches described here make radically different assumptions about the conditional distribution of the response. Although a linear model may be easier to fit and perhaps easier to interpret, if it's a terrible fit to the data your advice may be more harmful than helpful. — whuber, Jan 23 '16 at 21:03
@whuber: may I conclude from your comment that the question is too imprecise to give an answer? That the OP should first tell us about the distribution of the residuals/response? Is that not an (open) question for any application of theory to the 'real world'? — , Jan 23 '16 at 21:26
@fcop I believe that is what Glen_b might mean by "some issues to be dealt with here." The question as stated does have a glib answer: to wit, it depends. It potentially has a good answer that would discuss the differences between the two models and how to examine one's data to decide which of the two--if either--might apply: but such an answer, if it is complete, would be lengthy. — whuber, Jan 23 '16 at 22:25
@fcop what you're saying isn't wrong; it is incomplete. My comment was too short to do more than hint at one of several issues (which are indeed related to whuber's point). Even if the error structure were such that the log-model was the correct probability model, you'd need to explain how to estimate $E(Y|x)$. When exponentiating a fit on the log scale, we must keep in mind that $\text{E}[\varphi(x)] \neq \varphi[\text{E}(x)]$. See the discussion [here](http://stats.stackexchange.com/questions/186360/exponential-regression-with-x-outside-of-exponential/186769#186769) — Glen_b, Jan 24 '16 at 00:36
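
To make the inequality concrete, here is a worked case under an added assumption that the comment itself does not state: the log-scale errors are normal with constant variance, $\epsilon \sim N(0,\sigma^2)$, independent of $x$. Then, for $\ln y = \beta_0 + \beta_1 x + \epsilon$,

$$E(y \mid x) = e^{\beta_0+\beta_1 x}\, E\left(e^{\epsilon}\right) = e^{\beta_0+\beta_1 x+\sigma^2/2} > e^{\beta_0+\beta_1 x} = e^{E(\ln y \mid x)}.$$

Under that assumption, exponentiating the fitted line estimates the conditional median of $y$ and understates the conditional mean by the factor $e^{\sigma^2/2}$.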
On the conditional distribution [see here](http://stats.stackexchange.com/questions/175381/in-a-glm-does-the-link-transform-the-estimated-mean-or-is-the-mean-estimated-f). [This plot](http://i.imgur.com/ZG41pQD.png) shows the same exponential model fitted under three different sets of assumptions about the conditional distribution. Discussion of the issue in a similar model is [here](http://stats.stackexchange.com/questions/47870/exponent-for-non-linear-regression-in-r) and [here](http://stats.stackexchange.com/questions/61747/linear-vs-nonlinear-regression/61806#61806). — Glen_b, Jan 24 '16 at 00:36
I wish I could find the post the plot came from, because if I remember rightly it deals with the specifics here more directly - I think it's effectively a duplicate. — Glen_b, Jan 24 '16 at 01:05
Sorry for not providing more details. The data can be accessed at: https://www.dropbox.com/s/mjz95z5mk31jvax/Supplemental_data.csv?dl=0 I know that this has something to do with the fact that $E[\phi(x)] \neq \phi[E(x)]$, and that using the second approach (log-scale OLS) requires the two to be approximately equal. My question is: does this mean that the first approach is always the safer one to go with? When will the second approach beat the first one? — Sheldon, Jan 24 '16 at 01:09
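
The gap between exponentiating a log-scale fit and the conditional mean can be checked numerically. The sketch below is an illustration on simulated data, not an analysis of the linked file: the parameter values, the noise level and all variable names are assumptions. It also applies Duan's smearing factor (the average of the exponentiated log-scale residuals), which is one common retransformation correction and is not something proposed in this thread.

```python
import numpy as np

# Simulated data (assumed values): log(y) is linear in x with normal errors.
rng = np.random.default_rng(1)
sigma = 0.5
x = np.linspace(0.0, 2.0, 500)
y = 2.0 * np.exp(1.5 * x + rng.normal(0.0, sigma, x.size))

# OLS on the log scale.
slope, intercept = np.polyfit(x, np.log(y), 1)
log_fit = intercept + slope * x
resid = np.log(y) - log_fit

# Naive back-transform: targets exp(E[log y | x]), not E[y | x].
naive = np.exp(log_fit)

# Smearing factor: mean of exp(residuals); it is at least 1 by Jensen's inequality
# because OLS residuals with an intercept average to zero.
smear = float(np.mean(np.exp(resid)))
corrected = naive * smear

# True conditional mean under the simulation's log-normal errors.
true_mean = 2.0 * np.exp(1.5 * x) * np.exp(sigma**2 / 2)

print("smearing factor:", smear, "vs exp(sigma^2/2):", np.exp(sigma**2 / 2))
print("mean ratio naive/true:    ", float(np.mean(naive / true_mean)))
print("mean ratio corrected/true:", float(np.mean(corrected / true_mean)))
```

A single constant factor like this is only reasonable if the log-scale errors have roughly the same spread at every x; if their spread changes with x, the correction would have to change with x as well.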

