Is it reasonable to choose a regression model with a value of 0 for the intercept when this makes logical sense? For example, I am trying to model a physical geometric relationship, and I know that when x = 0, y = 0. Yet the consequence of choosing such a model is that the R^2 value becomes significantly higher (it changes from 0.67 to 0.95). When I examine the residuals for both models, I can see that they both have roughly the same distribution. The origin option is shown in Figure 1 and the non-origin option in Figure 2.
How should I decide which model is more appropriate?
I've read through some of the other questions and answers on this topic but I haven't seen any discussion about physical limitations providing the basis for the choice.
My dependent variable here is an area calculation, and my independent variable is a measurement of one dimension of the shape. For example, if I had a set of rectangles of length l, width w and area A, I would be modelling the relationship between l and A. However, because the shapes are not perfectly regular, there is some variation in the relationship. It appears to be linear in several cases but, based on some of the comments, not so much in this particular instance.
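
To make the comparison concrete, here is a minimal sketch of the rectangle example in Python with NumPy. The data are synthetic (the ranges, noise level and seed are assumptions chosen for illustration, not the measurements behind Figures 1 and 2). It fits A against l with and without an intercept, then computes R^2 for the through-origin fit in two ways: against the uncentered total sum of squares, which is what many packages report for a no-intercept model (R's `summary.lm` does this, for example), and against the centered total sum of squares used for the intercept model. This may explain part of a jump like 0.67 to 0.95, but that is not established for my data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice

# Synthetic "rectangles": length l, slightly irregular width w, area A = l * w.
# All numeric values here are illustrative assumptions.
l = rng.uniform(2.0, 10.0, size=200)
w = 3.0 + rng.normal(0.0, 0.5, size=200)
A = l * w

# Model 1: A = b0 + b1 * l (intercept estimated).
X_int = np.column_stack([np.ones_like(l), l])
beta_int, *_ = np.linalg.lstsq(X_int, A, rcond=None)
rss_int = np.sum((A - X_int @ beta_int) ** 2)

# Model 2: A = b * l (forced through the origin).
X_org = l[:, None]
beta_org, *_ = np.linalg.lstsq(X_org, A, rcond=None)
rss_org = np.sum((A - X_org @ beta_org) ** 2)

# Two possible baselines for R^2.
tss_centered = np.sum((A - A.mean()) ** 2)  # baseline: predict the mean of A
tss_uncentered = np.sum(A ** 2)             # baseline: predict zero

r2_intercept = 1.0 - rss_int / tss_centered
r2_origin_uncentered = 1.0 - rss_org / tss_uncentered  # commonly reported for no-intercept fits
r2_origin_centered = 1.0 - rss_org / tss_centered      # same baseline as the intercept model

# Residual standard error is on the same scale for both models.
rse_int = np.sqrt(rss_int / (len(A) - 2))
rse_org = np.sqrt(rss_org / (len(A) - 1))

print("R^2, intercept model:          ", round(r2_intercept, 3))
print("R^2, origin model (uncentered):", round(r2_origin_uncentered, 3))
print("R^2, origin model (centered):  ", round(r2_origin_centered, 3))
print("Residual SE, intercept / origin:", round(rse_int, 3), round(rse_org, 3))
```

Because the two R^2 values use different baselines, the residual standard error (or the centered R^2 for both fits) seems a fairer way to compare the models than the reported R^2 alone.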