How do I fit a linear regression model in R with interaction but with zero intercept?

Question

I have an experiment where we measure the energy used by a building and want to regress this energy linearly against so-called degree-days, calculated with two different methods. The data looks like this:

A regression line has been added to each group, that has been forced to go through the origin.

I want to compute the slope of these lines (with std. error), but I'm not sure what is the right way. My data looks like this:

> alvDegreeDays[sample(nrow(alvDegreeDays), 4),]
          Energy  BaseTemp DegreeDays
Feb 2014   984.7 Estimated   365.9771
Mar 2014   864.7 Estimated   307.2246
Apr 20151  512.8       SIA    50.0000
Sep 2015   239.2 Estimated    95.4787

I've tried this first:

lm(Energy ~ DegreeDays * BaseTemp + 0, alvDegreeDays)

Call:
lm(formula = Energy ~ DegreeDays * BaseTemp + 0, data = alvDegreeDays)

Coefficients:
            DegreeDays       BaseTempEstimated             BaseTempSIA  
                 2.436                  23.094                 174.390  
DegreeDays:BaseTempSIA  
                 1.181

But this yields BaseTempEstimated and BaseTempSIA terms which are, in effect, intercept terms.

Next I tried the following:

(foo <- lm(Energy ~ DegreeDays + DegreeDays:BaseTemp + 0, alvDegreeDays))

Call:
lm(formula = Energy ~ DegreeDays + DegreeDays:BaseTemp + 0, data = alvDegreeDays)

Coefficients:
                  DegreeDays  DegreeDays:BaseTempEstimated        DegreeDays:BaseTempSIA  
                       4.401                        -1.897                            NA

This looks better, but when I try to call predict on this model I get weird error messages:

> predict(foo, list(DegreeDays = 1, BaseTemp = "Estimated"))
       1 
2.504507 
Warning message:
In predict.lm(foo, list(DegreeDays = 1, BaseTemp = "Estimated")) :
  prediction from a rank-deficient fit may be misleading

Any idea what I may be doing wrong (or right) here?

My guess is that it has to do with the `class` of `BaseTemp`. Can you specify it - is it a factor or character? — danas.zuokas, Feb 01 '16 at 09:15
Well you should also take away this factor from the model specification. Try `lm(Energy ~ DegreeDays * BaseTemp + 0 - BaseTemp, alvDegreeDays)`. — danas.zuokas, Feb 01 '16 at 09:24
That gives me the same result as with `DegreeDays + DegreeDays:BaseTemp + 0` — lindelof, Feb 01 '16 at 09:25
So your only concern is prediction? What you get is not an error it is a warning. From the documentation of `predict.lm`: "If the fit is rank-deficient, some of the columns of the design matrix will have been dropped. Prediction from such a fit only makes sense if newdata is contained in the same subspace as the original data. That cannot be checked accurately, so a warning is issued." My guess is that if you specify `BaseTemp` as factor in `predict` call with same levels as in `lm` call you will get rid of this warning. — danas.zuokas, Feb 01 '16 at 09:34
It sure doesn't look like regression through the origin is a good model for your data. You may want to read this: [When is it ok to remove the intercept in a linear regression model?](https://stats.stackexchange.com/q/7948/) — gung - Reinstate Monica, Aug 16 '18 at 17:52

score 3 · Answer 1 · answered Aug 15 '18 at 05:18

I think what you want is:

lm(Energy ~ 0 + DegreeDays : BaseTemp, alvDegreeDays)

Which should give:

Call:
lm(formula = Energy ~ 0 + DegreeDays:BaseTemp, data = alvDegreeDays)

Coefficients:
DegreeDays:BaseTempEstimated        DegreeDays:BaseTempSIA  
                       2.504                         4.401

score 1 · Answer 2 · answered Aug 15 '18 at 08:28

1

Given that you only want the slopes, and both lines are constrained to pass through the origin, there's a simple solution: split the dataset by BaseTemp and run separate regressions.

alvDegreeDays_est <- alvDegreeDays[alvDegreeDays$BaseTemp == 'Estimated',]
alvDegreeDays_sia <- alvDegreeDays[alvDegreeDays$BaseTemp == 'SIA',]

lm_est <- lm(Energy ~ DegreeDays + 0, alvDegreeDays_est)
lm_sia <- lm(Energy ~ DegreeDays + 0, alvDegreeDays_sia)

answered Aug 15 '18 at 08:28

mkt

11,770
9
51
125

3

This is a subtly different model - in a single regression the variance is pooled. – pete Aug 15 '18 at 11:37
@pete Good point! +1 to your answer – mkt Aug 15 '18 at 11:44

How do I fit a linear regression model in R with interaction but with zero intercept?

2 Answers2