1

I have an experiment where we measure the energy used by a building and want to regress this energy linearly against so-called degree-days, calculated with two different methods. The data looks like this:

enter image description here

A regression line has been added to each group, that has been forced to go through the origin.

I want to compute the slope of these lines (with std. error), but I'm not sure what is the right way. My data looks like this:

> alvDegreeDays[sample(nrow(alvDegreeDays), 4),]
          Energy  BaseTemp DegreeDays
Feb 2014   984.7 Estimated   365.9771
Mar 2014   864.7 Estimated   307.2246
Apr 20151  512.8       SIA    50.0000
Sep 2015   239.2 Estimated    95.4787

I've tried this first:

lm(Energy ~ DegreeDays * BaseTemp + 0, alvDegreeDays)

Call:
lm(formula = Energy ~ DegreeDays * BaseTemp + 0, data = alvDegreeDays)

Coefficients:
            DegreeDays       BaseTempEstimated             BaseTempSIA  
                 2.436                  23.094                 174.390  
DegreeDays:BaseTempSIA  
                 1.181  

But this yields BaseTempEstimated and BaseTempSIA terms which are, in effect, intercept terms.

Next I tried the following:

(foo <- lm(Energy ~ DegreeDays + DegreeDays:BaseTemp + 0, alvDegreeDays))

Call:
lm(formula = Energy ~ DegreeDays + DegreeDays:BaseTemp + 0, data = alvDegreeDays)

Coefficients:
                  DegreeDays  DegreeDays:BaseTempEstimated        DegreeDays:BaseTempSIA  
                       4.401                        -1.897                            NA  

This looks better, but when I try to call predict on this model I get weird error messages:

> predict(foo, list(DegreeDays = 1, BaseTemp = "Estimated"))
       1 
2.504507 
Warning message:
In predict.lm(foo, list(DegreeDays = 1, BaseTemp = "Estimated")) :
  prediction from a rank-deficient fit may be misleading

Any idea what I may be doing wrong (or right) here?

lindelof
  • 741
  • 4
  • 19
  • My guess is that it has to do with the `class` of `BaseTemp`. Can you specify it - is it a factor or character? – danas.zuokas Feb 01 '16 at 09:15
  • It's a `factor`. – lindelof Feb 01 '16 at 09:16
  • Well you should also take away this factor from the model specification. Try `lm(Energy ~ DegreeDays * BaseTemp + 0 - BaseTemp, alvDegreeDays)`. – danas.zuokas Feb 01 '16 at 09:24
  • That gives me the same result as with `DegreeDays + DegreeDays:BaseTemp + 0` – lindelof Feb 01 '16 at 09:25
  • 1
    So your only concern is prediction? What you get is not an error it is a warning. From the documentation of `predict.lm`: "If the fit is rank-deficient, some of the columns of the design matrix will have been dropped. Prediction from such a fit only makes sense if newdata is contained in the same subspace as the original data. That cannot be checked accurately, so a warning is issued." My guess is that if you specify `BaseTemp` as factor in `predict` call with same levels as in `lm` call you will get rid of this warning. – danas.zuokas Feb 01 '16 at 09:34
  • It sure doesn't look like regression through the origin is a good model for your data. You may want to read this: [When is it ok to remove the intercept in a linear regression model?](https://stats.stackexchange.com/q/7948/) – gung - Reinstate Monica Aug 16 '18 at 17:52

2 Answers2

3

I think what you want is:

lm(Energy ~ 0 + DegreeDays : BaseTemp, alvDegreeDays)

Which should give:

Call:
lm(formula = Energy ~ 0 + DegreeDays:BaseTemp, data = alvDegreeDays)

Coefficients:
DegreeDays:BaseTempEstimated        DegreeDays:BaseTempSIA  
                       2.504                         4.401 
pete
  • 202
  • 1
  • 6
1

Given that you only want the slopes, and both lines are constrained to pass through the origin, there's a simple solution: split the dataset by BaseTemp and run separate regressions.

alvDegreeDays_est <- alvDegreeDays[alvDegreeDays$BaseTemp == 'Estimated',]
alvDegreeDays_sia <- alvDegreeDays[alvDegreeDays$BaseTemp == 'SIA',]

lm_est <- lm(Energy ~ DegreeDays + 0, alvDegreeDays_est)
lm_sia <- lm(Energy ~ DegreeDays + 0, alvDegreeDays_sia)
mkt
  • 11,770
  • 9
  • 51
  • 125