I have one continuous DV and two categorical IVs, YEAR and MONTH, with 11 and 12 levels respectively, covering 1998 to 2008.

So far I have experimented a lot with contrasts(), and deviation coding seems to be the one I want to use, because I would rather compare the individual levels in each IV to the mean of YEAR than to some default baseline.

An example of my output is provided here:

model <- lm(LN.IDEA ~ MONTH + YEAR, data = ds)

             Estimate Std. Error  t value Pr(>|t|)    
(Intercept) -3.467431   0.031038 -111.717  < 2e-16 ***
MONTH1       0.207696   0.098558    2.107   0.0375 *  
MONTH2       0.080218   0.098558    0.814   0.4176    
MONTH3      -0.197687   0.098558   -2.006   0.0475 *  
MONTH4      -0.110153   0.098558   -1.118   0.2663    
MONTH5       0.039526   0.098558    0.401   0.6892    
MONTH6      -0.194322   0.098558   -1.972   0.0514 .  
MONTH7       0.174461   0.098558    1.770   0.0797 .  
MONTH8       0.014709   0.098558    0.149   0.8817    
MONTH9      -0.025038   0.094821   -0.264   0.7923    
MONTH10     -0.086207   0.094821   -0.909   0.3654    
MONTH11      0.060783   0.094821    0.641   0.5229    
YEAR1        0.081754   0.155188    0.527   0.5995    
YEAR2        0.545592   0.090489    6.029 2.65e-08 ***
YEAR3        0.044065   0.090489    0.487   0.6273    
YEAR4       -0.103906   0.090489   -1.148   0.2535    
YEAR5        0.005907   0.090489    0.065   0.9481    
YEAR6       -0.110614   0.090489   -1.222   0.2244    
YEAR7        0.076436   0.090489    0.845   0.4003    
YEAR8        0.069966   0.090489    0.773   0.4412    
YEAR9       -0.218867   0.090489   -2.419   0.0173 *  
YEAR10      -0.204054   0.090489   -2.255   0.0263 * 

It has been claimed that this question is a duplicate of

Claimed answer

I realize that this answer touches upon what levels in regression are, but the question is about testing and interpreting p-values in regression. Therefore I argue that this is not a duplicate.

Kasper Christensen
  • One level is the reference level and it is "hidden" in the intercept. – Momo Mar 15 '13 at 21:53
  • This question is answered in over 100 threads on this site. You can find many of them [with a search](http://stats.stackexchange.com/search?tab=votes&q=dummy%20coding%20reference). It is explicitly answered (*en passant*) at http://stats.stackexchange.com/questions/31690/how-to-test-the-statistical-significance-for-categorical-variable-in-linear-regr/31694#31694. – whuber Mar 15 '13 at 21:57
  • And there is no way to get around that? In my case the intercept then becomes a mixture of MONTH 12 and YEAR 11. I am thinking that it might be easier to interpret if one simply had the mean of all months and the mean of all years and then compared to that... – Kasper Christensen Mar 15 '13 at 21:57
  • Kasper, I believe your followup is also answered in many places. `R` gives you the ability to specify reference levels for factors and it is also flexible enough to change your dummy codes completely, in any fashion you see fit. Just beware that a model with means of all months *and* all years contains a redundant variable and so is not identifiable; any good stats software will have to throw out one of your variables, and will be arbitrary about how it does that. – whuber Mar 15 '13 at 21:59
  • @whuber: Let me rephrase my question then. – Kasper Christensen Mar 15 '13 at 22:04
  • Other than your title, I don't see what's different here. I suggest you spend some time reading some of the related threads on the site, & if you still have a question after that, you can ask a more specific question that's not answered already elsewhere. – gung - Reinstate Monica Mar 15 '13 at 22:15
  • @gung: I did change the text as well... – Kasper Christensen Mar 15 '13 at 22:17
  • You did (I had posted my comment before your edit came through). There are a number of answers that address these issues, mostly in passing. [Here](http://stats.stackexchange.com/questions/24242/how-to-apply-coefficient-term-for-factors-and-interactive-terms-in-a-linear-equa) are [two](http://stats.stackexchange.com/questions/21282/regression-based-for-example-on-days-of-week). You should read those & other material 1st & then re-ask if necessary. – gung - Reinstate Monica Mar 15 '13 at 22:23
  • @gung: Thanks for the links. I read them, and I have read all of this several times at http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm#DEVIATION. As I changed my question to "How to compare against mean", I do think I made the right corrections. All the links that have been provided seem only to explain the theory behind it, which I have studied. Now I am simply asking whether it is possible to define one's own intercept... – Kasper Christensen Mar 15 '13 at 22:29

1 Answer

I am guessing that month 12 and year 11 somehow go into the intercept, but how does this work?

You surmise correctly. It's impossible to freely estimate all levels as well as an intercept - each factor has to have one level that is the base level, which is "in" the intercept. If you have a natural baseline, this makes sense.

R just automatically picks one to go in; as you see, with your deviation coding it picked the last level of each factor. (R's default treatment coding, like some other stats programs, takes the first factor level as the baseline.)
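To see which level each coding scheme leaves without its own coefficient, one can print the contrast matrices. This is a minimal sketch using a hypothetical three-level factor `f`, not the data from the question:

```r
# Hypothetical 3-level factor, used only to display the contrast matrices
f <- factor(c("a", "b", "c"))

# Treatment coding (R's default for unordered factors):
# the first level has an all-zero row, so it is the baseline
ct <- contr.treatment(levels(f))
print(ct)

# Sum-to-zero (deviation) coding:
# the last level has a row of -1s, so it gets no coefficient of its own
cs <- contr.sum(levels(f))
print(cs)
```

The level with no column of its own is the one that never shows up as a named coefficient in the regression output.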

and is there a way to avoid it?

Well, you can leave out the intercept. You achieve this by adding "0" or "-1" to your list of predictors, like this:

yourresponse ~ 0 + MONTH + YEAR

Alternatively, you can constrain the parameters for each factor to sum to zero, but if you're using contrasts this shouldn't be an issue for you anyway. Is your contrast coding not giving you the results you want?
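Both options can be tried on simulated data. The sketch below assumes a balanced layout (one observation per month and year); the data frame `ds`, the response `y` and the fitted objects are hypothetical names, not the original data:

```r
set.seed(1)  # simulated data; all names here are hypothetical

# One observation per MONTH x YEAR combination (balanced design)
ds <- expand.grid(MONTH = factor(1:12), YEAR = factor(1998:2008))
ds$y <- rnorm(nrow(ds))

# Option 1: drop the intercept. All 12 MONTH levels get a coefficient,
# but YEAR still loses one level (the first, under default coding).
fit0 <- lm(y ~ 0 + MONTH + YEAR, data = ds)
length(coef(fit0))  # 12 MONTH + 10 YEAR coefficients

# Option 2: sum-to-zero coding. The effect of the omitted last level
# is minus the sum of the reported effects for that factor.
fit_s <- lm(y ~ MONTH + YEAR, data = ds,
            contrasts = list(MONTH = "contr.sum", YEAR = "contr.sum"))
b <- coef(fit_s)
month12 <- -sum(b[paste0("MONTH", 1:11)])
year11  <- -sum(b[paste0("YEAR", 1:10)])
```

In a balanced design like this simulated one, the intercept of `fit_s` equals the overall mean of `y`, and each effect is a deviation from that mean. With unbalanced data the intercept is instead the unweighted average of the fitted level means.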

--

If you really want the intercept to be the mean, you simply mean-correct all the IVs including each dummy individually.

[This won't solve the issue that you have one fewer factor level in each factor. That's a different issue; you can keep all the factor levels if you constrain their parameters (such as to have a weighted sum of zero).]
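The mean-correction idea can be sketched as follows. The data are simulated and deliberately unbalanced, and `ds`, `y`, `X`, `Xc` and `fit_c` are hypothetical names chosen for illustration:

```r
set.seed(2)  # simulated, unbalanced data; all names are hypothetical

ds <- data.frame(MONTH = factor(sample(1:12, 200, replace = TRUE)),
                 YEAR  = factor(sample(1998:2008, 200, replace = TRUE)))
ds$y <- rnorm(200)

# Build the dummy columns (dropping the intercept column),
# then subtract each column's own mean
X  <- model.matrix(~ MONTH + YEAR, data = ds)[, -1]
Xc <- scale(X, center = TRUE, scale = FALSE)

fit_c <- lm(ds$y ~ Xc)

coef(fit_c)[1]  # intercept of the model with centred dummies
mean(ds$y)      # sample mean of the response
```

Centring changes only the intercept; the slope estimates for the dummies are the same as in the uncentred fit, so each factor still has one level without a coefficient of its own.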

Glen_b
  • My results are working, but with one year and one month hidden in the intercept. Your suggestion works as well, but now I get all results significant... Can one take the mean of all YEARS (implicitly all months as well) and then put that into the intercept? – Kasper Christensen Mar 15 '13 at 22:22
  • I didn't refer to 'results' but specifically to seeing the estimates and standard errors of contrasts. *You* referred to contrasts - did you not figure out how to get them to work? – Glen_b Mar 15 '13 at 23:30
  • @KasperChristensen: did you find a way to have the mean as the intercept, as you suggested ? – nassimhddd Mar 17 '13 at 17:58
  • @cafe876: No... I think I accepted that the type of regression I am doing requires something to compare against. I have spent a lot of time looking into the subject, so apparently I am the only one who thinks this would be smart / cannot see why it does not make sense :). Do you have a similar problem or a suggestion for a solution? – Kasper Christensen Mar 18 '13 at 00:37
  • @KasperChristensen Same feeling as you. In your case, you could normalize `LN.IDEA`, but I'm working with a logit model so that doesn't work. As it doesn't answer your question, could you please invalidate this answer? – nassimhddd Mar 18 '13 at 13:46
  • You have a point. I am working with a logit model as well. Maybe we could team up? – Kasper Christensen Mar 18 '13 at 17:52
  • @KasperChristensen If you think my response doesn't answer your original question, you should probably clarify where it's inadequate. (If you feel instead that it doesn't respond to additional questions in comments, you should either modify your question or ask a new one.) – Glen_b Mar 18 '13 at 21:34
  • @cafe876 If you feel my answer is an inadequate response to the original question, you have two main options. The first is to suggest a modification to the answer. The second is to attempt an answer. – Glen_b Mar 18 '13 at 21:36
  • @Glen_b: Hmm, I guess you do answer the question actually. The thing is just that I want the mean of YEAR to go into the intercept, but that is not stated in my question. Sorry for the trouble! – Kasper Christensen Mar 18 '13 at 22:02
  • There's no trouble; we just need to be clear about what questions are to be addressed in the answers. If you want to modify your question, I'll try to update my answer to discuss that issue briefly. Note that you're presumably asking for a *SAMPLE* mean to be included, you might want to ponder if that addresses your needs. It might help if you were pretty specific about what you want to get. – Glen_b Mar 18 '13 at 22:52
  • Kasper - I have added a little on the mean thing. – Glen_b Mar 19 '13 at 08:52