
I have age as a covariate in my data. It is a continuous variable ranging from 18 to 70 years.

I'm fitting a logistic regression and don't really know how to treat this variable: as a linear effect or as a polynomial?

  gender passinggrade age  prog
1    man        FALSE  69 FRIST
2    man           NA  70 FRIST
3  woman           NA  65 FRIST
4  woman         TRUE  68 FRIST
5  woman           NA  65 NMFIK
6    man        FALSE  70 FRIST

My model:

mod.fit <- glm(passinggrade ~ prog + gender + age, family = binomial, data = both)

summary(mod.fit)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  2.42653    0.28096   8.636  < 2e-16 ***
progLARAA    0.44931    0.25643   1.752 0.079746 .  
progNASTK   -0.15524    0.26472  -0.586 0.557597    
progNBFFK    0.12091    0.65460   0.185 0.853462    
progNBIBK   -0.18850    0.37656  -0.501 0.616659    
progNDATK   -2.84617    0.73077  -3.895 9.83e-05 ***
progNFYSK    0.64391    0.19634   3.280 0.001040 ** 
progNMATK    0.18424    0.16451   1.120 0.262733    
progNMETK    0.22433    0.29086   0.771 0.440554    
progNMFIK    0.38877    0.42152   0.922 0.356373    
progNSFYY    0.97205    0.29320   3.315 0.000915 ***
progSMEKK   -0.58043    0.18185  -3.192 0.001414 ** 
genderman   -0.05623    0.10477  -0.537 0.591496        
age         -0.11780    0.01028 -11.462  < 2e-16 ***

How would you treat the variable age? And how should I interpret the results for age?

PerkinsN

1 Answer


When including age in a regression, it is important to have some understanding of how it relates to the outcome across the population. In the context of education, the relationship may well not be linear across a wide age range.

As an example, in children aged 5 - 16, you would expect mathematical ability to increase as older pupils have had greater exposure to teaching. If one extends the age range, it is likely that individuals continue to improve in higher education (college / university). However, it is possible that a point is reached where individuals begin to forget what was learnt at school and college and their numerical skill declines. How many OAPs can still perform complex integration?

A simple way to assess or control for this effect in your data is to group the continuous age variable into meaningful categories of appropriate granularity. You may need to 'relevel' the new variable to give it an appropriate reference category.
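As a rough sketch of what this could look like in R: the data frame below is simulated as a stand-in for `both` (the real data are not shown in full), and the age bands, the variable name `agegrp`, and the choice of reference level are arbitrary illustrations rather than recommendations.

```r
# Simulated stand-in for the `both` data frame (assumption: real data not available)
set.seed(1)
n <- 500
both <- data.frame(
  age    = sample(18:70, n, replace = TRUE),
  gender = factor(sample(c("woman", "man"), n, replace = TRUE)),
  prog   = factor(sample(c("FRIST", "NMATK", "NFYSK"), n, replace = TRUE))
)
both$passinggrade <- rbinom(n, 1, plogis(3 - 0.06 * both$age)) == 1

# Group age into bands; these break points are purely illustrative.
# With the default right = TRUE the intervals are (17,25], (25,35], (35,50], (50,70].
both$agegrp <- cut(both$age,
                   breaks = c(17, 25, 35, 50, 70),
                   labels = c("18-25", "26-35", "36-50", "51-70"))

# Pick the reference category that the other bands are compared against
both$agegrp <- relevel(both$agegrp, ref = "26-35")

# Same model as in the question, with the grouped variable replacing age
mod.cat <- glm(passinggrade ~ prog + gender + agegrp,
               family = binomial, data = both)
summary(mod.cat)
```

Each `agegrp` coefficient is then the log-odds of a passing grade in that band relative to the reference band, holding `prog` and `gender` fixed.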

Simon
  • Good explanation of why the effect of age may well not be linear; but what could "meaningful categories" be, other than age ranges over which you'd expect the response to be constant, with step changes at the boundaries between categories? How often would modelling age in such a fashion be appropriate? Representing age as a polynomial, as the OP suggests, or as a natural spline, would usually be a more sensible way to deal with curvilinearity - see [here](http://stats.stackexchange.com/questions/68834/). – Scortchi - Reinstate Monica Apr 08 '15 at 14:09
  • Thank you very much. Do I have to change anything in the code to treat it as a polynomial? @Scortchi – PerkinsN Apr 08 '15 at 18:18
  • @Malin: See `?poly` & `?I` (for the use of e.g. `I(x^2)` in a formula). Some reading up on regression in general might also be useful: [Faraway (2002), *Practical Regression and Anova using R*](http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf). – Scortchi - Reinstate Monica Apr 09 '15 at 16:11
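
To make the pointers in the comments concrete, here is a minimal sketch of how the model formula might change. The data are simulated as a stand-in for `both` (with a curved age effect built in on purpose), and the polynomial degree and the spline `df` are assumptions chosen only for illustration.

```r
library(splines)  # provides ns(); shipped with base R

# Simulated stand-in for the `both` data frame (assumption: real data not available)
set.seed(1)
n <- 500
both <- data.frame(
  age    = sample(18:70, n, replace = TRUE),
  gender = factor(sample(c("woman", "man"), n, replace = TRUE)),
  prog   = factor(sample(c("FRIST", "NMATK", "NFYSK"), n, replace = TRUE))
)
both$passinggrade <-
  rbinom(n, 1, plogis(-2 + 0.25 * both$age - 0.004 * both$age^2)) == 1

# Linear age effect, as in the question
mod.lin  <- glm(passinggrade ~ prog + gender + age,
                family = binomial, data = both)

# Quadratic age effect using orthogonal polynomials
mod.quad <- glm(passinggrade ~ prog + gender + poly(age, 2),
                family = binomial, data = both)

# The same quadratic written with raw terms via I(); same fit, different coefficients
mod.raw  <- glm(passinggrade ~ prog + gender + age + I(age^2),
                family = binomial, data = both)

# Natural cubic spline with 3 degrees of freedom (df chosen for illustration)
mod.ns   <- glm(passinggrade ~ prog + gender + ns(age, df = 3),
                family = binomial, data = both)

# Likelihood-ratio test of the linear model against the quadratic one
anova(mod.lin, mod.quad, test = "Chisq")

# Information criteria across the candidate models
AIC(mod.lin, mod.quad, mod.ns)
```

Only the `age` term in the formula changes; the rest of the `glm` call stays as it was. In the linear model, `exp(coef(mod.lin)["age"])` gives the multiplicative change in the odds of a passing grade per additional year of age. With polynomial or spline terms the individual coefficients are harder to read directly, so plotting predicted probabilities against age is usually more informative.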