Splitting one variable according to bins from another variable

Question

I have continuous data "A", binary categorical data "O", gender/sex and age for several participants in a study.

A linear model in R shows no correlation between A and age. I would now like to split A into groups by age and see if there is a difference between the groups. I know about 'hist' and 'split' in R, but on their own these do not do what I need.

(1) How can I divide/split A into groups based on age (18 to 27, 28 to 37, etc.)?

(2) Once I've done that, should I use a $\chi^2$ test to compare the groups?

(3) Could I also test O in the same groups, using counts?

This question appears to be a continuation of http://stats.stackexchange.com/q/6337/919 . — whuber, Jan 18 '11 at 15:08
I thought it best not to mix the questions together as they cover different topics. Here I am asking how to bin the data; the previous question was asking about comparisons between subsets of the data. — SabreWolfy, Jan 18 '11 at 15:11
You'll have a much better idea of what you're doing if you plot your data, e.g. 'scatterplot(A, age)' and 'plotMeans(O, binnedAge, error.bars = "se")'. You'll need 'library(Rcmdr)' for 'plotMeans()'. — Michael Bishop, Mar 23 '11 at 22:11

caracal · Accepted Answer · 2011-01-18T15:31:51.173

> A   <- round(rnorm(100, 100, 15), 2)       # generate some data
> age <- sample(18:65, 100, replace=TRUE)
> sex <- factor(sample(0:1, 100, replace=TRUE), labels=c("f", "m"))

# 1) bin age into 4 groups of similar size
> ageFac <- cut(age, breaks=quantile(age, probs=seq(from=0, to=1, by=0.25)),
+               include.lowest=TRUE)

> head(ageFac)
[1] (26,36.5] (26,36.5] (36.5,47] [18,26]   [18,26]   [18,26]  
Levels: [18,26] (26,36.5] (36.5,47] (47,65]

> table(ageFac)   # check group size
ageFac
[18,26] (26,36.5] (36.5,47]   (47,65] 
     27        23        26        24

# 2) test continuous DV in age-groups
> anova(lm(A ~ ageFac))
Analysis of Variance Table
Response: A
          Df  Sum Sq Mean Sq F value Pr(>F)
ageFac     3    15.8   5.272  0.0229 0.9953
Residuals 96 22099.2 230.200               

# 3) chi^2-test for equal distributions of sex in age-groups    
> addmargins(table(sex, ageFac))
     ageFac
sex   [18,26] (26,36.5] (36.5,47] (47,65] Sum
  f        11        10        12      11  44
  m        16        13        14      13  56
  Sum      27        23        26      24 100

> chisq.test(table(sex, ageFac))
        Pearson's Chi-squared test
data:  table(sex, ageFac) 
X-squared = 0.2006, df = 3, p-value = 0.9775
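
The example above bins age by quartiles. For the fixed 10-year bins asked about in the question (18 to 27, 28 to 37, etc.), a minimal sketch could look like the following. It assumes integer ages between 18 and 65, as in the simulated data; the names `ageBreaks`, `ageLabels`, `ageFix` and `splitA`, the simulated `O`, and the seed are illustrative choices, not part of the original answer.

```r
set.seed(1)                                    # arbitrary seed, only so the sketch is repeatable
A   <- round(rnorm(100, 100, 15), 2)           # simulated continuous variable
age <- sample(18:65, 100, replace=TRUE)        # simulated integer ages
O   <- factor(sample(0:1, 100, replace=TRUE))  # simulated binary variable

# fixed-width bins: cut() uses right-closed intervals by default,
# so breaks 17, 27, 37, ... give (17,27], (27,37], ... i.e. 18-27, 28-37, ...
ageBreaks <- seq(from=17, to=67, by=10)
ageLabels <- c("18-27", "28-37", "38-47", "48-57", "58-67")
ageFix    <- cut(age, breaks=ageBreaks, labels=ageLabels)

table(ageFix)                  # check group sizes
splitA <- split(A, ageFix)     # list with the A values of each age group
sapply(splitA, mean)           # mean of A per age group

anova(lm(A ~ ageFix))          # continuous A across age groups
chisq.test(table(O, ageFix))   # counts of binary O across age groups
```

With fixed-width bins the group sizes can be quite unequal, so it is worth checking `table(ageFix)` before relying on the chi-squared approximation.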

(+1) The `cut2()` function from [Hmisc](http://cran.r-project.org/web/packages/Hmisc/index.html) is a very handy replacement to base `cut()`. — chl, Jan 18 '11 at 15:33
@chl Thanks! Indeed, `cut2()` makes things easier, especially argument `g` and `minmax`. — caracal, Jan 18 '11 at 15:48
Thanks for the reply and the example. I'll work through it carefully now. Seems I was on the wrong track trying to use 'hist(age)$...' to split the A variable. — SabreWolfy, Jan 18 '11 at 16:29
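
As a rough illustration of the `cut2()` suggestion in the comments above, the quartile binning could be sketched as follows. This assumes the Hmisc package is installed; the variable name `ageFac2` and the seed are illustrative only.

```r
library(Hmisc)                           # assumed to be installed; provides cut2()

set.seed(1)                              # arbitrary seed, only so the sketch is repeatable
age <- sample(18:65, 100, replace=TRUE)  # simulated integer ages

# g = 4 asks cut2() for four quantile groups, so the breaks
# do not have to be computed by hand with quantile()
ageFac2 <- cut2(age, g=4)

table(ageFac2)                           # check group sizes
```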
