I want to calculate a summary of a variable in a data.frame for each unique combination of factors in the data.frame. Should I use plyr to do this? I am OK with using loops as opposed to apply(), so just finding out each unique combination would be enough.
Question is misleading, as you ask about unique combinations of factors and then in the details you ask about a summary by unique combination. – Wojtek Aug 17 '10 at 05:59
6 Answers
See aggregate() and by(). For example, from the help file for aggregate():
## Compute the averages according to region and the occurrence of more
## than 130 days of frost.
aggregate(state.x77,
list(Region = state.region,
Cold = state.x77[,"Frost"] > 130),
mean)
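The help-file example passes the grouping variables as a list. aggregate() also has a formula method, which can read more naturally when the grouping factors are columns of the same data frame. A minimal sketch on a small made-up data frame (df, g1, g2 and val are hypothetical names, not from the question):

```r
# Hypothetical example data: two grouping factors and one numeric variable
df <- data.frame(
  g1  = c("a", "a", "b", "b", "b", "a"),
  g2  = c("x", "y", "x", "x", "y", "x"),
  val = c(1, 2, 3, 5, 7, 9)
)

# Formula interface: one output row per *observed* g1/g2 combination,
# with val replaced by its mean within that combination
res <- aggregate(val ~ g1 + g2, data = df, FUN = mean)
print(res)
```

Combinations that never occur in the data simply do not appear in the result.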

While I think aggregate() is probably the solution you are seeking, if you want to create an explicit list of all possible factor combinations, expand.grid() will do that for you, e.g.:
> expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
sex = c("Male","Female"))
height weight sex
1 60 100 Male
2 65 100 Male
...
30 80 100 Female
31 60 150 Female
You could then loop over each row in the resulting data frame to pull out records from your original data.
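A minimal sketch of that loop, assuming a small made-up data frame (df, sex, grp and y are hypothetical names): expand.grid() builds every combination of the factor levels, and each row is used to subset the original records. Combinations with no matching records are left as NA. If you only need the combinations that actually occur, unique() on the factor columns gives those directly.

```r
# Hypothetical data with two factors and one numeric variable
df <- data.frame(
  sex = factor(c("Male", "Female", "Male", "Male")),
  grp = factor(c("t", "t", "c", "t")),
  y   = c(10, 20, 30, 50)
)

# Every possible combination of the factor levels, including unobserved ones
combos <- expand.grid(sex = levels(df$sex), grp = levels(df$grp),
                      stringsAsFactors = FALSE)

combos$mean_y <- NA_real_
for (i in seq_len(nrow(combos))) {
  # Logical index of the records matching this row's combination
  rows <- df$sex == combos$sex[i] & df$grp == combos$grp[i]
  if (any(rows)) combos$mean_y[i] <- mean(df$y[rows])
}
print(combos)

# Only the combinations that actually occur in the data
observed <- unique(df[c("sex", "grp")])
print(observed)
```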

Here's the plyr solution, which has the advantage of returning multiple summary stats and showing a progress bar for long computations:
library(plyr) #provides ddply() and .()
library(ez) #for a data set
data(ANT)
cell_stats = ddply(
    .data = ANT #use the ANT data
    , .variables = .(cue,flanker) #uses each combination of cue and flanker
    , .fun = function(x){ #apply this function to each combination of cue & flanker
        to_return = data.frame(
            acc = mean(x$acc) #proportion of correct trials
            , mrt = mean(x$rt[x$acc==1]) #mean RT of correct trials only
        )
        return(to_return)
    }
    , .progress = 'text'
)
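If you would rather avoid the plyr dependency, the same "one data frame of several statistics per combination" pattern can be sketched in base R with split(), lapply() and do.call(rbind, ...). This is a swapped-in base-R technique, not part of plyr, and the data frame d below is a hypothetical stand-in for ANT with the same column names:

```r
# Hypothetical stand-in for the ANT data: two factors plus acc (0/1) and rt
d <- data.frame(
  cue     = c("none", "none", "none", "spatial", "spatial", "spatial"),
  flanker = c("congruent", "congruent", "incongruent",
              "congruent", "incongruent", "incongruent"),
  acc     = c(1, 0, 1, 1, 1, 0),
  rt      = c(500, 650, 700, 450, 600, 800)
)

# One sub-data-frame per cue/flanker combination;
# drop = TRUE skips combinations that have no rows
pieces <- split(d, list(d$cue, d$flanker), drop = TRUE)

# Summarise each piece into a one-row data frame, then stack the rows
cell_stats <- do.call(rbind, lapply(pieces, function(x) {
  data.frame(
    cue     = x$cue[1],
    flanker = x$flanker[1],
    acc     = mean(x$acc),
    mrt     = mean(x$rt[x$acc == 1])  # mean RT of correct trials only
  )
}))
rownames(cell_stats) <- NULL
print(cell_stats)
```

There is no progress bar here, which is the main convenience ddply() adds.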

Thank you! This worked, although I had to drop a comma in the call to data.frame. stats = ddply( .data = ords , .variables = .(Symbol,SysID,Hour) , .fun = function(x){ to_return = data.frame( s = sum(x$Profit) , m = mean(x$Profit) ) return(to_return) } , .progress = 'text' ) – Aug 16 '10 at 16:06
I personally like cast(), from the reshape package, because of its simplicity:
library(reshape)
cast(melt(tips), sex ~ smoker | variable, c(sd,mean, length))
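For a single variable and a single statistic, a similar rows-by-columns layout to cast()'s sex ~ smoker can be sketched in base R with tapply(). This only approximates the cast() call above (it does not handle the | variable part or several functions at once), and tips_small is a hypothetical data frame rather than the real tips data:

```r
# Hypothetical miniature of the tips data
tips_small <- data.frame(
  sex    = c("Female", "Female", "Male", "Male", "Male"),
  smoker = c("No", "Yes", "No", "No", "Yes"),
  tip    = c(1, 3, 2, 4, 5)
)

# Rows = sex, columns = smoker, cells = mean tip in that combination
tab <- tapply(tips_small$tip, list(tips_small$sex, tips_small$smoker), mean)
print(tab)
```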

In addition to the other suggestions, you may find the describe.by() function in the psych package useful (newer versions of psych name it describeBy()). It can be used to show summary statistics for numeric variables across the levels of a factor variable.
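A rough base-R analogue of that per-group description uses by() together with summary(). This is not the psych function and prints less detail; df, group, x and y below are hypothetical names:

```r
# Hypothetical data: one grouping variable and two numeric variables
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  x     = c(1, 2, 3, 4, 5),
  y     = c(2.5, 3.5, 1, 2, 3)
)

# by() splits the numeric columns by group and applies summary() to each piece
res <- by(df[, c("x", "y")], df$group, summary)
print(res)
```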

The doBy package (library(doBy)) also provides the summaryBy() function, e.g.:
summaryBy(DV1 + DV2 ~ Height+Weight+Sex,data=my.data)
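Summarising several response variables at once, as summaryBy() does here, also has a base-R counterpart: aggregate() with cbind() on the left-hand side of the formula. A minimal sketch using mean, with a hypothetical my.data that keeps only two of the grouping factors for brevity:

```r
# Hypothetical data: two grouping factors and two response variables
my.data <- data.frame(
  Height = c("tall", "tall", "short", "short"),
  Sex    = c("M", "F", "M", "M"),
  DV1    = c(1, 2, 3, 5),
  DV2    = c(10, 20, 30, 50)
)

# cbind() on the left summarises DV1 and DV2 separately for each
# observed Height/Sex combination
res <- aggregate(cbind(DV1, DV2) ~ Height + Sex, data = my.data, FUN = mean)
print(res)
```

The combination short/F has no rows, so it is absent from the output.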
