When to collapse data on subject level?

Question

I have a question about when to collapse raw data to means per unit (e.g., subjects). In my data I have the following variables:

id: subject id
rt: reaction time
type: either A or B
being: either animal, human, robot or plant

The structure of the data is that I have 100 trials per subject and each trial has an rt, a type and a being.

If I use two different collapsing methods, I get different values:

Method A:

I collapse my data so that I have a mean rt for each subject for each combination of type and being. Now I want to collapse the being values human and robot together, and the values animal and plant together. So for each pair I add the two cell means and divide by 2 (or use the mean function).

So I get: MeanA_humanrobot and MeanA_animalplant

Method B:

I create a factor (e.g., being_category) which is 1 for human or robot and 2 for animal or plant, and then collapse the raw trials per subject id, type and this factor.

Here I get: MeanB_humanrobot and MeanB_animalplant

My problem is that MeanA_humanrobot is not exactly equal to MeanB_humanrobot (and the same holds for the animalplant means).

The differences are small, but I do not understand conceptually why there are any differences.
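
To make the two methods concrete, here is a minimal sketch in plain Python. The trial values are made up for illustration (one subject, one type), and the names `trials`, `CATEGORY`, `method_a` and `method_b` are hypothetical, not taken from my actual data or code. Note that the number of trials per being is deliberately unequal.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trials for ONE subject and ONE type: (being, rt in ms).
# Unequal trial counts on purpose: 3 human trials but only 1 robot trial.
trials = [
    ("human", 400), ("human", 420), ("human", 440),
    ("robot", 500),
    ("animal", 450), ("animal", 470),
    ("plant", 480), ("plant", 500),
]

# Mapping from being to the combined category (assumed labels).
CATEGORY = {
    "human": "humanrobot", "robot": "humanrobot",
    "animal": "animalplant", "plant": "animalplant",
}

def method_a(trials):
    """Mean per being first, then the unweighted mean of those cell means."""
    by_being = defaultdict(list)
    for being, rt in trials:
        by_being[being].append(rt)
    cell_means = defaultdict(list)
    for being, rts in by_being.items():
        cell_means[CATEGORY[being]].append(mean(rts))
    return {cat: mean(ms) for cat, ms in cell_means.items()}

def method_b(trials):
    """Pool all trials of a category, then take a single mean."""
    by_cat = defaultdict(list)
    for being, rt in trials:
        by_cat[CATEGORY[being]].append(rt)
    return {cat: mean(rts) for cat, rts in by_cat.items()}

mean_a = method_a(trials)
mean_b = method_b(trials)
print("Method A:", mean_a)
print("Method B:", mean_b)
```

With these made-up numbers the humanrobot means differ (460 for Method A, 440 for Method B), while the animalplant means agree (475 for both), because animal and plant happen to have the same number of trials.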

So basically - I think - this comes down to the question of when to collapse the data. Can someone help out here?

You have a study with reaction times for plants? My hat's off to you. Are they censored or did you wait? — gung - Reinstate Monica, Nov 18 '14 at 22:51
Which means differ from which? Can you list them? Are your variables correlated? — gung - Reinstate Monica, Nov 18 '14 at 22:55
I just wanted to come up with descriptive factors instead of the codes in my raw data. I thought it might simplify my problem. RTs for plants would be groundbreaking... — ben_aaron, Nov 18 '14 at 22:55
The means with collapsing per id, type and being and then deriving a mean for each category of being, differ from the means when I collapse per id and type directly via the being_category factor. The variables are correlated, yes. — ben_aaron, Nov 18 '14 at 22:59
It would be nice to have some numbers, but given that your variables are correlated the issue is almost certainly Simpson's paradox. I have an answer here: [Basic Simpson's paradox](http://stats.stackexchange.com/a/21901/7290) that might be helpful. — gung - Reinstate Monica, Nov 18 '14 at 23:21
It was - in a way. Thank you so much. You put me on the right track. Great. The cause was minor differences in `n`... — ben_aaron, Nov 19 '14 at 00:44
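
The resolution in the last comment (unequal `n`) can be written out as a short derivation. The symbols are assumptions introduced here for illustration: $\bar{x}_1$ and $\bar{x}_2$ are the two cell means being combined (e.g., human and robot for one subject and type), based on $n_1$ and $n_2$ trials.

$$
\text{Mean}_A = \frac{\bar{x}_1 + \bar{x}_2}{2},
\qquad
\text{Mean}_B = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}
$$

$$
\text{Mean}_B - \text{Mean}_A = \frac{(n_1 - n_2)(\bar{x}_1 - \bar{x}_2)}{2\,(n_1 + n_2)}
$$

So the two methods agree only when $n_1 = n_2$ or $\bar{x}_1 = \bar{x}_2$. Method A weights each cell equally, whereas Method B weights each trial equally, which is why small imbalances in trial counts produce small differences in the means.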
