I have a question about when to collapse raw data to means per unit (e.g., subjects). In my data I have the following variables:

  • id: subject id
  • rt: reaction time
  • type: either A or B
  • being: either animal, human, robot or plant

The structure of the data is that I have 100 trials per subject and each trial has an rt, a type and a being.

If I use two different collapsing methods, I get different values:

Method A:

I collapse my data so that I have a mean rt per subject for each combination of type and being. I then want to combine the being values human and robot, and likewise animal and plant, so I add the two cell means and divide by 2 (or use the mean function).

So I get: MeanA_humanrobot and MeanA_animalplant

Method B:

I create a factor (e.g., being_category) which is 1 for human or robot and 2 for animal or plant, and then collapse per subject id, type and this factor.

Here I get: MeanB_humanrobot and MeanB_animalplant

My problem is that MeanA_humanrobot is not exactly equal to MeanB_humanrobot (and the same holds for MeanA_animalplant and MeanB_animalplant).

The differences are small, but I do not understand conceptually why there are any differences.
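
For illustration, here is a minimal sketch of the two methods in Python with pandas. The data are made up (one subject, one type, eight trials), the label strings used for `being_category` are hypothetical stand-ins for the 1/2 coding above, and the unequal number of trials per being is an assumption chosen so that the discrepancy shows up.

```python
import pandas as pd

# Made-up trials for one subject and one type. The trial counts per being
# are deliberately unequal: 3 human, 1 robot, 2 animal, 2 plant.
trials = pd.DataFrame({
    "id":    [1] * 8,
    "type":  ["A"] * 8,
    "being": ["human", "human", "human", "robot",
              "animal", "animal", "plant", "plant"],
    "rt":    [400.0, 420.0, 440.0, 500.0,
              300.0, 320.0, 340.0, 360.0],
})

# Grouping factor used by Method B (labels instead of the codes 1 and 2).
category = {"human": "humanrobot", "robot": "humanrobot",
            "animal": "animalplant", "plant": "animalplant"}
trials["being_category"] = trials["being"].map(category)

# Method A: mean per id x type x being, then the unweighted mean of those
# cell means within each being_category.
cell_means = trials.groupby(["id", "type", "being_category", "being"])["rt"].mean()
mean_a = cell_means.groupby(level=["id", "type", "being_category"]).mean()

# Method B: mean over all trials in each id x type x being_category cell.
mean_b = trials.groupby(["id", "type", "being_category"])["rt"].mean()

print(pd.DataFrame({"method_A": mean_a, "method_B": mean_b}))
```

In this toy example the two methods agree for animalplant, where both beings have the same number of trials, and disagree for humanrobot, where they do not.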

So basically - I think - this comes down to the question of when to collapse the data. Can someone help out here?

ben_aaron
  • You have a study with reaction times for plants? My hat's off to you. Are they censored or did you wait? – gung - Reinstate Monica Nov 18 '14 at 22:51
  • Which means differ from which? Can you list them? Are your variables correlated? – gung - Reinstate Monica Nov 18 '14 at 22:55
  • I just wanted to come up with descriptive factors instead of the codes in my raw data. I thought it might simplify my problem. RTs for plants would be groundbreaking... – ben_aaron Nov 18 '14 at 22:55
  • The means obtained by collapsing per id, type and being and then averaging within each category of being differ from the means I get when I collapse per id and type directly via the being_category factor. The variables are correlated, yes. – ben_aaron Nov 18 '14 at 22:59
  • @gung I've adjusted the post. – ben_aaron Nov 18 '14 at 23:05
  • It would be nice to have some numbers, but given that your variables are correlated the issue is almost certainly Simpson's paradox. I have an answer here: [Basic Simpson's paradox](http://stats.stackexchange.com/a/21901/7290) that might be helpful. – gung - Reinstate Monica Nov 18 '14 at 23:21
  • It was, in a way. Thank you so much. You put me on the right track. Great. It was minor differences in `n`... – ben_aaron Nov 19 '14 at 00:44
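
The role of `n` mentioned in the last comment can be made explicit with a short derivation. The symbols are assumptions introduced here for illustration: for one subject and one type, let $\bar{x}_h$ and $\bar{x}_r$ be the mean rt for human and robot trials, and $n_h$ and $n_r$ the corresponding numbers of trials. Method A takes the unweighted mean of the two cell means, while Method B pools the trials, which amounts to weighting each cell mean by its trial count:

$$
\text{MeanA} = \frac{\bar{x}_h + \bar{x}_r}{2},
\qquad
\text{MeanB} = \frac{n_h \bar{x}_h + n_r \bar{x}_r}{n_h + n_r}
$$

$$
\text{MeanB} - \text{MeanA} = \frac{(n_h - n_r)\,(\bar{x}_h - \bar{x}_r)}{2\,(n_h + n_r)}
$$

So the two methods coincide exactly when the trial counts are equal or the cell means are equal, and small imbalances in `n` produce small differences between them. The same argument applies to animal and plant.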
