This seems like a simple question, but I've been having a hard time finding an answer. In a long daily diary dataset where each day has a row, the person mean for a given level-1 variable is repeated in each of that person's rows. As a result, if I were to take the grand mean of the person mean in this dataset, people who participated on more days would count more heavily. Thus, I'm assuming I should only calculate a grand mean from a wide dataset, correct?
You could use a weighted average, the weights being the sample size. – user2974951 Jul 30 '19 at 08:07
@user2974951 do you mean the inverse of the sample size? – mdewey Jul 30 '19 at 12:31
@mdewey Yes something like that, depending on how much we want to penalize bigger samples. – user2974951 Jul 30 '19 at 12:49
1 Answer
Your concern is well justified. If, say, you want the mean of people's ages and each person appears a different number of times in your long dataset, then you should use the wide dataset. As @user2974951 suggests, you could also do it in the long dataset, as long as you weight each observation by the inverse of the number of times that person occurs in it. If you only have the long dataset to hand, reshape it into the wide one, since working with that is much easier for some purposes, including this one.
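To make the difference concrete, here is a minimal sketch in plain Python. The data, the person IDs, and all variable names are made up for illustration; they are not from your dataset. It compares the naive grand mean over long rows, the grand mean over one row per person, and the inverse-count weighted version computed in the long layout.

```python
from collections import defaultdict

# Hypothetical long-format diary data: one (person_id, daily_score) row per day.
long_rows = [
    ("a", 2.0), ("a", 4.0),                          # person "a": 2 days
    ("b", 6.0), ("b", 6.0), ("b", 6.0), ("b", 6.0),  # person "b": 4 days
]

# Group the daily scores by person.
by_person = defaultdict(list)
for pid, score in long_rows:
    by_person[pid].append(score)

# Person means and number of days per person.
person_mean = {pid: sum(v) / len(v) for pid, v in by_person.items()}
n_days = {pid: len(v) for pid, v in by_person.items()}

# Naive: average the repeated person mean over every long row.
# People with more days contribute more rows, so they count more.
naive = sum(person_mean[pid] for pid, _ in long_rows) / len(long_rows)

# Wide: one value per person, so every person counts once.
wide = sum(person_mean.values()) / len(person_mean)

# Weighted long: each row gets weight 1 / (that person's number of rows),
# so each person's rows sum to a total weight of 1.
weights = [1.0 / n_days[pid] for pid, _ in long_rows]
weighted = sum(
    w * person_mean[pid] for w, (pid, _) in zip(weights, long_rows)
) / sum(weights)

print(naive, wide, weighted)
```

In this toy example the naive long-format mean is pulled toward person "b", who has more days, while the wide and inverse-weighted versions agree with each other.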

mdewey