Should grand-mean centering happen in the long or the wide dataset?

Question

This seems like a simple question, but I've been having a hard time finding an answer. In a long daily diary dataset where each day has a row, the person mean for a given level-1 variable is repeated in each of that person's rows. As a result, if I took the grand mean of the person means in this dataset, it would be weighted by the number of days each person participated. So I'm assuming I should calculate the grand mean only from a wide dataset (one row per person), correct?

You could use a weighted average, the weights being the sample size. — user2974951, Jul 30 '19 at 08:07
@mdewey Yes something like that, depending on how much we want to penalize bigger samples. — user2974951, Jul 30 '19 at 12:49

score 0 · Answer 1 · answered Jul 30 '19 at 15:57

Your concern is well justified. If you want the mean of a person-level quantity (people's ages, say) and people appear a varying number of times in the long dataset, then you should compute it from the wide dataset. As @user2974951 suggests, it is also possible to do it in the long dataset, as long as you weight each observation by the inverse of the number of times that person occurs in the dataset. If you only have the long dataset to hand, reshape it into the wide one first, as that is much easier to work with for some purposes, including this one.
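To make the difference concrete, here is a minimal sketch in Python with pandas. The thread names no software, so the library, the toy data, and the column names (`person_id`, `stress`) are all assumptions for illustration. It compares the naive row-wise mean of the repeated person means with the one-row-per-person mean and with the inverse-count weighted mean computed directly in the long data.

```python
import pandas as pd

# Hypothetical long-format diary data: one row per person-day.
# Person 1 reported on 4 days, person 2 on 2 days, person 3 on 1 day.
long_df = pd.DataFrame({
    "person_id": [1, 1, 1, 1, 2, 2, 3],
    "stress":    [2.0, 4.0, 3.0, 3.0, 6.0, 8.0, 1.0],
})

# Person mean of the level-1 variable, repeated on every row of that person.
long_df["person_mean"] = long_df.groupby("person_id")["stress"].transform("mean")

# Naive approach: average the repeated person means over all rows.
# People with more diary days count more often.
naive_mean = long_df["person_mean"].mean()

# "Wide" approach: collapse to one row per person, then average.
per_person = long_df.groupby("person_id")["stress"].mean()
grand_mean = per_person.mean()

# Weighted approach in the long data: weight each row by 1 / (rows for that person).
rows_per_person = long_df["person_id"].map(long_df["person_id"].value_counts())
weights = 1.0 / rows_per_person
weighted_mean = (weights * long_df["person_mean"]).sum() / weights.sum()

# Grand-mean-centered person mean, merged back onto the long data.
long_df["person_mean_gmc"] = long_df["person_mean"] - grand_mean

print(naive_mean, grand_mean, weighted_mean)
```

In this toy data the one-row-per-person mean and the inverse-count weighted mean agree, while the naive row-wise mean is pulled toward the person with the most diary days.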
