I am thinking about the difference between pooled cross-sectional data and unbalanced panel data, especially in the context of fixed-effects models.
What I can't get my head around is where exactly the difference between these two types of data sets lies. Suppose we draw from a pool of firms/households at 3 points in time. This description alone would make it sound like pooled cross-sectional data.
However, by chance, the resulting data frame could look like:
Firm | year | y |
---|---|---|
A | 2000 | 10 |
B | 2000 | 12 |
A | 2002 | 54 |
C | 2002 | 11 |
A | 2004 | 123 |
B | 2004 | 24 |
That is, we observe firm A in every year, firm B in every year but the second, and firm C only in the second year. In my opinion, such a data set would perfectly match the Wikipedia definition of an unbalanced panel data set, which "is a dataset in which at least one panel member is not observed every period".
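To make the pattern concrete, here is a minimal pandas sketch of how I would check it. It encodes the observation pattern described above (A in all three periods, B in the first and third, C only in the second); the column names and the variable names are my own choices for illustration.

```python
import pandas as pd

# Observation pattern from the example: three sampling periods,
# A seen in all of them, B in the first and third, C only in the second.
df = pd.DataFrame({
    "firm": ["A", "B", "A", "C", "A", "B"],
    "year": [2000, 2000, 2002, 2002, 2004, 2004],
    "y": [10, 12, 54, 11, 123, 24],
})

# Number of distinct sampling periods in the data.
n_periods = df["year"].nunique()

# Number of periods in which each firm shows up.
obs_per_firm = df.groupby("firm")["year"].nunique()

# A balanced panel would have every firm observed in every period.
is_balanced = bool((obs_per_firm == n_periods).all())

# Firms that appear more than once, i.e. that carry any panel dimension.
repeated_firms = obs_per_firm[obs_per_firm > 1].index.tolist()

print(obs_per_firm)
print("balanced:", is_balanced)
print("firms observed more than once:", repeated_firms)
```

By this check the data are not balanced, yet two of the three firms are observed repeatedly, which is exactly why it looks like an unbalanced panel to me.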
However, we started off with something that sounded very much like a pooled cross-section.
Can someone please elaborate a bit further on the difference between the two kinds of data sets? Furthermore, I am specifically interested in when it is possible to use unit fixed effects with pooled cross-sectional data. It could also happen that we observe 100% distinct units at each point in time, meaning that we would need to estimate N dummies, one for every single observation, in order to include unit fixed effects.
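To illustrate that last worry, here is a small sketch under my own assumptions: the data are hypothetical, `within_transform` is a helper I made up, and I am assuming that including unit dummies is equivalent to demeaning each variable within its unit. If every firm is observed only once, the demeaning seems to leave no variation at all.

```python
import pandas as pd

def within_transform(data, unit, cols):
    """Subtract each unit's own mean from the given columns (hypothetical helper)."""
    return data[cols] - data.groupby(unit)[cols].transform("mean")

# Hypothetical pooled cross-section: every firm is sampled exactly once.
pooled = pd.DataFrame({
    "firm": ["A", "B", "C", "D"],
    "year": [2000, 2000, 2002, 2002],
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [10.0, 12.0, 54.0, 11.0],
})

# Each unit mean equals the single observation, so everything becomes zero.
demeaned = within_transform(pooled, "firm", ["x", "y"])
no_variation_left = bool((demeaned == 0).all().all())

print(demeaned)
print("all within-unit variation removed:", no_variation_left)
```

If that reasoning is right, only the firms that happen to be drawn more than once would contribute anything to a unit fixed-effects estimate.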