Unit (geographic unit)

Question

I have to build a regression model to estimate the impact of each of 70 units (these are geographic organizations). We have to control for 39 variables other than unit (this is determined by the federal government). Since I have the entire population, p-values are not an issue, nor is statistical power. One way to do this would be to create 69 dummy variables for the units (one per unit, with one unit left out as the reference). One of the two dependent variables we will analyze is interval; the other has two levels (but the federal government determined that we would run a linear probability model, so we will still run OLS). Any suggestions?

My goal is to find out how well the units are doing when controlling for a variety of factors. For example, we are trying to determine, for each unit, how much income it generates for its customers given factors such as the age or education of those customers, over which the unit has no control (these are the controls I mentioned). I have studied splines before, but I struggle to interpret them. In any case, the continuous predictors are not what primarily interest us in this analysis; the performance of the units is. One problem we have is that there is very little theory to build on.

I decided to do what the federal government did, or will do, for this project, which is to run a fixed effect regression with one dummy for every unit. But being new to fixed effect regression, I have a follow-up question. Do you remove one of the units as a reference level? And if you do, how would you choose that unit? We need to measure the impact of every unit.

Depending on the analysis, I have between 16,000 and 32,000 cases. – user54285, Mar 24 '21 at 00:37

kjetil b halvorsen · Answer 1 · 2021-04-03T18:24:51.953

0

You did not tell us your goal ... You seem to have enough observations that you can simply code the units with dummies. For the continuous variables, consider whether they might have nonlinear effects, in which case (some of) them could be represented with splines or polynomials. Maybe even add some interaction terms ...
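As a rough illustration of the nonlinearity point, here is a minimal sketch on simulated data. Everything in it is an assumption made for the demo: 4 hypothetical units instead of 70, a single continuous control called `age`, and a quadratic term standing in for a spline or polynomial basis. It only shows that adding a curved term to a dummy-coded OLS model reduces the residual sum of squares when the true effect is curved.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in data (hypothetical; not the real 70 units / 39 controls).
n_units, n_obs = 4, 2000
unit = rng.integers(0, n_units, size=n_obs)
age = rng.uniform(20, 60, size=n_obs)      # hypothetical continuous control
age_c = (age - age.mean()) / 10.0          # centre and rescale before squaring

# Outcome with a curved age effect (an assumption of this demo).
unit_level = np.array([10.0, 12.0, 9.0, 11.0])
y = unit_level[unit] + 1.5 * age_c - 0.7 * age_c**2 + rng.normal(size=n_obs)

dummies = np.eye(n_units)[unit]            # one indicator column per unit
X_linear = np.column_stack([dummies, age_c])
X_quad = np.column_stack([dummies, age_c, age_c**2])

def fit_ols(X, y):
    """Return (residual sum of squares, coefficients) of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), beta

rss_linear, beta_linear = fit_ols(X_linear, y)
rss_quad, beta_quad = fit_ols(X_quad, y)

print("RSS, linear age only:", round(rss_linear, 1))
print("RSS, with age squared:", round(rss_quad, 1))
print("estimated curvature coefficient:", round(beta_quad[-1], 2))
```

Since the unit effects are the quantities of interest here, the curved term does not need to be interpreted; it is only there so that a misspecified control does not leak into the unit estimates.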

You might consider regularization; see especially the question "Principled way of collapsing categorical variables with many levels?"
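One possible form of regularization, chosen here purely for illustration and not necessarily what the linked question recommends, is a ridge penalty applied to the unit dummies only. The sketch below uses simulated data with 8 hypothetical units and one control; the penalty weights `0.01` and `100.0` are arbitrary. With an unpenalized intercept, every unit can keep its own dummy, and the penalized unit effects sum to zero, so they read as deviations from the overall level.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated stand-in data (hypothetical sizes and effects).
n_units, n_obs = 8, 400
unit = rng.integers(0, n_units, size=n_obs)
control = rng.normal(size=n_obs)
true_effect = rng.normal(scale=1.0, size=n_units)
y = 5.0 + true_effect[unit] + 0.5 * control + rng.normal(size=n_obs)

dummies = np.eye(n_units)[unit]
# Intercept + a dummy for EVERY unit + the control.
# The penalty on the dummies is what makes this design identifiable.
X = np.column_stack([np.ones(n_obs), dummies, control])

# Penalise only the unit columns (positions 1..n_units).
penalty = np.zeros(X.shape[1])
penalty[1:n_units + 1] = 1.0

def ridge_unit_effects(lam):
    """Solve (X'X + lam * D) b = X'y and return the unit effects."""
    coef = np.linalg.solve(X.T @ X + lam * np.diag(penalty), X.T @ y)
    return coef[1:n_units + 1]

weak = ridge_unit_effects(0.01)     # almost unpenalised
strong = ridge_unit_effects(100.0)  # heavily shrunk towards zero

print("weak penalty:  ", np.round(weak, 2))
print("strong penalty:", np.round(strong, 2))
```

With 16,000 to 32,000 cases spread over 70 units, shrinkage of this kind matters mostly for units with few cases.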

For the extra question in the edit: if unit is the only categorical variable, you can omit the intercept instead of omitting one unit. In any case, the linear model you estimate does not depend on the choice of parametrization, such as which unit you decide to omit. You can always estimate the contrasts you want afterwards. See "Using an overall category as a reference group for dummy variables" and "Why is it necessary to "ignore" a level when applying sum contrasts?"
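The invariance to parametrization can be checked numerically. The following is a minimal sketch on simulated data (5 hypothetical units and 2 controls rather than 70 and 39; all names and values are assumptions). It fits the same model twice, once with an intercept and unit 0 as the reference, and once with no intercept and a dummy for every unit, then derives a contrast of each unit against the unweighted mean of all units.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in data (hypothetical; not the real data set).
n_units, n_obs = 5, 500
unit = rng.integers(0, n_units, size=n_obs)
controls = rng.normal(size=(n_obs, 2))
true_unit_level = np.array([1.0, 2.0, 0.5, 3.0, 1.5])
y = (true_unit_level[unit]
     + controls @ np.array([0.8, -0.4])
     + rng.normal(scale=0.5, size=n_obs))

dummies = np.eye(n_units)[unit]   # one indicator column per unit

# Parametrization A: intercept + dummies for all units except unit 0.
X_ref = np.column_stack([np.ones(n_obs), dummies[:, 1:], controls])
# Parametrization B: no intercept, one dummy for every unit.
X_all = np.column_stack([dummies, controls])

beta_ref, *_ = np.linalg.lstsq(X_ref, y, rcond=None)
beta_all, *_ = np.linalg.lstsq(X_all, y, rcond=None)

fitted_ref = X_ref @ beta_ref
fitted_all = X_all @ beta_all

# Adjusted unit levels recovered from each parametrization.
levels_from_ref = beta_ref[0] + np.concatenate([[0.0], beta_ref[1:n_units]])
levels_from_all = beta_all[:n_units]

# Contrast of interest: each unit versus the unweighted mean of all units.
unit_vs_mean = levels_from_all - levels_from_all.mean()

print("same fitted values:", np.allclose(fitted_ref, fitted_all))
print("same unit levels:  ", np.allclose(levels_from_ref, levels_from_all))
print("unit minus mean:   ", np.round(unit_vs_mean, 2))
```

Both checks print `True`: the two codings span the same column space, so the choice of reference unit changes only how the coefficients are labelled, and every unit still gets an estimate.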

If you want more specific advice you need to tell us some specifics!

edited Apr 03 '21 at 18:24

answered Mar 24 '21 at 02:21

kjetil b halvorsen


Thanks. My goal is to find out how well the units are doing when controlling for a variety of factors. For example, we are trying to determine, for each unit, how much income it generates for its customers given factors such as the age or education of those customers, over which the unit has no control (these are the controls I mentioned). I have studied splines before, but I struggle to interpret them. In any case, the continuous predictors are not what primarily interest us in this analysis; the performance of the units is. One problem we have is that there is very little theory to build on. – user54285 Mar 24 '21 at 02:28
Please add the extra information from your comment to the post as an edit. That might well help in getting some better answers! Few people read comments! – kjetil b halvorsen Mar 24 '21 at 12:51
Thanks. I did not think you could edit the original question after the first few minutes. I will. – user54285 Mar 24 '21 at 17:40
