
I have a correlation between X and Y for different groups (they are states in the US, fwiw):

[Figure: Y against X, plotted separately for each state]

I'd like to make some claim about the aggregate correlation: something like "Overall, as X increases, Y decreases." Prima facie this seems like a reasonable claim to make as most of the individual correlations are downward-sloping.

However, I can't simply pool all of the points and run a single regression: this would be committing the ecological fallacy, and it would give me an overestimate of the overall effect:

[Figure: a single regression of Y on X fitted to the points from all states pooled together]

How can I perform some meaningful and "kosher" statistical analysis here? There must be something I can meaningfully conclude. I thought of using an arithmetic mean of each individual regression line, or running the regression while controlling for state. This question seems to be related, but I can't figure it out.

  • Why does state matter at all? If you have individual level data, then simply use a regression model. Judging by the number of points, you could easily adjust for each state as well as its interaction with $X$ and make inferences accordingly. – AdamO Mar 10 '20 at 16:06
  • @AdamO, do you mean the functional form $$Y = \beta_0 + \beta_1 X + \beta_2 State$$ where $State$ is a dummy variable? Is it necessary to interact with X? – Lieu Zheng Hong Mar 10 '20 at 16:33
  • After all, if I interact with State to get the functional form $$Y = \beta_0 + \beta_1 X + \beta_2 State + \beta_3 (State \times X)$$, I recover the individual regressions. – Lieu Zheng Hong Mar 10 '20 at 16:35
  • Picking up on AdamO's comment: as I understand his suggestion, it would conform with the first functional form in your first response. If you're running a linear regression using OLS, then the F-statistic would provide information about the strength of the association but be uninformative wrt the sign of the relationship. Running t-tests for each parameter would provide that more granular information. The magnitude and sign of *t* for *X* would give you the answer you seek. –  Mar 10 '20 at 17:16
  • @LieuZhengHong yes, I do mean state as a dummy variable. Add the interaction, and that makes 102 fixed effects in the model. But if the graphs are correct, I see there are several hundred, if not thousands, of individual observations per state. – AdamO Mar 10 '20 at 19:00
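
The three specifications discussed in the comments (pooled, state dummies, state dummies interacted with $X$) can be compared in a rough sketch like the one below. Everything in it is an assumption made for illustration: the data are simulated rather than the asker's, there are 5 hypothetical "states" instead of 51, the true within-state slope is set to -0.5, and plain least squares via `numpy` stands in for whatever regression package you would actually use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated (hypothetical) data: 5 "states", 200 points each.
n_states, n_per_state, true_slope = 5, 200, -0.5
state = np.repeat(np.arange(n_states), n_per_state)

# States with a higher typical X also get a lower intercept, so the
# pooled slope comes out steeper than the within-state slope.
x_center = np.linspace(0.0, 8.0, n_states)
state_intercept = 10.0 - 1.5 * x_center
x = x_center[state] + rng.normal(0.0, 1.0, size=state.size)
y = state_intercept[state] + true_slope * x + rng.normal(0.0, 0.5, size=state.size)

# One dummy column per state (no separate global intercept).
dummies = (state[:, None] == np.arange(n_states)[None, :]).astype(float)

def ols(design, response):
    """Least-squares coefficients for response ~ design."""
    return np.linalg.lstsq(design, response, rcond=None)[0]

# 1) Pooled: Y = b0 + b1 X, ignoring state.
pooled_slope = ols(np.column_stack([np.ones_like(x), x]), y)[1]

# 2) State dummies: one intercept per state, one common slope on X.
fe_slope = ols(np.column_stack([dummies, x]), y)[-1]

# 3) State dummies interacted with X: one intercept and one slope per
#    state, which reproduces the separate per-state regressions.
per_state_slopes = ols(np.column_stack([dummies, dummies * x[:, None]]), y)[n_states:]

print("pooled slope:          ", pooled_slope)
print("common slope (dummies):", fe_slope)
print("per-state slopes:      ", per_state_slopes)
print("mean of per-state:     ", per_state_slopes.mean())
```

In this simulated setup the common slope from the dummy-variable model should land near the within-state value, while the pooled slope is pulled toward the between-state trend. Whether the same holds for the real data depends on how the state-level means of X and Y are related, which the sketch simply assumes.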
