I have two datasets. To simplify, they look something like this:
CountyMonthYear TotalPop FatalOverdoses
Brown_2017_01 2546 1
Brown_2017_02 2346 2
Jackson_2017_01 78345 7
Jackson_2017_02 80456 10
And:
CountyMonthYear TotalPop DrugSeizures
Brown_2017_01 2546 3
Brown_2017_02 2346 5
Jackson_2017_01 78345 20
Jackson_2017_02 80456 30
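Here is a minimal sketch of the data prep in plain Python (no pandas, so it stands alone). It joins the two tables on CountyMonthYear, checks that TotalPop agrees between them, and computes rates per 100,000 residents. The function name merge_rows and the per-100,000 scale are my own choices for illustration. The key is assumed to follow the County_Year_Month pattern of the example rows.

```python
# Rows copied from the two example tables: (CountyMonthYear, TotalPop, count).
overdoses = [
    ("Brown_2017_01", 2546, 1),
    ("Brown_2017_02", 2346, 2),
    ("Jackson_2017_01", 78345, 7),
    ("Jackson_2017_02", 80456, 10),
]
seizures = [
    ("Brown_2017_01", 2546, 3),
    ("Brown_2017_02", 2346, 5),
    ("Jackson_2017_01", 78345, 20),
    ("Jackson_2017_02", 80456, 30),
]


def merge_rows(overdose_rows, seizure_rows, per=100_000):
    """Join both tables on CountyMonthYear and add per-capita rates (hypothetical helper)."""
    seizure_by_key = {key: (pop, n) for key, pop, n in seizure_rows}
    merged = []
    for key, pop, n_overdoses in overdose_rows:
        if key not in seizure_by_key:
            continue  # county-month missing from the seizure table
        seizure_pop, n_seizures = seizure_by_key[key]
        if seizure_pop != pop:
            raise ValueError(f"TotalPop disagrees for {key}")
        # Assumed key format: County_Year_Month (rsplit tolerates underscores in county names).
        county, year, month = key.rsplit("_", 2)
        merged.append({
            "county": county,
            "year": int(year),
            "month": int(month),
            "pop": pop,
            "overdoses": n_overdoses,
            "seizures": n_seizures,
            "overdose_rate": n_overdoses / pop * per,  # overdoses per 100,000 residents
            "seizure_rate": n_seizures / pop * per,    # seizures per 100,000 residents
        })
    return merged


rows = merge_rows(overdoses, seizures)
for r in rows:
    print(r["county"], r["month"], round(r["overdose_rate"], 1), round(r["seizure_rate"], 1))
```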
I want to estimate the relationship between drug seizures (the predictor) and fatal overdoses (the outcome), probably with multiple linear regression. My questions:
Should I use the raw counts of drug seizures and overdoses? Should I use seizures per capita as the predictor? Or should I include population and raw drug seizures as separate predictors?
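The three specifications above can be set up side by side. This is a hypothetical sketch built on a hand-rolled OLS helper (ols is my own function, not a library call), applied to the four example rows. Four observations are far too few for real inference, so it only shows how each model is constructed. Population is rescaled to thousands purely to keep the normal equations well conditioned.

```python
# (CountyMonthYear, TotalPop, FatalOverdoses, DrugSeizures) from the example tables.
data = [
    ("Brown_2017_01", 2546, 1, 3),
    ("Brown_2017_02", 2346, 2, 5),
    ("Jackson_2017_01", 78345, 7, 20),
    ("Jackson_2017_02", 80456, 10, 30),
]


def ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(A[0])
    # Normal equations: (A^T A) b = A^T y.
    AtA = [[sum(a[i] * a[j] for a in A) for j in range(k)] for i in range(k)]
    Aty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    M = [AtA[i] + [Aty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [m - factor * p for m, p in zip(M[r], M[col])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    fitted = [sum(b * a for b, a in zip(beta, row)) for row in A]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


pop = [p for _, p, _, _ in data]
od = [o for _, _, o, _ in data]
sz = [s for _, _, _, s in data]
od_rate = [o / p * 100_000 for o, p in zip(od, pop)]
sz_rate = [s / p * 100_000 for s, p in zip(sz, pop)]

specs = {
    # Option 1: overdose counts on seizure counts.
    "raw counts": ([[s] for s in sz], od),
    # Option 2: overdose rate on seizure rate (both per 100,000).
    "per capita": ([[r] for r in sz_rate], od_rate),
    # Option 3: overdose counts on seizure counts plus population (in thousands).
    "counts + population": ([[s, p / 1000] for s, p in zip(sz, pop)], od),
}
results = {}
for name, (X, y) in specs.items():
    results[name] = ols(X, y)
    beta, r2 = results[name]
    print(f"{name}: coefficients={[round(b, 4) for b in beta]}, R^2={r2:.3f}")
```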
I've run this model with the raw data (i.e., regressing the number of fatal overdoses on the number of seizures, without population as a predictor) and get a very high R-squared. BUT: I suspect this model is really just capturing the relationship between population and overdoses, since more populous counties clearly have more seizures and more overdoses, and less populous counties have fewer of both.
When I regress per-capita overdoses on per-capita seizures, the R-squared drops to about 0. I'm struggling with how to proceed in developing a model and how to transform these variables. Thanks in advance!
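To check the suspicion about population, here is a small simulation on synthetic data (not the real datasets). Populations span three orders of magnitude, and per-capita seizure and overdose rates are drawn independently. By construction, then, seizures carry no information about overdoses beyond population. Counts are expected values (rate × population) with no sampling noise. The helper names simulate and r_squared, and all rate ranges, are hypothetical choices for illustration.

```python
import random


def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def simulate(n_counties=200, seed=1):
    """Synthetic county data where seizure and overdose rates are independent."""
    rng = random.Random(seed)
    pops, seizures, overdoses = [], [], []
    for _ in range(n_counties):
        pop = 10 ** rng.uniform(3, 6)  # log-uniform population, 1,000 to 1,000,000
        seizure_rate = rng.uniform(0.8, 1.2) * 1e-4   # independent of overdose_rate
        overdose_rate = rng.uniform(0.8, 1.2) * 3e-5
        pops.append(pop)
        seizures.append(pop * seizure_rate)    # expected count, no sampling noise
        overdoses.append(pop * overdose_rate)
    return pops, seizures, overdoses


pops, sz, od = simulate()
raw_r2 = r_squared(sz, od)
rate_r2 = r_squared([s / p for s, p in zip(sz, pops)],
                    [o / p for o, p in zip(od, pops)])
print(f"raw counts R^2 = {raw_r2:.2f}, per-capita R^2 = {rate_r2:.3f}")
```

Under these assumptions, the raw-count R^2 comes out high while the per-capita R^2 stays near zero, which matches the pattern described above. This shows that a high raw-count R^2 can arise from population alone. It does not show which specification is the right one for the real data.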