Problem
Suppose I have two county-level variables: (1) the heat index for each county in a state, $h_{it}$, and (2) the acreage of each county, $acres_{it}$. The data covers 10 years and also includes the amount of ice cream melted, $y_{it}$, for each county and year in the sample.
I'm told that a strong predictor of ice cream melt can be built by weighting the heat index by the size of each county and then aggregating to the state level, such that:
$$hw_{st} = \frac{\sum_{i \in s} h_{it} \cdot acres_{it}}{\sum_{i \in s} acres_{it}}$$
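A minimal base-R sketch of the formula above, assuming county-level columns named `year`, `state`, `h` and `acres` as in the sample data. The names `counties` and `hw_st` are illustrative only. `weighted.mean(h, acres)` computes `sum(h * acres) / sum(acres)` within each state-year group, which is what the sums over $i \in s$ describe:

```r
# County-level values copied from the sample data in this question
# (the data frame name `counties` is illustrative)
counties <- data.frame(
  year  = rep(2000:2002, times = 4),
  state = rep(c("CA", "CO"), each = 6),
  h     = c(5, 7, 1, 9, 6, 4, 8, 2, 5, 8, 7, 1),
  acres = c(10, 25, 40, 8, 13, 42, 50, 24, 57, 24, 35, 15)
)

# One group per state-year; inside each group,
# weighted.mean(h, acres) = sum(h * acres) / sum(acres)
groups <- split(counties, list(counties$year, counties$state))
hw_st  <- sapply(groups, function(d) weighted.mean(d$h, d$acres))
round(hw_st, 3)
```

The weights here are normalized within each state-year. The sample code in this question instead divides by each year's total acreage across both states (`group_by(year)`), so its `hw` is the state's share of a year-level weighted sum rather than the within-state weighted average written in the formula.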
A simple linear regression can then be used to predict state-level ice cream melt:
Regression 1:
$$\log(y_{st}) = \beta_{0} + \beta_{1}h_{st} + \epsilon_{st}$$
```
Call:
lm(formula = log(y) ~ h, data = datt)

Coefficients:
(Intercept)            h  
    2.64010     -0.01072  
```
Regression 2 (using the weighted variable):
$$\log(y_{st}) = \beta_{0} + \beta_{1}hw_{st} + \epsilon_{st}$$
```
Call:
lm(formula = log(y) ~ hw, data = datt)

Coefficients:
(Intercept)           hw  
    2.39100      0.04908  
```
Question
Is the interpretation of these two regressions different? My interpretation of Regression 1 is that a one-unit increase in $h$ changes $y$ by some percentage.
But what about Regression 2? Is there a different way to interpret the coefficient because the regressor is weighted?
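For reference, the "some percentage" step can be made explicit under the usual log-linear reading. This is a sketch that assumes the error term is held fixed, so $e^{\beta_1}$ is the multiplicative effect on $y$ and retransformation of the error is ignored. For a generic regressor $x_{st}$ (either $h_{st}$ or $hw_{st}$):

$$\frac{y_{st}\big|_{x_{st}+1}}{y_{st}\big|_{x_{st}}} = \frac{e^{\beta_0 + \beta_1 (x_{st}+1) + \epsilon_{st}}}{e^{\beta_0 + \beta_1 x_{st} + \epsilon_{st}}} = e^{\beta_1}, \qquad \%\Delta y = 100\,\left(e^{\beta_1} - 1\right) \approx 100\,\beta_1 \text{ for small } \beta_1$$

Plugging in the posted slopes: $100\,(e^{-0.01072} - 1) \approx -1.07\%$ per unit of $h$, and $100\,(e^{0.04908} - 1) \approx 5.03\%$ per unit of $hw$.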
Sample R Code
```r
library(dplyr)

# Sample data: 4 counties in 2 states (CA, CO) over 3 years
datt <- structure(list(year = c(2000L, 2001L, 2002L, 2000L, 2001L, 2002L,
    2000L, 2001L, 2002L, 2000L, 2001L, 2002L), county = c(1L, 1L,
    1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), state = c("CA", "CA",
    "CA", "CA", "CA", "CA", "CO", "CO", "CO", "CO", "CO", "CO"),
    y = c(5L, 10L, 7L, 4L, 2L, 8L, 9L, 11L, 2L, 5L, 6L, 8L),
    h = c(5L, 7L, 1L, 9L, 6L, 4L, 8L, 2L, 5L, 8L, 7L, 1L), acres = c(10L,
    25L, 40L, 8L, 13L, 42L, 50L, 24L, 57L, 24L, 35L, 15L)), .Names = c("year",
    "county", "state", "y", "h", "acres"), class = "data.frame", row.names = c(NA,
    -12L))

# Build the weight: each county's share of total acreage in that year
# (normalized across all states in a year, not within each state)
datt <- datt %>%
  group_by(year) %>%
  mutate(w = acres / sum(acres, na.rm = TRUE))

# Apply the weight to the county heat index
datt$hw <- datt$h * datt$w

# Aggregate to the state-year level:
# weighted heat index, summed (unweighted) heat index, total melt
datt <- datt %>%
  group_by(year, state) %>%
  summarise(hw = sum(hw, na.rm = TRUE),
            h = sum(h),
            y = sum(y))

# Regression 1: log melt on the summed heat index
lm(log(y) ~ h, data = datt)

# Regression 2: log melt on the acreage-weighted heat index
lm(log(y) ~ hw, data = datt)
```
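A base-R sketch (no dplyr) that repeats the same aggregation on the sample data and converts each slope into a percent change in $y$ per one-unit increase in the regressor. The helper `pct_change` and the name `state_df` are hypothetical names introduced here, and the conversion assumes the log-linear reading $100\,(e^{\beta_1} - 1)$:

```r
# County-level sample data (same values as `datt` above, without the county id)
datt <- data.frame(
  year  = rep(2000:2002, times = 4),
  state = rep(c("CA", "CO"), each = 6),
  y     = c(5, 10, 7, 4, 2, 8, 9, 11, 2, 5, 6, 8),
  h     = c(5, 7, 1, 9, 6, 4, 8, 2, 5, 8, 7, 1),
  acres = c(10, 25, 40, 8, 13, 42, 50, 24, 57, 24, 35, 15)
)

# Same steps as the dplyr code: weights normalized by each year's total acreage
datt$w  <- datt$acres / ave(datt$acres, datt$year, FUN = sum)
datt$hw <- datt$h * datt$w

# Sum hw, h and y within each state-year (hypothetical name: state_df)
state_df <- aggregate(cbind(hw, h, y) ~ year + state, data = datt, FUN = sum)

fit1 <- lm(log(y) ~ h,  data = state_df)
fit2 <- lm(log(y) ~ hw, data = state_df)

# Hypothetical helper: percent change in y for a one-unit increase
# in the model's single regressor, 100 * (exp(slope) - 1)
pct_change <- function(fit) 100 * (exp(coef(fit)[[2]]) - 1)
round(c(h = pct_change(fit1), hw = pct_change(fit2)), 2)
```

The slopes from this sketch should match the `lm` output posted above, since the aggregation is the same; only the last line adds the percent conversion.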
Related Question: Weighting variable based on another variable