How do I construct an economic index with historical data using Principal Component Analysis?

Question

I initially asked this question here: https://economics.stackexchange.com/questions/50054/how-do-i-build-a-synthetic-measure-of-economic-activity-for-linear-regression
Since I received a good hint there about which statistical method to use, I hope it's OK to restate the question here.

The situation is this:

I'm trying to estimate the effects of WW2 on the economy using historical data for cities.

The regression is pretty simple:

$\text{economy1950} = \alpha + \beta_1 \cdot \text{destruction1945} + \beta_2 \cdot \text{distance to border} + \beta_3 \cdot \text{economy1939} + \epsilon$
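To make the setup concrete, here is a minimal sketch of this regression in R on simulated data. The data frame `cities`, its column names, and every number in it are made up for illustration and are not the actual city data.

```r
# Simulated stand-in data; all names and values are hypothetical.
set.seed(1)
n <- 50
cities <- data.frame(
  destruction1945 = runif(n, 0, 100),   # e.g. share of buildings destroyed (%)
  distance_border = runif(n, 0, 500),   # e.g. distance to the border (km)
  economy1939     = rnorm(n, 100, 15)   # pre-war measure of economic activity
)

# Generate an outcome with known coefficients plus noise.
cities$economy1950 <- 20 - 0.1 * cities$destruction1945 +
  0.01 * cities$distance_border + 0.8 * cities$economy1939 + rnorm(n, 0, 5)

# OLS fit of the regression above.
fit <- lm(economy1950 ~ destruction1945 + distance_border + economy1939,
          data = cities)
summary(fit)
```

The open issue is what to put in the `economy1950` (and `economy1939`) column when no direct measure exists.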

The problem is that I don't have good data on economic activity (e.g. GDP) for the cities. But I do have data on this:

- the amount of housing built in 1950
- industrial output in 1950
- government expenditures in 1950

I can plug in any of those measures as "economy1950" but none covers a large range of economic activity.

I got the hint to use Principal Component Analysis (PCA) to create a (sort of) index of the cities' economies. But I'm not sure how to do it properly.

Do I run the PCA (`prcomp()` in R) on my data ($\text{housing1950}$, $\text{industry1950}$, $\text{expenditures1950}$) and then use the weights of the components to construct the index?

Something like this:

$\text{economy1950} = W_1 \cdot \text{housing1950} + W_2 \cdot \text{industry1950} + W_3 \cdot \text{expenditures1950}$
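A minimal sketch of that idea in R, again on simulated data (the `indicators` data frame and all values are hypothetical). It assumes the three variables are standardized first (`scale. = TRUE`), since they are measured in different units, and it takes $W_1, W_2, W_3$ to be the loadings of the first principal component. Whether that is the right construction is exactly what I am asking.

```r
# Simulated stand-in data driven by one common latent factor (hypothetical).
set.seed(1)
n <- 50
activity <- rnorm(n)
indicators <- data.frame(
  housing1950      = 1000 + 200 * activity + rnorm(n, 0, 100),
  industry1950     = 50 + 10 * activity + rnorm(n, 0, 5),
  expenditures1950 = 300 + 40 * activity + rnorm(n, 0, 30)
)

# PCA on centered, standardized variables.
pca <- prcomp(indicators, center = TRUE, scale. = TRUE)

# Loadings of the first component play the role of W_1, W_2, W_3.
w <- pca$rotation[, 1]

# The sign of a principal component is arbitrary; flip it so that
# a higher index means more activity.
if (sum(w) < 0) w <- -w

# Index = weighted sum of the standardized variables
# (equal to pca$x[, 1] up to sign).
z <- scale(indicators)
economy1950 <- as.vector(z %*% w)

# Standard deviation and share of variance captured by each component.
summary(pca)$importance
```

The "Proportion of Variance" row of the last output shows how much of the joint variation the first component captures, which seems relevant to deciding whether a single index is enough.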

Of course, if any of the weights are negative or very low, it would be best to omit them. What's the correct decision criterion for which components to retain?

I'm concerned about the interpretability of the result – I'm guessing I can still determine the statistical significance and sign of any effect, but I'm unsure whether this allows for any interpretation of the economic significance of the OLS results. Is it possible to construct $\text{economy1950}$ using PCA in a way that keeps the regression results as interpretable as possible?

Any help would be greatly appreciated!

In general, leveraging causal inference techniques might be more suitable to address the effects. See [Causal Inference and Data-Fusion in Econometrics](https://arxiv.org/abs/1912.09104). — msuzen, Jan 19 '22 at 09:49
@MehmetSüzen I'm sorry but I don't follow the math/statistics in the paper. There doesn't seem to be a practical summary. Understanding the whole paper would take me weeks, and not even sure if it's going to be helpful. — Tototulbi, Jan 20 '22 at 16:51
Agreed. One needs time to be able to build causal inference models. A recent basic book might be more accessible [The Effect](https://theeffectbook.net/ch-StatisticalAdjustment.html). — msuzen, Jan 20 '22 at 18:15
