
Let's say I want to estimate the effect of wars on GDP. My data look something like this, where War 1 is the first war in each country during the period (so War 1 is not the same war in every country), and LongWar1 is a dummy variable that equals 1 if the war lasted longer than 3 months and 0 otherwise:

| Country | Year | Years since War 1 | Years since War 2 | Long War 1 | GDP |
|---|---|---|---|---|---|
| Afghanistan | 1970 | 50 | 25 | 0 | 3,000,000 |
| Iraq | 1970 | -5 | 2 | 1 | 4,000,000 |

Of course, a war that has not happened yet in a given year cannot affect GDP in that year, so I recoded these variables as NA for any year before the corresponding war:

| Country | Year | Years since War 1 | Years since War 2 | Long War 1 | GDP |
|---|---|---|---|---|---|
| Afghanistan | 1970 | 50 | 25 | 0 | 3,000,000 |
| Iraq | 1970 | NA | 2 | NA | 4,000,000 |
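The NA coding above can be sketched as a small per-country-year function. This is a minimal Python illustration (in R the same logic would use `ifelse()`); the function name `code_war` and its inputs (the war's start year and its duration in months) are hypothetical, since the question does not show the raw data. The start years are back-computed from the table (1970 - 50 and 1970 + 5); the durations are made-up placeholders.

```python
from typing import Optional, Tuple


def code_war(year: int, war_start: int,
             duration_months: float) -> Tuple[Optional[int], Optional[int]]:
    """Return (years_since_war, long_war) for one country-year.

    Both values are None (this sketch's NA) when the war has not started yet.
    long_war is 1 if the war lasted more than 3 months, else 0.
    """
    years_since = year - war_start
    if years_since < 0:
        # The war lies in the future: it cannot affect this year's outcome.
        return None, None
    long_war = 1 if duration_months > 3 else 0
    return years_since, long_war


# Start years match the table; durations are hypothetical placeholders.
print(code_war(1970, 1920, 2))  # Afghanistan, war 1 -> (50, 0)
print(code_war(1970, 1975, 8))  # Iraq, war 1 not yet started -> (None, None)
```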

I am trying to estimate this regression:

`lm(GDP ~ YearsSinceWar1 + YearsSinceWar2 + LongWar1 + LongWar2 + YearsSinceWar1*LongWar1, data = mydata)`
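For context on the NA problem: under R's default `na.action` (`na.omit`), `lm()` drops every row in which the response or any variable in the formula is NA (listwise deletion), and it stops with an error if no complete rows remain. The Python sketch below mimics that filtering on the two rows from the NA-coded table; `design_row` and `complete_cases` are hypothetical helper names, and the LongWar2 values are placeholders because the question's table does not include that column.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]

# Main-effect columns named in the question's formula.
PREDICTORS = ["YearsSinceWar1", "YearsSinceWar2", "LongWar1", "LongWar2"]


def design_row(row: Row) -> List[Optional[float]]:
    """Intercept, main effects, and the YearsSinceWar1:LongWar1 product.

    The product is None (NA) whenever either factor is None.
    """
    main = [row[name] for name in PREDICTORS]
    y1, l1 = row["YearsSinceWar1"], row["LongWar1"]
    interaction = None if y1 is None or l1 is None else y1 * l1
    return [1, *main, interaction]


def complete_cases(rows: List[Row]) -> List[Row]:
    """Keep rows with no missing response or predictor, like na.omit."""
    return [r for r in rows
            if r["GDP"] is not None and None not in design_row(r)]


# The two rows from the NA-coded table; LongWar2 values are hypothetical.
ROWS: List[Row] = [
    {"Country": "Afghanistan", "Year": 1970, "YearsSinceWar1": 50,
     "YearsSinceWar2": 25, "LongWar1": 0, "LongWar2": 0, "GDP": 3_000_000},
    {"Country": "Iraq", "Year": 1970, "YearsSinceWar1": None,
     "YearsSinceWar2": 2, "LongWar1": None, "LongWar2": 0, "GDP": 4_000_000},
]

kept = complete_cases(ROWS)
print([r["Country"] for r in kept])  # ['Afghanistan']
```

Every pre-war country-year is lost this way, which is why the NA coding alone does not give the intended regression.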

Of course, R doesn't let me run this because of the NA values. Is there any way around this to get the regression I want? The only alternative I can see is setting the NA values to 0, but this comes with its own problems: for example, LongWar1 would also be coded as 0, which for a dummy variable simply means a short war.
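One common way to handle such nested variables is to zero-fill them and add a separate 0/1 indicator for whether the war has started, so that a zero-filled LongWar1 is no longer confused with a short war. This is offered as a sketch, not necessarily the approach in the thread linked in the comments; the function `nested_coding` and the indicator `war_started` are hypothetical names.

```python
from typing import Optional, Tuple


def nested_coding(years_since: Optional[int],
                  long_war: Optional[int]) -> Tuple[int, int, int, int]:
    """Recode one war's NA-coded variables without dropping the row.

    Returns (war_started, years_since, long_war, years_since * long_war).
    war_started is a hypothetical 0/1 indicator: 0 before the war, 1 after.
    A short war has war_started == 1 and long_war == 0; a war that has not
    happened has war_started == 0, so the two cases stay distinguishable.
    """
    if years_since is None:
        # War has not happened yet: zero-fill and flag with war_started = 0.
        return (0, 0, 0, 0)
    return (1, years_since, long_war, years_since * long_war)


print(nested_coding(None, None))  # Iraq 1970, war 1 not yet started -> (0, 0, 0, 0)
print(nested_coding(50, 0))       # Afghanistan 1970, short war 1 -> (1, 50, 0, 0)
```

In R this would amount to adding 0/1 columns (hypothetically named War1Started and War2Started) to `mydata`, replacing the NAs with 0, and including the indicators in the formula, e.g. `lm(GDP ~ War1Started + YearsSinceWar1 * LongWar1 + War2Started + YearsSinceWar2 + LongWar2, data = mydata)`. With the indicator in the model, the LongWar1 coefficient compares long with short wars (at YearsSinceWar1 = 0, because of the interaction) only among country-years after war 1 began, while War1Started separates pre-war years from short-war years.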

Thanks!

  • What is the `Long War 1` variable? But see https://stats.stackexchange.com/questions/372257/how-do-you-deal-with-nested-variables-in-a-regression-model/372258#372258 – kjetil b halvorsen Jul 20 '21 at 15:29
  • Sorry, I wasn't clear: the LongWar1 variable is a dummy variable that takes the value 1 if the war lasted for more than 3 months and 0 otherwise. I'll edit it into the question as well. And thanks! – Student In Need Jul 20 '21 at 16:12
  • The basic statistical question here is answerable (and kjetilbhalvorsen's comment points to a useful thread), but this seems like a strange model to fit. There are other determinants of GDP, and the causal direction between poverty/GDP and war is not as simple as this model assumes, among other things. At the very least you'd want to model a time series of GDP (or GDP per capita) per country, not use a single GDP value. – mkt Jul 22 '21 at 06:18
  • Thanks! My variable is not GDP, and I do have a time series of the dependent variable, as well as other covariates. This was just a simpler example to make it easier for people to understand and help me with the statistical question. Thanks for the comment, though! – Student In Need Jul 26 '21 at 15:14
