
I have a data set that lists every country in Africa, the main contributor to its GNI (industry, agriculture, etc.), its GNI, and whether or not it has access to the sea. I need to find the relation between having access to the sea and GNI, and then, separately, the relation between the kind of economy and GNI. I believe the simplest way to do this would be with linear regression, but I'm not sure how to transform the string variables into something I can actually use in the comparison. Any help on this would be appreciated.



Easier than that, it's just a t-test. The outcome (GNI) is continuous unless you've grouped it into ranges. The predictor is binary: whether or not the country has access to the sea.
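Here is a minimal sketch of the two-sample t-test in Python, using only the standard library. The thread itself uses R and SPSS, so Python is our choice here. It uses Student's pooled-variance form, which assumes equal variances in the two groups. The function name `pooled_t_test` is hypothetical, and the numbers in the example are made up, not real GNI figures. To get a p-value, compare t to a t distribution with the returned degrees of freedom, for example via `scipy.stats.ttest_ind(a, b, equal_var=True)` if SciPy is available.

```python
import math
from statistics import mean, variance


def pooled_t_test(group_a, group_b):
    """Student's two-sample t statistic assuming equal group variances.

    Returns (t, degrees_of_freedom). No p-value is computed here, which
    keeps the sketch stdlib-only.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    sp2 = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    # Standard error of the difference in means
    se = math.sqrt(sp2 * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / se
    return t, n_a + n_b - 2


# Hypothetical GNI values (made up) for coastal vs. landlocked countries
t, df = pooled_t_test([4.0, 6.0, 5.0], [2.0, 3.0, 1.0])
print(round(t, 3), df)
```

On these made-up values the result is t ≈ 3.674 with 4 degrees of freedom. If the spread of GNI differs a lot between the two groups, Welch's unequal-variance t-test is the usual alternative.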

You only need to create an "indicator variable" for the two groups in question: whether or not the country is beside the sea. Indicator variables take the value 1 when some condition is true and 0 otherwise. Creating such a variable allows you to directly estimate the difference in means between the two groups defined by the 0/1 values. Most statistical software packages will do this for you automatically once they recognize you've included a string variable as a predictor.
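To illustrate the indicator-variable idea, here is a hedged Python sketch using only the standard library. The helper names `to_indicator` and `fit_indicator_regression` and the labels "coastal"/"landlocked" are hypothetical. The sketch codes the string column as 0/1 and fits ordinary least squares with the general simple-regression formulas. With a 0/1 predictor, the intercept comes out as the mean of the 0 group and the slope as the difference in group means.

```python
from statistics import mean


def to_indicator(labels, level):
    """Return 1 where the string label equals `level`, else 0."""
    return [1 if label == level else 0 for label in labels]


def fit_indicator_regression(y, x):
    """OLS of y on an intercept and a single predictor x; returns (intercept, slope).

    When x is a 0/1 indicator, intercept = mean(y | x == 0) and
    slope = mean(y | x == 1) - mean(y | x == 0).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: sea-access labels and made-up GNI values
sea = ["coastal", "landlocked", "coastal", "landlocked", "coastal", "landlocked"]
gni = [4.0, 2.0, 6.0, 3.0, 5.0, 1.0]
x = to_indicator(sea, "coastal")
print(fit_indicator_regression(gni, x))
```

Here the fit returns intercept 2.0 (the landlocked mean) and slope 3.0 (coastal mean minus landlocked mean). The t-test on that slope is the same as the pooled two-sample t-test.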

AdamO
  • In R, if you have three columns `GNI`, `Sea`, and `Industry`, it would just be `lm(GNI ~ Sea + Industry)`. The code will break out your categories for you. – gregmacfarlane Apr 19 '13 at 19:15
  • Well GNI is at the country level. You might want to use GNI_industry for this analysis, and the interpretation is "Controlling for industry type, what is the difference between GNI in landlocked countries versus countries with ports?" I think the next step would be to calculate the industry specific land locked vs. port access differences using `lm(GNI_industry ~ Sea * Industry)`. – AdamO Apr 19 '13 at 19:19
  • I'm currently using SPSS. I don't think I have to worry about controlling for variables in this instance. I'm fairly certain that I only need to compare industry type and sea access separately. Thanks though, this really helped me out! – user2120893 Apr 19 '13 at 19:34
    If you have both industry type and access to the sea as independent variables then it is *not* a t-test, it is linear regression with dummy coded variables (aka ANOVA). – Peter Flom Apr 19 '13 at 19:50
  • @PeterFlom You are absolutely right. In fact, I only addressed the first part of the question: "I need to find the relation between having access to the sea and GNI". I didn't see the need for adjustment, as I assumed country GNI and port_access to be data elements available with country-level observations. Certainly an ANOVA would be appropriate for the second part. – AdamO Apr 19 '13 at 20:07
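For the second part of the question (kind of economy vs. GNI), here is a rough Python sketch of what dummy-coded regression, or one-way ANOVA, computes. It uses only the standard library. `dummy_code` builds one 0/1 column per non-reference category. This mirrors R's default treatment contrasts, which `lm` applies to a factor or character predictor. `one_way_anova_f` computes the overall F statistic directly from group means. The function names, the category "services", and the data are hypothetical.

```python
from statistics import mean


def dummy_code(labels, reference):
    """Treatment (dummy) coding: one 0/1 column per level other than `reference`."""
    levels = sorted(set(labels) - {reference})
    return {lvl: [1 if lab == lvl else 0 for lab in labels] for lvl in levels}


def one_way_anova_f(y, labels):
    """One-way ANOVA F statistic for y across the groups in `labels`.

    Matches the overall F test of regressing y on the dummy columns.
    Returns (F, df_between, df_within).
    """
    groups = {}
    for yi, lab in zip(y, labels):
        groups.setdefault(lab, []).append(yi)
    grand = mean(y)
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((yi - mean(g)) ** 2 for g in groups.values() for yi in g)
    df_between = len(groups) - 1
    df_within = len(y) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical main-contributor labels and made-up GNI values
industry = ["agriculture", "agriculture", "industry", "industry", "services", "services"]
gni = [1.0, 3.0, 4.0, 6.0, 7.0, 9.0]
print(dummy_code(industry, "agriculture"))
print(one_way_anova_f(gni, industry))
```

With these three made-up groups (means 2, 5 and 8), F = 9.0 on (2, 3) degrees of freedom. With only two groups, F equals the square of the pooled t statistic. That is why the t-test and the ANOVA agree when sea access is the only predictor.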