
I have cross-tabulated some data from a survey: the core survey questions have been cross-tabulated against demographic variables. I am looking at these cross-tabulations to see whether any interesting trends occur, so the process I am going through is fairly exploratory.

One of the cross-tabs is as follows: Which mode of transport do you use (tick all that apply)? X What is your postcode?

The cross-tabulated results look like this:

Mode of transport   Postcode area 1   Postcode area 2   Postcode area 3
Car                 33                56                48
Cycle               12                10                30
Walk                45                40                65

A section of the raw data looks like this:

respondent ID car cycle walk postcode area
1 1 1 area 1
2 1 1 area 2
3 1 1 area 3
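
To make the link between the raw data and the cross-tab concrete, here is a minimal sketch in Python. The rows are hypothetical (made up for illustration, not my survey data), and the names `rows`, `crosstab` and `respondents_per_area` are my own. It assumes a ticked box is coded as 1 and an unticked box as 0.

```python
from collections import Counter

# Hypothetical rows: (respondent_id, car, cycle, walk, postcode_area).
# 1 = box ticked, 0 = not ticked. These values are invented for illustration.
rows = [
    (1, 1, 0, 1, "area 1"),
    (2, 1, 1, 0, "area 2"),
    (3, 0, 1, 1, "area 3"),
    (4, 1, 0, 0, "area 1"),
]
modes = ("car", "cycle", "walk")

crosstab = Counter()              # ticks per (mode, postcode area)
respondents_per_area = Counter()  # respondents per postcode area

for _respondent_id, *ticks, area in rows:
    respondents_per_area[area] += 1
    for mode, ticked in zip(modes, ticks):
        if ticked:
            crosstab[(mode, area)] += 1

total_ticks = sum(crosstab.values())
print("ticks in cross-tab:", total_ticks)
print("respondents:", len(rows))
```

Because each respondent can tick several boxes, the cell counts add up to more than the number of respondents, which is the source of the dependence described below.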

I want to know whether respondents from particular postcode areas are more likely to use certain modes of transport. If the observations were independent, I would do a chi-square test and look at the standardised residuals. However, the observations are not independent, because a respondent can select multiple modes of transport and so can be counted in more than one row of the table. This means that I cannot do a chi-square test, as its assumption of independent observations is violated.
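
For reference, this is a sketch of the calculation I would run if the counts were independent, applied to the cross-tab above purely to show the mechanics. It uses only the Python standard library, and it assumes "standardised residuals" means the adjusted residuals (observed minus expected, divided by an estimate of its standard error); other definitions exist. Given the dependence described above, the output should not be read as a valid test for this table.

```python
from math import sqrt

modes = ["Car", "Cycle", "Walk"]
areas = ["area 1", "area 2", "area 3"]
# Counts copied from the cross-tab in the question.
observed = [
    [33, 56, 48],
    [12, 10, 30],
    [45, 40, 65],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
adj_resid = []  # adjusted standardised residual for each cell
for i, row in enumerate(observed):
    resid_row = []
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected
        # Standard error of (obs - expected) under independence.
        se = sqrt(expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
        resid_row.append((obs - expected) / se)
    adj_resid.append(resid_row)

dof = (len(modes) - 1) * (len(areas) - 1)

print(f"chi-square = {chi2:.2f} on {dof} degrees of freedom")
for mode, resid_row in zip(modes, adj_resid):
    print(mode, [round(r, 2) for r in resid_row])
```

Cells with large positive or negative residuals are the ones I would normally pick out as interesting.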

I have been considering what alternative tests I can perform. I think one possibility would be multinomial logistic regression with mode of transport as the response variable, postcode area as an explanatory variable and respondent ID as a random effect.

Other tests I have considered are McNemar's test and Cochran's Q test, but from the examples I have seen, I am not sure whether these methods are applicable to my data.

Which tests would you recommend for this scenario? I'd rather do something analogous to a chi-square test if possible, as I have done chi-square tests on other parts of this data set where the observations are independent.

GaryStats
