
I have two surveys of business owners. Sample 1 covers business owners who were not members of the association and was collected by random digit dialing. Sample 2 covers business owners who were members of the association, with the membership list used as the sampling frame. Both samples were stratified by state. Sample 1 was weighted by state and gender. Sample 2 was weighted for non-response and to reflect the characteristics of the association. As it turns out, the proportion of males is greater in the association data (sample 2) than in the non-member data (sample 1). At first glance, it appears that the association appeals to males more than to females. We have 500 responses in each sample.

I want to use a probit regression to find the factors (mostly demographic) that influence association membership. What is the best way to do this? I was hoping I could just combine the samples. However, the samples were drawn independently. Clearly there is a huge difference in the sizes of the populations: the one behind sample 1 is huge (all owners in the country), while the one behind sample 2 is just the members. The survey instruments (questions) were almost identical; the members of the association (sample 2) got some extra questions about their satisfaction with the association.

chl
eliz
  • I'm wondering why you want to use probit. Logistic regression is standard for relating covariates to a discrete binary response. Should you be interested, I wrote a good deal about all that here: [difference-between-logit-and-probit-models](http://stats.stackexchange.com/questions/20523/30909#30909). – gung - Reinstate Monica Aug 25 '12 at 18:18

1 Answer


This looks like a case-control study: all cases are sampled, and a similar number of controls are sampled at a much lower rate. Read up on Alastair Scott and Chris Wild's work on this (book chapter, invited lecture). I second gung's opinion that logistic regression is somewhat more suitable (the theoretical advantages being the exponential family and sufficient statistics).
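To make the case-control point concrete, here is a minimal sketch in Python using simulated data. It relies on a standard property of the logit link: when cases and controls are sampled at different rates, fitting the combined sample still estimates the slope (the log odds ratio), and only the intercept is shifted by the log ratio of the two sampling fractions. This property does not carry over to probit. Everything in the sketch is an assumption made for illustration: the population size, the true coefficients, the single `male` covariate, and the helper `fit_logit`. It also ignores the survey weights and state stratification described in the question, which the weighted methods in the Scott and Wild literature are designed to handle.

```python
import math
import random


def fit_logit(x, y, iters=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x (hypothetical helper)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


random.seed(0)
TRUE_B0, TRUE_B1 = -4.0, 0.8  # assumed values: membership is rare, males more likely

# Simulated population of owners as (male, member) pairs.
population = []
for _ in range(200_000):
    male = 1 if random.random() < 0.5 else 0
    p_member = 1.0 / (1.0 + math.exp(-(TRUE_B0 + TRUE_B1 * male)))
    member = 1 if random.random() < p_member else 0
    population.append((male, member))

members = [row for row in population if row[1] == 1]
non_members = [row for row in population if row[1] == 0]

# Case-control design: 500 members and 500 non-members, drawn separately.
n_cases = n_controls = 500
sample = random.sample(members, n_cases) + random.sample(non_members, n_controls)
x = [row[0] for row in sample]
y = [row[1] for row in sample]

# Fit the combined sample; the slope needs no correction.
b0_cc, b1 = fit_logit(x, y)

# Correct the intercept using the known sampling fractions.
offset = math.log((n_cases / len(members)) / (n_controls / len(non_members)))
b0_pop = b0_cc - offset

print(f"slope (log odds ratio): {b1:.3f}")
print(f"odds ratio:             {math.exp(b1):.3f}")
print(f"case-control intercept: {b0_cc:.3f}")
print(f"corrected intercept:    {b0_pop:.3f}")
```

The slope can be exponentiated directly into an odds ratio. The intercept correction is only possible when both population sizes (or sampling fractions) are known, which is plausible for a membership list but may have to be estimated for the non-member population.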

StasK
  • Thanks to everyone for the answers. I'm now clear on what I need to do, and I've found a couple of very useful papers. You're correct, it is logistic regression I want, to get the odds ratios out. Thanks! – eliz Aug 26 '12 at 21:20