Suppose I analyze a sub-sample of a survey sample, selected on demographic characteristics of the respondents (e.g. age, race, etc.), so that the sub-sample may no longer be representative of the population. Is it better to drop the original sampling weights that were provided, or is it still better to use the survey weights calibrated for the original survey sample?

P.S. If I choose to generate my own weights based on the sub-sample data, what is the best methodology for this, and could someone please point me to references that might be helpful?

If generating my own weights isn't an option, what would be the second-best option?

StasK
tvl

1 Answer

most nationally-representative survey weights are generated with those certain demographic characteristics (e.g. age, race, gender) as a part of their fundamental construction. unless you have a strong justification to stray from the weights provided to users of the microdata, you should err on the side of sticking with the survey weights. in R, this would simply mean analyzing the subsample as:

new_subsample <- subset( full_sample , some_demographic_group == TRUE )
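A minimal sketch of what that can look like with the R `survey` package, assuming the weights live in a survey design object rather than a bare data frame (the snippet above does not say which one `full_sample` is). The `api` example data, the design variables, and the `sch.wide == "Yes"` subgroup are stand-ins chosen for illustration, not part of the original question. Subsetting the design object, rather than filtering the data frame before calling `svydesign()`, keeps the original weights and leaves the full design information available for the standard errors.

```r
library(survey)  # assumption: the survey package is installed

# example data shipped with the survey package (stand-in for real microdata)
data(api)

# declare the complex design once, on the full sample, with the provided weights
full_design <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                         fpc = ~fpc, data = apistrat)

# subset the design object (not the raw data frame) to the group of interest
sub_design <- subset(full_design, sch.wide == "Yes")

# weighted estimate and design-based standard error for the subgroup
est <- svymean(~api00, sub_design)
print(est)
```

The point estimate here is the ordinary weighted mean of the subgroup using the original weights; what the design object adds is a standard error that accounts for the sampling design.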

lots of examples of subsetting available at http://asdfree.com thanks

Anthony Damico
  • Thank you for responding. But the original sampling weights were calibrated for the nationally representative survey sample (which comes from a complex survey design). If I take a subsample of that sample for analysis (which isn't random, since the subsample is selected on demographic characteristics), are the original weights still applicable? Will using them affect the bias, the standard errors, or the estimates? Is it still better to use the original weights, or not to use them at all? – tvl Jun 12 '16 at 20:33
  • @tvl in general, nationally-representative weights produced by large government organizations like the CDC and the US Census Bureau are designed to maintain their representativeness when subsetting to some large group. please review the examples at https://github.com/ajdamico/asdfree – Anthony Damico Jun 13 '16 at 03:55
  • +1 to Anthony. A very accessible description of the subpopulation issue is available via http://www.stata-journal.com/article.html?article=st0153; while it is a journal on Stata software, the use of Stata is very, very light, and the exposition is very general. – StasK Aug 09 '16 at 16:12