
I have a data set that's considered the best available survey for a series of questions. Therefore, I do not want to re-weight the survey, and I cannot use any of the techniques described at http://r-survey.r-forge.r-project.org/survey/example-poststrat.html

I have one variable within the survey dataset that is not very good, and I want to adjust it to align better with a more realistic outside source.

The outside source has information on the marginal totals, but not on the joint distribution. Therefore, I am looking for a raking or calibration technique that modifies only a single variable, not the entire survey dataset, but can control it based on marginal totals.

Here's a simple calibration technique for a single survey variable (mobility) across a single control (stype). I am looking for a way to do this across two control variables (perhaps stype and both within this data set).

library(survey)
library(zoo)

data(api)

dstrat <- svydesign( id = ~1 , strata = ~stype , weights = ~pw , data = apistrat , fpc = ~fpc )

# some unadjusted results
svyby( ~ mobility , ~ stype , dstrat , svyquantile , seq( 0 , 1 , .1 ) , keep.var = FALSE )

# univariate control totals
stype_adjustment <- 
    data.frame( 
        pctile = seq( 0 , 1 , .1 ) ,
        stype_E = seq( 300 , 400 , 10 ) ,
        stype_M = seq( 400 , 500 , 10 ) ,
        stype_H = seq( 500 , 600 , 10 )
    )

Notice the stype_adjustment data frame, which contains my target percentiles for this variable. Here are the desired 20th, 50th (median), and 80th percentiles for my variable mobility:

stype_adjustment[ stype_adjustment$pctile %in% c( .2 , .5 , .8 ) , ]
  pctile stype_E stype_M stype_H
3    0.2     320     420     520
6    0.5     350     450     550
9    0.8     380     480     580

Those are the targets. With that in mind, here's a pretty straightforward algebra-only adjustment:

# merge the eleven known controls onto all 101 percentiles
all_percentiles <- 
    merge( 
        data.frame( pctile = seq( 0 , 1 , 0.01 ) ) , 
        stype_adjustment , all = TRUE 
    )

# interpolate the in-between percentiles in the control columns
all_percentiles$stype_E <- na.approx( all_percentiles$stype_E )
all_percentiles$stype_M <- na.approx( all_percentiles$stype_M )
all_percentiles$stype_H <- na.approx( all_percentiles$stype_H )

# calculate the pre-adjustment results from the data set
pa <- data.frame( t( data.frame( svyby( ~ mobility , ~ stype , dstrat , svyquantile , seq( 0 , 1 , .01 ) , keep.var = FALSE ) ) )[ -1 , ] )
pa[ , ] <- sapply( pa[ , ] , function( w ) as.numeric( as.character( w ) ) )
names( pa ) <- paste0( "unadj_" , names( pa ) )
pa$pctile = seq( 0 , 1 , 0.01 )

# merge on the unadjusted percentiles
all_percentiles <- merge( all_percentiles , pa )

# find the multiplier within each of your strata
all_percentiles$e_multiplier <- all_percentiles$unadj_E / all_percentiles$stype_E - 1
all_percentiles$m_multiplier <- all_percentiles$unadj_M / all_percentiles$stype_M - 1
all_percentiles$h_multiplier <- all_percentiles$unadj_H / all_percentiles$stype_H - 1

# for each observation in the dataset, figure out the nearest percentile -- and save the associated multiplier
e_factor <- all_percentiles$e_multiplier[ unlist( lapply( dstrat$variables$mobility , function( z ) which.min( abs( z - pa$unadj_E ) ) ) ) ]
m_factor <- all_percentiles$m_multiplier[ unlist( lapply( dstrat$variables$mobility , function( z ) which.min( abs( z - pa$unadj_M ) ) ) ) ]
h_factor <- all_percentiles$h_multiplier[ unlist( lapply( dstrat$variables$mobility , function( z ) which.min( abs( z - pa$unadj_H ) ) ) ) ]

# compute the new mobility variable, adjusted based on your univariate controls
dstrat <- 
    update( 
        dstrat , 
        adjusted_mobility = 
            ifelse( stype == 'E' , mobility / ( 1 + e_factor ) , 
            ifelse( stype == 'M' , mobility / ( 1 + m_factor ) , 
            ifelse( stype == 'H' , mobility / ( 1 + h_factor ) , NA ) ) )
    )

Notice that the weights have not changed, and the unadjusted mobility variable gives the same results:

svyby( ~ mobility , ~ stype , dstrat , svyquantile , c( 0.2 , 0.5 , 0.8 ) , keep.var = FALSE )

  stype statistic1 statistic2 statistic3
E     E         12         17         23
H     H          6         10         15
M     M          8         13         19

So this is not a calibrated data set; it's only a single calibrated variable. The adjusted mobility variable has been scaled up across every percentile:

svyby( ~ adjusted_mobility , ~ stype , dstrat , svyquantile , c( 0.2 , 0.5 , 0.8 ) , keep.var = FALSE )

  stype statistic1 statistic2 statistic3
E     E        319        350        380
H     H        510        544        580
M     M        414        448        480

How could this be done if I have the marginal controls for the variables stype and both, but not for their joint distribution?

> head( apiclus1[ , c( 'stype' , 'both' ) ] )
  stype both
1     H  Yes
2     E  Yes
3     E  Yes
4     E   No
5     E  Yes
6     E  Yes

I'm not sure whether I need some iterative proportional fitting (for weighted data) algorithm, or something else.

Thanks!

Anthony Damico
1 Answer


IPF is a variant of log-linear modeling, and it is no problem for the algorithm if you only have the univariate marginals and not the joint distribution. Can we assume that your goal is to create a new, adjusted mobility variable that can be used in lieu of the unadjusted one? The trick, of course, is that this new, adjusted variable will have to scale correctly under the original, canonical set of weights.
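
For intuition, here is a minimal, self-contained sketch of IPF on a toy stype-by-both table. Everything in it is made up for illustration: the seed crosstab and both sets of marginal targets are placeholder numbers, and the loop simply alternates between rescaling rows and columns until each univariate margin is matched -- no joint control totals are ever needed.

# toy seed crosstab of stype x both (made-up counts; in practice this could be
# the survey's own weighted crosstab, e.g. svytable( ~ stype + both , dstrat ) )
seed <- 
    matrix( 
        c( 300 , 80 , 60 , 400 , 120 , 90 ) , 
        nrow = 3 , 
        dimnames = list( stype = c( 'E' , 'M' , 'H' ) , both = c( 'No' , 'Yes' ) ) 
    )

# hypothetical univariate targets (note they share the same grand total)
row_targets <- c( E = 4400 , M = 1000 , H = 800 )
col_targets <- c( No = 1700 , Yes = 4500 )

# iterative proportional fitting: alternately rescale to match the row margin, then the column margin
fitted <- seed
for ( i in 1:50 ){
    fitted <- fitted * ( row_targets / rowSums( fitted ) )
    fitted <- sweep( fitted , 2 , col_targets / colSums( fitted ) , '*' )
}

# a joint table consistent with both univariate margins
round( fitted )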

So, run a two-way IPF on mobility using the available marginals. Bring the new, adjusted (weighted) mobility values back down to the same unit of analysis as the original weights, as appropriate. Then divide the new, adjusted mobility variable by the original weight to get it into a form that will scale back up to the full adjusted values.
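
In survey-package terms, that recipe might look something like the sketch below. The stype and both control totals here are hypothetical placeholders for your outside-source marginals; rake() performs the IPF on a copy of the design, and the ratio of raked weight to original weight becomes a per-record factor applied to mobility, so the design's canonical weights never change.

library(survey)

data(api)

dstrat <- svydesign( id = ~1 , strata = ~stype , weights = ~pw , data = apistrat , fpc = ~fpc )

# hypothetical univariate control totals -- substitute the outside source's marginals
stype_targets <- data.frame( stype = c( 'E' , 'M' , 'H' ) , Freq = c( 4400 , 1000 , 800 ) )
both_targets <- data.frame( both = c( 'No' , 'Yes' ) , Freq = c( 1700 , 4500 ) )

# (1) rake a copy of the design to the two univariate margins, letting those weights move
draked <- 
    rake( 
        dstrat , 
        sample.margins = list( ~stype , ~both ) , 
        population.margins = list( stype_targets , both_targets ) 
    )

# (2) per-observation adjustment = raked weight / original weight,
# applied to the variable instead of to the weights
adjustment <- weights( draked ) / weights( dstrat )

dstrat <- update( dstrat , adjusted_mobility = mobility * adjustment )

# the design's weights are untouched; only adjusted_mobility carries the calibration
svyby( ~ adjusted_mobility , ~ stype , dstrat , svyquantile , c( 0.2 , 0.5 , 0.8 ) , keep.var = FALSE )

calibrate() from the same package could be swapped in for rake() if you prefer its calibration options; the weight-ratio trick is the same either way.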

Mike Hunter
  • Thank you! So something like this? (1) `rake` the survey data as normal, allowing the weights to change; (2) for each observation, `adjusted mobility` = `original mobility` * raked weight / original weight – Anthony Damico Mar 05 '16 at 19:28
  • Yes...something like that. Note that in scaling the new, adjusted mobility values down to the same unit of analysis as for the original weights, some additional tinkering may be required. In other words, the new, adjusted weights for mobility will likely be aggregate weights. – Mike Hunter Mar 05 '16 at 19:36