1

I implement n permutations into a regression analysis, to test the model for stability. Thus I obtain n odds ratios (ORs) and n associated 95% CI intervals.

Each permutation represents a matched-pair study. We pair similar case's with control's and then run a conditional logistic regression to obtain a measure of association between the outcome of interest and exposure variable (treatment status).

Taking the following example I have implemented into a R script. In short what I have done is:

  1. Take a portion of the a given population
  2. We assign at dummy variable to the population (1/0) to indicate treatment status
  3. based on a set of parameters we pair those with treatment status==1 to equivalent treatment status==0
  4. we define an outcome of interest that we wish to measure if treatment had an effect on the outcome
  5. We conduct a logistic regression to determine the ORs associated with treatment status
  6. we repeat this n time, each time obtaining an ORs and associated 95% confidence interval

But what I am not sure, is how I can report on the spread of my data. I generate a different odds ratio and 95% CI for each permutation.

Taking the following hypothetical example, we run a simulation 100 times. It only takes a minute to simulate.

We take an worked exampled from the Matching package in R.

set.seed(123)    
# preamble, prepare the data for the simulation
    #1.
    library(Matching)
    library(survival)
    #2.
    require(doParallel)
    cl<-makeCluster(2)
    registerDoParallel(cl)
    #3.
    clusterEvalQ(cl,library(Matching))
    clusterEvalQ(cl,library(survival))

    m <- 100


    Result = foreach(i=1:m,.combine=cbind) %do%{

      # attach the data
      data(lalonde)

      # we want to assess if treatment is associated with greater odds for the outcome of interest
      # lets create our hypothetical outcome of interest
      lalonde$success <- with(lalonde, ifelse(re78 > 8125, 1, 0))

      # lets take a portion of the original population, say only 395
      n <- sample(1:445,420, replace = F)
      n <- sort(n, decreasing = F)
      lalonde <- lalonde[n,]
      head(lalonde$age)

      # taking from the example from GenMatch (in Matching package)
      #The covariates we want to match on
      # but we only include some of the original variables (we come back to the others later)
      X = cbind(lalonde$age, lalonde$educ, lalonde$black, lalonde$hisp, 
                lalonde$married, lalonde$nodegr)

      #The covariates we want to obtain balance on
      BalanceMat <- X

      # creat our matrix
      genout <- GenMatch(Tr=lalonde$treat, X=X, BalanceMatrix=BalanceMat, 
                         pop.size=16, max.generations=10, wait.generations=1)


      # match our collisions on a 1-1 basis
      mout <- Match(Y=NULL, Tr=lalonde$treat, X=X, Weight.matrix=genout, ties = F, replace = F)
      summary(mout)

      # here we create our case and control populations
      treat <- lalonde[mout$index.treat,]
      control <- lalonde[mout$index.control,]

      # and we want to apply a unique identifier for each pair
      # we call this command during the regression
      treat$Pair_ID <- c(1:length(treat$age))
      control$Pair_ID <- treat$Pair_ID 

      # finally we combine the data
      matched <- rbind(treat, control)

      # now we run a conditional logitic regression on the paired data to determine the Odds Ratio associated with treatment
      # we account for the difference in pairs by the strata() command
      # we account for some of the original matching parameters that we removed from the matching process
      model_1 <- clogit(success ~ treat + strata(Pair_ID) + re74, matched, method = 'efron')
      summary(model_1)

      OR_M1 <- exp(model_1$coefficients[1])
      CI_U1 <- exp(confint(model_1))[1,2]
      CI_L1 <- exp(confint(model_1))[1,1]

      Result <- rbind(OR_M1, CI_U1, CI_L1)

    }

To summarise the script:

  • we take 420 people from the original population (of 445)
  • we define the outcome of interest is. That is if the person had re78 > 8125 yes or no
  • for each treat==1, we find an equivalent treat==0 based on age, educ, black, hisp, married, nodegr. We only want exact 1-1 matching
  • we assign an unique indicator variable for each pair 1,2,3.....x
  • We then develop a regression model to determine the OR for our outcome of interest (re78 > 8125) associated with the treatment status (=1 relative to =0).
  • we save the ORs and 95%CI

We can then plot the ORs and shade the 95%CI

plot(Result[1,], ylim=c(0,2.5))
polygon(c(1:m,m:1), c(Result[3,],Result[2,]),col=adjustcolor("grey", alpha=0.4), border = NA)

But how can I summarise the several ORs I obtained, the spread of it and/or an associated confidence level?

EDIT Am I able to assess my study as if it was a meta-analysis. If so, one could implement the solution proposed by @Bernd Weiss here?

For this we need to obtain the natural log of the ORs and the std. err.?

We update the last part of the command to:

.......    
model_1 <- clogit(success ~ treat + strata(Pair_ID) + re74, matched, method = 'efron')
      summary(model_1)

  OR_M1 <- exp(model_1$coefficients[1])

  l_OR_T2 <- model_1$coefficients[1]
  s_e <- coef(summary(model_1))[1,3]

  CI_U1 <- exp(confint(model_1))[1,2]
  CI_L1 <- exp(confint(model_1))[1,1]

  Result <- rbind(OR_M1, l_OR_T2, s_e, CI_U1, CI_L1)

Using we can then call upon the metagen(), command

library(meta)
or.fem <- metagen(as.numeric(Result[2,]), as.numeric(Result[3,]), sm = "OR")

Where as.numeric(Result[2,]) is the log(OR) and as.numeric(Result[3,]) is the std. err. Thus we obtain a 95% CI ...... But have we introduced a bias in the CI by the imputations. We see our 95% range is significant (greater than 1), however for each permutation, we only get a lower 95% CI > 1

sum(as.numeric(Result[5,])>1.00)

times. Therefore I think the large n and thus degrees of freedom in the meta-analysis are giving us a significant result

lukeg
  • 391
  • 2
  • 5
  • 14
  • What you are doing looks very much like bootstrapping but there you would sample 445 with replacement not 420. Perhaps I have misunderstood your procedure though. – mdewey Apr 11 '16 at 16:13
  • I only called upon 420 to induce noise, so the same case-controls were not always matched. But if we remove the command (or change the 420 to 445) then we get a varied OR values anyway. Nonetheless, I am unsure how I can report a confidence band? – lukeg Apr 11 '16 at 16:20
  • Is there a way to trick the boot function into taking m permutations instead of the standard sampling with indices – lukeg Apr 13 '16 at 06:35

0 Answers0