I implement n
permutations into a regression analysis, to test the model for stability. Thus I obtain n
odds ratios (ORs) and n
associated 95% CI intervals.
Each permutation represents a matched-pair study. We pair similar case
's with control
's and then run a conditional logistic regression to obtain a measure of association between the outcome of interest and exposure variable (treatment status).
Taking the following example I have implemented into a R
script.
In short what I have done is:
- Take a portion of the a given population
- We assign at dummy variable to the population (1/0) to indicate treatment status
- based on a set of parameters we pair those with treatment status==1 to equivalent treatment status==0
- we define an outcome of interest that we wish to measure if treatment had an effect on the outcome
- We conduct a logistic regression to determine the ORs associated with treatment status
- we repeat this n time, each time obtaining an ORs and associated 95% confidence interval
But what I am not sure, is how I can report on the spread of my data. I generate a different odds ratio and 95% CI for each permutation.
Taking the following hypothetical example, we run a simulation 100 times. It only takes a minute to simulate.
We take an worked exampled from the Matching package in R.
set.seed(123)
# preamble, prepare the data for the simulation
#1.
library(Matching)
library(survival)
#2.
require(doParallel)
cl<-makeCluster(2)
registerDoParallel(cl)
#3.
clusterEvalQ(cl,library(Matching))
clusterEvalQ(cl,library(survival))
m <- 100
Result = foreach(i=1:m,.combine=cbind) %do%{
# attach the data
data(lalonde)
# we want to assess if treatment is associated with greater odds for the outcome of interest
# lets create our hypothetical outcome of interest
lalonde$success <- with(lalonde, ifelse(re78 > 8125, 1, 0))
# lets take a portion of the original population, say only 395
n <- sample(1:445,420, replace = F)
n <- sort(n, decreasing = F)
lalonde <- lalonde[n,]
head(lalonde$age)
# taking from the example from GenMatch (in Matching package)
#The covariates we want to match on
# but we only include some of the original variables (we come back to the others later)
X = cbind(lalonde$age, lalonde$educ, lalonde$black, lalonde$hisp,
lalonde$married, lalonde$nodegr)
#The covariates we want to obtain balance on
BalanceMat <- X
# creat our matrix
genout <- GenMatch(Tr=lalonde$treat, X=X, BalanceMatrix=BalanceMat,
pop.size=16, max.generations=10, wait.generations=1)
# match our collisions on a 1-1 basis
mout <- Match(Y=NULL, Tr=lalonde$treat, X=X, Weight.matrix=genout, ties = F, replace = F)
summary(mout)
# here we create our case and control populations
treat <- lalonde[mout$index.treat,]
control <- lalonde[mout$index.control,]
# and we want to apply a unique identifier for each pair
# we call this command during the regression
treat$Pair_ID <- c(1:length(treat$age))
control$Pair_ID <- treat$Pair_ID
# finally we combine the data
matched <- rbind(treat, control)
# now we run a conditional logitic regression on the paired data to determine the Odds Ratio associated with treatment
# we account for the difference in pairs by the strata() command
# we account for some of the original matching parameters that we removed from the matching process
model_1 <- clogit(success ~ treat + strata(Pair_ID) + re74, matched, method = 'efron')
summary(model_1)
OR_M1 <- exp(model_1$coefficients[1])
CI_U1 <- exp(confint(model_1))[1,2]
CI_L1 <- exp(confint(model_1))[1,1]
Result <- rbind(OR_M1, CI_U1, CI_L1)
}
To summarise the script:
- we take 420 people from the original population (of 445)
- we define the outcome of interest is. That is if the person had
re78 > 8125
yes or no - for each treat==1, we find an equivalent treat==0 based on age, educ, black, hisp, married, nodegr. We only want exact 1-1 matching
- we assign an unique indicator variable for each pair 1,2,3.....x
- We then develop a regression model to determine the OR for our outcome of interest (
re78 > 8125
) associated with the treatment status (=1 relative to =0). - we save the ORs and 95%CI
We can then plot the ORs and shade the 95%CI
plot(Result[1,], ylim=c(0,2.5))
polygon(c(1:m,m:1), c(Result[3,],Result[2,]),col=adjustcolor("grey", alpha=0.4), border = NA)
But how can I summarise the several ORs I obtained, the spread of it and/or an associated confidence level?
EDIT Am I able to assess my study as if it was a meta-analysis. If so, one could implement the solution proposed by @Bernd Weiss here?
For this we need to obtain the natural log of the ORs and the std. err.?
We update the last part of the command to:
.......
model_1 <- clogit(success ~ treat + strata(Pair_ID) + re74, matched, method = 'efron')
summary(model_1)
OR_M1 <- exp(model_1$coefficients[1])
l_OR_T2 <- model_1$coefficients[1]
s_e <- coef(summary(model_1))[1,3]
CI_U1 <- exp(confint(model_1))[1,2]
CI_L1 <- exp(confint(model_1))[1,1]
Result <- rbind(OR_M1, l_OR_T2, s_e, CI_U1, CI_L1)
Using we can then call upon the metagen()
, command
library(meta)
or.fem <- metagen(as.numeric(Result[2,]), as.numeric(Result[3,]), sm = "OR")
Where as.numeric(Result[2,])
is the log(OR) and as.numeric(Result[3,])
is the std. err. Thus we obtain a 95% CI ...... But have we introduced a bias in the CI by the imputations. We see our 95% range is significant (greater than 1), however for each permutation, we only get a lower 95% CI > 1
sum(as.numeric(Result[5,])>1.00)
times. Therefore I think the large n
and thus degrees of freedom
in the meta-analysis are giving us a significant result