
I have a dataset with sparse longitudinal data: 30 individuals with 1 sample, 30 individuals with 2 samples, and 5 individuals with 3 samples. Various categorical variables are known for each individual, and I want to see whether these variables are associated with a drug level (a continuous variable). Let's focus on one categorical variable: homelessness. The main issue is that the number of people who are homeless is not equal to the number who are not, so I cannot perform a simple Wilcoxon signed-rank test or most other paired tests. As a result, I fit one linear model of the relationship between homelessness and the drug levels for those who are homeless, using a random slope/intercept for each individual, and another for those who are not homeless. Of course, if I just run anova(linearmodel1, linearmodel2), I get the error:

"all fitted objects must use the same number of observations".

Edit:

As pointed out in a comment by @Roland (see the link in the comments below), one approach is to combine the data and fit 2 models: 1 with the variable homelessness and 1 without. With polynomial regression this can be done as follows:

###Create some example data
mydata1 <- subset(iris, Species == "setosa", select = c(Sepal.Length, Sepal.Width))
mydata2 <- subset(iris, Species == "virginica", select = c(Sepal.Length, Sepal.Width))

#add a grouping variable
mydata1$g <- "a"
mydata2$g <- "b"

#combine the datasets
mydata <- rbind(mydata1, mydata2)

#model without grouping variable
fit0 <- lm(Sepal.Width ~ poly(Sepal.Length, 2), data = mydata)

###model with grouping variable
fit1 <- lm(Sepal.Width ~ poly(Sepal.Length, 2) * g, data = mydata)

#Compare models
anova(fit0, fit1)

# But this doesn't work in nlme
library(nlme)
fit1 <- lme(Sepal.Width ~ Sepal.Length * g, data = mydata)
# It throws an error:
# "invalid formula for groups"

###### Not sure if this is the correct way
### nlme
library(nlme)
# model without grouping variable
model0 <- gls(Sepal.Width ~ Sepal.Length, data = mydata)
# model with grouping variable
model1 <- lme(Sepal.Width ~ Sepal.Length, random = ~ 1 | g, data = mydata)
anova(model0, model1)
### lme4
library(lme4)
# model without grouping variable
fm0 <- lm(Sepal.Width ~ Sepal.Length, data = mydata)
# model with grouping variable
fm1 <- lmer(Sepal.Width ~ Sepal.Length + (1 | g), data = mydata, REML = FALSE)
# list the lmer fit first so that lme4's anova method is the one dispatched
anova(fm1, fm0)

But how do I create two models with and without a specific group using nlme/lme4?

Thanks in advance

user250071
  • There seems to be some confusion in your question regarding what a paired design (which makes a paired test or mixed-effects model appropriate) actually is. Anyway, you can easily solve this by not creating two separate models but one combined model. The approach would be similar to what I show in [this answer](https://stats.stackexchange.com/questions/231059/compare-the-statistical-significance-of-the-difference-between-two-polynomial-re/231091#231091). – Roland Jun 06 '19 at 06:05
  • Ok, so I have to generate 2 models from a merged dataset, one with the grouping and one without. While a polynomial regression is similar, it is not the same and is implemented differently. For example, what is the equivalent of "poly(Sepal.Length, 2) * g" for a linear model, say in the nlme package or the lme4 package? – user250071 Jun 06 '19 at 18:50
  • I believe `nlme::lme` has an `anova.lme` method. – AdamO Jun 06 '19 at 20:52
  • Yes, of course you can use ANOVAs with lme4, but I do not know how to exclude or include a variable using these packages. – user250071 Jun 06 '19 at 20:58
  • Show your lme4 model formula and I can explain ... – Roland Jun 07 '19 at 06:01
  • @Roland, I'm going to stick with the example for consistency. Please see the updated question with some formulas for both nlme and lme4. – user250071 Jun 07 '19 at 23:00
  • The grouping variable indicating the dataset is a fixed effect and not the same as the grouping variable of the random effect (which should be a subject ID). The only challenge here is how to specify different variables for the random effects for both datasets. There are no repeated measures in the iris dataset. – Roland Jun 08 '19 at 05:33
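
The distinction drawn in the last comment (group as a fixed effect, subject ID as the random-effect grouping) can be sketched roughly as follows. This is only an illustration under assumptions that are not part of the question: it uses the `Orthodont` data shipped with nlme because iris has no repeated measures, and `Sex` stands in for a two-level variable such as homelessness. Both models are fitted by maximum likelihood so that fits with different fixed effects can be compared.

```r
library(nlme)   # lme() and the Orthodont example data
library(lme4)   # lmer()

# Assumption: Orthodont is a stand-in for the real data.
# 'distance' is measured repeatedly on each Subject; Sex is the two-level group.
dat <- as.data.frame(Orthodont)

# nlme: identical random intercept per subject, fixed effects without/with the group
m0 <- lme(distance ~ age,       random = ~ 1 | Subject, data = dat, method = "ML")
m1 <- lme(distance ~ age * Sex, random = ~ 1 | Subject, data = dat, method = "ML")
anova(m0, m1)   # likelihood-ratio test of the Sex and age:Sex terms

# lme4: the same pair of nested models
f0 <- lmer(distance ~ age + (1 | Subject),       data = dat, REML = FALSE)
f1 <- lmer(distance ~ age * Sex + (1 | Subject), data = dat, REML = FALSE)
anova(f0, f1)
```

Because both models in each pair are fitted to the same combined data set, they use the same number of observations, so the comparison does not trigger the "all fitted objects must use the same number of observations" error.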

1 Answer

For a parametric test, the t-test is the one you should use, but you have to make sure the populations are normally distributed. For a nonparametric test, it would be the Wilcoxon rank-sum test on two independent samples.
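
As a minimal sketch, the two tests named here could be run in base R as shown below. The data are the two iris species from the question's example, used purely as an assumption for illustration. Both calls treat every row as an independent observation, so they do not account for repeated samples from the same individual.

```r
# Two independent groups, rebuilt from the question's iris example
mydata <- subset(iris, Species %in% c("setosa", "virginica"))
mydata$g <- droplevels(mydata$Species)

# Parametric: two-sample t-test (R defaults to the Welch version,
# which does not assume equal variances)
t.test(Sepal.Width ~ g, data = mydata)

# Nonparametric: Wilcoxon rank-sum (Mann-Whitney) test for two independent samples;
# exact = FALSE requests the normal approximation, since the data contain ties
wilcox.test(Sepal.Width ~ g, data = mydata, exact = FALSE)
```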

rainman