I have a question about multiple regression with an unbalanced grouping factor. Essentially, what I am doing is an ANCOVA, but the interaction term turns out to be significant (which is interesting!), so I've chosen not to call it a true ANCOVA.
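A significant group × covariate interaction is the usual signal that ANCOVA's homogeneity-of-regression-slopes assumption does not hold. Here is a minimal R sketch of that comparison on made-up data (the names `d`, `m_parallel`, `m_separate` and `slopes_test` are hypothetical). It compares a common-slope model against a separate-slopes model with a nested F test:

```r
# Hypothetical simulated data: three unbalanced groups (n = 30, 32, 10),
# a continuous covariate `score`, and an outcome `dv`. Values are invented.
set.seed(42)
group <- factor(rep(1:3, times = c(30, 32, 10)))
score <- sample(1:10, 72, replace = TRUE)
dv    <- sample(1:7, 72, replace = TRUE)
d     <- data.frame(group, score, dv)

# Classic ANCOVA: one common slope for score, separate intercepts per group
m_parallel <- lm(dv ~ score + group, data = d)
# Separate-slopes model: adds the score:group interaction (2 extra df)
m_separate <- lm(dv ~ score * group, data = d)

# Nested F test of homogeneity of regression slopes
slopes_test <- anova(m_parallel, m_separate)
print(slopes_test)
```

If this test rejects, the clusters' slopes on score differ, and a single "adjusted group effect" is no longer one number.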
The Data
The dataset consists of 72 individuals who responded to many different measures, collected for a cluster analysis intended to uncover relatively distinct subgroups within the dataset. The analysis produced three clusters, of sizes n = 30, n = 32, and n = 10. These clusters were interpreted as part of a descriptive analysis.
An independent dataset describes these same 72 individuals on two separate continuous measures: score and dv. The goal of my current project is to assess the effect of group (cluster membership, unbalanced), score, and their interaction on dv.
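It may help to see what the model `dv ~ score * group` actually estimates with three clusters. The sketch below uses simulated data, and the names `m_ref` and `m_cell` are illustrative. It fits the default reference-level parameterization alongside an equivalent one that gives each cluster its own intercept and slope directly:

```r
set.seed(1)
group <- factor(rep(1:3, times = c(30, 32, 10)))
score <- sample(1:10, 72, replace = TRUE)
dv    <- sample(1:7, 72, replace = TRUE)
d     <- data.frame(group, score, dv)

# Default parameterization: group 1 is the reference level, so the
# `score` coefficient is group 1's slope and score:group2 / score:group3
# are differences in slope relative to group 1
m_ref <- lm(dv ~ score * group, data = d)

# Equivalent "cell-means" parameterization: one intercept and one slope
# per group, with no shared reference level
m_cell <- lm(dv ~ 0 + group + group:score, data = d)
coef(m_cell)

# Both models span the same column space, so their fitted values agree
all.equal(fitted(m_ref), fitted(m_cell))
```

Both fits have six coefficients. The second form is often easier to read when the interaction is the finding of interest.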
The Data (Example)
g1 <- rep(1, 30)
g2 <- rep(2, 32)
g3 <- rep(3, 10)
group <- as.factor(c(g1, g2, g3))
score <- sample(1:10, 72, replace = TRUE)
dv <- sample(1:7, 72, replace = TRUE)
# Build the data frame directly (not via cbind()) so group stays a factor
data <- data.frame(group, score, dv)
head(data)
group score dv
1 1 9 5
2 1 3 6
3 1 10 6
4 1 10 6
5 1 10 6
6 1 4 5
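One detail in the example code is worth checking. `cbind()` on vectors returns a matrix, and a matrix holds a single type, so `data.frame(cbind(group, score, dv))` silently turns `group` back into its integer codes. `lm()` would then treat cluster membership as a continuous predictor. Here is a quick check on the same simulated structure, where `bad` and `good` are illustrative names:

```r
set.seed(3)
group <- factor(rep(1:3, times = c(30, 32, 10)))
score <- sample(1:10, 72, replace = TRUE)
dv    <- sample(1:7, 72, replace = TRUE)

# cbind() on vectors builds a matrix, which must have a single type,
# so the factor is replaced by its integer codes
bad  <- data.frame(cbind(group, score, dv))
# Passing the vectors straight to data.frame() keeps group as a factor
good <- data.frame(group, score, dv)

class(bad$group)   # "integer"
class(good$group)  # "factor"

# With a numeric group, lm() fits a single slope for group (1 df)
# instead of two dummy variables (2 df)
length(coef(lm(dv ~ score * group, data = bad)))   # 4
length(coef(lm(dv ~ score * group, data = good)))  # 6
```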
My Question
1) Can I run this analysis even though my groups are so unbalanced? If I understand correctly, with type III SS all groups are weighted equally, but I'm not sure whether that solves the issue so simply.
For example:
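library(car)
# Sum-to-zero contrasts so type III tests of lower-order terms are interpretable
options(contrasts = c("contr.sum", "contr.poly"))
fit <- lm(dv ~ 1 + score * group, data = data)
Anova(fit, type = "III")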
2) If not, is there some other way to proceed?
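With an interaction in the model, two choices shape what the type III tests mean, arguably more than the unequal group sizes do. First, sum-to-zero contrasts make the type III test of `score` a test of the unweighted average slope across clusters rather than the slope in cluster 1 alone. Second, centering `score` decides where the clusters are compared. Without centering, the `group` test compares clusters at score = 0, which is outside the observed 1 to 10 range.

The base-R sketch below uses simulated data. It uses `drop1()` with scope `. ~ .` in place of `car::Anova(type = 3)`, assuming the two give the same F tests for an `lm` fit (worth confirming against the `car` output). The names `score_c`, `fit_raw`, `fit_c`, `t3_raw` and `t3_c` are illustrative.

```r
set.seed(2)
group <- factor(rep(1:3, times = c(30, 32, 10)))
score <- sample(1:10, 72, replace = TRUE)
dv    <- sample(1:7, 72, replace = TRUE)
d     <- data.frame(group, score, dv)

# Center the covariate so the group effect is evaluated at the mean score
# rather than at score = 0, which lies outside the observed range
d$score_c <- d$score - mean(d$score)

# Sum-to-zero contrasts for group, set per model instead of via options()
fit_raw <- lm(dv ~ score * group, data = d,
              contrasts = list(group = "contr.sum"))
fit_c   <- lm(dv ~ score_c * group, data = d,
              contrasts = list(group = "contr.sum"))

# drop1() with scope . ~ . removes each term's columns even when a
# higher-order term contains it, i.e. type III tests under these contrasts
t3_raw <- drop1(fit_raw, . ~ ., test = "F")
t3_c   <- drop1(fit_c,   . ~ ., test = "F")
print(t3_c)

# The interaction and score tests do not depend on centering; the group test does
t3_raw["score:group", "F value"]
t3_c["score_c:group", "F value"]
```

Because the interaction is significant, the per-cluster slopes (for example from `summary(fit_c)` or a `0 + group + group:score` fit) are probably more informative than the `group` row alone. The n = 10 cluster's slope will typically carry a wider standard error.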
I am looking for any suggestions / guidance as I try to sort this out.
Thanks!