I conducted several experiments that share one underlying independent variable (IV): tree species. Each experiment gave me one dependent variable (DV), such as bark pH, rugosity, or water-holding capacity. Now I want to conduct a MANOVA to see whether the tree species differ in the various dependent variables. My analysis is conducted in R.
My model therefore looks like:
pH + rugosity + water-holding capacity + [...] ~ tree species
where I have per tree species...
- 3 measurements of the bark pH,
- 9 measurements of the bark rugosity,
- 4 measurements of the bark thickness,
- 5 measurements of the water-holding capacity,
- 5 measurements of the water retention.
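Counts like these can be tallied directly from a padded data frame. Below is a minimal sketch on made-up numbers; the `toy` data frame and its values are hypothetical and only mimic the layout of my data.

```r
# Hypothetical toy data: two species, shorter DVs padded with NA
toy <- data.frame(
  tree_species = factor(rep(c("A", "B"), each = 6)),
  rugosity = c(2.4, 1.5, 3.1, 2.4, 4.4, 2.2, 6.5, 9.1, 6.0, 5.3, 8.4, 15.3),
  whc      = c(295, 222, 291, 314, NA, NA, 236, 493, 549, 330, NA, NA),
  pH       = c(6.5, 6.8, 5.8, NA, NA, NA, 7.4, 7.2, 7.3, NA, NA, NA)
)

dvs <- c("rugosity", "whc", "pH")

# For each DV, count the non-missing measurements per species
counts <- sapply(toy[dvs], function(x) tapply(!is.na(x), toy$tree_species, sum))
print(counts)
```

In this toy example each species has 6 rugosity values, 4 water-holding capacity values and 3 pH values.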
However, unlike most examples I've found on how to do a MANOVA (e.g. here, here, here), my data stem from different measurements taken on different individuals. So far I've found only this thread discussing unequal sample sizes, but it addresses only unequal sample sizes across the levels of the explanatory factor.
My Question:
My dependent variables all have different sample sizes. Is a MANOVA appropriate for this kind of data? Can I just ignore the differing sample sizes of the variables? Is there an alternative approach, or an alternative statistical test? Does my small sample size matter?
EDIT: What I really want to find out
I really just want a statistical test that tells me whether there is an underlying pattern: do the tree species differ with regard to the dependent variables? In the end, I want to be able to say whether some species have a set of traits that differs from that of other species.
Example Data:
My data looks like this:
> manova_df
# A tibble: 45 x 6
tree_species rugosity bark_mm pH whc ret
<fct> <dbl> <int> <dbl> <dbl> <dbl>
1 AS 2.36 8 6.49 295. 119.
2 AS 1.45 8 6.83 222. 105.
3 AS 3.13 9 5.8 291. 181.
4 AS 2.38 8 NA 314. 214.
5 AS 4.39 7 NA 613. 317.
6 AS 2.21 NA NA NA NA
7 AS 0.810 NA NA NA NA
8 AS 1.58 NA NA NA NA
9 AS 0.934 NA NA NA NA
10 BU 3.34 6 7.22 189. 74.9
# ... with 35 more rows
The NAs stem from the fact that I have different sample sizes but had to get all the variables into one data.frame, so I simply bound the columns of the different measurements together. This means that the individual observations of the various DVs are not tied together, and the order within each tree species level is totally arbitrary!
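To illustrate why the arbitrary row order worries me, here is a minimal sketch on made-up numbers (the `toy` data frame is hypothetical, not my real data). It reorders one DV within a single species and compares the correlation between two DVs before and after; as far as I understand, a MANOVA relies on exactly this kind of between-DV covariance.

```r
# Hypothetical toy data for one species: whc is padded with NA
toy <- data.frame(
  rugosity = c(2.4, 1.5, 3.1, 2.4, 4.4, 2.2),
  whc      = c(295, 222, 291, 314, NA, NA)
)

# Same whc values, but in a different (equally arbitrary) row order
reordered <- toy
reordered$whc[1:4] <- rev(toy$whc[1:4])

# Correlation between the two DVs, using only rows where both are present
r_original  <- cor(toy$rugosity, toy$whc, use = "complete.obs")
r_reordered <- cor(reordered$rugosity, reordered$whc, use = "complete.obs")

print(c(original = r_original, reordered = r_reordered))
```

In this toy case the correlation even changes sign, although the set of values per variable is identical.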
My analysis is pretty straightforward:
mano_mod = manova(cbind(pH, bark_mm, rugosity, whc, ret) ~ tree_species, data = manova_df)
> summary(mano_mod)
Df Pillai approx F num Df den Df Pr(>F)
tree_species 4 2.4207 3.372 20 44 0.0003836 ***
Residuals 12
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
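The residual Df made me check how many rows actually enter the model. The sketch below uses a hypothetical `toy` data frame with the same kind of NA padding and assumes R's default `na.action` (`na.omit`), under which `manova()` keeps only rows that have a value for every DV.

```r
# Hypothetical toy data: two species, shorter DVs padded with NA
toy <- data.frame(
  tree_species = factor(rep(c("A", "B"), each = 6)),
  rugosity = c(2.4, 1.5, 3.1, 2.4, 4.4, 2.2, 6.5, 9.1, 6.0, 5.3, 8.4, 15.3),
  whc      = c(295, 222, 291, 314, NA, NA, 236, 493, 549, 330, NA, NA),
  pH       = c(6.5, 6.8, 5.8, NA, NA, NA, 7.4, 7.2, 7.3, NA, NA, NA)
)

# Rows that have a value for every DV
n_complete <- sum(complete.cases(toy[, c("rugosity", "whc", "pH")]))

# Fit the model; incomplete rows are dropped under the default na.action
fit <- manova(cbind(rugosity, whc, pH) ~ tree_species, data = toy)

n_used   <- nrow(residuals(fit))  # rows actually used in the fit
resid_df <- fit$df.residual       # rows used minus number of species

print(c(complete = n_complete, used = n_used, residual_df = resid_df))
```

Applying the same count to my real data gives 17 rows with a value for every DV, which matches the 12 residual Df in the output above (17 rows minus 5 species). So most of my 45 rugosity measurements are not used at all.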
Full Data Set:
structure(list(tree_species = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("AS", "BU", "CL",
"MB", "PR"), class = "factor"), rugosity = c(2.36, 1.45, 3.13,
2.38, 4.39, 2.21, 0.81, 1.58, 0.93, 3.34, 5.06, 0, 0.77, 12.64,
4.1, 0.8, 1.03, 0.84, 6.49, 9.09, 5.96, 5.32, 8.41, 15.29, 9.91,
7.65, 2.13, 9.43, 10.14, 13.24, 10.26, 9.81, 12.34, 17.23, 16.63,
8.82, 1.68, 0.7, 0.82, 2.43, 0, 0.76, 0.77, 0, 1), bark_mm = c(8L,
8L, 9L, 8L, 7L, NA, NA, NA, NA, 6L, 8L, 8L, 7L, 9L, NA, NA, NA,
NA, 9L, 9L, 8L, 10L, 9L, NA, NA, NA, NA, 5L, 9L, 9L, 8L, 4L,
NA, NA, NA, NA, 5L, 5L, 5L, 6L, NA, NA, NA, NA, NA), pH = c(6.49,
6.83, 5.8, NA, NA, NA, NA, NA, NA, 7.22, 7.11, 7.72, 7.29, NA,
NA, NA, NA, NA, 7.39, 7.18, 7.3, 7.3, NA, NA, NA, NA, NA, 6.76,
6.55, 6.24, NA, NA, NA, NA, NA, NA, 5.76, 6.59, 5.44, NA, NA,
NA, NA, NA, NA), whc = c(295.2, 222.4, 290.6, 314.3, 613.4, NA,
NA, NA, NA, 189.4, 248.2, 336.8, 330.1, 427.8, NA, NA, NA, NA,
236, 492.6, 549.3, 330.1, 370.7, NA, NA, NA, NA, 430, 142.2,
372.4, 260, 176.1, 680, 215, NA, NA, 333.8, 320.6, 282.4, 322.9,
576.7, NA, NA, NA, NA), ret = c(118.9, 104.9, 180.6, 214.5, 317.3,
NA, NA, NA, NA, 74.9, 95.7, 127.3, 150.1, 327.3, NA, NA, NA,
NA, 80.8, 176.7, 255.7, 142.6, 236.6, NA, NA, NA, NA, 148.4,
32.4, 244.2, 66.8, 76.4, 246.1, 73.6, NA, NA, 111.2, 151.3, 102.1,
200.6, 258.1, NA, NA, NA, NA)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -45L))
(If anything is unclear, please ask.)