Low number of observations when comparing means

Question

I would like to test the difference between the mean weights of 4 groups. My dependent variable is continuous, but there are 3, 12, 26 and 56 observations in my groups. Should I use one-way ANOVA or the Kruskal-Wallis H test? If I should be using the Kruskal-Wallis H test and I get a p-value < 0.05, which post-hoc test should I use next?

The choice of ANOVA vs. Kruskal Wallis H test does not depend on sample size. — Peter Flom, Mar 07 '17 at 20:35
What is the reason the group sizes are so different? It is so extreme as to suggest something unusual might be going on which we should look into before venturing an answer. — whuber, Mar 07 '17 at 20:37
@whuber, I have 97 observations and classified them by whether they are smaller or greater than certain values, which led to 4 groups. So I would like to compare their means now. I am expecting to get a p-value < 0.05 because of the way I classified them, and ANOVA produces a p-value of 0.59 whereas Kruskal-Wallis gives 0.02 — Günal, Mar 07 '17 at 20:44
Were the comparisons made using the weights or some other variable? — whuber, Mar 07 '17 at 20:56
@whuber, it was another variable, age: those < 18 are in group 1, those >= 18 and < 25 in group 2, etc. So I have 3 people aged < 18, 12 people in the 2nd group, etc., and want to compare their weights — Günal, Mar 07 '17 at 21:05
Have you considered a regression model? That would be much more powerful (and less arbitrary) than binning the ages into groups. Indeed, often a scatterplot (of weight *vs* age) will settle the issue and a formal test might be unnecessary. It will also provide much more insight into just how weights vary with age. — whuber, Mar 07 '17 at 23:34
Please incorporate information or results by editing your question. — , Mar 09 '17 at 08:24

score 2 · Answer 1 · answered Jan 17 '19 at 08:34

As @whuber points out in his comment above, a regression model is a far better and more powerful approach to analysing your dataset than an ANOVA on binned groups. Splitting a continuous variable into bins discards information, costs power, and makes the result depend on arbitrary cut points (discussed here and here). It is best avoided even in the absence of the imbalanced group problem that you are facing.
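To make the contrast concrete, here is a minimal sketch on synthetic data, not your data. The group sizes (3, 12, 26, 56) and the first two cut points (18 and 25) come from the question; the third cut point (35), the age ranges, and the simulated relationship between weight and age are all assumptions made up for illustration. It uses `scipy.stats`, which is my choice of library rather than anything mentioned in the thread.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic ages with the group sizes from the question (3, 12, 26, 56).
# The age ranges below are hypothetical; only the 18 and 25 cuts are given.
age = np.concatenate([
    rng.uniform(15, 18, size=3),
    rng.uniform(18, 25, size=12),
    rng.uniform(25, 35, size=26),
    rng.uniform(35, 60, size=56),
])

# Hypothetical relationship: weight rises slowly with age, plus noise.
weight = 55 + 0.4 * age + rng.normal(0, 10, size=age.size)

# Approach 1: bin age at fixed cut points, then compare the groups.
cuts = [18, 25, 35]                 # 35 is an assumed threshold
group = np.digitize(age, cuts)      # labels 0..3 for the four age groups
samples = [weight[group == g] for g in range(4)]
group_sizes = [len(s) for s in samples]

anova = stats.f_oneway(*samples)    # one-way ANOVA on the binned groups
kruskal = stats.kruskal(*samples)   # Kruskal-Wallis H test on the same bins

# Approach 2: regress weight on age directly, with no binning.
fit = stats.linregress(age, weight)

print("group sizes:", group_sizes)
print("ANOVA p-value:", anova.pvalue)
print("Kruskal-Wallis p-value:", kruskal.pvalue)
print("regression slope:", fit.slope, "p-value:", fit.pvalue)
```

The regression uses every age value as it is, so it does not depend on where the cut points happen to fall, and it is not affected by one group having only 3 observations.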

score 0 · Answer 2 · answered Jan 17 '19 at 08:56

I agree with mkt. I've done many similar comparisons and I suggest you try linear regression. Depending on the nature of your response variable, you can use a general linear model or a generalized linear model. With a general linear model, you assume that the residuals (errors) follow a normal (Gaussian) distribution. With a generalized linear model, you choose an error distribution family, which may be something other than Gaussian, to suit your response variable. To decide which error family to use, examine your response variable: whether it can take negative values, whether it is continuous or discrete, whether it is binomial, and so on. There is a lot more to say, but I suggest you start slowly and learn linear regression well, as it is very widely used.

P.S. Remember that you should always do some data exploration first (box plots, in your case) to get an initial idea of your data and to see whether it is worth exploring further.
