I want to compare the means of non-random samples to the population mean. However, most standard tests (e.g. t-tests, ANOVA, the Welch test) assume that the samples are drawn randomly from the population. Given that, is there any way to test for a significant difference between the means of non-random samples? Thanks.

- Does this answer your question? [Can non-random samples be analyzed using standard statistical tests?](https://stats.stackexchange.com/questions/13607/can-non-random-samples-be-analyzed-using-standard-statistical-tests) – Ertxiem - reinstate Monica Jan 10 '20 at 11:50
- Nope :) I believe there are other methods, which are not discussed in that post. I read it before posting. – Maria Sahakyan Jan 10 '20 at 11:52
1 Answer
What tests generally do is tell you whether what you have observed is unlikely under the null hypothesis, which is a probability model for random data. Note that this does not mean that you can only use them if the data are in fact random. If you apply a statistical test to nonrandom data and get a significant result, it means that your data look clearly different from what the random model would have predicted. If your result is not significant, it means that your data are (in the sense of the test statistic) indistinguishable from random data generated under the null hypothesis. This may be valuable information even if the data are in fact not random.
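To make "unlikely under the null hypothesis" concrete, here is how it reads for a one-sample t-test. That specific test is an assumption on my part, chosen because the question compares a sample mean $\bar{x}$ with a known population mean $\mu_0$; $s$ is the sample standard deviation and $n$ the sample size.

$$
t_{\text{obs}} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad
p = P\left(|T_{n-1}| \ge |t_{\text{obs}}|\right)
$$

Here $T_{n-1}$ follows a t distribution with $n-1$ degrees of freedom, which is the distribution of the statistic if the observations are independent random draws from a normal population with mean $\mu_0$. The p-value only measures how far the data are from that random model; it says nothing about why they are far from it.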
The problem is this: if you reject the null hypothesis, which in your example is "the means are the same", this could be because the means are meaningfully different, or it could just be because the data are nonrandom and the nonrandom process has caused the observed mean difference. You may have knowledge about the subject matter and the data collection process that gives you an idea of how this could have happened. Or you may think that the data collection, despite being nonrandom, should not have produced such different means if the two treatments you may be comparing do not have different effects (which is probably what you are interested in). But the data alone cannot tell the difference. Non-rejection of the null hypothesis may also be caused by nonrandom sampling in some way (although a non-rejection can never be interpreted as "proving that the null hypothesis is true" anyway).
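A small simulation may illustrate the point. This is only a sketch under stated assumptions: the population is simulated, the selection rule (only units with a value above 45 can be sampled) is a hypothetical stand-in for a nonrandom collection process, and the p-value uses a normal approximation to the t distribution, which is reasonable for samples of 200.

```python
import math
import random
import statistics


def one_sample_t(sample, mu0):
    """Return the t statistic and a two-sided p-value (normal approximation)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    t = (mean - mu0) / se
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, p


rng = random.Random(0)  # fixed seed so the run is reproducible

# Simulated population; its mean plays the role of the known population mean.
population = [rng.gauss(50, 10) for _ in range(100_000)]
mu0 = statistics.fmean(population)

# A genuinely random sample from the population.
random_sample = rng.sample(population, 200)

# Hypothetical nonrandom mechanism: only units above 45 are reachable.
reachable = [x for x in population if x > 45]
biased_sample = rng.sample(reachable, 200)

t_random, p_random = one_sample_t(random_sample, mu0)
t_biased, p_biased = one_sample_t(biased_sample, mu0)

print(f"random sample: t = {t_random:.2f}, p = {p_random:.4f}")
print(f"biased sample: t = {t_biased:.2f}, p = {p_biased:.2g}")
```

Both samples come from the same population, so there is no "meaningful" difference to find. The biased sample is nevertheless rejected very clearly, and nothing in the test output distinguishes that rejection from one caused by a real effect.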
So you may run the test; however, the result may not tell you what you want to know. Proper background knowledge may help you to interpret it, though. (A Bayesian would tell you that in this case you should use a Bayesian approach and incorporate your knowledge in the prior.)
As statistical tests generally have probability models for random data as their null hypothesis, it is not possible to "solve" this problem with a different statistical test. Arguably it cannot be solved in other statistical ways either (except by using background information), because ultimately the data on their own cannot tell you whether observed mean differences are caused by meaningful differences between treatments or are just due to nonrandom sampling.
