Hypothesis testing for mean difference of 2 samples

Question

For example,

\begin{array} {|r|r|r|}\hline & \text{Sample}\enspace X & \text{Sample}\enspace Y \\ \hline \text{mean} & 14 & 20 \\ \hline \text{median} & 5 & 5 \\ \hline \end{array}

How should we go about testing whether $mean_Y > mean_X$ is statistically significant?

My thought is: It seems that the variables in the 2 groups don't follow a normal distribution (mean != median). But if the sample size is large enough, we can use the two-sample t-test (parametric). However, if our sample size is too small, we should use Wilcoxon-Mann-Whitney (or rank sum) test (non-parametric).
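For concreteness, here is a minimal sketch of running both tests mentioned above, assuming Python with NumPy and SciPy. The data are simulated from lognormal distributions purely for illustration (both with median near 5 but different means); they are not the samples summarized in the table, and the helper name `compare_tests` is hypothetical. Keep in mind that the two tests address different hypotheses: Welch's t-test is about the difference in population means, while Wilcoxon-Mann-Whitney is about whether values from one population tend to be larger than values from the other.

```python
import numpy as np
from scipy import stats

# Illustrative data only (an assumption, not the asker's samples):
# both lognormal populations have median exp(1.6), roughly 5,
# but y has a larger sigma and therefore a larger population mean.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=1.6, sigma=1.2, size=40)
y = rng.lognormal(mean=1.6, sigma=1.5, size=40)


def compare_tests(x, y):
    """Run both one-sided tests with the alternative 'Y tends to exceed X'."""
    # Welch's t-test (no equal-variance assumption): alternative is mean_Y > mean_X.
    t_res = stats.ttest_ind(y, x, equal_var=False, alternative="greater")
    # Wilcoxon-Mann-Whitney: alternative is that Y is stochastically larger than X.
    u_res = stats.mannwhitneyu(y, x, alternative="greater")
    return {
        "welch_t_p": float(t_res.pvalue),
        "mann_whitney_p": float(u_res.pvalue),
    }


results = compare_tests(x, y)
print(results)
```

Because the populations here share a median but differ in mean, the two p-values need not agree, which is one reason the choice of test should follow the hypothesis of interest rather than sample size alone.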

Is that a good approach? Not sure if I'm missing anything here.

Thank you!

Note that hypotheses are about populations, not samples. If (as it seems) your hypothesis of interest involves the mean, you could directly do a nonparametric test of means (e.g a permutation test). How can you decide if sample size is 'large enough'? — Glen_b, Sep 15 '20 at 23:36
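The permutation test suggested in this comment can be sketched as follows, using only the Python standard library. This is a Monte Carlo version under stated assumptions: the function name, the number of resamples, and the toy data are all hypothetical, and the null hypothesis being tested is that the group labels are exchangeable (the two populations have the same distribution), with the difference in sample means as the test statistic.

```python
import random
from statistics import fmean


def permutation_test_mean_diff(x, y, n_resamples=10_000, seed=0):
    """One-sided Monte Carlo permutation test for H1: mean(Y) > mean(X)."""
    rng = random.Random(seed)
    observed = fmean(y) - fmean(x)
    pooled = list(x) + list(y)
    n_x = len(x)
    at_least_as_large = 0
    for _ in range(n_resamples):
        # Randomly reassign the group labels by shuffling the pooled data.
        rng.shuffle(pooled)
        diff = fmean(pooled[n_x:]) - fmean(pooled[:n_x])
        if diff >= observed:
            at_least_as_large += 1
    # Count the observed labelling itself so the p-value is never exactly 0.
    return (at_least_as_large + 1) / (n_resamples + 1)


# Toy data (hypothetical), chosen only to show the call.
x_demo = [1, 2, 3, 4, 5]
y_demo = [6, 7, 8, 9, 10]
print(permutation_test_mean_diff(x_demo, y_demo))
```

With very small samples the full set of relabellings could be enumerated instead of sampled; the Monte Carlo version is used here only to keep the sketch short.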
Thanks for the suggestion. I'm not sure, but I think, based on the Central Limit Theorem, a sample size >= 30 is usually large enough? Most importantly, why don't we use the two-sample t-test or the Wilcoxon-Mann-Whitney test here? My ultimate goal is to test the statistical significance of the mean difference. — KatieN, Sep 16 '20 at 00:00
1. Note that the actual central limit theorem makes no reference to any specific sample size; it's about the behaviour of the standardized sample mean as n increases beyond any finite value, e.g. see https://en.wikipedia.org/wiki/Central_limit_theorem ... 2. A t-statistic is not just a mean. 3. 30 is not always sufficient. See (i) https://stats.stackexchange.com/questions/81074/how-useful-is-the-clt-in-applications/81087#81087 (ii) https://stats.stackexchange.com/questions/412606/central-limit-theorem-only-needs-sample-size-n/412608#412608 ... ctd — Glen_b, Sep 16 '20 at 00:14
(iii) https://stats.stackexchange.com/questions/437372/central-limit-theorem-and-gaussian-distribution-gaussian-distributions-and-habi/437379#437379 (and many more on site) 4. when this works it just implies that the test will have about the right significance level; but people usually also care about power. ... So far I've basically been addressing the premises of the question, but that last question in your comment there would be best addressed in an answer. — Glen_b, Sep 16 '20 at 00:15
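Whether a given sample size is "large enough" for the significance level to be about right can be checked by simulation for a population you consider plausible. The sketch below assumes NumPy and SciPy; the lognormal population, its `sigma`, the number of simulations, and the function name `rejection_rate` are illustrative assumptions, not something taken from the thread. Both samples are drawn from the same population, so the null hypothesis is true and the returned rate estimates the actual type I error of a one-sided Welch t-test at nominal level `alpha`.

```python
import numpy as np
from scipy import stats


def rejection_rate(n_x, n_y, sigma=1.5, alpha=0.05, n_sim=2000, seed=0):
    """Estimate the type I error rate of a one-sided Welch t-test by simulation."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated pair of samples from the SAME skewed population.
    x = rng.lognormal(mean=0.0, sigma=sigma, size=(n_sim, n_x))
    y = rng.lognormal(mean=0.0, sigma=sigma, size=(n_sim, n_y))
    res = stats.ttest_ind(y, x, axis=1, equal_var=False, alternative="greater")
    # Fraction of simulated data sets in which the true null is rejected.
    return float(np.mean(res.pvalue < alpha))


for n_x, n_y in [(30, 30), (10, 50), (200, 200)]:
    print(n_x, n_y, rejection_rate(n_x, n_y))
```

Comparing the printed rates with the nominal 0.05 for different sample sizes and populations gives a rough picture of the significance level only; as the comment notes, power is a separate matter and would need simulations under an alternative.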
You might find that a few relevant questions and answers for your issues are on site already -- such as https://stats.stackexchange.com/questions/121852/how-to-choose-between-t-test-or-non-parametric-test-e-g-wilcoxon-in-small-sampl/123389 ... it would be good to search for existing questions and identify what is not already covered by questions here — Glen_b, Sep 16 '20 at 00:18
Thank you! I also have some questions: 1/ From your answer here: https://stats.stackexchange.com/questions/121852/how-to-choose-between-t-test-or-non-parametric-test-e-g-wilcoxon-in-small-sampl, it seems that we could use any of these 3 tests (t-test, Mann-Whitney, permutation test) depending on the sample size? How did you define whether n is "medium-large/moderately small/very small"? 2/ When the 2 samples have the same medians but different means, is it correct to say that the distribution isn't normal? If not, what else should I consider? — KatieN, Sep 16 '20 at 00:51
In respect of the second question: Note that the *sample* mean can differ from the sample median even when their population values are identical, and that many very-non-normal distributions have equal population mean and median. You should probably reconsider the value of assessing normality of your sample for the purpose of choosing a test, e.g. see https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless - I particularly recommend Harvey Motulsky's short answer (though much more could be said) — Glen_b, Sep 16 '20 at 01:48
Sorry I'm still confused, so from what you said, I shouldn't assess the normality of my samples (by ignoring the mean & median differences in this case), and instead, use the sample size for choosing the test (as mentioned here https://stats.stackexchange.com/questions/121852/how-to-choose-between-t-test-or-non-parametric-test-e-g-wilcoxon-in-small-sampl), is that correct? — KatieN, Sep 16 '20 at 01:58
I don't mean to imply that you can just ignore the normality assumption, simply that checking it for a sample and then using that to decide which test to use *on that same set of data* is problematic; it affects the properties of any tests you might choose between — Glen_b, Sep 16 '20 at 02:17
On the "what's small vs medium-large" issue -- choosing where one should place the boundaries isn't some hard and fast thing, unfortunately; it depends on the circumstances (how sensitive some things are depends on the population distribution, for example, and how much we care about that sensitivity depends on our individual circumstances and preferences), but in what I was writing "small" can mean things like "small enough that the discreteness of a nonparametric test affects available significance levels" or ... ctd — Glen_b, Sep 16 '20 at 02:29
ctd.. "small enough that asymptotic relative efficiency may noticeably overestimate relative power for a Wilcoxon-Mann-Whitney". Specific sorts of concerns are mentioned in that answer — Glen_b, Sep 16 '20 at 02:29
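The first sense of "small" in this comment (discreteness limiting the available significance levels) can be made concrete with a little arithmetic. With no ties, an exact rank-sum test has only as many equally likely label assignments as there are ways to choose which observations belong to the first sample, so the most extreme one-sided p-value is one over that count. The sketch below uses only the standard library; the function name is hypothetical and the no-ties, exact-test setting is an assumption.

```python
from math import comb


def min_exact_p_value(n_x, n_y, two_sided=False):
    """Smallest p-value an exact rank-sum test can attain (assuming no ties)."""
    # The most extreme arrangement is 1 of comb(n_x + n_y, n_x) equally likely ones.
    p = 1 / comb(n_x + n_y, n_x)
    return min(1.0, 2 * p) if two_sided else p


for n in (3, 4, 5, 10):
    print(n, min_exact_p_value(n, n), min_exact_p_value(n, n, two_sided=True))
```

For example, with 3 observations per group there are 20 arrangements, so a two-sided exact test cannot reach the 5% level at all, however well separated the samples are.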
