1

We have had good discussions on hypothesis testing with large data: with a large amount of data, even a tiny difference between two samples can be detected, and we are almost certain to reject the null hypothesis.

Effect size is one "fix", but I am interested in another fix: in addition to saying that the samples are different, saying how large the difference is.

I do not know how to do this in R. Assume we are doing a two-sample t-test: how can I test whether the means differ by a certain amount, say $0.1$?

The following code tests whether two means are equal. How can I modify it to test whether the two means differ by $0.1$?

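sample1 <- rnorm(1e5)     # 100,000 draws from a standard normal
sample2 <- rnorm(1e5)     # independent second sample, same distribution
t.test(sample1, sample2)  # Welch two-sample t-test of equal means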


    Welch Two Sample t-test

data:  sample1 and sample2
t = -0.5542, df = 2e+05, p-value = 0.5794
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.01124444  0.00628720
sample estimates:
    mean of x     mean of y 
-0.0002658978  0.0022127222 

Edit:

I was trying to ask whether the difference between the means of the two samples is within the range $-0.1$ to $0.1$, not whether it is exactly equal to $0.1$.

Reading Peter Flom's comment, does this mean that modifying the t-test will not meet my needs?

Haitao Du
  • In a regular two-sample t-test the null is "the two means are equal". In a test of equivalence, the null is "the two means differ by at least XXX". I think you want something else, namely a null of "the two means are within XXX" – Peter Flom Oct 06 '16 at 21:33

4 Answers

2

R code questions are actually off-topic here, but the code is:

t.test(sample1, sample2, alternative = "two.sided", mu = 0.1)

I think the question should stay here, however, because it raises a statistical question. My view is that looking for an effect larger than a certain amount makes a lot of sense, but it can be hard to choose an amount.
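As a rough sketch of the equivalence reading raised in the comments (is the difference in means within ±0.1?), two one-sided tests (TOST) can be built from t.test alone. The seed, the 0.05 level, and the variable names are assumptions for illustration, and this may not be the test the question is really after.

```r
set.seed(1)            # arbitrary seed, only so the sketch is reproducible
sample1 <- rnorm(1e5)
sample2 <- rnorm(1e5)

margin <- 0.1          # equivalence margin taken from the question
alpha  <- 0.05         # assumed significance level

# H0: difference <= -margin   vs   H1: difference > -margin
lower <- t.test(sample1, sample2, mu = -margin, alternative = "greater")
# H0: difference >=  margin   vs   H1: difference <  margin
upper <- t.test(sample1, sample2, mu =  margin, alternative = "less")

# TOST declares the means equivalent only if BOTH one-sided tests reject,
# so the overall p-value is the larger of the two.
tost_p     <- max(lower$p.value, upper$p.value)
equivalent <- tost_p < alpha
equivalent
```

Note that the null here is "the means differ by at least 0.1", which is the reverse of the usual t-test setup, so a small p-value supports the claim that the difference is inside the margin.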

Stefan
Peter Flom
  • Thanks, I thought about asking it on Stack Overflow, but as you said, it raises a statistical question. – Haitao Du Oct 06 '16 at 21:17
  • No, it's not a Stack Overflow question either, because it's not really about programming. It's an interesting statistics question along with a very basic R question. – Peter Flom Oct 06 '16 at 21:18
  • two.sided, not two-sided. But I'm not sure this is right; I think the OP really wants the null hypothesis to be that mu is between -0.1 and 0.1, and the alternative to be that abs(mu) > 0.1 (not mu = 0.1)? – Peter Ellis Oct 06 '16 at 21:20
  • Thanks for the syntax fix but I think what I had does test abs(mu) > 0.1. Exactly 0.1 doesn't make sense. Unless I am missing something (which certainly could be the case). Are you suggesting OP wants a test of equivalence? – Peter Flom Oct 06 '16 at 21:22
  • @PeterFlom sorry, I was trying to ask about the difference being within the range -0.1 to 0.1, not exactly equal to 0.1 – Haitao Du Oct 06 '16 at 21:23
  • What is -0.1 to 0.1? Confidence interval? – Jon Oct 06 '16 at 21:25
  • Now you've got me confused. A test of equivalence is inherently quite different from a (regular) t test. I don't think a TofE is a solution to the big data issue. – Peter Flom Oct 06 '16 at 21:26
  • I'm pretty sure your test has the null hypothesis that difference = 0.1 and "alternative hypothesis: true difference in means is not equal to 0.1" (from the output). So the small p-value it returns just means there is evidence against the difference in means being exactly 0.1. – Peter Ellis Oct 06 '16 at 21:28
  • I think @hxd1011 needs to clarify his question – Jon Oct 06 '16 at 21:28
2

There is no usual way to use a t-test to test whether the absolute difference is larger than 0.1, or even whether the difference lies within a given range.

However, your goal of assessing how large the difference is can be achieved by using the confidence interval. It's already in your results, since R and most statistical packages produce a confidence interval when told to perform a t-test:

95 percent confidence interval:
 -0.01124444  0.00628720
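
The interval can also be pulled out of the test result programmatically and compared with a range of interest. A minimal sketch, assuming the ±0.1 range from the question and an arbitrary seed; the variable names are illustrative:

```r
set.seed(1)                      # arbitrary seed for reproducibility
sample1 <- rnorm(1e5)
sample2 <- rnorm(1e5)

res <- t.test(sample1, sample2)  # default 95% confidence level
ci  <- res$conf.int              # numeric vector: c(lower, upper)

margin <- 0.1                    # range of interest, from the question
# TRUE when the whole interval lies strictly inside (-margin, margin)
within_margin <- ci[1] > -margin && ci[2] < margin

ci
within_margin
```

A different confidence level can be requested with the conf.level argument of t.test.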
Pere
  • I think this is the best approach. Fuzzy null hypotheses are tricky in the frequentist paradigm, and more attention to the effect size rather than a sharp hypothesis test is the best and simplest thing to explain and encourage. – Peter Ellis Oct 06 '16 at 22:25
1

I'm not going to attempt to answer the R portion of this question, but I wanted to comment on this: "Effect size is one "fix", but I am interested in other fix, where, in addition to say they are different, but say how much the difference are."

The purpose of effect size is to attach a measure of magnitude of effect, or difference, between the populations (or treatments).

If you are not familiar with effect sizes, I recommend you read: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/ and http://www.leeds.ac.uk/educol/documents/00002182.htm

'Effect size' is simply a way of quantifying the size of the difference between two groups.
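
To make this concrete, here is a minimal sketch of one common standardized effect size, Cohen's d with a pooled standard deviation. The choice of Cohen's d, the helper name cohens_d, and the simulated shift of 0.1 are assumptions for illustration and are not taken from the question.

```r
# Hypothetical helper: Cohen's d using the pooled standard deviation
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

set.seed(1)                        # arbitrary seed for reproducibility
sample1 <- rnorm(1e5)
sample2 <- rnorm(1e5, mean = 0.1)  # assumed true shift of 0.1 in the second sample

d <- cohens_d(sample1, sample2)
d  # close to -0.1: highly "significant" at this sample size, yet small in magnitude
```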

As a statistical consultant, I always push for reporting effect sizes in any statistical product.

An Edit:

I want to comment on this: "I was trying to ask if the amount difference on mean of two samples within range of $-0.1$ to $0.1$, not exactly to equal $0.1$."

I think there needs to be some clarity on this. What exactly is -0.1 to 0.1? Is this a confidence interval? Acceptance region?

Jon
  • +1, I think I may need your help to state it formally. What I have in mind is: we know they are different, but can we say they are different by $x$ amount? – Haitao Du Oct 06 '16 at 21:40
  • No. That's the purpose of a confidence interval. You're 95% confident that the difference between Pop A and B is -0.1 to 0.1. Also, note that that's another big issue. If 0 is in your Conf Interval, then it's REALLY difficult to accept that there is a difference at all. – Jon Oct 06 '16 at 21:43
0

You can just subtract or add 0.1 to one of the datasets.
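
A small sketch of why this works, assuming simulated data and an arbitrary seed: shifting one sample by 0.1 and testing against zero gives the same statistic as passing mu = 0.1 to t.test. Either way the null is that the difference is exactly 0.1, not that it lies within a range.

```r
set.seed(1)                                    # arbitrary seed for reproducibility
sample1 <- rnorm(1e5)
sample2 <- rnorm(1e5)

shifted <- t.test(sample1 - 0.1, sample2)      # shift the data, null difference = 0
with_mu <- t.test(sample1, sample2, mu = 0.1)  # leave the data, null difference = 0.1

# The two t statistics agree up to floating-point error
same_stat <- isTRUE(all.equal(unname(shifted$statistic),
                              unname(with_mu$statistic)))
same_stat
```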

Gena Kukartsev