
I am looking to test whether two mean and standard deviation values I obtain are significantly different or not (I am comparing a sensor method to the gold standard).

I am planning on using a two-tailed t-test in Python, but I am getting confused when interpreting the results.

Does a high p-value mean that there is no significant difference and the two methods are similar?

I know this is an incredibly basic question but I don't really know statistics.

I plan on using scipy's ttest_ind and here is a box plot of my data where I am comparing each blue and orange box:

[Box plot: data I am comparing]

– Eric

2 Answers

You reject the null hypothesis that the means are equal when the p-value is below your chosen significance level $\alpha$ (usually 5% or 1%). So, for example, if you get a p-value of 0.001% and your significance level is 5%, then you can reject the null and conclude that the means are statistically different; in that case you have the statistical evidence to reject the null hypothesis of equal means. If instead the condition $p\text{-value} < \alpha$ does not hold, then you fail to reject the null hypothesis that the means are equal.
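
As a minimal sketch of that decision rule with scipy's ttest_ind (the sensor and gold arrays below are hypothetical stand-ins for one blue/orange pair from your box plot; equal_var=False selects Welch's t-test, which does not assume equal variances):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical stand-ins for one blue/orange pair -- replace with your measurements.
sensor = rng.normal(loc=10.1, scale=1.0, size=30)
gold = rng.normal(loc=10.0, scale=1.0, size=30)

alpha = 0.05  # chosen significance level
# equal_var=False runs Welch's t-test, which does not assume equal variances.
t_stat, p_value = stats.ttest_ind(sensor, gold, equal_var=False)

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null of equal means")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject the null of equal means")
```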

– Fr1

A high p-value would tell you that either you do not have enough data to conclude that the means of the two groups are different, or that the means are indeed identical. In reality, the means of the two groups will differ, and the question is not whether they differ but rather by how much. An appropriate tool to answer this question is a confidence interval (computed at the same significance level you would use for your test). The confidence interval then tells you whether you don't have enough data (a wide interval) or the means are not really that far apart (a narrow interval). Note that this confidence interval is calculated for the difference between the means.
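
A minimal sketch of such a confidence interval for the difference in means, assuming a Welch-type (unequal-variance) setup and hypothetical data in place of yours:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical stand-ins for one blue/orange pair -- replace with your measurements.
sensor = rng.normal(loc=10.2, scale=1.0, size=30)
gold = rng.normal(loc=10.0, scale=1.2, size=30)

diff = sensor.mean() - gold.mean()

# Welch standard error and degrees of freedom for the difference in means.
var_s = sensor.var(ddof=1) / len(sensor)
var_g = gold.var(ddof=1) / len(gold)
se = np.sqrt(var_s + var_g)
df = (var_s + var_g) ** 2 / (var_s**2 / (len(sensor) - 1) + var_g**2 / (len(gold) - 1))

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)
print(f"difference = {diff:.3f}, "
      f"{100 * (1 - alpha):.0f}% CI = ({diff - t_crit * se:.3f}, {diff + t_crit * se:.3f})")
```

If that interval is wide, you likely need more data; if it is narrow and sits close to zero, the two methods agree to within that margin.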

There are several other issues I see here:

  • You seem to be computing 4 separate t-tests, so you should remember to adjust for multiple comparisons. The simplest adjustment would be a Bonferroni correction: multiply each p-value you obtained by $4$ or, equivalently, divide your significance level $\alpha$ by $4$ (see the sketch after this list).

  • You should also question whether a t-test is an appropriate test here: how big are the sample sizes in each group? Are they big enough for the CLT to work? Is your data skewed or does it have heavy tails? (Judging by your boxplots, this does not seem to be the case.)
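
A minimal sketch of the Bonferroni adjustment mentioned in the first bullet, using made-up p-values for the four blue-vs-orange comparisons (statsmodels' multipletests with method='bonferroni' would give the same result):

```python
import numpy as np

# Made-up p-values for the four blue-vs-orange comparisons -- replace with yours.
p_values = np.array([0.012, 0.20, 0.03, 0.41])
alpha = 0.05

# Bonferroni: multiply each p-value by the number of tests (capped at 1),
# which is equivalent to comparing the raw p-values against alpha / 4.
p_adjusted = np.minimum(p_values * len(p_values), 1.0)
reject = p_adjusted < alpha
print(p_adjusted, reject)
```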

– Stefan