3

I am writing a program in C# that requires me to use the Ttest formula. I have to effectively interpret the Excel formula:

=TTEST(range1,range2,1,3)

I am using the formula given here

and have interpreted into code as such:

 double TStatistic = (mean1 - mean2) / Math.Sqrt((Math.Pow(variance1, 2) / 
        count1) + (Math.Pow(variance2, 2) / count2));

However, I don't fully understand t-test and the values I am getting are completely different than those calculated within Excel.

I have been using the following ranges:

R1:
91.17462277,
118.3936425,
96.6746393,
102.488785,
91.26831043

R2:
17.20546254,
19.56969811,
19.2831241,
13.03360631,
13.86577314

The value I am getting using my attempt is 1.8248, however that from Excel is 1.74463E-05. Could somebody please point me in the right direction?

Jeromy Anglim
  • 42,044
  • 23
  • 146
  • 250
Darren Young
  • 133
  • 3
  • Perhaps this question is better suited for Stack Overflow: http://stackoverflow.com/ –  Mar 08 '11 at 18:30
  • 2
    Why would you be squaring the variances in the second code snippet? Or are your variables just oddly named? – cardinal Mar 08 '11 at 18:58
  • @Darren Young, Excel has well documented problems with its statistical questions, so take care validating your code with it. This is for the future reference, as others pointed out in this case there was an error in your code. – mpiktas Mar 08 '11 at 21:06
  • @Darren Young, [this question](http://stats.stackexchange.com/questions/5913/interpreting-two-sided-two-sample-welch-t-test/5915#5915) and the answers might be of interest. – mpiktas Mar 08 '11 at 21:08
  • @Cardinal - I think I have mis-interpreted the formula. @mpiktas Thanks for the info. I'll take a look when in work tomorrow. – Darren Young Mar 08 '11 at 21:40
  • Excel makes a good prototyping platform for the C# code: first implement TTEST() using lower-level Excel calculations (the kind you can do easily in C#) and exploit Excel to verify their correctness. Once that's accomplished, the port is easy. – whuber Mar 08 '11 at 23:13

2 Answers2

5

There are at least two problems with what you have done.

  1. You have misinterpreted the formula $$t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}$$ since $s^2$ is already a variance (square of standard deviation) and does not need to be squared again.
  2. You are comparing eggs and omelettes: you need compare your "calculated $t$-value, with $k$ degrees of freedom ... to the $t$ distribution table". Excel has already done this with TTEST().

There are other possible issues such as using a population variance or sample variance formula.

Henry
  • 30,848
  • 1
  • 63
  • 107
  • Thanks Henry. So, when I calculate the variance, then that is the figure I need for s2, rather than squaring the actual variance again? Also, I had a look at the distribution table, but I wasn't sure what (if any) calculation I needed to perform against the table - what does k degrees of freedom mean? Sorry for the questions - I am clearly not a statistician - more of a computer programmer. Thanks again. – Darren Young Mar 08 '11 at 20:44
  • @Darren Young: Correct, you don't need to square the variances again; but you do need make sure you are using [unbiased estimates of the population variances](http://en.wikipedia.org/wiki/Variance#Population_variance_and_sample_variance). – Henry Mar 08 '11 at 22:43
  • @Darren Young: The [degrees of freedom](http://en.wikipedia.org/wiki/Degrees_of_freedom_%28statistics%29) is a parameter in a [Student's t-test](http://en.wikipedia.org/wiki/Student%27s_t-test) - to complicate things, its simplistic explanation of how they come to be what they are does not apply when you are assuming unequal variances, so look especially at the section on [unequal variances](http://en.wikipedia.org/wiki/Student%27s_t-test#Unequal_sample_sizes.2C_unequal_variance). – Henry Mar 08 '11 at 22:44
  • @I have re-calculated the functions, and for the original data above I get 17.84 as the t value, which has been confirmed using online ttest calculator. Why then in the Excel formula would it give such a small number as 1.74463E-05 instead? I have looked at the dist table, and with this input data the df = 8, and checked at 0.05 value, but I don't know what to do with that value? Any ideas please? Thanks again. – Darren Young Mar 09 '11 at 11:45
  • You should get a $t$ value of about $15.959 \approx \frac{100-16.5915}{\sqrt{127.427/5 + 9.1449/5}}$ as confirmed [here](http://www.graphpad.com/quickcalcs/ttest1.cfm). If you are assuming unequal variances as you did with `TTEST(,,,3)` you need Welch's t-test and you get 4.57 degrees of freedom here which you might round down to 4; if you assume equal variances you get 8 degrees of freedom. Either way, you next need to find the probability that if the means were equal then $t$ would be 15.959 or more extreme. The probability what Excel is reporting and the step you have missed. – Henry Mar 09 '11 at 12:28
  • The reason Excel is giving such a small value is that with a sample of data in the range 91-119 and another sample in the range 13-20, it is exremely unlikely that the two samples came from distributions with the same mean. – Henry Mar 09 '11 at 12:48
  • Thanks Henry, I get the same t value now. Is there a simple formula to calculate the p-value? – Darren Young Mar 09 '11 at 13:06
  • There isn't a simple formula for the cumulative distribution function of [Student's t-distribution](Student's t-distribution) but any decent Statistics library for C# or whatever you are using should have something you can apply or approximate. I am not the person to ask on this. – Henry Mar 09 '11 at 13:12
2

Since version 4 the .NET libraries include the following methods:

StatisticFormula.TTestEqualVariances
StatisticFormula.TTestUnequalVariances

This class is available in the namespaces System.Web.UI.DataVisualization.Charting and System.Windows.Forms.DataVisualization.Charting

Link to documentation and usage:

TTestResult result = Chart1.DataManipulator.Statistics.TTestUnEqualVariances(0.2, 0.05, "Series1", "Series2");
Neil
  • 121
  • 3