I’m writing some code (JavaScript) to compare benchmark results. I’m using Welch’s t-test because the variance and/or sample size between benchmarks is most likely different. The critical value is pulled from a t-distribution table at 95% confidence (two-sided).
The Welch formula is pretty straightforward, but I’m fuzzy on interpreting a significant result: I’m not sure whether the critical value should be divided by 2 or not. Help clearing that up is appreciated. Also, should I be rounding the degrees of freedom, df, to look up the critical value, or would Math.ceil or Math.floor be more appropriate?
/**
* Determines if the benchmark's hertz is higher than another.
* @member Benchmark
* @param {Object} other The benchmark to compare.
* @returns {Number} Returns `1` if higher, `-1` if lower, and `0` if indeterminate.
*/
function compare(other) {
  // use Welch's t-test because the benchmarks may have unequal
  // variances and/or sample sizes
  // http://frank.mtsu.edu/~dkfuller/notes302/welcht.pdf
  // http://www.public.iastate.edu/~alicia/stat328/Regression%20inference-part2.pdf
  var a = this.stats,
      b = other.stats,
      pow = Math.pow,
      bitA = a.variance / a.size,
      bitB = b.variance / b.size,
      // Welch-Satterthwaite degrees of freedom; note the parenthesized
      // `(size - 1)` terms in the denominator
      df = pow(bitA + bitB, 2) / (pow(bitA, 2) / (a.size - 1) + pow(bitB, 2) / (b.size - 1)),
      t = (a.mean - b.mean) / Math.sqrt(bitA + bitB),
      c = getCriticalValue(Math.round(df));

  // check if the t-statistic is significant
  // (unsure whether `c` should be halved here -- see question above)
  return Math.abs(t) > c / 2 ? (t > 0 ? 1 : -1) : 0;
}
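As a sanity check on the Welch–Satterthwaite part, here’s a standalone version of the same computation (the sample stats are made up). Note the `(size - 1)` terms must be parenthesized in the denominator; with equal variances and equal sample sizes the formula should reduce to 2 * (size - 1):

```javascript
// Standalone Welch computation for sanity-checking, with hypothetical stats.
function welch(a, b) {
  var bitA = a.variance / a.size,
      bitB = b.variance / b.size;
  return {
    t: (a.mean - b.mean) / Math.sqrt(bitA + bitB),
    // Welch-Satterthwaite degrees of freedom
    df: Math.pow(bitA + bitB, 2) /
        (Math.pow(bitA, 2) / (a.size - 1) + Math.pow(bitB, 2) / (b.size - 1))
  };
}

// equal variances and sizes: df should come out to 2 * (size - 1) = 58
var r = welch({ mean: 10, variance: 4, size: 30 },
              { mean: 12, variance: 4, size: 30 });
```

If the `size - 1` terms are left unparenthesized, the df comes out wildly wrong for small samples, so this check catches that class of bug.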
Update: Thanks for all the replies so far! My colleague posted some more info here, in case that affects the advice.