4

Could someone provide some Google-able words for this equation?

$$Q_{\chi^2,d} = \left[2^{d/2}\Gamma\left(\frac{d}{2}\right)\right]^{-1}\int_{\chi^2}^\infty (t)^{\frac{d}{2}-1}e^{-\frac{t}{2}}dt$$

It's from here. It's for calculating a p-value that, if the chi-square value exceeds it, causes rejection of the null hypothesis that the observed values are uniformly distributed.

In particular, I see the Gamma function, but what do the square brackets signify? And is there a name for the integral part on the right?

I have a program that calculates Pearson's chi-squared. But I have to look up the critical p-values on the calculator linked above, because I can't figure out how to use this equation. Any hints would be appreciated.

whuber
Sam Porch
  • Have you thought of looking up "chi-square distribution"? This is its complementary cumulative distribution function by the way, $P(X>x)$. – Alecos Papadopoulos Jan 10 '15 at 04:04
  • @AlecosPapadopoulos I almost wrote that exact comment. Then I looked at the linked page, almost threw up in my mouth, and decided a more thorough answer was required. – shadowtalker Jan 10 '15 at 04:16
  • "*It's for calculating probability that a chi-squared value is due to chance.*" --- such a statement is not correct. It's used to calculate p-values, but they don't tell you that. If that's what your link says about it, I suggest you consider other sources. – Glen_b Jan 10 '15 at 04:40
  • The integral is an [Incomplete Gamma Function](http://mathworld.wolfram.com/IncompleteGammaFunction.html). The expression preceding it is a normalizing constant constructed to make $Q_{0,d}=1$. Together they give the [tail probability](http://mathworld.wolfram.com/TailProbability.html) for the $\chi^2$ distribution with $d$ degrees of freedom. It is sometimes called the [complementary cumulative distribution function](http://en.wikipedia.org/wiki/Cumulative_distribution_function#Complementary_cumulative_distribution_function_.28tail_distribution.29) (of a $\chi^2$ distribution). – whuber Jan 10 '15 at 18:19

2 Answers

11

The linked page introduces the equation as "[t]he probability Q that a $\chi^2$ value calculated for an experiment with $d$ degrees of freedom... is due to chance". This suggests it is a version of the chi-squared distribution's CDF. Moreover, it looks a lot like the chi-squared distribution's pdf listed on the Wikipedia page, but with an integral added.

Recognize that any continuous distribution's CDF (cumulative distribution function) is the integral of its pdf (probability density function). If you were to draw what people think of as the 'shape' of a distribution, you are typically drawing the pdf. Here are some chi-squared pdfs from the Wikipedia page:

[Figure: chi-squared pdf curves, from the Wikipedia page]

Integrating over this means that for one of the curves, you take the height of the line at every point from a lower bound (possibly as low as $0$) to an upper bound (possibly as high as $\infty$) and add them up. In the case of your equation, you have an integral that goes from the observed chi-squared value to infinity.

A defining feature of a pdf is that it must integrate (add up) to $1$. But the expression inside the integral does not necessarily add up to $1$. We can get out of this problem by dividing by the total, as any number divided by itself is $1$. Notice that the bracketed expression is raised to the power of $-1$; thus, you are dividing the integral by the bracketed expression. From this we can deduce that the bracketed expression is the total (or would be, if you integrated over the entire range from $0$ to $\infty$).
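As a quick check of that deduction, here is a short worked derivation. It is only a sketch, and the substitution $t = 2u$ (so $dt = 2\,du$) is a choice made here, not something taken from the linked page:

$$\int_0^\infty t^{\frac{d}{2}-1} e^{-\frac{t}{2}}\,dt = \int_0^\infty (2u)^{\frac{d}{2}-1} e^{-u}\,2\,du = 2^{\frac{d}{2}} \int_0^\infty u^{\frac{d}{2}-1} e^{-u}\,du = 2^{\frac{d}{2}}\,\Gamma\left(\frac{d}{2}\right)$$

The last step uses the definition of the Gamma function, $\Gamma(a) = \int_0^\infty u^{a-1} e^{-u}\,du$. So the bracketed expression is exactly the total of the integrand over $0$ to $\infty$.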

So this calculation gives you the proportion of the chi-squared distribution that lies to the right of (i.e., is $\ge$) the observed chi-squared value. That is, it gives the $p$-value.
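For a program, the "add up the heights" description can be turned almost literally into code. The following is a minimal sketch in Python, not the method used by the linked page: the function names `chi2_pdf` and `chi2_tail_midpoint`, the finite cutoff `upper`, and the number of strips `steps` are all illustrative assumptions. It uses the midpoint rule and assumes the tail beyond the cutoff is negligible.

```python
import math

def chi2_pdf(t, d):
    """Chi-squared density with d degrees of freedom, valid for t > 0."""
    norm = 2 ** (d / 2) * math.gamma(d / 2)  # the bracketed expression
    return t ** (d / 2 - 1) * math.exp(-t / 2) / norm

def chi2_tail_midpoint(x, d, upper=200.0, steps=100_000):
    """Approximate Q: the area under the pdf from x to infinity.

    Assumption: the area beyond `upper` is negligible, so the infinite
    upper limit is replaced by a finite cutoff.
    """
    if x >= upper:
        return 0.0
    width = (upper - x) / steps
    total = 0.0
    for i in range(steps):
        mid = x + (i + 0.5) * width          # midpoint of the i-th strip
        total += chi2_pdf(mid, d) * width    # height times width
    return total

# For d = 2 the tail has the closed form exp(-x / 2), which gives a check.
x_obs = 5.991
print(chi2_tail_midpoint(x_obs, 2))  # numerical estimate of the tail area
print(math.exp(-x_obs / 2))          # exact tail area for d = 2
```

The two printed values should agree closely. A fixed cutoff and a fixed number of strips are crude choices; they are only meant to show what the integral is doing.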


At this point I must state that the quote from the linked page that I pasted in above is incorrect. It actually gives a pernicious misunderstanding / myth about $p$-values. It states that the equation gives you the probability an experimental value is due to chance. This is false. Instead, this calculation gives you the probability a value drawn from this distribution would be that large or larger. You do not know whether your observed value was drawn from this (null) distribution or not, and the $p$-value is definitely not the probability that the null hypothesis is true. To get a clearer understanding of $p$-values, it may help to read this excellent CV thread: What is the meaning of p values and t values in statistical tests?

gung - Reinstate Monica
  • Thank you very much for your response, it did help. Unfortunately, though, this has just shown me how little I know of statistics. At the moment I'm just trying to write a computer program. I found the simplest solution was to take the javascript source from the website I linked and port it to C#. – Sam Porch Jan 10 '15 at 21:39
  • You're welcome, @SamPorch. I wouldn't worry too much about not having mastered statistics. Try reading some of the threads here & you can always ask when you have more questions. Given the issues w/ that site, I would be reluctant to use their code, though. I don't know javascript, but I would be surprised if there aren't pre-existing libraries that can perform these calculations for you. – gung - Reinstate Monica Jan 10 '15 at 22:38
10

The brackets are for grouping; they're just parentheses here. This is $1 - \operatorname{CDF}_{\chi^2}(x; d)$, where $x$ is the observed $\chi^2$ value.

Let $\operatorname{PDF}_{\chi^2}(\cdot;d) = f(\cdot;d)$. Then

$$ f(t;d) \equiv \left( 2^\frac{d}{2} \operatorname{\Gamma}\left(\frac{d}{2} \right) \right)^{-1} t^{\frac{d}{2}-1} e^{-\frac{t}{2}} $$

so that

$$\begin{align} 1 - \operatorname{CDF}_{\chi^2}(x;d) &= 1 - \int_{0}^x f(t;d)\,dt \\ &= \int_x^{\infty} f(t;d)\,dt \\ &= \int_x^\infty \left( 2^\frac{d}{2} \operatorname{\Gamma}\left(\frac{d}{2} \right) \right)^{-1} t^{\frac{d}{2}-1} e^{-\frac{t}{2}}\,dt \\ &= \left( 2^\frac{d}{2} \operatorname{\Gamma}\left(\frac{d}{2} \right) \right)^{-1} \int_x^\infty t^{\frac{d}{2}-1} e^{-\frac{t}{2}}\,dt \end{align}$$
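To use this in a program, one option is to compute the CDF and subtract it from 1. The sketch below does that in Python with the standard power series for the regularized lower incomplete gamma function $P(a, z)$, taking $a = d/2$ and $z = x/2$. The names `chi2_cdf` and `chi2_sf`, and the `max_terms` and `tol` parameters, are illustrative assumptions rather than any library's API.

```python
import math

def chi2_cdf(x, d, max_terms=10_000, tol=1e-15):
    """CDF of a chi-squared(d) variable at x, via the series
    P(a, z) = z^a e^{-z} / Gamma(a) * sum_{n>=0} z^n / (a (a+1) ... (a+n)),
    with a = d / 2 and z = x / 2."""
    if x <= 0:
        return 0.0
    a, z = d / 2, x / 2
    term = 1.0 / a          # n = 0 term of the sum
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)  # ratio of consecutive terms
        total += term
        if term < tol * total:
            break
    # Combine the prefactor in log space to avoid overflow in z^a / Gamma(a).
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_sf(x, d):
    """Upper tail Q = 1 - CDF, i.e. the p-value for an observed statistic x."""
    return 1.0 - chi2_cdf(x, d)

print(chi2_sf(5.991, 2))         # series-based tail probability
print(math.exp(-5.991 / 2))      # closed form for d = 2, for comparison
```

Subtracting from 1 loses precision when the tail probability is very small, so this is only a sketch. In practice an existing routine such as SciPy's `scipy.stats.chi2.sf` is usually the safer choice.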


As for the statement:

"It's for calculating probability that a chi-squared value is due to chance."

That's not true.

For any random variable $X$, $\operatorname{CDF}(x) \equiv \operatorname{Pr}(X \leq x)$. Since $Q_{\chi^2,d}$ is $1 - \operatorname{CDF}$, it is really the probability that a chi-square random variable with $d$ degrees of freedom takes a value greater than $\chi^2$, whatever $\chi^2$ might be.

A more correct statement would be:

Given some test statistic $T$ and an observed value $t$ of that test statistic, and given that $T \sim \chi^2(d)$ under the null hypothesis of that test, it's the probability that a $T$ at least as large as $t$ would arise purely by chance if the null hypothesis is true.
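One way to make that statement concrete is to simulate it: draw many values of $T$ with the null hypothesis true by construction, and count how often they are at least as large as the observed $t$. The Python sketch below does this using the fact that a sum of $d$ squared standard normal draws is $\chi^2(d)$ distributed. The function name `simulate_tail_probability`, the number of draws, and the seed are illustrative assumptions, and $d$ must be a positive integer here.

```python
import math
import random

def simulate_tail_probability(t_obs, d, draws=50_000, seed=0):
    """Estimate Pr(T >= t_obs) for T ~ chi-squared(d) by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(draws):
        # A sum of d squared standard normals is chi-squared(d).
        stat = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))
        if stat >= t_obs:
            hits += 1
    return hits / draws

estimate = simulate_tail_probability(5.991, 2)
print(estimate)                 # simulated proportion of draws >= t_obs
print(math.exp(-5.991 / 2))     # exact tail probability for d = 2
```

The simulated proportion only approximates $Q$, with error shrinking as the number of draws grows. It shows that $Q$ describes values drawn under the null, not the probability that the null is true.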

shadowtalker
  • (+1) I like the expression "... the probability that a chi-square value is due to chance". It either means that, with the complementary probability this "chi-square value" is deterministic, or that, with the given probability, this "chi-square value" follows a uniform distribution. Both open up doors to new probabilistic and statistical paradigms, I guess. – Alecos Papadopoulos Jan 10 '15 at 11:55
  • @Alecos That's no surprise, since the web pages that quotation comes from are devoted to parapsychology. One of the hallmarks of parapsychology is playing fast and loose with scientific method, including statistics. It's hard to imagine a worse place to learn about anything--except in a negative way, as an exemplar of how not to do things. – whuber Jan 10 '15 at 18:16