How to derive t-distribution using sympy?

Question

I'm studying inference and I would like to understand better how classical distributions were derived. I manage to derivate the t-distribution using sympy but my result is a bit different from whats was expected. Here is what I have tried:

First, I define a standard normal and a chi-square distributions:

from sympy import *
from sympy.stats import Normal, Chi, StudentT, density
from sympy.abc import x,y,z,k

X=Normal('X',0,1)
Y=Chi('Y',k)

normal=density(X).pdf(x)

chi=density(Y).pdf(y)

As I understand, we can recover the density of random variable give by the ratio of two independent random variables ($Z=X/Y$) using:

In this case, the above expression is given by:

integrate(abs(y)*normal.subs({x:z*y})*chi, y)

$$\frac{\sqrt{2} \cdot 2^{1 - \frac{k}{2}} \int \frac{y^{k} e^{- \frac{y^{2}}{2}} e^{- \frac{y^{2} z^{2}}{2}} \left|{y}\right|}{y}\, dy}{2 \sqrt{\pi} \Gamma\left(\frac{k}{2}\right)} $$

The computation of this integral gives:

result=integrate(abs(y)*normal.subs({x:z*y})*chi, (y,0,+oo))
result

$$\begin{cases} \frac{\left(z^{2} + 1\right)^{- \frac{k}{2} - \frac{1}{2}} \Gamma\left(\frac{k}{2} + \frac{1}{2}\right)}{\sqrt{\pi} \Gamma\left(\frac{k}{2}\right)} & \text{for}\: \left(2 \left|{\arg{\left(z \right)}}\right| \leq \frac{\pi}{2} \wedge \frac{\operatorname{re}{\left(k\right)}}{2} + \frac{1}{2} > 0\right) \vee \left(\frac{\operatorname{re}{\left(k\right)}}{2} + \frac{1}{2} > 0 \wedge 2 \left|{\arg{\left(z \right)}}\right| < \frac{\pi}{2}\right) \\\int\limits_{0}^{\infty} \frac{\sqrt{2} \cdot 2^{1 - \frac{k}{2}} y^{k - 1} e^{- \frac{y^{2}}{2}} e^{- \frac{y^{2} z^{2}}{2}} \left|{y}\right|}{2 \sqrt{\pi} \Gamma\left(\frac{k}{2}\right)}\, dy & \text{otherwise} \end{cases} $$

The interest here lays on the first expression, which looks like a t-distribution but is a bit different. It appears to me that I should have $\left(1+\frac {z^2}{k}\right)$, instead of $(z^2 +1)$ . Namely, the t-distribution should be:

$$\frac{\Gamma[(k+1)/2]}{\sqrt{k\pi}\Gamma(k/2)}\left(1+\frac {z^2}{k}\right)^{-\frac 12 (k+1)}$$

I thought that perhaps the two expressions are equivalent, then I did a numerical evaluation of my t-distribution with the real t-distribution. For $k=20$ and $z=40$, the real t-distribution gives:

Z=StudentT("z", k)

N(density(Z)(z).subs({k:20,z:40}))

3.60073318935429⋅10^21

while my t-distribution gives:

N(result.args[0][0].subs({k:20,z:40}))

3.98006030714142⋅10^34

I mean, the results are different because I did some thing wrong on the derivation or is something related to round error?

What am I doing wrong here? Am I in the right direction?

You forgot a term in the definition of the Student t statistic: it must also be divided by $\sqrt{k}.$ Thus, your answer has a scale factor of $\sqrt{k}$ incorporated. To compensate, you need to change $z^2$ to $z^2/k$ and divide the entire PDF by $\sqrt{k}.$ — whuber, Mar 18 '21 at 21:41
Thanks. Now it works. I need to understand how to reach the first expression. I just use it but I don't know where it come from. Any reading suggestion? — Lucas, Mar 18 '21 at 22:20
Sorry, I mean this equation: https://i.stack.imgur.com/u3MLX.png — Lucas, Mar 19 '21 at 13:30
See https://stats.stackexchange.com/questions/510331 for a sketch of a general solution. Detailed worked examples appear in https://stats.stackexchange.com/questions/185683. — whuber, Mar 19 '21 at 14:06

How to derive t-distribution using sympy?

0 Answers0