
In the SciPy documentation at https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gamma.html, the density of the gamma distribution is written as

$$f(x, \alpha) = \frac{x^{\alpha - 1} e^{-x}}{\Gamma(\alpha)}$$

The documentation also says this is equivalent to the more common parametrization of the gamma distribution

$$f(x, \alpha, \beta) = \frac{\beta^\alpha x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}$$

but with a scale of $\frac{1}{\beta}$.

Can anyone provide more details to show how the two are equivalent?

I'm not sure if this is the right way to do it, but if I substitute $x = \beta y$ into the first equation, it seems there would be a factor $\beta$ missing compared to the 2nd equation.

zyxue
  • You're right about the missing factor, because that substitution isn't quite the correct way to relate one density to another. One way to remember the correct formula is to bear in mind that $f(x)$ is always accompanied by a factor of $\mathrm{d}x:$ that's where the missing factor of $\beta$ comes from. For a detailed example see https://stats.stackexchange.com/a/415436/919 and for the underlying theory look at https://stats.stackexchange.com/a/154298/919. S. Empiricus' argument about needing to scale the height is illustrated at https://stats.stackexchange.com/a/14490/919. – whuber Sep 15 '21 at 17:46
  • I made before two graphs illustrating the need to scale the height, here https://stats.stackexchange.com/a/445930 and here https://stats.stackexchange.com/a/319362/ – Sextus Empiricus Sep 15 '21 at 22:50

2 Answers


The density $f(x, \alpha) = \frac{x^{\alpha - 1} e^{-x}}{\Gamma(\alpha)}$ is the gamma density with the scale parameter fixed at $\theta = 1/\beta = 1$.


The documentation goes on to state:

"The probability density above is defined in the “standardized” form. To shift and/or scale the distribution use the loc and scale parameters. Specifically, gamma.pdf(x, a, loc, scale) is identically equivalent to gamma.pdf(y, a) / scale with y = (x - loc) / scale."

So, in the end, the second parameter is put back through the scale parameter: setting scale $= 1/\beta$ recovers the two-parameter density.
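As a rough check of the quoted rule, here is a minimal sketch that uses only Python's standard library rather than SciPy itself. The function names `standard_gamma_pdf`, `rate_gamma_pdf`, and `scaled_gamma_pdf` are hypothetical helpers written for this illustration, and the values of `a` and `beta` are arbitrary.

```python
import math


def standard_gamma_pdf(y, a):
    """Standardized density y^(a-1) e^(-y) / Gamma(a), for y > 0."""
    return y ** (a - 1) * math.exp(-y) / math.gamma(a)


def rate_gamma_pdf(x, a, beta):
    """Shape-rate density beta^a x^(a-1) e^(-beta x) / Gamma(a), for x > 0."""
    return beta ** a * x ** (a - 1) * math.exp(-beta * x) / math.gamma(a)


def scaled_gamma_pdf(x, a, scale, loc=0.0):
    """Apply the documented rule: pdf(y, a) / scale with y = (x - loc) / scale."""
    y = (x - loc) / scale
    return standard_gamma_pdf(y, a) / scale


a, beta = 2.5, 3.0  # arbitrary illustrative values

# With scale = 1 / beta, the scaled standardized density should agree
# with the shape-rate density at every point we try.
for x in (0.1, 0.5, 1.0, 2.0):
    assert math.isclose(scaled_gamma_pdf(x, a, scale=1 / beta),
                        rate_gamma_pdf(x, a, beta))
```

The division by scale inside `scaled_gamma_pdf` is exactly the factor $\beta$ that plain substitution leaves out.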


"if I substitute $x = \beta y$ into the first equation, it seems there would be a factor $\beta$ missing compared to the 2nd equation."

If you transform the variable with $x = \beta y$, you squeeze or stretch the density function horizontally by a factor of $\beta$. You then need to rescale the height by the same factor $\beta$ so that the pdf still integrates to a total area of 1.
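The area argument can be checked numerically. The sketch below is an illustration only: `midpoint_integral` is a hypothetical helper implementing a plain midpoint rule, the upper limit of 20 is assumed to be far enough into the tail for these parameter values, and `a` and `beta` are arbitrary.

```python
import math


def standard_gamma_pdf(y, a):
    """Standardized density y^(a-1) e^(-y) / Gamma(a), for y > 0."""
    return y ** (a - 1) * math.exp(-y) / math.gamma(a)


def midpoint_integral(func, lower, upper, steps=100_000):
    """Approximate the integral of func over [lower, upper] by the midpoint rule."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


a, beta = 2.5, 3.0  # arbitrary illustrative values

# Plain substitution x = beta * y: the curve is squeezed horizontally,
# so the area under it comes out close to 1 / beta instead of 1.
naive_area = midpoint_integral(
    lambda y: standard_gamma_pdf(beta * y, a), 0.0, 20.0)

# Rescaling the height by beta restores a total area close to 1.
corrected_area = midpoint_integral(
    lambda y: beta * standard_gamma_pdf(beta * y, a), 0.0, 20.0)

print(naive_area, corrected_area)
```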

Sextus Empiricus
  • "gamma.pdf(x, a, loc, scale) is identically equivalent to gamma.pdf(y, a) / scale with y = (x - loc) / scale", now I see where the missing beta went, although it feels kind of convoluted. – zyxue Sep 15 '21 at 22:35

A change of variables in the density requires more than substitution. In particular you need to multiply by the absolute value of the derivative of the inverse function. This would be more obvious if you considered the cumulative distribution function and then differentiated to get the density: the extra multiplicative factor would come through the chain rule.

If $Y=g(X)$ and $g(x)$ is strictly increasing, you could consider $$f_Y(y)=\tfrac{d}{dy} F_Y(y) = \tfrac{d}{dy} F_X\big(g^{-1}(y)\big) = f_X\big(g^{-1}(y)\big) \tfrac{d}{dy} \big(g^{-1}(y)\big)$$ and something similar if $g(x)$ were strictly decreasing. Combining these two results, a typical statement is that if you have $f_X(x)$ and want to consider the density of $Y=g(X)$, then $$f_Y(y) =f_X\big(g^{-1}(y)\big) \left| \tfrac{d}{dy} \big(g^{-1}(y)\big) \right|$$ though it gets more complicated when $g(x)$ is not a bijection.

In your example $g(x)=\frac x\beta$, so $g^{-1}(y)=\beta y$, and thus you need to multiply by $\left| \tfrac{d}{dy} \big(g^{-1}(y)\big) \right| = \beta$ (with $\beta > 0$).
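Written out for this case, as a sketch assuming $X$ has the standardized density $f_X(x) = \frac{x^{\alpha - 1} e^{-x}}{\Gamma(\alpha)}$ and $Y = X/\beta$ with $\beta > 0$, the formula gives

$$f_Y(y) = f_X(\beta y)\,\beta = \frac{(\beta y)^{\alpha - 1} e^{-\beta y}}{\Gamma(\alpha)}\,\beta = \frac{\beta^{\alpha} y^{\alpha - 1} e^{-\beta y}}{\Gamma(\alpha)}$$

which is the two-parameter density from the question. Substitution alone produces only $\beta^{\alpha - 1}$; the derivative term supplies the remaining factor of $\beta$.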

Henry