According to Wikipedia the beta probability distribution has two shape parameters: $\alpha$ and $\beta$.

When I call `scipy.stats.beta.fit(x)` in Python, where `x` is a set of numbers in the range $[0,1]$, four values are returned. This strikes me as odd.

After googling, I found that one of the return values must be "location", since the third value is 0 if I call `scipy.stats.beta.fit(x, floc=0)`.

Does anyone know what the fourth value is, and whether the first two are $\alpha$ and $\beta$?

Comp_Warrior
Peter Smit
  • The [documentation](http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.beta.html) calls the last two "location" and "scale" parameters. Thus the fourth is the scale parameter. Location and scale have standard statistical meanings. One interpretation in this context is given explicitly in the [NIST handbook](http://www.itl.nist.gov/div898/handbook/eda/section3/eda366h.htm). – whuber Sep 02 '13 at 15:12
  • I'm having this exact same issue, but for some reason all my beta models tend to "hold water". For instance for `stats.beta.fit([60,61,62,72])` I get `(0.7313395126217731, 0.7153715263378897, 58.999999999999993, 3.3500998441036982)`. Any idea what I can do about this? – TheChymera Nov 16 '14 at 16:15
  • Just adding the documentation for the generic continuous random variable `fit` method, which includes some examples using `beta.fit()`: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.fit.html#scipy.stats.rv_continuous.fit – mathisfun Apr 13 '19 at 17:20

1 Answer

Despite an apparent lack of documentation on the output of `beta.fit`, it returns its values in the following order:

$\alpha$, $\beta$, loc (lower limit), scale (upper limit - lower limit)
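As a minimal sketch of that ordering, the snippet below fits synthetic data that was deliberately stretched onto an interval other than $[0,1]$. The sample size, seed, shape values, and the names `lower` and `upper` are illustrative assumptions, not part of the original question. The exact estimates will vary, so no output is shown.

```python
import numpy as np
from scipy import stats

# Hypothetical data: Beta(3, 5) draws stretched onto the interval [10, 30].
rng = np.random.default_rng(0)
lower, upper = 10.0, 30.0
x = lower + (upper - lower) * rng.beta(3.0, 5.0, size=2000)

# With nothing fixed, fit() returns the two shape parameters first,
# followed by the location and scale.
params = stats.beta.fit(x)
a, b, loc, scale = params
print("alpha, beta:", a, b)
print("loc, scale: ", loc, scale)

# The fitted distribution has support [loc, loc + scale], which is why
# loc acts as the lower limit and scale as (upper limit - lower limit).
support_lo, support_hi = stats.beta.support(a, b, loc=loc, scale=scale)
print("support:    ", support_lo, support_hi)
```

The estimated limits need not coincide with `lower` and `upper`, or with the minimum and maximum of the data; they are simply the values that the fit settles on.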

jdj081
  • Is it just spitting out the lower and upper limits based on the range of the data, or doing something else? – shadowtalker Aug 29 '14 at 14:27
  • The limits are based on the probability distribution, i.e., a normal distribution has no limits, but sample data rarely fall more than about 3 standard deviations from the mean. A beta distribution has hard limits, with zero probability outside them. It is likely that your data won't reach the limits, depending on what you are modeling. In fact, trying to force those limits to match the range of the data can be problematic, as many beta distributions tend to zero probability at the limits. See [this post](http://stackoverflow.com/questions/23329331/how-to-properly-fit-a-beta-distribution-in-python) for more on that issue. – jdj081 Aug 29 '14 at 16:06
  • Yes, I'm aware. Those limits are always 0 and 1. Hence: what are the upper and lower limits returned by this function, and how are they at all the same as "location" and "scale"? I guess I just don't understand this answer. – shadowtalker Aug 29 '14 at 18:33
  • The way the beta distribution is defined, those limits are always 0 and 1. But the *generalized* beta distribution adds two more parameters, location and scale, which shift and stretch that interval. The data I model doesn't fall between 0 and 1, so I have to use those numbers. If your data is between 0 and 1, then those outputs should be very close to 0 and 1. If you know your limits are 0 and 1, you can force those with the `floc=0` and `fscale=1` kwargs. You will still get those outputs, but they will be identical to what you force them to be. And it will likely change your alpha and beta values. – jdj081 Aug 29 '14 at 19:42
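The `floc=0` and `fscale=1` suggestion in the last comment can be sketched as follows. This is only an illustration on synthetic data; the seed, sample size, and true shape values of 2 and 5 are assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical data strictly inside (0, 1), drawn from Beta(2, 5).
rng = np.random.default_rng(1)
x = rng.beta(2.0, 5.0, size=1000)

# Fix the location and scale so that only alpha and beta are estimated.
# fit() still returns four values; the last two are the fixed ones.
a_fixed, b_fixed, loc_fixed, scale_fixed = stats.beta.fit(x, floc=0, fscale=1)
print("alpha, beta:", a_fixed, b_fixed)
print("loc, scale: ", loc_fixed, scale_fixed)
```

With both limits fixed, the data must lie strictly between 0 and 1, and the shape estimates can differ from those of an unconstrained four-parameter fit.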