
Say we have $X \sim \text{Beta}(\alpha, \beta)$. What's the sampling distribution of its sample mean?

In other words, what distribution does the sample mean $\bar{X}$ of a Beta follow?

kjetil b halvorsen
Josh
  • Wow, tough question. It might be hard to characterize over all values of $\alpha$ and $\beta$ given the strange shapes that occur for some parameter choices, but when both are greater than 1 it looks like it will tend asymptotically to Gaussian per the CLT; I can't say for sure, though. – T3am5hark Oct 25 '16 at 01:39
  • The asymptotic distribution of a sample average of a random sample will be governed by the CLT whenever the variance exists, which does not require that $\alpha,\beta>1$. – Christoph Hanck Oct 25 '16 at 08:08

2 Answers


I thought this was an interesting question, so here's a quick visual exploration. For $X\sim \text{Beta}(\alpha_1,\alpha_2)$, I first selected four separate Beta distributions (PDFs shown below).

[Figure: PDFs of the four Beta distributions]

Then I collected sample means, $\bar X = \frac{1}{n}\sum_{i=1}^n x_i$, and plotted the corresponding histograms, shown below. The results look normal, and I'm inclined to believe @ChristophHanck's assertion that the Central Limit Theorem (CLT) is at work here.

[Figure: histograms of the sample means $\bar X$ for each of the four Beta distributions]


MATLAB code

% Parameters
n = 5000;   % sample size for each sample mean
K = 5000;   % number of sample means to collect
% Define Beta distributions
pd1 = makedist('Beta','a',0.25,'b',0.45);
pd2 = makedist('Beta','a',0.25,'b',2.5);
pd3 = makedist('Beta','a',4,'b',0.15);
pd4 = makedist('Beta','a',3.5,'b',5);
% Collect Sample Means
X1bar = zeros(K,1);
X2bar = zeros(K,1);
X3bar = zeros(K,1);
X4bar = zeros(K,1);
for k = 1:K                           % get K sample means 
    X1bar(k) = mean(random(pd1,n,1)); % take mean of n samples
    X2bar(k) = mean(random(pd2,n,1));
    X3bar(k) = mean(random(pd3,n,1));
    X4bar(k) = mean(random(pd4,n,1));
end
% Plot Beta distribution PDFs
Xsupport = 0:.01:1;

figure, hold on, box on
title('Beta(\alpha_1,\alpha_2) PDFs')
plot(Xsupport,pdf(pd1,Xsupport),'r-','LineWidth',2.2)
plot(Xsupport,pdf(pd2,Xsupport),'b-','LineWidth',2.2)
plot(Xsupport,pdf(pd3,Xsupport),'k-','LineWidth',2.2)
plot(Xsupport,pdf(pd4,Xsupport),'g-','LineWidth',2.2)
legend('(0.25,0.45)','(0.25,2.5)','(4,0.15)','(3.5,5)')

figure
s(1) = subplot(2,2,1); hold on, box on
    histogram(X1bar,'FaceColor','r')
s(2) = subplot(2,2,2); hold on, box on
    histogram(X2bar,'FaceColor','b')
s(3) = subplot(2,2,3); hold on, box on
    histogram(X3bar,'FaceColor','k')
s(4) = subplot(2,2,4); hold on, box on
    histogram(X4bar,'FaceColor','g')
title(s(1),'(0.25,0.45)')
title(s(2),'(0.25,2.5)')
title(s(3),'(4,0.15)')
title(s(4),'(3.5,5)')

Edit: This post was a quick attempt to give the OP something to work with. As pointed out in the comments, the Central Limit Theorem (CLT) implies these results will hold for any distribution with a finite variance.
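
To check that normal approximation concretely, here is a minimal sketch (reusing pd4, X4bar, and n from the script above; mu4, sd4, and xgrid are helper names of my own) that overlays the CLT limit $N(\mu, \sigma^2/n)$, with $\mu = \frac{\alpha}{\alpha+\beta}$ and $\sigma^2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$, on the simulated sample means:

% CLT check (sketch): overlay N(mu, sigma^2/n) on the simulated sample means
mu4 = mean(pd4);          % alpha/(alpha+beta) for Beta(3.5,5)
sd4 = std(pd4)/sqrt(n);   % standard deviation of the sample mean under the CLT
figure, hold on, box on
histogram(X4bar,'Normalization','pdf','FaceColor','g')
xgrid = linspace(mu4 - 4*sd4, mu4 + 4*sd4, 200);
plot(xgrid, normpdf(xgrid, mu4, sd4), 'k-', 'LineWidth', 2)
title('Sample means of Beta(3.5,5) vs. CLT normal approximation')
legend('simulated sample means','N(\mu,\sigma^2/n)')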

SecretAgentMan
  • You have run a bunch of examples demonstrating the CLT. As noted in comments, there's nothing special about Beta distributions in these examples: you may start with literally *any* finite-variance distribution and obtain identical results. – whuber Dec 27 '18 at 16:10
  • You are correct. I upvoted that comment but provided an answer because there was none. Of course CLT holds for a finite-variance distribution. I even mentioned the commenter in the answer. Should I delete this answer? Or make it community? – SecretAgentMan Dec 31 '18 at 16:02

Note: see also this related question: Sum of n i.i.d. Beta-distributed variables.

For the case of a uniform distribution, $\text{Beta}(1,1)$, the distribution of the sum of $n$ independent variables (and hence the mean, which differs only by the factor $1/n$) is known as the Irwin-Hall distribution.

If $$X_n = \sum_{i=1}^n U_i \quad \text{ with } \quad U_i \sim \text{Beta}(1,1)$$

then the density of $X_n$ is a spline of degree $n-1$:

$$f_X(x;n) = \frac{1}{(n-1)!} \sum_{j=0}^{n-1} a_j(k,n)x^j \quad \text{ for } \quad k \leq x \leq k+1, \quad k = 0,1,\dots,n-1$$

where the $a_j(k,n)$ can be described by a recurrence relation:

$$a_j(k,n) = \begin{cases} 1 & \quad k=0,j=n-1 \\ 0 & \quad k=0,j< n-1 \\ a_j(k-1,n) + (-1)^{n+k-j-1} {{n}\choose{k}} {{n-1}\choose{j}} k^{n-j-1} & \quad k>0 \end{cases}$$
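
As a quick sanity check on this recurrence, here is a minimal MATLAB sketch (the variable names are mine) that builds the coefficients $a_j(k,n)$, evaluates the resulting spline, and compares it with a histogram of simulated sums of $n$ uniform variables:

% Sketch: Irwin-Hall density from the spline recurrence for a_j(k,n)
n = 4;
A = zeros(n, n);                 % A(k+1, j+1) holds a_j(k, n)
A(1, n) = 1;                     % k = 0: a_{n-1}(0,n) = 1, all other a_j(0,n) = 0
for k = 1:n-1
    for j = 0:n-1
        A(k+1, j+1) = A(k, j+1) + (-1)^(n+k-j-1) * nchoosek(n, k) * ...
                      nchoosek(n-1, j) * k^(n-j-1);
    end
end
% Evaluate piece k = floor(x) of the spline and divide by (n-1)!
ih = @(x) polyval(fliplr(A(min(floor(x), n-1) + 1, :)), x) / factorial(n-1);
x = linspace(0, n, 400);
figure, hold on, box on
histogram(sum(rand(n, 1e5), 1), 'Normalization', 'pdf')
plot(x, arrayfun(ih, x), 'r-', 'LineWidth', 2)
title('Irwin-Hall density from the spline recurrence (n = 4)')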


You can see the above formula as being constructed by repeatedly convolving $X_{n-1}$ with $U_n$, where the integral is solved piecewise. Can we generalize this to Beta-distributed variables with arbitrary $\alpha$ and $\beta$?

Let $$X_n(\alpha,\beta) = \sum_{i=1}^n U_i \quad \text{ with } \quad U_i \sim \text{Beta}(\alpha,\beta)$$

We expect the function $f_X(x;n,\alpha,\beta)$ to be split into $n$ pieces (though possibly not a spline anymore). The convolution to compute the distribution of $X_{n}(\alpha,\beta) = X_{n-1}(\alpha,\beta)+U_n$ will be something like (dropping the normalization constant $1/B(\alpha,\beta)$ here and in what follows):

$$f_X(x;n,\alpha,\beta) = \int^{\text{min}(1,x)}_{1-\text{min}(1,n-x)} f_X(x-y;n-1,\alpha,\beta) y^{\alpha-1}(1-y)^{\beta-1} dy$$
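
Here is a rough numerical sketch of that recursion (the parameter values are arbitrary, and unlike the expressions above it keeps the $1/B(\alpha,\beta)$ factor via betapdf, so the result is a properly normalized density), compared against simulated sums:

% Numerical sketch of the recursive convolution (slow, but shows the idea);
% betapdf carries the 1/B(alpha,beta) factor, so f is a proper density
a = 2.5; b = 1.7; n = 3;                  % arbitrary example parameters
f = @(x) betapdf(x, a, b);                % f_X(x; 1, alpha, beta)
for m = 2:n
    fprev = f;                            % captured by value in the next handle
    f = @(x) arrayfun(@(t) integral(@(y) fprev(t - y) .* betapdf(y, a, b), ...
                      max(0, t - (m - 1)), min(1, t)), x);
end
x = linspace(0, n, 200);
figure, hold on, box on
histogram(sum(betarnd(a, b, n, 1e5), 1), 'Normalization', 'pdf')
plot(x, f(x), 'r-', 'LineWidth', 2)
title('Recursive convolution vs. simulation: n = 3, Beta(2.5,1.7)')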

  • For $n=2$:

$$f_X(x;n,\alpha,\beta) = \begin{cases} \int_{0\phantom{-x}}^{x} ((x-y)y)^{\alpha-1}((1-x+y)(1-y))^{\beta-1} dy & \quad \text{if $0 \leq x \leq 1$} \\ \int_{x-1}^{1} ((x-y)y)^{\alpha-1}((1-x+y)(1-y))^{\beta-1} dy & \quad \text{if $1 \leq x \leq 2$} \end{cases}$$

  • For integer $\alpha$ and $\beta$: terms like $((x-y)y)^{\alpha-1}$ and $((1-x+y)(1-y))^{\beta-1}$ can be expanded as polynomials, so the integral is straightforward to solve; a quick symbolic check of the examples below is sketched after this list.

    For example:

    $$\begin{array}{} f_X(x;2,2,2) &=& \begin{cases} \frac{1}{30} x^3(x^2-5x+5) & \quad \text{if $x \leq 1$} \\ \frac{1}{30}(2-x)^3(x^2+x-1) & \quad \text{if $x \geq 1$} \end{cases}\\ \\ f_X(x;2,3,3) &=& \begin{cases} \frac{1}{630} x^5(x^4-9x^3+30x^2-42x+21) & \quad \text{if $x \leq 1$} \\ \frac{1}{630}(2-x)^5(x^4+x^3-2x+1) & \quad \text{if $x \geq 1$} \end{cases} \end{array}$$
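
As a quick check of those two expressions, here is a minimal symbolic sketch (it needs MATLAB's Symbolic Math Toolbox; as above, the $1/B(\alpha,\beta)^2$ normalization is left out):

% Symbolic check (sketch) of the n = 2 pieces for alpha = beta = 2;
% normalization omitted to match the expressions above
syms x y
a = 2; b = 2;
integrand = ((x - y)*y)^(a - 1) * ((1 - x + y)*(1 - y))^(b - 1);
piece1 = simplify(int(integrand, y, 0, x))      % 0 <= x <= 1, should match x^3*(x^2 - 5*x + 5)/30
piece2 = simplify(int(integrand, y, x - 1, 1))  % 1 <= x <= 2, should match (2 - x)^3*(x^2 + x - 1)/30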

The solution for integer values of $\alpha$ and $\beta$ will be a spline as well. Possibly this could be cast in some nice (or, more likely, not so nice) formula for more general situations (not just $n=2$ with $\alpha=\beta=2$ or $\alpha=\beta=3$). But at that point one needs quite a few cups of coffee, or better an IV drip of it, to tackle this stuff.

Lerner Zhang
Sextus Empiricus