9

So, let's say you flip a coin 10 times, and call that 1 "event". If you run, 1,000,000 of these "events", what is the proportion of events that have heads between 0.4 and 0.6? Binomial probability would suggest this to be about 0.65, but my Mathematica code is telling me about 0.24

Here's my syntax:

In[2]:= X:= RandomInteger[];
In[3]:= experiment[n_]:= Apply[Plus, Table[X, {n}]]/n;
In[4]:= trialheadcount[n_]:= .4 < Apply[Plus, Table[X, {n}]]/n < .6
In[5]:= sample=Table[trialheadcount[10], {1000000}]
In[6]:= Count[sample2,True];
Out[6]:= 245682

Where is the mishap?

gung - Reinstate Monica
  • 132,789
  • 81
  • 357
  • 650
  • 3
    perhaps this would be better suited to mathematica stackexchange http://mathematica.stackexchange.com/ – Jeromy Anglim Feb 02 '15 at 04:56
  • 1
    @JeromyAnglim In this case I suspect the problem is probably with the reasoning rather than strictly the coding. – Glen_b Feb 02 '15 at 05:34
  • @Glen_b I guess the main thing is that there is a good answer somewhere on the internet, which you appear to have provided. :-) – Jeromy Anglim Feb 02 '15 at 05:36

3 Answers3

19

The mishap is the use of strict less than.

With ten tosses, the only way to get a proportion-of-heads outcome strictly between 0.4 and 0.6 is if you get exactly 5 heads. That has a probability of about 0.246 (${{_{10}}\choose{^5}}(\frac{_1}{^2})^{10}\approx 0.246$), which is about what your simulations (correctly) give.

If you include 0.4 and 0.6 in your limits, (i.e. 4, 5 or 6 heads in 10 tosses) the result has a probability of about 0.656, much as you expected.

Your first thought should not be a problem with the random number generator. That kind of problem would have been obvious in a heavily used package like Mathematica long before now.

Glen_b
  • 257,508
  • 32
  • 553
  • 939
8

Some comments about the code you wrote:

  • You defined experiment[n_] but never used it, instead repeating its definition in trialheadcount[n_].
  • experiment[n_] could be much more efficiently programmed (without using the built-in command BinomialDistribution) as Total[RandomInteger[{0,1},n]/n and this would also make X unnecessary.
  • Counting the number of cases where experiment[n_] is strictly between 0.4 and 0.6 is more efficiently accomplished by writing Length[Select[Table[experiment[10],{10^6}], 0.4 < # < 0.6 &]].

But, for the actual question itself, as Glen_b points out, the binomial distribution is discrete. Out of 10 coin tosses with $x$ observed heads, the probability that the sample proportion of heads $\hat p = x/10$ is strictly between 0.4 and 0.6 is actually just the case $x = 5$; i.e., $$\Pr[X = 5] = \binom{10}{5} (0.5)^5 (1-0.5)^5 \approx 0.246094.$$ Whereas, if you were to calculate the probability that the sample proportion is between 0.4 and 0.6 inclusive, that would be $$\Pr[4 \le X \le 6] = \sum_{x=4}^6 \binom{10}{x} (0.5)^x (1-0.5)^{10-x} = \frac{672}{1024} \approx 0.65625.$$ Therefore, you need only modify your code to use 0.4 <= # <= 0.6 instead. But of course, we could also write

Length[Select[RandomVariate[BinomialDistribution[10,1/2],{10^6}], 4 <= # <= 6 &]]

This command is approximately 9.6 times faster than your original code. I imagine someone even more proficient than I am at Mathematica could speed it up even further.

heropup
  • 5,006
  • 1
  • 16
  • 25
  • 2
    You can speed up your code by another factor of 10 by using `Total@Map[Counts@RandomVariate[BinomialDistribution[10, 1/2], 10^6], {4, 5, 6}]`. I suspect `Counts[]`, being a built-in function, is highly optimized in comparison to `Select[]`, which has to work with arbitrary predicates. – David Zhang Feb 02 '15 at 14:03
1

Doing Probability Experiments in Mathematica

Mathematica offers a very comfortable framework to work with probabilities and distributions and -- while the main issue of appropriate limits has been addressed -- I would like to use this question to make this clearer and maybe useful as a reference.

Let's simply make the experiments repeatable and define some plot options to fit our taste:

SeedRandom["Repeatable_151115"];
$PlotTheme = "Detailed";
SetOptions[Plot, Filling -> Axis];
SetOptions[DiscretePlot, ExtentSize -> Scaled[0.5], PlotMarkers -> "Point"];

Working with parametric distributions

We can now define the asymptotical distribution for one event which is the proportion $\pi$ of heads in $n$ throws of a (fair) coin:

distProportionTenCoinThrows = With[
    {
        n = 10, (* number of coin throws *)
        p = 1/2 (* fair coin probability of head*)
    },
    (* derive the distribution for the proportion of heads *)
    TransformedDistribution[
        x/n,
        x \[Distributed] BinomialDistribution[ n, p ]
    ];

With[
    {
        pr = PlotRange -> {{0, 1}, {0, 0.25}}
    },
    theoreticalPlot = DiscretePlot[
        Evaluate @ PDF[ distProportionTenCoinThrows, p ],
        {p, 0, 1, 0.1},
        pr
    ];
    (* show plot with colored range *)
    Show @ {
        theoreticalPlot,
        DiscretePlot[
            Evaluate @ PDF[ distProportionTenCoinThrows, p ],
            {p, 0.4, 0.6, 0.1},
            pr,
            FillingStyle -> Red,
            PlotLegends -> None
        ]
    }
]

Which gives us the plot of the discrete distribution of proportions: TheoreticalDistributionPlot

We can use the distribution immediately to calculate probabilities for $Pr[\,0.4 \leq \pi \leq 0.6\, |\,\pi \sim B(10,\frac{1}{2})]$ and $Pr[\,0.4 < \pi < 0.6\, |\,\pi \sim B(10,\frac{1}{2})]$:

{
    Probability[ 0.4 <= p <= 0.6, p \[Distributed] distProportionTenCoinThrows ],
    Probability[ 0.4 < p < 0.6, p \[Distributed] distProportionTenCoinThrows ]
} // N

{0.65625, 0.246094}

Doing Monte Carlo Experiments

We can use the distribution for one event to repeatedly sample from it (Monte Carlo).

distProportionsOneMillionCoinThrows = With[
    {
        sampleSize = 1000000
    },
    EmpiricalDistribution[
        RandomVariate[
            distProportionTenCoinThrows,
            sampleSize
        ]
    ]
];

empiricalPlot = 
    DiscretePlot[
        Evaluate@PDF[ distProportionsOneMillionCoinThrows, p ],
        {p, 0, 1, 0.1}, 
        PlotRange -> {{0, 1}, {0, 0.25}} , 
        ExtentSize -> None, 
        PlotLegends -> None, 
        PlotStyle -> Red
    ]
]

EmpirialDistributionPlot

Comparing this with the theoretical/asymptotical distribution shows that everthing pretty much fits in:

Show @ {
   theoreticalPlot,
   empiricalPlot
}

ComparingDistributions

gwr
  • 256
  • 5
  • 16
  • You can find a similar post with more background information with regard to *Mathematica* on [Mathematica SE](http://mathematica.stackexchange.com/questions/98834/how-to-model-rolling-a-pair-of-dice-100-times-monte-carlo/98913#98913). – gwr Nov 15 '15 at 18:05