(Failure) probability calculation

Question

I am working on mortality in 12 hospitals performing cardiac surgery in babies. The dataset is available here: Surg dataset. The dataset is structured in this way:

  n   r  hospital
 47   0     A
148  18     B
119   8     C
...  ...   ...

where $n$ is the number of operations and $r$ the number of deaths. My aim is to calculate the failure probability for each hospital $p_i$. The tutorial I am following (WinBUGS) reports that

$r_i \sim Binomial(p_i, n_i)$

but my question is: why can't $p_i$ simply be calculated as $p_i = r_i/n_i$?

I changed the text to clarify the question. The non-informative prior I'll use is, like in the tutorial, $p_i \sim Beta(1,1)$ — Fabio, Mar 04 '15 at 19:16
I cannot vote up the answer because of my low reputation. Tim's two links were very useful. jaradniemi's comment is the answer that really clarified the concept for me! — Fabio, Mar 04 '15 at 20:47
Are you conflating sample proportion ($\hat{p}_i=r_i/n_i$) with population proportion ($p_i$)? *In ancient times*, when I was a student these were denoted (following the more usual convention of Greek for parameters, Roman for variables) as $p_i$ and $\pi_i$, but $p$ came to be used for the population proportion in a bunch of US texts, and since then the old convention seems to have been dropped almost everywhere ... to much confusion, as we see here. — Glen_b, Mar 05 '15 at 03:07

score 2 · Accepted Answer · edited Apr 13 '17 at 12:44

You can use the formula you quote, but it seems you have chosen a Bayesian estimation method, which includes prior information in the statistical model. Those are two different ways of doing statistics. See this question to learn more about what a Bayesian model is.

Actually, if you compute the likelihood over different values of $p_i$, you will find that for each hospital it peaks at $r_i/n_i$. The plot below shows the likelihood profiles for the individual hospitals; the vertical lines mark the point estimates $r_i/n_i$.

[Figure: likelihood profiles for the individual hospitals, with vertical lines at the point estimates $r_i/n_i$]
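The claim about the likelihood peak can be checked numerically. The following is a minimal sketch, not the code used for the plot: it uses only the three rows shown in the question, and the names `log_likelihood` and `grid_mle` are made up for illustration. It evaluates the binomial log-likelihood on a grid of candidate values for $p_i$ and reports where the maximum falls.

```python
import math

# Only the three rows shown in the question: hospital -> (operations n, deaths r).
hospitals = {"A": (47, 0), "B": (148, 18), "C": (119, 8)}


def log_likelihood(p, n, r):
    """Binomial log-likelihood of death probability p given r deaths in n operations."""
    # The likelihood is exactly zero at a boundary that contradicts the data.
    if (p == 0.0 and r > 0) or (p == 1.0 and r < n):
        return float("-inf")
    ll = math.log(math.comb(n, r))
    if r > 0:
        ll += r * math.log(p)
    if r < n:
        ll += (n - r) * math.log(1.0 - p)
    return ll


def grid_mle(n, r, steps=1000):
    """Return the grid point in [0, 1] with the highest log-likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: log_likelihood(p, n, r))


for name, (n, r) in hospitals.items():
    print(name, "grid maximum:", grid_mle(n, r), "r/n:", r / n)
```

The grid maximum agrees with $r_i/n_i$ up to the grid spacing, which is what the vertical lines in the plot indicate.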

Notice that with a flat (uninformative) prior $\pi(\theta) = 1$, i.e. $Beta(1,1)$, the posterior $\pi(\theta|x) \propto f(x | \theta) \pi(\theta)$ is proportional to the likelihood. The posterior mode therefore coincides with the maximum likelihood estimate, which is $r_i/n_i$, so in that sense the three approaches lead to the same point estimate. (The posterior mean is slightly different.)
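To make the proportionality concrete for this problem, here is a short derivation, assuming an independent binomial likelihood for each hospital and the $Beta(1,1)$ prior mentioned in the question (so $\theta = p_i$ and $x = r_i$):

$$
\pi(p_i \mid r_i) \;\propto\; \binom{n_i}{r_i}\, p_i^{r_i} (1 - p_i)^{n_i - r_i} \times 1 \;\propto\; p_i^{r_i} (1 - p_i)^{n_i - r_i},
$$

which is the kernel of a $Beta(r_i + 1,\; n_i - r_i + 1)$ distribution. Its mode and mean are

$$
\text{mode} = \frac{r_i}{n_i}, \qquad \text{mean} = \frac{r_i + 1}{n_i + 2}.
$$

For hospital B in the table ($n = 148$, $r = 18$) this gives a mode of $18/148 \approx 0.122$ and a mean of $19/150 \approx 0.127$.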

So using "something more" than $r_i/n_i$ is helpful for handling the uncertainty of the parameters, as @jaradniemi noted in a comment. On the other hand, a more sophisticated approach could be used if (a) you wanted to build a hierarchical model with some general probability of failure plus site-specific effects, or (b) you wanted to use the Bayesian approach to include out-of-data information in your model as an informative prior.
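As an illustration of what "handling uncertainty" can look like here, the sketch below approximates the posterior of $p_i$ under a flat $Beta(1,1)$ prior on a grid and reads off the posterior mean and an equal-tailed 95% credible interval. It is a rough sketch under those assumptions, not the WinBUGS model from the tutorial; it uses only the three rows shown in the question, and `posterior_summary` is a hypothetical helper name.

```python
import math

# Only the three rows shown in the question: hospital -> (operations n, deaths r).
hospitals = {"A": (47, 0), "B": (148, 18), "C": (119, 8)}


def posterior_summary(n, r, steps=10000, level=0.95):
    """Grid approximation to the posterior of p under a flat Beta(1, 1) prior.

    Returns (posterior mean, lower bound, upper bound) of an equal-tailed
    credible interval at the requested level.
    """
    # Cell midpoints on (0, 1); this avoids log(0) at the endpoints.
    grid = [(i + 0.5) / steps for i in range(steps)]
    # Unnormalised log-posterior: flat prior times binomial likelihood.
    log_post = [r * math.log(p) + (n - r) * math.log(1.0 - p) for p in grid]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    probs = [w / total for w in weights]

    mean = sum(p * q for p, q in zip(grid, probs))

    lower_tail = (1.0 - level) / 2.0
    upper_tail = 1.0 - lower_tail
    cumulative = 0.0
    lower = upper = None
    for p, q in zip(grid, probs):
        cumulative += q
        if lower is None and cumulative >= lower_tail:
            lower = p
        if upper is None and cumulative >= upper_tail:
            upper = p
            break
    return mean, lower, upper


for name, (n, r) in hospitals.items():
    mean, lower, upper = posterior_summary(n, r)
    print(f"{name}: r/n = {r / n:.3f}, posterior mean = {mean:.3f}, "
          f"95% interval = ({lower:.3f}, {upper:.3f})")
```

The interval is the extra information that the single number $r_i/n_i$ does not carry. It is most relevant for a hospital such as A, where $r/n = 0$ even though the true death probability is presumably not exactly zero.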

In addition to this, $p_i$ is a parameter that has uncertainty and $r_i/n_i$ is an estimate for $p_i$ but almost surely is not the actual true value for $p_i$. Thus we use statistical approaches to describe our uncertainty about $p_i$. — jaradniemi, Mar 04 '15 at 20:07

score 0 · Answer 2 · answered Mar 04 '15 at 20:51

As @jaradniemi pointed out, the $r_i/n_i$ are just estimates of the $p_i$. The Bayesian model sets up a framework for handling the uncertainty in the $p_i$.

Fabio
