
I quote the relevant documents about Bayesian theory below, and I am confused by one specific statement: "from the Bayesian viewpoint there is only a single data set D (namely the one that is actually observed)". My question is: suppose we have a few candidate values for the parameter w, but only one observed data set D. If D is limited and biased, it is possible that some candidate value of w never co-occurs with D at all. How do we calculate P(D|w) in such a case? Thanks.

For example, if we want to calculate P(D = win lottery | w = woman) and the observed data set D contains no data about women, what do we do? Treating the posterior P(w = woman | D = win lottery) as zero does not seem quite right.

[Images of the quoted textbook passages on Bayesian inference.]

regards, Lin

Lin Ma

1 Answer


Using Bayes theorem is not the same as using Bayesian statistics. You are mixing two different things.

If you knew the conditional probability of a person's gender given their luck in the lottery, $\Pr(\text{gender} \mid \text{win})$, and the unconditional probability of winning, $\Pr(\text{win})$, then you could apply Bayes theorem to compute $\Pr(\text{win} \mid \text{gender})$. Notice that I did not use terms such as prior, likelihood, or posterior anywhere here, since they have nothing to do with such problems. (You could use a naive Bayes classifier for such problems, but first, it is not Bayesian since it does not use priors, and second, you have insufficient data for it.)
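For reference, written out for this example (this is just Bayes theorem, with the denominator expanded by the law of total probability):

$$ \Pr(\text{win} \mid \text{gender}) = \frac{\Pr(\text{gender} \mid \text{win}) \, \Pr(\text{win})}{\Pr(\text{gender})}, \qquad \Pr(\text{gender}) = \sum_{w \in \{\text{win},\, \text{not win}\}} \Pr(\text{gender} \mid w) \, \Pr(w) $$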

As your quote mentioned, in the Bayesian approach we have a prior, a likelihood, and a posterior. The likelihood is the conditional distribution of the data given some parameter. The prior is the distribution of this parameter that you assume a priori, before seeing the data. The posterior is the estimate given the data you have and your prior.
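In the notation of your question, these three pieces are tied together by Bayes theorem, with the denominator obtained by summing over all candidate values of $w$:

$$ \underbrace{P(w \mid D)}_{\text{posterior}} = \frac{\overbrace{P(D \mid w)}^{\text{likelihood}} \, \overbrace{P(w)}^{\text{prior}}}{P(D)}, \qquad P(D) = \sum_{w} P(D \mid w) \, P(w) $$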

To give a concrete example illustrating this, let's assume that you have data about some coin: you threw it once and observed a head; let's call this observation $X$. Obviously, $X$ follows a Bernoulli distribution parametrized by some parameter $p$ that is unknown and that we want to estimate. We do not know what $p$ is, but we have the likelihood function $f(X \mid p)$, that is, the probability mass function of the Bernoulli distribution over $X$ parametrized by $p$. To learn about $p$ the Bayesian way, we assume a prior for $p$. Since we have no clue what $p$ could be, we can decide to use the weakly informative "uniform" Beta(1,1) prior. So our model becomes

$$ X \sim \mathrm{Bernoulli}(p) \\ p \sim \mathrm{Beta}(\alpha, \beta) $$

where $\alpha = \beta = 1$ are the parameters of the beta distribution. Since the beta distribution is the conjugate prior for the Bernoulli distribution, we can easily compute the posterior distribution of $p$

$$ p \sim \mathrm{Beta}(\alpha + 1, \beta) $$

and its expected value

$$ E(p \mid X) = \frac{\alpha + 1}{\alpha + \beta + 1} = \frac{2}{3} \approx 0.67 $$

so given the data we have and assuming a Beta(1,1) prior, the expected value of $p$ is about $0.67$.
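As a minimal numeric sketch (assuming `scipy` is available), the same conjugate update can be reproduced like this:

```python
from scipy.stats import beta

# Prior: Beta(1, 1), a flat prior over p in [0, 1]
alpha_prior, beta_prior = 1.0, 1.0

# Data: a single Bernoulli observation, one head and no tails
heads, tails = 1, 0

# Conjugate update for a Bernoulli likelihood:
# posterior is Beta(alpha + #heads, beta + #tails)
posterior = beta(alpha_prior + heads, beta_prior + tails)

# Posterior mean: (alpha + 1) / (alpha + beta + 1) = 2/3
print(posterior.mean())  # ~0.667
```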

Tim
  • Thanks Tim for the details and vote up for your comprehensive reply. I studied the Bayes theorem Wikipedia link carefully, especially the cancer/age example. I am confused by one statement, "The probability that a person with cancer is 65 years old. Suppose it is 0.5%". I am wondering whether it means `P(has cancer | age = 65) = 0.5%` or `P(age = 65 | has cancer) = 0.5%`? The statement is a bit confusing even after reading it a few times. I think it is the latter, and it would be great if you could help confirm. :) – Lin Ma Aug 20 '16 at 05:29
  • BTW, in your example, in order to calculate `Pr(gender∣win)`, I think simply knowing `Pr(win)` and `Pr(win∣gender)` might not be enough? We should also know `P(gender)`? Thanks. – Lin Ma Aug 20 '16 at 05:32
  • @LinMa $ p(a|b) = \frac{ p(b|a)p(a) }{ \sum p(b|a)p(a) } $ – Tim Aug 20 '16 at 06:15
  • Thanks Tim, vote up. I think from your formula above, you need to know both `P(a)` and `P(b)` (`P(b)` being the denominator), which means we need to know both `P(win)` and `P(gender)`, correct? If I misread your intention, please feel free to correct me. – Lin Ma Aug 20 '16 at 07:17
  • @LinMa as you can see from the formula for Bayes theorem, p(b) is not mentioned anywhere. – Tim Aug 20 '16 at 08:21
  • Hi Tim, thanks and vote up. I may have misread what you mean by "anywhere", but `P(b)` is mentioned as the denominator (https://en.wikipedia.org/wiki/Bayes%27_theorem), see the blue neon sign. If I misread your point, please correct me. – Lin Ma Aug 20 '16 at 20:35
  • BTW, Tim, just curious why you call `Beta(1,1)` a weakly informative "uniform" prior? :) – Lin Ma Aug 20 '16 at 20:40
  • @LinMa it's $p(b)$ understood as $p(b) = \sum p(b|a)p(a)$. And Beta(1,1) is "uninformative" since it's flat, i.e. a priori you assume any value of $p$ to be equally likely. – Tim Aug 20 '16 at 21:46
  • Thanks Tim, vote up. In your reply, you mentioned $p∼Beta(α,β)$ and also $p∼Beta(α+1,β)$; I am confused that you mention $p$ twice but with different distributions. If you could clarify, that would be great. – Lin Ma Aug 21 '16 at 00:07
  • @LinMa in the first case it's the prior, in the second the posterior. If it's still unclear, maybe you should start with some handbook on Bayesian statistics? – Tim Aug 21 '16 at 06:48
  • Thanks Tim, I did some studying today. My last question is: when do you think Bayesian inference should be used instead of Bayes theorem when calculating a posterior? My intuition is, if I know the likelihood `P(D|w)`, the prior `P(w)`, and also `P(D)`, I should use Bayes theorem directly; why bother with the Bayesian approach and the additional computation with `beta` priors? – Lin Ma Aug 22 '16 at 04:22
  • @LinMa so what is your likelihood and what is your prior? I just gave a simple example; you can use any distributions that make sense as likelihood and prior. – Tim Aug 22 '16 at 05:06
  • Thanks Tim. Suppose, similar to the simple coin-toss example, I now want to calculate P(real good weather | Tim says good weather) = P(Tim says good weather | real good weather) * P(real good weather) / P(Tim says good weather) using Bayes theorem. If I know P(Tim says good weather | real good weather), P(real good weather), and P(Tim says good weather), I am wondering why I need to deal with the Beta distribution calculation? I can use the three values I know directly to calculate P(real good weather | Tim says good weather). – Lin Ma Aug 22 '16 at 05:26
  • @LinMa you should really start with some handbook and read my answer carefully. You are talking about using Bayes theorem, *not* about Bayesian statistics. Your example does not deal with estimating any unknown parameter. You are misunderstanding the very basic idea of Bayesian statistics and I'm afraid I cannot help. – Tim Aug 22 '16 at 06:02
  • Thanks Tim for the help, I marked your reply as the answer. I was not aware of the difference between Bayes theorem and Bayesian statistics at the beginning, but you taught me. My last question is: if I know P(Tim says good weather | real good weather), P(real good weather), and P(Tim says good weather), should I use Bayes theorem directly? I think the answer is yes, correct? – Lin Ma Aug 23 '16 at 04:18
  • @LinMa the answer is yes. Bayes theorem is a general theorem that shows how to work with probabilities. In Bayesian statistics we *use* Bayes theorem to estimate unknown parameters using priors and data. In both of your examples (in the question and in the comments) there are no priors. – Tim Aug 23 '16 at 07:54
  • Thanks Tim, your answer is better than books and professors. More practical. :) – Lin Ma Aug 24 '16 at 03:59
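As a small numeric sketch of the point about the denominator raised in the comments above, $p(b) = \sum_a p(b \mid a)\, p(a)$, here is the lottery example with made-up illustrative numbers:

```python
# Hypothetical numbers, purely for illustration: a = gender, b = winning the lottery
p_gender = {"woman": 0.5, "man": 0.5}                # p(a)
p_win_given_gender = {"woman": 0.001, "man": 0.002}  # p(b | a)

# Denominator of Bayes theorem via the law of total probability:
# p(win) = sum over genders of p(win | gender) * p(gender)
p_win = sum(p_win_given_gender[g] * p_gender[g] for g in p_gender)

# Bayes theorem: p(woman | win) = p(win | woman) * p(woman) / p(win)
p_woman_given_win = p_win_given_gender["woman"] * p_gender["woman"] / p_win

print(p_win)              # 0.0015
print(p_woman_given_win)  # ~0.333
```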