32

There is a person behind a curtain - I do not know whether the person is female or male.

I know the person has long hair, and that 90% of all people with long hair are female

I know the person has a rare blood type AX3, and that 80% of all people with this blood type are female.

What is the probability the person is female?

NOTE: this original formulation has been expanded with two further assumptions: 1. Blood type and hair length are independent 2. The ratio male:female in the population at large is 50:50

(The specific scenario here is not so relevant - rather, I have an urgent project that requires I get my mind around the correct approach for answering this. My gut feel is that it's a question of simple probability, with a simple definitive answer, rather than something with multiple debatable answers according to different statistical theories.)

whuber
ProbablyWrong
  • 1
    There are not multiple theories of probability, but it is notoriously true that people have difficulties thinking correctly about probabilities. (Augustus DeMorgan, a good mathematician, gave up the study of probability due to its difficulties.) Don't look at debates: look for appeals to principles of probability (such as the Kolmogorov axioms). Don't let this be resolved democratically: your question is attracting many ill-conceived answers which, even if some of them happen to agree, are merely collectively wrong. @Michael C gives good guidance; my reply tries to show you why he's right. – whuber Jun 22 '12 at 03:46
  • @Whuber, if independence is assumed, would you agree that 0.97297 is the correct answer? (I believe that the answer might be anywhere between 0% and 100% without this assumption - your diagrams show this nicely). – ProbablyWrong Jun 22 '12 at 04:53
  • Independence of what, precisely? Are you suggesting that female and male hairstyles are the same? As you say in your question, this particular scenario involving gender/hair/blood type may not be relevant: that tells me you seek to understand how to solve problems like this in general. To do that you will need to know which assumptions imply which conclusions. Thus you need to focus very carefully on the assumptions you are willing to make and determine exactly how much they allow you to conclude. – whuber Jun 22 '12 at 13:07
  • 3
    The kind of independence to explore concerns the combination of all three characteristics. E.g., if AX3 is a marker for a syndrome that includes baldness in females (but not in males), then any long-haired person with AX3 is necessarily male, making the probability of being female 0%, not 97.3%. I hope this makes it obvious that anybody producing a definite answer to this question *must* be making additional assumptions, even if they do not explicitly acknowledge them. The truly useful answers, IMHO, would be those that show directly how different assumptions lead to different results. – whuber Jun 22 '12 at 14:31
  • @Whuber: Your original answer assumes "that blood type and hair length are independent ... a fair and natural assumption to make when addressing such questions". Your conclusion that the question can't be answered definitely relies on this assumed independence somehow failing when we look at one gender. In your opinion, answers that fail to state the further assumption (that independence of blood type and hair length holds within gender) are "ill-conceived" and "collectively wrong". – ProbablyWrong Jun 23 '12 at 01:35
  • Consider this problem. A person rolls a fair dice, then tosses a fair coin. We are told that the outcomes of the dice roll and the coin toss are independent. A woman rolls a 3 with her fair dice, then tosses her fair coin. What is the probability that she tosses a head? If I answer "50%", is this ill-conceived/wrong, because I've failed to state my further assumption - that independence of dice roll and coin toss holds for women? – ProbablyWrong Jun 23 '12 at 01:36
  • More generally, if we are told (or if we assume) that two events are independent, do we really have to question whether this always holds? For instance, if I am told that the outcomes of a dice roll and a coin toss are independent, do I have to worry about this not being the case when the coin is tossed by a woman? Or when it is tossed at 3pm instead of 4pm? – ProbablyWrong Jun 23 '12 at 02:08
  • That is correct: you do not. So in applying what you learn here to the problems in which you are really interested, you will need to give some thought to such questions of independence and whether assumptions of independence can be justified or perhaps need testing. – whuber Jun 23 '12 at 15:04
  • 2
    You're missing the probability that a female *doesn't* have long hair. That's a critical measure. – Daniel R Hicks Jul 03 '12 at 19:54

8 Answers

35

Many people find it helpful to think in terms of a "population," subgroups within it, and proportions (rather than probabilities). This lends itself to visual reasoning.

I will explain the figures in detail, but the intention is that a quick comparison of the two figures should immediately and convincingly indicate how and why no specific answer to the question can be given. A slightly longer examination will suggest what additional information would be useful for determining an answer or at least obtaining bounds on the answers.

Venn diagram

Legend

Cross-hatching: female / Solid background: male.

Top: long-haired / Bottom: short-haired.

Right (and colored): AX3 / Left (uncolored): non-AX3.

Data

Top cross-hatching is 90% of the top rectangle ("90% of all people with long hair are female").

Total cross-hatching in the right colored rectangle is 80% of that rectangle ("80% of all people with this blood type are female.")

Explanation

This diagram shows schematically how the population (of all females and non-females under consideration) can simultaneously be partitioned into females/non-females, AX3/non-AX3, and long haired/non-long haired ("short"). It uses area, at least approximately, to represent proportions (there's some exaggeration to make the picture clearer).

It is evident that these three binary classifications create eight possible groups. Each group appears here.

The information given states that the upper cross-hatched rectangle (long-haired females) comprises 90% of the upper rectangle (all long-haired people). It also states that the combined cross-hatched parts of the colored rectangles (long-haired females with AX3 and short-haired females with AX3) comprise 80% of the colored region at the right (all people with AX3). We are told that someone lies in the upper right corner (arrow): long-haired people with AX3. What proportion of this rectangle is cross-hatched (female)?

I have also (implicitly) assumed that blood type and hair length are independent: the proportion of the upper rectangle (long hair) that is colored (AX3) equals the proportion of the lower rectangle (short hair) that is colored (AX3). That's what independence means. It is a fair and natural assumption to make when addressing questions like this, but of course it needs to be stated.

The position of the upper cross-hatched rectangle (long-haired females) is unknown. We can imagine sliding the top cross-hatched rectangle side-to-side and sliding the bottom cross-hatched rectangle side-to-side and possibly changing its width. If we do this so that 80% of the colored rectangle remains cross-hatched, such an alteration will change none of the stated information, yet it can alter the proportion of females in the upper right rectangle. Evidently the proportion could be anywhere between 0% and 100% and still be consistent with the information given, as in this image:

Figure 2


One strength of this method is it establishes the existence of multiple answers to the question. One could translate all this algebraically and, by means of stipulating probabilities, offer specific situations as possible examples, but then the question would arise whether such examples are really consistent with the data. For instance, if someone were to suggest that perhaps 50% of long-haired people are AX3, at the outset it is not evident that this is even possible given all the information available. These (Venn) diagrams of the population and its subgroups make such things clear.
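
The sliding-rectangle argument can also be checked numerically. The sketch below is an added illustration, not part of the original figures. It fixes hypothetical marginals P(long hair) = 0.2 and P(AX3) = 0.1 (assumed values; the question gives neither), treats x = P(female | long hair and AX3) as a free parameter, and fills in the eight groups so that every stated condition holds, including the independence and 50:50 assumptions added to the question later. The names `build_joint` and `prob` are illustrative.

```python
from fractions import Fraction as F

def build_joint(x, alpha=F(1, 5), beta=F(1, 10)):
    """Proportions of the eight groups, keyed by (long_hair, ax3, female).

    x is the free parameter P(female | long hair and AX3).
    alpha = P(long hair) and beta = P(AX3) are assumed values,
    not given in the question.
    """
    ab = alpha * beta  # hair length and blood type are independent
    joint = {
        (1, 1, 1): x * ab,
        (1, 1, 0): (1 - x) * ab,
        (1, 0, 1): F(9, 10) * alpha - x * ab,  # 90% of long-haired people are female
        (0, 1, 1): F(8, 10) * beta - x * ab,   # 80% of AX3 people are female
    }
    joint[(1, 0, 0)] = alpha * (1 - beta) - joint[(1, 0, 1)]
    joint[(0, 1, 0)] = (1 - alpha) * beta - joint[(0, 1, 1)]
    # Remaining females, so that females make up 50% of the population.
    joint[(0, 0, 1)] = F(1, 2) - joint[(1, 1, 1)] - joint[(1, 0, 1)] - joint[(0, 1, 1)]
    joint[(0, 0, 0)] = 1 - sum(joint.values())
    return joint

def prob(joint, event):
    """Total proportion of the groups for which event(hair, ax3, female) holds."""
    return sum(p for cell, p in joint.items() if event(*cell))

for x in (F(0), F(1, 2), F(1)):
    j = build_joint(x)
    assert all(p >= 0 for p in j.values()) and sum(j.values()) == 1
    long_hair = prob(j, lambda h, b, f: h)
    ax3 = prob(j, lambda h, b, f: b)
    assert prob(j, lambda h, b, f: h and f) / long_hair == F(9, 10)
    assert prob(j, lambda h, b, f: b and f) / ax3 == F(8, 10)
    assert prob(j, lambda h, b, f: h and b) == long_hair * ax3
    assert prob(j, lambda h, b, f: f) == F(1, 2)
    answer = prob(j, lambda h, b, f: h and b and f) / prob(j, lambda h, b, f: h and b)
    print(f"P(female | long hair, AX3) = {float(answer):.2f}")
```

With these assumed marginals, all three populations pass every check, yet the proportion of females in the upper right corner comes out as 0%, 50%, and 100% respectively.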

whuber
  • 3
    Whuber, assuming that blood type and hair length are independent, then surely the portion of long haired women with type AX3 should be the same as the portion of short haired women with AX3? I.e. you don't have flexibility to shift rectangles in the way you propose... If we assume also that men and women are 50:50 in the whole population, doesn't that give us enough info to solve this question with a single indisputable answer? – ProbablyWrong Jun 21 '12 at 06:40
  • @whuber +1 very nice. – Michael R. Chernick Jun 21 '12 at 10:42
  • @whuber Fantastic graphical explanation – Ubermensch Jun 21 '12 at 13:50
  • 5
    ProbablyWrong, take a close look at the question in your comment: because it deals with *women*, it is making an additional assumption about independence *conditional* on gender. The assumption of (unconditional) independence of hair and blood type does not mention gender at all, so to understand what it means, *erase the cross-hatching from the figures.* This, I hope, indicates why we have the flexibility to situate the cross-hatching wherever we like within the upper and lower rectangles. – whuber Jun 21 '12 at 14:46
  • @whuber: The outcome is not flexible with ProbablyWrong's added constraints, but is fixed at 36:1. Also, with your original answer's assumption of independence of probabilities, the cross-hatching was only flexible by changing the ratio of men to women, as the final ratio is fixed at (36*m/f) females for every male of those with long hair and AX3 under that assumption. – Briguy37 Jun 21 '12 at 19:11
  • I can't really respond to that comment, @Briguy, because it is not clear what the "added constraints" are: the OP has now produced several comments (and a reply) with different sets of assumptions. Note that the proportion of females in the population does *not* change when sliding the two cross-hatched rectangles, yet that can still change the answer. (The 50:50 assumption was made in an edit long after I posted this reply.) – whuber Jun 21 '12 at 19:28
  • @whuber: To clarify, the added constraints are that the probabilities are independent of each other (self-imposed in your initial answer and included later in his updated question) and that the gender distribution is 50:50 (imposed in his last question update and mentioned in his earlier comment to you). Also, if you slide the cross-hatch without changing the gender ratio, you are removing the fact that the probabilities are independent of each other (please see [my updated answer](http://stats.stackexchange.com/a/30880/12128) for the calculation of why this relationship is fixed). – Briguy37 Jun 21 '12 at 19:49
  • I think you may be missing a subtle point, @Briguy: the OP, in his original comment to my reply, did *not* stipulate mutual independence of all three attributes. In fact, I do not yet see such an assumption even in the edited question. The independence of hair length and blood type does not imply their independence conditional on gender (which is an assumption you implicitly make in order to carry out your updated solution). – whuber Jun 21 '12 at 19:53
  • 1
    @whuber, I like this. However, I have 2 questions / clarifications: 1. the figures seem to assume population proportions for long vs short hair (about 6:4) & ~AX3 vs AX3 (about 85:15), but this is not mentioned in the original question nor discussed in your explanations of the figures. I suspect the pop proportions are not relevant. Am I right / could you clarify that in the explanations? 2. I think this situation is ultimately working w/ the same phenomenon as *Simpson's Paradox*, only framed differently (coming at the issue from the other direction, as it were). Is that a fair assessment? – gung - Reinstate Monica Jun 27 '12 at 16:25
  • 3
    @gung, thank you for making those clarifications. The figures of course *must* represent some proportions in order to work at all, but any proportions not specifically pinned down in the problem statement are free to vary. (I did construct the figure so that about 50% of the population appears female, anticipating a later edit in which this was assumed.) The idea of applying this graphical representation to understanding Simpson's Paradox is intriguing; I think it has merit. – whuber Jun 27 '12 at 16:34
13

This is a question of conditional probability. You know that the person has long hair and blood type AX3. Let $$\begin{aligned} A &= \{\text{The person has long hair}\} \\ B &= \{\text{The person has blood type AX3}\} \\ C &= \{\text{The person is female}\}. \end{aligned}$$
So you seek $P(C|A\ \text{and}\ B)$. You know that $P(C|A)=0.9$ and $P(C|B)=0.8$.
Is that enough to calculate $P(C|A\ \text{and}\ B)$? Suppose $P(A\ \text{and}\ B\ \text{and}\ C)=0.7$. Then $$P(C|A\ \text{and}\ B)=P(A\ \text{and}\ B\ \text{and}\ C)/ P(A\ \text{and}\ B)=0.7/P(A\ \text{and}\ B).$$ Suppose $P(A\ \text{and}\ B)=0.8$. Then, by the above, $P(C|A\ \text{and}\ B)=0.875$. On the other hand, if $P(A\ \text{and}\ B)=0.9$ we would then have $P(C|A\ \text{and}\ B)\approx 0.78$.

Now both are possible when $P(C|A)=0.9$ and $P(C|B)=0.8$. So we can't tell for sure what $P(C|A\ \text{and}\ B)$ is.

Michael R. Chernick
  • Hi Michael, If I read you correctly, you're saying the question as posed can't be answered, is that right? Or to put it another way, you'd need more information to answer this question? 1. Let's assume that the rare blood type in my original question doesn't have any impact on a person's desire or ability to grow their hair long. Can the question now be answered? 2. Would you agree that the answer must be GREATER than 0.9? (Because you have a second piece of independent information - blood type - that reinforces the hypothesis that the person is a female) – ProbablyWrong Jun 21 '12 at 03:30
  • 2
    If $A$ and $B$ are independent, then $P(A\text{ and }B)=P(A)P(B)$ and you'll need to specify what fraction of persons have long hair, i.e., $P(A)$, and what fraction of persons have blood type AX3, i.e., $P(B)$. Also, you can't say that the answer must be greater than 0.9, which is equivalent to stating that $P(C|A\text{ and }B)>0.9$ (I really don't see why). – Néstor Jun 21 '12 at 07:39
  • 2
    @ProbablyWrong. Yes the problem as initially stated has insufficient information for a unique answer. – Michael R. Chernick Jun 21 '12 at 10:04
  • @Néstor, Michael, I disagree that we need to know what fraction of persons have long hair, or what fraction of persons have blood type AX3. I think the answer to the original question resolves uniquely without knowing these (assuming A and B are independent, which we all have, and assuming we know the split of men and women in the whole population - not unreasonable to suppose that's about 50:50, I think). – ProbablyWrong Jun 21 '12 at 10:42
  • 7
    Why does $$P(C|A\ \text{and}\ B)=P(A\ \text{and}\ B\ \text{and}\ C)\times P(A\ \text{and}\ B)??$$ I thought that $$P(C|A\cap B)=\frac{P(C \cap (A \cap B))}{P(A \cap B)}=\frac{P(A\cap B\cap C)}{P(A\cap B)}$$ using the definition of conditional probability. – Dilip Sarwate Jun 21 '12 at 11:13
  • Sorry Dilip, I meant to divide. I will correct the answer. – Michael R. Chernick Jun 21 '12 at 12:30
4

Fascinating discussion! I wonder whether, if we specified P(A) and P(B) as well, the range of P(C|A,B) would be much narrower than the full interval [0,1], simply because of the many constraints we have.

Sticking to the notation introduced above:

A = the event that the person has long hair

B = the event that the person has blood type AX3

C = the event that person is female

P(C|A) = 0.9

P(C|B) = 0.8

P(C) = 0.5 (i.e. let's assume an equal ratio of men and women in the population at large)

It does not seem possible to assume that events A and B are conditionally independent given C! That leads directly to a contradiction: if $P(A \wedge B | C) = P(A| C) \cdot P(B| C) = P(C| A) \frac{P(A)}{P(C)} \cdot P(C| B) \frac{P(B)}{P(C)}$

then

$P(C| A \wedge B ) = P(A \wedge B | C) \cdot \left( \frac{P(C)}{P(A \wedge B)} \right) = P(C| A) \frac{P(A)}{P(C)} \cdot P(C| B) \frac{P(B)}{P(C)} \cdot \left( \frac{P(C)}{P(A \wedge B)} \right) $

If we now assume that A and B are independent as well: $P(A \wedge B) = P(A) P(B)$ most terms cancel and we end up with

$P(C| A \wedge B ) = \frac{P(C| A) \cdot P(C| B)}{P(C)} = \frac{0.9 \cdot 0.8}{0.5} > 1$
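
The arithmetic in the last step is easy to check. A minimal sketch using the values stated above; the function name is illustrative:

```python
from fractions import Fraction as F

def posterior_if_both_independences_hold(p_c_given_a, p_c_given_b, p_c):
    """Value forced on P(C | A and B) when A and B are assumed independent
    both unconditionally and conditionally on C."""
    return p_c_given_a * p_c_given_b / p_c

value = posterior_if_both_independences_hold(F(9, 10), F(8, 10), F(1, 2))
print(value, float(value))  # 36/25 1.44
assert value > 1  # not a valid probability, hence the contradiction
```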

Following up on whuber's wonderful geometric representation of the problem: While it is true that generally speaking $P(C | A \wedge B)$ can assume any value in the interval $[0,1]$ the geometric constraints do narrow the range of possible values significantly for values of $P(A)$ and $P(B)$ that are not "too small". (Though we can also upper bound the marginals: $P(A)$ and $P(B)$)

Let us compute the *smallest possible value* for $P(C | A \wedge B)$ under the following geometric constraints:

1. The fraction of the upper area (A TRUE) covered by the upper rectangle must be equal to $P(C|A)=0.9$

2. The sum of the areas of the two rectangles must be equal to $P(C)=0.5$

3. The sum of the fraction of the areas of the two colored rectangles (i.e. their overlap with event B) must be equal to $P(C|B)=0.8$

4. (trivial) The upper rectangle cannot be moved beyond the left boundary and should not be moved beyond its minimum overlap to the left.

5. (trivial) The lower rectangle cannot be moved beyond the right boundary and should not be moved beyond its maximum overlap to the right.

These constraints limit how freely we can slide the hashed rectangles and in turn generate lower bounds for $P(C | A \wedge B)$. The figure below (created with this R script) shows two examples.

[figure]

Running through a range of possible values for P(A) and P(B) (R script) generates this graph.

[figure]

In conclusion, we can lower-bound the conditional probability P(C|A,B) for given P(A) and P(B).
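
As a rough cross-check of this conclusion, the bound can also be written in closed form. The sketch below is not the R script mentioned above; it is a reconstruction under the assumptions used in this answer (P(C|A) = 0.9, P(C|B) = 0.8, P(C) = 0.5, and A independent of B). With those fixed, x = P(C | A and B) is the only free quantity, and each of the eight group proportions must be non-negative. The function name `bounds` is illustrative.

```python
from fractions import Fraction as F

def bounds(alpha, beta, p_ca=F(9, 10), p_cb=F(8, 10), p_c=F(1, 2)):
    """Range of x = P(C | A and B) for alpha = P(A) and beta = P(B).

    Assumes A and B are independent, so P(A and B) = alpha * beta.
    Returns (lower, upper), or None if no population fits the inputs.
    """
    ab = alpha * beta
    lower = max(
        F(0),                                     # group (A, B, C) >= 0
        1 - (1 - p_ca) / beta,                    # group (A, not B, not C) >= 0
        1 - (1 - p_cb) / alpha,                   # group (not A, B, not C) >= 0
        (p_ca * alpha + p_cb * beta - p_c) / ab,  # group (not A, not B, C) >= 0
    )
    upper = min(
        F(1),                                     # group (A, B, not C) >= 0
        p_ca / beta,                              # group (A, not B, C) >= 0
        p_cb / alpha,                             # group (not A, B, C) >= 0
        # group (not A, not B, not C) >= 0
        ((1 - alpha) * (1 - beta) - p_c + p_ca * alpha + p_cb * beta) / ab,
    )
    return (lower, upper) if lower <= upper else None

for a, b in [(F(1, 5), F(1, 10)), (F(2, 5), F(3, 10)), (F(1, 2), F(1, 2))]:
    print(float(a), float(b), bounds(a, b))
```

Under these assumptions, small marginals such as P(A) = 0.2 and P(B) = 0.1 leave the full range [0, 1] open, P(A) = 0.4 and P(B) = 0.3 force x to be at least 5/6, and P(A) = P(B) = 0.5 cannot occur at all, which is the sense in which the marginals themselves are bounded.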

Markus Loecher
  • 2
    Markus, the first paragraph belongs as a separate question rather than within an answer. The subsequent material looks like a good observation but it is hard to follow without being told what $A, B,$ and $C$ represent. Please bear in mind that different users will see the answers in different sequences, according to their preferences and when the answers were last edited, so each answer has to be readable independently of the others (although of course you can link to other answers). – whuber Jul 03 '12 at 14:34
  • 1
    @whuber: thanks for the useful comment! I hope the new edits make it more readable and clear. – Markus Loecher Jul 03 '12 at 18:36
  • @whuber and others: I had hoped to reignite the discussion but the thread seems to have gone inactive? No more comments by anyone? – Markus Loecher Jul 08 '12 at 20:36
1

Take as the hypothesis that the person behind the curtain is a woman.

We are given 2 pieces of evidence, namely:

Evidence 1: We know the person has long hair (and we're told that 90% of all people with long hair are female)

Evidence 2: We know the person has a rare blood type AX3 (and we're told that 80% of all people with this blood type are female)

Given just Evidence 1, we can state that the person behind a curtain has a 0.9 probability value of being a woman (assuming 50:50 split between men and women).

Regarding the question posed earlier in the thread, namely "Would you agree that the answer must be GREATER than 0.9?", without doing any Math, I would say intuitively, the answer must be "yes" (it is GREATER than 0.9). The logic is that Evidence 2 is supporting evidence (again, assuming a 50:50 split for the number of men and women in the world). If we were told that 50% of all people with AX3 type blood were female, then Evidence 2 would be neutral and have no bearing. But since we're told that 80% of all people with this blood type are female, Evidence 2 is supporting evidence and logically should push the final probability of a woman above 0.9.

To calculate a specific probability, we can apply Bayes' rule for Evidence 1 and then use Bayesian updating to apply Evidence 2 to the new hypothesis.

Suppose:

A = the event that the person has long hair

B = the event that the person has blood type AX3

C = the event that person is female (assume 50%)

Applying Bayes rule to Evidence 1:

P(C|A) = (P(A|C) * P(C)) / P(A)

In this case, again if we assume 50:50 split between men and women:

P(A) = (0.5 * 0.9) + (0.5 * 0.1) = 0.5

So, P(C|A) = (0.9 * 0.5) / 0.5 = 0.9 (Not surprising, but it would be different if we didn't have 50:50 split between men and women)

Using Bayesian updating to apply Evidence 2 and plugging in 0.9 as the new prior probability, we have:

P(C|A AND B) = (P(B|C) * 0.9) / P(E)

Here, P(E) is the probability of Evidence 2, given the hypothesis that the person already has a 90% chance of being female.

P(E) = (0.9 * 0.8) + (0.1 * 0.2) [this is the law of total probability: P(woman)*P(AX3|woman) + P(man)*P(AX3|man)]. So, P(E) = 0.74

So, P(C|A AND B) = (0.8 * 0.9) / 0.74 = 0.97297

  • 1
    There are a few statements in your answer that do not make sense to me. (1) P(C|A)=0.9 by assumption. Nowhere was it said that P(C)=0.9. We assumed P(C)=0.5. (2) How did you get the result for P(E)? P(woman)=P(man)=0.5 by assumption where you write P(woman)=0.9. – Michael R. Chernick Jun 21 '12 at 11:39
  • The value of P(C) is assumed at 0.5, which is what I've used. The value for P(E) is the probability of Evidence 2 after applying Evidence 1 (which leads to a new hypothesis that the probability that the person is female is 0.9). P(E) = (probability that the person is a woman (given Evidence 1) * probability that the person has AX3 if a woman) + (probability that the person is a man (given Evidence 1) * probability that the person has AX3 if a man) = (0.9 * 0.8) + (0.1 * 0.2) = 0.74 – RandomAnswer Jun 21 '12 at 14:42
  • Your definition of probability of E is a bit confusing and the terms you are using to calculate it look different from what you wrote before. It really doesn't matter though. The answer is apparently correct based on Huu's nicely presented answer. – Michael R. Chernick Jun 21 '12 at 14:57
  • @Michael Except it appears Huu made mistakes. – whuber Jun 21 '12 at 15:12
  • I didn't notice the mistakes. What are they? Is it because of the use of P(A|C) instead of P(C|A)? – Michael R. Chernick Jun 21 '12 at 16:04
  • 2
    This answer is simply wrong. There may be other errors, but this one is glaring. You state a definitive answer for P("Has Long Hair") (your P(A)), and then use that to give your final definitive answer. There simply isn't enough information to determine this, even assuming P(F) = 0.5. Your line to calculate P(A) seems to come from nowhere. Here is the correct formula using Bayes' theorem: P(A) = P(A|F)P(F)/P(F|A), from which, using your stated assumptions, we get P(A) = P(A|F)*5/9. However we still don't know P(A|F), which could be anything. – Bogdanovist Jun 22 '12 at 04:04
0

Question Restatement and Generalisation

$A$, $B$, and $C$ are binary unknowns whose possible values are $0$ and $1$. Let $Z_i$ stand for the proposition, "The value of $Z$ is $i$". Also let $(X | Y)$ stand for "The probability that $X$, given that $Y$". What is $(A_a | B_b C_c I)$, given that

  1. $(A_{a_1} | B_{b_1} I) = u_1$ and $(A_{a_2} | C_{c_2} I) = u_2$
  2. $(A_{a_1} | B_{b_1} I) = u_1$ and $(A_{a_2} | C_{c_2} I) = u_2$ and $(B C | I) = (B | I)(C | I)$
  3. $(A_{a_1} | B_{b_1} I) = u_1$ and $(A_{a_2} | C_{c_2} I) = u_2$ and $(A_0 | I) = \frac{1}{2}$
  4. $(A_{a_1} | B_{b_1} I) = u_1$ and $(A_{a_2} | C_{c_2} I) = u_2$ and $(A_0 | I) = \frac{1}{2}$ and $(B C | I) = (B | I)(C | I)$

and that $I$ contains no relevant information besides what is implicit in the assignments? The last conjunct of conditions 2 and 4 is shorthand for the independence statement $$ (B_j C_k | I) = (B_j | I)(C_k | I) \quad , \quad j = 0, 1 \quad k = 0,1 $$ Treat each of the four cases in turn.

Answers

Case 1

We have to specify the distribution $(ABC | I)$. The problem is underdetermined, because $(ABC | I)$ requires eight numbers, but we have only three equations---the two given conditions and the normalisation condition.

It has been shown by various esoteric means that the distribution to assign when the information doesn't otherwise determine a solution is the one that, of all distributions consistent with the known information, has the greatest entropy. Any other distribution implies that we know more than the known information, which of course is a contradiction.

All we need to do, therefore, is assign the maximum entropy distribution. This is more easily said than done, and I have not found a general closed-form solution. But particular solutions can be found using a numerical optimiser. We maximise $$ - \sum_{i,j,k} (A_i B_j C_k | I) \ln (A_i B_j C_k | I) $$ subject to the constraints $$ \sum_{i,j,k} (A_i B_j C_k | I) = 1 $$ and $$ (A_{a_1} | B_{b_1} I) = u_1 \quad\quad \text{i.e.} \quad \frac{\sum\limits_k (A_{a_1} B_{b_1} C_k | I )}{\sum\limits_{i,k} (A_i B_{b_1} C_k | I)} = u_1 $$ and $$ (A_{a_2} | C_{c_2} I) = u_2 \quad\quad \text{i.e.} \quad \frac{\sum\limits_j (A_{a_2} B_j C_{c_2} | I)}{\sum\limits_{i,j} (A_i B_j C_{c_2} | I)} = u_2 $$ Now let's apply this to the question. If we have

  1. "The person is female" $\longleftrightarrow A_1$
  2. "The person has long hair" $\longleftrightarrow B_1$
  3. "The person has blood type AX3" $\longleftrightarrow C_1$

then $a = 1$, $b = 1$, $c = 1$, $a_1 = 1$, $b_1 = 1$, $a_2 = 1$, $c_2 = 1$, $u_1 = 0.9$, $u_2 = 0.8$, and we find that for the maximum entropy solution, $(A_1 | B_1 C_1 I) \simeq 0.932$. Therefore the probability that the person behind the curtain is female, given that he/she has long hair and blood type AX3, is 0.932.

Case 2

Now we repeat the exercise with the extra constraint that for a given person, knowing the value of $B$ (the hair state) does not affect our estimate of the value of $C$ (the blood type state), and vice versa. Everything is the same as in Case 1, except there are two extra constraints in the optimisation, namely: \begin{align*} (B_0 | C_l I) &= (B_0 | I) \quad , \quad l = 0, 1 \\ \end{align*} i.e. \begin{align*} \frac{\sum\limits_i (A_i B_0 C_l | I)}{\sum\limits_{i,j} (A_i B_j C_l | I)} &= \sum_{i,k} (A_i B_0 C_k | I) \quad , \quad l = 0, 1 \end{align*} This gives $(A_1 | B_1 C_1 I) \simeq 0.936$, so the probability that the person behind the curtain is female, given that he/she has long hair and blood type AX3, is 0.936.

Case 3

Now we remove the independence condition and replace it with the prior condition that there is an equal chance that a given person is male or female: $$ (A_0 | I) = \frac{1}{2} \quad \quad \text{i.e.} \quad \sum_{j,k} (A_0 B_j C_k | I) = \frac{1}{2} $$ This time $(A_1 | B_1 C_1 I) \simeq 0.973$, so the probability that the person behind the curtain is female, given that he/she has long hair and blood type AX3, is 0.973.

Case 4

Finally we reintroduce the independence constraints of Case 2, and find that $(A_1 | B_1 C_1 I) \simeq 0.989$. Therefore the probability that the person behind the curtain is female, given that he/she has long hair and blood type AX3, is 0.989.

CarbonFlambe
-2

I believe now that, if we assume a ratio of men and women in the population at large, then there is a single indisputable answer.

A = the event that the person has long hair

B = the event that the person has blood type AX3

C = the event that person is female

P(C|A) = 0.9

P(C|B) = 0.8

P(C) = 0.5 (i.e. let's assume an equal ratio of men and women in the population at large)

Then P(C|A and B) = [P(C|A) x P(C|B) / P(C)] / [[P(C|A) x P(C|B) / P(C)] + [[1-P(C|A)] x [1-P(C|B)] / [1-P(C)]]]

in this case, P(C|A and B) = 0.972973
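
The formula above can be evaluated directly. A minimal sketch: the function name is illustrative, and the formula is taken as stated here without a derivation. It amounts to multiplying the two odds ratios and dividing out the prior odds, which is valid when hair length and blood type are independent within each gender, an extra assumption that the comments on this thread point out.

```python
from fractions import Fraction as F

def combine(p_c_given_a, p_c_given_b, p_c):
    """Evaluate the formula stated above for P(C | A and B)."""
    female = p_c_given_a * p_c_given_b / p_c
    male = (1 - p_c_given_a) * (1 - p_c_given_b) / (1 - p_c)
    return female / (female + male)

result = combine(F(9, 10), F(8, 10), F(1, 2))
print(result, round(float(result), 6))  # 36/37 0.972973
```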

ProbablyWrong
  • P(C|A and B) = P(A and B and C)/P(A and B) = P(A and B and C)/[P(A|B) P(B)]. How did you get your formula? – Michael R. Chernick Jun 21 '12 at 10:12
  • There is probably a way to add conditions so that you get a unique answer. – Michael R. Chernick Jun 21 '12 at 10:19
  • To add: by independence of A and B, the formula simplifies to P(A and B and C)/[P(A) P(B)] = P(B and C|A)/P(B). – Michael R. Chernick Jun 21 '12 at 10:22
  • I got my formula by working through an example in a spreadsheet, and realizing that the only parameters that mattered were 0.9, 0.8 and 0.5. I believe that the formula gives a unique and correct answer, without adding any further conditions (beyond the assumption that A and B are independent) – ProbablyWrong Jun 21 '12 at 10:47
  • 2
    The intent of my question was really for you to justify the formula. I don't understand how it would be derived. – Michael R. Chernick Jun 21 '12 at 11:24
  • to comment directly on your simplified formula (Michael), I can't see how to apply this to get an actual answer to the question... – ProbablyWrong Jun 21 '12 at 11:33
  • I am not asserting that the formula leads to your result. You haven't convinced me that your result is true. What I am asserting is that my formula is correct. I still don't understand why yours would be. – Michael R. Chernick Jun 21 '12 at 11:42
  • I believe your formula is correct - however, I can't see how to apply it to get any result at all (to corroborate my result or otherwise). My formula gives an identical result to a further answer on this thread, one supported with "Bayes rule" - perhaps not 100% convincing, but a pretty big coincidence if it's wrong! – ProbablyWrong Jun 21 '12 at 11:50
  • 2
    No, the answer that supposedly used Bayes Rule is incorrect. I'm not sure why you are confused, MC's formula above is correct and cannot be used to get any result, that's what his and Whuber's answers to the question explained! – Bogdanovist Jun 22 '12 at 04:08
-2

Note: In order to get a definitive answer, the answers below assume that the probabilities of a person in general, a long-haired man, and a long-haired woman having AX3 are approximately the same. If more accuracy is desired, this should be verified.

You start out with the knowledge that the person has long hair, so at this point the odds are:

90:10

Note: It might seem that the ratio of males to females in the general population does not matter to us once we find out the person has long hair. For example, if there were 1 female in a hundred in the general population, a randomly-selected long-haired person would still be a female 90% of the time. However, once the second piece of evidence is combined with the first, the ratio of females to males DOES matter! (see the update below for details)

Next, we learn that the person has AX3. Because AX3 is unrelated to long hair, the ratio of men to women is known to be 50:50, and because of our assumption of the probabilities being the same, we can simply multiply each side of the odds and normalize so that the two sides sum to 100:

(90:10) * (80:20)
==> 7200:200

    Normalize by dividing each side by (7200+200)/100 = 74

==> 7200/74:200/74
==> 97.297.. : 2.702..

Thus, the chance that the person behind the curtain is female is approximately 97.297%.

UPDATE

Here's a further exploration of the problem:

Definitions:

f - number of females
m - number of males
fl - number of females with long hair
ml - number of males with long hair
fx - number of females with AX3
mx - number of males with AX3
flx - number of females with long hair and AX3
mlx - number of males with long hair and AX3
pfl - probability that a female has long hair
pml - probability that a male has long hair
pfx - probability that a female has AX3
pmx - probability that a male has AX3

First, we are given that 90% of long-haired people are females, and 80% of people with AX3 are female, so:

fl = 9 * ml
pfl = fl / f
pml = ml / m 
    = fl / (9 * m)

fx = 4 * mx
pfx = fx / f
pmx = mx / m 
    = fx / (4 * m)

Because we assumed that the probability of AX3 is independent of gender and long hair, our calculated pfx will apply to women with long hair, and pmx will apply to men with long hair, which lets us find the number of each that likely have AX3:

flx = fl * pfx 
    = fl * (fx / f) 
    = (fl * fx) / f
mlx = ml * pmx 
    = (fl / 9) * (fx / (4 * m)) 
    = (fl * fx) / (36 * m)

Thus, the likely ratio of the number of females with long hair and AX3 to the number of males with long hair and AX3 is:

flx             :   mlx
(fl * fx) / f   :   (fl * fx) / (36 * m)
1/f             :   1 / (36m)
36m             :   f

Because it is given that males and females are equal in number (50:50), m and f cancel and you end with 36 females to every male. Otherwise, there are 36*m/f females for every male in the specified subgroup. For example, if there were twice as many women as men (f = 2m), there would be 36m/(2m) = 18 females to each male among those who have long hair and AX3.
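
The final ratio 36m : f can be checked numerically with a short Python sketch. It simply evaluates the formula derived above, so it inherits the same independence assumption; the function names and the default arguments `hair_odds=9` and `blood_odds=4` are hypothetical labels for the 90:10 and 80:20 figures in the question.

```python
from fractions import Fraction

def females_per_male(f, m, hair_odds=9, blood_odds=4):
    """Ratio flx : mlx from the derivation above: hair_odds * blood_odds * m / f.

    f and m are the (integer) numbers of females and males in the population.
    Assumes, as this answer does, that AX3 is independent of long hair
    within each gender.
    """
    return Fraction(hair_odds * blood_odds * m, f)

def prob_female(f, m):
    """Convert the females-per-male ratio into a probability of female."""
    r = females_per_male(f, m)
    return r / (r + 1)

# 50:50 population, as given in the question.
print(females_per_male(1, 1), float(prob_female(1, 1)))
```

With equal numbers of males and females this reproduces the 36:1 ratio, i.e. a probability of 36/37, matching the normalized odds computed in the first part of the answer.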

Briguy37
  • 117
  • 4
  • 1
    This solution relies on assuming more than is currently stated in the problem: namely, that long hair, AX3, *and* gender are independent. Otherwise, you cannot justify "applying" pfx to women with long hair, etc. – whuber Jun 21 '12 at 19:35
  • @whuber: Yes, I do make that assumption. However, isn't the purpose of probability to give the best approximation based on the data that you have? Thus, since you already know that long-hair and AX3 are independent for the general population, you SHOULD carry forward that assumption to males and females until you explicitly learn otherwise. Granted, it is not a universally correct one, but it is the best one you can make until you get more info. Q: With only the current data, if you had to give the % chance that it was a woman behind the curtain, would you really say "between 0 and 100%"? – Briguy37 Jun 21 '12 at 21:44
  • 1
    We have an important difference in philosophy, @Briguy. I strongly believe in *not* making unfounded assumptions. It is not clear in what sense the mutual independence assumption is "best": I will grant it may be in certain applications. But in general, that seems dangerous to me. I would prefer being clear about the assumptions needed to solve a problem, so people can decide whether it is worthwhile collecting the data to check those assumptions, rather than assuming things that are mathematically convenient for the sake of obtaining an answer. That's the difference between stats and math. – whuber Jun 21 '12 at 23:14
  • To answer your question: yes, 0% - 100% is exactly the answer I would give. (I have given similar answers to comparable questions on this site.) That range accurately reflects the uncertainty. This issue is closely related to the [Ellsberg paradox](http://en.wikipedia.org/wiki/Ellsberg_paradox). Ellsberg's original paper is well written and clear: I recommend it. – whuber Jun 21 '12 at 23:16
  • @whuber: Thanks for taking the time to dialogue with me. I see your point about the importance of thinking through and listing the assumptions made, and have updated my answer accordingly. However, in regards to your answer, I believe it is incomplete. The reason for this is that you can consider all unknown cases and find the average probability across all of them to arrive at your final answer. E.g., though both are still possible, probabilities above 50% are much more prevalent than probabilities below 50% across all cases, so we are surely better off guessing that it is a woman. – Briguy37 Jun 22 '12 at 13:56
  • That's an interesting argument: it's exactly what Ellsberg was evaluating in his paper. The averaging assumes a uniform prior on the possibilities, but the setting of our question provides no basis for assuming such a prior. *Not all ignorance is, or should be, modeled by probability distributions!* For more on this you could check out my paper [Ignorance is Not Probability](http://onlinelibrary.wiley.com/doi/10.1111/j.1539-6924.2010.01361.x/abstract). – whuber Jun 22 '12 at 14:14
-4

98% Female, simple interpolation. First premise 90% female, leaves 10%, second premise only leaves 2% of the existing 10%, hence 98% female

xcythe
  • 1