On page 164 of the book “Probability Theory: The Logic of Science”, the author says:

$$ p(D|\theta I) = \prod_{i=1}^{n} p(x_i|\theta I) = \theta^r(1-\theta)^{n-r} $$

$ \theta $, in this equation, represents the proposition:

$$ \theta = p(x_i = 1 | I), \forall i $$

How is $ p(x_i = 1|\theta I) $ equal to $ \theta $ when it is clearly not the same as $ p(x_i = 1 | I) $?

Edit:

This thread helped me a lot: How is data generated in the Bayesian framework and what is the nature of the parameter that generates the data?

  • Hi, I don't agree that $\theta$ represents that proportion. I also don't see that claim supported in the text. – Arya McCarthy Jul 19 '21 at 15:55
  • @AryaMcCarthy Sorry, I meant proposition, not proportion. I’ve edited my question –  Jul 19 '21 at 16:00

1 Answer

It doesn’t say what you’re implying. The likelihood is a binomial distribution

$$ \prod_i p(x_i | \theta I) = \theta^r (1-\theta)^{n-r} $$
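
Spelling out that step (a standard Bernoulli-product derivation, not a verbatim quote from the answer or the book): each trial contributes $p(x_i \mid \theta I) = \theta^{x_i}(1-\theta)^{1-x_i}$ for $x_i \in \{0,1\}$, so with $r = \sum_{i=1}^{n} x_i$ successes,

$$ \prod_{i=1}^{n} p(x_i \mid \theta I) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{\sum_i x_i}(1-\theta)^{n-\sum_i x_i} = \theta^{r}(1-\theta)^{n-r}. $$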

He uses the fancy notation $\theta I$, where the indicator $I$ means that for each $x_i$ we use the same value of $\theta$. The notation does not specify the prior distribution for $\theta$ itself, i.e. $p(\theta)$.
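
As a quick numerical sanity check (a minimal sketch with made-up data, not part of the original answer), the product of per-trial Bernoulli probabilities can be compared against the closed form $\theta^r(1-\theta)^{n-r}$:

```python
import numpy as np

# Hypothetical data: n = 10 binary trials (values chosen purely for illustration).
x = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
theta = 0.6  # an assumed parameter value, also for illustration
n, r = len(x), x.sum()

# Per-trial probabilities p(x_i | theta I) = theta^x_i * (1 - theta)^(1 - x_i)
per_trial = theta**x * (1 - theta)**(1 - x)

# The product over trials matches the closed form theta^r * (1 - theta)^(n - r)
assert np.isclose(per_trial.prod(), theta**r * (1 - theta)**(n - r))
print(per_trial.prod())  # 0.6^6 * 0.4^4 ≈ 0.00119
```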

Tim
  • Thank you for your reply. I believe $I$ refers to the prior information that we have about the problem. On the page that I referenced in my question, the author writes “The prior information $I$ specifies that there is a parameter $\theta$ …” Also on page 140 of the same book, the author says “With only prior information $I$, we assign a probability $P(A|I)$ for $A$.” I’m quite sure this is what the author meant, as he uses similar notation in many other places in the book –  Jul 19 '21 at 18:16
  • @HaziqMuhammad it’s an indicator (https://en.wikipedia.org/wiki/Indicator_function): one for relevant cases, zero otherwise. – Tim Jul 19 '21 at 19:13
  • @HaziqMuhammad: $I$ is prior information, but it just says that the $x_i$'s are independent. Moreover, it says that they are identically distributed, so $p(x_i=1)=\theta$ for all $i$ and "at each trial [...] we have the [same] probability $\theta$ of a success." This is why the likelihood is a binomial distribution. – Sergio Jul 19 '21 at 19:28
  • @Tim I am familiar with characteristic functions, but I thought it more likely that $I$ represents the prior information, as the author uses this symbol for the prior information in numerous places in the book, including the page that I referenced in my question –  Jul 19 '21 at 19:30
  • @HaziqMuhammad prior *information*, so information you have prior to looking at the data, not a prior distribution. However, I don’t have the book at hand, so it’s hard to comment on the details. Your quote discusses the likelihood function alone, and the prior distribution is not part of the likelihood. – Tim Jul 19 '21 at 19:48
  • @Sergio Appreciate your reply. I understand that the prior information $ I $ entails that “ there is a parameter $ \theta $ such that at each trial we have, independently of anything we know about other trials, the probability $ \theta $ for a success, therefore the probability $ (1 − \theta) $ for a failure.” But I fail to understand why the author equates $ p(x_i = 1| \theta I ) $ to $ \theta $ as $ \theta $ is only equal to $ p(x_i = 1| I) $. Feel like I’m missing something super obvious :) –  Jul 19 '21 at 19:52
  • @HaziqMuhammad but the author doesn’t do that. He says nothing about what $\theta$’s distribution is in the quote. – Tim Jul 19 '21 at 19:56
  • @HaziqMuhammad: $D\equiv\{x_1,\dots,x_n\}$. If $x_i\sim\text{Bernoulli}(\theta)$, i.e. if $p(x_i=1)=\theta$, for all $i=1,\dots,n$, and they are independent (my prior information $I$), then the likelihood of $D$ is a binomial distribution, i.e. the distribution of the sum of i.i.d. Bernoulli random variables. This is what I read on the page you quoted. I can find neither $\theta=p(x_i=1\mid\theta I)$ nor $\theta=p(x_i=1\mid I)$. – Sergio Jul 19 '21 at 20:13
  • @Sergio It does make sense to me that the likelihood of $ D $ is a binomial distribution from a classical interpretation of probability where the distribution of $ x_i $ is “objective” and is parametrised by $ \theta $. However, it is not clear to me how the author arrives at this conclusion, that the likelihood of $ D $ is a binomial distribution, from the paradigm of “objective Bayesianism”. The author, Edwin Jaynes, is an advocate of this interpretation and the book is based on this interpretation –  Jul 20 '21 at 07:02
  • @HaziqMuhammad the data are binary events, so the distribution is binomial. We use the same value of $\theta$ for every trial, based on the "prior information". The definition of the likelihood follows directly from the nature of the data and the prior information. – Tim Jul 20 '21 at 07:05
  • @Tim Under a classical interpretation of probability, if we write $p(x_i | \theta = \alpha)$, we are conditioning on the parameter $\theta$ of the “objective” Bernoulli distribution being equal to $ \alpha $. Under objective Bayesianism, if we say $p(x_i | \theta = \alpha, I)$, what are we conditioning on? –  Jul 20 '21 at 07:24
  • @Tim Under this interpretation, if I were to talk about an “objective” distribution, I would be committing the mind projection fallacy: https://en.wikipedia.org/wiki/Mind_projection_fallacy –  Jul 20 '21 at 07:26
  • @HaziqMuhammad the likelihood above is $p(X|\theta I)$; you condition on $\theta I$. $\theta$ is a random variable for the parameter of a binomial distribution. $I$ is just a notational trick to say "all the $\theta$'s are the same". – Tim Jul 20 '21 at 07:28
  • @Sergio If $p(x_i = 1 | \theta I)$ is not $\theta$ and $p(x_i = 0 | \theta I)$ is not $1 - \theta$, then how did the author arrive at $\prod_{i=1}^{n} p(x_i | \theta I) = \theta^r(1 - \theta)^{n-r}$? –  Jul 20 '21 at 07:31
  • @HaziqMuhammad what you mention above is the *likelihood* function, which is conditional on $\theta$. It tells you nothing about $\theta$'s distribution, and neither does the quote above. $\theta$ is just a placeholder for a random variable with an *unknown* distribution for now; we haven't assumed a prior yet, nor do we have a posterior for it (see the sketch after this thread). The author "arrived" at the binomial distribution by noticing that these are i.i.d. Bernoulli trials, so jointly they are binomial, that's all. – Tim Jul 20 '21 at 07:33
  • @Tim “$ \theta $ is a random variable for the parameter of a binomial distribution.” Which binomial distribution is it a parameter of? Under a classical interpretation of probability, I would have said that $ \theta $ is a parameter of the “objective” Bernoulli distribution that describes $ x_i $. What does a proposition like $ \theta = 0.5 $ represent in objective Bayesianism? –  Jul 20 '21 at 07:54
  • @HaziqMuhammad you seem to be overcomplicating something; I'm afraid I cannot help any more. I answered your question: this is what is meant by the notation. You seem to be putting in the author's mouth what the author didn't say. This is just a description of the likelihood function assumed for the data, nothing more. – Tim Jul 20 '21 at 07:58
  • @Tim Thanks a lot for your help. Much appreciated –  Jul 20 '21 at 08:02
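
To make the likelihood/prior distinction from this thread explicit (a standard statement of Bayes' theorem, not a quote from Jaynes): the binomial-form likelihood above is only one factor in the posterior for $\theta$; the prior $p(\theta \mid I)$ is a separate ingredient that the quoted passage does not specify,

$$ p(\theta \mid D I) = \frac{p(D \mid \theta I)\, p(\theta \mid I)}{p(D \mid I)} \propto \theta^{r}(1-\theta)^{n-r}\, p(\theta \mid I). $$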