1

Let's say we have a machine that produces output which is either defective or not. N trial units of output have been produced. Given some prior distribution for the probability of a defective output, we can set up a joint probability distribution from which inference can be drawn about the probability of a defective output. How would one define what such a prior probability distribution actually means?

If one thinks of the probability of a defective unit as some physical attribute of the machine, the prior distribution can represent one's degree of belief as to which physical attribute this machine has. But the interpretation of probability as a degree of belief seems to avoid having to view probabilities as "physical attributes" at all; they are simply degrees of belief. So what exactly is the underlying outcome, if not a physical attribute, on which beliefs are formed in this example of a prior distribution?

edit (attempt to clarify the question): when we refer to a concrete probability space, the events to which probabilities are assigned should have a specific, well-defined meaning. So when we refer to a prior distribution of some parameter, it should be clear what is being said about reality if the parameter takes a certain value.

Now if the different values of the parameter aren't referring to different members of a population (for example, urns with different numbers of red balls), then the parameter is only referring to different distributions. For this to have any meaning, it seems to me that these distributions must be seen as "real" in the sense of the frequentist interpretation of a "physical probability". For the degree-of-belief interpretation of probability it would be nice not to have to refer to the concept of a physical probability, since it is not without its own issues. My question is how/whether it would be possible to assign a meaning to different values of a prior parameter which does not stem from a population, without referring to physical probabilities.

robot112
  • 21
  • 2
  • I'm afraid your edit didn't clarify much. Maybe you could give a concrete example, rather than an abstract one? – Tim May 15 '20 at 17:25

2 Answers

1

Imagine that you found a black box on the street. It has a post-it note attached saying "if the red light blinks, it is going to rain in 2h". It is a solid box, made of some hard material, with no visible screws, so there is no obvious way of opening it and looking inside. You really have no clue what it does, how it works, or if it is going to work at all. As you grab it, it blinks the red light. Should you trust it? Will it rain?

This is a purely subjective probability scenario. You know nothing about how the box works; you can only make a bet and check what happens. The subjective prior that you'd choose is how much you believe that it may work. If after two hours it rained, you could apply Bayes' theorem to update your belief. Now you'd be inclined to trust it a little bit more, since you saw that at least once it "worked". You are a statistician and you know that correlation is not causation, so observing it once doesn't convince you yet, but it shifts your opinion a little bit towards trusting it. That is what Bayes' theorem does when you update your prior.

For a more formal discussion, you can check an answer of mine on the subjectivist interpretations of probability. Finally, it is not an either-or scenario: when using the Bayesian approach, you are not forbidden to use frequentist, or physical, interpretations of probability. The subjectivist interpretation is just the broadest one, since knowing the physical process behind something directly translates to your beliefs, so you can make assumptions based on your knowledge.

Richard Hardy
  • 54,375
  • 10
  • 95
  • 219
Tim
  • 108,699
  • 20
  • 212
  • 390
  • In my example the prior probabilities are assigned to "states" of the form "the machine has a probability of p of failing". Probabilities are meant to be assigned to states whose meaning is clearly defined. So in my example, would it be possible to define what these states mean without invoking physical probability? – robot112 May 15 '20 at 14:50
  • @robot112 if there's a physical process that is known, then why would you assume something that diverges from your knowledge about the process? Moreover, even if you know how the machine works, if the "states" fail at random, or together form a complex, chaotic system that is hard to predict, then in the end you might still need to make a bet, since the physical knowledge would not help that much. But the example is abstract, so hard to discuss. – Tim May 15 '20 at 14:59
  • In the Bayesian setup of inference we generally have a joint distribution of an unobservable parameter together with the data which we observe. In an example where we have an urn with 10 balls and we don't know how many are red, we might specify our prior probabilities over the number of red balls in the urn. In my example there is no population of machines. So without invoking physical probabilities, what is meant when we say that the machine has a probability of p of having a probability of q of failing? – robot112 May 15 '20 at 15:16
  • @robot112 we mean that $p \in [0, 1]$ is the number that we use to quantify how much do we *believe* it will fail. See link in the answer for more formal explanation. – Tim May 15 '20 at 16:40
  • Not really though; in my example p is referring to the prior distribution, not the probability of failure. I think you aren't quite getting what I mean. This may be because the question is phrased poorly or doesn't make sense. To be clear, my problem is not with the interpretation of probability as a degree of belief. – robot112 May 15 '20 at 16:51
  • @robot112 $p$ may be a distribution as well, depending on context, but if your question is not about interpreting the meaning of subjective probability, then I don't understand what you are asking? – Tim May 15 '20 at 16:56
  • I will edit the question to try and make it clearer – robot112 May 15 '20 at 17:04
0

This is not so much an "answer" as an attempt to provide a framework for the question (with the ultimate goal of giving an answer). Note: I like to use the symbol $p$ to denote a probability density function (or mass function), as is common in the Bayesian literature, so I will use $\theta$ to denote the probability of a defective output.

Let $y_i \in \{0,1\}$ denote the outcome of a Bernoulli trial where 1 denotes "success" and 0 denotes "failure". The probability density function (sometimes called the mass function) is $$ p(y_i|\theta) = \textsf{Bernoulli}(y_i|\theta) = \theta^{y_i}\,(1-\theta)^{1-y_i} , $$ where $\theta \in [0,1]$ is the probability of success: $$ p(y_i = 1|\theta) = \theta . $$ With this setup, a defective output from the machine counts as a "success".

Let $s = \sum_{i=1}^n y_i$ denote the number of successes in $n$ independent trials. The distribution of $s$ is binomial: $$ p(s|n,\theta) = \textsf{Binomial}(s|n,\theta) = \binom{n}{s}\,\theta^s\,(1-\theta)^{n-s} . $$ To complete the model, let $p(\theta)$ denote the prior distribution for $\theta$.
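As a quick numerical sanity check, the binomial pmf above can be written out directly. This is just a sketch; the values $n = 5$ and $\theta = 0.2$ are arbitrary illustrations, not part of the problem:

```python
from math import comb

def binomial_pmf(s, n, theta):
    """p(s | n, theta) = C(n, s) * theta**s * (1 - theta)**(n - s)."""
    return comb(n, s) * theta**s * (1 - theta)**(n - s)

# Illustrative values only
n, theta = 5, 0.2
probs = [binomial_pmf(s, n, theta) for s in range(n + 1)]
print(probs)       # pmf over s = 0, ..., n
print(sum(probs))  # sums to 1 (up to floating point)
```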

The prior predictive distribution for $y_1$ is $$ p(y_1) = \int p(y_1|\theta)\,p(\theta)\,d\theta , $$ which is a Bernoulli distribution for which the probability of success equals the prior expectation of $\theta$: $$ p(y_1 = 1) = \int p(y_1=1|\theta)\,p(\theta)\,d\theta = E[\theta] . $$ Therefore we can write $$ p(y_1) = \textsf{Bernoulli}(y_1|E[\theta]) . $$
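To make the prior predictive concrete, here is a hedged sketch that assumes a Beta(2, 5) prior for $\theta$ (an arbitrary choice for illustration; the answer keeps $p(\theta)$ general). It approximates $\int \theta\,p(\theta)\,d\theta$ by a midpoint rule and compares the result with the closed-form Beta mean $a/(a+b)$:

```python
from math import gamma

def beta_pdf(theta, a, b):
    """Beta(a, b) density at theta."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * theta**(a - 1) * (1 - theta)**(b - 1)

def prior_predictive_success(a, b, grid=10_000):
    """Approximate p(y_1 = 1) = integral of theta * p(theta) dtheta (midpoint rule)."""
    h = 1.0 / grid
    return sum(((i + 0.5) * h) * beta_pdf((i + 0.5) * h, a, b) for i in range(grid)) * h

a, b = 2, 5  # illustrative prior, not part of the answer above
print(prior_predictive_success(a, b))  # numerically close to E[theta] = a/(a+b) = 2/7
```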

The joint distribution for $s$ and $\theta$ conditional on $n$ is $$ p(s,\theta|n) = p(s|n,\theta)\,p(\theta) . $$ The posterior distribution for $\theta$ conditional on $s$ and $n$ is $$ p(\theta|s,n) \propto p(s,\theta|n) . $$ Finally, the posterior predictive distribution for $y_{n+1}$ is \begin{equation} p(y_{n+1}|s,n) = \int p(y_{n+1}|\theta)\,p(\theta|s,n)\,d\theta = \textsf{Bernoulli}(y_{n+1}|E[\theta|s,n]) , \end{equation} where $$ E[\theta|s,n] = \int \theta\,p(\theta|s,n)\,d\theta $$ is the posterior expectation for $\theta$.
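For a concrete instance of the last step, here is a sketch assuming a conjugate Beta(a, b) prior (an assumption on my part; the framework above leaves $p(\theta)$ general). Under that assumption the posterior is Beta(a + s, b + n − s), and the posterior predictive success probability is the posterior mean:

```python
def posterior_params(a, b, s, n):
    """Beta posterior parameters after s defectives in n trials (conjugate update)."""
    return a + s, b + (n - s)

def posterior_predictive_success(a, b, s, n):
    """p(y_{n+1} = 1 | s, n) = E[theta | s, n], the posterior mean."""
    a_post, b_post = posterior_params(a, b, s, n)
    return a_post / (a_post + b_post)

# Illustration: uniform Beta(1, 1) prior, 3 defectives observed in 10 trials
print(posterior_predictive_success(1, 1, 3, 10))  # (1 + 3) / (2 + 10) = 1/3
```

With no data (s = 0, n = 0) this reduces to the prior mean, matching the prior predictive distribution derived earlier.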

That's the framework. As I understand it, the question is about the prior distribution $p(\theta)$. The prior should incorporate all non-sample information available. For example, if only certain values of $\theta$ are possible because of physical considerations, then the prior should reflect that.

mef
  • 2,521
  • 1
  • 15
  • 14