
Background: As I understand the role of probability in Quantum Mechanics, the idea is that no observable event can have negative probability, but that it can make sense for unobserved quantities to have negative probability, so long as when marginalizing over unobserved events we obtain a proper probability distribution over observable ones.

Bayesian hierarchical models often contain unobservable latent variables. Conceivably, endowing some of these latent variables with negative probabilities (densities) could still result in a well-defined probability density over the observable nodes. Also conceivably, doing so would allow a more parsimonious description of the marginal probability distribution than is possible using latent variables with positive probabilities only.

More Background: Maybe an example is in order. Let's consider the standard two-cluster Gaussian mixture model in dimension 1 with unknown means and identical, known variance, for now with positive probabilities as usual:

$$z \sim \textrm{Bern}(1-\rho), \quad \textrm{i.e. } P(z=0) = \rho$$

$$y|z=0 \sim N(\mu_0,1)$$

$$y|z=1 \sim N(\mu_1,1)$$

Here, the unobserved latent variable is $z$, and when marginalizing over it, we are left with:

$$\delta_y(y) = \rho \delta_{N(\mu_0,1)}(y) + (1-\rho) \delta_{N(\mu_1,1)}(y)$$

where $\delta_y$ is the density of $y$ and $\delta_{N(a,b)}$ gives the density of a normal distribution with mean $a$ and variance $b$.
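To make the generative reading of this model concrete, here is a minimal Python sketch (standard library only; all function names are hypothetical). It draws $z$ first and then $y \mid z$ (ancestral sampling), and evaluates the marginal density, taking $P(z=0)=\rho$ so that the weights agree with the marginal above. It assumes the parameter values used below ($\mu_0=1$, $\mu_1=-1$, $\rho=0.9$) and uses a crude midpoint rule for the integral.

```python
import math
import random

def normal_pdf(y, mean, var=1.0):
    """Density of N(mean, var) at y; var is a variance, matching N(a, b) above."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_pdf(y, rho, mu0, mu1):
    """Marginal density of y with z summed out, where P(z = 0) = rho."""
    return rho * normal_pdf(y, mu0) + (1 - rho) * normal_pdf(y, mu1)

def sample_mixture(n, rho, mu0, mu1, rng):
    """Ancestral sampling: draw the latent z, then draw y given z."""
    draws = []
    for _ in range(n):
        z = 0 if rng.random() < rho else 1
        draws.append(rng.gauss(mu0 if z == 0 else mu1, 1.0))
    return draws

def integrate(f, lo=-12.0, hi=12.0, n=24000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

rng = random.Random(0)
ys = sample_mixture(100_000, 0.9, 1.0, -1.0, rng)
# The sample mean should be near 0.9 * 1 + 0.1 * (-1) = 0.8.
print("sample mean:", sum(ys) / len(ys))
# The marginal density should integrate to approximately 1.
print("integral of marginal:", integrate(lambda y: mixture_pdf(y, 0.9, 1.0, -1.0)))
```

Sampling $z$ and then $y$ only ever touches nonnegative probabilities; the marginal is what a likelihood would use once $z$ is summed out.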

Here's what that density looks like for $\mu_0=1$, $\mu_1=-1$, and $\rho=0.9$:

[Figure: standard mixture]

(Since the peaks are close relative to the standard deviation, it looks much like a single normal distribution.)
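A quick numerical check of that visual impression, as a sketch: evaluate the mixture density on a fine grid and count strict local maxima. The grid range and resolution are arbitrary choices, and a grid count is a heuristic rather than a proof of unimodality. For contrast, it also checks a hypothetical equal-weight mixture with means $\pm 3$, where the components are far apart.

```python
import math

def normal_pdf(y, mean, var=1.0):
    """Density of N(mean, var) at y."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_pdf(y, rho=0.9, mu0=1.0, mu1=-1.0):
    """rho * N(mu0, 1) + (1 - rho) * N(mu1, 1), as in the marginal above."""
    return rho * normal_pdf(y, mu0) + (1 - rho) * normal_pdf(y, mu1)

def count_local_maxima(f, lo=-6.0, hi=6.0, n=12001):
    """Count interior grid points whose value exceeds both neighbours."""
    h = (hi - lo) / (n - 1)
    vals = [f(lo + i * h) for i in range(n)]
    return sum(1 for i in range(1, n - 1) if vals[i - 1] < vals[i] > vals[i + 1])

print("modes, rho=0.9, means +/-1:", count_local_maxima(mixture_pdf))
print("modes, rho=0.5, means +/-3:",
      count_local_maxima(lambda y: mixture_pdf(y, 0.5, 3.0, -3.0)))
```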

Now I'm going to change the second density from $N(\mu_1,1)$ to a distribution with negative probabilities by changing its density to this function:

$$\delta_{\aleph}(y) := \delta_{N(\mu_1,1)}(y)\,\textrm{sgn}(\mu_1-y)$$

where $\textrm{sgn}(a)$ is the sign function, so it looks like this:

[Figure: negative density]
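As a sketch of what this signed function does (Python, standard library only; helper names are hypothetical), assuming the unit-variance reading $\delta_{N(\mu_1,1)}$ with $\mu_1=-1$: it is positive to the left of $\mu_1$ and negative to the right, and the two halves cancel, so it integrates to zero while its absolute value integrates to one.

```python
import math

def normal_pdf(y, mean, var=1.0):
    """Density of N(mean, var) at y."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def sgn(a):
    """Sign function: -1, 0 or 1."""
    return (a > 0) - (a < 0)

def aleph_density(y, mu1=-1.0):
    """Signed function N(mu1, 1)(y) * sgn(mu1 - y)."""
    return normal_pdf(y, mu1) * sgn(mu1 - y)

def integrate(f, lo=-12.0, hi=12.0, n=24000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

print("aleph(-2), aleph(0):", aleph_density(-2.0), aleph_density(0.0))
print("integral of aleph:", integrate(aleph_density))
print("integral of |aleph|:", integrate(lambda y: abs(aleph_density(y))))
```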

If we just plug this into the previous expression for the marginal density, we get:

$$\delta_{-}(y) = \rho \delta_{N(\mu_0,1)}(y) + (1-\rho) \delta_{\aleph}(y)$$

which looks like this:

[Figure: negative marginal]

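Up to normalization, this is a proper marginal: for these parameter values it is nonnegative everywhere, and because $\delta_{\aleph}$ integrates to zero, $\delta_-$ integrates to $\rho$, so $\delta_-/\rho$ is a proper density. If I had thought of it, I could happily use it as a likelihood for my data by applying this distribution directly, without any need for negative probabilities.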
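A rough numerical check of this claim, as a sketch assuming $\mu_0=1$, $\mu_1=-1$, $\rho=0.9$ and unit variances in both components (function names are hypothetical). It evaluates $\delta_-$ on a grid, reports its minimum and its integral, and shows how it could serve as a likelihood after dividing by $\rho$. Because the negative part of $\delta_{\aleph}$ cancels its positive part, the total mass comes out as $\rho$ rather than 1.

```python
import math

def normal_pdf(y, mean, var=1.0):
    """Density of N(mean, var) at y."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def sgn(a):
    """Sign function: -1, 0 or 1."""
    return (a > 0) - (a < 0)

def signed_marginal(y, rho=0.9, mu0=1.0, mu1=-1.0):
    """delta_-(y) = rho N(mu0,1)(y) + (1 - rho) N(mu1,1)(y) sgn(mu1 - y)."""
    return rho * normal_pdf(y, mu0) + (1 - rho) * normal_pdf(y, mu1) * sgn(mu1 - y)

def grid_summary(rho, lo=-12.0, hi=12.0, n=24000):
    """Return (minimum value, midpoint-rule integral) of delta_- on a grid."""
    h = (hi - lo) / n
    vals = [signed_marginal(lo + (i + 0.5) * h, rho) for i in range(n)]
    return min(vals), sum(vals) * h

def log_likelihood(data, rho=0.9, mu0=1.0, mu1=-1.0):
    """Log-likelihood of data under the normalized density delta_- / rho."""
    total = 0.0
    for y in data:
        d = signed_marginal(y, rho, mu0, mu1)
        if d <= 0:
            raise ValueError(f"delta_- is not positive at y={y}")
        total += math.log(d / rho)
    return total

print("rho=0.9  (min, integral):", grid_summary(0.9))
print("rho=0.85 (min, integral):", grid_summary(0.85))
print("log-likelihood of [0.0, 1.0, 2.5]:", log_likelihood([0.0, 1.0, 2.5]))
```

Rerunning the check with `rho=0.85` finds negative values just to the right of $\mu_1$, so whether this construction yields a valid density depends on the parameters: for these means, nonnegativity requires $\rho/(1-\rho) \ge e^{2}$.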

Question: Are there examples of negative densities being used in the context of latent variable models as a modeling device in order to define a generative model of observed data?

Such a construction appears in descriptions of the outcomes of quantum phenomena, such as the double-slit experiment. See the Wikipedia article on negative probability for more.

kjetil b halvorsen
John Madden
  • There is no such thing as a negative probability. Such instances of the term are a misuse of terminology. – DifferentialPleiometry Nov 12 '21 at 17:07
  • @Galen if you prefer to call what I've just described above something other than "negative probability", you should feel free to. But the concept as I've defined it clearly does exist, no? – John Madden Nov 12 '21 at 17:11
  • Yup. That is definitely my preference. Just because something is related to probability doesn't mean it 'is' a probability. Find another term. – DifferentialPleiometry Nov 12 '21 at 17:14
  • Allow me to direct you to review Galen's comment. There is no such thing as a negative probability. The concept as you've defined it does not exist. Probabilities are defined as non-negative: https://en.wikipedia.org/wiki/Probability_measure . You can achieve the same effect using valid probability distributions. – jbowman Nov 12 '21 at 17:14
  • @jbowman Agreed, this is precisely my view. – DifferentialPleiometry Nov 12 '21 at 17:19
  • I have used linear combinations of probability distributions with negative coefficients in several posts here on CV. For instance, [my expression for the distribution of a sum of Gamma variables](https://stats.stackexchange.com/a/72486/919) can have negative coefficients. I have also provided explanations of phenomena that involve adding negative density values to densities, such as https://stats.stackexchange.com/a/299765/919. Are these the sorts of things you are looking for? – whuber Nov 12 '21 at 17:41
  • @Galen and jbowman Thanks for sharing your thoughts on the semantics. Of course, sharing common terminology as a community has its benefits. Can you help me understand what terminology you would use for the concept which has as the title of its wikipedia page "Negative Probability"? Thanks. See here for the link: https://en.wikipedia.org/wiki/Negative_probability – John Madden Nov 12 '21 at 17:47
  • @JohnMadden Depends on context. For example, as in whuber's post, you can have linear combinations of probabilities where some coefficients are negative in order to achieve a new distribution function or some other function that isn't a probability distribution. – DifferentialPleiometry Nov 12 '21 at 17:52
  • @jbowman Thanks for your comment. In a comment above I have a follow-up question for you on negative probabilities (their existence notwithstanding ;)). Of course, as you say (and as I say in the question), "You can achieve the same effect using valid probability distributions". But isn't it conceivable that there are learning problems for which the target density may be written down in fewer terms making use of negative densities than strictly positive ones? Or is there a result showing this isn't so? Similarly, perhaps instead of "fewer terms" we mean some other measure of parsimony. – John Madden Nov 12 '21 at 17:52
  • @whuber Thanks for your comment. If I understand your answers correctly, they deal with subtraction of probability densities basically in the context of a proof of some property of a distribution. I've updated my question to make more clear that I'm wondering if anyone has used these as part of their "modeling toolkit" in order to create models of observed real world data (reserving, of course, the negative probabilities for governing latent variables which are never observed. As I understand it, this is effectively their use in quantum mechanics). – John Madden Nov 12 '21 at 18:02
  • @Galen Also, I was inclined to take you at your word when you said "there is no such thing as a negative probability". But I'm struggling now that I see there is a wikipedia page with title "Negative Probability". And a recent article dealing with it: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1773077 And that Paul Dirac said (according to the Wiki) "Negative energies and probabilities should not be considered as nonsense. They are well-defined concepts mathematically, like a negative of money." Is there a movement I haven't run into that seeks to reject all this? – John Madden Nov 12 '21 at 18:09
  • @JohnMadden Indeed, I'm giving Dirac's comment some thought. It clearly contradicts [Kolmogorov's axioms](https://en.wikipedia.org/wiki/Probability_axioms), but it is interesting. – DifferentialPleiometry Nov 12 '21 at 18:21
  • I don't understand the distinction you seem to be making between subtraction and use in modeling. After all, if someone were to use the linear combination formula to model a sum of Gamma variates, would that, or would it not, involve "negative probabilities" for you? And if not, exactly what would?? – whuber Nov 12 '21 at 18:23
  • BTW, negative probabilities are not involved in double slit phenomena. QM (at least the classical, non-relativistic formulation) models *phases* of wave functions, but at the end when it is applied to observations, expectations of the *moduli* of those phases are used to compute probabilities--and those are automatically non-negative. – whuber Nov 12 '21 at 18:23
  • @Galen you may enjoy this video (which actually prompted this question :)): https://www.youtube.com/watch?v=std9EBbtOC0 – John Madden Nov 12 '21 at 18:23
  • @whuber Regarding "subtraction" vs "negative probability": There is no distinction if the latent variables have been marginalized out and we're directly considering the marginal density. But I'm picturing a complex Bayesian hierarchical model where inference is to be conducted via simulation (and hence there is no analytic marginal density). I'm wondering if there are problems that this would be able to solve more readily than simulation with latent variables with strictly nonnegative densities, say, in the context of a Variational Auto-Encoder or nonlinear Hidden Markov Model. – John Madden Nov 12 '21 at 18:28
  • @whuber I'm not claiming to understand the double slit experiment, so I can't address your point :). But it is mentioned as an example on the Wiki page for negative probability. – John Madden Nov 12 '21 at 18:30
  • Thank you--that's a helpful reference and remark. As far as I can tell, that portion of the Wiki page fails adequately to distinguish between the *value* of a wave function and its *expected amplitude.* I do not deny that conceptualizing of such things as negative probabilities has merits (I think of distributions in a similar way, as mixtures of positive and negative ones): but we are forewarned that this automatically implies our conceptualization violates the axioms of probability. Thus, any rigorous deductions therefrom would have to reprove all the theorems on which they rely. – whuber Nov 12 '21 at 18:35
  • Here is a relevant example. The [Inclusion-exclusion principle](https://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle#In_probability) allows the computation of probabilities of unions of events from an additive/subtractive expansion of marginals and joint probabilities. This is compatible with $P(A \cap B)$ being a probability even if $-P(A \cap B)$ isn't. – DifferentialPleiometry Nov 12 '21 at 18:46
  • In quantum mechanics, probabilities are nonnegative. That's one way it connects to the world. The main object that QM operates on is a complex-valued amplitude $\psi(s)$ of a state $s$; the probability of the state is $\psi(s)^*\psi(s)\geq 0$, where * denotes the complex conjugate. – Aksakal Nov 16 '21 at 02:33
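As a concrete instance of the linear combinations with negative coefficients raised in the comments (a minimal sketch, not the linked derivation for general Gamma sums): the sum of two independent exponential variables with rates $\lambda_1 \ne \lambda_2$ has density $\frac{\lambda_2}{\lambda_2-\lambda_1} f_{\lambda_1}(x) - \frac{\lambda_1}{\lambda_2-\lambda_1} f_{\lambda_2}(x)$, where $f_\lambda$ is the $\textrm{Exp}(\lambda)$ density. One coefficient is negative, yet the combination is a proper density of an observable quantity. The code checks this against simulation; the rates 1 and 2 and the function names are arbitrary, hypothetical choices.

```python
import math
import random

def exp_pdf(x, rate):
    """Density of Exp(rate) at x."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def weights(rate1, rate2):
    """Coefficients of the two exponential densities; they sum to 1 and one is negative."""
    return rate2 / (rate2 - rate1), -rate1 / (rate2 - rate1)

def sum_exp_pdf(x, rate1=1.0, rate2=2.0):
    """Density of Exp(rate1) + Exp(rate2) as a signed combination of exponential densities."""
    w1, w2 = weights(rate1, rate2)
    return w1 * exp_pdf(x, rate1) + w2 * exp_pdf(x, rate2)

def sum_exp_cdf(x, rate1=1.0, rate2=2.0):
    """CDF from the same signed combination of exponential CDFs."""
    w1, w2 = weights(rate1, rate2)
    return w1 * (1 - math.exp(-rate1 * x)) + w2 * (1 - math.exp(-rate2 * x))

rng = random.Random(1)
n = 100_000
sims = [rng.expovariate(1.0) + rng.expovariate(2.0) for _ in range(n)]
empirical = sum(s <= 1.0 for s in sims) / n
print("weights:", weights(1.0, 2.0))
print("P(X <= 1): formula", sum_exp_cdf(1.0), "simulated", empirical)
print("min density on a grid:", min(sum_exp_pdf(0.01 * i) for i in range(2000)))
```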

1 Answer

Since posting this question, I have come across this article. In section 2, the author speculates about applying quantum probability laws to modeling in psychometrics and other fields, but presents this as future work. I would therefore answer the question in the negative.

John Madden
  • There are no quantum probability laws. It's annoying when people shove "quantum" into something to make it sound cool. There's no reason why QM laws would apply to the macro world. However, if you want to use QM's mathematical apparatus, its essence is amplitudes and all the things you can do with them using matrix operators. Statisticians are pretty good with linear algebra, but they apply the operators to probability vectors. So, do the same with complex-valued amplitudes and Hermitian operators and call it "quantum XYZ" if you wish. Trying to somehow extract probability laws from QM is masochism – Aksakal Nov 16 '21 at 02:55
  • @Aksakal Could you please read section 2 of that article and suggest a different terminology than "quantum probability laws" to describe their ideas? In particular, the second to last paragraph is what I'm trying to describe. – John Madden Nov 16 '21 at 13:34
  • I read section 2. It’s lame. These are micro-world phenomena which do not extend to the macro world. He’s doing what Feynman looked at in the 1980s. You don’t use math that makes things more difficult; you use math that makes things easy. In QM it is easy to work with amplitudes; you get used to it. It is just journeymen who want to bring their priest to the church who complain about amplitudes. – Aksakal Nov 16 '21 at 14:04
  • @Aksakal Thanks for taking the time to read it. I want to emphasize that there is no need for a direct link between quantum and macro phenomena in order for this to be a useful framework. In their psychometric example, they make no claim that quantum phenomena have an influence on the outcome of the experiment. Rather, they are saying that the way the double slit experiment is modeled could also serve as a model for the psychometric phenomenon of one question being asked interfering with the answer given for a second question. – John Madden Nov 16 '21 at 15:13