
Could the mutual information divided by the joint entropy: $$ 0 \leq \frac{I(X,Y)}{H(X,Y)} \leq 1$$

be defined as: "the probability of conveying a piece of information from $X$ to $Y$"?

I am sorry for being so naive, but I have never studied information theory, and I am just trying to understand some of its concepts.

Peter Flom
luca maggi

3 Answers


The measure you are describing is called the Information Quality Ratio [IQR] (Wijaya, Sarno and Zulaika, 2017). IQR is the mutual information $I(X,Y)$ divided by the "total uncertainty" (joint entropy) $H(X,Y)$ (image source: Wijaya, Sarno and Zulaika, 2017).

$$\mathrm{IQR} = \frac{I(X,Y)}{H(X,Y)}$$

As described by Wijaya, Sarno and Zulaika (2017),

the range of IQR is $[0,1]$. The biggest value (IQR = 1) can be reached if the DWT can perfectly reconstruct a signal without loss of information. Otherwise, the lowest value (IQR = 0) means the MWT is not compatible with the original signal. In other words, a reconstructed signal with a particular MWT cannot keep the essential information and is totally different from the original signal's characteristics.

You can interpret it as the probability that a signal will be perfectly reconstructed without loss of information. Notice that this interpretation is closer to the subjectivist interpretation of probability than to the traditional, frequentist one.

It is a probability for a binary event (reconstructing the information vs. not), where IQR = 1 means that we believe the reconstructed information to be trustworthy, and IQR = 0 means the opposite. It shares all the properties of probabilities of binary events. Moreover, entropies share a number of other properties with probabilities (e.g., the definitions of conditional entropies, independence, etc.). So it looks like a probability and quacks like one.
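As a quick numerical illustration, here is a minimal Python sketch computing this ratio directly from a joint probability table. The joint distribution below is made up for illustration and is not taken from the cited paper:

```python
import numpy as np

# Hypothetical joint distribution p(x, y) over two binary variables
# (illustrative values only, not from Wijaya, Sarno and Zulaika, 2017).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)  # marginal of X
p_y = p_xy.sum(axis=0)  # marginal of Y

nz = p_xy > 0  # mask so that 0 * log(0) terms are skipped

# Joint entropy H(X, Y) = -sum p(x,y) log2 p(x,y)
H_xy = -np.sum(p_xy[nz] * np.log2(p_xy[nz]))

# Mutual information I(X, Y) = sum p(x,y) log2 [ p(x,y) / (p(x) p(y)) ]
I_xy = np.sum(p_xy[nz] * np.log2(p_xy[nz] / np.outer(p_x, p_y)[nz]))

iqr = I_xy / H_xy
print(f"I = {I_xy:.4f} bits, H = {H_xy:.4f} bits, IQR = {iqr:.4f}")
```

Since $0 \leq I(X,Y) \leq H(X,Y)$, the ratio always lands in $[0,1]$, as the question observes.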


Wijaya, D.R., Sarno, R., & Zulaika, E. (2017). Information Quality Ratio as a novel metric for mother wavelet selection. Chemometrics and Intelligent Laboratory Systems, 160, 59-71.

Tim
  • How is the IQR function defined for $A\subset\Omega$ in order to check it against the defining properties of a probability measure? Are you introducing $I(X',Y')$ and $H(X',Y')$ with $X':=XI(A),\, Y':=YI(A)$, where $I$ is the characteristic function? – Hans Jan 14 '19 at 03:35
  • Well, my question is directed at a part of your answer and is not a stand-alone question. Are you suggesting that I open a new question and link and direct it to your answer? – Hans Jan 14 '19 at 09:31
  • @Hans What I said is that this measure easily fits the definition; correct me if I'm wrong. Axioms 1 and 2 are obvious. For axiom 3, $I(X, Y)$ is the overlap and $H(X, Y)$ is the total space, so the fraction can easily be seen as a probability. – Tim Jan 14 '19 at 10:24
  • A probability is defined on a sample space and its sigma field $(\Omega, \mathscr{F})$. I am confused as to what these are for this probability measure IQR. There is already a sample space and a sigma field for the probability measure defined for the random variables $X$ and $Y$. Are the sample space and field of the new probability measure IQR the same as those of the old probability measure associated with $X$ and $Y$? If not, how are they defined? Or are you saying these need not be defined? How then do you check it against the axioms? – Hans Jan 14 '19 at 19:07
  • @Hans I stated explicitly that this is consistent with the axioms, but it is hard to say what exactly this would be the probability of. The interpretation I suggested is the probability of reconstructing the signal. This is not a probability distribution of $X$ or $Y$. I guess you could go deeper into interpreting and understanding it. The question was whether this could be interpreted as a probability, and the answer was that, formally, yes. – Tim Jan 14 '19 at 19:42
  • For a probability measure to be defined, you need the sample space and its associated sigma field, just as in the definition of Billingsley you have quoted. My question is: if you want to check IQR against that definition, do you not need to specify what that $(\Omega, \mathscr F)$ is? So for this case, what is that $(\Omega, \mathscr F)$? Then you need to check $\text{IQR}(A_1\cup A_2) = \text{IQR}(A_1)+\text{IQR}(A_2)$ for disjoint $A_1,A_2\in\mathscr F$. You say this $(\Omega, \mathscr F)$ is not that of $X$ and $Y$; then what is it, and how is it defined? – Hans Jan 15 '19 at 01:11
  • You need to specify everything in the definition (axiom) in this particular case in order to check the case against the definition. Otherwise, what are you checking? – Hans Jan 15 '19 at 01:12
  • @Hans It seems that either I am oversimplifying it or you are overthinking it. We have started repeating ourselves, so this leads nowhere. You can provide a negative answer; I'd be happy to read your arguments in greater detail. – Tim Jan 15 '19 at 06:47
  • I have put up an answer. My main point is that you need to check the properties of IQR against the items of the definition of a probability measure one at a time and all of them. – Hans Jan 16 '19 at 10:16

Here is the definition of a probability space. Let us use the notation there. IQR is a function of a tuple $\Theta:=(\Omega,\mathscr F,P,X,Y)$ (the first three components form the probability space on which the two random variables are defined). A probability measure has to be a set function that satisfies all the conditions of the definition listed in Tim's answer. One would have to specify $\Theta$ as some subset of a set $\tilde\Omega$; moreover, the set of $\Theta$'s would have to form a field of subsets of $\tilde\Omega$, and $\text{IQR}(\Omega,\mathscr F,P,X,Y)$ would have to satisfy all three properties listed in the definition of a probability measure in Tim's answer. Until one constructs such an object, it is wrong to say IQR is a probability measure.

I, for one, do not see the utility of such a complicated probability measure (not of the IQR function itself, but of IQR as a probability measure). IQR in the paper cited in Tim's answer is not called or used as a probability but as a metric (the former is one kind of the latter, but the latter is not one kind of the former).

On the other hand, there is a trivial construction that allows any number in $[0,1]$ to be a probability. Specifically, in our case, consider any given $\Theta$. Pick a two-element set as the sample space $\tilde\Omega:=\{a,b\}$, let the field be $\tilde{\mathscr F}:=2^{\tilde\Omega}$, and set the probability measure $\tilde P(\{a\}):=\text{IQR}(\Theta)$. We then have a class of probability spaces indexed by $\Theta$.
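This two-point construction is simple enough to check mechanically. Below is a minimal Python sketch (the IQR value is a hypothetical placeholder standing in for $\text{IQR}(\Theta)$) that builds the measure $\tilde P$ on $\{a,b\}$ and verifies the probability axioms on the full power set:

```python
from itertools import combinations

# Hypothetical IQR value in [0, 1]; in the construction above this would
# be IQR(Theta) for some given tuple Theta = (Omega, F, P, X, Y).
iqr = 0.3

omega = frozenset({"a", "b"})                                     # sample space
field = [frozenset(), frozenset({"a"}), frozenset({"b"}), omega]  # 2^Omega

def P(event):
    """Probability measure with P({a}) = iqr and P({b}) = 1 - iqr."""
    weights = {"a": iqr, "b": 1.0 - iqr}
    return sum(weights[w] for w in event)

# Check the Kolmogorov axioms on this finite field:
assert all(P(A) >= 0.0 for A in field)                  # non-negativity
assert abs(P(omega) - 1.0) < 1e-12                      # P(Omega) = 1
for A, B in combinations(field, 2):
    if not (A & B):                                     # disjoint events only
        assert abs(P(A | B) - (P(A) + P(B))) < 1e-12    # finite additivity
```

On a finite field, finite additivity is all there is to check, which is what makes the construction trivial.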

Hans
  • For your information, I edited my answer to simplify & clarify it. Probability *is* a metric that has some special properties. We are talking about the set of all possible pairs of messages and their reconstructions $(x_i, y_i)$. Here the random variable is a complicated, unknown function that tells us whether the reconstruction was "good" or not. Returning a trustworthy reconstruction can be thought of as a binary event; my answer is simply that IQR can be thought of as the probability of such an event (or rather an approximation of it). – Tim Jan 17 '19 at 10:56
  • @Tim: The previous version of your answer was a much better answer, as it provided a clear definition one can check against. There is no way to circumvent a definition. Probability is a metric, but not every metric with "some special properties" is a probability. Until we can verify that all the "special properties" of this metric fit the definition, it is not one. However, I did give a trivial construction of a class of probability spaces indexed by the parameter tuple $\Theta:=(\Omega,\mathscr F,P,X,Y)$. – Hans Jan 18 '19 at 02:49
  • That is also the case if you use a complicated neural network with a sigmoid activation function at the end: can you prove that the output is a probability in measure-theoretic terms? Yet we often choose to interpret it as a probability. – Tim Jan 18 '19 at 10:09
  • @Tim: Of course you can. That is an easy one to deal with using the pullback measure. The sigmoid function is a measurable function, which already stipulates the sigma fields of the domain and range ($[0,1]$ with the conventional Borel field) of the function. The probability measure of a subset $A$ of the sample space is $P(A):=\mu(f(A))$, where $\mu$ is the conventional Borel measure on $\mathbb R$ and $f$ is the sigmoid function. QED – Hans Jan 18 '19 at 20:50
  • Sorry, but I have never found these kinds of discussions and measure theory interesting, so I'll withdraw from further discussion. I also do not see your point here, especially since your last paragraph seems to say exactly the same thing I was saying from the very beginning. – Tim Jan 18 '19 at 21:07
  • @Tim: Sure, everyone has his own personal preferences. To each his own. The axiomatic, i.e., mathematical-logic, approach gives a definitive answer to any question. Other approaches are always fuzzy and ambiguous. I see now, after the fact, the connection between your statement and the last paragraph of my answer. Echoing my second-to-last sentence, with all due respect and pardon my honesty, I find your answer vague and my proposition mathematically concrete and unequivocal. With the axiomatic approach, there is no need to say "looks like, smells like, and quacks like"; it either is or is not. – Hans Jan 18 '19 at 22:25

Going back in history a bit, the role of $\frac{I(X,Y)}{H(X,Y)}$ as a measure of probability can be seen, in part, in the 1961 article by Rajski, "A Metric Space of Discrete Probability Distributions." This article outlines the development of the Rajski distance $D_R$:

$$D_R = 1 - \frac{I(X,Y)}{H(X,Y)}$$
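For illustration, here is a short Python sketch (with made-up joint distributions) computing $D_R$ and confirming its extreme values: $D_R = 0$ under perfect dependence (where $I = H$) and $D_R = 1$ under independence (where $I = 0$):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf given as an array (skips zeros)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def rajski_distance(p_xy):
    """D_R = 1 - I(X,Y)/H(X,Y) for a joint pmf given as a 2-D array."""
    H_xy = entropy(p_xy.ravel())            # joint entropy H(X, Y)
    H_x = entropy(p_xy.sum(axis=1))         # marginal entropy H(X)
    H_y = entropy(p_xy.sum(axis=0))         # marginal entropy H(Y)
    I_xy = H_x + H_y - H_xy                 # mutual information I(X, Y)
    return 1.0 - I_xy / H_xy

# Perfect dependence (Y = X): I(X,Y) = H(X,Y), so D_R = 0
dependent = np.array([[0.5, 0.0],
                      [0.0, 0.5]])

# Independence: I(X,Y) = 0, so D_R = 1
independent = np.array([[0.25, 0.25],
                        [0.25, 0.25]])

print(rajski_distance(dependent), rajski_distance(independent))
```

Rajski's point was that this quantity is a metric on (equivalence classes of) discrete random variables, which complements the probability-style reading of the ratio itself.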

Mari153