
In information theory, there is something called the maximum entropy principle. Are other information measures, such as mutual information, also commonly maximized? If mutual information describes the reduction in uncertainty about one random variable (r.v. 1) given full knowledge of a second random variable (r.v. 2), then would maximizing mutual information mean that knowing everything about r.v. 2 also gives us full knowledge of r.v. 1?
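To pin down the question in symbols (standard definitions, assuming discrete random variables):

$$I(X;Y) = H(X) - H(X \mid Y) \le H(X),$$

with equality exactly when $H(X \mid Y) = 0$, i.e. when $X$ is (almost surely) a deterministic function of $Y$. So "full knowledge of r.v. 1 from r.v. 2" corresponds to the mutual information reaching its largest possible value, $H(X)$.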

develarist

1 Answer


Just to give one example from a machine learning context: in unsupervised learning, maximizing the mutual information $I(X; Y)$ can build good representations $Y$ of the input $X$; see here and here.
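Exact mutual information is generally intractable because the joint density is unknown, so these methods maximize a tractable lower bound instead (see also the comments below). The following is a minimal PyTorch sketch of one such bound (InfoNCE-style), not the exact method of the linked papers; the toy data, encoders, and hyperparameters are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, batch = 16, 128
enc_x = nn.Linear(dim, 32)   # hypothetical encoder for X
enc_y = nn.Linear(dim, 32)   # hypothetical encoder for Y
opt = torch.optim.Adam(list(enc_x.parameters()) + list(enc_y.parameters()), lr=1e-3)

for step in range(200):
    # Toy paired data: y is a noisy copy of x, so I(X; Y) is high by construction.
    x = torch.randn(batch, dim)
    y = x + 0.1 * torch.randn(batch, dim)
    zx, zy = enc_x(x), enc_y(y)
    # Pairwise scores s_ij = <zx_i, zy_j>; the diagonal holds the true (positive) pairs.
    scores = zx @ zy.t()
    labels = torch.arange(batch)
    # InfoNCE: cross-entropy of identifying the matching y_i for each x_i.
    # log(batch) minus this loss is a lower bound on I(X; Y).
    loss = F.cross_entropy(scores, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

bound = torch.log(torch.tensor(float(batch))) - loss.detach()
print(f"InfoNCE lower bound on I(X; Y): {bound.item():.3f} nats")
```

Maximizing this bound (equivalently, minimizing the InfoNCE loss) trains the encoders so that the representation of one view identifies its paired view, which is the sense in which methods like the linked papers "maximize mutual information".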

doubllle
  • Mutual information is a measure of the amount of information one variable reveals about another, where 0 indicates independence and larger values indicate stronger dependence. It always helps to go back to the source, in this case Gelfand and Yaglom, *Calculation of amount of information about a random function contained in another such function*, American Mathematical Society Translations, Series 2, 12: 199–246, 1957. Also, there's this thread: https://stats.stackexchange.com/questions/81659/mutual-information-versus-correlation – Mike Hunter Jul 31 '20 at 12:42
  • yeah, I totally agree, and my answer isn't a proper one and needs refinement. I just wanted to quickly comment on what maximizing MI can do. @MikeHunter – doubllle Jul 31 '20 at 12:57
  • if elements in the **mutual information matrix** are nothing more than measures of dependence between variables in a multivariate dataset, similar to the correlation matrix, then how does maximizing the mutual information matrix differ from maximizing the correlation matrix? Is it simply able to pick out non-linear dependencies that the correlation matrix can't? – develarist Aug 04 '20 at 08:57
  • Being able to pick up nonlinear dependence is nice, isn't it? (The numpy sketch after this comment thread illustrates it.) However, the maximum of MI is hard to obtain because of the joint probability term, so in practice the linked papers maximize a lower bound on it instead. So practically, it is hard to fully recover $X$ from $Y$. – doubllle Aug 04 '20 at 10:04
  • @doubllle can you describe what the maximization of mutual information accomplishes for the two attached papers in your answer? – develarist Sep 07 '20 at 02:50
  • @develarist will do that later – doubllle Sep 07 '20 at 07:39
  • what is the lower bound of MI? The Gelfand and Yaglom paper could not be found as a PDF – develarist Dec 20 '20 at 08:10
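To make the correlation-versus-MI point in the comments concrete, here is a minimal numpy sketch, assuming a simple histogram (plug-in) MI estimator; the bin count, noise level, and sample size are arbitrary illustrative choices. With $Y = X^2$ plus noise, the Pearson correlation is essentially zero while the estimated mutual information is clearly positive.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100_000)
y = x ** 2 + 0.05 * rng.normal(size=x.size)   # nonlinear, symmetric dependence

def mutual_info(a, b, bins=30):
    """Plug-in MI estimate (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

print("Pearson correlation:", np.corrcoef(x, y)[0, 1])   # ~ 0: no linear relation
print("Estimated MI (nats):", mutual_info(x, y))          # clearly > 0
```

So a mutual information matrix can flag dependencies that a correlation matrix misses entirely, though the estimate depends on the discretization (or other estimator) used.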