Questions tagged [divergence]

A function that establishes the "distance" from one probability distribution to another on a statistical manifold.

"In statistics and information geometry, divergence or a contrast function is a function which establishes the "distance" of one probability distribution to the other on a statistical manifold. The divergence is a weaker notion than that of the distance, in particular, the divergence need not be symmetric (that is, in general, the divergence from p to q is not equal to the divergence from q to p), and need not satisfy the triangle inequality." --Wikipedia

53 questions
4
votes
1 answer

Estimating the $\chi^2$-divergence with Monte Carlo: which distribution to sample from?

Notation: let the $\chi^2$-divergence between $p, q$ be defined as $$\chi^2 (p||q) := \int \left ( \frac{p(x)}{q(x)} \right )^2 q(x)\mathrm{d}x -1 = \int \frac{p(x)}{q(x)} p(x)\mathrm{d}x - 1. $$ Suppose $q$ is a fully known prior distribution,…
user
  • 2,010
  • 8
  • 23
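The two equivalent integrals above suggest two different Monte Carlo estimators, depending on whether samples are drawn from $q$ or from $p$. A minimal sketch of both, with hypothetical Gaussian choices for $p$ and $q$ (an assumption for illustration, not from the question):

```python
import numpy as np
from scipy import stats

# Hypothetical densities (assumption, not from the question): p and q are
# both 1-D Gaussians so the two estimators of chi^2(p||q) can be compared.
p = stats.norm(loc=0.5, scale=1.0)
q = stats.norm(loc=0.0, scale=1.5)

rng = np.random.default_rng(0)
n = 100_000

# Estimator 1: draw from q and average (p/q)^2, then subtract 1.
xq = rng.normal(0.0, 1.5, size=n)
est_from_q = np.mean((p.pdf(xq) / q.pdf(xq)) ** 2) - 1.0

# Estimator 2: draw from p and average p/q, then subtract 1.
xp = rng.normal(0.5, 1.0, size=n)
est_from_p = np.mean(p.pdf(xp) / q.pdf(xp)) - 1.0

print(f"chi^2 estimate sampling from q: {est_from_q:.4f}")
print(f"chi^2 estimate sampling from p: {est_from_p:.4f}")
```

Both estimators are unbiased for $\chi^2(p\|q)$; which one has lower variance depends on how heavy the tails of the ratio $p/q$ are, which is essentially what the question asks about.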
4
votes
1 answer

KL divergence for joint probability distributions?

I have a pair of joint probability distributions. I want to measure their similarity/dissimilarity. If they were single-dimensional probability distributions, then I could measure the Kullback–Leibler (KL) divergence or the Jensen–Shannon divergence…
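A joint pmf is still just a pmf on the product space, so the usual discrete KL formula applies after flattening the table; a minimal sketch with made-up $2\times 3$ joint distributions (illustrative values only):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D_KL(p || q) for arrays of matching shape."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

# Two hypothetical 2x3 joint pmfs over (X, Y); KL treats them as pmfs
# over the 6 cells of the product space.
P = np.array([[0.10, 0.20, 0.10],
              [0.15, 0.25, 0.20]])
Q = np.array([[0.05, 0.25, 0.15],
              [0.20, 0.20, 0.15]])
print(kl_divergence(P, Q))
```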
4
votes
2 answers

Does maximizing Jensen–Shannon divergence maximize Kullback–Leibler divergence?

Does maximizing the Jensen–Shannon divergence $D_{\mathrm{JS}}(P \parallel Q)$ maximize the Kullback–Leibler divergence $D_{\mathrm{KL}}(P \parallel Q)$? If so, I'd like to be able to show that it does. I have managed to express $D_{\mathrm{JS}}(P…
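For a concrete comparison, the Jensen–Shannon divergence can be computed from two KL terms against the mixture $M = \tfrac{1}{2}(P+Q)$; a small numerical sketch with hypothetical discrete $P$ and $Q$ (note that $D_{\mathrm{JS}}$ is bounded above by $\log 2$ while $D_{\mathrm{KL}}$ is unbounded, which already hints that the two need not be maximized together):

```python
import numpy as np

def kl(p, q):
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical discrete distributions on a 3-point alphabet.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
print("KL(P||Q):", kl(p, q))
print("JS(P||Q):", js(p, q))   # bounded above by log 2, unlike KL
```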
4
votes
1 answer

Is relative entropy equal to cross-entropy during optimization?

I came across the claim that the KL divergence, otherwise known as relative entropy, between the truth of a random variable and its prediction ($y$ and $\hat{y}$) is equal to their cross-entropy, because entropy + KL divergence = cross-entropy, or…
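For reference, the identity the question alludes to can be written out directly; with $y$ the true distribution and $\hat{y}$ the predicted one,

$$H(y, \hat{y}) = -\sum_x y(x)\log \hat{y}(x) = H(y) + D_{\mathrm{KL}}(y \,\|\, \hat{y}),$$

so cross-entropy and relative entropy differ by the entropy $H(y)$, which is constant with respect to $\hat{y}$: minimizing one over $\hat{y}$ minimizes the other, but the two quantities are equal only when $H(y) = 0$.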
3
votes
1 answer

What to consider when choosing between f-divergence measures? (e.g., KL divergence, chi-square divergence, etc.)

I have some baseline population, and I have a non-random sample from that population. For both the population and the sample I have observations of some measure (for simplicity, let's say age). I would like to measure how "un-similar" my sample is…
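One way to make this concrete: bin the measure (age) on a common grid for both groups and compute a few f-divergences between the resulting empirical pmfs. A rough sketch with simulated data (the distributions, bin width, and smoothing constant are illustrative assumptions):

```python
import numpy as np

# Hypothetical data: ages in the baseline population and in the non-random sample.
rng = np.random.default_rng(1)
population_ages = rng.normal(40, 12, size=50_000).clip(0, 100)
sample_ages = rng.normal(48, 10, size=2_000).clip(0, 100)

# Bin both on a common grid and normalize to empirical pmfs.
bins = np.arange(0, 105, 5)
p, _ = np.histogram(sample_ages, bins=bins)
q, _ = np.histogram(population_ages, bins=bins)
p = (p + 1e-9) / (p + 1e-9).sum()   # tiny smoothing to avoid empty bins
q = (q + 1e-9) / (q + 1e-9).sum()

kl = np.sum(p * np.log(p / q))        # KL(sample || population)
chi2 = np.sum((p - q) ** 2 / q)       # chi^2(sample || population)
print(f"KL: {kl:.4f}, chi^2: {chi2:.4f}")
```

The chi-square divergence weights discrepancies by $1/q$, so it reacts more strongly than KL to sample mass landing in bins that are rare in the population (indeed $D_{\mathrm{KL}}(p\|q) \le \log\big(1+\chi^2(p\|q)\big)$); that difference in sensitivity is one of the main considerations the question is after.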
3
votes
1 answer

What is meant by divergence in statistics?

I have learned the intuition behind the Kullback–Leibler (KL) divergence as a measure of how much a model distribution differs from the theoretical/true distribution of the data. The two most important divergences are the relative entropy…
Pluviophile
  • 2,381
  • 8
  • 18
  • 45
3
votes
1 answer

Optimizing forward/reverse KL divergence for Gaussian distributions

The forward and reverse formulations of KL divergence are distinguished by their mean-seeking and mode-seeking behavior, respectively. The typical example for using KL to optimize a distribution $Q_\theta$ to fit a distribution $P$ (e.g. see this blog) is a bimodal true…
adamconkey
  • 561
  • 4
  • 11
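A small numerical sketch of the behavior the question refers to, assuming a hypothetical bimodal target (not the one in the linked blog) and fitting a single Gaussian $Q_\theta$ by minimizing either direction of the KL on a grid:

```python
import numpy as np
from scipy import stats, optimize

# Hypothetical bimodal target P: mixture of two well-separated Gaussians.
def p_pdf(x):
    return 0.5 * stats.norm.pdf(x, -3, 0.7) + 0.5 * stats.norm.pdf(x, 3, 0.7)

x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
p = p_pdf(x)
p /= p.sum() * dx   # renormalize on the grid

def q_pdf(theta):
    mu, log_sigma = theta
    return stats.norm.pdf(x, mu, np.exp(log_sigma))

def forward_kl(theta):   # KL(P || Q): penalizes Q ~ 0 wherever P > 0
    q = q_pdf(theta) + 1e-300
    return np.sum(p * np.log(p / q)) * dx

def reverse_kl(theta):   # KL(Q || P): penalizes Q > 0 wherever P ~ 0
    q = q_pdf(theta) + 1e-300
    return np.sum(q * np.log(q / (p + 1e-300))) * dx

fwd = optimize.minimize(forward_kl, x0=[0.5, 0.0], method="Nelder-Mead")
rev = optimize.minimize(reverse_kl, x0=[2.0, 0.0], method="Nelder-Mead")
print("forward KL fit (mean-seeking): mu=%.2f sigma=%.2f" % (fwd.x[0], np.exp(fwd.x[1])))
print("reverse KL fit (mode-seeking): mu=%.2f sigma=%.2f" % (rev.x[0], np.exp(rev.x[1])))
```

With these settings the forward fit typically lands near the overall mean with a large variance (covering both modes), while the reverse fit locks onto whichever mode it starts nearest to.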
3
votes
1 answer

Is limiting density of discrete points (LDDP) equivalent to negative KL-divergence?

Is limiting density of discrete points (LDDP), which is a corrected version of differential entropy, equivalent to the negative KL-divergence (or relative entropy) between a density function $m(x)$ and a probability distribution $p(x)$? What are the…
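For context, Jaynes's construction (as I understand it) replaces the ill-defined differential entropy of $p$, relative to a limiting density of points $m$, by

$$H_N(X) \;\approx\; \log N \;-\; \int p(x)\,\log\frac{p(x)}{m(x)}\,\mathrm{d}x \;=\; \log N - D_{\mathrm{KL}}(p \,\|\, m),$$

so, apart from the divergent $\log N$ term, the LDDP-corrected entropy is the negative relative entropy between $p$ and $m$; the question is essentially whether, and in what sense, that identification holds.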
2
votes
1 answer

Formal arguments for why an asymmetric f-divergence might be favourable to a symmetric one in analyzing importance sampling

I am reading Importance Sampling and Necessary Sample Size: an Information Theory Approach. Below is a quote from paragraph 3, section 3 of the article. While [total variation distance] and [Hellinger distance] can be shown to be distances in P(X),…
user
  • 2,010
  • 8
  • 23
2
votes
1 answer

Is there a name for $\sum P(x) \frac{P(x)}{Q(x)}$? ($P$ and $Q$ are pmfs)

I know that $\sum P(x) \log \left( \frac{P(x)}{Q(x)} \right)$ is the KL divergence. I'd like to know if there is a name for $\sum P(x) \left( \frac{P(x)}{Q(x)} \right)$ (no log), but I couldn't find one. Any pointers? Thanks!
Tal Galili
  • 19,935
  • 32
  • 133
  • 195
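For reference, this sum can be written in terms of two standard quantities:

$$\sum_x \frac{P(x)^2}{Q(x)} \;=\; 1 + \chi^2(P \,\|\, Q) \;=\; \exp\!\big(D_2(P \,\|\, Q)\big),$$

where $\chi^2$ is the chi-square divergence and $D_2$ is the Rényi divergence of order 2.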
2
votes
0 answers

Strong data processing inequality in multiplicative channels

We know that post-processing cannot increase information: for two random variables $X$ and $Y$, $D(X||Y) \geq D(f(X)||f(Y))$ for any operation $f(\cdot)$ and divergence $D$. A strong data processing inequality implies $D(X||Y) > D(f(X)||f(Y))$. This…
HSxiao
  • 21
  • 1
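For context, the strong data processing inequality is usually stated (as I understand it) not as a bare strict inequality but as a contraction with a channel-dependent coefficient: for a channel $K$ and input distributions $P$, $Q$,

$$D(PK \,\|\, QK) \;\le\; \eta_K \, D(P \,\|\, Q), \qquad \eta_K < 1,$$

which is strictly stronger than requiring $D(PK\|QK) < D(P\|Q)$ alone.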
2
votes
0 answers

Is KL-divergence just the multiplication rule for independent events, reformulated in terms of entropy?

We know KL-divergence is sometimes expressed as $D_{\mathrm{KL}}\big(P(X,Y)\,\|\,P(X)P(Y)\big)$, which shows it is capturing the deviation between the joint distribution of $X$ and $Y$ and the product of the marginals of $X$ and $Y$. This suggests KL-divergence is simply the multiplication rule for…
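Written that way, the quantity is the mutual information $I(X;Y) = D_{\mathrm{KL}}\big(P(X,Y)\,\|\,P(X)P(Y)\big)$, which is zero exactly when the multiplication rule for independent events holds. A minimal numerical sketch with a made-up joint pmf:

```python
import numpy as np

# Hypothetical 2x2 joint pmf for (X, Y); not a product of its marginals.
joint = np.array([[0.30, 0.10],
                  [0.15, 0.45]])

px = joint.sum(axis=1, keepdims=True)   # marginal of X
py = joint.sum(axis=0, keepdims=True)   # marginal of Y
product = px * py                       # independence model P(X)P(Y)

# Mutual information I(X;Y) = KL(joint || product of marginals).
mask = joint > 0
mi = np.sum(joint[mask] * np.log(joint[mask] / product[mask]))
print(f"I(X;Y) = {mi:.4f} nats")
```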
2
votes
1 answer

Sensitivity of KL Divergence

I am very new to the concept of KL divergence. Although I have grasped the fundamental formulations, I am confused about comparing the KL divergence across different distributions. Suppose I have 3 distributions $y_1$, $y_2$, and $y_3$, and…
2
votes
1 answer

Does minimizing KL-divergence result in maximum entropy principle?

The Kullback-Leibler divergence (or relative entropy) is a measure of how a probability distribution differs from another reference probability distribution. I want to know what connection it has to the maximum entropy principle, which says that the…
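The link the question asks about can be made explicit for a finite alphabet $\mathcal{X}$ with the uniform reference $u(x) = 1/|\mathcal{X}|$:

$$D_{\mathrm{KL}}(p \,\|\, u) \;=\; \sum_x p(x)\log\frac{p(x)}{1/|\mathcal{X}|} \;=\; \log|\mathcal{X}| - H(p),$$

so minimizing KL to the uniform distribution under the given constraints is the same as maximizing entropy; with a non-uniform reference this generalizes to the minimum relative entropy (minimum discrimination information) principle.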
2
votes
0 answers

Show that a sequence of random variables diverges to infinity in probability

I have sequences of real-valued random variables $\{X_T\}, \{Y_T\}$ and a sequence of real numbers $\{a_T\}$. As $T\rightarrow\infty$, I know that $$ a_T \rightarrow \infty $$ and $$ X_T \overset{d}{\rightarrow} X $$ where $X$ is a real-valued…
L D
  • 83
  • 9