
Suppose $f$ and $g$ are two probability density functions. I have seen economists use $\int f(x)g(x)\,dx$ as a kind of similarity measure. For example, Jaffe (1986) uses the sum of the products of two firms' budget shares in each area as a measure of the similarity of the two firms: https://www.nber.org/system/files/working_papers/w1815/w1815.pdf.

Specifically, $F_{i}\in\mathbb{R}^d$ is the vector of budget shares that firm $i$ devotes to each of $d$ areas, so $\sum_{k=1}^{d} F_{ik}=1$ and $0 \leq F_{ik} \leq 1$ for all $k = 1, 2, \ldots, d$. The similarity between firms $i$ and $j$ is defined as $P_{ij} = \frac{F_i^\top F_j}{\|F_i\|_2\|F_j\|_2}$.
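A minimal sketch of $P_{ij}$, assuming each firm's budget shares are given as a plain Python list of length $d$. The function name `jaffe_similarity` and the two example firms are hypothetical. Because all shares are nonnegative, $P_{ij}$ lies in $[0, 1]$: it is 0 when the firms share no area and 1 exactly when their share vectors are equal (both sum to 1, so proportional vectors must coincide).

```python
import math

def jaffe_similarity(f_i, f_j):
    """Uncentered correlation (cosine similarity) of two budget-share vectors.

    f_i, f_j: sequences of length d with nonnegative entries summing to 1.
    """
    if len(f_i) != len(f_j):
        raise ValueError("share vectors must have the same length d")
    dot = sum(a * b for a, b in zip(f_i, f_j))   # F_i^T F_j
    norm_i = math.sqrt(sum(a * a for a in f_i))  # ||F_i||_2
    norm_j = math.sqrt(sum(b * b for b in f_j))  # ||F_j||_2
    return dot / (norm_i * norm_j)

# Hypothetical example data: shares over d = 3 areas.
firm_a = [0.5, 0.5, 0.0]
firm_b = [0.5, 0.0, 0.5]
print(jaffe_similarity(firm_a, firm_a))  # 1 up to rounding (identical allocations)
print(jaffe_similarity(firm_a, firm_b))  # 0.5 up to rounding
```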

If we think of $F_i$ as the probability mass function of a multinomial distribution, or more generally as a probability density function, what is $P_{ij}$ measuring? It has the form of an (uncentered) correlation between two p.m.f.s/p.d.f.s, but is there any justification for it? Has any statistician used it as some sort of distance or angle between two measures? And how does it relate to the correlation of random variables that have these two p.d.f.s?
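One property worth checking for the general (continuous) form is how $\int f(x)g(x)\,dx$ behaves when the unit of measure of $x$ changes. A minimal sketch for two normal densities, where both $\int f(x)g(x)\,dx$ and $\int\sqrt{f(x)g(x)}\,dx$ have closed forms; the parameters and the rescaling factor `c` are hypothetical. Rescaling $x$ by $c$ multiplies every mean and standard deviation by $c$, which divides the first integral by $c$ but leaves the second unchanged.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def product_integral(mu1, s1, mu2, s2):
    """Closed form of the integral of f(x) g(x) for f = N(mu1, s1^2), g = N(mu2, s2^2).

    It equals the N(mu2, s1^2 + s2^2) density evaluated at mu1.
    """
    return normal_pdf(mu1, mu2, math.sqrt(s1 ** 2 + s2 ** 2))

def sqrt_product_integral(mu1, s1, mu2, s2):
    """Closed form of the integral of sqrt(f(x) g(x)) for the same two normals
    (the Bhattacharyya coefficient of two univariate normals)."""
    v = s1 ** 2 + s2 ** 2
    return math.sqrt(2 * s1 * s2 / v) * math.exp(-(mu1 - mu2) ** 2 / (4 * v))

# Hypothetical example: the same two densities with x in metres, then in centimetres.
c = 100.0
print(product_integral(0.0, 1.0, 1.0, 2.0), product_integral(0.0, c, c, 2 * c))  # differ by a factor of c
print(sqrt_product_integral(0.0, 1.0, 1.0, 2.0), sqrt_product_integral(0.0, c, c, 2 * c))  # equal
```

Both closed forms follow from completing the square in the Gaussian exponent; a numerical integral of either integrand gives the same values.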

cccfran
  • The integral $\int f(x)g(x)\,\mathrm{d}x$ doesn't make a whole lot of sense in general because it changes with the unit of measure of $x$. The integral $\int\sqrt{f(x)g(x)}\,\mathrm{d}x$ would have an invariant meaning and could be interpreted as a cosine dissimilarity (between the $L^2$ functions $\sqrt{f}$ and $\sqrt{g}$). But the stuff you write after "specifically" seems to have little to do with this. Are you trying to ask about the general formulation you began with or about the *discrete* distributions you wind up discussing? – whuber Nov 04 '20 at 21:07
  • See https://stats.stackexchange.com/questions/296361/intuition-of-the-bhattacharya-coefficient-and-the-bhattacharya-distance/296604#296604. – kjetil b halvorsen Nov 04 '20 at 23:17
  • @whuber I think the second paragraph is a discrete version of the general question, where $f(x)$ and $g(x)$ are probability mass functions of multinomial distributions. But I think the Bhattacharyya distance that kjetil b halvorsen mentioned in his post is what I am looking for. Thanks to you both! – cccfran Nov 05 '20 at 02:40
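For the discrete case, the Bhattacharyya coefficient of two p.m.f.s $p$ and $q$ on the same $d$ categories is $\mathrm{BC}(p,q)=\sum_{k}\sqrt{p_k q_k}$, and the Bhattacharyya distance is usually defined as $-\ln \mathrm{BC}(p,q)$. Since $\sum_k p_k = 1$, the vector $\sqrt{p}$ has unit Euclidean norm, so BC is exactly the cosine similarity of the square-root vectors: Jaffe's $P_{ij}$ formula applied to $\sqrt{F_i}$ and $\sqrt{F_j}$ instead of $F_i$ and $F_j$. A minimal sketch; the function names and the two share vectors are hypothetical.

```python
import math

def bhattacharyya_coefficient(p, q):
    """BC(p, q) = sum_k sqrt(p_k * q_k) for two p.m.f.s on the same d categories."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln BC(p, q); infinite when the supports do not overlap."""
    bc = bhattacharyya_coefficient(p, q)
    return math.inf if bc == 0 else -math.log(bc)

# Hypothetical example data: budget shares of two firms over d = 3 areas.
firm_a = [0.5, 0.5, 0.0]
firm_b = [0.5, 0.0, 0.5]
print(bhattacharyya_coefficient(firm_a, firm_b))  # 0.5
print(bhattacharyya_distance(firm_a, firm_b))     # ln 2, about 0.693
```

Unlike $\int f(x)g(x)\,dx$, the continuous analogue $\int\sqrt{f(x)g(x)}\,dx$ of this coefficient does not change with the unit of measure of $x$.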
