14

I am trying to better understand the Difference in "Probability Measure" and "Probability Distribution"

I came across the following link : https://math.stackexchange.com/questions/1073744/distinguishing-probability-measure-function-and-distribution

" The difference between the terms "probability measure" and "probability distribution" is in some ways more of a difference in connotation of the terms rather than a difference between the things that the terms refer to. It's more about the way the terms are used. "

The answer there suggests that these two concepts might be the same thing?

In this case - could we consider the "Normal Probability Distribution Function" as a "Probability Measure"?

Thanks!

Richard Hardy
stats_noob

5 Answers

14

First off, I am not used to the term "probability distribution function". If by "probability distribution function" you mean a "PDF", then I would like to point out that PDF is actually the abbreviation for "probability density function". Below, I will presume you meant probability density function.

Second, the word "distribution" is used very differently by different people, but I will refer to the definition of this notion in the scientific community.

In a nutshell: The distribution of a random variable $X$ is a measure on $\mathbb{R}$, while the PDF of $X$ is a function on $\mathbb{R}$ and the PDF doesn't even always exist. So they are very different.

And now we get to the mathematical details. First, let's define the term random variable, because that is what all these terms usually refer to (one can go a step further and talk about random vectors or random elements, but I will restrict myself to random variables here). That is, one talks about the distribution of a random variable.

Given a probability space $(\Omega, \cal{F}, p)$ ($\Omega$ is just a set, $\cal{F}$ is a sigma algebra on $\Omega$, and $p$ is a measure on $(\Omega, \cal{F})$), a random variable $X$ is a measurable map $X: \Omega \to \mathbb{R}$.

Then we can define: The distribution $p_X$ of the random variable $X$ is the measure $p_X = p \circ X^{-1}$ on $\mathbb{R}.$

I.e. you push the measure $p$ from $\Omega$ forward to $\mathbb{R}$ via the measurable function $X$.
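As an illustration (not part of the original definition), here is a toy pushforward in Python: a fair die as $(\Omega, \cal{F}, p)$ and an indicator random variable $X$; the distribution $p_X$ assigns to a set $B$ the $p$-measure of its preimage $X^{-1}(B)$:

```python
from fractions import Fraction

# Toy probability space: Omega = outcomes of a fair die, p(w) = 1/6 each.
Omega = [1, 2, 3, 4, 5, 6]
p = {w: Fraction(1, 6) for w in Omega}

# A random variable X: Omega -> R (here: 1 if the roll is even, else 0).
def X(w):
    return 1 if w % 2 == 0 else 0

# The pushforward p_X = p o X^{-1}: the measure of a set B is the
# p-measure of its preimage X^{-1}(B).
def p_X(B):
    return sum(p[w] for w in Omega if X(w) in B)

print(p_X({1}))     # 1/2 -- probability that the roll is even
print(p_X({0, 1}))  # 1   -- p_X is a probability measure on the range of X
```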

Next we define the probability density function (PDF) of a random variable $X$: The PDF $f_X$ of a random variable $X$, if it exists, is the Radon-Nikodym derivative of its distribution w.r.t. the Lebesgue measure $\lambda$, i.e. $f_X = \frac{d\,p_X}{d\,\lambda}$.

So the distribution $p_X$ and the PDF $f_X$ of a random variable are very different entities (to a stickler, at least). But very often, if the PDF $f_X$ exists, it contains all of the relevant information about the distribution $p_X$ and can thus be used as a handy substitute for the unwieldy $p_X$.
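To make the stickler's distinction concrete, here is a small Python sketch (with a hypothetical normal distribution, $\mu = 0$ and $\sigma = 0.1$): the distribution $p_X$ assigns probabilities to sets, while the density $f_X$ is a pointwise function whose values are not probabilities and can even exceed $1$:

```python
import math

# Hypothetical example parameters: a narrow normal distribution.
mu, sigma = 0.0, 0.1   # sigma < 1 on purpose, so the density peaks above 1

def pdf(x):
    """Density f_X(x): a function on R, NOT a probability."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def cdf(x):
    """CDF via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def measure(a, b):
    """The distribution p_X evaluated on the interval (a, b]: a number in [0, 1]."""
    return cdf(b) - cdf(a)

print(pdf(0.0))            # ~3.99  -- a density value can exceed 1
print(measure(-0.1, 0.1))  # ~0.683 -- the measure of an interval is a probability
```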

frank
5

You are right: when we are just starting out in statistics, these terms can be very confusing, especially to people who are detail-driven or particular about terminology. You do need a good understanding of them before you can master their applications, and using R functions such as pnorm() and dnorm() forces you to understand them.

And I’d say the post you quoted put it pretty well - especially where it pointed out that the “probability functions” - the “probability density function” and the “probability mass function” - are precisely defined, while the other terms can be understood as their names suggest. A “probability measure” is a measure of probabilities - is it cumulative or not? It doesn’t say; it’s a measure, so it can refer to both. A “probability distribution” is similar. But don't confuse it with a distribution function: “distribution function” in most cases refers to the cumulative distribution function, which is also clearly defined as the probability of taking on a value less than or equal to a specified value.

Maybe an example will help.

[Figure: standard normal density curve (red) with shaded orange and blue areas under the curve]

This is a simple normal distribution chart. And here,

  • The probability function - which here is a probability density function, since the variable is continuous - describes the red curve, represented by the function $$\large{f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2}}$$ where $\mu$ is the mean and $\sigma$ the standard deviation.

  • Calculating integrals under this red curve gives different “probability” values - for example, the orange area covers the region between the mean and one standard deviation away from it, about 34% of the total area, i.e. 34% probability, whereas the blue area represents 17% probability. I’d say the entire chart, depicting the probabilities of a normal distribution, is both a probability measure and a probability distribution. But only and precisely the red curve is a probability function, or more specifically, a probability density function.
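The 34% figure can be checked numerically; here is a minimal Python sketch using the standard normal CDF built from the error function:

```python
import math

def phi(x):
    # Standard normal CDF, via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Area under the red curve between the mean and one standard deviation above it
orange = phi(1) - phi(0)
print(round(orange, 4))   # 0.3413, i.e. about 34%
```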

Don't stress too much about the exact definitions of "probability measure" and "probability distribution". As you see more examples, you will become comfortable using these terms.

Peiran Yu
  • 1
    Most authors in math and stats make a clear distinction between the distribution function, also known as a CDF, and the probability density function, or PDF. Your use of "probability distribution" and your equation of measures with distribution functions ("it can refer to both") looks unusual and might be confusing to those familiar with the standard terminology. – whuber Feb 11 '22 at 22:23
  • Of course, distribution function and probability density function are not the same at all. As shared in my explanation, the probability density function is clearly and precisely defined. – Peiran Yu Feb 12 '22 at 03:15
  • CDF is also a lot of times represented by the term "distribution function". I haven't found a good definition for probability measure yet. Some say it's a function. // All in all, I think we all agree probability functions are well defined and clear. The other terms aren't always so much. Would love to know a good source of standard definitions if you know of them! – Peiran Yu Feb 12 '22 at 03:19
  • Walter Rudin, *Real and Complex Analysis,* is the classic account of measures. Various approaches are possible from the perspectives of measure theory or functional analysis (where measures appear as elements of the dual space to $L^\infty(\mathbb R),$ effectively defining probability measure in terms of expectation). – whuber Feb 12 '22 at 15:47
4

You mention the post from Math.SE: https://math.stackexchange.com/questions/1073744/distinguishing-probability-measure-function-and-distribution The answers there are great and should be self-sufficient. I recommend that anyone interested in the mathematical details go read them.

If you are asking the "same" question here, it is probably because you are not familiar with the terminology and mathematics of probability theory.

What I do not like about the other answers is that they focus on densities instead of what is really important: distributions and measures.

For this reason, here is some vocabulary, aimed at non professional mathematicians. It is a very quick and dirty presentation:

Functional analysis:

A distribution is a mathematical object developed in functional analysis, and you do not need to know the details here. See https://en.wikipedia.org/wiki/Distribution_(mathematics)

A density is the concept that arises FROM a distribution in nice scenarios: when everything is nice, densities are the derivatives of distributions.

Probability Theory:

Cumulative Distribution Function: it is a "normalised" distribution, and for this reason its limit over the whole domain equals $1$ (over $ \mathbb R $ if you wish, $ \lim_{x \to \infty } F_X( x ) = 1$),

Probability Density Function: the same relationship as above, but with the cumulative distribution function in the role of the distribution - when it exists, it is the derivative of the CDF.

Measure Theory:

Random Variable: it is essentially a measurable function. If you do not understand this, skip it and come back to it later; for now, consider it a function. All measurable functions are functions, but not all functions are measurable. However, in most situations, anything you can think of is measurable.

Measure: a measure is a function from sets to the reals. In other words, it assigns a weight to sets of elements. Measures have to respect some properties that I do not detail here.

Probability Measure: a probability measure is a normalised measure, such that the measure of the whole space equals $1$.


Now that we know the vocabulary, the important theorems you should be aware of are:

  1. A random variable is ALWAYS associated with a CDF (cumulative distribution function). It is not possible to have one without the other: if I explicitly define $X$, you can find the CDF of $X$, denoted $F_X$, and conversely.

  2. For every CDF, there exists a unique associated probability measure.

What does this mean? It means that if you start with a random variable, you have a CDF, which corresponds to a measure, and conversely!

  3. As a bonus, when the PDF (probability density function) is equivalent to the CDF (in the sense that the derivative of the CDF equals the PDF and the antiderivative of the PDF gives the CDF), then what I said about the CDF (that it uniquely characterises the random variable) is also true for the PDF! Be careful in cases where the PDF and CDF are not equivalent.

The conclusion is that yes, distributions and measures are equivalent: if you have one, you can construct the other. This is great because in many cases it lets you work with whichever expression is easier.
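A quick numerical check of the bonus point above, using the Exponential(1) distribution as a stand-in example (not from the original answer): differentiating the CDF recovers the PDF, and integrating the PDF recovers the CDF:

```python
import math

# Exponential(1): CDF F(x) = 1 - e^{-x}, PDF f(x) = e^{-x}, for x >= 0.
F = lambda x: 1 - math.exp(-x)
f = lambda x: math.exp(-x)

# Derivative of the CDF recovers the PDF (central finite difference at x = 1)
h = 1e-6
print(abs((F(1 + h) - F(1 - h)) / (2 * h) - f(1)) < 1e-6)  # True

# Integral of the PDF recovers the CDF (midpoint rule on [0, 2])
n = 100_000
integral = sum(f((k + 0.5) * 2 / n) for k in range(n)) * 2 / n
print(abs(integral - F(2)) < 1e-6)  # True
```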

  • 1
    @RuiBarradas - a minor point, but the integral of the CDF is bounded if the variate has an upper bound, e.g., a Uniform$(0,1)$ variate. – jbowman Feb 12 '22 at 19:57
  • @jbowman Yes, thanks. And that's not a minor point. – Rui Barradas Feb 12 '22 at 22:09
  • @RuiBarradas - I was thinking it was minor because your statement was based on an oversight, not on a misunderstanding. – jbowman Feb 13 '22 at 00:04
4

"Probability distribution" is an umbrella term for a particular type of object that can be represented uniquely in multiple ways. One way to represent a probability distribution is through its probability measure; others are through its characteristic function, its cumulative distribution function, or its probability density function (including specification of a dominating measure for the density). Each of the latter is a specific mathematical object that describes the probability distribution in a different way. The term "probability distribution" does not refer to a specific mathematical object; it refers holistically to the "thing" that each of these objects describes.

Ben
1

A probability measure can be seen as a generalization of the probability density function (PDF). "Probability distribution function" is not really a standard term, but when I have seen it used, it has always referred to the cumulative distribution function (CDF). I wouldn't use it as a term, since it's confusing.

In Excel, the NORM.S.DIST() function has an argument that switches it between the cumulative and density functions. To make things more confusing, Microsoft labels the density function a "probability mass function", which is obviously wrong, because a PMF is for discrete distributions.
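For readers without Excel, here is a Python sketch of the same switch - a hypothetical helper mimicking NORM.S.DIST for the standard normal:

```python
import math

def norm_s_dist(z, cumulative):
    """Analog of Excel's NORM.S.DIST(z, cumulative) for the standard normal:
    the CDF when cumulative=True, the density when cumulative=False."""
    if cumulative:
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

print(round(norm_s_dist(0, True), 4))   # 0.5     -- CDF at the mean
print(round(norm_s_dist(0, False), 4))  # 0.3989  -- density at the mean
```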

It's fine to use the term "probability distribution", since it refers to everything we know about the distribution, including the PDF, the CDF, the moments, etc.

Aksakal