54

For a long time I did not understand why the "sum" of two random variables is their convolution, whereas the density function of a mixture of $f(x)$ and $g(x)$ is $p\,f(x)+(1-p)g(x)$; an arithmetic sum and not their convolution. The exact phrase "the sum of two random variables" appears in Google 146,000 times, and it is elliptical, as follows. If one considers an RV to yield a single value, then that single value can be added to another RV's single value, which has nothing to do with convolution, at least not directly; it is just a sum of two numbers. An RV outcome in statistics, however, is a collection of values, and thus a more exact phrase would be something like "the set of coordinated sums of pairs of associated individual values from two RVs is their discrete convolution"... and it can be approximated by the convolution of the density functions corresponding to those RVs. In even simpler language: two RVs of $n$ samples are in effect two $n$-dimensional vectors that add as their vector sum.

Please show the details of how the sum of two random variables is both a convolution and a sum.

Carl
  • 11,532
  • 7
  • 45
  • 102
  • Here is a visualization on youtube: https://www.youtube.com/watch?v=Ma0YONjMZLI&feature=youtu.be – kjetil b halvorsen Mar 06 '18 at 11:01
  • 1
    To some extent this question may duplicate the existing [Why does convolution work?](https://stats.stackexchange.com/questions/124085/why-does-convolution-work/) – Silverfish Mar 06 '18 at 15:43
  • @Silverfish For the sentence, "Gold is rare" the question, "What is gold?" would have little to do with "What do you mean, 'rare'?" – Carl Mar 06 '18 at 18:17
  • This may be easier to intuit if you look at how the sum of 100, or 1000, RVs works out. Something is converging to a certain something – Carl Witthoft Mar 06 '18 at 19:32
  • 6
    I don't really believe that it is 'sum' in an *abstract algebraic* sense. When we make a 'sum of variables' then we refer to the typical arithmetic operation as we know it when adding natural numbers or real numbers. That means that we make a new variable by 'adding' the other variables together. The notion of 'a sum of variables' also exists outside the realm of statistics and is independent from the expressions about convolutions and probabilities. So, indeed 'the sum of variables *is* a convolution', is wrong. But nobody is implying this. We should change the word 'is' in that statement. – Sextus Empiricus Mar 06 '18 at 20:37
  • 5
    This is like arguing that $f(x) \cdot g(x)$ should not be called 'the product of two functions f and g' (or only interpreted as some abstract algebraic notion of 'product') because it is a convolution in terms of the Fourier transforms of those functions. – Sextus Empiricus Mar 06 '18 at 20:51
  • $a\neq$ the sum of variables, rather the algebraic sum of random variables, which is defined as an operation via their convolution. – Carl Mar 06 '18 at 21:04
  • 17
    The "notice" is misleading. A sum of random variables $X$ and $Y$ is meant in precisely the same sense "sum" is understood by schoolchildren: for each $\omega$, the value $(X+Y)(\omega)$ is found by adding the numbers $X(\omega)$ and $Y(\omega).$ There's nothing abstract about it. These RVs have *distributions.* There exist many ways to represent the distributions. The distribution function of $X+Y$ is the *convolution* of the DFs of $X$ and $Y$; the characteristic function of $X+Y$ is the *product* of their CFs; the cumulant generating function of $X+Y$ is the *sum* of their CGFs; and so on. – whuber Mar 06 '18 at 22:01
  • @whuber Explain this, please: $\text{ListConvolve}\left[\left\{x_1,x_2\right\},\left\{y_1,y_2,y_3\right\}\right]==\left\{x_2 y_1+x_1 y_2,x_2 y_2+x_1 y_3\right\}$, whereas $\left\{x_1,x_2\right\}+\left\{y_1,y_2,y_3\right\}$ is undefined as a $2\times1$ and $3\times1$ matrix addition is undefined. – Carl Mar 07 '18 at 00:08
  • 3
    I don't see either random variables or distributions in your calculation. – whuber Mar 07 '18 at 00:21
  • @whuber See comment for Ilmari Karonen directly below his post for probability convolution of mass function example. – Carl Mar 07 '18 at 01:38
  • @whuber Notice deleted. When you write $X+Y$, do you mean $\Sigma_{i=1}^n\left(X_i+Y_i\right)$? – Carl Mar 07 '18 at 06:04
  • 2
    @Carl: No he doesn't. That's not what the sum of two random variables is. I've tried to explain in my answer. – Scortchi - Reinstate Monica Mar 07 '18 at 14:01
  • 8
    In the language of my post at https://stats.stackexchange.com/a/54894/919, a pair of random variables $(X,Y)$ consists of a box of tickets on each of which are written two numbers, one designated $X$ and the other $Y.$ The sum of these random variables is obtained by adding the two numbers found on each ticket. The computation literally is a task you could assign to a third-grade classroom. (I make this point to emphasize both the fundamental simplicity of the operation as well as showing how strongly it is connected with what everybody understands a "sum" to mean.) – whuber Mar 07 '18 at 14:35
  • 2
    @whuber Your single paragraph helped way more than long answers below—amazing! I’d love for it to be literally just copy-pasted into an answer, but upvoting the comment for now so future readers might spot it. – Yatharth Agarwal Feb 14 '19 at 06:58
  • @Yatharth OK, I did the copy-paste and then expanded a bit on the answer. Thank you for the encouragement. – whuber Feb 14 '19 at 16:12
  • @MartijnWeterings In proper mathematical language in this context "sum" means **$n$-space vector addition**. Unfortunately, statistical notation and language often reinvents concepts that are better developed elsewhere. In specific, RV's would be a subset of a subset of vector types elsewhere and the rules for manipulation of such are much better documented elsewhere. – Carl Feb 15 '19 at 18:10
  • 3
    @Carl 'sum of two variables' means sum in the same way as you would add the numbers of two dice rolls together. It seems to be you who is inventing 'n-space vector'. I have never heard of it before in this context. You are just mixing up too many concepts as I explained before in the comments under answer. – Sextus Empiricus Feb 15 '19 at 18:29
  • @MartijnWeterings The essential concept is that the outcomes are paired, and the pairs are added. That is how $n$-space vectors add because each of the $n$ dimensions is orthogonal to each other, which means that the only addition that can occur is within each dimension. There are many very useful rules for manipulating vectors, e.g. vector length is the root mean square, vectors have dot and cross products, etc. Learning is better than complaining about not knowing, and not knowing is not a badge of honor worth displaying. – Carl Feb 16 '19 at 08:44

10 Answers

46

Notation, upper and lower case

https://en.wikipedia.org/wiki/Notation_in_probability_and_statistics

  • Random variables are usually written in upper case roman letters: $X$, $Y$, etc.
  • Particular realizations of a random variable are written in corresponding lower case letters. For example $x_1$, $x_2$, …, $x_n$ could be a sample corresponding to the random variable $X$ and a cumulative probability is formally written $P ( X > x )$ to differentiate random variable from realization.

$Z=X+Y$ means $z_i=x_i+y_i \qquad \forall \, i$, i.e. for every paired realization $(x_i, y_i)$
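A minimal sketch of what this means for realizations (Python with NumPy assumed; the dice values are purely illustrative): the sum is formed pairwise, one ordinary addition per realization.

```python
import numpy as np

rng = np.random.default_rng(0)

# n paired realizations of X and Y (here: two fair six-sided dice)
n = 10
x = rng.integers(1, 7, size=n)   # realizations x_1, ..., x_n of X
y = rng.integers(1, 7, size=n)   # realizations y_1, ..., y_n of Y

z = x + y                        # z_i = x_i + y_i: an ordinary sum, one per pair
print(x, y, z, sep="\n")
```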


Mixture of variables $ \rightarrow $ sum of pdf's

https://en.wikipedia.org/wiki/Mixture_distribution

You use a sum of the probability density functions $f_{X_1}$ and $f_{X_2}$ when the probability (of, say, $Z$) is defined by a single sum of different probabilities.

For example when $Z$ is a fraction $s$ of the time defined by $X_1$ and a fraction $1-s$ of the time defined by $X_2$, then you get $$\mathbb{P}(Z=z) = s \mathbb{P}(X_1=z) + (1-s) \mathbb{P}(X_2=z)$$ and $$f_Z(z) = s f_{X_1}(z) + (1-s) f_{X_2}(z)$$

An example is a choice between rolling either a 6-sided die or a 12-sided die. Say you pick the one die or the other 50 percent of the time each. Then $$f_{\text{mixed roll}}(z) = 0.5 \, f_{\text{6-sided}}(z) + 0.5 \, f_{\text{12-sided}}(z)$$
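A small numerical sketch of this mixture (Python with NumPy assumed; the 50-50 choice between a 6-sided and a 12-sided die is the example above). The pmf of the mixture is the pointwise arithmetic sum of the two pmfs, no convolution involved:

```python
import numpy as np

faces = np.arange(1, 13)               # possible outcomes 1..12
f6 = np.where(faces <= 6, 1/6, 0.0)    # pmf of a fair 6-sided die (zero above 6)
f12 = np.full(12, 1/12)                # pmf of a fair 12-sided die

s = 0.5                                # probability of picking the 6-sided die
f_mixed = s * f6 + (1 - s) * f12       # arithmetic (pointwise) sum of the pmfs

for z, p in zip(faces, f_mixed):
    print(f"P(Z = {z:2d}) = {p:.4f}")
print("total:", f_mixed.sum())         # the probabilities sum to 1
```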


Sum of variables $ \rightarrow $ convolution of pdf's

https://en.wikipedia.org/wiki/Convolution_of_probability_distributions

You use a convolution of the probability density functions $f_{X_1}$ and $f_{X_2}$ when the probability (of, say, $Z$) is defined by multiple sums of different (independent) probabilities.

For example, when $Z = X_1 + X_2$ (i.e. a sum!), multiple different pairs $x_1,x_2$ sum up to $z$, each with probability $f_{X_1}(x_1)f_{X_2}(x_2)$. Then you get the convolution $$\mathbb{P}(Z=z) = \sum_{\text{all pairs }x_1+x_2=z} \mathbb{P}(X_1=x_1) \cdot \mathbb{P}(X_2=x_2)$$

and $$f_Z(z) = \sum_{x_1 \in \text{ domain of }X_1} f_{X_1}(x_1) f_{X_2}(z-x_1)$$

or for continuous variables

$$f_Z(z) = \int_{x_1 \in \text{ domain of }X_1} f_{X_1}(x_1) f_{X_2}(z-x_1) d x_1$$

An example is the sum of two dice rolls, with $f_{X_2}(x) = f_{X_1}(x) = 1/6$ for $x \in \lbrace 1,2,3,4,5,6 \rbrace$, and $$f_Z(z) = \sum_{x \in \lbrace 1,2,3,4,5,6 \rbrace \\ \text{ and } z-x \in \lbrace 1,2,3,4,5,6 \rbrace} f_{X_1}(x) f_{X_2}(z-x)$$

Note that I chose to integrate and sum over $x_1 \in \text{ domain of } X_1$, which I find more intuitive, but it is not necessary; you can integrate from $-\infty$ to $\infty$ if you define $f_{X_1}(x_1)=0$ outside the domain.
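For the two-dice example, the discrete convolution can be computed directly, for instance with NumPy's `np.convolve` (a sketch; the zero padding outside the support is exactly the convention described in the note above):

```python
import numpy as np

f = np.full(6, 1/6)        # pmf of one fair die on the support 1..6
f_Z = np.convolve(f, f)    # pmf of the sum Z = X_1 + X_2, on the support 2..12

for z, p in zip(range(2, 13), f_Z):
    print(f"P(Z = {z:2d}) = {p:.4f}  ({round(p * 36)}/36)")
```

Unlike the mixture above, every value of $Z$ here collects probability from all pairs $(x_1, x_2)$ that sum to it, which is why the dependence on the pmfs is a convolution rather than an arithmetic sum.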

Image example

example of 'sum of variables' resulting in 'convolution of pdfs'

Let $Z$ be $X+Y$. To know $\mathbb{P}(z-\frac{1}{2}dz<Z<z+\frac{1}{2}dz)$ you will have to integrate over the probabilities for all the realizations of $x,y$ that lead to $z-\frac{1}{2}dz<Z=X+Y<z+\frac{1}{2}dz$.

So that is the integral of $f(x)g(y)$ in the region $\pm \frac{1}{2}dz$ along the line $x+y=z$.

bilibraker
  • 86
  • 1
  • 7
Sextus Empiricus
  • 43,080
  • 1
  • 72
  • 161
  • Best explanation so far. Would you be so kind as to include the continuous definitions as well, please? – Carl Mar 06 '18 at 10:54
  • Well, OK, I think. This is only a sum, I think, in the sense of summing each variously incremental or individual datum of one variable over the entire range of the other. That is, there are a heck of a lot of sums. – Carl Mar 06 '18 at 11:29
  • No need, I know how to convolve most anything. I just had trouble understanding the jargonesque utilization of the word "sum". It is, at best, a sum of sums. My preference would be to say that random variables ***combine*** by convolution. – Carl Mar 06 '18 at 11:37
  • 7
    @Carl it is not jargonesque. The convolution can indeed be seen as a sum of many sums. But, this is not what *'the sum of variables'* refers to. It refers to such things as when we speak of a 'a sum of two dice rolls', which has a very normal meaning and interpretation in every day life (especially when we play a board game). Would you rather like to say that we take a combination of two dice rolls when we do use the algebraic sum of two dice rolls? – Sextus Empiricus Mar 06 '18 at 12:01
  • actually the convolution is a sum of many products but that is besides the point – Sextus Empiricus Mar 06 '18 at 12:08
  • In the discrete case, we are adding every single combination of those variables are we not? Or, am I missing something? – Carl Mar 06 '18 at 12:08
  • 3
    The probability of rolling 7 with **the (single) sum** of two dice is **the sum of (many)** probabilities for rolling 1-6, 2-5, 3-4, 4-3, 5-2, 6-1. The term sum occurs *two* times and in the first case, when it refers to a single summation expression, it is what the statement 'sum of two variables' refers to, as in 'sum of two dice rolls'. – Sextus Empiricus Mar 06 '18 at 12:12
  • In the continuous case, it is a stretch to call a convolution **integral** a sum. In the [Riemann Integral](http://mathworld.wolfram.com/RiemannIntegral.html) sense we perform an infinite summation of products. That is not a simple sum. Even in the discrete case, it would be better to say that random variables combine by exhaustive summation of products. To my way of thinking, "exhaustive summation of products" $\neq$ "sum." – Carl Mar 06 '18 at 12:24
  • 6
    Indeed, the integral replaces the sum of probabilities. But, that relates to the *second* use of the term sum, not the *first* use of the term sum. So we can still refer to the sum of two variables (which is the first use of the term). That is because the term 'sum' is not used to refer to the convolution operation or summation operation of the probabilities, but to the summation of the variables. – Sextus Empiricus Mar 06 '18 at 12:29
  • The only way to understand what that first use of the word "sum" means is from the meaning and structure inherited from the convolution performed. To me, that is jargonesque, i.e., sum := convolution, not ordinary arithmetic. – Carl Mar 06 '18 at 13:01
  • 9
    at least it is not jargonesque to state 'the probability density for a sum of dice rolls is defined by the convolution of the probability densities for the individual dice rolls'. The term 'a sum of dice rolls' has a very normal interpretation in every day life when there are no statisticians around with their jargon. It is in this sense (sum of dice rolls) that you need to interpret (sum of variables). This step is not jargonesque either. People use 'sums of variables' all the time. It is only the statistician who thinks about the probabilities for these sums and starts applying convolutions – Sextus Empiricus Mar 06 '18 at 13:21
  • I think it would be nice to point out in mixture of variables that $Z = UX_1 + (1-U)X_2$ with $U \sim \mathcal{B}(p)$. To see how is it that it differs from the actual sum. – Manuel Mar 06 '18 at 16:34
  • A univariate mixture distribution of weighted sum(s) of distributions, not of random variables themselves, would be used for Hidden Markov chains, or blood plasma drug concentrations. – Carl Mar 07 '18 at 03:58
  • 2
    "In the continuous case, it is a stretch to call a convolution integral a sum." And nobody is proposing to do that... –  Mar 08 '18 at 14:04
  • @Pakk Actually, there are at least three ways to understand convolution integrals: the mutually exclusive Fourier and Laplace transforms/inverse transforms, or the real space [limiting Riemann sum](https://en.wikipedia.org/wiki/Riemann_sum#Connection_with_integration) treatment, which latter I favor precisely because it is not in the complex field and is more easily understood. – Carl Feb 16 '19 at 18:39
  • @Carl if you like, you can consider the more general case in my final image which shows a joint distribution of $X$ and $Y$ and ask for the distribution of $Z=f(X,Y)$, i.e. a variable that is some function of the two. If $f$ is a continuous function then you can solve this with an integral ([product distributions](https://en.m.wikipedia.org/wiki/Product_distribution) $Z=XY$ and [ratio distributions](https://en.m.wikipedia.org/wiki/Ratio_distribution) $Z=X/Y$ are examples). If $f$ is a summation then this integral becomes a convolution. The term 'sum' refers to this form of the function $f$ – Sextus Empiricus Feb 17 '19 at 08:08
  • 3
    @Carl: I think you misunderstood my statement. You were saying that it is not good to call a convolution integral a sum, implying that somebody calls the convolution integral a sum. But nobody here is saying this. What was said is that a convolution integral is the pdf of the sum of certain variables. You were changing the statement to something false, and then complained that it is false. –  Feb 17 '19 at 12:29
  • @Pakk "a convolution integral is the pdf of the sum of certain variables" sorry, this does not seem to be true. Suppose we are convolving $\text{pdf}_1*\text{pdf}_2=\text{pdf}_3$ then at some independent axis point $x_k$ the functional value of $\text{pdf}_3(x_k)$ is an evaluated (convolution) integral at that point. That integral is only a sum in Riemann sense that all integrals are infinite sums. – Carl Feb 18 '19 at 18:45
  • @Pakk Do you mean something like "a convolution written as a continuous function is a model for the discrete paired sums of discrete random variables", or "a discrete convolution is the discrete paired sums of discrete random variable" or something else? – Carl Feb 18 '19 at 19:34
  • 1
    @Carl: Again, I never said that the integral is a sum. Get that out of your head. Nobody said that. Nobody makes the claim that the integral is a sum, so you don't have to argue that it doesn't. –  Feb 18 '19 at 20:58
  • 1
    @Carl: What I said was: "a convolution integral is the pdf of the sum of certain variables". If $f_x$ is the pdf of $X$, and $f_y$ is the pdf of $Y$, then $f_x * f_y$ is the pdf of $(X+Y)$. This has nothing to do with the notion that an integral is in some sense a limit of sums. –  Feb 18 '19 at 21:00
  • @Pakk Yes, and it is inaccurate. Optimistically speaking, a convolution integral is a model that is often assumed to be a model for the sum of random variables in the classical sense that random variables are discrete real valued, i.e., listable functions. The problem you have is that pdf's are not pmf's and you are confusing the two, you are saying that a pdf "is" a pmf, and it is not. – Carl Feb 18 '19 at 21:08
  • @Carl: You are reading things that I never wrote. I never even mentioned pmf's. You are again attacking things that nobody said. Don't do that, it is useless. (This is not meant so sound annoyed, although I realize it might appear this way. I see that you are genuinely making attempts to get a better feeling for it, and I would really like to help with this, but it looks to me like you are being an obstacle to yourself, because you don't read accurately.) –  Feb 18 '19 at 21:22
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/89907/discussion-between-carl-and-pakk). – Carl Feb 18 '19 at 21:22
  • 1
    @Carl can you stop the nonsense you are writing here. Look back to the confusion in your post on March 6 *"the jargonesque utilization of the word "sum". It is, at best, a sum of sums"*. It is beside the point to discuss the difference between integration or summation. This consideration of integrals as a Riemann sums is *not* what the term 'sum' refers to in the phrase 'sum of two random variables'. Stop fighting something that is not. The statement *"convolution is the sum of random variables"*, which you find misleading, is your own statement, nobody but you is suggesting this statement. – Sextus Empiricus Feb 19 '19 at 01:01
  • I do not agree with you. The talk page on Wikipedia for [random variables](https://en.wikipedia.org/wiki/Talk:Random_variable#Definition_is_not_correct) has a number of comments to the effect that pdf's are not random in any sense of the word random. The confusion I faced in making this post was precisely that. I support @whuber's definition of random variables, they are discrete, not continuous. If you think otherwise, provide a physical example. – Carl Feb 19 '19 at 06:23
  • 4
    Carl, you keep on going but it is irrelevant. The term 'sum' in 'the sum of random variables' refers to something else than the integral operation or a convolution. The sum of variables is *not* equal to a convolution, in the literal sense. – Sextus Empiricus Feb 19 '19 at 08:07
  • 1
    @Carl: Whuber wrote an excellent answer. But that answer does not define random variables as (exclusively) discrete. And besides that: it is irrelevant! Whuber showed that 'sum' means exactly what it always means! I can't understand that you ignore the part where he answered your question, and remembered the part that he did not even write! –  Feb 21 '19 at 07:48
39

Convolution calculations associated with distributions of random variables are all mathematical manifestations of the Law of Total Probability.


In the language of my post at What is meant by a “random variable”?,

A pair of random variables $(X,Y)$ consists of a box of tickets on each of which are written two numbers, one designated $X$ and the other $Y$. The sum of these random variables is obtained by adding the two numbers found on each ticket.

I posted a picture of such a box and its tickets at Clarifying the concept of sum of random variables.


This computation literally is a task you could assign to a third-grade classroom. (I make this point to emphasize both the fundamental simplicity of the operation as well as showing how strongly it is connected with what everybody understands a "sum" to mean.)

How the sum of random variables is expressed mathematically depends on how you represent the contents of the box:

The first two of these are special insofar as the box might not have a pmf, pdf, or mgf, but it always has a cdf, cf, and cgf.


To see why convolution is the appropriate method to compute the pmf or pdf of a sum of random variables, consider the case where all three variables $X,$ $Y,$ and $X+Y$ have a pmf: by definition, the pmf for $X+Y$ at any number $z$ gives the proportion of tickets in the box where the sum $X+Y$ equals $z,$ written $\Pr(X+Y=z).$

The pmf of the sum is found by breaking down the set of tickets according to the value of $X$ written on them, following the Law of Total Probability, which asserts proportions (of disjoint subsets) add. More technically,

The proportion of tickets found within a collection of disjoint subsets of the box is the sum of the proportions of the individual subsets.

It is applied thus:

The proportion of tickets where $X+Y=z$, written $\Pr(X+Y=z),$ must equal the sum over all possible values $x$ of the proportion of tickets where $X=x$ and $X+Y=z,$ written $\Pr(X=x, X+Y=z).$

Because $X=x$ and $X+Y=z$ imply $Y=z-x,$ this expression can be rewritten directly in terms of the original variables $X$ and $Y$ as

$$\Pr(X+Y=z) = \sum_x \Pr(X=x, Y=z-x).$$

That's the convolution.
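A small sketch of this box-of-tickets computation (Python assumed; the tickets and the numbers written on them are made up purely for illustration). It finds the pmf of $X+Y$ once by literally adding the two numbers on each ticket, and once via the Law of Total Probability sum above; the two routes agree, and no independence is required:

```python
from collections import Counter
from fractions import Fraction

# A box of tickets; each ticket carries a pair (X, Y).  Values are illustrative only.
tickets = [(1, 2), (1, 2), (2, 2), (2, 5), (3, 1), (3, 1), (3, 4), (4, 0)]
n = len(tickets)

# Direct route: add the two numbers on each ticket, then count proportions.
pmf_direct = {z: Fraction(c, n) for z, c in Counter(x + y for x, y in tickets).items()}

# Law of Total Probability: Pr(X + Y = z) = sum over x of Pr(X = x, Y = z - x).
joint = Counter(tickets)                      # joint pmf as ticket counts
xs = {x for x, _ in tickets}
pmf_total = {z: sum(Fraction(joint[(x, z - x)], n) for x in xs) for z in pmf_direct}

print(pmf_direct == pmf_total)                # True: the sum over x reproduces the pmf
print(sorted(pmf_direct.items()))
```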


Edit

Please note that although convolutions are associated with sums of random variables, the convolutions are not convolutions of the random variables themselves!

Indeed, in most cases it is not possible to convolve two random variables. For this to work, their domains have to have additional mathematical structure. This structure is a continuous topological group.

Without getting into details, suffice it to say that convolution of any two functions $X, Y:G \to H$ must abstractly look something like

$$(X\star Y)(g) = \sum_{h,k\in G\mid h+k=g} X(h)Y(k).$$

(The sum could be an integral and, if this is going to produce new random variables from existing ones, $X\star Y$ must be measurable whenever $X$ and $Y$ are; that's where some consideration of topology or measurability must come in.)

This formula invokes two operations. One is the multiplication on $H:$ it must make sense to multiply values $X(h)\in H$ and $Y(k)\in H.$ The other is the addition on $G:$ it must make sense to add elements of $G.$

In most probability applications, $H$ is a set of numbers (real or complex) and multiplication is the usual one. But $G,$ the sample space, often has no mathematical structure at all. That's why the convolution of random variables is usually not even defined. The objects involved in convolutions in this thread are mathematical representations of the distributions of random variables. They are used to compute the distribution of a sum of random variables, given the joint distribution of those random variables.


References

Stuart and Ord, Kendall's Advanced Theory of Statistics, Volume 1. Fifth Edition, 1987, Chapters 1, 3, and 4 (Frequency Distributions, Moments and Cumulants, and Characteristic Functions).

whuber
  • 281,159
  • 54
  • 637
  • 1,101
  • Associativity with scalar multiplication from [algebraic properties](https://en.wikipedia.org/wiki/Convolution#Algebraic_properties) relates that $$a ( f ∗ g ) = ( a f ) ∗ g$$ for any real (or complex) number $a$. Whereas one nice property is that the convolution of two density functions is a density function, one is not restricted to convolving density functions, and convolution is not in general a probability treatment, sure it can be, but it can be a time series treatment, e.g., a treatment of water runoff in lakes after a rainfall, a drug concentration model following dosing, etc. – Carl Feb 14 '19 at 21:08
  • @Carl How does that comment comport with your original question, which asks about *sums of random variables*? At best it is tangential. – whuber Feb 14 '19 at 21:45
  • I am asking you to not overgeneralize. To begin a sentence with "convolution is" without saying "convolution of RV's is" is elliptic. My whole problem here was with the elliptic notation. Vector addition of two $n$-space vectors is convolution, whether or not those vectors are normalized. If they are normalized, they need not be probabilities, That is the whole truth, not just part of it. – Carl Feb 14 '19 at 21:59
  • 1
    Thank you: I will clarify the first sentence to emphasize that I am answering your question. – whuber Feb 14 '19 at 22:03
  • New addition is true for convolution of RV's, which is technically what I asked. And perhaps I am equivocating but convolution is not always of RV's but can always be reduced to some scale factors of density functions times those density functions, where the scalars are multiplicative and where the density functions are sometimes RV's, in which case the scale factors are the multiplicative identity, i.e., 1. – Carl Apr 06 '19 at 19:37
  • Since you think this post is incorrect, then please do not accept it. Since I see nothing in it that is incorrect, then possibly what you are trying to say is that I might not have interpreted your question as you intended. In that case, please clarify your question. – whuber Apr 22 '19 at 22:23
  • Perhaps I misspoke, a bit. Nothing that you said is wrong. There is something else bothering me that I will try to get across elsewhere. – Carl Apr 22 '19 at 22:38
36

Your confusion seems to arise from conflating random variables with their distributions.

To "unlearn" this confusion, it might help to take a couple of steps back, empty your mind for a moment, forget about any fancy formalisms like probability spaces and sigma-algebras (if it helps, pretend you're back in elementary school and have never heard of any of those things!) and just think about what a random variable fundamentally represents: a number whose value we're not sure about.

For example, let's say I have a six-sided die in my hand. (I really do. In fact, I have a whole bag of them.) I haven't rolled it yet, but I'm about to, and I decide to call the number that I haven't rolled yet on that die by the name "$X$".

What can I say about this $X$, without actually rolling the die and determining its value? Well, I can tell that its value won't be $7$, or $-1$, or $\frac12$. In fact, I can tell for sure that it's going to be a whole number between $1$ and $6$, inclusive, because those are the only numbers marked on the die. And because I bought this bag of dice from a reputable manufacturer, I can be pretty sure that when I do roll the die and determine what number $X$ actually is, it's equally likely to be any of those six possible values, or as close to that as I can determine.

In other words, my $X$ is an integer-valued random variable uniformly distributed over the set $\{1,2,3,4,5,6\}$.


OK, but surely all that is obvious, so why do I keep belaboring such trivial things that you surely know already? It's because I want to make another point, which is also trivial yet, at the same time, crucially important: I can do math with this $X$, even if I don't know its value yet!

For example, I can decide to add one to the number $X$ that I'll roll on the die, and call that number by the name "$Q$". I won't know what number this $Q$ will be, since I don't know what $X$ will be until I've rolled the die, but I can still say that $Q$ will be one greater than $X$, or in mathematical terms, $Q = X+1$.

And this $Q$ will also be a random variable, because I don't know its value yet; I just know it will be one greater than $X$. And because I know what values $X$ can take, and how likely it is to take each of those values, I can also determine those things for $Q$. And so can you, easily enough. You won't really need any fancy formalisms or computations to figure out that $Q$ will be a whole number between $2$ and $7$, and that it's equally likely (assuming that my die is as fair and well balanced as I think it is) to take any of those values.

But there's more! I could just as well decide to, say, multiply the number $X$ that I'll roll on the die by three, and call the result $R = 3X$. And that's another random variable, and I'm sure you can figure out its distribution, too, without having to resort to any integrals or convolutions or abstract algebra.

And if I really wanted, I could even decide to take the still-to-be-determined number $X$ and to ~~fold, spindle and mutilate it~~ divide it by two, subtract one from it and square the result. And the resulting number $S = (\frac12 X - 1)^2$ is yet another random variable; this time, it will be neither integer-valued nor uniformly distributed, but you can still figure out its distribution easily enough using just elementary logic and arithmetic.


OK, so I can define new random variables by plugging my unknown die roll $X$ into various equations. So what? Well, remember when I said that I had a whole bag of dice? Let me grab another one, and call the number that I'm going to roll on that die by the name "$Y$".

Those two dice I grabbed from the bag are pretty much identical — if you swapped them when I wasn't looking, I wouldn't be able to tell — so I can pretty safely assume that this $Y$ will also have the same distribution as $X$. But what I really want to do is roll both dice and count the total number of pips on each of them. And that total number of pips, which is also a random variable since I don't know it yet, I will call "$T$".

How big will this number $T$ be? Well, if $X$ is the number of pips I will roll on the first die, and $Y$ is the number of pips I will roll on the second die, then $T$ will clearly be their sum, i.e. $T = X+Y$. And I can tell that, since $X$ and $Y$ are both between one and six, $T$ must be at least two and at most twelve. And since $X$ and $Y$ are both whole numbers, $T$ clearly must be a whole number as well.


But how likely is $T$ to take each of its possible values between two and twelve? It's definitely not equally likely to take each of them — a bit of experimentation will reveal that it's a lot harder to roll a twelve on a pair of dice than it is to roll, say, a seven.

To figure that out, let me denote the probability that I'll roll the number $a$ on the first die (the one whose result I decided to call $X$) by the expression $\Pr[X = a]$. Similarly, I'll denote the probability that I'll roll the number $b$ on the second die by $\Pr[Y = b]$. Of course, if my dice are perfectly fair and balanced, then $\Pr[X = a] = \Pr[Y = b] = \frac16$ for any $a$ and $b$ between one and six, but we might as well consider the more general case where the dice could actually be biased, and more likely to roll some numbers than others.

Now, since the two die rolls will be independent (I'm certainly not planning on cheating and adjusting one of them based on the other!), the probability that I'll roll $a$ on the first die and $b$ on the second will simply be the product of those probabilities: $$\Pr[X = a \text{ and } Y = b] = \Pr[X = a] \Pr[Y = b].$$

(Note that the formula above only holds for independent pairs of random variables; it certainly wouldn't hold if we replaced $Y$ above with, say, $Q$!)

Now, there are several possible values of $X$ and $Y$ that could yield the same total $T$; for example, $T = 4$ could arise just as well from $X = 1$ and $Y = 3$ as from $X = 2$ and $Y = 2$, or even from $X = 3$ and $Y = 1$. But if I had already rolled the first die, and knew the value of $X$, then I could say exactly what value I'd have to roll on the second die to reach any given total number of pips.

Specifically, let's say we're interested in the probability that $T = c$, for some number $c$. Now, if I know after rolling the first die that $X = a$, then I could only get the total $T = c$ by rolling $Y = c - a$ on the second die. And of course, we already know, without rolling any dice at all, that the a priori probability of rolling $a$ on the first die and $c - a$ on the second die is $$\Pr[X = a \text{ and } Y = c-a] = \Pr[X = a] \Pr[Y = c-a].$$

But of course, there are several possible ways for me to reach the same total $c$, depending on what I end up rolling on the first die. To get the total probability $\Pr[T = c]$ of rolling $c$ pips on the two dice, I need to add up the probabilities of all the different ways I could roll that total. For example, the total probability that I'll roll a total of 4 pips on the two dice will be: $$\Pr[T = 4] = \Pr[X = 1]\Pr[Y = 3] + \Pr[X = 2]\Pr[Y = 2] + \Pr[X = 3]\Pr[Y = 1] + \Pr[X = 4]\Pr[Y = 0] + \dots$$

Note that I went a bit too far with that sum above: certainly $Y$ cannot possibly be $0$! But mathematically that's no problem; we just need to define the probability of impossible events like $Y = 0$ (or $Y = 7$ or $Y = -1$ or $Y = \frac12$) as zero. And that way, we get a generic formula for the distribution of the sum of two die rolls (or, more generally, any two independent integer-valued random variables):

$$T = X + Y \implies \Pr[T = c] = \sum_{a \in \mathbb Z} \Pr[X = a]\Pr[Y = c - a].$$


And I could perfectly well stop my exposition here, without ever mentioning the word "convolution"! But of course, if you happen to know what a discrete convolution looks like, you may recognize one in the formula above. And that's one fairly advanced way of stating the elementary result derived above: the probability mass function of the sum of two integer-valued random variables is the discrete convolution of the probability mass functions of the summands.

And of course, by replacing the sum with an integral and probability mass with probability density, we get an analogous result for continuously distributed random variables, too. And by sufficiently stretching the definition of a convolution, we can even make it apply to all random variables, regardless of their distribution — although at that point the formula becomes almost a tautology, since we'll have pretty much just defined the convolution of two arbitrary probability distributions to be the distribution of the sum of two independent random variables with those distributions.

But even so, all this stuff with convolutions and distributions and PMFs and PDFs is really just a set of tools for calculating things about random variables. The fundamental objects that we're calculating things about are the random variables themselves, which really are just numbers whose values we're not sure about.

And besides, that convolution trick only works for sums of random variables, anyway. If you wanted to know, say, the distribution of $U = XY$ or $V = X^Y$, you'd have to figure it out using elementary methods, and the result would not be a convolution.


Addendum: If you'd like a generic formula for computing the distribution of the sum / product / exponential / whatever combination of two random variables, here's one way to write one: $$A = B \odot C \implies \Pr[A = a] = \sum_{b,c} \Pr[B = b \text{ and } C = c] [a = b \odot c],$$ where $\odot$ stands for an arbitrary binary operation and $[a = b \odot c]$ is an Iverson bracket, i.e. $$[a = b \odot c] = \begin{cases}1 & \text{if } a = b \odot c, \text{ and} \\ 0 & \text{otherwise}. \end{cases}$$

(Generalizing this formula for non-discrete random variables is left as an exercise in mostly pointless formalism. The discrete case is quite sufficient to illustrate the essential idea, with the non-discrete case just adding a bunch of irrelevant complications.)

You can check yourself that this formula indeed works e.g. for addition and that, for the special case of adding two independent random variables, it is equivalent to the "convolution" formula given earlier.

Of course, in practice, this general formula is much less useful for computation, since it involves a sum over two unbounded variables instead of just one. But unlike the single-sum formula, it works for arbitrary functions of two random variables, even non-invertible ones, and it also explicitly shows the operation $\odot$ instead of disguising it as its inverse (like the "convolution" formula disguises addition as subtraction).
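Here is a minimal sketch of that generic formula (Python assumed; the joint pmf is supplied as a dictionary of pairs, and the function name `pmf_of_combination` is made up for illustration). Every pair $(b, c)$ simply contributes its probability to the value $b \odot c$, which is what the Iverson bracket expresses:

```python
from collections import defaultdict

def pmf_of_combination(joint_pmf, op):
    """Distribution of A = B (op) C, given the joint pmf Pr[B = b and C = c]."""
    pmf_A = defaultdict(float)
    for (b, c), p in joint_pmf.items():
        pmf_A[op(b, c)] += p          # only the terms with a = op(b, c) contribute
    return dict(pmf_A)

# Two independent fair dice: Pr[B = b and C = c] = 1/36 for every pair.
joint = {(b, c): 1 / 36 for b in range(1, 7) for c in range(1, 7)}

print(pmf_of_combination(joint, lambda b, c: b + c))   # the triangular pmf on 2..12
print(pmf_of_combination(joint, lambda b, c: b * c))   # pmf of the product U = XY
```

For addition with an independent joint pmf this reproduces the convolution formula given earlier; for other operations, as noted above, the result is not a convolution.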


Ps. I just rolled the dice. It turns out that $X = 5$ and $Y = 6$, which implies that $Q = 6$, $R = 15$, $S = 2.25$, $T = 11$, $U = 30$ and $V = 15625$. Now you know. ;-)

Ilmari Karonen
  • 1,589
  • 11
  • 13
  • 5
    This should be the accepted answer! Very intuitive and clear! – Vladislavs Dovgalecs Mar 07 '18 at 01:12
  • (+) 1 for your thoughtful simple contribution, quite the effort and I respect it. However, the summation of joint (independent product) probabilities such as $\Pr[T = 4] = \Pr[X = 1]\Pr[Y = 3] + \Pr[X = 2]\Pr[Y = 2] + \Pr[X = 3]\Pr[Y = 1]$ is not a simple sum, it is a "summation of exhaustive (of possible) products rule." I suppose you could call that operation $\oplus$, but $\otimes$ would be more commonly used for convolution, what it is not is a pairing of two numbers called $+$. Sure, we call it $+$, but I insist that is an abstraction. – Carl Mar 07 '18 at 01:30
  • 3
    @Carl: The point I'm trying to make is that the sum *of the random variables* is indeed a simple sum: $T = X + Y$. If we wish to calculate the *distribution* of $T$, then we'll need to do something more complicated, but that's a secondary issue. The random variable is not its distribution. (Indeed, a random variable is not even fully characterized by its distribution, since the (marginal) distribution alone doesn't encode information about its possible dependencies with other variables.) – Ilmari Karonen Mar 07 '18 at 02:10
  • 3
    @Carl: ... In any case, if you wanted to introduce a special symbol for "addition of random variables", then for consistency you should also have special symbols for "multiplication of random variables" and "division of random variables" and "exponentiation of random variables" and "logarithm of random variables" and so on. All of those operations are perfectly well defined *on random variables, viewed as numbers with an uncertain value*, but in all cases calculating the *distribution* of the result is far more involved than just doing the corresponding calculation for constants. – Ilmari Karonen Mar 07 '18 at 02:13
  • One problem is that the $+$ is particularly confusing. Sometimes is really means $+$, as in scaled addition of univariate mixture densities, or in expectation or variance sums, sometimes it means convolution or better "sum distribution." Not the same problem for "division" because we would be more likely to say "[ratio distribution](https://en.wikipedia.org/wiki/Ratio_distribution)," or quotient distribution, which is a good hint that correspond number pairs are randomly associated, as opposed to being actual division of one function by another at fixed dependent axis values. – Carl Mar 07 '18 at 03:11
  • 5
    @Carl: The confusion goes away when you stop confusing a random variable with its distribution. Taking the distribution of a random variable is not a linear operation in any meaningful sense, so the distribution of the sum of two random variables is (usually) not the sum of their distributions. But the same is true for *any* nonlinear operation. Surely you're not confused by the fact that $\sqrt{x + y} \ne \sqrt x + \sqrt y$, so why should you be confused by the fact that $\Pr[X + Y = c] \ne \Pr[X = c] + \Pr[Y = c]$? – Ilmari Karonen Mar 07 '18 at 03:24
  • There is a subtle difference between nomenclature being confusing, and my being confused. Consider the distribution of outcomes of one die divided by the paired sometimes different outcomes of another die tossed and read at the same time. No one, I think, is going to confuse that with ordinary division. – Carl Mar 07 '18 at 03:40
  • 4
    @Carl: Wait, what? I roll two dice, write down the results $X$ and $Y$, and then calculate $Z = X/Y$. How is that not ordinary division? (And yes, it's still ordinary division even if I do it *before* I roll the dice. In that case, the values of $X$ and $Y$ just aren't fixed yet, and therefore neither is the value of $Z$.) – Ilmari Karonen Mar 07 '18 at 03:58
  • No, the average outcome of many many divisions over time as a function of the first die result, i.e., the empirical or theoretical quotient distribution. – Carl Mar 07 '18 at 04:04
  • 1
    Well-written explanation. @Carl: It is hard to see how you still can think that the nomenclature is confusing, after reading this answer. (I believe you are, but it is just hard to understand why.) $X + Y$ is just simple addition... If $X=3$ and $Y=4$ then $X+Y=7$. The symbol "$+$" means exactly the same as what you learned when you were in primary school. What makes you think that it means something else? –  Mar 07 '18 at 15:20
  • @Pakk We usually have data lists. And if you want to express what we actually do as data operations, this would involve pairwise list operations, not operations on a single data pair. So, no, I have no idea what you mean because I cannot relate it to the real world the way it is expressed. – Carl Mar 07 '18 at 20:30
  • @Carl: Who are "we"? It looks like there is a lot of context that you are not sharing. I see from your profile that you are a nuclear scientist, so I guess that you have a very specific application of convolution in mind. But Ilmari's answer gives (in my view) a perfectly clear explanation of how a convolution of distributions corresponds to a sum of random variables. And the symbol "$+$" has its normal meaning here. Without knowing why you think it has a different meaning, it is impossible to explain this any better... –  Mar 07 '18 at 20:49
  • 1
    And the "pairwise list operations", do you mean the convolution written out as a sum of products of probabilities? In that case, you are calculating with distributions. Not with the random variables. Each convolution of distributions corresponds to a sum of certain random variables, but it might be that in your work you never explicitly see these random variables written out. –  Mar 07 '18 at 20:58
  • @Pakk "We" is just a placeholder. Let data exist as pairwise list. I am a nuclear physician (an MD). My *user:99274 convolutions* search has 16 results. The addendum above suggests a summation of "and", i.e., an sum of intersections. I do not understand how $\Sigma(and)\rightarrow +.\,\,\,\,$ That is, unless it is an operation on pairwise associated data lists. For example, if I want to see what $A+B$ looks like I would generate pairs of pseudo-random numbers, $A$ and $B$ then associate each pair using addition. – Carl Mar 07 '18 at 21:28
  • You don't understand how $\sum(and) \rightarrow +$, but do you understand $+ \rightarrow \sum(and)$? –  Mar 08 '18 at 01:30
  • @Pakk I finally do understand and it is a pairwise operation on lists of outcomes, when the operation is on data, and a convolution when performed on continuous density functions, see my [answer](https://stats.stackexchange.com/a/332543/99274). So, the two processes are only equivalent in the limit as the $n\rightarrow \infty$, and even then the mesh coverage may be [incomplete](https://stats.stackexchange.com/q/273185/99274). – Carl Mar 19 '18 at 22:53
10

Actually I don't think this is quite right, unless I'm misunderstanding you.

If $X$ and $Y$ are independent random variables, then the sum/convolution relationship you're referring to is as follows: $$ p(X+Y) = p(X)*p(Y) $$ That is, the probability density function (pdf) of the sum is equal to the convolution (denoted by the $*$ operator) of the individual pdf's of $X$ and $Y$.

To see why this is, consider that for a fixed value of $X=x$, the sum $S=X+Y$ follows the pdf of $Y$, shifted by an amount $x$. So if you consider all possible values of $X$, the distribution of $S$ is given by replacing each point in $p(X)$ by a copy of $p(Y)$ centered on that point (or vice versa), and then summing over all these copies, which is exactly what a convolution is.

Formally, we can write this as: $$ p(S) = \int p_Y(S-x)p_X(x)dx $$ or, equivalently: $$ p(S) = \int p_X(S-y)p_Y(y)dy $$

Edit: To hopefully clear up some confusion, let me summarize some of the things I said in comments. The sum of two random variables $X$ and $Y$ does not refer to the sum of their distributions. It refers to the result of summing their realizations. To repeat the example I gave in the comments, suppose $X$ and $Y$ are the numbers thrown with a roll of two dice ($X$ being the number thrown with one die, and $Y$ the number thrown with the other). Then let's define $S=X+Y$ as the total number thrown with the two dice together. For example, for a given dice roll, we might throw a 3 and a 5, and so the sum would be 8. The question now is: what does the distribution of this sum look like, and how does it relate to the individual distributions of $X$ and $Y$? In this specific example, the number thrown with each die follows a (discrete) uniform distribution on [1, 6]. The sum follows a triangular distribution on [2, 12], with a peak at 7. As it turns out, this triangular distribution can be obtained by convolving the uniform distributions of $X$ and $Y$, and this property actually holds for all sums of (independent) random variables.
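A quick simulation sketch of this (Python with NumPy assumed): the realizations of the two dice are summed pairwise, and the empirical frequencies of the sums are compared with the convolution of the two uniform pmfs, which they approach for large samples:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

x = rng.integers(1, 7, size=n)      # realizations of X
y = rng.integers(1, 7, size=n)      # realizations of Y
s = x + y                           # the sum is taken realization by realization

empirical = np.bincount(s, minlength=13)[2:] / n                  # frequencies of 2..12
theoretical = np.convolve(np.full(6, 1/6), np.full(6, 1/6))       # convolution of the pmfs

for z, (e, t) in enumerate(zip(empirical, theoretical), start=2):
    print(f"z = {z:2d}: empirical {e:.4f}   convolution {t:.4f}")
```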

Ruben van Bergen
  • 6,511
  • 1
  • 20
  • 38
  • Summing many sums is more *combining* than a single sum worth notating with a '+' sign. My preference would be to say that random variables ***combine*** by convolution. – Carl Mar 06 '18 at 11:44
  • 7
    A convolution could be called a sum of many sums, sure. But what you have to understand is that the convolution applies *strictly* to the PDFs of the variables that are summed. The variables themselves are *not* convolved. They are just added one to the other, and there is no way to construe that addition as a convolution operation (so the basic premise of your question, as it is now stated, is incorrect). – Ruben van Bergen Mar 06 '18 at 12:37
  • 4
    You are misunderstanding that reference. It states: _The probability distribution of the sum of two or more independent random variables is **the convolution of their individual distributions**_. It does not say that a sum of two random variables is the same as convolving those variables. It says that the _distribution_ of the sum is the convolution of the _distribution_ of the individual variables. A random variable and its distribution are two different things. – Ruben van Bergen Mar 06 '18 at 13:03
  • Sure, you *can* convolve random variables. But the sum/convolution property that is widely known and discussed in that article (and in my answer above) does *not* deal with convolutions of random variables. It is specifically concerned with *sums* of random variables, and the properties of the distribution of that sum. – Ruben van Bergen Mar 06 '18 at 13:22
  • 2
    ("Sure, you can convolve random variables". Can you? My understanding was that because to get the distribution function of the sum of random variables you convolve the mass/density functions of each, many people talk (loosely) of convolving distributions, & some talk (wrongly) of convolving random variables. Sorry to digress, but I'm curious.) – Scortchi - Reinstate Monica Mar 06 '18 at 15:06
  • @Scortchi: We are in agreement. My point was merely that while it might be possible to convolve two random variables, it's irrelevant because that's not the situation considered in wiki article that Carl referred to in his question (which deals with a sum of random variables). But now that I actually think about it, convolving two variables isn't actually even possible, because convolution applies to functions, not variables (obviously). – Ruben van Bergen Mar 06 '18 at 15:19
  • Thanks! I had a foggy notion that because the formal definition of a random variable is a function turning outcomes into numbers, there might be some literal meaning to "convolving random variables" I was unaware of. – Scortchi - Reinstate Monica Mar 06 '18 at 15:42
  • @Scortchi: Well, technically, you could have function-valued random variables... – Ilmari Karonen Mar 10 '18 at 11:50
9

Start by considering the set of all possible distinct outcomes of a process or experiment. Let $X$ be a rule (as yet unspecified) for assigning a number to any given outcome $\omega$; let $Y$ be too. Then $S=X+Y$ states a new rule $S$ for assigning a number to any given outcome: add the number you get from following rule $X$ to the number you get from following rule $Y$.

We can stop there. Why shouldn't $S=X+Y$ be called a sum?

If we go on to define a probability space, the mass (or density) function of the random variable (for that's what our rules are now) $S=X + Y$ can be got by convolving the mass (or density) function of $X$ with that of $Y$ (when they're independent). Here "convolving" has its usual mathematical sense. But people often talk of convolving distributions, which is harmless; or sometimes even of convolving random variables, which apparently isn't—if it suggests reading "$X + Y$" as "$X \ \mathrm{convoluted\ with} \ Y$", & therefore that the "$+$" in the former represents a complex operation somehow analogous to, or extending the idea of, addition rather than addition plain & simple. I hope it's clear from the exposition above, stopping where I said we could, that $X+Y$ already makes perfect sense before probability is even brought into the picture.

In mathematical terms, random variables are functions whose co-domain is the set of real numbers & whose domain is the set of all outcomes. So the "$+$" in "$X + Y$" (or "$X(\omega) + Y(\omega)$", to show their arguments explicitly) bears exactly the same meaning as the "$+$" in "$\sin(\theta)+\cos(\theta)$". It's fine to think about how you'd sum vectors of realized values, if it aids intuition; but that oughtn't to engender confusion about the notation used for sums of random variables themselves.
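A tiny sketch of "random variables are functions on the set of outcomes" (Python assumed; the outcome space and the rules $X$ and $Y$ are illustrative): the sum $S$ is defined pointwise, outcome by outcome, before any probabilities enter the picture.

```python
from collections import Counter

# Outcomes of rolling two distinguishable dice: pairs (first die, second die).
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]

# Random variables are rules (functions) assigning a number to each outcome.
X = lambda omega: omega[0]               # number shown on the first die
Y = lambda omega: omega[1]               # number shown on the second die
S = lambda omega: X(omega) + Y(omega)    # S = X + Y: an ordinary pointwise sum

# Only when a probability is placed on the outcomes (here: all 36 equally likely)
# do distributions, and hence convolutions, come into play.
pmf_S = {s: c / len(outcomes) for s, c in sorted(Counter(map(S, outcomes)).items())}
print(pmf_S)
```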


[This answer merely tries to draw together succinctly points made by @MartijnWeterings, @IlmariKaronen, @RubenvanBergen, & @whuber in their answers & comments. I thought it might help to come from the direction of explaining what a random variable is rather than what a convolution is. Thank you all!]

Scortchi - Reinstate Monica
  • 27,560
  • 8
  • 81
  • 248
  • (+1) For effort. Answer too deep for me fathom. However, it did lead me to one. Please read that and let me know your thoughts. – Carl Mar 11 '18 at 21:52
  • It is the elliptic notation that confused me: $S_i=X_i+Y_i$ for all $i=1,2,3,...,n-1,n$, in other words, **vector** addition. If someone had said, **"vector addition"** rather than **"addition"**, I would not have been scratching my head wondering what was meant, but not said. – Carl Feb 14 '19 at 21:22
  • Well, if you put realizations of $X$ & $Y$ into vectors, & wanted to calculate the vector of realizations of $S$, then you'd use vector addition. But that seems rather tangential. After all, would you feel the need to explain '$\sin(\theta) + \cos(\phi)$' using vectors, or say that the '$+$' in that expression signifies vector addition? – Scortchi - Reinstate Monica Feb 16 '19 at 11:37
  • To do what? The context was discrete data, e.g., RV's, not continuous functions, e.g., PDF's or $\sin(\theta)$, and $\sin(\theta) + \cos(\phi)$ is an ordinary sum. – Carl Feb 16 '19 at 16:29
  • You seem to be fishing for something, here is a hint. Correlated variance ${\sigma_M}^2={\sigma_{F,N}}^2={\sigma_F}^2-{\sigma_N}^2-2{\rho}_{F,N}{\sigma}_F{\sigma}_N$ is from the [dot product of a vector with itself](https://en.wikipedia.org/wiki/Dot_product#Application_to_the_law_of_cosines). If you want to visualize the math and give meaning to things, e.g., that Pearson correlation is numerically equivalent to the cosine of the angle between two vectors, then vector calculus is the way to go. – Carl Feb 16 '19 at 16:33
  • Random variables *are* functions though, mathematically. (I called them "rules" in my answer so as not to be too technical.) Their sums are ordinary sums too. – Scortchi - Reinstate Monica Feb 16 '19 at 17:07
  • Not really. The summing that is done is ordinary, but what are summed are $n$ pairs of outcomes, which is not the same thing as a single sum or a lump sum, and leaving out the indexing, the $i$'s, was confusing because it leaves out essentials, e.g., that both RV's are of length $n$, and confused not only me but at least some of the 10,000+ people who have visited this question, which signals to me that the question itself has merit, and if it did not have merit, why did you bother to answer? – Carl Feb 16 '19 at 18:21
  • There's no indexing left out when writing $X$ for a random variable, any more than there is when writing $\sin(\theta)$ for the trigonometric function. Again, [random variables](https://en.wikipedia.org/wiki/Random_variable) *are* functions; that's not an analogy. What's (usually) left out is the argument - an outcome: you can write $X(\omega)$ to be explicit. So $S(\omega) = X(\omega) +Y(\omega)$ defines a function in terms of the sum of other functions. – Scortchi - Reinstate Monica Feb 16 '19 at 19:54
  • Most confusing. Discrete stochastic outcomes are often functions when randomly real valued. I am having a hard time understanding what the context is that you are assuming. The link you gave is of no help because the definition of a random variable could be discrete real, discrete integer, continuous, etc. and $\omega$, as well as $X$ do not seem to correspond to any specific data types. What are the properties of $X$ and $\omega$, i.e., the data types, ranges and admissible values that apply to your statements? – Carl Feb 17 '19 at 20:27
  • The outcomes need to be mutually exclusive & jointly exhaustive. The set of all outcomes $\Omega$ can be finite, or countably or uncountably infinite. $X$ has domain $\Omega$, & co-domain the real numbers. These stipulations are general enough to cover discrete, continuous, & mixed-type random variables. – Scortchi - Reinstate Monica Feb 19 '19 at 09:38
  • You lost me at $\infty$ because, countable or not, the entire universe does not have that many quantum states. Would you please give an example that can have that many outcomes? As for $S(\omega) = X(\omega) +Y(\omega)$, mathematically speaking, $\omega$ appears to be a parameter in a parametric equation sum, which is not the first thing that comes to mind when $X+Y$ is written, so the notation $X+Y$ seems misleading if given without preamble. – Carl Feb 22 '19 at 04:20
  • 1
    @Carl: (1) If a biologist models the no. eggs laid in a duck's nest as a Poisson r.v., they're not really countenancing the possibility of an infinity of eggs. If you've got a question about the role of infinite sets in Mathematics, ask it on Mathematics or Philosophy SE. (2) Though quite standard, the nomenclature can indeed mislead; hence my answer. – Scortchi - Reinstate Monica Feb 23 '19 at 18:33
  • (1) Indeed. The inadequate [mesh coverage](https://stats.stackexchange.com/q/273185/99274) of the right tail of an approximately Poisson countably infinite number of eggs may not have a limiting greatest $x$-axis interval of one, and the Poisson pmf may not perfectly model the countably infinite egg problem. (2) Agreed. Last, my only problem with your post is the quip "Why shouldn't $S=X+Y$ be called a sum?", the answer to which seems to be that parametric sums are not-ordinary, not-single point sums, but are numerous and paired. – Carl Feb 23 '19 at 20:57
4

In response to your "Notice", um, ... no.

Let $X$, $Y$, and $Z$ be random variables and let $Z = X+Y$. Then, once you choose $Z$ and $X$, you force $Y = Z - X$. You make these two choices, in this order, when you write $$ P(Z = z) = \int P(X = x) P(Y = z - x) \mathrm{d}x \text{.} $$ But that's a convolution.

Eric Towers
  • 351
  • 1
  • 5
3

The reason is the same as the reason that products of power functions are related to convolutions. The convolution always appears naturally if you combine two objects which have a range (e.g. the powers of two power functions or the range of the PDFs) and where the new range appears as the sum of the original ranges.

It is easiest to see for medium values. For $x + y$ to have medium value, either both have to have medium values, or if one has a high value, the other has to have a low value and vice versa. This matches with the form of the convolution, which has one index going from high values to low values while the other increases.

If you look at the formula for the convolution (for discrete values, just because I find it easier to see there)

$(f * g)(n) = \sum_k f(k)g(n-k)$

then you see that the arguments of the two functions ($k$ and $n-k$) always sum exactly to $n$. Thus what the convolution is actually doing is summing over all possible combinations that yield the same total value.

For polynomials we get

$(a_0+a_1x+a_2x^2+\ldots+a_nx^n)\cdot(b_0+b_1x+b_2x^2+\ldots+b_mx^m)=\sum_{i=0}^{m+n}\left(\sum_k a_k\,b_{i-k}\right)x^i$

which shows the same pattern: high exponents from the left factor combine with low exponents from the right factor, and vice versa, so that the exponents always add up to the same total $i$.

Once you see what the convolution is actually doing here, i.e. which terms are being combined and why it must therefore appear in so many places, the reason for convolving random variables should become quite obvious.
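The coefficient pattern above can be checked directly: the coefficients of a product of polynomials are the convolution of the two coefficient sequences. A minimal sketch (the particular polynomials are an arbitrary choice):

```python
import numpy as np

# p(x) = 1 + 2x + 3x^2 and q(x) = 4 + 5x, coefficients listed lowest degree first.
p = np.array([1, 2, 3])
q = np.array([4, 5])

# The coefficient of x^i in p(x)q(x) is sum_k a_k b_{i-k}: a convolution.
print(np.convolve(p, q))                         # [ 4 13 22 15]

# The same numbers from NumPy's polynomial multiplication.
print(np.polynomial.polynomial.polymul(p, q))    # [ 4. 13. 22. 15.]
```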

LiKao
  • 2,329
  • 1
  • 17
  • 25
3

Let us prove the assertion for the continuous case, and then explain and illustrate it using histograms built up from random numbers and the sums formed by adding ordered pairs of numbers, such that the discrete convolution and both random variables all have length $n$.

From Grinstead CM, Snell JL. Introduction to probability: American Mathematical Soc.; 2012. Ch. 7, Exercise 1:

Let $X$ and $Y$ be independent real-valued random variables with density functions $f_X (x)$ and $f_Y (y)$, respectively. Show that the density function of the sum $X + Y$ is the convolution of the functions $f_X (x)$ and $f_Y (y)$.

Let $Z$ be the joint random variable $(X, Y)$. Then the joint density function of $Z$ is $f_X(x)\,f_Y(y)$, since $X$ and $Y$ are independent. Now compute the probability that $X + Y \leq z$ by integrating the joint density function over the appropriate region in the plane. This gives the cumulative distribution function of $Z$.

$$F_Z(z)=\mathrm{P}(X+Y\leq z)= \iint_{\{(x,y):\,x+y\leq z\}} f_X(x)\,f_Y(y)\,dy\,dx$$ $$= \int_{-\infty}^\infty f_X(x)\left[\int_{-\infty}^{z-x} f_Y(y)\,dy \right] dx= \int_{-\infty}^\infty f_X(x)\,F_Y(z-x)\,dx.$$

Now differentiate this function with respect to $z$ to obtain the density function of $Z$.

$$f_Z(z) = \frac{dF_Z(z)}{dz} = \int_{-\infty}^\infty f_X(x)\,f_Y ( z-x)\,dx.$$
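As a quick numerical sanity check of this formula (the choice of two standard exponential random variables is mine, made because their sum has the known Gamma$(2,1)$ density $z\,e^{-z}$), here is a minimal sketch:

```python
from scipy import stats
from scipy.integrate import quad

# X, Y ~ Exp(1) independent, so X + Y is Gamma(shape=2, scale=1) with density z*exp(-z).
f_X = stats.expon().pdf
f_Y = stats.expon().pdf

def f_Z(z):
    # numerical convolution integral at a single point z
    value, _ = quad(lambda x: f_X(x) * f_Y(z - x), 0, z)
    return value

for z in (0.5, 1.0, 2.0, 5.0):
    print(z, f_Z(z), stats.gamma(a=2).pdf(z))   # the last two columns agree
```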

To appreciate what this means in practice, it is next illustrated with an example. The realization of a random element (statistics: outcome; computer science: instance) from a distribution can be viewed as applying the inverse cumulative distribution function of a probability density function to a random probability. (A random probability is, computationally, a single element drawn from a uniform distribution on the $[0,1]$ interval.) This gives us a single value on the $x$-axis. Next, we generate a second $x$-axis random element from the inverse CDF of another, possibly different, PDF evaluated at a second, different random probability. We then have two random elements.

When added, the two $x$-values so generated become a third element, and notice what has happened: the two elements have collapsed into a single element of magnitude $x_1+x_2$, i.e., information has been lost. This is the context in which the "addition" is taking place; it is the addition of $x$-values. When multiple repetitions of this type of addition take place, the resulting density of realizations (the outcome density) of the sums tends toward the PDF of the convolution of the individual densities. The overall information loss results in smoothing (or density dispersion) of the convolution (or of the sums) compared to the constituent PDFs (or summands). Another effect is a location shift of the convolution (or of the sums). Note that realizations (outcomes, instances) of multiple elements only sparsely populate (exemplify) a continuous sample space.
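A minimal sketch of a single such realization, using the inverse CDF (`ppf` in SciPy) and the gamma and normal distributions of the example below:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# One realization = the inverse CDF (ppf) applied to a uniform draw on [0, 1].
u1, u2 = rng.uniform(size=2)
x1 = stats.gamma(a=10/9, scale=2).ppf(u1)    # x-value from the first distribution
x2 = stats.norm(loc=4, scale=0.25).ppf(u2)   # x-value from the second distribution

# "Adding the random variables" adds these x-values; the pair (x1, x2) collapses
# to the single number x1 + x2, so information about the pair is lost.
print(x1, x2, x1 + x2)
```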

For example, 1000 random values were created using a gamma distribution with a shape of $10/9$ and a scale of $2$. These were added pairwise to 1000 random values from a normal distribution with a mean of $4$ and a standard deviation of $1/4$. Density-scaled histograms of each of the three groups of values were co-plotted (left panel below) and contrasted (right panel below) with the density functions used to generate the random data, as well as the convolution of those density functions.

[Figure: left panel, density-scaled histograms of the gamma sample, the normal sample, and their pairwise sums; right panel, the corresponding density functions and their convolution.]

As seen in the figure, the addition-of-summands explanation appears plausible, as the kernel-smoothed distributions of the data (red) in the left-hand panel are similar to the continuous density functions and their convolution in the right-hand panel.
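Here is a sketch that reproduces the simulation described above (the random seed and the integration grid are arbitrary choices); rather than re-drawing the figure, it compares the mean of the pairwise sums with the mean of the numerically convolved densities:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 1000

# 1000 gamma(shape = 10/9, scale = 2) values added pairwise to
# 1000 normal(mean = 4, sd = 1/4) values.
gam = stats.gamma(a=10/9, scale=2)
nor = stats.norm(loc=4, scale=0.25)
x = gam.rvs(n, random_state=rng)
y = nor.rvs(n, random_state=rng)
s = x + y                                   # the pairwise sums

# Convolution of the two density functions on a fine grid.
grid = np.linspace(-5, 30, 7001)
dz = grid[1] - grid[0]
conv = np.convolve(gam.pdf(grid), nor.pdf(grid)) * dz   # density of the sum
z = 2 * grid[0] + dz * np.arange(conv.size)             # grid for the convolution

# The sample mean of the sums and the mean of the convolved density agree
# (both are close to (10/9)*2 + 4).
print(s.mean(), np.sum(z * conv) * dz)
```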

Carl
  • 11,532
  • 7
  • 45
  • 102
  • @whuber Finally, I think I understand. The sum is of random events. Take a look at my explanation and tell me if it is clear now, please. – Carl Mar 11 '18 at 21:27
  • 3
    It helps to be careful with the language. Events are *sets*. Rarely are they even sets of numbers (that's why their elements are termed "outcomes"). Events don't add--the values of random variables do. The issue about "impressively complicated" is just a distraction. Indeed, if you want to get to the heart of the matter, make sure one of the summands in your example is a zero-mean random variable, because the mean effects an overall shift in the location. You want to understand intuitively what convolution does *otherwise* than shift the location. – whuber Mar 11 '18 at 22:47
  • @whuber Thanks-useful. Only in statistics is an outcome a single element of a sample space. For the rest of us an outcome is the result of an event. Smoothing AND shifting. What I show is the least confusing example of many as it reduces collision of the superimposed plots. – Carl Mar 12 '18 at 07:24
  • +1 for the new introduction: it clearly lays out your program in a non-polemical way, effectively setting the stage for the rest of the post. – whuber Feb 20 '19 at 14:52
  • @whuber polemical---nice word---thanks. – Carl Feb 23 '19 at 03:21
  • NB: Your characterization of a mixture model is incorrect. You can see what the problems might be by considering two sets that are not disjoint. Your new description of a convolution also is incorrect. – whuber Apr 05 '19 at 13:49
  • @whuber Thanks, I can change it, but, first I would like to understand why. If the sets are not "disjoint" I assume that means they are like dice, as real numbers do not repeat. Are you saying that it is not a union because I would not count a six twice? I would count it as a joined list, does that help? As for why a convolution is not a vector sum, I flat out do not understand. – Carl Apr 06 '19 at 17:11
  • Well, in your answer the vectors in question are the density functions $f_X$ and $f_Y,$ with a vector sum of $f_X+f_Y.$ Obviously this differs from their convolution in general. I'm having a hard time formulating any mixture model in terms of unions of sets, but perhaps I have different concepts of "mixture model" and "set" than you are working with. – whuber Apr 06 '19 at 17:23
  • @whuber Agreed. I changed the notation to reflect your valid POV as far as I could understand it. Resolving ambiguous notation is difficult, and most do not have the patience for it. Remaining is "Why not vector sums?" that is what is done numerically and implies a large set of preexistent mathematical operations that are entirely relevant, that is, I think so, and await your comments. – Carl Apr 06 '19 at 18:26
  • 1
    I see now how you are thinking of mixture models. You are constructing what are sometimes known as "multisets." (Usually a constructor other than brackets $\{,\}$ is used in order to clarify the notation.) The idea appears to be that of an empirical distribution function: the empirical distribution of a multiset $A$ and the empirical distribution of a multiset $B$ give rise to the empirical distribution of their multiset union, which is the mixture of the two distributions with relative weights $|A|$ and $|B|.$ – whuber Apr 06 '19 at 18:49
  • 1
    I think I detect a potential source of confusion in these ongoing edits. Because it would take too long to explain in a comment, I have appended an edit to my answer in the hope it might help a little. Indeed, the original first line of my answer was misleading on that account, so I have fixed it, too, with apologies. – whuber Apr 06 '19 at 19:21
  • Agreed. Not sure about my current answer. Will have to think about it. – Carl Apr 06 '19 at 20:01
3

This question may be old, but I'd like to provide yet another perspective. It builds on the change-of-variables formula for a joint probability density. It can be found in Lecture Notes: Probability and Random Processes at KTH, 2017 Ed. (Koski, T., 2017, pp 67), which itself refers to a detailed proof in Analysens Grunder, del 2 (Neymark, M., 1970, pp 148-168):


Let a random vector $\mathbf{X} = (X_1, X_2,...,X_m)$ have the joint p.d.f. $f_\mathbf{X}(x_1,x_2,...,x_m)$. Define a new random vector $\mathbf{Y}=(Y_1, Y_2,...,Y_m)$ by

$$ Y_i = g_i(X_1,X_2,...,X_m), \hspace{2em}i=1,2,...,m $$

where each $g_i$ is continuously differentiable and $(g_1,g_2,...,g_m)$ is invertible with the inverse

$$ X_i = h_i(Y_1,Y_2,...,Y_m),\hspace{2em}i=1,2,...,m $$

Then the joint p.d.f. of $\mathbf{Y}$ (in the domain of invertibility) is

$$ f_\mathbf{Y}(y_1,y_2,...,y_m) = f_\mathbf{X}(h_1(y_1,y_2,...,y_m),h_2(y_1,y_2,...,y_m),...,h_m(y_1,y_2,...,y_m))\,|J| $$

where $J$ is the Jacobian determinant

$$ J = \begin{vmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} & ... & \frac{\partial x_1}{\partial y_m}\\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} & ... & \frac{\partial x_2}{\partial y_m}\\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial x_m}{\partial y_1} & \frac{\partial x_m}{\partial y_2} & ... & \frac{\partial x_m}{\partial y_m}\\ \end{vmatrix} $$


Now, let's apply this formula to obtain the p.d.f. of a sum of two independent random variables, $X_1 + X_2$:

Define the random vector $\mathbf{X} = (X_1,X_2)$ with unknown joint p.d.f. $f_\mathbf{X}(x_1,x_2)$. Next, define a random vector $\mathbf{Y}=(Y_1,Y_2)$ by

$$ Y_1 = g_1(X_1,X_2) = X_1 + X_2\\ Y_2 = g_2(X_1,X_2) = X_2. $$

The inverse map is then

$$ X_1 = h_1(Y_1,Y_2) = Y_1 - Y_2\\ X_2 = h_2(Y_1,Y_2) = Y_2. $$

Thus, because of this and our assumption that $X_1$ and $X_2$ are independent, the joint p.d.f. of $\mathbf{Y}$ is

$$ \begin{split} f_\mathbf{Y}(y_1,y_2) &= f_\mathbf{X}(h_1(y_1,y_2),h_2(y_1,y_2))|J|\\ & = f_\mathbf{X}(y_1 - y_2, y_2)|J|\\ & = f_{X_1}(y_1 - y_2) \cdot f_{X_2}(y_2) \cdot |J| \end{split} $$

where the Jacobian $J$ is

$$ J = \begin{vmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2}\\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} \end{vmatrix} = \begin{vmatrix} 1 & -1\\ 0 & 1 \end{vmatrix} = 1 $$

To find the p.d.f. of $Y_1 = X_1 + X_2$, we marginalize

$$ \begin{split} f_{Y_1} &= \int_{-\infty}^\infty f_\mathbf{Y}(y_1,y_2) dy_2\\ &= \int_{-\infty}^\infty f_\mathbf{X}(h_1(y_1,y_2),h_2(y_1,y_2))|J| dy_2\\ &= \int_{-\infty}^\infty f_{X_1}(y_1 - y_2) \cdot f_{X_2}(y_2) dy_2 \end{split} $$

which is where we find your convolution :D
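A small symbolic check of this derivation with SymPy; the standard exponential densities are my own choice, made only so that the marginalization integral has a simple closed form:

```python
import sympy as sp

y1, y2 = sp.symbols('y1 y2', positive=True)

# Inverse map x1 = y1 - y2, x2 = y2, and its Jacobian determinant.
x1, x2 = y1 - y2, y2
J = sp.Matrix([[sp.diff(x1, y1), sp.diff(x1, y2)],
               [sp.diff(x2, y1), sp.diff(x2, y2)]]).det()
print(J)                          # 1

# Illustrative case: X1, X2 ~ Exp(1) independent, f_X(x1, x2) = exp(-x1 - x2)
# on x1, x2 > 0, so f_Y(y1, y2) = exp(-(y1 - y2)) * exp(-y2) * |J|.
f_Y = sp.exp(-(y1 - y2)) * sp.exp(-y2) * sp.Abs(J)

# Marginalize out y2; the support x1 > 0 forces 0 < y2 < y1.
f_Y1 = sp.integrate(f_Y, (y2, 0, y1))
print(sp.simplify(f_Y1))          # y1*exp(-y1): the Gamma(2, 1) density, i.e. the convolution
```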

Mossmyr
  • 133
  • 4
0

General expressions for the sums of $n$ continuous random variables are found here:

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0216422

"Multi-stage models for the failure of complex systems, cascading disasters, and the onset of disease"

For positive random variables, the density of the sum can be written simply as the inverse Laplace transform of the product of the individual Laplace transforms. The method is adapted from a calculation that appears in E. T. Jaynes' "Probability Theory" textbook.
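A minimal symbolic sketch of that Laplace-transform route (the two standard exponential densities are an illustrative choice, not the paper's example):

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)

# Density of a positive random variable: here Exp(1), f(t) = exp(-t) for t > 0.
f = sp.exp(-t)
F = sp.laplace_transform(f, t, s, noconds=True)       # 1/(s + 1)

# The transform of the density of the sum of two independent copies is the
# product of the transforms; invert it to recover the density of the sum.
f_sum = sp.inverse_laplace_transform(F * F, s, t)
print(sp.simplify(f_sum))                             # t*exp(-t), the Gamma(2, 1) density
```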

  • Welcome to our site. You might find the thread at https://stats.stackexchange.com/questions/72479, as well as the Moschopolous paper it references, to be of interest. – whuber Jun 23 '19 at 12:04