Consider a discrete random variable $X$ taking on values $x_1, x_2, \ldots, x_n$ with positive probabilities $p_1, p_2, \ldots, p_n$ respectively. Two shibboleths that statisticians don't just murmur but instead shout from the rooftops are that probabilities are nothing but long-term frequencies, and that an event of probability $p$ will occur approximately $pN$ times in $N$ independent trials of the experiment, especially when $N$ is large. So suppose that the experiment has been conducted $N$ times, where $N$ is whatever number one thinks of as large (hopefully much larger than $\frac{1}{\min_i p_i}$), resulting in $X$ taking on values $X_1, X_2, \ldots, X_N$ where, of course, each $X_i \in \{x_1, x_2, \ldots, x_n\}$. Thus, the average observed value of $X$ on these $N$ trials is
$$\text{average observed value of } X = \frac{X_1+X_2+ \cdots + X_N}{N}.\tag{1}$$
Now, one way of computing the right side of $(1)$ is to add up the $N$ numbers and divide the sum by $N$, but another way is to note that some number $N_1$ of the $X_i$ have value $x_1$, some number $N_2$ of the $X_i$ have value $x_2$, and so on, where, of course, $\sum_{i=1}^n N_i = N$. Thus, we get that
\begin{align}
\text{average observed value of } X &= \frac{X_1+X_2+ \cdots + X_N}{N}\\
&= \frac{N_1x_1 + N_2x_2 + \cdots + N_n x_n}{N}\\
&\approx \frac{(p_1N)x_1 + (p_2N)x_2 + \cdots + (p_nN) x_n}{N}\\
&= \sum_{i=1}^n p_i x_i.\tag{2}
\end{align}
In short, the average observed value of a discrete random variable $X$ over a very large number of independent trials of the experiment can be expected to be close to $\sum_{i=1}^n p_i x_i$ and so we define the average, or the expected value, or expectation of $X$ as $\sum_{i=1}^n p_i x_i$ and denote this number by $E[X]$ or $\mathbb E[X]$.
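This is easy to check numerically. The sketch below (the three-point distribution is our own toy example, not anything from the discussion above) simulates $N$ trials and computes the average observed value both ways mentioned earlier, directly and by grouping the trials by value, then compares both to $\sum_i p_i x_i$:

```python
import random
from collections import Counter

# A hypothetical discrete distribution:
# X takes values 1, 2, 5 with probabilities 0.5, 0.3, 0.2.
values = [1, 2, 5]
probs = [0.5, 0.3, 0.2]

# The definition: E[X] = sum_i p_i * x_i.
expectation = sum(p * x for p, x in zip(probs, values))  # 2.1

# Simulate N independent trials of the experiment.
random.seed(0)
N = 200_000
samples = random.choices(values, weights=probs, k=N)

# Way 1: add up the N numbers and divide by N.
average_direct = sum(samples) / N

# Way 2: group by value, so the average is sum_i N_i * x_i / N,
# where counts[x_i] plays the role of N_i.
counts = Counter(samples)
average_grouped = sum(n * x for x, n in counts.items()) / N

print(expectation)       # 2.1
print(average_direct)    # close to 2.1
print(average_grouped)   # identical to average_direct
```

Both computations give exactly the same number, and with $N$ this large it lands within a few thousandths of $\sum_i p_i x_i$.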
The expectation of a discrete random variable $X$ taking on values
$x_1, x_2, \cdots, x_n$ with positive probabilities $p_1, p_2, \ldots, p_n$ respectively is denoted as $E[X]$ and is given by
$$E[X] = \sum_{i=1}^n p_i x_i. \tag{3}$$
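As a concrete instance of $(3)$ (the fair die here is our own illustration): if $X$ is the result of rolling a fair six-sided die, then $x_i = i$ and $p_i = \frac16$ for $i = 1, \ldots, 6$, so
$$E[X] = \sum_{i=1}^6 \frac{i}{6} = \frac{21}{6} = 3.5,$$
a number that the die itself can never actually show.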
Sadists amongst us even call this number the mean of $X$ so that they can later enjoy the pleasure of demeaning $X$.
The obvious generalization of $(3)$ to discrete random variables taking on a countably infinite number of values $x_1, x_2, x_3, \cdots$ is
$$E[X] = \sum_{i=1}^\infty p_i x_i \tag{4}$$
but it is a little harder to justify in terms of the average observed value over a finite number $N$ of trials, since "most" of the possible values of $X$ will not be observed even once in the $N$ trials. We also need to start worrying about whether the sum
$\sum_{i=1}^M p_i x_i$ converges as $M\to\infty$ or diverges, or, when $X$ takes on countably infinitely many positive and negative values
$$\cdots, x_{-2}, x_{-1}, x_0, x_1, x_2, \cdots$$ whether the sum $\sum_{i=-\infty}^\infty p_i x_i$ can even be defined at all: it might work out to be of the form $\infty-\infty$ (cf. Why does the Cauchy distribution have no mean?).
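Both behaviors are easy to witness numerically (the two distributions below are hypothetical examples of ours, not from the text). For $x_i = i$ with $p_i = 2^{-i}$ the partial sums of $(4)$ settle down to $2$, while for a symmetric distribution on $\pm i$ with $p_{\pm i} \propto 1/i^2$ the positive and negative halves of $\sum_i p_i x_i$ each diverge like the harmonic series, so the sum is exactly of the forbidden form $\infty - \infty$:

```python
from math import pi

# Convergent case: x_i = i, p_i = (1/2)^i for i >= 1.
# The partial sums of sum_i p_i * x_i converge to E[X] = 2.
partial = 0.0
for i in range(1, 200):
    partial += i * 0.5 ** i
# partial is now essentially 2.0

# Trouble case: x_i = +/- i with p_{+i} = p_{-i} = c / i^2,
# where c = 3 / pi^2 makes the probabilities sum to 1.
# The positive half of sum_i p_i x_i is sum_i c / i: harmonic, divergent.
c = 3 / pi ** 2
positive_part = sum(c / i for i in range(1, 10_000))
positive_part_bigger = sum(c / i for i in range(1, 1_000_000))

print(partial)                                # ~2.0
print(positive_part < positive_part_bigger)   # True: it keeps growing
```

Adding more terms to the trouble case never stabilizes it, which is exactly why no expectation can be assigned.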
Similar considerations also arise in the extension of the notion of expectation to continuous random variables with density functions. The integral in the formula
$$E[X] = \int_{-\infty}^\infty xf_X(x) \mathrm dx\tag{5}$$
can be viewed as a natural extension of the notion of expectation as $\sum_{i=-\infty}^\infty p_i x_i$. We are multiplying the value $x_i$ that $X$ might take on by a probability $f_X(x_i) \Delta x_i$, creating the (Riemann) sum $\sum_i x_i\, f_X(x_i) \Delta x_i$, and then taking the limit of the sum as all the $\Delta x_i \to 0$. That is, the integral in $(5)$ is essentially a glorified version of the sum in $(4)$ and can be justified in exactly the same way. Statisticians steeped in measure theory will shudder at this explanation, but it can serve us lesser mortals.
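To see this limit-of-sums picture in action, here is a crude left-endpoint Riemann sum for $(5)$ (the exponential density with rate $\lambda = 2$ is our own hypothetical example; its mean is known to be $1/\lambda = 0.5$):

```python
from math import exp

# Hypothetical example: exponential density f_X(x) = lam * exp(-lam * x)
# on [0, infinity), whose mean is 1/lam.
lam = 2.0

def f_X(x):
    return lam * exp(-lam * x)

# Chop [0, 20] into slices of width dx; each slice contributes
# x * (probability mass f_X(x) * dx), mirroring x_i * p_i in (4).
# The tail beyond x = 20 carries negligible probability.
dx = 1e-4
riemann_sum = sum(x * f_X(x) * dx for x in (i * dx for i in range(int(20 / dx))))

print(riemann_sum)   # ~0.5 = 1/lam
```

Shrinking `dx` further pushes the sum ever closer to the integral's value, which is the whole point of $(5)$.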