Consider a discrete random variable $X$ taking on values $x_1, x_2, \ldots, x_n$ with positive probabilities $p_1, p_2, \ldots, p_n$ respectively. Two shibboleths that statisticians don't just murmur but instead shout from the rooftops are that probabilities are nothing but long-term frequencies, and that an event of probability $p$ will occur approximately $pN$ times in $N$ independent trials of the experiment, especially when $N$ is large. So suppose that the experiment has been conducted $N$ times, where $N$ is whatever number one thinks of as large (hopefully much larger than $\frac{1}{\min_i p_i}$), resulting in $X$ taking on values $X_1, X_2, \ldots, X_N$ where, of course, each $X_i \in \{x_1, x_2, \ldots, x_n\}$. Thus, the average observed value of $X$ on these $N$ trials is
$$\text{average observed value of } X = \frac{X_1+X_2+ \cdots + X_N}{N}.\tag{1}$$
Now, one way of computing the right side of $(1)$ is to add up the $N$ numbers and divide the sum by $N$, but another way is to note that some number $N_1$ of the $X_i$ have value $x_1$, some number $N_2$ of the $X_i$ have value $x_2$, and so on, where, of course, $\sum_{i=1}^n N_i = N$. Thus, we get that
\begin{align}
\text{average observed value of } X &= \frac{X_1+X_2+ \cdots + X_N}{N}\\
&= \frac{N_1x_1 + N_2x_2 + \cdots + N_n x_n}{N}\\
&\approx \frac{(p_1N)x_1 + (p_2N)x_2 + \cdots + (p_nN) x_n}{N}\\
&= \sum_{i=1}^n p_i x_i.\tag{2}
\end{align}
In short, the average observed value of a discrete random variable $X$ over a very large number of independent trials of the experiment can be expected to be close to $\sum_{i=1}^n p_i x_i$ and so we define the average, or the expected value, or expectation of $X$ as $\sum_{i=1}^n p_i x_i$ and denote this number by $E[X]$ or $\mathbb E[X]$.
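This is easy to check numerically. The sketch below (the three-point distribution is our own toy example, not anything from the discussion above) simulates $N$ trials and computes the average observed value both ways mentioned earlier, directly and by grouping the trials by value, then compares both to $\sum_i p_i x_i$:

```python
import random
from collections import Counter

# A hypothetical discrete distribution:
# X takes values 1, 2, 5 with probabilities 0.5, 0.3, 0.2.
values = [1, 2, 5]
probs = [0.5, 0.3, 0.2]

# The definition: E[X] = sum_i p_i * x_i.
expectation = sum(p * x for p, x in zip(probs, values))  # 2.1

# Simulate N independent trials of the experiment.
random.seed(0)
N = 200_000
samples = random.choices(values, weights=probs, k=N)

# Way 1: add up the N numbers and divide by N.
average_direct = sum(samples) / N

# Way 2: group by value, so the average is sum_i N_i * x_i / N,
# where counts[x_i] plays the role of N_i.
counts = Counter(samples)
average_grouped = sum(n * x for x, n in counts.items()) / N

print(expectation)       # 2.1
print(average_direct)    # close to 2.1
print(average_grouped)   # identical to average_direct
```

Both computations give exactly the same number, and with $N$ this large it lands within a few thousandths of $\sum_i p_i x_i$.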
The expectation of a discrete random variable $X$ taking on values
$x_1, x_2, \cdots, x_n$ with positive probabilities $p_1, p_2, \ldots, p_n$ respectively is denoted as $E[X]$ and is given by
$$E[X] = \sum_{i=1}^n p_i x_i. \tag{3}$$
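As a concrete instance of $(3)$ (the fair die here is our own illustration): if $X$ is the result of rolling a fair six-sided die, then $x_i = i$ and $p_i = \frac16$ for $i = 1, \ldots, 6$, so
$$E[X] = \sum_{i=1}^6 \frac{i}{6} = \frac{21}{6} = 3.5,$$
a number that the die itself can never actually show.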
Sadists amongst us even call this number the mean of $X$ so that they can later enjoy the pleasure of demeaning $X$.
The obvious generalization of $(3)$ to discrete random variables taking on a countably infinite number of values $x_1, x_2, x_3, \cdots$ is
$$E[X] = \sum_{i=1}^\infty p_i x_i \tag{4}$$
but it is a little harder to justify in terms of the average observed value over a finite number $N$ of trials, since "most" of the possible values of $X$ will not be observed even once in the $N$ trials. We also need to start worrying about whether the sum
$\sum_{i=1}^M p_i x_i$ converges as $M\to\infty$ or diverges, or, when $X$ takes on countably infinitely many positive and negative values
$$\cdots, x_{-2}, x_{-1}, x_0, x_1, x_2, \cdots$$ whether the sum $\sum_{i=-\infty}^\infty p_i x_i$ can even be defined at all: it might work out to be of the form $\infty-\infty$ (cf. Why does the Cauchy distribution have no mean?).
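Both behaviors are easy to witness numerically (the two distributions below are hypothetical examples of ours, not from the text). For $x_i = i$ with $p_i = 2^{-i}$ the partial sums of $(4)$ settle down to $2$, while for a symmetric distribution on $\pm i$ with $p_{\pm i} \propto 1/i^2$ the positive and negative halves of $\sum_i p_i x_i$ each diverge like the harmonic series, so the sum is exactly of the forbidden form $\infty - \infty$:

```python
from math import pi

# Convergent case: x_i = i, p_i = (1/2)^i for i >= 1.
# The partial sums of sum_i p_i * x_i converge to E[X] = 2.
partial = 0.0
for i in range(1, 200):
    partial += i * 0.5 ** i
# partial is now essentially 2.0

# Trouble case: x_i = +/- i with p_{+i} = p_{-i} = c / i^2,
# where c = 3 / pi^2 makes the probabilities sum to 1.
# The positive half of sum_i p_i x_i is sum_i c / i: harmonic, divergent.
c = 3 / pi ** 2
positive_part = sum(c / i for i in range(1, 10_000))
positive_part_bigger = sum(c / i for i in range(1, 1_000_000))

print(partial)                                # ~2.0
print(positive_part < positive_part_bigger)   # True: it keeps growing
```

Adding more terms to the trouble case never stabilizes it, which is exactly why no expectation can be assigned.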
Similar considerations also arise in the extension of the notion of expectation to continuous random variables with density functions. The integral in the formula
$$E[X] = \int_{-\infty}^\infty xf_X(x) \mathrm dx\tag{5}$$
can be viewed as a natural extension of the notion of expectation as $\sum_{i=-\infty}^\infty p_i x_i$. We are multiplying the value $x_i$ that $X$ might take on by a probability $f_X(x_i) \Delta x_i$, creating the (Riemann) sum $\sum_i x_i\, f_X(x_i) \Delta x_i$, and then taking the limit of the sum as all the $\Delta x_i \to 0$. That is, the integral in $(5)$ is essentially a glorified version of the sum in $(4)$ and can be justified in exactly the same way. Statisticians steeped in measure theory will shudder at this explanation, but it can serve us lesser mortals.
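To see this limit-of-sums picture in action, here is a crude left-endpoint Riemann sum for $(5)$ (the exponential density with rate $\lambda = 2$ is our own hypothetical example; its mean is known to be $1/\lambda = 0.5$):

```python
from math import exp

# Hypothetical example: exponential density f_X(x) = lam * exp(-lam * x)
# on [0, infinity), whose mean is 1/lam.
lam = 2.0

def f_X(x):
    return lam * exp(-lam * x)

# Chop [0, 20] into slices of width dx; each slice contributes
# x * (probability mass f_X(x) * dx), mirroring x_i * p_i in (4).
# The tail beyond x = 20 carries negligible probability.
dx = 1e-4
riemann_sum = sum(x * f_X(x) * dx for x in (i * dx for i in range(int(20 / dx))))

print(riemann_sum)   # ~0.5 = 1/lam
```

Shrinking `dx` further pushes the sum ever closer to the integral's value, which is the whole point of $(5)$.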