@user20160 has already given you a nice answer to your questions (1)-(3), but the last one does not seem to be fully answered yet.
- How can a representation of a probability density function arise from a weighted sum of $\delta(\cdot)$s that themselves take only values of either zero or infinity?
Let me start by quoting Wikipedia, as it provides a pretty clear description in this case:
> The Dirac delta can be loosely thought of as a function on the real line which is zero everywhere except at the origin, where it is infinite,
>
> $$\delta(x) = \begin{cases} +\infty, & x = 0 \\ 0, & x \ne 0 \end{cases}$$
>
> and which is also constrained to satisfy the identity
>
> $$\int_{-\infty}^\infty \delta(x) \, dx = 1$$
>
> This is merely a heuristic characterization. The Dirac delta is not a function in the traditional sense as no function defined on the real numbers has these properties. The Dirac delta function can be rigorously defined either as a distribution or as a measure.
Further on, Wikipedia provides a more formal definition and lots of worked examples, so I'd recommend going through the whole article. Let me quote one example from it:
> In probability theory and statistics, the Dirac delta function is often used to represent a discrete distribution, or a partially discrete, partially continuous distribution, using a probability density function (which is normally used to represent fully continuous distributions). For example, the probability density function $f(x)$ of a discrete distribution consisting of points $x = \{x_1, \dots, x_n\}$, with corresponding probabilities $p_1, \dots, p_n$, can be written as
>
> $$ f(x) = \sum_{i=1}^n p_i \delta(x-x_i) $$
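To see why this weighted sum is a legitimate density even though each $\delta$ is "zero or infinity", integrate it term by term (the sum is finite, so swapping sum and integral is harmless), substitute $u = x - x_i$, and use the normalization $\int \delta(u)\,du = 1$ from the first quote:

$$
\int_{-\infty}^{\infty} f(x)\,dx
= \sum_{i=1}^n p_i \int_{-\infty}^{\infty} \delta(x - x_i)\,dx
= \sum_{i=1}^n p_i \int_{-\infty}^{\infty} \delta(u)\,du
= \sum_{i=1}^n p_i = 1 .
$$

By the same argument, for $a < b$ with no $x_i$ lying exactly at $a$ or $b$, $\int_a^b f(x)\,dx = \sum_{i:\, a < x_i < b} p_i$, i.e. the probability of an interval is just the total weight of the points it contains. The values $\delta$ takes pointwise never enter; only its integral does.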
What this equation says is that $f$ is a weighted sum of $n$ point-mass distributions $\delta_{x_i} = \delta(x-x_i)$, each of which puts all of its mass at $x_i$. If you think of $\delta_{x_i}$ in terms of its cumulative distribution function, it has to be the unit step at $x_i$:
$$
F_{x_i}(x) =
\begin{cases}
0 & \text{if } x < x_i \\
1 & \text{if } x \ge x_i
\end{cases}
$$
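A quick numerical check of this step shape, under the assumption (my choice, for illustration only) that $\delta_{x_i}$ is approximated by a normal distribution centred at $x_i$ with a small standard deviation `eps`: the normal CDFs converge to the step everywhere except at $x = x_i$ itself, where they always equal 0.5. The value $F_{x_i}(x_i) = 1$ comes from the usual convention that a CDF is right-continuous, i.e. $F(x_i) = P(X \le x_i)$.

```python
import math

def step_cdf(x, x_i):
    """CDF of a point mass at x_i: 0 for x < x_i, 1 for x >= x_i."""
    return 1.0 if x >= x_i else 0.0

def normal_cdf(x, mean, eps):
    """CDF of a normal distribution with the given mean and standard deviation eps."""
    return 0.5 * (1.0 + math.erf((x - mean) / (eps * math.sqrt(2.0))))

x_i = 2.0  # hypothetical location of the point mass
for x in (1.9, 1.99, 2.0, 2.01, 2.1):
    # Narrower normals approach the step; at x == x_i they stay at 0.5.
    approx = [round(normal_cdf(x, x_i, eps), 4) for eps in (0.1, 0.01, 0.001)]
    print(f"x={x:<5} step={step_cdf(x, x_i)} normal(eps=0.1, 0.01, 0.001)={approx}")
```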
So we can rewrite the density above as a cumulative distribution function
$$ F(x) = \sum_{i=1}^n p_i F_{x_i}(x) = \sum_{i=1}^n p_i \mathbf{1}_{x \ge x_i} $$
where $\mathbf{1}_{x \ge x_i}$ is the indicator function of the event $x \ge x_i$, i.e. a unit step located at $x_i$. Notice that this is basically a categorical distribution in disguise. Moreover, the Dirac delta can be characterized by how it acts on an arbitrary (continuous) function $f$ under the integral:
$$ \int_{-\infty}^\infty f(x) \delta(x-x_i) dx = f(x_i) $$
so it "works" as a continuous-domain version of the indicator function: integrating against $\delta(x-x_i)$ simply picks out the value of $f$ at $x_i$.
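Putting the pieces together, here is a minimal Python sketch for a made-up three-point distribution (the values in `xs` and `ps` are hypothetical). It evaluates $F(x)$ as the weighted sum of indicators, and computes $E[g(X)] = \int g(x) f(x)\,dx$ by applying the identity above term by term, which gives $\sum_i p_i\, g(x_i)$. As a cross-check, `expectation_smoothed` does the integral by brute force after replacing each $\delta(x - x_i)$ with a narrow normal density; that replacement is my illustrative assumption, not part of the definition.

```python
import math

# Hypothetical discrete distribution: support points x_i and probabilities p_i.
xs = [-1.0, 0.5, 3.0]
ps = [0.2, 0.5, 0.3]

def mixture_cdf(x):
    """F(x) = sum_i p_i * 1{x >= x_i}: a step function with a jump of p_i at x_i."""
    return sum(p for xi, p in zip(xs, ps) if x >= xi)

def expectation_exact(g):
    """Sifting identity applied term by term: E[g(X)] = sum_i p_i * g(x_i)."""
    return sum(p * g(xi) for xi, p in zip(xs, ps))

def expectation_smoothed(g, eps=0.01, n=120_000, a=-5.0, b=7.0):
    """Midpoint-rule integral of g(x) * sum_i p_i * phi_eps(x - x_i), where each
    delta is replaced by a normal density with standard deviation eps."""
    h = (b - a) / n
    norm = eps * math.sqrt(2.0 * math.pi)
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        density = sum(p * math.exp(-0.5 * ((x - xi) / eps) ** 2) / norm
                      for xi, p in zip(xs, ps))
        total += g(x) * density
    return h * total

print("F(0.0) =", mixture_cdf(0.0))
print("E[cos X] exact:", expectation_exact(math.cos),
      "smoothed:", expectation_smoothed(math.cos))
```

The exact version never evaluates $\delta$ anywhere; it only uses the identity, which is the whole point of the construction.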
The take-away message is that the Dirac delta is not a standard function. It's also not equal to infinity at zero; if it were, it would be useless, because infinity is not a number, so we couldn't perform any arithmetic operations with it. You can think of the Dirac delta simply as a point mass at some $x_i$: an indicator-like object on a continuous domain that integrates to unity. No black magic is involved; it is just a way to hack the calculus to deal with discrete values.