Because the main problem concerns applying a fully general and abstract formula to a somewhat complicated model (regression), let's address it by examining a simple concrete case. Ordinary regression is a good choice because it is well known, well understood, and serves as the archetype of all more complex regression models. But even this comes in several "flavors." The flavor most relevant for prediction is the one in which the values of $p$ separate regressor ("independent") variables are specified by the experimenter, whose objective is to predict a random response whose distribution depends on those regressors. (As usual, one of these $p$ regressors may take on a constant value.)
The standard notation for this is that vectors of regressor values, $x_1, x_2, \ldots, x_n$ are available (as data). They have been measured precisely along with corresponding responses $y_i$. A model for these responses is that each $y_i$ is an independent realization of a Normal variable with variance $\sigma^2$ and mean $x_i\beta$. (Each $x_i$ is a $p$-covector and $\beta=(\beta_1,\ldots,\beta_p)^\prime$ is a $p$-vector.)
Let's review: the values of the $x_i$ are known and not modeled as random variables; the values of the $y_i$ are modeled as realizations of random variables (which we could roll into an $n$-vector $y=(y_1,\ldots,y_n)^\prime$); and the value of the parameter $\theta=(\beta_1,\ldots,\beta_p,\sigma)$ is unknown.
Suppose the objective is to predict the mean response $y_0 = x_0\beta$ at a regressor covector $x_0$. One standard method is to predict it as $$\hat y_0 = x_0\hat\beta$$ where $$\hat\beta = (X^\prime X)^{-}X^\prime y \tag{1}$$ and I have let $X$ be the "model matrix" obtained by stacking all $n$ of the covectors $x_i$ into an $n\times p$ matrix.
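As a minimal numerical sketch of how equation $(1)$ and the prediction $\hat y_0$ could be computed, here is some Python (all of the specific numbers are illustrative choices of mine, not anything given in the question; `numpy.linalg.pinv` plays the role of the generalized inverse $(X^\prime X)^{-}$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: n = 50 observations, p = 3 regressors,
# with the first regressor held constant at 1.
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # the n x p model matrix
beta = np.array([2.0, -1.0, 0.5])                               # "true" beta (unknown in practice)
sigma = 0.7
y = X @ beta + rng.normal(scale=sigma, size=n)                  # simulated responses

# Equation (1): beta_hat = (X'X)^- X' y, using a generalized inverse.
beta_hat = np.linalg.pinv(X.T @ X) @ X.T @ y

# The prediction at a new regressor covector x0.
x0 = np.array([1.0, 0.3, -1.2])
y0_hat = x0 @ beta_hat
```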
Let's pause for a moment to observe that the model, the model matrix $X$, and the covector $x_0$ completely determine the distribution of $\hat y_0$. This is because (a) the independence of the $y_i$ gives $y$ an $n$-variate Normal distribution; (b) its mean is given by $X\beta$; and (c) its covariance matrix is $\sigma^2$ times the $n\times n$ identity matrix.
What is not routinely specified is the loss function $L$. It measures the cost to our client when they act as if the correct value of $y_0$ were $\hat y_0$. Because it can depend on both $y_0$ and $\hat y_0$, it is formally written $L(y_0, \hat y_0)$. Often the loss is taken to be the squared difference, $L(u,v)=(u-v)^2$. In general, loss functions might as well be zero when $u=v$ (you can't do any better than that) and increase as $u$ and $v$ get further apart. (In the generic notation of the question, the procedure that guesses $\hat y_0$ from the data is called $\delta$, and "$x$" refers to the data, which in our application are $X$, $x_0$, and $y$.)
If you want to unwrap the preceding formulas, you could expand this out as
$$L(y_0, \hat y_0) = (y_0-\hat y_0)^2 = (x_0\beta - x_0 (X^\prime X)^{-}X^\prime y)^2.$$
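To emphasize that this loss is a function of the data $y$ (with $X$, $x_0$, and $\beta$ held fixed), here is a hedged one-function sketch of the expanded formula, in the same illustrative Python as above:

```python
import numpy as np

def squared_loss(y, X, x0, beta):
    """L(y0, y0_hat) = (x0 beta - x0 (X'X)^- X' y)^2, viewed as a function of y."""
    y0_hat = x0 @ np.linalg.pinv(X.T @ X) @ X.T @ y   # the prediction x0 beta_hat
    return (x0 @ beta - y0_hat) ** 2                  # compare it with the mean response x0 beta
```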
Because we model $y$ as a multivariate Normal vector, this loss is a random variable. Its expectation is taken with respect to the distribution of $y$. The expected loss is the risk of our procedure. It depends on the (unknown) parameter $\theta$ and on the procedure itself. Since we're talking about a definite procedure based on equation $(1)$, it really is just a function of $\theta$:
$$R(\theta) = E\left[\left(x_0\beta - x_0 (X^\prime X)^{-}X^\prime y\right)^2\right].$$
Since the quantity whose expectation is taken is a random variable whose distribution is completely determined by $\theta$ (with $X$ and $x_0$ held fixed), this all makes sense and is well defined. We could even write the risk out explicitly in terms of $X$, $x_0$ (specified constants all), and $\theta$.
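As a sanity check (not something given in the question), here is a minimal Monte Carlo sketch using the same illustrative constants as before and assuming $X^\prime X$ is invertible: the average loss over many simulated $y$ vectors should approach the explicit risk, which for this unbiased procedure is just the variance of $\hat y_0$, namely $\sigma^2 x_0 (X^\prime X)^{-1} x_0^\prime$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative constants (my own choices, not part of the question).
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # model matrix, full column rank
x0 = np.array([1.0, 0.3, -1.2])                                  # new regressor covector
beta = np.array([2.0, -1.0, 0.5])                                # the beta part of theta
sigma = 0.7                                                      # the sigma part of theta

XtX_inv = np.linalg.inv(X.T @ X)

# Monte Carlo estimate of R(theta) = E[(x0 beta - x0 (X'X)^- X' y)^2].
reps = 20_000
losses = np.empty(reps)
for r in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)   # one realization of the response vector
    y0_hat = x0 @ XtX_inv @ X.T @ y                  # the prediction x0 beta_hat
    losses[r] = (x0 @ beta - y0_hat) ** 2            # the realized loss

risk_mc = losses.mean()
risk_explicit = sigma**2 * x0 @ XtX_inv @ x0         # sigma^2 x0 (X'X)^{-1} x0'

print(risk_mc, risk_explicit)   # the two numbers should agree closely
```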
Incidentally, for the "expected prediction error" referenced in the question, where $L(u,v)=v-u$, it's easy to show in this case that the risk is zero.
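For what it's worth, here is a sketch of that computation, assuming $X^\prime X$ is invertible so that $E(\hat\beta)=\beta$:

$$R(\theta) = E\left(\hat y_0 - y_0\right) = E\left(x_0 (X^\prime X)^{-1}X^\prime y\right) - x_0\beta = x_0 (X^\prime X)^{-1}X^\prime X\beta - x_0\beta = 0.$$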