I am reading Elements of Statistical Learning (ESL) and trying to get a better grasp of machine learning techniques. I am a little confused about when to treat the predictors as fixed and when to treat them as random, and about which is more common.
In previous linear models courses I have studied, we never assumed any randomness about the predictors $\textbf{x}$ of the linear model, only about the response $y$. Formally, if we can write $y=\textbf{x}^T\beta+\epsilon$ (or more generally, $y=f(\textbf{x})+\epsilon$), then we choose $\textbf{x}_1,...,\textbf{x}_n$ and, through the random error $\epsilon$, obtain $y_1,...,y_n$. Treating the predictors as deterministic made sense to me, as we could think of this as sampling points in an experiment and observing some random response. However, I don't think ESL ever explicitly says that it treats the predictors as fixed, but it does so implicitly when it estimates $\beta$ via maximum likelihood (the maximum likelihood estimator never takes into account any randomness in $\textbf{x}$).
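To make sure I am describing the same thing, here is a minimal numpy sketch of the fixed-design view I have in mind (the true $\beta$, the evenly spaced design points, and the noise level are all just made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed-design view: we *choose* the design points ourselves,
# e.g. evenly spaced values, as in a planned experiment.
n = 50
x = np.linspace(0, 10, n)               # deterministic predictors
X = np.column_stack([np.ones(n), x])    # add an intercept column

beta_true = np.array([1.0, 2.0])        # made-up "true" coefficients
eps = rng.normal(0.0, 1.0, size=n)      # the only source of randomness
y = X @ beta_true + eps                 # y = x^T beta + eps

# OLS estimate, which coincides with the MLE under Gaussian errors;
# note that nothing here models a distribution for x.
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_hat)
```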
ESL allows for the possibility of random predictors, which I had never seen before, but it makes sense: we might only observe the predictors rather than choose them. My question is this: am I correct in saying that there are four different approaches to supervised learning (a toy data-generating sketch after the list illustrates what I mean by each)?
- Random relationship, deterministic predictors: $y=f(\textbf{x})+\epsilon$, $\textbf{x}$ fixed (as described above)
- Random relationship, random predictors: $y=f(\textbf{x})+\epsilon$, $\textbf{x}$ random
- Deterministic relationship, random predictors: $y=f(\textbf{x})$, $\textbf{x}$ random
- Deterministic relationship, deterministic predictors: $y=f(\textbf{x})$, $\textbf{x}$ fixed
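In case it helps pin down what I mean by each case, here is a toy data-generating sketch (the linear $f$, the uniform and normal choices, and the noise level are all just placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
f = lambda x: 1.0 + 2.0 * x        # placeholder for the true f

# 1. Random relationship, deterministic predictors
x1 = np.linspace(0, 1, n)                   # x chosen by us
y1 = f(x1) + rng.normal(0, 0.5, n)          # noise enters through y

# 2. Random relationship, random predictors
x2 = rng.uniform(0, 1, n)                   # x merely observed, not chosen
y2 = f(x2) + rng.normal(0, 0.5, n)

# 3. Deterministic relationship, random predictors
x3 = rng.uniform(0, 1, n)
y3 = f(x3)                                  # y exactly determined by x

# 4. Deterministic relationship, deterministic predictors
x4 = np.linspace(0, 1, n)
y4 = f(x4)                                  # nothing random at all
```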
Many thanks. Let me know if you want me to clarify anything.
Edit: I tried searching but couldn't find anything at first; after browsing the "regression" tag I found this excellent answer by kjetil b halvorsen. I think I understand it now, more or less, but if anyone has any further comments I would love to hear them.