A Data Generating Process is the mathematical model generating the data.
For example, if you run a regression model with regressors $X$ and dependent variable $Y$, you implicitly hypothesize a data generating process for $Y$. This data generating process can be described by the statistical model
\begin{align}
Y = X\beta + \varepsilon,
\end{align}
where $X$ is a $1 \times k$ vector of random variables, $\beta \in \mathbb{R}^k$ is the $k \times 1$ vector of coefficients, and $\varepsilon$ is an unobserved error term.
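To make this concrete, here is a minimal simulation sketch of such a DGP. It assumes $k = 3$, standard normal regressors and errors, and coefficient values chosen purely for illustration (none of these come from a real dataset). It draws $n$ observations from the model and checks that OLS recovers $\beta$.

```python
import numpy as np

# Illustrative DGP only: k = 3, coefficients are hypothetical.
rng = np.random.default_rng(0)
n, k = 10_000, 3
beta = np.array([1.5, -2.0, 0.5])

# Each row of X is one draw of the 1 x k regressor vector.
X = rng.normal(size=(n, k))
eps = rng.normal(scale=1.0, size=n)  # error term
Y = X @ beta + eps                   # Y = X beta + epsilon

# OLS estimate of beta: minimises ||Y - X b||^2.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)
```

With this many observations the estimate should lie close to `beta`, because the simulated data really were generated by the model being fitted.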
An example of variable selection in a regression model: suppose you have two sets of regressors, $X_1$ and $X_2$, such that $X_1 \subset X_2$. Suppose that the true Data Generating Process is
\begin{align}
Y = X_1\beta + \varepsilon,
\end{align}
but that you have all regressors in $X_2$ at your disposal. Model selection can then (in theory) help you separate the relevant regressors (i.e., $X_1$) from the irrelevant ones (i.e., $X_2 \setminus X_1$), for instance using the BIC, the AIC, or t-statistics. Note that this selection step can affect subsequent statistical inference; see also my recent post: Post Model Selection Inference problems - which remedies exist?
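As a sketch of BIC-based selection, the snippet below simulates five candidate regressors (playing the role of the full set $X_2$), of which only the first two (the relevant set $X_1$) enter the DGP. It then runs a best-subset search and keeps the subset with the smallest Gaussian BIC, $n \log(\mathrm{RSS}/n) + p \log n$ (up to an additive constant). The sample size, coefficients and the helper `bic` are hypothetical choices for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 500
# All five candidate regressors; only columns 0 and 1 are relevant.
X2 = rng.normal(size=(n, 5))
beta1 = np.array([1.0, -1.0])
Y = X2[:, :2] @ beta1 + rng.normal(size=n)

def bic(cols):
    """Gaussian BIC (up to a constant) of the OLS fit of Y on X2[:, cols]."""
    Xs = X2[:, list(cols)]
    b, *_ = np.linalg.lstsq(Xs, Y, rcond=None)
    rss = np.sum((Y - Xs @ b) ** 2)
    return n * np.log(rss / n) + len(cols) * np.log(n)

# Best-subset search over all non-empty subsets of the 5 columns.
subsets = [c for r in range(1, 6) for c in itertools.combinations(range(5), r)]
best = min(subsets, key=bic)
print(best)
```

Replacing `np.log(n)` with `2` in the penalty gives the AIC instead. The AIC penalises extra regressors less heavily, so it is more prone to keeping irrelevant columns.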
On a side note, the notion of a Data Generating Process is fragile. In specifying a statistical model, we impose the Axiom of correct specification. In a regression model, this happens insofar as we consider only linear combinations of the regressors we hypothesize to have an effect on $Y$. How do we know the true relationship is not nonlinear? We don't! We simply have to assume it. This is why a newer school of statisticians drops this axiom when doing inference. Instead, they only try to select statistical models (such as the regression model) that approximate the true Data Generating Process well enough. To make this clearer, suppose the true Data Generating Process for the regression model above is
\begin{align}
Y = \sum_{i=1}^{\infty}X_i\frac{c}{i} + \varepsilon.
\end{align}
While infinitely many random variables $X_i$ affect $Y$, their coefficients $c/i$ decay at rate $O(1/i)$. Hence, a good feature selection scheme would select the first $k$ regressors, for some finite $k$, to approximate the true Data Generating Process reasonably well.
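One way to quantify "reasonably well", assuming (beyond what is stated above) that the $X_i$ are uncorrelated with unit variance: the approximation error of the model using only the first $k$ regressors is $\sum_{i>k} X_i \frac{c}{i}$, whose variance is $c^2 \sum_{i>k} \frac{1}{i^2} < \frac{c^2}{k}$. The bound holds because $\frac{1}{i^2} < \int_{i-1}^{i} x^{-2}\,dx$. The sketch below evaluates this tail exactly via $\sum_{i \ge 1} 1/i^2 = \pi^2/6$. The value of `c` and the helper name are hypothetical.

```python
import math

c = 1.0  # hypothetical value of the constant c in the DGP

def truncation_error_var(k, c):
    """Variance of sum_{i>k} X_i * c / i, assuming uncorrelated unit-variance X_i.

    Uses sum_{i>=1} 1/i^2 = pi^2 / 6, so the tail equals pi^2/6 - sum_{i<=k} 1/i^2.
    """
    head = sum(1.0 / i**2 for i in range(1, k + 1))
    return c**2 * (math.pi**2 / 6 - head)

for k in (1, 5, 10, 50, 100):
    # Columns: k, exact tail variance, upper bound c^2 / k.
    print(k, round(truncation_error_var(k, c), 4), round(c**2 / k, 4))
```

The omitted part shrinks roughly like $1/k$, so a finite model with moderately large $k$ already captures most of the variation in $Y$ that the regressors explain.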
The same reasoning applies to other statistical models.