The process of predicting inputs from outputs is called Inverse Modeling. In this context your learned function (e.g. classifier/regression model), which maps inputs to outputs, is known as a forward model, and evaluating this function is the forward problem.
Typically the forward model is deterministic but many-to-one, so the inverse problem has no unique solution. Even when the forward function is one-to-one, inversion is commonly an ill-posed problem, because the solution is highly sensitive to perturbations in the (output) data, such as measurement errors.
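To see this sensitivity concretely, here is a minimal NumPy sketch of my own (the particular signal and noise level are arbitrary): the forward model is numerical integration, which is smooth and one-to-one, yet the inverse operation (differentiation) amplifies even tiny output noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward model: cumulative integration of x on a uniform grid (smooth, one-to-one).
n = 200
t = np.linspace(0.0, 1.0, n)
x_true = np.cos(2 * np.pi * t)                       # the input we want to recover
y_clean = np.cumsum(x_true) / n                      # forward problem: integrate
y_noisy = y_clean + 1e-3 * rng.standard_normal(n)    # add tiny output noise

# Inverse problem: differentiate y to recover x.
x_from_clean = np.diff(y_clean, prepend=0.0) * n
x_from_noisy = np.diff(y_noisy, prepend=0.0) * n

print("max error, clean data:", np.max(np.abs(x_from_clean - x_true)))  # essentially machine precision
print("max error, noisy data:", np.max(np.abs(x_from_noisy - x_true)))  # noise amplified by a factor of order n
```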
Common classical (pre-machine-learning) examples of inverse problems include tomography and deconvolution. Typically, solutions to these classical inverse problems use some form of regularization (i.e. a Bayesian prior) to obtain a unique solution, which commonly corresponds to a MAP estimate. In other cases, stochastic inversion may attempt to characterize the set of possible solutions more fully (commonly expressed as a Bayesian posterior distribution).
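As an illustration of this classical recipe, here is a minimal NumPy sketch of Tikhonov (ridge) regularization on a toy deconvolution problem; the blur width, noise level, and regularization weight `lam` are arbitrary illustrative choices. The regularized solve is the MAP estimate under a zero-mean Gaussian prior on $x$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Classical toy deconvolution: A is a Gaussian blur operator (badly conditioned).
n, width = 60, 2.0
idx = np.arange(n)
A = np.exp(-(idx[:, None] - idx[None, :]) ** 2 / (2 * width ** 2))

x_true = np.exp(-(idx - 30) ** 2 / 18.0)             # a smooth bump to recover
y = A @ x_true + 1e-3 * rng.standard_normal(n)       # blurred, noisy data

# Naive inversion is swamped by amplified noise.
x_naive = np.linalg.solve(A, y)

# Tikhonov (ridge) regularization:  minimize ||A x - y||^2 + lam * ||x||^2,
# i.e. the MAP estimate under a zero-mean Gaussian prior on x.
lam = 1e-2
x_reg = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

print("cond(A)          :", np.linalg.cond(A))
print("naive error      :", np.linalg.norm(x_naive - x_true))
print("regularized error:", np.linalg.norm(x_reg - x_true))
```

The design choice here is the usual bias/variance trade-off: `lam` damps the directions in which $A$ is nearly singular, at the cost of slightly smoothing the reconstruction.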
Note that while "inverse modeling" is a general concept, the terminology is perhaps more common in fields such as geophysics, scientific computing, and signal processing. A particular sub-problem that has recently become quite popular in machine learning is representation learning (for example, autoencoders). Though this certainly involves learning inverse mappings, the language of inverse modeling is less commonly used there*.
Finally, it is worth noting that the inverse problem of data assimilation spans from "classical" to "modern", and the associated algorithms underpin much of modern technology.
(*The main exception being the term regularization, which is essentially ubiquitous.)
EDIT: The OP asks "How does this relate to generative vs discriminative models?", so I will attempt to provide some context. (Here I focus on the inverse modeling/data assimilation approaches familiar to me, which may not be fully general.)
In machine learning the typical setup is a data set $(x,y)$, where $x$ is a set of predictor variables, and $y$ is a set of variables to be predicted. Then a model for the joint distribution $p(x,y)$ is generative, whereas a model for the conditional distribution $p(y|x)$ is discriminative.
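For concreteness, here is a small sketch of my own (assuming scikit-learn is available; the synthetic data are arbitrary): a generative classifier such as Gaussian naive Bayes models $p(x|y)$ and $p(y)$, i.e. the joint $p(x,y)$, and predicts via Bayes' rule, whereas logistic regression models $p(y|x)$ directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data: x = predictors, y = class label to predict.
X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Discriminative: model p(y|x) directly.
disc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Generative: model p(x|y) and p(y), i.e. the joint p(x,y),
# then apply Bayes' rule to get p(y|x) for prediction.
gen = GaussianNB().fit(X_tr, y_tr)

print("discriminative accuracy:", disc.score(X_te, y_te))
print("generative accuracy    :", gen.score(X_te, y_te))
```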
Now consider a deterministic forward model $f[x]$, and a data set $y=f[x]+\epsilon$, where $\epsilon$ is measurement noise. Then in forward mode, a discriminative model would be $p(y|x)=p_{\epsilon}(y-f[x])$. As noted above, $f[x]$ is typically a many-to-one mapping, so even without measurement noise $\epsilon$ there is no unique answer to the inverse problem $x=g[y]$ (and $\epsilon$ just makes the non-uniqueness worse).
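As a toy illustration (my example, nothing canonical): take $f[x]=x^2$, which is two-to-one, so the data alone cannot distinguish $x$ from $-x$.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):            # a deterministic, many-to-one forward model
    return x ** 2

x_true = 2.0
y = f(x_true) + 0.1 * rng.standard_normal()   # noisy measurement

# Both x = +2 and x = -2 explain the data equally well:
print(abs(y - f(2.0)), abs(y - f(-2.0)))      # identical residuals
```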
So a practical solution to the inverse problem requires some additional constraint, i.e. prior information on $x$ (a.k.a. regularization). In Bayesian terms, inversion typically incorporates a prior $p[x]$, i.e.
$$p[x|y]=\frac{p[y|x]p[x]}{p[y]}$$
The numerator is the joint distribution $p[x,y]=p[y|x]p[x]$, while for a given data set the denominator $p[y]$ is fixed, hence this is essentially a generative model.
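Continuing the $f[x]=x^2$ toy example, here is a minimal sketch of such a Bayesian inversion on a grid (the noise level and prior mean/width are arbitrary illustrative choices). The likelihood alone is bimodal in $x$; a Gaussian prior favouring positive $x$ breaks the $\pm x$ symmetry and yields a well-defined posterior and MAP estimate.

```python
import numpy as np

# Toy forward model and a single noisy observation y = f(x_true) + eps.
def f(x):
    return x ** 2

sigma_eps = 0.5                      # measurement-noise standard deviation
x_true = 2.0
y_obs = f(x_true) + 0.3              # pretend the noise draw was eps = 0.3

# Grid over candidate inputs x.
x = np.linspace(-5, 5, 2001)

# Likelihood p(y|x) = p_eps(y - f(x)): bimodal in x (peaks near +2 and -2).
log_like = -0.5 * ((y_obs - f(x)) / sigma_eps) ** 2

# Prior p(x): Gaussian centred at +1, encoding (hypothetical) knowledge that positive x is more plausible.
mu0, sigma0 = 1.0, 2.0
log_prior = -0.5 * ((x - mu0) / sigma0) ** 2

# Posterior p(x|y) proportional to p(y|x) p(x), i.e. the joint, up to the constant p(y).
log_post = log_like + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum() * (x[1] - x[0])   # normalise numerically (absorbs the constant p(y))

x_map = x[np.argmax(log_post)]
print("MAP estimate:", x_map)        # close to +2, not -2, thanks to the prior
```

With a flat prior the posterior would remain bimodal, which is just the non-uniqueness of the inverse problem showing up again; the prior is what makes the generative/Bayesian formulation well posed.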