Supervised ML techniques based on Distribution of Dependent Variable

Question

Are there any supervised ML techniques that take into account the distribution of the target variable? In statistics, quantile regression is a good way to model data based on the conditional distribution of the dependent variable.

I am curious whether there are any techniques in the ML literature that would be a good choice for modeling based on the conditional distribution of the dependent variable.

Doesn't the ML literature say anything about maximum likelihood? — BigBendRegion, Jan 03 '22 at 00:19
Yes, ensemble techniques and neural networks can impose distribution on the target. — msuzen, Jan 03 '22 at 03:46
@MehmetSüzen could you elaborate on how they impose distribution on the target variable? — kms, Jan 03 '22 at 07:10
@kms There is no single way to achieve this. As mentioned in the Answer, one way is through the loss function, but in deep learning that isn't sufficient. In deep learning, the output layer is usually used to impose the target distribution. The best approach is to inspect known library implementations, e.g. [h2o distributions](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/distribution.html). From classical statistics, see GLM link functions, [here](https://stats.stackexchange.com/questions/48594/purpose-of-the-link-function-in-generalized-linear-model). — msuzen, Jan 03 '22 at 08:19
@BigBendRegion - you would be amazed, shocked, and possibly horrified at how often I see ML people who haven't been taught the basics of statistics thinking that, for example, linear regression is done via neural nets, or, as I ran into a couple of weeks ago, that quantile regression is the go-to method for estimating probability distributions (because who knows what a probability distribution is?) Here's a fun link (https://www.reneshbedre.com/blog/pytorch-regression.html) to someone who I suspect knows better. — jbowman, Jan 04 '22 at 03:42
@jbowman What does it even mean for linear regression to be done via neural networks? — Dave, Jan 04 '22 at 03:47
In my skim of that link, it seemed like the author thought that neural networks with no hidden layers could determine nonlinear relationships, which is false unless the programmer specifies something like a spline. However, I do see reasons to show that neural networks are kind of an extension of GLMs, even if the implementation in a deep learning library is massive overkill when all we want to do is a linear regression with the usual $\hat\beta=(X^TX)^{-1}X^Ty$. — Dave, Jan 04 '22 at 03:58

score 2 · Answer 1 · answered Jan 03 '22 at 01:27

Yes, but be aware that regression is also a machine learning method. What you're looking to learn more about are loss functions, and for this problem you want a quantile loss function. Many libraries let you choose or specify the loss function. In sklearn, for example, an estimator may have a loss parameter, as SGDRegressor does.

See here for more info: Quantile regression: Loss function
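To make the quantile loss concrete, here is a minimal pure-Python sketch of the pinball (quantile) loss, plus a brute-force search for the constant prediction that minimizes it. The function names `pinball_loss` and `best_constant` and the toy data are made up for illustration and do not come from any library. The point is that minimizing this loss at level q recovers the q-th sample quantile, which is what ties the loss function to the conditional distribution of the target.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss at level q, with 0 < q < 1.

    Under-predictions are weighted by q, over-predictions by (1 - q).
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant(y, q):
    """Hypothetical helper: pick the observed value that, used as a
    constant prediction, gives the lowest pinball loss at level q."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), q))


# Toy data with one large outlier (illustrative only).
y = [1, 2, 3, 4, 5, 6, 7, 8, 100]

median_fit = best_constant(y, 0.5)   # 5, the sample median
upper_fit = best_constant(y, 0.75)   # 7, an upper quantile of the sample
```

In practice you would let a library do the fitting rather than brute-force it. In scikit-learn, for instance, `GradientBoostingRegressor(loss="quantile", alpha=0.9)` and `sklearn.linear_model.QuantileRegressor` both minimize this kind of loss, conditional on the features.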
