
Are there any supervised ML techniques that take into account the distribution of the target variable? In statistics, quantile regression is a good way to model data based on the conditional distribution of the dependent variable.

I am curious whether there are any techniques in the ML literature that would be a good choice for modeling based on the conditional distribution of the dependent variable.

Sycorax
kms
  • What about a quantile regression neural network? – Dave Jan 02 '22 at 22:58
  • Doesn't the ML literature say anything about maximum likelihood? – BigBendRegion Jan 03 '22 at 00:19
  • Yes, ensemble techniques and neural networks can impose a distribution on the target. – msuzen Jan 03 '22 at 03:46
  • @MehmetSüzen could you elaborate on how they impose a distribution on the target variable? – kms Jan 03 '22 at 07:10
  • @kms There is no single way to achieve this. As mentioned in the answer, one way is through the loss function, but in deep learning that alone isn't sufficient; the output layer is usually what imposes the target distribution. The best approach is to inspect known library implementations, e.g. [h2o distributions](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/distribution.html). From classical statistics, see GLM link functions [here](https://stats.stackexchange.com/questions/48594/purpose-of-the-link-function-in-generalized-linear-model). – msuzen Jan 03 '22 at 08:19
  • @BigBendRegion - you would be amazed, shocked, and possibly horrified at how often I see ML people who haven't been taught the basics of statistics thinking that, for example, linear regression is done via neural nets, or, as I ran into a couple of weeks ago, that quantile regression is the go-to method for estimating probability distributions (because who knows what a probability distribution is?) Here's a fun link (https://www.reneshbedre.com/blog/pytorch-regression.html) to someone who I suspect knows better. – jbowman Jan 04 '22 at 03:42
  • @jbowman What does it even mean for linear regression to be done via neural networks? – Dave Jan 04 '22 at 03:47
  • @Dave - follow the link, you'll see :( – jbowman Jan 04 '22 at 03:52
  • In my skim of that link, it seemed like the author thought that neural networks with no hidden layers could determine nonlinear relationships, which is false unless the programmer specifies something like a spline. However, I do see reasons to show that neural networks are kind of an extension of GLMs, even if the implementation in a deep learning library is massive overkill when all we want to do is a linear regression with the usual $\hat\beta=(X^TX)^{-1}X^Ty$. – Dave Jan 04 '22 at 03:58
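Several comments above suggest imposing a distribution on the target through the model's output together with a likelihood-based loss (maximum likelihood). A minimal sketch of that idea, under assumptions of our own and not h2o's implementation: the target is taken to be Gaussian given x, and the "output layer" is just two linear heads, one for the mean and one for the log standard deviation, fitted by plain gradient descent on the average negative log-likelihood. The synthetic data, the names `gaussian_nll` and `fit_gaussian_mle`, and the learning rate are all hypothetical.

```python
import numpy as np

def gaussian_nll(params, x, y):
    """Average Gaussian negative log-likelihood, dropping the constant 0.5 * log(2 * pi)."""
    a0, a1, c0, c1 = params
    mu = a0 + a1 * x                       # mean head
    log_s = c0 + c1 * x                    # log-standard-deviation head (keeps s > 0)
    return float(np.mean(log_s + 0.5 * ((y - mu) * np.exp(-log_s)) ** 2))

def fit_gaussian_mle(x, y, lr=0.05, n_iter=5000):
    """Maximum likelihood by full-batch gradient descent on gaussian_nll."""
    a0 = a1 = c0 = c1 = 0.0
    for _ in range(n_iter):
        r = y - (a0 + a1 * x)
        inv_var = np.exp(-2.0 * (c0 + c1 * x))
        d_mu = -r * inv_var                # dNLL/dmu for each point
        d_log_s = 1.0 - r * r * inv_var    # dNLL/dlog_s for each point
        # All four gradients use the parameters from the start of this iteration.
        a0 -= lr * d_mu.mean()
        a1 -= lr * (d_mu * x).mean()
        c0 -= lr * d_log_s.mean()
        c1 -= lr * (d_log_s * x).mean()
    return a0, a1, c0, c1

# Hypothetical synthetic data: mean 1 + 2x, log standard deviation -0.5 + 0.5x.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 2000)
y = 1.0 + 2.0 * x + np.exp(-0.5 + 0.5 * x) * rng.standard_normal(2000)

params = fit_gaussian_mle(x, y)
a0, a1, c0, c1 = params
mu_hat = a0 + a1 * x
s_hat = np.exp(c0 + c1 * x)
# Conditional 0.9 quantile implied by the fitted Gaussian (z_0.9 is about 1.2816).
q90_hat = mu_hat + 1.2816 * s_hat
print(params)
print(np.mean(y <= q90_hat))   # should be near 0.9
```

Because the whole conditional distribution is modelled, any conditional quantile follows from the fitted mean and spread; the trade-off is that the Gaussian assumption has to be roughly right, which quantile regression does not require.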

1 Answer


Yes, but be aware that regression is also a machine learning method. What you're looking to learn more about are loss functions, and you want to use a quantile loss function. You can usually specify your own loss function. In sklearn, for example, an estimator may expose a `loss` parameter, as `SGDRegressor` does (its built-in losses do not include a quantile loss, but `GradientBoostingRegressor` accepts `loss='quantile'`).
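A minimal NumPy sketch of the quantile (pinball) loss, assuming the usual definition $\rho_\tau(r) = \max(\tau r, (\tau - 1) r)$ with residual $r = y - \hat y$. The helper names (`pinball_loss`, `best_constant`, `fit_linear_quantile`) and the synthetic data are hypothetical, and the full-batch subgradient descent is only a simple stand-in for a dedicated solver such as scikit-learn's `QuantileRegressor`. It shows that minimizing the loss over a constant lands on the empirical quantile, and that a linear fit at $\tau = 0.9$ leaves roughly 90% of the points below the fitted line.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss at level tau, 0 < tau < 1."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-predictions (r > 0) cost tau per unit, over-predictions cost (1 - tau).
    return float(np.mean(np.maximum(tau * r, (tau - 1.0) * r)))

def best_constant(y, tau):
    """Search the observed values for the constant prediction with the lowest pinball loss."""
    y = np.asarray(y, dtype=float)
    losses = [pinball_loss(y, c, tau) for c in y]
    return float(y[int(np.argmin(losses))])

def fit_linear_quantile(x, y, tau, n_iter=4000, lr0=0.5):
    """Fit q_tau(y | x) ~ intercept + slope * x by subgradient descent on the pinball loss."""
    m, sd = x.mean(), x.std()
    z = (x - m) / sd                           # standardizing keeps both parameters well scaled
    b, w = 0.0, 0.0
    for t in range(n_iter):
        r = y - (b + w * z)
        g = np.where(r > 0, -tau, 1.0 - tau)   # subgradient w.r.t. each prediction
        step = lr0 / np.sqrt(t + 1.0)          # diminishing step size
        b -= step * g.mean()
        w -= step * (g * z).mean()
    slope = w / sd
    return b - slope * m, slope                # back on the original x scale

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 2000)
# Noise standard deviation grows with x, so the conditional quantiles fan out.
y = 2.0 * x + (0.5 + x) * rng.standard_normal(2000)

icpt90, slope90 = fit_linear_quantile(x, y, tau=0.9)
icpt10, slope10 = fit_linear_quantile(x, y, tau=0.1)
print(best_constant(np.arange(1, 101), 0.9))   # a value in [90, 91]
print(np.mean(y <= icpt90 + slope90 * x))      # close to 0.9
print(slope10, slope90)                        # the 0.9 line is steeper than the 0.1 line
```

The two fitted lines have different slopes because the spread of y grows with x, which is exactly the conditional-distribution information that a single mean regression would miss.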

See here for more info: Quantile regression: Loss function

yes