
When we use a deep neural network (DNN) to solve a one-dimensional regression problem, we can approximate the data distribution with the output of the DNN, as in the picture below.
My question is this: a DNN does not assume a Gaussian distribution, or any other distribution, by itself. It just knows what value to output when it sees an input. So how do you know the probability distribution of the DNN? For example, if someone asks what the probability is of a point appearing at $(5, 0)$, can a DNN answer this kind of question?

[Image: DNN regression fit (pic from https://medium.com/@sunnerli/dnn-regression-in-tensorflow-16cc22cdd577)]

Lion Lai

1 Answer


For many regression algorithms, not only neural networks, the model is that the data are distributed as $y \sim \mathcal{N}(f(x;\theta), \sigma^2)$, where $\theta$ are the model parameters and $\sigma^2$ is the variance of the distribution (often a hyperparameter).

Maximizing the log-likelihood of the data with respect to $\theta$ is equivalent to minimizing the mean squared error loss between the $y_i$ and $f(x_i;\theta)$.
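To spell the equivalence out, write the log-likelihood of $n$ i.i.d. observations under the model above:

$$\log \prod_{i=1}^{n} p(y_i \mid x_i; \theta) = \sum_{i=1}^{n} \log \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{\big(y_i - f(x_i;\theta)\big)^2}{2\sigma^2}\right) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \big(y_i - f(x_i;\theta)\big)^2.$$

The first term and the factor $\frac{1}{2\sigma^2}$ do not depend on $\theta$, so maximizing the log-likelihood over $\theta$ is the same as minimizing $\sum_i \big(y_i - f(x_i;\theta)\big)^2$, i.e. the (sum of) squared-error loss.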

Therefore, to compute the probability density at $(5,0)$, you would just evaluate the density of a Gaussian with mean $f(5; \theta)$ and variance $\sigma^2$ at $y = 0$, where $f$ is your neural network.
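As a concrete sketch: suppose the trained network happens to map $x = 5$ to $f(5) = 0.8$, and we fixed $\sigma^2 = 0.25$ as a hyperparameter (both numbers are made up for illustration; in practice $f(5)$ comes from a forward pass through your network). Then the density of the point $(5, 0)$ is just the normal density evaluated at $y = 0$:

```python
import math

def gaussian_pdf(y, mean, var):
    """Density of a normal distribution N(mean, var) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Hypothetical numbers: the network's output at x = 5, and the chosen
# noise variance (a fixed hyperparameter of the probabilistic model).
f_of_5 = 0.8
sigma2 = 0.25

# Density of the point (5, 0): evaluate N(f(5), sigma^2) at y = 0.
density = gaussian_pdf(0.0, f_of_5, sigma2)
```

Note that this is a probability *density*, not a probability: the probability of $y$ landing in some interval at $x = 5$ comes from integrating this density over that interval.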

shimao
  • Thanks for your answer. But I still have two questions. 1: Does applying a DNN to a regression problem also require the assumption that the data is Gaussian distributed? As far as I know, we just care about the mean squared error (MSE) between the output value and the ground-truth value; there is no Gaussian distribution involved. 2: How do I find out a DNN's multiple means and variances from its weights and biases only? Is this even possible? – Lion Lai Jan 30 '18 at 04:09
  • 1. Using MSE (L2) loss corresponds to the data being distributed normally. Using L1 loss corresponds to data being distributed according to the laplacian distribution. In general, there is a mapping between loss functions and probability distributions. 2. Not sure what you mean by a DNN's multiple means and variances. – shimao Jan 30 '18 at 04:11
  • 1. Are you referring to regularization? Can you add references for them? 2. After training a DNN model, all we have are the network's weights and biases. How can I calculate the density function from these numbers? Thank you. – Lion Lai Jan 30 '18 at 04:18
  • 1. No, I am referring to the prediction loss. L1 and L2 are just mathematical functions which can be applied to either the difference between prediction and ground truth $y-f(x)$, or also to the model parameters $\theta$. In this question only the former is relevant. 2. Given input $x$, you feed it through the network to produce $f(x)$. The density function is a gaussian distribution centered on $f(x)$ with variance $\sigma^2$. – shimao Jan 30 '18 at 04:24
  • Hi, I think this post is related to your answer, right? https://stats.stackexchange.com/questions/288451/why-is-mean-squared-error-the-cross-entropy-between-the-empirical-distribution-a/288453 – Lion Lai Jan 31 '18 at 02:54
  • Yes, that's what I meant. – shimao Jan 31 '18 at 02:58
  • Can you provide a concrete example of how to train a DNN that outputs a probability density function instead of a numeric value on a regression problem? Thanks – Lion Lai Jan 31 '18 at 04:07
  • If you train a DNN as usual, you can already interpret the output as a probability distribution. It is a Gaussian distribution: its mean is the output of the network, and its variance is a fixed hyperparameter. There is no additional step that you have to do. – shimao Jan 31 '18 at 04:14
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/72484/discussion-between-lion-lai-and-shimao). – Lion Lai Jan 31 '18 at 04:18
  • I think this is only applied to linear distribution. In nonlinear distribution problems, I think we can still use MSE as cost function, right? – Lion Lai Feb 01 '18 at 03:46
  • What is a linear vs nonlinear distribution? I have not heard these terms before. – shimao Feb 01 '18 at 15:29
  • In this http://www.statisticssolutions.com/assumptions-of-linear-regression/, the assumptions of linear regression are explained. My question is: even if the dataset doesn't fit those assumptions, can we still use a DNN to calculate the probability of a given point (e.g. $(5,0)$)? – Lion Lai Feb 02 '18 at 02:26
  • Yes, it still applies. The only assumption we need to apply the MSE loss and obtain our PDF is that the distribution of the target $y$ is a function of the input $x$ plus some Gaussian noise. That function $f$ doesn't have to be linear, and in this case it is modeled by a highly nonlinear neural network. – shimao Feb 02 '18 at 02:30
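To make the loss–distribution correspondence from the comments concrete: under Laplace noise, $y \sim \mathrm{Laplace}(f(x;\theta), b)$ with scale $b$, the log-density of one observation is

$$\log p(y \mid x; \theta) = -\log(2b) - \frac{|y - f(x;\theta)|}{b},$$

so maximizing the likelihood over $\theta$ amounts to minimizing $\sum_i |y_i - f(x_i;\theta)|$, the L1 loss — exactly parallel to the Gaussian/MSE case derived above.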