
In a regression setting, one wants to identify a model of a process of interest based on noisy measurements. The model usually looks like this: $$ y_i = f(x_i, \theta_1) + \varepsilon_i.$$ Here, $y_i$ and $x_i$ are measurements, $\varepsilon_i\sim P_\varepsilon(x_i, \theta_2)$ is noise (summarizing all kinds of disturbances), and $\theta_1, \theta_2$ are parameters to be identified by the regression procedure. $P_\varepsilon(x_i, \theta_2)$ is the distribution of the noise, which is sometimes also to be learned from the data and might depend on the value of $x$.
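To fix notation, here is a toy instance of this model; the particular $f$ and noise law below are purely hypothetical choices for illustration.

```python
# Toy instance of y_i = f(x_i, theta_1) + eps_i, with eps_i ~ P_eps(x_i, theta_2).
# The particular f and noise distribution are hypothetical, chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)

theta_1 = (1.5, 0.7)   # parameters of the process model f(x, theta_1)
theta_2 = (0.1, 0.3)   # parameters of the noise distribution P_eps(x, theta_2)

def f(x, theta):
    a, b = theta
    return a * np.sin(b * x)          # an arbitrary smooth "true" process

def noise(x, theta):
    c, d = theta
    # Input-dependent (heteroskedastic) Gaussian noise: larger |x|, noisier measurement.
    return rng.normal(loc=0.0, scale=c + d * np.abs(x))

x_i = rng.uniform(-3, 3, size=100)            # measured inputs
y_i = f(x_i, theta_1) + noise(x_i, theta_2)   # noisy measurements
```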

A usual approach in deep learning appears to be to take

  1. a super complex and over-parameterized model $f(x, \theta_1)$ for the process, e.g., a deep neural network, and
  2. a super simple and under-parameterized model $P_\varepsilon(x, \theta_2)$, e.g., $P_\varepsilon(x, \theta_2) = \mathcal{N}(0, \sigma_\varepsilon^2)$ with only the (constant) variance to be learned (a sketch of this combination follows the list).
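To make (1) and (2) concrete, here is a minimal sketch of the combination I mean; the architecture, data, and hyperparameters are made up, and the only point is that minimizing the MSE amounts to a maximum-likelihood fit under a constant-variance Gaussian noise model.

```python
# Minimal sketch of the "usual" setup: an over-parameterized network f, trained with MSE,
# which implicitly assumes eps ~ N(0, sigma^2) with constant variance.
# Everything here (architecture, data, hyperparameters) is invented for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data from some unknown process plus noise.
x = torch.linspace(-3, 3, 200).unsqueeze(1)
y = torch.sin(2 * x) + 0.3 * torch.randn_like(x)

# 1. Complex, over-parameterized process model (a small MLP standing in for a deep net).
f = nn.Sequential(
    nn.Linear(1, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# 2. Simple noise model: minimizing MSE is the maximum-likelihood fit under
#    zero-mean Gaussian noise with constant (input-independent) variance.
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(f.parameters(), lr=1e-3)

for step in range(2000):
    opt.zero_grad()
    loss_fn(f(x), y).backward()
    opt.step()

# The constant sigma^2 is then, at best, estimated after the fact from the residuals.
sigma2_hat = ((y - f(x)).detach() ** 2).mean().item()
print(f"estimated constant noise variance: {sigma2_hat:.3f}")
```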

Now I know that there are many approaches with more complex noise models (Gaussian process regression comes to mind), but the above appears to be a standard approach to me.

Questions:

  • Is my impression of what's currently "usually" done in deep learning wrong? Do people routinely use complex noise models? If so, which / how?
  • If the above depiction is indeed somewhat correct, how is it possible to separate signal and noise in any meaningful way if the model of the noise is so unrealistic? For instance, whether measurements in a certain region are assumed reliable or not should have an effect on the estimate of $f(x)$ (since that scales the influence of the prior over $f(x)$). A rough sketch of the kind of richer noise model I have in mind follows below.
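For concreteness, the richer noise model I have in mind would look roughly like the sketch below: a network that also predicts an input-dependent variance and is trained by maximizing a Gaussian likelihood. This is purely illustrative (the class and function names are my own invention), and I am not claiming it is standard practice.

```python
# Sketch of a richer noise model: the network predicts both f(x) and an input-dependent
# variance sigma^2(x), and is trained with a Gaussian negative log-likelihood instead of
# plain MSE. Purely illustrative.
import torch
import torch.nn as nn

class HeteroskedasticNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                                  nn.Linear(64, 64), nn.ReLU())
        self.mean_head = nn.Linear(64, 1)     # estimate of f(x)
        self.logvar_head = nn.Linear(64, 1)   # estimate of log sigma^2(x)

    def forward(self, x):
        h = self.body(x)
        return self.mean_head(h), self.logvar_head(h)

def gaussian_nll(y, mean, logvar):
    # Negative log-likelihood of y under N(mean, exp(logvar)), up to a constant.
    return 0.5 * (logvar + (y - mean) ** 2 / logvar.exp()).mean()

# Usage on toy data:
net = HeteroskedasticNet()
x = torch.linspace(-3, 3, 50).unsqueeze(1)
y = torch.sin(2 * x) + 0.3 * torch.randn_like(x)
mean, logvar = net(x)
loss = gaussian_nll(y, mean, logvar)   # would be minimized in place of the MSE
```

With such a loss, observations in regions the model considers noisy are automatically down-weighted, which is exactly the interaction between the noise model and the estimate of $f(x)$ that I am asking about.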
asked by jhin
  • Very interesting question! The answer also depends on where "machine learning" begins and ends. Probabilistic approaches (say, some Bayesian non- or semi-parametrics) that don't even mention machine learning can still be considered as refinements of it. As a side comment, the normal distribution is sometimes an amazingly accurate representation of our prior knowledge/uncertainty about the noise, especially if the noise sources are known to be many and different. – pglpm Jul 06 '20 at 11:30
  • @pglpm re normal distribution: I agree that the normality assumption is probably the least restrictive part here, it's more the homoskedasticity and, possibly, the zero-mean assumption that bothers me. Upon further reflection, something like Gaussian process regression is probably reasonably close to what I am looking for. The question then remains whether my depiction of current deep learning approaches is reasonably accurate and if yes, why that works as well as it does. :) – jhin Jul 06 '20 at 11:41
  • I rephrased the question to focus on current deep learning approaches, not ML in general. – jhin Jul 06 '20 at 11:44
  • As far as I understand the zero-mean can simply be achieved by de-trending, that is, redefining the zero of your signal. Maybe the first two chapters of [Bretthorst's book](https://bayes.wustl.edu/glb/book.pdf) can give you some inspiration or insight into a bigger picture? – pglpm Jul 06 '20 at 11:55
  • Modeling the noise becomes more relevant when the data set is small and/or noise is heavy-tailed. In general people use deep learning for large data sets and are more focused on minimizing the error without getting stuck on the error surface. You may have some information on noise modeling on my answer here: https://stats.stackexchange.com/questions/378274/how-to-construct-a-cross-entropy-loss-for-general-regression-targets/445761#445761 – Cagdas Ozgenc Jul 06 '20 at 12:10
  • 1
  • @pglpm That's an interesting book and I will have a look into it, thanks! (I only recently discovered Jaynes' book and was very intrigued.) Regarding detrending: sure, you can do that - but in doing so, you're essentially saying "this part belongs to the noise and this part belongs to the signal", i.e., you're reformulating your model implicitly. E.g., if your assumption is that f(x) is very smooth, then you might want to allow for an offset in the noise in some regions, no? – jhin Jul 06 '20 at 12:12
  • @CagdasOzgenc that is a great answer (+1'd) and an interesting approach which I wasn't aware of (I'm not working with neural networks myself, as is probably quite obvious). However, is that what people usually do in the classical, successful applications (face/letter/voice/image recognition)? – jhin Jul 06 '20 at 12:22
  • Absolutely agree with you about mean, noise, and signal. The mean of the noise is important in general; it becomes less so only in special situations, eg when you know a priori that the signal is harmonic or periodic, or similar. – pglpm Jul 06 '20 at 13:49

0 Answers