Following on from a previously unanswered question: Gaussian processes are commonly applied to regression tasks in which the measurements carry normally distributed noise. But are there any recommended approaches for regression when the uncertainty is not necessarily Gaussian and may follow an arbitrary distribution? References to relevant literature, and especially to numerical packages tackling this problem, would be greatly appreciated.
-
Not sure I understand the question. 1) GP regression and Gaussian noise are separate concepts. One could formulate many regression models (including GP regression) with non-Gaussian noise. Are you asking about GP regression in particular, or regression more broadly? 2) By "arbitrary distribution function" do you mean some chosen family of distributions that's not Gaussian, or that the distribution family itself is unknown? Do you want to assume additive, i.i.d. noise? – user20160 Jan 21 '19 at 18:28
-
@user20160 Doesn't GPR inherently assume normally distributed noise? For example, adding noise is equivalent to adding a diagonal term to the covariance matrix of the observations. But that is inherently Gaussian noise (albeit heteroscedastic if desired). How do you propose accounting for non-Gaussian noise in GPR? I am asking about GPR specifically, although thoughts/extensions for regression in general are welcome. By an arbitrary distribution function I mean any distribution (e.g. one derived from sampling, or one that is well defined analytically). – Mathews24 Jan 22 '19 at 13:52
-
The trick is to define a latent function that represents the expected output, given the input. A GP prior is placed on the latent function. The observed outputs are modeled as the output of the latent function, plus non-Gaussian noise. The latent function trick is somewhat analogous to the way GP regression is extended to GP classification. – user20160 Jan 24 '19 at 07:07
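For concreteness, here is a minimal sketch of that latent-function construction using GPflow (assuming GPflow 2.x and its variational `VGP` model; the Student-t likelihood and the toy data are illustrative choices, not something prescribed in this thread):

```python
# Sketch: GP regression with a non-Gaussian (Student-t) likelihood.
# A GP prior is placed on a latent function f; observations are modeled as
# y = f(x) + heavy-tailed noise, and the posterior over f is approximated
# variationally. Assumes GPflow 2.x.
import numpy as np
import gpflow

# Toy data with heavy-tailed (non-Gaussian) noise
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)
Y = np.sin(X) + rng.standard_t(df=3, size=X.shape) * 0.3

# Latent GP prior + non-Gaussian likelihood on the observations
model = gpflow.models.VGP(
    data=(X, Y),
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.StudentT(),
)

# Fit the variational approximation and the hyperparameters
gpflow.optimizers.Scipy().minimize(
    model.training_loss, model.trainable_variables
)

# Posterior mean and variance of the latent function at new inputs
Xnew = np.linspace(0, 10, 50).reshape(-1, 1)
mean, var = model.predict_f(Xnew)
```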
1 Answer
There are several ways of modeling a non-Gaussian error distribution in a Bayesian fashion. But ultimately, it always comes down to how you think a priori about the distribution of your dependent variable/errors.
I would recommend the following: look at kernel density plots of your dependent variable. Is your dependent variable bounded to a specific interval (e.g. only positive values)? Is it possible to transform your variable by taking logarithms or differences (depending on whether you have cross-sectional or time-series data)? If you can by any chance transform it into a process that is approximately Gaussian, you will make your life easier.
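As a quick sketch of this diagnostic step (assuming SciPy and Matplotlib; the skewed toy data and variable names are purely illustrative):

```python
# Sketch: inspect a kernel density estimate of the dependent variable,
# before and after a log transform (suitable for strictly positive data).
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
y = rng.lognormal(mean=0.0, sigma=0.8, size=500)  # skewed, positive toy data

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax, data, title in [(axes[0], y, "raw y"),
                        (axes[1], np.log(y), "log(y)")]:
    grid = np.linspace(data.min(), data.max(), 200)
    ax.plot(grid, gaussian_kde(data)(grid))  # kernel density estimate
    ax.set_title(title)
plt.show()
```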
Otherwise, you may want to look into the BMA (Bayesian Model Averaging) literature. This literature explicitly takes model uncertainty into account by conditioning on the model in use,
\begin{equation} p(\theta| y, M_k) = \frac{p(y|\theta, M_k)p(\theta|M_k)}{p(y|M_k)}, \qquad k = 1,...,K \end{equation}
where the difficulty lies in computing $p(y|M_k)$, the marginal likelihood of the model. Most BMA packages address model uncertainty regarding the inclusion/exclusion of exogenous variables, but in principle the framework can be extended/adapted to a set of candidate distributions for the errors. Unfortunately, I am not aware of literature specifying different distributional assumptions. A good starting point in the BMA literature is Hoeting et al. (1999).
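Since I am not aware of a package that does this out of the box, here is a hand-rolled sketch of the idea: fit the same regression under two candidate error distributions (Gaussian and Student-t here, purely as examples), approximate each marginal likelihood $p(y|M_k)$ with the standard BIC approximation, and form posterior model weights. This is my own illustration under those assumptions, not code from the BMA literature cited above.

```python
# Sketch: "BMA over error distributions" for a linear model y = a + b*x + e.
# Each candidate model M_k differs only in the assumed error distribution.
# The log marginal likelihood is approximated via BIC:
#   log p(y | M_k) ~ loglik_k - (d_k / 2) * log(n)
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(0, 1, n)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=n) * 0.5  # heavy-tailed errors

def neg_loglik(params, dist):
    a, b, log_s = params          # intercept, slope, log scale
    resid = (y - a - b * x) / np.exp(log_s)
    if dist == "gaussian":
        return -np.sum(stats.norm.logpdf(resid) - log_s)
    # Student-t with df fixed at 3, as an example
    return -np.sum(stats.t.logpdf(resid, df=3) - log_s)

log_marg = {}
for dist, d in [("gaussian", 3), ("student_t", 3)]:  # d = number of params
    res = optimize.minimize(neg_loglik, x0=[0.0, 0.0, 0.0], args=(dist,))
    log_marg[dist] = -res.fun - 0.5 * d * np.log(n)  # BIC approximation

# Convert approximate log marginal likelihoods to model probabilities
m = max(log_marg.values())
probs = {k: np.exp(v - m) for k, v in log_marg.items()}
total = sum(probs.values())
probs = {k: v / total for k, v in probs.items()}
print(probs)  # posterior weight on each error distribution
```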
Another approach would be to use mixture models. These specify more than one distribution on the data, with weights attached to each component. They can be represented in a general way, such that $p(y_i) = \sum^K_{k=1} \eta_k p(y_i | \theta_k)$, where
\begin{equation} y_i \sim \begin{cases} T(\theta_1) & \text{if } S_i = 1\\ \quad\vdots \\ T(\theta_K) & \text{if } S_i = K, \end{cases} \end{equation}
where $T(\cdot)$ is any distribution with a set of parameters $\theta_k$. For example, if you have a multimodal distribution, a mixture model with distinct means may be appropriate. Distributions with heavy skewness can likewise be captured by a mixture of normals with distinct variances. But you can also specify different distributions for each mixture component. A good reference is the monograph by Frühwirth-Schnatter (2006).
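As a sketch of the mixture-of-normals case (assuming scikit-learn; the bimodal toy data are illustrative):

```python
# Sketch: fit a two-component mixture of normals to a bimodal variable.
# Each component k has its own mean and variance (theta_k) and weight eta_k.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Bimodal toy data: two normal components with distinct means
y = np.concatenate([rng.normal(-2.0, 0.5, 300),
                    rng.normal(3.0, 1.0, 700)]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(y)
print("weights (eta_k):", gm.weights_)
print("means:", gm.means_.ravel())
print("variances:", gm.covariances_.ravel())

# Posterior probability that each observation belongs to each component,
# i.e. Pr(S_i = k | y_i)
responsibilities = gm.predict_proba(y)
```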
I would strongly recommend looking at your data and starting from there when thinking about the statistical assumptions you want to make.
References:
Frühwirth-Schnatter, Sylvia (2006) Finite Mixture and Markov-Switching Models, New York: Springer-Verlag.
Hoeting, J. A., Madigan, D., Raftery, A. E. and Volinsky, C. T. (1999) Bayesian Model Averaging: A Tutorial, Statistical Science, 14(4): 382-401.

-
Would you happen to know of computational packages that apply BMA and can use arbitrary user-defined probability distributions? – Mathews24 Jan 19 '19 at 05:36
-
No, afaik the BMA packages out there are mostly concerned with model uncertainty arising from variable selection. – Louki Jan 19 '19 at 22:02
-
Could you possibly expand upon that and give references to such codes/examples in the answer? – Mathews24 Jan 20 '19 at 19:50
-
If you're interested in BMA from a variable-selection perspective, you should look into the BMS package in R. There's also a repository online, and I think MATLAB code too: http://bms.zeugner.eu . I think there's another package called "BMA", but I guess it's also just for linear models. – Louki Jan 20 '19 at 20:10
-
I don't understand how this answer addresses the question of non-Gaussian noise in regression. Maybe you could elaborate on the connection? – user20160 Jan 24 '19 at 07:11
-
As far as I understood the question, it was referring to uncertainty concerning the distributional assumptions of the error term. BMA is suited to taking model uncertainty into account (although it is mostly used in the context of linear processes with Gaussian noise and uncertainty regarding the inclusion/exclusion of variables), and mixture models are suited to classifying observations (or parameters) into one of the components, which need not be Gaussian. So it is another statistical approach to finding a 'suitable' distributional assumption for your data. – Louki Jan 24 '19 at 12:18