
In the frequentist paradigm, regression analysis in its most general form is given by:

$$y_i=E(y_i|X)+\epsilon_i$$

where $E(y_i|X)$ is the conditional expectation of $y_i$ given $X$, and $X$ is some set of variables.

As far as I understand, this formula is essentially based on the fact that if we have a relation $$y_i=h(X)+\epsilon_i$$ then the conditional expectation is the choice of $h$ that minimizes the expected squared error: $$E(y_i|X)=h^*(X),\quad\text{where } h^*=\arg\min_h E\left[(y_i-h(X))^2\right]$$
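
As far as I can tell, this is because for any function $h$ the expected squared error decomposes as $$E\left[(y_i-h(X))^2\right]=E\left[(y_i-E(y_i|X))^2\right]+E\left[(E(y_i|X)-h(X))^2\right]$$ since the cross term has conditional mean zero (tower property); the right-hand side is then minimized by taking $h(X)=E(y_i|X)$.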

However, I think Bayesians would prefer not to think in terms of expectations, since these are not fundamental, and would instead work with the full conditional distribution $$P(y|X),$$ which they would then analyse using Bayes' theorem and other tools. It seems to me that the whole concept of an "error" as used in regression is alien to Bayesians. Am I right?
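
For example (as I understand the Bayesian workflow; here $\theta$ denotes the model parameters and $\mathcal{D}$ the observed data, notation I am introducing just for this sketch), one would specify a likelihood $p(y|X,\theta)$ and a prior $p(\theta)$, and compute $$p(\theta|\mathcal{D})\propto p(\mathcal{D}|\theta)\,p(\theta),\qquad p(y|X,\mathcal{D})=\int p(y|X,\theta)\,p(\theta|\mathcal{D})\,d\theta,$$ rather than reporting a single conditional mean plus an error term.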

So what do Bayesians think about the equation $y_i=E(y_i|X)+\epsilon_i$?

user56834
  • I'm not knowledgeable on Bayesian statistics (so if anything I wrote is wrong please correct me!), but residuals matter in ordinary linear models as a byproduct of the Gaussianity assumption, in contrast to the GLM framework, where the distribution of the residuals is not central to the method. Also, imposing priors on coefficients is the Bayesian equivalent of regularization. And, lastly, the inference can be done in a fully Bayesian way as well, focusing on meaningful effects instead of point nulls. – Firebug Oct 25 '17 at 17:44
  • This question conflates regression with least squares fitting. They are not the same! Regression is vastly more general than OLS and certainly is not confined to one philosophy, one interpretation of probability, one model, one loss function, or one area of statistical practice. – whuber Oct 25 '17 at 17:50
  • @whuber, I think you mean Least Squares, rather than Ordinary Least Squares? I think OLS is specifically related to linear regression, and the equation I wrote is more general. In any case I suspect you still stand behind your point, so could you say then what you think "regression" means? Because I am under the impression that the equation $y_i=E(y_i|X)+\epsilon_i$ more or less captures exactly what regression is, and I'm also under the impression that Bayesians, instead of calculating a conditional expectation, would want to calculate a posterior distribution. – user56834 Oct 25 '17 at 18:18
  • I appreciate your generosity in overlooking my failure to distinguish OLS from LS. At https://stats.stackexchange.com/questions/233013/regression-definition/233049#233049 I quote one broad characterization of regression given by Mosteller and Tukey. As a model, we may view "regression" quite generally as estimating the conditional probability distribution of $y$ with respect to $X$. A Bayesian might be willing to assign a personal prior probability to the joint distribution of $(X,y)$ whereas others would either not use a prior or would base it on other information. – whuber Oct 25 '17 at 18:33
  • @whuber, that is clarifying. To connect this to my question: regression would then be characterized by finding $P(y_i|X)$. However, the equation I originally posted, $y_i=E(y_i|X)+\epsilon_i$, which is what I've been taught is the most general form of regression, is in fact far less general than estimating $P(y_i|X)$. The reason is that we effectively throw away all the information that $X$ can tell us about $y_i$, EXCEPT what the mean of $y_i$ would be; i.e. we take the distribution $P(y_i|X)$ and throw away everything but the expectation. – user56834 Oct 26 '17 at 06:51
  • This last comment is incorrect: in writing $y_i=E(y_i|X)+\epsilon_i$ you specify entirely $y_i$ and hence the law of $y_i$, including the relation between $\epsilon_i$ and $X$ [which should be $X_i$]. – Xi'an Oct 26 '17 at 06:56
  • @Xi'an, firstly, I don't think it should necessarily be $X_i$, because data from other samples helps you estimate the relation between $y_i$ and $X_i$. This is at least how I think Bayesians think about this. Also, you don't really explain why you think that the equation specifies $y_i$ entirely, even though I argued that you throw away all information except the mean. – user56834 Oct 26 '17 at 07:00
  • And to follow on @whuber's comments, there is nothing Bayesian or non-Bayesian in the question as stated, since the model is not precisely specified. What is there to estimate? The posterior distribution is _not_ the law of $y_i$ given $X$. – Xi'an Oct 26 '17 at 07:00
  • As a generative model, $y_i=E(y_i|X)+\epsilon_i$ means that $y_i$ is produced by the computation of the expectation term, given $X$, and by the realisation of the random variable $\epsilon_i$, whose law must be completely specified, given $X$, to generate this random variable. – Xi'an Oct 26 '17 at 07:06
  • @Xi'an, "The posterior distribution is not the law of $y_i$ given $X$". The posterior distribution of $y_i$ given $X$ is simply $pdf(y_i|X)$ Indeed this is not the same as $y_i=E(y_i|X)+\epsilon_i$. The first one contains more information (maximal information in fact, if no approximations are made), than the latter, simply because the first one determines the latter, but the latter is consistent with many different posterior distributions. But from your second comment I suspect you're saying that in $y_i=E(y_i|X)+\epsilon_i$ the distribution of $\epsilon_i$ is conditional on $X$? – user56834 Oct 26 '17 at 08:07
  • No, in a Bayesian model, the posterior is not the distribution of the observation. – Xi'an Oct 26 '17 at 08:14

0 Answers