
Assume we have the posterior distribution $P(w \mid D, \theta)$ of the linear regression model $y = w^T x$, where $D = \{(x_i, y_i)\}_{i \in \{1,\dots,n\}}$, $n$ is the number of data instances, and $\theta$ is the set of hyperparameters. For a test point $\hat{x}$, we compute the posterior predictive distribution as follows: $$P(\hat{y} \mid \hat{x}, D, \theta) = \int P(\hat{y} \mid \hat{x}, w)\, P(w \mid D, \theta)\, dw.$$ Since $w$ takes continuous values, how do we "discretize" the values of $w$?

Assume we already did that, and we have several values of $w$, $\{w_1,\dots,w_k\}$, with $P(w = w_j \mid D,\theta), j \in \{1,\dots,k\}$. Then the predictions based on these different values of $w_j$ must be associated with their posterior probabilities. For instance, I'd trust the prediction $y_1$ (i.e., based on $w_1$) more than $y_3$, since $P(w = w_1 \mid D,\theta) > P(w = w_3 \mid D,\theta)$. How can such confidence be reflected in the predicted $y$?

rando
    Please add the [tag:self-study] tag & read its [wiki](https://stats.stackexchange.com/tags/self-study/info). Then tell us what you understand thus far, what you've tried & where you're stuck. We'll provide hints to help you get unstuck. Please make these changes as just posting your homework & hoping someone will do it for you is grounds for closing. – kjetil b halvorsen Sep 28 '21 at 22:51
  • I'm not sure what you mean by discretising $w$. The random variable $w$ has a domain (or event space) that is continuous in most regression analyses. – stephematician Sep 29 '21 at 00:23
  • @kjetilbhalvorsen It is true that it is a self study but this is not a homework. I ask this question to help me better understand what I am self-studying. – rando Sep 29 '21 at 01:05
  • @stephematician Assume $w \in [0,1000]$. It is not practical to search for all $w$'s as it has infinite possibility. One possible way is to sample $V$ values that are representative for the all possible (infinite) values. – rando Sep 29 '21 at 01:07
  • @rando Ok, my understanding is that sampling and discretisation are different concepts. I'm assuming you mean to ask: if you can obtain a finite sample from the posterior of $w$ - how do you describe predictions for some new value of $x$? You also need to know the distribution of your new $x$ for this. – stephematician Sep 29 '21 at 04:23
  • Check https://stats.stackexchange.com/questions/252577/bayes-regression-how-is-it-done-in-comparison-to-standard-regression – Tim Sep 29 '21 at 06:09

1 Answer


A common approach is to approximate the posterior (as opposed to computing the full posterior exactly). We usually model the posterior with an "easy to work with" distribution, for example one that can be described by a finite number of parameters.

A common solution is to model the posterior as a normal distribution, so for each parameter we only need to find $\mu$ and $\sigma$. In this case the prior is also modeled as a standard normal distribution, and the (approximate) posterior is learned via variational inference.
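As a side note, for the simplest conjugate case (Gaussian prior, Gaussian noise) the posterior over $w$ is *exactly* normal and has a closed form, so no variational approximation is even needed. A minimal sketch with hypothetical toy data (all names and values here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: y = w * x + noise, with true weight w = 2.
n = 200
X = rng.normal(size=(n, 1))
noise_var = 0.25
y = X @ np.array([2.0]) + rng.normal(scale=np.sqrt(noise_var), size=n)

# Conjugate case: a Normal prior N(0, prior_var * I) with Gaussian noise
# yields an exactly Normal posterior P(w | D) = N(mu_N, Sigma_N), where
#   Sigma_N = (X^T X / noise_var + I / prior_var)^{-1}
#   mu_N    = Sigma_N X^T y / noise_var
prior_var = 10.0
Sigma_N = np.linalg.inv(X.T @ X / noise_var + np.eye(1) / prior_var)
mu_N = Sigma_N @ X.T @ y / noise_var

print(mu_N)  # should land close to the true weight 2.0
```

Variational inference becomes necessary when the model is not conjugate and the exact posterior has no closed form; the normal family is then fit to it by optimizing the ELBO.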

Once you have learned the posterior, you can make predictions by computing the integral over the posterior distribution, i.e. a weighted average of the predictions with weights given by the posterior. In practice, computing the integral exactly is intractable (there are infinitely many values), so the average is approximated by sampling from the posterior (in this setup, sampling each parameter from its normal distribution).
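This Monte Carlo averaging directly answers the "how to weight predictions by posterior confidence" part of the question: drawing samples from the posterior automatically represents high-probability values of $w$ more often, so a plain average over sampled predictions *is* the posterior-weighted average. A sketch for a single scalar weight, assuming a hypothetical fitted posterior $q(w) = \mathcal{N}(\mu, \sigma)$ (the numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical approximate posterior over a single weight w,
# e.g. obtained from variational inference: q(w) = Normal(mu, sigma).
mu, sigma = 2.0, 0.5

# New test input (scalar regression y = w * x for simplicity).
x_new = 3.0

# Monte Carlo estimate of the posterior predictive:
#   E[y | x_new, D] ≈ (1/S) * sum_s (w_s * x_new),  w_s ~ q(w).
# High-posterior values of w are drawn more often, so the plain
# mean below is already the posterior-weighted prediction.
S = 10_000
w_samples = rng.normal(mu, sigma, size=S)
y_samples = w_samples * x_new

y_mean = y_samples.mean()  # close to mu * x_new = 6.0
y_std = y_samples.std()    # close to sigma * |x_new| = 1.5
```

The sample standard deviation `y_std` is a bonus: it quantifies the predictive uncertainty inherited from the posterior over $w$.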

ofer-a