1

Imagine a task where I get observations of a moving person and need to determine whether the person is walking, running, or standing. I want to apply the Bayes decision rule, which assigns $x$ to class $w_j$ if:

$p(x|w_j)p(w_j) > p(x|w_k)p(w_k), k = 1,..., C; k \neq j$

In our example we have 3 classes, so $C=3$.

We then estimate $p(x|w_j)$, for example by fitting the parameters of a normal distribution to each class.

One thing I don't understand: what should the probabilities $p(w_j)$ be? I can't say how likely it is that the person is standing or walking, so I guess $p(w_j) = 1/3$ for $j=1,2,3$. Am I right?

maximus

2 Answers

1

Yes. The prior p(w) is your "domain knowledge" or "belief" that does not depend on any observation. A uniform distribution is the most agnostic approach if there is nothing else you can assume about the environment.

If you are developing some computer vision assistance system for athletes, you might want to give p(running) a higher probability. If you're building something related to security at an airport, p(running) is probably going to be lower.
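To make the effect of the prior concrete, here is a minimal sketch. Everything numeric in it is an assumption for illustration: a single speed feature in m/s, the Gaussian means and standard deviations in `CLASS_MODELS`, and the three prior tables. Only the rule itself, picking the class with the largest $p(x|w_j)p(w_j)$, comes from the question.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical class-conditional models p(x | w): (mean, std) of speed in m/s.
CLASS_MODELS = {
    "standing": (0.0, 0.3),
    "walking": (1.4, 0.5),
    "running": (3.5, 1.0),
}


def decide(x, priors):
    """Return the class that maximizes p(x | w) * p(w)."""
    scores = {
        label: gaussian_pdf(x, mean, std) * priors[label]
        for label, (mean, std) in CLASS_MODELS.items()
    }
    return max(scores, key=scores.get)


# Hypothetical priors for three deployment settings.
uniform = {"standing": 1 / 3, "walking": 1 / 3, "running": 1 / 3}
athletics = {"standing": 0.1, "walking": 0.2, "running": 0.7}
airport = {"standing": 0.45, "walking": 0.5, "running": 0.05}

x = 2.3  # a speed that lies between typical walking and running
for name, priors in [("uniform", uniform), ("athletics", athletics), ("airport", airport)]:
    print(name, decide(x, priors))
```

With these made-up numbers, the same observation is labelled running under the uniform and athletics priors and walking under the airport prior.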

The best you can do is to learn this distribution from real data. If you have logs or recordings of measurements from a real environment, count how many seconds people spend standing, walking, and running, and use the resulting proportions as an empirical estimate of the prior.

If no such data is available and you can't make any educated assumptions, go with the uniform distribution.
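The counting approach could look roughly like this. It is a sketch under assumptions: the log is taken to be a list with one activity label per second, the sample data is invented, and the optional `smoothing` pseudo-count is an addition that is not part of the answer above (it just keeps a class that never appears in the log from getting probability zero).

```python
from collections import Counter

CLASSES = ["standing", "walking", "running"]


def estimate_priors(labels, classes, smoothing=0.0):
    """Estimate p(w) as the relative frequency of each class label.

    labels: one label per logged second (assumed format).
    smoothing: pseudo-count added to every class; 0.0 gives plain proportions.
    """
    counts = Counter(labels)
    total = len(labels) + smoothing * len(classes)
    if total == 0:
        raise ValueError("no data and no smoothing: cannot estimate priors")
    return {c: (counts[c] + smoothing) / total for c in classes}


# Invented example log: 50 s standing, 40 s walking, 10 s running.
log = ["standing"] * 50 + ["walking"] * 40 + ["running"] * 10
priors = estimate_priors(log, CLASSES)
print(priors)
```

Calling `estimate_priors([], CLASSES, smoothing=1.0)` falls back to the uniform distribution, which matches the advice for the case where no data is available.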

Pavel
  • Note that uniform priors are not so totally uninformative! https://stats.stackexchange.com/a/20535/247274 – Dave Jul 31 '20 at 10:56
1

When there are three competing hypotheses $w_1, w_2, w_3$ that are assumed to be equally likely to be true, then the Bayesian decision rule: decide in favor of the hypothesis which has the maximum a posteriori probability $$p(w_i\mid x) = \frac{p(x\mid w_i)p(w_i)}{p(x)}\tag{1},$$ is the same as the frequentist decision rule: decide in favor of the hypothesis for which $p(x\mid w_i)$ is largest. Note that on the right side of $(1)$, $p(w_i) = \frac 13$ for $i=1,2,3$ while $p(x)$ is the same regardless of whether $i$ equals $1, 2$ or $3$. So, instead of laboriously calculating the right side of $(1)$ and comparing $p(w_1\mid x), p(w_2\mid x), p(w_3\mid x)$ to see which is the largest, the canny Bayesian can skip all that and simply compare $p(x\mid w_1), p(x\mid w_2), p(x\mid w_3)$, and thus arrive at the same decision, by the same calculation, as if he were an avowed frequentist.

If the three hypotheses are not assumed to be equally likely to be true, the canny Bayesian can look at $(1)$ and choose to just compute $p(x\mid w_1)p(w_1)$, $p(x\mid w_2)p(w_2)$, and $p(x\mid w_3)p(w_3)$ and choose the largest; no need to compute $p(x)$ and divide each $p(x\mid w_i)p(w_i)$ by $p(x)$ before choosing the largest.
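A small numerical check of both points, as a sketch: the likelihood values and the non-uniform prior below are invented, and `map_decision` is a hypothetical helper name. It computes the decision once from the products $p(x\mid w_i)p(w_i)$ and once from the full posteriors in $(1)$, so the two can be compared.

```python
def map_decision(likelihoods, priors):
    """Return (decision from p(x|w)p(w), decision from p(w|x), posteriors)."""
    joint = {w: likelihoods[w] * priors[w] for w in likelihoods}
    evidence = sum(joint.values())  # p(x), identical for every hypothesis
    posterior = {w: joint[w] / evidence for w in joint}
    by_joint = max(joint, key=joint.get)
    by_posterior = max(posterior, key=posterior.get)
    return by_joint, by_posterior, posterior


# Invented likelihoods p(x | w_i) for one fixed observation x.
likelihoods = {"w1": 0.02, "w2": 0.05, "w3": 0.03}

# Equal priors: the decision coincides with picking the largest likelihood.
uniform = {"w1": 1 / 3, "w2": 1 / 3, "w3": 1 / 3}
print(map_decision(likelihoods, uniform)[:2])
print(max(likelihoods, key=likelihoods.get))

# Unequal (invented) priors: the decision can change, but dividing by p(x)
# still never changes which hypothesis is largest.
skewed = {"w1": 0.7, "w2": 0.1, "w3": 0.2}
print(map_decision(likelihoods, skewed)[:2])
```

Dividing by $p(x)$ rescales all three products by the same positive constant, so the two decisions in each printed pair always agree; here the equal-prior case selects `w2` and the skewed prior selects `w1`.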

Dilip Sarwate