In our machine learning class, we were given an example of a naive Bayes classifier. Say you classify a day as good/bad depending on two conditions (the "X" input): weather ($X_1$: hot/cold) and wind ($X_2$: high/low). Using Bayes' theorem and the naive (conditional independence) assumption: $$P(Y = \text{good} \mid X_1, X_2) = \frac{P(X_1, X_2 \mid Y)\,P(Y)}{P(X_1, X_2)} = \frac{P(X_1 \mid Y)\,P(X_2 \mid Y)\,P(Y)}{P(X_1, X_2)}$$
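For concreteness, here is roughly how I picture the classifier being estimated from counts (a toy sketch with made-up data, just to fix ideas):

```python
from collections import Counter

# Toy training data: (weather, wind, label) -- made-up values just to fix ideas
days = [
    ("hot", "low", "good"), ("hot", "high", "bad"),
    ("cold", "low", "good"), ("cold", "high", "bad"),
    ("hot", "low", "good"), ("cold", "high", "good"),
]

label_counts = Counter(y for _, _, y in days)

def p_label(y):
    # P(Y) estimated as the relative frequency of the label
    return label_counts[y] / len(days)

def p_feature_given_label(index, value, y):
    # P(X_i = value | Y = y) estimated from counts within class y
    in_class = [d for d in days if d[2] == y]
    return sum(d[index] == value for d in in_class) / len(in_class)

def score(y, weather, wind):
    # Numerator of Bayes' theorem under the naive assumption:
    # P(X1|Y) * P(X2|Y) * P(Y); the denominator P(X1, X2) is the same
    # for every label, so it cancels when comparing labels
    return (p_feature_given_label(0, weather, y)
            * p_feature_given_label(1, wind, y)
            * p_label(y))

print(max(["good", "bad"], key=lambda y: score(y, "hot", "low")))
```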
Now, assuming $X \mid Y$ follows a binomial distribution, we're told the conjugate prior is a Beta distribution. However, isn't this prior a distribution over $X$ itself and not over $Y$? $Y$ here is just a category, so how does $P(Y)$ denote a prior distribution for $X$? I understand this in the coin-flip case, where we talk about a parameter $\theta = P(\text{head})$ and how it follows that $$P(\theta \mid D) \propto P(D \mid \theta)\,P(\theta)$$
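Concretely, assuming a $\mathrm{Beta}(\alpha, \beta)$ prior and data $D$ with $h$ heads in $n$ flips: $$P(\theta \mid D) \propto \underbrace{\theta^{h}(1-\theta)^{n-h}}_{P(D \mid \theta)} \cdot \underbrace{\theta^{\alpha-1}(1-\theta)^{\beta-1}}_{P(\theta)} = \theta^{h+\alpha-1}(1-\theta)^{n-h+\beta-1},$$ which is the kernel of a $\mathrm{Beta}(\alpha+h,\ \beta+n-h)$ distribution.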
So we get an overall Beta distribution. However, $Y$ is not a parameter of $X$, so how does its distribution allow us to get MAP estimates? I'd be grateful if someone could explain this to me.
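For reference, in the coin-flip case I would maximize the posterior above (with the same $\mathrm{Beta}(\alpha, \beta)$ prior) to get $$\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} P(\theta \mid D) = \frac{h + \alpha - 1}{n + \alpha + \beta - 2},$$ but I don't see what the analogous parameter would be when the "prior" is $P(Y)$.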