
(ML as in Maximum Likelihood and MAP as in Maximum A Posteriori)

I'm going through a course book on my own, and without really having peers to talk to I'm turning to Stack Exchange with these rather rudimentary questions. I can't tell if I'm overthinking or if I'm missing something rather obvious.

  • MAP/ML-based classification/inference is widely accepted even though it rests on unrealistic assumptions. Why?

So, here I assume the unrealistic assumptions are that we can model the feature distribution and the source distribution – i.e. that we treat them as random variables – which brings us to why this is widely accepted... First, we can work with random variables in a statistical framework, meaning it's convenient. Second, it's widely accepted because it works well, which we demonstrate with minimum error rates etc.

The alternative unrealistic core assumption is IID, but from my understanding MAP/ML doesn't necessarily mean we have to assume IID? Just because it's convenient to add up log-likelihoods doesn't mean we have to... but is this the actual right answer? We basically always assume IID, so is that the core assumption, rather than our feature and source spaces being random variables...?
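To check my own understanding, here is a minimal sketch (my own toy example, not from the course book) of a MAP decision rule in which the conditional IID assumption is exactly what lets the per-feature log-likelihoods be added up; the Gaussian class-conditionals, class names, and numbers are made-up assumptions for illustration:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical toy setup: two classes, one Gaussian class-conditional per feature.
# All parameters below are invented for illustration.
priors = {"A": 0.7, "B": 0.3}
params = {
    "A": {"mean": 0.0, "std": 1.0},
    "B": {"mean": 2.0, "std": 1.5},
}

def map_classify(x):
    """Pick the class maximising log prior + sum of per-feature log-likelihoods.

    Summing over features is the step that relies on the (conditional) IID
    assumption: p(x | c) factorises into a product, so its log becomes a sum.
    """
    scores = {}
    for c, prior in priors.items():
        log_lik = np.sum(norm.logpdf(x, loc=params[c]["mean"], scale=params[c]["std"]))
        scores[c] = np.log(prior) + log_lik
    return max(scores, key=scores.get)

x = np.array([1.8, 2.4, 0.9])   # one observation with three features
print(map_classify(x))          # MAP decision
# Dropping the log-prior term (or using equal priors) gives the ML decision instead.
```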

  • A significant effort is spent on learning distributions for each class. What are the two main practical problems? How do we mitigate them?

Enough data is one problem, because we're trying to model the true distribution with a distribution estimated from our input data. Tricky calculations are the other, and this is where I think I should mention IID: as a mitigation of the calculation problem we assume IID.
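As a rough illustration of the "enough data" problem (again my own sketch, not from the book): the number of parameters in a full joint Gaussian class-conditional grows quadratically with the number of features, while an independence (diagonal) assumption keeps it linear, which is one way the independence/IID assumption also eases both the data and the calculation burden:

```python
# Rough per-class parameter counts for a d-dimensional Gaussian class-conditional.
# Full covariance: d means + d*(d+1)/2 covariance entries (quadratic in d).
# Diagonal / independence assumption: d means + d variances (linear in d).
for d in (2, 10, 100):
    full = d + d * (d + 1) // 2
    diag = 2 * d
    print(f"d={d:>3}: full covariance needs {full} parameters, diagonal needs {diag}")
```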

What do you think? Am I on the right track?

1 Answer

  • "All models are wrong, but some are useful." - G.E.P. Box. If you wanted to have a realistic simulation of the universe, you would need to simulate each of its atoms, so you would need to have a computer larger than the universe itself. Every other simulation would be an approximation, making simplified assumptions. It would not be "correct" but would be doable. The point of statistical models is to approximate the observed phenomenons so that we can draw conclusions from the data, or make predictions. A model that would be too complicated would be impossible to interpret and hard to fit to the data. It is always about how realistic you need it to be and what simplifications are acceptable for the particular problem you are trying to solve.
  • The big simplification that statistics makes is treating the unobserved factors as random noise. For example, if you toss a coin, the result of the toss is a deterministic process guided by the laws of physics. The problem is that there are a lot of factors that lead to observing a particular result of a coin toss. Even a butterfly flying around might have changed the movement of the air and influenced the result. Often, though, all the unobserved factors taken together can be well approximated by a probability distribution, and we can use this to draw approximate conclusions about such phenomena (see the toy simulation after this list).
  • The point of the i.i.d. assumption is to be able to treat all the data as being "of the same kind"; otherwise you wouldn't be able to have a single model to explain them all, or the model would need to be overly complicated. As a side note, we often do not assume the samples are i.i.d., but only that they are exchangeable, which is a looser assumption. Again, it is about the "on average" conclusions; they wouldn't make sense for completely unrelated things, or for things that heavily influence each other.
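To illustrate the "unobserved factors as random noise" point from the second bullet, here is a toy simulation (my own sketch, not part of the original answer): the combined effect of many small unobserved influences behaves like simple noise, which is why a coin toss can be modelled as a fair Bernoulli variable even though the underlying physics is deterministic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each toss is decided by the combined effect of many tiny unobserved influences
# (hand position, air currents, ...). We model each influence as a small perturbation.
n_tosses, n_factors = 10_000, 1_000
influences = rng.uniform(-1.0, 1.0, size=(n_tosses, n_factors))
outcome = (influences.sum(axis=1) > 0).astype(int)   # 1 = heads, 0 = tails

print(outcome.mean())   # close to 0.5: the aggregate behaves like a fair coin
```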