I understand mathematically why MAP is not invariant under reparameterization, but I don't really understand it intuitively.
To help me out, my professor gave me an example where reparameterizing MAP "compressed the distribution", resulting in a different maximum. This made sense, but it still didn't give me much intuition about what's going on.
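To make the effect concrete, here's a small numerical sketch of the kind of example I think he meant (the Beta(2, 5) posterior and the log-odds transform are my own stand-ins, not his actual example):

```python
import numpy as np
from scipy.stats import beta

# Hypothetical posterior over a probability p: Beta(2, 5).
a, b = 2.0, 5.0

# MAP in the p-parameterization: argmax of the Beta density over a fine grid.
p_grid = np.linspace(1e-6, 1 - 1e-6, 200001)
p_map = p_grid[np.argmax(beta.pdf(p_grid, a, b))]   # analytic mode: (a-1)/(a+b-2) = 0.2

# Reparameterize to log-odds: theta = log(p / (1 - p)).
# Change of variables: f_theta(theta) = f_p(p) * |dp/dtheta| = f_p(p) * p * (1 - p).
theta_grid = np.linspace(-10.0, 10.0, 200001)
p_of_theta = 1.0 / (1.0 + np.exp(-theta_grid))       # inverse transform (sigmoid)
f_theta = beta.pdf(p_of_theta, a, b) * p_of_theta * (1.0 - p_of_theta)
theta_map = theta_grid[np.argmax(f_theta)]
p_from_theta_map = 1.0 / (1.0 + np.exp(-theta_map))  # map the theta-MAP back to p

print(f"MAP in p-space:             {p_map:.4f}")             # ~0.2000
print(f"theta-MAP mapped back to p: {p_from_theta_map:.4f}")  # ~0.2857
```

Same prior, same data, yet the two parameterizations point at different "best" values of p, which is exactly what I can't reconcile with my intuition below.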
If I understand correctly, MAP essentially combines a prior with some given information (observed values) to produce a posterior distribution over the model parameters, and then returns its maximum. What confuses me is that neither "what you know beforehand" nor the given information changes under a reparameterization: no information has been added to or removed from the system, so why does the math suggest a different model?
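For reference, the change-of-variables formula as I understand it: if $\varphi = g(\theta)$ is a smooth, invertible reparameterization, the posterior density transforms as

$$p_\varphi(\varphi \mid x) \;=\; p_\theta\!\big(g^{-1}(\varphi) \mid x\big)\,\left|\frac{d}{d\varphi}\,g^{-1}(\varphi)\right|,$$

so I can see *mechanically* that the Jacobian factor can move the maximum; I just don't see intuitively why it should, given that nothing about the problem has changed.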
In other words, if there's no change in information and the models themselves are the same, shouldn't the most probable model be the same as well? As I see it, reparameterizing simply assigns each model a new "name"; for instance, you go from identifying a model by a probability to identifying it by the corresponding log odds. It's like going from a one-hot encoding to a regular integer encoding. Nothing has actually changed, so I wouldn't expect the math to work out any differently.
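To spell out the intuition I have in mind: in the discrete case, renaming really does leave the MAP untouched (a toy sketch with made-up probabilities):

```python
# A discrete "posterior" over three candidate models (made-up masses).
posterior = {"model_A": 0.2, "model_B": 0.5, "model_C": 0.3}

# Relabel each model with an arbitrary new name.
relabel = {"model_A": "m1", "model_B": "m2", "model_C": "m3"}
renamed = {relabel[k]: v for k, v in posterior.items()}

# The argmax is the same model under either naming scheme.
best_old = max(posterior, key=posterior.get)
best_new = max(renamed, key=renamed.get)
print(best_old, best_new)             # model_B m2 -- same model, new name
print(relabel[best_old] == best_new)  # True
```

So my expectation was that continuous reparameterization would behave the same way, and I'd like to understand intuitively where that expectation breaks down.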