
I understand mathematically why MAP estimation is not invariant under reparameterization, but I don't really understand it intuitively.

To help me out, my professor gave me an example where reparameterizing "compressed the distribution", resulting in a different maximum. This made sense, but it still didn't give me much intuition about what's going on.

If I understand correctly, MAP essentially combines a prior with some given information (observed values) to produce a probability distribution over the model parameters, and then returns its maximum. What confuses me is that neither "what you know beforehand" nor the given information changes under a reparameterization: no information has been added to or removed from the system, so why does the math suggest a different model?
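For reference, the procedure described in the previous paragraph can be written as follows, with $D$ for the observed data and $\theta$ for the model parameters (notation chosen here for illustration):

$$ \hat\theta_{\text{MAP}} = \arg\max_{\theta}\; p(\theta \mid D) = \arg\max_{\theta}\; p(D \mid \theta)\, p(\theta), $$

where the evidence $p(D)$ is dropped because it does not depend on $\theta$.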

In other words, if the information hasn't changed and the set of models is the same, shouldn't the most likely model be the same too? As I see it, reparameterizing simply gives each model a new "name"; for instance, you go from identifying models by their likelihoods to identifying them by their log odds. It's like going from a one-hot encoding to a regular encoding. Nothing has actually changed, so I wouldn't expect the math to work out any differently.
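To make the log-odds example concrete, here is a minimal numerical sketch. It assumes, purely for illustration, a Beta(3, 9) posterior over a probability $p$ (a uniform prior updated with 2 successes and 8 failures); the names `A`, `B`, `log_density_p`, `log_density_logit` and `grid_argmax` are hypothetical. It maximises the same posterior once as a density over $p$ and once as a density over the log odds $t = \log(p/(1-p))$, then maps the second maximiser back to $p$.

```python
import math

# Hypothetical example: a Beta(A, B) posterior over a probability p,
# e.g. a uniform Beta(1, 1) prior updated with 2 successes and 8 failures.
A, B = 3.0, 9.0


def sigmoid(t):
    """Inverse of the log-odds map: p = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def log_density_p(p):
    """Unnormalised log posterior density with respect to p (Beta kernel)."""
    return (A - 1) * math.log(p) + (B - 1) * math.log(1 - p)


def log_density_logit(t):
    """Unnormalised log posterior density with respect to t = log(p / (1 - p)).

    Change of variables: g(t) = f(sigmoid(t)) * |dp/dt| with dp/dt = p * (1 - p),
    so the Jacobian contributes log(p) + log(1 - p).
    """
    p = sigmoid(t)
    return log_density_p(p) + math.log(p) + math.log(1 - p)


def grid_argmax(fn, lo, hi, n=100_001):
    """Brute-force argmax of fn over n evenly spaced points in [lo, hi]."""
    best_x, best_v = lo, -math.inf
    for i in range(n):
        x = lo + (hi - lo) * i / (n - 1)
        v = fn(x)
        if v > best_v:
            best_x, best_v = x, v
    return best_x


# Same posterior, two "names" for each model: p itself, or its log odds t.
map_p = grid_argmax(log_density_p, 1e-6, 1 - 1e-6)
map_t = grid_argmax(log_density_logit, -10.0, 10.0)

print(round(map_p, 3))           # 0.2  (closed form (A - 1) / (A + B - 2))
print(round(sigmoid(map_t), 3))  # 0.25 (closed form A / (A + B))
```

The two maximisers differ even though both describe the same posterior; the only change is the factor $p(1-p)$ picked up when the density is rewritten in terms of $t$. A brute-force grid is used instead of an optimiser to keep the sketch dependency-free.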

Farhad

1 Answer


First, this question is much more general than you state. It could be restated as "why is the mode of a distribution not invariant under reparameterisation?", i.e. in general $\operatorname{mode}(f) \ne \phi^{-1}(\operatorname{mode}(g))$, where $\phi(X)=Y$, $X \sim f$ and $Y \sim g$. More generally still, it concerns how to interpret a continuous density function.

Note first that this invariance does hold in the discrete case. For the continuous case, remember that a point-wise definition of the density function, as in the discrete case, is not directly possible, and that a proper way to think about a density $f$ is as the derivative of its cumulative distribution function $F$: $$ f(x) = \frac{d}{dx} F(x), $$ or, equivalently, through $$ \Pr[a\le X\le b] = \int_a^b f(x) \, dx. $$ So $f$ is tied to $X$ through a differentiation operator, and because of this operation a reparameterisation of $X$ does not act on $f$ in the same simple way.
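The step from "differentiation" to "different maximum" can be spelled out with the change-of-variables formula. This sketch assumes $\phi$ is differentiable and strictly increasing; for a strictly decreasing $\phi$ the same final formula holds thanks to the absolute value:

$$ G(y) = \Pr[Y \le y] = \Pr[X \le \phi^{-1}(y)] = F(\phi^{-1}(y)), $$

$$ g(y) = \frac{d}{dy} G(y) = f(\phi^{-1}(y)) \left| \frac{d}{dy}\phi^{-1}(y) \right|. $$

Because the Jacobian factor $\left|\frac{d}{dy}\phi^{-1}(y)\right|$ generally varies with $y$, the maximiser of $g$ need not be $\phi$ applied to the maximiser of $f$, whereas interval probabilities are preserved: $\Pr[a \le X \le b] = \Pr[\phi(a) \le Y \le \phi(b)]$. In the discrete case (with $\phi$ a bijection) there is no derivative and hence no Jacobian, $\Pr[Y = y] = \Pr[X = \phi^{-1}(y)]$, which is why the mode carries over. For the log-odds parameterisation mentioned in the question, $\phi(p) = \log\frac{p}{1-p}$ and $\phi^{-1}(t) = 1/(1+e^{-t})$, whose derivative is $\phi^{-1}(t)\,(1-\phi^{-1}(t))$, so $g(t) = f(p)\,p(1-p)$ with $p = \phi^{-1}(t)$.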

So what does this mean? Interpret the point values of a continuous density function with caution.

peuhp