
In this answer (https://stats.stackexchange.com/a/279111/151862), Glen_b says:

> The reason is that the d.f. parameter [of a t distribution] is very hard to estimate well from data, particularly if you're also estimating the scale parameter. Indeed you can often end up with either silly estimates or unstable estimates (e.g. from a ridge in parameter space)

I'm assuming he is talking about the geometry of the likelihood as a function of the parameters. Can someone explain why exactly a ridge makes the parameters harder to estimate?
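
To make the geometry concrete, here is a minimal sketch (not from the linked answer; it assumes `numpy` and `scipy` are available). It evaluates the t log-likelihood on a grid of (df, scale) values for a single simulated sample and then profiles out the scale; past a moderate df the profile is nearly flat, which is the ridge in question.

```python
import numpy as np
from scipy import stats

# One simulated sample from a t distribution with moderate tails.
rng = np.random.default_rng(0)
x = stats.t.rvs(df=5, scale=1.0, size=100, random_state=rng)

dfs = np.linspace(2, 200, 60)       # candidate degrees of freedom
scales = np.linspace(0.5, 2.0, 60)  # candidate scale parameters

# Log-likelihood surface over the (df, scale) grid.
ll = np.empty((len(dfs), len(scales)))
for i, df in enumerate(dfs):
    for j, s in enumerate(scales):
        ll[i, j] = stats.t.logpdf(x, df=df, scale=s).sum()

# Profile log-likelihood for df: maximize over the scale at each df.
profile = ll.max(axis=1)

# Drop from the best profile value; the differences shrink toward zero
# as df grows, i.e. a long, nearly flat ridge in the df direction.
print(profile.max() - profile)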

badmax
  • Have you seen https://stats.stackexchange.com/questions/7308? That concerns precisely this circumstance. The answers illustrate the geometry and connect it with the optimization procedure. – whuber Jun 29 '17 at 20:10
  • Thanks! Basically, the optimizer gets inside the ridge, can't climb out, and thinks it's the minimum, right? – badmax Jun 29 '17 at 20:32
  • 1
    That's about right. In the case I referenced, there is no "out" to climb towards: it's an infinite valley with infinitely high sides. Think of a real valley whose slope eventually becomes essentially flat: its river stops flowing appreciably and just disappears into a meandering swamp. Numerical searches for minima suffer the same fate: they meander slowly, without being able to tell where "downhill" is, and may never be able to determine the lowest point (which could be miles away within the swamp). A good optimizer will tell you that's what happened (usually in a return code). – whuber Jun 29 '17 at 20:55
  • But in that case, if the likelihood is so flat with respect to the parameters, does it really matter? Pick anything from the swamp and the result is roughly the same. – badmax Jun 29 '17 at 21:15
  • 2
    One major point of the thread I linked to is the results can be entirely different! Even qualitatively the two fits found by `R` and *Mathematica* are obviously different. In statistics, one major purpose of optimization is to estimate parameters: one is not directly interested in what the particular value of the objective function might be. (It does play a role in assessing the uncertainty of the estimates.) – whuber Jun 29 '17 at 21:24
  • 1
    NB: A "ridge" is an upside-down valley. Just as the optimizer may have trouble navigating the valley because it appears flat everywhere, it will have trouble navigating a ridge and can conclude that points on the ridge are *minima,* even though they are nearly local *maxima.* Thus, settling for almost good enough--in response to your "does it really matter question"--could be a profound mistake. – whuber Feb 03 '20 at 14:51
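
Following up on whuber's comments above, here is a minimal sketch (not from the linked thread; it assumes `numpy` and `scipy`, and uses Nelder-Mead rather than whatever `R` or *Mathematica* use internally). Two starts on the near-flat ridge can end at very different df estimates whose negative log-likelihoods are almost identical, and the optimizer's `success` flag and `message` are the "return code" worth inspecting.

```python
import numpy as np
from scipy import stats, optimize

# Simulated data whose true df is large enough that the profile
# likelihood in df is very flat.
rng = np.random.default_rng(1)
x = stats.t.rvs(df=30, scale=1.0, size=50, random_state=rng)

def nll(theta):
    """Negative log-likelihood of a t distribution in (df, scale)."""
    df, scale = theta
    if df <= 0 or scale <= 0:
        return np.inf  # keep the search inside the valid region
    return -stats.t.logpdf(x, df=df, scale=scale).sum()

# Two different starting points on either end of the ridge.
for start in [(2.0, 0.5), (100.0, 1.5)]:
    res = optimize.minimize(nll, start, method="Nelder-Mead")
    print(start, "->", res.x, "nll:", round(res.fun, 4),
          "success:", res.success, "|", res.message)
```

The parameter estimates can differ dramatically between the two runs while the objective values barely differ, which is exactly the point made above: the optimizer is not wrong about the likelihood; the likelihood simply barely distinguishes the candidates.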

0 Answers