
In this answer (https://stats.stackexchange.com/a/279111/151862), Glen_b says:

> The reason is that the d.f. parameter [of a t distribution] is very hard to estimate well from data, particularly if you're also estimating the scale parameter. Indeed you can often end up with either silly estimates or unstable estimates (e.g. from a ridge in parameter space)

I'm assuming he is talking about the geometry of the likelihood as a function of the parameters. Can someone explain why exactly a ridge makes the parameters harder to estimate?
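
To make the geometry concrete, here is a minimal sketch (not from the linked answer; it assumes `numpy` and `scipy` are available). It evaluates the t log-likelihood on a grid of (df, scale) values for a single simulated sample and then profiles out the scale; past a moderate df the profile is nearly flat, which is the ridge in question.

```python
import numpy as np
from scipy import stats

# One simulated sample from a t distribution with moderate tails.
rng = np.random.default_rng(0)
x = stats.t.rvs(df=5, scale=1.0, size=100, random_state=rng)

dfs = np.linspace(2, 200, 60)       # candidate degrees of freedom
scales = np.linspace(0.5, 2.0, 60)  # candidate scale parameters

# Log-likelihood surface over the (df, scale) grid.
ll = np.empty((len(dfs), len(scales)))
for i, df in enumerate(dfs):
    for j, s in enumerate(scales):
        ll[i, j] = stats.t.logpdf(x, df=df, scale=s).sum()

# Profile log-likelihood for df: maximize over the scale at each df.
profile = ll.max(axis=1)

# Drop from the best profile value; the differences shrink toward zero
# as df grows, i.e. a long, nearly flat ridge in the df direction.
print(profile.max() - profile)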

badmax
  • Have you seen https://stats.stackexchange.com/questions/7308? That concerns precisely this circumstance. The answers illustrate the geometry and connect it with the optimization procedure. – whuber Jun 29 '17 at 20:10
  • Thanks! Basically, the optimizer gets inside the ridge, can't climb out, and thinks it's the minimum, right? – badmax Jun 29 '17 at 20:32
  • 1
    That's about right. In the case I referenced, there is no "out" to climb towards: it's an infinite valley with infinitely high sides. Think of a real valley whose slope eventually becomes essentially flat: its river stops flowing appreciably and just disappears into a meandering swamp. Numerical searches for minima suffer the same fate: they meander slowly, without being able to tell where "downhill" is, and may never be able to determine the lowest point (which could be miles away within the swamp). A good optimizer will tell you that's what happened (usually in a return code). – whuber Jun 29 '17 at 20:55
  • But in that case, if the likelihood is so flat with respect to the parameters, does it really matter? Pick anything from the swamp and the result is roughly the same. – badmax Jun 29 '17 at 21:15
  • 2
    One major point of the thread I linked to is the results can be entirely different! Even qualitatively the two fits found by `R` and *Mathematica* are obviously different. In statistics, one major purpose of optimization is to estimate parameters: one is not directly interested in what the particular value of the objective function might be. (It does play a role in assessing the uncertainty of the estimates.) – whuber Jun 29 '17 at 21:24
  • 1
    NB: A "ridge" is an upside-down valley. Just as the optimizer may have trouble navigating the valley because it appears flat everywhere, it will have trouble navigating a ridge and can conclude that points on the ridge are *minima,* even though they are nearly local *maxima.* Thus, settling for almost good enough--in response to your "does it really matter question"--could be a profound mistake. – whuber Feb 03 '20 at 14:51
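
Following up on whuber's comments above, here is a minimal sketch (not from the linked thread; it assumes `numpy` and `scipy`, and uses Nelder-Mead rather than whatever `R` or *Mathematica* use internally). Two starts on the near-flat ridge can end at very different df estimates whose negative log-likelihoods are almost identical, and the optimizer's `success` flag and `message` are the "return code" worth inspecting.

```python
import numpy as np
from scipy import stats, optimize

# Simulated data whose true df is large enough that the profile
# likelihood in df is very flat.
rng = np.random.default_rng(1)
x = stats.t.rvs(df=30, scale=1.0, size=50, random_state=rng)

def nll(theta):
    """Negative log-likelihood of a t distribution in (df, scale)."""
    df, scale = theta
    if df <= 0 or scale <= 0:
        return np.inf  # keep the search inside the valid region
    return -stats.t.logpdf(x, df=df, scale=scale).sum()

# Two different starting points on either end of the ridge.
for start in [(2.0, 0.5), (100.0, 1.5)]:
    res = optimize.minimize(nll, start, method="Nelder-Mead")
    print(start, "->", res.x, "nll:", round(res.fun, 4),
          "success:", res.success, "|", res.message)
```

The parameter estimates can differ dramatically between the two runs while the objective values barely differ, which is exactly the point made above: the optimizer is not wrong about the likelihood; the likelihood simply barely distinguishes the candidates.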

0 Answers