I'm currently having a heated debate with coworkers on whether it's acceptable to use estimates derived directly from the data as starting parameters for modeling.

For example, if I want to fit a Normal distribution on a dataset, my coworker thinks it's acceptable to compute the mean and std from the dataset, and use these values as starting parameters.

I think that, regardless of the method used, this feeds the data to the model twice and is therefore unreasonable. Doesn't it mean the fitting algorithm starts where you would like it to end up, which is a good way to ensure it never explores the parameter space?

Can you point me to a reference for good practice regarding this particular issue?

Karolis Koncevičius
  • Choosing a starting seed based on reasonable estimates should make your method converge to a final answer *faster*, but you have bigger issues if it results in an entirely *different* final answer. If your method's result is highly dependent on your starting estimate, *any* means of choosing that starting point can have drawbacks, meaning you should be doing many random restarts anyway. – Nuclear Hoagie Apr 06 '20 at 16:31
  • Can you clarify what you mean by starting parameters? What type of model are you fitting? – RyanFrost Apr 06 '20 at 16:31
  • Let's say I want to use MLE to fit a Normal distribution to a dataset. I have to provide my algorithm with starting values: a mean and a std for the algorithm to start from. Is it acceptable to use actual estimates from the data, rather than reasonable guesses, as starting parameters? (Assume I don't try more than one set of starting values, which I know is already a mistake.) – Dadabazooka Apr 06 '20 at 16:45
  • But you do not need an iterative algorithm to fit a normal distribution ... some more real example, please! – kjetil b halvorsen Apr 06 '20 at 16:48
  • I want to fit a series of Normal distributions with mu and sigma varying monotonically with a parameter A. Each variation of mu and sigma with A is defined as a sigmoid function with 5 parameters: maxvalue, minvalue, slope, x50, and A, which is my independent variable. The sigmoids for mu and sigma each have their own parameters to be estimated. Does this make sense? Can you explain why the exact model I want to fit matters? I thought this was more of a theoretical issue than a model-dependent one. – Dadabazooka Apr 06 '20 at 16:58
  • We're just trying to figure out what you're talking about; don't take our comments as an indication the model matters. But generally, why should there be any argument about the *algorithm* used to implement a statistical procedure? Indeed, since your algorithm is intended to produce estimates, how could it possibly be invalid to use estimates to get it started? – whuber Apr 06 '20 at 17:21
  • Well, it seemed to me that starting values indicate the first values to be used in the model fitting procedure for MLE, and that one criterion to ensure the estimated parameters at the end of the procedure are OK is that a large chunk of parameter space has been explored. I can't see how you would explore the parameter space if you tell your algorithm to start where it's supposed to end up. Furthermore, it seems like I would be inputting the data twice (once as starting values, a second time as the data to be fitted), which didn't seem like good practice. But maybe I'm wrong? – Dadabazooka Apr 06 '20 at 17:27
  • Suppose you were looking for the highest point in the world. You could consult a satellite Digital Elevation Model (DEM), identify Everest as the highest peak, and go there to confirm its altitude. What, then, if somebody were to object that it was cheating to use the DEM; that you are unlikely to have found the highest point because you didn't physically visit most of the globe; and that you really do have to painstakingly search most of the points on the earth's surface in order to obtain the answer? – whuber Apr 07 '20 at 16:27
  • Using the DEM would not be cheating, but checking only the vicinity of Everest may lead me to falsely conclude that it is the highest peak based on my prior assumptions, because I did not care to look elsewhere, no? – Dadabazooka Apr 08 '20 at 17:54
  • Only if you use a poor algorithm. The approach suggested in this metaphor is completely standard: see the detailed description I posted at https://stats.stackexchange.com/a/160575/919 for an example. – whuber Apr 08 '20 at 21:30
  • Alright! Thank you for the additional resources. – Dadabazooka Apr 09 '20 at 16:03
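
To make the point in the comments concrete, here is a minimal sketch (not from the original thread) that fits a Normal distribution by iterative MLE from two kinds of starting values: estimates computed from the data, and arbitrary points far from it. The function `fit_normal_mle`, the simulated dataset, and the hand-rolled optimizer (damped Fisher-scoring steps on mu and log sigma, with a clipped step) are all assumptions made for illustration; they stand in for whatever optimizer you actually use.

```python
import math
import random
import statistics


def fit_normal_mle(data, mu0, sigma0, lr=0.5, tol=1e-10, max_iter=10_000):
    """Fit Normal(mu, sigma) by iterative MLE, starting from (mu0, sigma0).

    Hypothetical helper: damped Fisher-scoring steps on (mu, log sigma).
    Returns (mu, sigma, number of iterations used).
    """
    n = len(data)
    mu, log_sigma = mu0, math.log(sigma0)
    for iteration in range(1, max_iter + 1):
        sigma2 = math.exp(2.0 * log_sigma)
        mean_resid = sum(x - mu for x in data) / n
        mean_sq = sum((x - mu) ** 2 for x in data) / n
        # Score divided by the Fisher information, then damped by lr.
        step_mu = lr * mean_resid
        step_ls = lr * 0.5 * (mean_sq / sigma2 - 1.0)
        # Clip the log-sigma step so a poor start cannot overshoot wildly.
        step_ls = max(-1.0, min(1.0, step_ls))
        mu += step_mu
        log_sigma += step_ls
        if abs(step_mu) < tol and abs(step_ls) < tol:
            break
    return mu, math.exp(log_sigma), iteration


random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(500)]  # simulated, illustrative data

# Start 1: estimates taken from the data (pstdev divides by n, like the MLE).
data_start = (statistics.fmean(data), statistics.pstdev(data))
fit_from_data = fit_normal_mle(data, *data_start)

# Starts 2 to 6: arbitrary points far from the data.
random_starts = [(random.uniform(-50, 50), random.uniform(0.1, 50)) for _ in range(5)]
fits_from_random = [fit_normal_mle(data, m, s) for m, s in random_starts]

print("data-based start:", fit_from_data)
for start, fit in zip(random_starts, fits_from_random):
    print("random start", start, "->", fit)
```

In this sketch the likelihood has a single maximum, so every start should reach the same estimates and the data-based start should simply get there in fewer iterations. For a model whose likelihood may have several local maxima (the sigmoid-parameterized model described in the comments could be one), agreement across restarts is not guaranteed, which is why running several starts and comparing the results is the usual check.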
