Background
I have a collection of functions with trainable parameters that I am implementing as Keras model classes, which gives me immediate access to a variety of objective functions, optimizers, and training-related methods (e.g. an early-stopping callback).
These functions take a single input variable, produce a single output variable, and have no more than a dozen parameters. The number of explicitly-written operators ('+', '-', '*', '/', 'exp', 'log', 'arctan') is also around a dozen, although I caution that this measure of model complexity is unreliable (equivalent expressions can have more or fewer explicitly-written operators). The point is that these are not enormously complex models like those used in deep learning. The following example illustrates this description.
Example
Verhulst growth model: $$P(t) = \frac{K}{1+ \left( \frac{K-P_0}{P_0} \right) \exp \{-rt\}}$$ where $P(t)$ is the population size at time $t$, $K$ is the carrying capacity, $P_0$ is the initial population size, and $r$ is the "unimpeded" exponential rate constant.
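To make this concrete, here is a stripped-down sketch of the kind of model class I mean (the parameter initial values, synthetic data, and training call are illustrative only, not my actual implementation):

```python
import numpy as np
import tensorflow as tf

class VerhulstModel(tf.keras.Model):
    """Verhulst growth curve with trainable parameters K, P0, r."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # The "ones" initializer is a placeholder; choosing good starting
        # values is exactly the problem described below.
        self.K = self.add_weight(name="K", shape=(), initializer="ones")
        self.P0 = self.add_weight(name="P0", shape=(), initializer="ones")
        self.r = self.add_weight(name="r", shape=(), initializer="ones")

    def call(self, t):
        # P(t) = K / (1 + ((K - P0) / P0) * exp(-r t))
        return self.K / (
            1.0 + ((self.K - self.P0) / self.P0) * tf.exp(-self.r * t)
        )

# Illustrative usage on synthetic data (K=100, P0=5, r=0.8):
t = np.linspace(0.0, 10.0, 50).astype("float32")
P = (100.0 / (1.0 + (95.0 / 5.0) * np.exp(-0.8 * t))).astype("float32")

model = VerhulstModel()
model.compile(optimizer="adam", loss="mse")
model.fit(t, P, epochs=5, verbose=0)  # callbacks, early stopping, etc. plug in here
```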
Problem Statement
I started off with random initialization of parameters by sampling from either a standard normal distribution or a uniform distribution over $[0,1]$. But I have encountered the following issues, which I have only partially addressed:
- Non-convexity of the loss function (often mean-squared error) over the parameters of many of these models, combined with sampling the parameter space near the boundaries of convex regions, has produced parameter estimates that simply started in the wrong "valley".
- If an initial parameter value is quite far from its optimal value, even within the same convex region, convergence can take an extremely long time. (The sketch after this list illustrates both failure modes.)
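The following sketch (building on the model class above; all numeric choices are illustrative) refits the model from ten standard-normal draws and records each run's final loss. Draws that make $K$ or $P_0$ non-positive, or that land in the wrong valley, finish with much worse (or NaN) losses, while distant-but-valid draws converge slowly:

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)

# Synthetic data from known parameters (K=100, P0=5, r=0.8).
t = np.linspace(0.0, 10.0, 50).astype("float32")
P = (100.0 / (1.0 + (95.0 / 5.0) * np.exp(-0.8 * t))).astype("float32")

final_losses = []
for _ in range(10):
    model = VerhulstModel()  # the class from the sketch above
    # Standard-normal initialization, as described in the post.
    for w in model.weights:
        w.assign(float(rng.standard_normal()))
    model.compile(optimizer=tf.keras.optimizers.Adam(0.05), loss="mse")
    history = model.fit(t, P, epochs=300, verbose=0)
    final_losses.append(history.history["loss"][-1])

# A wide spread (and occasional NaNs from K or P0 near zero) shows how
# strongly the outcome depends on the starting point.
print(sorted(final_losses, key=lambda x: (np.isnan(x), x)))
```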
I have found it useful to study 3D surface plots and contour plots of the loss function over pairs of parameters, along with Hessian-based tests of convexity. For sufficiently small datasets and simple models, it is possible to copy-paste the data into tools like the Desmos calculator and manually tune parameters, but this does not scale. I have room to grow on this subject, and a source that accelerates my learning could make a tangible difference in my productivity when building the training methods for my models.
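For reference, here is a minimal sketch of both diagnostics on synthetic data (grid ranges, step size, and the evaluation point are illustrative choices): a contour plot of the MSE over $(K, r)$ with $P_0$ held fixed, and a finite-difference Hessian whose eigenvalues indicate local convexity:

```python
import numpy as np
import matplotlib.pyplot as plt

def verhulst(t, K, P0, r):
    return K / (1.0 + ((K - P0) / P0) * np.exp(-r * t))

# Synthetic data (same illustrative parameters as above).
t = np.linspace(0.0, 10.0, 50)
P = verhulst(t, 100.0, 5.0, 0.8)

# Contour plot of MSE over a (K, r) grid with P0 fixed at its true value.
P0_fixed = 5.0
KK, RR = np.meshgrid(np.linspace(50.0, 150.0, 200), np.linspace(0.1, 2.0, 200))
mse = np.mean(
    (verhulst(t[None, None, :], KK[..., None], P0_fixed, RR[..., None]) - P) ** 2,
    axis=-1,
)
plt.contour(KK, RR, np.log(mse), levels=30)  # log scale reveals the valleys
plt.xlabel("K"); plt.ylabel("r"); plt.title("log MSE over (K, r), P0 fixed")
plt.show()

# Finite-difference Hessian at a point: all-positive eigenvalues suggest the
# loss is locally convex there in (K, r).
def loss(theta):
    K, r = theta
    return np.mean((verhulst(t, K, P0_fixed, r) - P) ** 2)

def hessian_fd(f, x, h=1e-3):
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

print(np.linalg.eigvalsh(hessian_fd(loss, np.array([100.0, 0.8]))))
```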
Question
Does there exist a guide for designing self-starting estimators (i.e. routines that compute good initial parameter values) for such parametric functions?