How to find the start values of constants in a nonlinear logistic regression model

Question

These are my data:

```
USPop
   year population
1  1790   3.929214
2  1800   5.308483
3  1810   7.239881
4  1820   9.638453
5  1830  12.860702
6  1840  17.063353
7  1850  23.191876
8  1860  31.443321
9  1870  38.558371
10 1880  50.189209
11 1890  62.979766
12 1900  76.212168
13 1910  92.228496
14 1920 106.021537
15 1930 123.202624
16 1940 132.164569
17 1950 151.325798
18 1960 179.323175
19 1970 203.302031
20 1980 226.542199
21 1990 248.709873
22 2000 281.421906
```

and I fit this model to it:

```r
populationmodel = nls(population ~ theta1/(1+exp((theta2-year)/theta3)),
                      start=list(theta1=400,theta2=-49,theta3=.025), data=USPop, trace=TRUE)
```

but I keep getting an error when I run it, and the error suggests a problem with how I chose the starting values of $\theta_1$, $\theta_2$, and $\theta_3$. Can anyone explain how I would go about calculating the start values for these three parameters?
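One possible reading of the start values (400, -49, 0.025) is that they were meant for the equivalent linear-predictor form $\theta_1/(1+\exp(-(\beta_0+\beta_1\,\text{year})))$. This is a guess about where the numbers came from; the question does not say. Requiring $-(\beta_0+\beta_1\,\text{year}) = (\theta_2-\text{year})/\theta_3$ for every year gives $\theta_3 = 1/\beta_1$ and $\theta_2 = -\beta_0/\beta_1$. In the question's parametrization, those values would therefore be $\theta_2 = 1960$ and $\theta_3 = 40$. The helper `to_midpoint_scale` below is hypothetical and exists only for illustration.

```r
# Hypothetical helper (not from any package): converts the intercept b0 and
# slope b1 of the linear-predictor form
#   theta1 / (1 + exp(-(b0 + b1 * year)))
# into the midpoint theta2 and scale theta3 of the question's form
#   theta1 / (1 + exp((theta2 - year) / theta3)).
to_midpoint_scale <- function(b0, b1) {
  c(theta2 = -b0 / b1, theta3 = 1 / b1)
}

converted <- to_midpoint_scale(b0 = -49, b1 = 0.025)
print(converted)  # approximately theta2 = 1960, theta3 = 40

# Sanity check: both parametrizations trace the same curve when theta1 = 400
years <- seq(1790, 2000, by = 10)
curve_linear   <- 400 / (1 + exp(-(-49 + 0.025 * years)))
curve_midpoint <- 400 / (1 + exp((converted[["theta2"]] - years) /
                                   converted[["theta3"]]))
stopifnot(isTRUE(all.equal(curve_linear, curve_midpoint)))
```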

This question is explored in general at http://stats.stackexchange.com/questions/52080 . Using the principles shown there, you should start with $\theta=(400, 1970, 100/2.5)$ instead of $(400, -49, 2.5/100)$. This rests on two things. First, your starting value of $\theta_1$ implies that growth peaked when the population reached $\theta_1/2 = 200$, which happened around 1970. Second, I am assuming a maximum growth rate of about 2.5% per annum. Notably, the values of $\theta_2$ and $\theta_3$ you supply are so far from realistic that the software has little hope of succeeding: it will succumb to severe overflow and underflow problems. — whuber, Feb 23 '16 at 23:23
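The recipe in whuber's comment can be applied in R by reading the start values off the data. The sketch below keeps $\theta_1 = 400$ as the guess for the asymptote. It then takes $\theta_2$ as the year in which the population crosses $\theta_1/2$, found by linear interpolation with `approx`. For $\theta_3$, note that for years well before $\theta_2$ the curve is approximately $\theta_1 \exp((\text{year}-\theta_2)/\theta_3)$, so the early per-capita growth rate is about $1/\theta_3$. That rate is estimated from a straight-line fit of log(population) on year. The 1860 cutoff for the "early" phase is an arbitrary assumption, and whuber's round numbers `list(theta1 = 400, theta2 = 1970, theta3 = 40)` are an equally reasonable start.

```r
# US population in millions, rebuilt from the printed data in the question
USPop <- data.frame(
  year = seq(1790, 2000, by = 10),
  population = c(3.929214, 5.308483, 7.239881, 9.638453, 12.860702,
                 17.063353, 23.191876, 31.443321, 38.558371, 50.189209,
                 62.979766, 76.212168, 92.228496, 106.021537, 123.202624,
                 132.164569, 151.325798, 179.323175, 203.302031, 226.542199,
                 248.709873, 281.421906)
)

# theta1: asymptote; keep the question's guess, which is above the last value
theta1_0 <- 400
# theta2: year at which the curve reaches theta1 / 2, by linear interpolation
# (population is increasing, so year is a well-defined function of it)
theta2_0 <- approx(USPop$population, USPop$year, xout = theta1_0 / 2)$y
# theta3: 1 / early per-capita growth rate; the 1860 cutoff is an assumption
early <- subset(USPop, year <= 1860)
r0 <- coef(lm(log(population) ~ year, data = early))[["year"]]
theta3_0 <- 1 / r0

start_vals <- list(theta1 = theta1_0, theta2 = theta2_0, theta3 = theta3_0)
print(unlist(start_vals))

populationmodel <- nls(population ~ theta1 / (1 + exp((theta2 - year) / theta3)),
                       start = start_vals, data = USPop)
print(coef(populationmodel))
```

Because $\theta_2$ and $\theta_3$ now have physical meanings (a year and a time scale in years), checking that they are plausible before calling `nls` takes only a glance at the data.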
What language are you using? You should try to express your data and model in a more readable form (it looks messy in my browser). Also, for a nonlinear model, good initial values for the coefficients are often not trivial to guess. You might want to consider a global optimization approach (such as genetic algorithms or particle swarm optimization) to ballpark an initial location. — spektr, Feb 23 '16 at 23:25
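A cheap stand-in for the global search spektr mentions can be written in base R. This is a plain grid search, not a genetic algorithm or particle swarm. It works because, for fixed $(\theta_2, \theta_3)$, the model is linear in $\theta_1$. With $g_i = 1/(1+\exp((\theta_2-\text{year}_i)/\theta_3))$, the least-squares value is $\theta_1 = \sum_i g_i y_i / \sum_i g_i^2$, so only a two-dimensional grid is needed. The grid point with the smallest residual sum of squares then seeds `nls`. The grid limits and the helper names (`logistic_shape`, `best_theta1`, `profile_rss`) are assumptions made for this sketch.

```r
# US population in millions, rebuilt from the printed data in the question
USPop <- data.frame(
  year = seq(1790, 2000, by = 10),
  population = c(3.929214, 5.308483, 7.239881, 9.638453, 12.860702,
                 17.063353, 23.191876, 31.443321, 38.558371, 50.189209,
                 62.979766, 76.212168, 92.228496, 106.021537, 123.202624,
                 132.164569, 151.325798, 179.323175, 203.302031, 226.542199,
                 248.709873, 281.421906)
)

# Hypothetical helper: the logistic curve with the asymptote theta1 factored out
logistic_shape <- function(year, theta2, theta3) {
  1 / (1 + exp((theta2 - year) / theta3))
}

# For fixed (theta2, theta3) the model is linear in theta1, so the
# least-squares theta1 has the closed form sum(g * y) / sum(g^2)
best_theta1 <- function(theta2, theta3) {
  g <- logistic_shape(USPop$year, theta2, theta3)
  sum(g * USPop$population) / sum(g^2)
}

# Residual sum of squares with theta1 set to its best value
profile_rss <- function(theta2, theta3) {
  g <- logistic_shape(USPop$year, theta2, theta3)
  sum((USPop$population - best_theta1(theta2, theta3) * g)^2)
}

# Coarse grid; these limits are assumptions chosen to bracket the data
grid <- expand.grid(theta2 = seq(1800, 2100, by = 5),
                    theta3 = seq(5, 100, by = 5))
grid$rss <- mapply(profile_rss, grid$theta2, grid$theta3)
best <- grid[which.min(grid$rss), ]

start_vals <- list(theta1 = best_theta1(best$theta2, best$theta3),
                   theta2 = best$theta2,
                   theta3 = best$theta3)
print(unlist(start_vals))

# Polish the best grid point with nls
fit_grid <- nls(population ~ theta1 / (1 + exp((theta2 - year) / theta3)),
                start = start_vals, data = USPop)
print(coef(fit_grid))
```

The same partial linearity is what `nls(..., algorithm = "plinear")` exploits, in which case $\theta_1$ is dropped from the formula and from `start`.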
I should also add that `nls` (which minimizes the sum of squares of residuals) is entirely the wrong model for fitting these data, although, by luck alone, it can do OK. It will be controlled primarily by recent data, so for historical data it will do poorly (on a relative basis). For a quick and dirty but more plausible fit, use `weights=1/population` in the call to `nls`. Actually, `weights=1/population^2` is even more justifiable and will give a decent fit through 1920; the lack of fit since then may be illuminating. — whuber, Feb 26 '16 at 14:25
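The sketch below applies whuber's weighting suggestion. It assumes that the start values (400, 1970, 40) also work for the weighted fit; if `nls` fails to converge, starting from the unweighted estimates is a natural fallback. With `weights = 1/population^2`, the criterion becomes $\sum_i w_i (y_i-\hat y_i)^2 = \sum_i ((y_i-\hat y_i)/y_i)^2$, the sum of squared *relative* residuals, so the small early counts are no longer swamped by the recent ones. The helpers `logistic` and `rel_err` are hypothetical. The raised `maxiter` is a precaution, not a tuned setting.

```r
# US population in millions, rebuilt from the printed data in the question
USPop <- data.frame(
  year = seq(1790, 2000, by = 10),
  population = c(3.929214, 5.308483, 7.239881, 9.638453, 12.860702,
                 17.063353, 23.191876, 31.443321, 38.558371, 50.189209,
                 62.979766, 76.212168, 92.228496, 106.021537, 123.202624,
                 132.164569, 151.325798, 179.323175, 203.302031, 226.542199,
                 248.709873, 281.421906)
)

# Hypothetical helper: the question's logistic curve
logistic <- function(year, theta1, theta2, theta3) {
  theta1 / (1 + exp((theta2 - year) / theta3))
}

start_vals <- list(theta1 = 400, theta2 = 1970, theta3 = 40)

# Ordinary least squares: dominated by the large, recent populations
fit_ols <- nls(population ~ logistic(year, theta1, theta2, theta3),
               start = start_vals, data = USPop)

# Weighted least squares: weights = 1/population^2 gives relative residuals
fit_rel <- nls(population ~ logistic(year, theta1, theta2, theta3),
               start = start_vals, data = USPop,
               weights = 1 / population^2,
               control = nls.control(maxiter = 200))

# Hypothetical helper: relative residuals (observed - fitted) / observed
rel_err <- function(fit) {
  p <- coef(fit)
  pred <- logistic(USPop$year, p[["theta1"]], p[["theta2"]], p[["theta3"]])
  (USPop$population - pred) / USPop$population
}

# Percentage errors per census year for both fits
print(data.frame(year = USPop$year,
                 ols_pct = round(100 * rel_err(fit_ols), 1),
                 weighted_pct = round(100 * rel_err(fit_rel), 1)))
```

Reading down the two percentage columns is one way to check whuber's remarks about the early years and about the lack of fit after 1920.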
In R there are self-starting models, which find the initial estimates (starting values) automatically. — kjetil b halvorsen, Sep 07 '17 at 12:14
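For this particular curve, the self-starting model in base R's `stats` package is `SSlogis`. `SSlogis(input, Asym, xmid, scal)` computes $\text{Asym}/(1+\exp((\text{xmid}-\text{input})/\text{scal}))$, which is the question's model with $\theta_1 = \text{Asym}$, $\theta_2 = \text{xmid}$, and $\theta_3 = \text{scal}$, so no `start` argument is needed. `getInitial` shows the starting values the self-starter computes. This is a minimal sketch; the data frame is rebuilt from the printed data so that the block runs on its own.

```r
# US population in millions, rebuilt from the printed data in the question
USPop <- data.frame(
  year = seq(1790, 2000, by = 10),
  population = c(3.929214, 5.308483, 7.239881, 9.638453, 12.860702,
                 17.063353, 23.191876, 31.443321, 38.558371, 50.189209,
                 62.979766, 76.212168, 92.228496, 106.021537, 123.202624,
                 132.164569, 151.325798, 179.323175, 203.302031, 226.542199,
                 248.709873, 281.421906)
)

# Starting values that the self-starting logistic model derives from the data
print(getInitial(population ~ SSlogis(year, Asym, xmid, scal), data = USPop))

# Fit without supplying start values
fit_ss <- nls(population ~ SSlogis(year, Asym, xmid, scal), data = USPop)
print(coef(fit_ss))
```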
