
I tried to fit a curve to the black points using the following code. Why is the fit so bad? Do I need to fit another type of function?

fit <- nls(grad ~ theta1/(1 + exp(-(theta2 + theta3*x1))), 
           start=list(theta1 = 4, theta2 = 0.09, theta3 = 0.31), trace=TRUE)

p = predict(fit)

plot(x1, grad)
points(x1, p, col = "red")

[Figure: the data as black points, with my attempted fit as a red curve. The fit is bad.]

gung - Reinstate Monica
user32289
  • You're fitting a particular kind of sigmoid function to the data. There are many alternative sigmoid curves. The question here seems to be "Is this function an adequate summary of the data, or are your data just especially noisy?" You'd get a better fit if you had a model with more weight in the tails - but that's also forcing the data into an alternative form. Is that suitable? Why or why not? – Sycorax Nov 04 '13 at 15:26
  • At first glance, the curve seems like it might be the *complementary log log* (see my answer here: [difference between logit and probit models](http://stats.stackexchange.com/questions/20523//30909#30909) for a picture). There are other sigmoid curves that can be used as well; I list some in my answer here: [Is the logit function always the best for regression modeling of binary data?](http://stats.stackexchange.com/questions/48072//48137#48137) – gung - Reinstate Monica Nov 04 '13 at 15:34
  • What are `x1` and `grad`? Why do you fix the lower limit of `grad` at 0 but estimate the upper limit? Would it make substantive sense to either fix both limits or estimate both limits? – Ray Koopman Nov 04 '13 at 21:38
  • The fewer parameters, the more constraints. You could try a four-parameter logistic curve. – Stéphane Laurent Sep 26 '14 at 08:18
  • @Ray Why allow the lower y-value to not have a limit of zero? I ask because I have count data (a Gaussian model predicts negative counts - not possible). The upper limit is totally dependent on the model. Hell, the Poisson model works well, but Gamma distribution looks pretty good (even though I'm working on discrete data). Oh, I'm running Generalized Linear Mixed Effects in R. Not looking for specific solutions, just opinions. –  Sep 26 '14 at 08:12
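The suggestion in the comments to estimate both limits (a four-parameter logistic) can be sketched roughly as follows. This is only an illustration under stated assumptions: the question's `x1` and `grad` are not available, so the data below are simulated stand-ins, and the names `fit_3pl` and `fit_4pl` are hypothetical. It uses the self-starting model `SSfpl` from base R's `stats` package, which does not need starting values.

```r
# Simulated stand-in data (assumption): a sigmoid with a nonzero lower limit.
set.seed(2)
x1 <- seq(-10, 10, length.out = 200)
grad <- 0.5 + (4 - 0.5) / (1 + exp((1 - x1) / 2)) + rnorm(length(x1), sd = 0.1)

# Model from the question: lower limit fixed at 0, upper limit theta1 estimated.
fit_3pl <- nls(grad ~ theta1 / (1 + exp(-(theta2 + theta3 * x1))),
               start = list(theta1 = 4, theta2 = 0.09, theta3 = 0.31))

# Four-parameter logistic: lower limit A and upper limit B are both estimated.
# SSfpl computes A + (B - A) / (1 + exp((xmid - x1) / scal)).
fit_4pl <- nls(grad ~ SSfpl(x1, A, B, xmid, scal))

coef(fit_4pl)
AIC(fit_3pl, fit_4pl)  # lower AIC indicates the better-supported curve

plot(x1, grad)
lines(x1, predict(fit_3pl), col = "red")
lines(x1, predict(fit_4pl), col = "blue")
```

Whether the extra parameter is justified depends on whether a nonzero lower limit makes substantive sense for the data, which is the point Ray Koopman's comment raises.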

1 Answer


It appears that the functional form of your model is misspecified. You are fitting one particular sigmoid function (the logistic), but there are many other sigmoid functions. They are mostly discussed in the context of link functions for the regression of binary data. You can get some information about the different possibilities from my answer here: [Is the logit function always the best for regression modeling of binary data?](http://stats.stackexchange.com/questions/48072//48137#48137) One link / sigmoid function that looks like it might be appropriate is the complementary log-log (cloglog). You can see a picture of how it compares to the logit and probit links (with a little discussion) in my answer here: [Difference between logit and probit models](http://stats.stackexchange.com/questions/20523//30909#30909). I would suggest trying the cloglog.
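As a rough sketch of what trying the cloglog could look like with `nls`, the logistic curve in the question can be swapped for the inverse complementary log-log, `1 - exp(-exp(eta))`, scaled by the same upper limit `theta1`. Everything specific here is an assumption: the original `x1` and `grad` are not available, so the data are simulated, the names `fit_logit` and `fit_cloglog` are made up for the illustration, and the starting values are simply reused from the question.

```r
# Simulated stand-in data (assumption): generated from a scaled cloglog curve.
set.seed(1)
x1 <- seq(-10, 10, length.out = 200)
grad <- 4 * (1 - exp(-exp(0.1 + 0.3 * x1))) + rnorm(length(x1), sd = 0.1)

# Model from the question: scaled logistic curve.
fit_logit <- nls(grad ~ theta1 / (1 + exp(-(theta2 + theta3 * x1))),
                 start = list(theta1 = 4, theta2 = 0.09, theta3 = 0.31))

# Alternative: scaled complementary log-log curve with the same parameters.
fit_cloglog <- nls(grad ~ theta1 * (1 - exp(-exp(theta2 + theta3 * x1))),
                   start = list(theta1 = 4, theta2 = 0.09, theta3 = 0.31))

# Both models have three parameters, so AIC compares them on fit alone.
AIC(fit_logit, fit_cloglog)

# Overlay both fitted curves on the data.
plot(x1, grad)
lines(x1, predict(fit_logit), col = "red")
lines(x1, predict(fit_cloglog), col = "blue")
```

On real data the comparison may of course go the other way; the residual plots of each fit are worth checking alongside the AIC values.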

gung - Reinstate Monica