
I am fitting a simple regression model to my data using `nls()`. The only problem is that it does not provide the coefficient of determination $R^2$ or the adjusted $R^2$.

My question is: how can I calculate both $R^2$ and adjusted $R^2$?

The data and code are:

```r
# generate data
beta <- 0.012
n <- 300
Data <- data.frame(y = exp(beta * seq(n)) + rnorm(n), x = seq(n))

# plot data
plot(Data$x, Data$y)

# fit non-linear model
mod <- nls(y ~ exp(a + b * x), data = Data, start = list(a = 0, b = 0))

# add fitted curve
lines(Data$x, predict(mod, list(x = Data$x)))
```
  • These values aren't reliable for nonlinear fits; see, for example, [this page](http://blog.minitab.com/blog/adventures-in-statistics/why-is-there-no-r-squared-for-nonlinear-regression). Their omission from `nls()` is almost certainly a feature, not a bug. If your predictor variables can be expressed in ways that `lm()` can handle, such as polynomial terms, then use `lm()`. – EdM Oct 13 '15 at 16:49
  • It never hurts to be explicit about the software you are using (here R), regardless of a tag. It can hurt to assume that everyone can work out what the software you cite is doing (not too difficult in this case, but that is not always true). If you seek specific R code to calculate descriptive measures (against some advice), then the question is off-topic on CV. – Nick Cox Oct 13 '15 at 16:55
  • @NickCox What do you mean? – Memo Oct 13 '15 at 17:03
  • Which bit don't you understand? Are you asking for code or statistical advice? See details on software-related questions at http://stats.stackexchange.com/help/on-topic – Nick Cox Oct 13 '15 at 17:06
  • @NickCox "Which bit don't you understand?" I think the question is clear; read the answer below and you will understand. – Memo Oct 13 '15 at 17:20
  • I am not commenting on your question as such, which I understand rather well, but on your style. There is an assumption that people here all easily read and understand R code; that's just not so. This is not an R forum. I am not against R; it's just not a universal language for statistical science that everyone uses. – Nick Cox Oct 13 '15 at 17:22
  • I just happen to use R all the time. Not all users of this site do. – EdM Oct 13 '15 at 17:31
  • @EdM That's clearly fine and consistent with my point, which is aimed at the OP. – Nick Cox Oct 13 '15 at 17:33

1 Answer


You will probably end up in trouble if you try to use $R^2$ or similar characterizations of a non-linear fit produced by `nls()`. There are good reasons why there is no $R^2$ for nonlinear regression; the values for non-linear fits aren't even necessarily bounded between 0 and 1. This freely available publication shows how misleading $R^2$ values can be for nonlinear fits.
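For completeness, the residual-based pseudo-$R^2$ that people sometimes compute by hand from an `nls()` fit looks like the sketch below (using your simulated data, with a seed added for reproducibility). Note that this quantity does not carry the usual $R^2$ interpretation for a nonlinear model and is not guaranteed to lie between 0 and 1:

```r
# Illustration only: a residual-based pseudo-R^2 from an nls() fit.
# For nonlinear models this lacks the usual R^2 interpretation.
set.seed(1)
beta <- 0.012
n <- 300
Data <- data.frame(y = exp(beta * seq(n)) + rnorm(n), x = seq(n))
mod <- nls(y ~ exp(a + b * x), data = Data, start = list(a = 0, b = 0))

rss <- sum(residuals(mod)^2)            # residual sum of squares
tss <- sum((Data$y - mean(Data$y))^2)   # total sum of squares
pseudo_r2 <- 1 - rss / tss
pseudo_r2
```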

Your particular example would be handled well by `lm()` with a logarithmic transformation of $y$, if the relation of error magnitude to scale were more typical. In practice, in this type of situation, the error in $y$ is usually proportional to its magnitude rather than, as in your example, independent of it. Then you can have your nonlinear fit accomplished by `lm()`, check the quality of the fit with standard tools, and get useful values of $R^2$ and so forth if that's what you want.
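A minimal sketch of that approach, assuming the more typical multiplicative (log-normal) noise rather than the additive noise in your simulation; the noise standard deviation of 0.1 is an arbitrary choice for illustration:

```r
set.seed(1)
beta <- 0.012
n <- 300
# noise inside the exponential: error proportional to the response
Data <- data.frame(y = exp(beta * seq(n) + rnorm(n, sd = 0.1)), x = seq(n))

# the exponential model is linear on the log scale
fit <- lm(log(y) ~ x, data = Data)
summary(fit)$r.squared       # ordinary R^2
summary(fit)$adj.r.squared   # adjusted R^2
```

Since all $y$ values are positive by construction here, `log(y)` is always defined and `lm()` raises no warnings.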

  • Thank you EdM for the answer; could you please explain more about using `lm()` in this example? – Memo Oct 13 '15 at 17:38
  • `lm()` does not work, as `log(y)` gives NaNs – Memo Oct 13 '15 at 17:51
  • The problem is not with R's `lm()`: it is with what you are feeding it. Your model simulation evidently generates some negative numbers, as the addition of Gaussian (normal) noise can drive the response below zero when the deterministic response is small. Given that, attempts at logarithmic transformation will inevitably fail in those cases. The combination of exponentiation and additive noise perhaps needs to be re-considered. – Nick Cox Oct 13 '15 at 18:59
  • @Memo : If $y$ values in real data are all positive (or can be gently transformed to such a form, as with $\log(1+y)$) then `lm()` will work without complaining. Even your example data show that `lm()` can work with the log transform; ignore the NaN warnings and examine the output from `lm()` (which ignores those points), as I tried. You will find a pretty reasonable fit, although on the log scale residual error decreases with fitted values. But that is how you set up the sample data, with additive rather than proportional noise added to the underlying exponential. – EdM Oct 13 '15 at 20:56
  • @EdM and I are in almost exact agreement. I regard $\log(1 + y)$ as a gentle transformation if and only if most values are $\gg 1$. Assuming that rather than $\log y$ is, however, a far from trivial change to model form. The big issue remains that $\exp()$ guarantees positive responses, but additive noise subverts that. – Nick Cox Oct 14 '15 at 10:56
  • Thanks Nick Cox and EdM, I did `lm()` and it works after ignoring the NaN warnings. Thanks once again – Memo Oct 14 '15 at 14:06
  • @Memo : don't forget to examine the quality of the fit produced by `lm` after transformation when you go on to real data, for example with [`plot.lm()`](http://stats.stackexchange.com/q/58141/28500). The log transformation often works well, but you need to make sure it works in your case. Note that residuals were poorly behaved after log transformation and `lm()` in the example you provided. They would have been OK if you had instead written `y = exp(beta * seq(n) + rnorm(n))`, including the noise within the exponentiation. – EdM Oct 14 '15 at 14:58
  • @EdM really appreciate your help, I got the idea. Thanks – Memo Oct 14 '15 at 19:47