I have this data set of Xs and Ys. I am trying to fit an equation to it using R, of the form: y ~ log(x+constant)

Any ideas how to do it?

[Figure: graph of y as a function of x]

Yoav
  • Are you doing linear regression? Have you used the `lm` function? – Greenparker Mar 17 '16 at 14:16
  • Can you be explicit about the equation and coefficients you want to estimate? If you're fitting something like Y = log(b1*X + b0) and the fit is good I'd rearrange to something like exp(Y) = b1*X + b0 and fit linear. If you can't rearrange to linear and/or the difference between minimising errors in Y and errors in exp(Y) matters, you may need to go the non-linear route; `nls()`. – user20637 Mar 17 '16 at 14:43
  • A special case of this analysis is extensively discussed at http://stats.stackexchange.com/questions/30728. – whuber Mar 17 '16 at 15:37
  • If that's a picture of the data you're trying to fit, your proposed model will not fit it well. Indeed in the top 20% or so it curves the wrong way. – Glen_b Mar 18 '16 at 02:18

1 Answer

If what you actually mean is that $e^y$ is related to $x$, and you want to linearize by taking logs while adding a constant to avoid taking the log of 0, don't do it this way. What you should do instead depends on what you need.

Assuming you actually do want what you said, there are many ways to achieve something of the form `y ~ log(x+constant)`, depending on precisely what you mean and how the random variation enters into it.

If you mean that $E(y|x) = \beta_0+\beta_1 \log(x+k)$ for known $k$, and you expect that $\text{Var}(y|x)$ is constant, then it would be reasonable to just fit that with ordinary regression (i.e. use essentially your formula in `lm`, but with the value of the constant where you wrote `constant`).

[However, as noted in comments, that won't be a very good fit to the data if it looks like that shown in the plot in your question.]
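As a rough sketch of the known-$k$ regression described above: the data below are simulated purely for illustration (the question's data are not available), and the value `k <- 5` and the true coefficients are assumptions, not anything taken from the question.

```r
# Simulated (hypothetical) data: y = 1 + 2 * log(x + 5) + additive noise.
set.seed(1)
k <- 5                                   # the known constant (assumed value)
dat <- data.frame(x = seq(1, 100, length.out = 200))
dat$y <- 1 + 2 * log(dat$x + k) + rnorm(nrow(dat), sd = 0.1)

# Ordinary least squares; log(x + k) is evaluated inside the formula.
fit_lm <- lm(y ~ log(x + k), data = dat)

# First coefficient estimates beta_0, second estimates beta_1.
coef(fit_lm)
```

Plotting `resid(fit_lm)` against `fitted(fit_lm)` is a quick way to see whether the assumed form and the constant-variance assumption look plausible for real data.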

If the error term enters in a way that is not suitable for linear regression (e.g. if it enters multiplicatively rather than additively), then you'll need something else.

It may be that you instead want a generalized linear model or some nonlinear least squares model, or indeed some other model, but there's not really enough to go on to say much more.
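To illustrate the nonlinear least squares option, here is a minimal sketch under the assumption that the constant $k$ is unknown and the error is additive with constant variance, so that $k$ is estimated along with the coefficients. The data are simulated, and the names `b0`, `b1`, `k` and the starting guess `k0` are hypothetical choices for the example.

```r
# Simulated (hypothetical) data: y = 1 + 2 * log(x + 5) + additive noise.
set.seed(2)
dat <- data.frame(x = seq(1, 100, length.out = 200))
dat$y <- 1 + 2 * log(dat$x + 5) + rnorm(nrow(dat), sd = 0.05)

# Starting values: guess k, then take b0 and b1 from a linear fit at that guess.
k0 <- 1
start_lm <- lm(y ~ log(x + k0), data = dat)
start <- list(b0 = unname(coef(start_lm)[1]),
              b1 = unname(coef(start_lm)[2]),
              k  = k0)

# The "port" algorithm accepts bounds; keep x + k > 0 so the log is defined.
fit_nls <- nls(y ~ b0 + b1 * log(x + k), data = dat, start = start,
               algorithm = "port",
               lower = c(-Inf, -Inf, -min(dat$x) + 1e-6))
coef(fit_nls)
```

With real data, `nls()` can be sensitive to the starting values, so it may be worth trying several values of `k0`.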

For example, if you thought that $E(e^y|x)$ was linear in $x$ (which is not quite what you specified) but the variance was constant on the log scale, you might fit a gamma glm with identity link.
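The gamma GLM mentioned above might look roughly like the following. This is only a sketch on simulated data: the true intercept, slope and gamma shape are assumptions made for the example, and $e^y$ is generated so that its mean is linear in $x$ with a constant coefficient of variation (hence roughly constant spread on the log scale).

```r
# Simulated (hypothetical) data: exp(y) is gamma with mean 2 + 0.5 * x.
set.seed(3)
n <- 200
dat <- data.frame(x = seq(1, 100, length.out = n))
mu <- 2 + 0.5 * dat$x                    # E(exp(y) | x), linear in x
shape <- 20                              # constant shape => constant CV
dat$y <- log(rgamma(n, shape = shape, rate = shape / mu))

# Gamma GLM with identity link, fitted to exp(y) rather than y.
fit_glm <- glm(exp(y) ~ x, family = Gamma(link = "identity"), data = dat)
coef(fit_glm)
```

The identity link does not force the fitted mean to stay positive, so with real data the fit can fail if the linear predictor gets close to zero; supplying `start` values to `glm()` may help in that case.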

If this doesn't quite solve your problem you will need to give more details.

Glen_b