2

I have two datasets:

x1              y1
0.7439235       0.5188408
1.0410700       0.4130867
0.6749701       0.3671449
0.7434786       0.3496216
0.8445187       0.3421567
1.1150534       0.4264965
0.7862458       0.3854942
1.6617124       0.4192246
1.4345286       0.4677291
1.0828542       0.4579055

x2             y2
1.1216862       0.4885919
1.2602874       0.5155386
1.3590971       0.5243502
0.8830182       0.4038656
1.5884793       0.4334869
0.7780006       0.3950941
0.7921088       0.2812734
0.6734612       0.2870370
0.8862152       0.4248144
1.0093133       0.2963311
1.1850700       0.3981337
1.4932780       0.5235984
1.1655496       0.4318588
1.0776927       0.3863196
1.0995091       0.4906934
1.3563977       0.3884854
0.8559308       0.3610551
1.2442205       0.4170717
0.9974195       0.3739845
0.9797912       0.3932574
1.0747108       0.4180473
0.9032511       0.3197943
0.9720635       0.3356009
1.1681510       0.5092239
1.1422033       0.3033392

I fitted the two datasets to a known equation (below) using nonlinear least squares. $$ y= 1 + x - \bigg(\big(1 + (x^a)\big)^{\frac 1 a}\bigg) $$ Now, I have two curves. How can I test if two curves are statistically different?

gung - Reinstate Monica
  • 132,789
  • 81
  • 357
  • 650
Lee SK
  • 23
  • 1
  • 3
  • Maybe a good ol' t-test: check out `t.test` to test difference in your mean response or use the formula option! –  May 11 '18 at 03:44
  • 2
    This question lacks sufficient information to answer it. The parameter `a` would need to have a value from each fit, and there should be some description of the theoretical basis for that equations in the context of the data-gathering of those values. Then there might the possibility of offering sensible suggestions about assessing goodness of fit. – DWin May 11 '18 at 05:04

1 Answers1

1

This is a common question. In general, we would just add an interaction term to the regression model to determine if the lines differ (cf., How to test whether linear models fit separately to two groups are better than a single model applied to both groups?). The extra nuance here is that you have a very specific functional form that you are requiring the fit to take, and it has just one free parameter, $a$. You want to know if the fitted value for $a$ differs between the two sets. To do that, we just add a new parameter, $b$, that can move the value of $a$ up or down, and test $b$ against $0$. Note that you also need to create an indicator variable to identify which set is which.

Here is an example, coded in R, with your dataset:

d = read.table(text="x              y          ind
0.7439235       0.5188408      1
...
1.0828542       0.4579055      1
1.1216862       0.4885919      0
...
1.1422033       0.3033392      0", header=T)
m = nls(y~1 + x - ((1 + (x^(a+b*ind)))^(1/(a+b*ind))), data=d, start=list(a=1, b=0))
summary(m)
# Formula: y ~ 1 + x - ((1 + (x^(a + b * ind)))^(1/(a + b * ind)))
# 
# Parameters:
#   Estimate Std. Error t value Pr(>|t|)    
# a   1.4611     0.0228   64.15   <2e-16 ***
# b   0.0441     0.0464    0.95     0.35    
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.0612 on 33 degrees of freedom
# 
# Number of iterations to convergence: 5 
# Achieved convergence tolerance: 1.74e-08

Thus the initial answer is that there is insufficient evidence in your data to see that the optimal value of $a$ differs between the two sets and still hold your long run type I error rate below $.05$.

It's always good to look at your data and your model fit, though.

xs = seq(.6,1.8,by=0.01)
p1 = predict(m, newdata=data.frame(x=xs, ind=1))
p0 = predict(m, newdata=data.frame(x=xs, ind=0))
windows()
  with(d, plot(x, y, col=ind+1, pch=ind+1, xlim=c(.6,1.8)), ylim=c(.25,.55))
  lines(xs, p1, col="red", lty="dashed")
  lines(xs, p0)
  legend("bottomright", legend=c("set 1", "set 2"), col=2:1, lty=2:1, pch=2:1)

enter image description here

This looks potentially a little troubling. The red triangles (set 1) don't seem like they fit the required functional form very well. Whereas the provided curve requires the data to move diagonally up and to the right, if anything, set 1 looks like it may be flat or even move down and to the left. We can take a look at the residuals to make this easier to see (we don't have to worry about the curve):

r1 = with(d[d$ind==1,], y-predict(m, newdata=data.frame(x=x, ind=1)))
r0 = with(d[d$ind==0,], y-predict(m, newdata=data.frame(x=x, ind=0)))
windows()
  with(d[d$ind==1,], plot(x, r1, col=2, pch=2, ylab="residuals",
                          xlim=c(.6,1.8), ylim=c(-.12,.17)))
  with(d[d$ind==0,], points(x, r0))
  abline(h=0, col="lightgray")

enter image description here

Whereas the black circles, set 2, appear to be appropriately scattered around the 0 line, the red triangles are higher on the left side and lower on the right. Something is amiss here.

For what it's worth, the residuals do seem relatively normal, which the test assumes.

windows()
  qqnorm(c(r1,r0), col=rep(2:1, times=c(10,25)), pch=rep(2:1, times=c(10,25)))
  qqline(c(r1,r0), col="lightgray")

enter image description here

In the end, the $a$'s don't differ much when shoehorned into this functional form, but the specified functional form may not be right for the data from set 1.

gung - Reinstate Monica
  • 132,789
  • 81
  • 357
  • 650
  • Thank you very much. Your advice helps me a lot!. I was meaning that I am fitting data onto a written equation varying "a". I am sorry for confusing. – Lee SK May 11 '18 at 16:53
  • You're welcome, @LeeSK. What are these data? & what is this equation? The datum in the upper left corner looks like a possible outlier. Otherwise, is it possible there's something else going on & a different form should be used? – gung - Reinstate Monica May 11 '18 at 17:30
  • These datasets are normalized potential evaporation by precipitation (x) and normalized actual evaporation by precipitation; and, the equation used for fitting is Fu's equation for Budyko framework. – Lee SK May 11 '18 at 17:56