Comparing difference between two polynomial regression models in R

Question

I'm having trouble comparing two sets of data. I can't work out how to test whether two models describe the same set of data or different sets.

Here is a portion of my data:

   ZT     WT_PAL Line_37_PAL  WT_PhPRR5 Line_37_PhPRR5    WT_EOBI Line_37_EOBI   WT_EOBII Line_37_EOBII     WT_CM1 Line_37_CM1     WT_ADT
1   0 0.08017366 0.000959987 0.26035363     0.03264146 1.46476869  0.009786237 4.16477772   0.000742414 0.07395887 0.000456353 0.06000000
2   0 0.05930462 0.021197691 0.26147552     0.22926780 1.57837816  0.926847383 1.15031587   0.461807744 0.03682062 0.101097795 0.05322561
3   0 0.14389513 0.756356081 0.63035752     0.72129878 1.76452175  0.640368308 2.42348584   1.364089162 0.12954215 0.892205209 0.13821109
4   4 0.12194367 0.297290671 0.13444482     0.14225469 0.99144104  1.131902963 0.91522009   0.910081812 0.29664680 0.505630813 0.51706760
5   4 0.06025697 0.164053161 0.15448683     0.26627386 1.31917230  1.519721821 0.62925084   2.483566296 0.12296628 0.364813045 0.35061055
6   4 0.20896743 0.249435523 1.23052341     0.61818565 1.77819303  1.284683192 1.41398975   1.523446689 0.30023862 0.282538740 0.56811626
7   8 2.38864472 0.042225180 1.54472331     0.04236890 1.04169534  0.860432687 0.26977645   2.001020769 2.93724542 1.340914776 3.00230489
8   8 2.27484249 0.108464160 1.27963226     0.21218338 0.92997042  0.999347054 0.24756421   0.878011535 2.36280758 0.564269963 2.05923549
9   8 1.72728498 0.284489142 1.17311707     0.63301025 0.73380469  0.863829602 0.20109633   0.831139775 2.37338677 1.046991612 2.24797092
10 12 1.13821434 0.462596491 2.22919520     0.15287139 0.34310114  0.817010999 0.29965738   0.236064056 1.18592546 0.725928756 1.01932917
11 12 1.10145755 0.368458720 2.13568842     0.39531534 0.33147292  1.107039633 0.32343745   0.888220142 0.98362898 0.663785645 0.93808648
12 12 1.91985246 0.219754262 1.44412345     0.66775319 0.22753689  0.513590231 0.07657606   1.100251286 1.75011191 0.251849690 1.61130028
13 16 0.68005324 0.396014538 0.31868826     0.14759449 0.38865638  0.778205100 1.09767555   0.627603654 0.55060102 0.784160371 0.60319061
14 16 0.83616544 0.514261850 0.21921500     0.19384070 0.22801491  1.029590354 0.12193953   0.494258870 0.62367453 0.868126888 0.59068953
15 16 0.59058070 0.758966630 0.56687274     0.80844039 0.12417071  0.698339222 0.12503996   1.321782313 0.50518054 1.127351763 0.90570233
16 20 0.30896858 0.376021422 0.18652112     0.16757942 0.50239187  0.823056297 0.30242397   0.549940528 0.32069459 0.464616256 0.33701357
17 20 0.04854291 0.231663315 0.07268395     0.10814706 0.07590502  0.620767904 0.03008203   0.491554754 0.04180077 0.374756383 0.04942141
18 20 0.81359279 0.833815983 0.58218634     0.32892256 0.35501741  0.381413660 0.34660498   0.558786138 0.43100429 0.645363500 0.99771479

What I would like to do is see whether the expression profile over time of Line_37_PAL is significantly different from that of WT_PAL.

First thing I did was try to fit the model:

fitWT_PAL_1 <- lm(data1$WT_PAL ~ data1$ZT)
fitWT_PAL_2 <- lm(data1$WT_PAL ~ data1$ZT + I(data1$ZT^2))
fitWT_PAL_3 <- lm(data1$WT_PAL ~ data1$ZT + I(data1$ZT^2) + I(data1$ZT^3))
fitWT_PAL_4 <- lm(data1$WT_PAL ~ data1$ZT + I(data1$ZT^2) + I(data1$ZT^3) + I(data1$ZT^4))
fitWT_PAL_5 <- lm(data1$WT_PAL ~ data1$ZT + I(data1$ZT^2) + I(data1$ZT^3) + I(data1$ZT^4) + I(data1$ZT^5))

From these, I determined that fitWT_PAL_4 fit the data best.

I then did the same for Line_37_PAL, where fit37_PAL_5 proved to be the best fit.

I wanted to see here whether or not the two models adequately described the same data, or if the data they described were different (and that the models were in fact describing different expression profiles).

But when I run anova() I get:

> anova(fit37_PAL_4, fitWT_PAL_1)
Analysis of Variance Table

Response: data1$Line_37_PAL
              Df  Sum Sq  Mean Sq F value Pr(>F)
data1$ZT       1 0.10797 0.107974  1.7944 0.1901
I(data1$ZT^2)  1 0.00342 0.003422  0.0569 0.8131
I(data1$ZT^3)  1 0.12717 0.127171  2.1134 0.1561
I(data1$ZT^4)  1 0.04095 0.040949  0.6805 0.4157
Residuals     31 1.86536 0.060173               
Warning message:
In anova.lmlist(object, ...) :
  models with response ‘"data1$WT_PAL"’ removed because response differs from model 1

I'm assuming this is because my Y-values come from two different sets of data? Please correct me if I'm wrong, and I would be thankful for any advice you might be able to give.

I ran the predicted values of a model against the actual values using t.test(x,y, paired = TRUE), but that only describes the differences in means of the two populations, not the possible differences in expression patterns. Advice on how to proceed?

@Spacedman: I disagree. I don't think variable selection has anything to do with it. I think the OP is correct that the issue is that the response differs between the two models, so `anova()` doesn't know how to compare them properly. – Alex A., Mar 09 '15 at 14:38
Okay, but it's about model comparison across models for different data, which is still a stats question! – Spacedman, Mar 09 '15 at 14:42

score 2 · Answer 1 · edited Apr 13 '17 at 12:44


Since your two models have different response variables and are hence not nested, they can't be compared using the anova() function. That's why R is giving you that warning message.
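
For contrast, here is a minimal sketch of a case where anova() does work: two models with the same response and nested terms. The data frame `sim` and its columns are simulated stand-ins made up for illustration, not the OP's data.

```r
# Simulated stand-in data: 6 time points, 3 replicates each (hypothetical)
set.seed(1)
sim <- data.frame(ZT = rep(seq(0, 20, by = 4), each = 3))
sim$y <- sin(sim$ZT / 4) + rnorm(nrow(sim), sd = 0.2)

# Same response, nested polynomial terms: anova() can compare these
fit2 <- lm(y ~ poly(ZT, 2), data = sim)
fit4 <- lm(y ~ poly(ZT, 4), data = sim)

# F-test of whether the added cubic and quartic terms improve the fit
cmp <- anova(fit2, fit4)
print(cmp)
```

Swapping the response of one model for a different column reproduces the kind of warning shown in the question, because the residual sums of squares are then no longer comparable.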

See this question on Cross Validated, the Stack Exchange site for statistics, for a discussion on a proper approach to this problem.

As an aside, you may want to consider using the poly() function inside lm() to fit polynomial regression models since this will create orthogonal polynomials automatically. If you know you don't want orthogonal polynomials, you can use poly() with raw=TRUE, which is just shorthand for 1 + x + ... + I(x^n).
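
A short sketch of the poly() point, using simulated values (the vectors `ZT` and `y` here are made up for illustration). It suggests that the orthogonal and raw forms give the same fitted curve and differ only in how the coefficients are parameterised:

```r
# Simulated stand-in data (hypothetical values)
set.seed(1)
ZT <- rep(seq(0, 20, by = 4), each = 3)
y  <- sin(ZT / 4) + rnorm(length(ZT), sd = 0.2)

fit_orth <- lm(y ~ poly(ZT, 3))              # orthogonal polynomial basis
fit_raw  <- lm(y ~ poly(ZT, 3, raw = TRUE))  # raw powers ZT, ZT^2, ZT^3
fit_I    <- lm(y ~ ZT + I(ZT^2) + I(ZT^3))   # the same raw powers by hand

# All three models span the same column space, so fitted values agree
same_fit <- isTRUE(all.equal(fitted(fit_orth), fitted(fit_raw)))
# The raw form and the hand-written form also share coefficients
same_raw <- isTRUE(all.equal(unname(coef(fit_raw)), unname(coef(fit_I))))
print(c(same_fit, same_raw))

# Raw powers are strongly correlated with each other;
# the orthogonal columns are uncorrelated by construction
print(round(cor(poly(ZT, 3, raw = TRUE)), 3))
print(round(cor(poly(ZT, 3)), 3))
```

The correlation among raw powers is one reason individual coefficient tests can be hard to interpret with raw = TRUE, even though the overall fit is unchanged.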

answered Mar 09 '15 at 14:54 by Alex A. · edited Apr 13 '17 at 12:44 by Community


Please read the cited paper. The AIC was presented by Akaike NOT as a method for comparing two (or more) models on different data, but for comparing one (or multiple) models to a "TRUE" model. You will find no "cross data" comparisons in that paper, only comparisons of different models on the same data. Your point about using `poly` is a good one, but I think the question really needs a more thorough statistical consideration. The suggestion to use raw = TRUE is unfortunate because it fails to warn the naive user (as it appears Andy is) that there are adverse consequences of that choice. – DWin Mar 09 '15 at 16:11
Well, that was not my hope. I was hoping you would correct the mis-attribution of the suggested procedure to Akaike. It is true that some statisticians suggest the method you propose, but as far as I can tell Akaike is not one of them, and many statisticians see that method as unsupported by theory. You might notice that there were two answers to that question and that the other respondent @BenBolker was less supportive of the suggestion. – DWin Mar 09 '15 at 16:16
@BondedDust: My phrasing was inaccurate. I didn't mean to imply that Akaike himself endorses that method but rather it was suggested on the linked Cross Validated post. – Alex A. Mar 09 '15 at 16:17
@BondedDust: Rolled back post but edited to encourage the OP to read the discussion. Removed anything that could be perceived as a personal endorsement of a particular methodology. I appreciate you bringing this all to my attention. – Alex A. Mar 09 '15 at 16:23
This question has been posted twice now (my mistake). Another user in the other post has suggested that I pool the data between the two lines together, then add a factor variable (WT or Line37 I assume) and attempt to run an anova from there. I'll give that a go and see where that gets me. @Dwin is correct in that I'm only vaguely familiar with the statistics that we're working with. I was under the impression that the AIC was typically used to pick a superior model from multiple that describe a single set of data. Would that work similarly in determining whether two models differ? – Andy Mar 09 '15 at 16:44
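
The pooling idea mentioned in the comment above could be sketched roughly as follows. This is only an illustration under assumptions: the data are simulated, the names `sim`, `long`, `expr`, and `line` are made up, and degree 3 is an arbitrary choice. It is not an endorsement of a particular statistical method.

```r
# Simulated wide data shaped like the OP's (values are made up)
set.seed(42)
ZT <- rep(seq(0, 20, by = 4), each = 3)
sim <- data.frame(
  ZT          = ZT,
  WT_PAL      = 2 * exp(-((ZT - 8) / 4)^2) + rnorm(length(ZT), sd = 0.2),
  Line_37_PAL = 0.4 + rnorm(length(ZT), sd = 0.2)
)

# Stack the two columns into long format with a factor marking the line
long <- data.frame(
  ZT   = rep(sim$ZT, 2),
  expr = c(sim$WT_PAL, sim$Line_37_PAL),
  line = factor(rep(c("WT", "Line37"), each = nrow(sim)),
                levels = c("WT", "Line37"))
)

# Smaller model: one shared curve. Larger model: a separate curve per line.
fit_shared   <- lm(expr ~ poly(ZT, 3), data = long)
fit_separate <- lm(expr ~ poly(ZT, 3) * line, data = long)

# Same response and nested terms, so anova() returns an F-test
cmp <- anova(fit_shared, fit_separate)
print(cmp)
```

A small p-value here would suggest that the two lines need different curves. The test assumes independent errors with equal variance, which time-course expression data may not satisfy, so the result should be read with that caveat.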
