I'm reading a paper here that does an analysis that I find odd, and I'm wondering whether it is reasonable to do what they are doing. The paper isn't available online but I think I can describe the important part:

The authors had a bunch of university students taking a science course, and they measured their "efficacy" at two points in time (pre-test before a course, and then post-test after the course).

They hypothesized that the students with the lowest pre-test efficacy would gain the most from pre to post (and that the students with the highest pre-test efficacy would improve the least). So, they divided students into quartiles based on pre-efficacy score. Then they did a two-way within ANOVA. The IVs were "trial" (pre or post) and "quartile" (1, 2, 3, or 4). The DV was efficacy. They are looking for a trial:quartile interaction (hoping that post-hoc tests will show that the lowest quartile will gain tons and the highest quartile will gain little).

I guess what concerns me is:

  1. The "quartile" is not really a within-subjects IV, because for each subject it is constant. I imagine that in a long format, their data would have had two rows for each subject (for the efficacy scores on pre and post), and quartile would have been the same in both.

  2. The students in the top quartile have the least room to gain. The lowest quartile could well gain more than the top quartile could possibly gain (a ceiling effect on the top quartile).

  3. It also worries me that quartile and pre-efficacy score are highly correlated. I don't know whether this is OK, but the quartile is computed directly from the pre-efficacy score, so including both seems suspicious.

  4. Why quartiles? Why not halves or thirds, or why not use the pre-efficacy score directly instead of breaking it into quartiles? For example, something like (post - pre) ~ pre, where a significant negative effect of pre would show that the difference score decreases as pre-efficacy increases.

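To make point 4 concrete, here is a rough sketch in Python of the regression I have in mind, on purely made-up scores (the means, slope, noise level, and sample size are all invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200

# Made-up data in which lower pre-test students really do gain more.
pre = rng.normal(50, 10, n)
gain = 10 - 0.15 * pre + rng.normal(0, 4, n)
post = pre + gain

# (post - pre) ~ pre: a negative slope means the gain shrinks as the
# pre-test score increases.
res = stats.linregress(pre, post - pre)
print(f"slope = {res.slope:.3f}, p = {res.pvalue:.3g}")
```

A significant negative slope here would seem to be the continuous analogue of the quartile-by-trial interaction, without the arbitrary cut points.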
I'm hoping to do an analysis like this, but couldn't get myself to move forward in my current state of unease over the way this analysis was done. I am not a statistician at all so I could be completely wrong here.

Comments are appreciated as to whether the analysis is sound or not.

Glen_b
Dan

2 Answers


Regarding your points:

  1. The problem is not so much that pretest quartile is fixed, but that the data are dependent and that there will be regression to the mean.

  2. Whether there are ceiling and floor effects depends on more than just quartile; it also depends on the difficulty of the test. If lots of people score 100, there is a clear ceiling effect; if even the highest scores are below the maximum, there isn't. If there are ceiling or floor effects, it might be better to use beta regression, which allows for bounded dependent variables. However, even if there are no ceiling effects, it will still be the case that the students who did best can gain the least.

  3. From your description, it doesn't seem like they included both of these as IVs, but if they did, then yes, it would be a big problem.

  4. I agree that making quartiles when they could use the score instead is not optimal. Categorizing a continuous variable rarely is. The only reason for doing this that I can see is if they want to do a stratified analysis.
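
To see how strong the regression-to-the-mean effect in point 1 can be, here is a small simulation (all numbers invented for illustration) in which nobody truly changes — both measurements are just a stable trait plus independent noise — yet a quartile split on the pre-test still shows the bottom quartile "gaining" and the top quartile "losing":

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Null model: a stable trait measured twice with noise, no true change.
trait = rng.normal(50, 10, n)
pre = trait + rng.normal(0, 5, n)
post = trait + rng.normal(0, 5, n)

# Quartile split on the pre-test, as in the paper.
quartile = np.digitize(pre, np.quantile(pre, [0.25, 0.5, 0.75]))

# Mean "gain" per quartile: apparent gains at the bottom and apparent
# losses at the top, purely from regression to the mean.
gains = {q: float((post - pre)[quartile == q].mean()) for q in range(4)}
for q in range(4):
    print(f"Q{q + 1}: mean gain = {gains[q]:+.2f}")
```

So an apparent trial-by-quartile interaction of exactly the hypothesized shape can arise with no real improvement at all.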

Peter Flom
  • Thanks Peter! For point 3), they really did do that: one of the IVs was "quartile" and the other was "trial". "trial" was pre or post. As Gaël noted, perhaps "quartile" really was between, even though they said it was within. That would make more sense to me, since then the post hocs would be one repeated-measures ANOVA per quartile. – Dan Jun 10 '13 at 14:11

Adding to Peter's excellent points:

  1. Are you sure this isn't merely some terminological issue? There are many slightly different approaches to this type of data and a lot of confusion about which is which. For example, in SPSS, an ANOVA with one between-subject and one within-subject factor can be run through the “General Linear Model”/“Repeated Measures” menu item. Some people would call this a “repeated measures” or “within-subject” ANOVA while other sources describe it as a “mixed-model ANOVA”.

    Note that it is easy, and regrettably common, to apply an inappropriate “canned” statistical procedure to dependent data (e.g. by rearranging things in Excel and then running an independent-samples t-test on data in the long format), but it seems harder to mistakenly treat something as a “within-subject” factor, at least with many common statistical packages.

  2. Your situation is different because there is no control group and no randomization but you might find the references given in the answers to Best practice when analysing pre-post treatment-control designs useful to think about this problem.

  3. Not much to add, besides noting that this does seem like the biggest problem in all of this.

  4. This is generally not recommended and you can find many articles specifically criticizing this practice. Some references have been provided in Justification for low/high or tertiary splits in ANOVA.
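
To illustrate the terminological point in 1: in the long format the question describes, "quartile" repeats identically across each subject's two rows, which is exactly what makes it a between-subject factor. A minimal sketch with hypothetical subjects and scores:

```python
# Hypothetical long-format rows: (subject, quartile, trial, efficacy).
rows = [
    (1, 1, "pre", 32), (1, 1, "post", 45),
    (2, 1, "pre", 35), (2, 1, "post", 44),
    (3, 4, "pre", 78), (3, 4, "post", 80),
]

# "quartile" never varies within a subject, so it can only be a
# between-subjects factor, whatever the software's menu calls it.
quartiles_by_subject = {}
for subject, quartile, trial, efficacy in rows:
    quartiles_by_subject.setdefault(subject, set()).add(quartile)

assert all(len(q) == 1 for q in quartiles_by_subject.values())
print("quartile is constant within every subject")
```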

Gala