
Is there any easy-to-use software for Tukey median-polishing rows and columns with lots of missing values?

user42520

1 Answer


R has medpolish built in (it is in the stats package), and it can cope with some level of missingness:

 > a  # some data
          [,1]     [,2]     [,3]     [,4]
 [1,] 32.45884 29.50403 38.54330 30.06207
 [2,] 27.92059 25.00838       NA 13.93309
 [3,] 37.91911 23.98091 36.00139 27.73731
 [4,] 29.20283 29.68059 18.41809 29.92471
 [5,]       NA 30.98312 23.55309 22.63105
 [6,] 24.96472 33.52443 24.85243 37.43364

The medpolish command is simple:

 > medpolish(a,na.rm=TRUE)    # Pretty easy to use
 1 : 86.06071 
 Final: 85.59585 

 Median Polish Results (Dataset: "a")

 Overall: 29.01548 

 Row Effects:
 [1]  2.2356134 -4.0668144  3.4436953 -0.1729532 -5.2644925  0.1729532

 Column Effects:
 [1]  1.2077470  0.4488938 -0.1978902 -1.1544723

 Residuals:
          [,1]     [,2]     [,3]      [,4]
 [1,]  0.00000 -2.19595   7.4901 -0.034543
 [2,]  1.76418 -0.38917       NA -9.861103
 [3,]  4.25219 -8.92715   3.7401 -3.567392
 [4,] -0.84743  0.38917 -10.2265  2.236662
 [5,]       NA  6.78324   0.0000  0.034543
 [6,] -5.43146  3.88711  -4.1381  9.399689

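Incidentally, this is not particularly hard to do in a spreadsheet: you alternately subtract row medians and column medians from the table. Note that you would normally iterate those sweeps until the residuals stop changing, but it's still quite doable.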
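For anyone curious what the iteration involves, here is a minimal sketch of the sweeps in R. The names `polish_sketch` and `sweeps` are made up for illustration, and the matrix is retyped from the rounded values printed above. It runs a fixed number of sweeps rather than testing for convergence, so treat it as an illustration of the idea, not a replacement for medpolish; its numbers need not match the output above.

```r
# Data retyped from the printed example above (rounded; NA marks a missing cell).
a <- matrix(c(32.45884, 29.50403, 38.54330, 30.06207,
              27.92059, 25.00838,       NA, 13.93309,
              37.91911, 23.98091, 36.00139, 27.73731,
              29.20283, 29.68059, 18.41809, 29.92471,
                    NA, 30.98312, 23.55309, 22.63105,
              24.96472, 33.52443, 24.85243, 37.43364),
            nrow = 6, byrow = TRUE)

# Hypothetical helper: a fixed number of sweeps, no convergence test.
polish_sketch <- function(x, sweeps = 10) {
  res     <- x                    # residuals start out as the raw table
  overall <- 0
  row_eff <- numeric(nrow(x))
  col_eff <- numeric(ncol(x))
  for (i in seq_len(sweeps)) {
    # Move each row's median out of the residuals into the row effects.
    r_med   <- apply(res, 1, median, na.rm = TRUE)
    res     <- sweep(res, 1, r_med)
    row_eff <- row_eff + r_med
    # Re-centre the column effects, pushing their median into the overall term.
    d <- median(col_eff); col_eff <- col_eff - d; overall <- overall + d
    # Same again for the columns.
    c_med   <- apply(res, 2, median, na.rm = TRUE)
    res     <- sweep(res, 2, c_med)
    col_eff <- col_eff + c_med
    d <- median(row_eff); row_eff <- row_eff - d; overall <- overall + d
  }
  list(overall = overall, row = row_eff, col = col_eff, residuals = res)
}

fit <- polish_sketch(a)
fit$overall
```

Whatever the number of sweeps, overall + row effect + column effect + residual adds back up to the original value in every observed cell, which is a handy check if you do this by hand in a spreadsheet.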

However, if you have a really large amount of missingness, you may not be able to estimate effects for all rows and columns (for example, if a row or column is entirely missing).
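A small sketch of the all-missing case, using a toy table (`b`, invented here, not the data above) whose second row has no observed cells. It shows one way to spot such rows and columns before polishing; dropping them is just one possible choice, not necessarily the right one for your data.

```r
# Toy additive table whose second row is entirely missing.
b <- outer(c(10, 20, 30), 1:3, "+")
b[2, ] <- NA

# Rows and columns with no observed cells at all.
empty_rows <- which(rowSums(!is.na(b)) == 0)
empty_cols <- which(colSums(!is.na(b)) == 0)

fit <- medpolish(b, na.rm = TRUE, trace.iter = FALSE)
fit$row   # the effect for the empty row comes back as NA

# One option: polish only the rows and columns that have some data.
keep_r  <- rowSums(!is.na(b)) > 0
keep_c  <- colSums(!is.na(b)) > 0
fit_sub <- medpolish(b[keep_r, keep_c, drop = FALSE],
                     na.rm = TRUE, trace.iter = FALSE)
```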

Edit: as whuber notes below, a lot of missingness may result in problems of bias or nonconvergence.
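On the nonconvergence point: medpolish stops after `maxiter` sweeps (10 by default) and issues a warning if the residuals have not settled by then. Below is a rough sketch of a wrapper that records that warning so you can check for it programmatically. `polish_checked` is a made-up name, and the wrapper assumes any warning raised during the call is the non-convergence one. It says nothing about bias.

```r
# Data retyped from the printed example above (rounded; NA marks a missing cell).
a <- matrix(c(32.45884, 29.50403, 38.54330, 30.06207,
              27.92059, 25.00838,       NA, 13.93309,
              37.91911, 23.98091, 36.00139, 27.73731,
              29.20283, 29.68059, 18.41809, 29.92471,
                    NA, 30.98312, 23.55309, 22.63105,
              24.96472, 33.52443, 24.85243, 37.43364),
            nrow = 6, byrow = TRUE)

# Hypothetical wrapper: runs medpolish quietly and records any warning.
# Assumption: the only warning expected here is the "did not converge" one.
polish_checked <- function(x, ...) {
  warn <- NULL
  fit <- withCallingHandlers(
    medpolish(x, na.rm = TRUE, trace.iter = FALSE, ...),
    warning = function(w) {
      warn <<- conditionMessage(w)    # remember the message
      invokeRestart("muffleWarning")  # and let medpolish return its result
    }
  )
  list(fit = fit, converged = is.null(warn), warning = warn)
}

# With a single sweep, convergence can only be declared if every residual is zero.
quick <- polish_checked(a, maxiter = 1)
quick$converged

# Allowing more sweeps and a tighter tolerance (eps) is the obvious thing to try.
longer <- polish_checked(a, maxiter = 50, eps = 0.001)
longer$converged
```

If a fit still fails to settle after many sweeps, that is a hint the table has too many holes for the row and column effects to be pinned down well.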

Glen_b
  • Upvoted because I didn't know anything about median polishing and your example is clear enough to get at least a superficial idea of it! – Elvis Dec 19 '12 at 21:02
  • @Elvis Thanks. I tend to think of it as a bit like a two-way main-effects ANOVA model... but for medians rather than means. There's good coverage of it in "*Understanding Robust and Exploratory Data Analysis*", Hoaglin, Mosteller and Tukey (eds); it's also in Mosteller and Tukey's "*Data Analysis and Regression*". There is also a description and an example [here](http://www.stats.ox.ac.uk/pub/MASS4/VR4stat.pdf) (starts on page 7). – Glen_b Dec 19 '12 at 21:39
  • (+1) Median polish is used extensively throughout Tukey's *EDA* text. It is easily implemented even in a spreadsheet. With any appreciable amount of missingness it becomes problematic, being potentially biased and often not converging at all. – whuber Dec 20 '12 at 02:25