I ran a comparison where I estimated the parameters of a model using $S$ different methods, repeatedly. More specifically, I simulated $M$ datasets $\boldsymbol{y}_1, \dots, \boldsymbol{y}_M$ from a model parametrized by $\boldsymbol{\theta}$, where $\boldsymbol{\theta}$ is $p$-dimensional. I then estimated the model parameters from each dataset with each method. As a result, I have $M \times p$ point estimates for each statistical method.
So far I'm presenting the results using a table where each entry is an average squared error (over the $M$ runs):
Model 1 Average Squared Errors
          theta_1    theta_2    theta_3    theta_4
Method1   0.1        0.01       0.091      1.11
Method2   0.3        0.003      0.047      0.79
Method3   0.24       0.03       0.042      0.08
Method4   0.65       0.01       0.007      0.4
Best      Method1    Method2    Method4    Method3
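For concreteness, here is a minimal sketch (Python/NumPy) of how each entry of such a table is computed. The dimensions, the true parameter values, and the placeholder estimates below are made up for illustration; they are not from my actual study.

```python
# Minimal sketch of how each table entry (an average squared error) is computed.
# Everything numeric here is a placeholder: 4 parameters, 4 methods, M = 500 runs.
import numpy as np

rng = np.random.default_rng(0)

M, n_methods, p = 500, 4, 4
theta_true = np.array([1.0, 0.5, -0.3, 2.0])        # hypothetical true parameter vector

# Placeholder estimates; in the real study these come from fitting each method
# to each simulated dataset y_1, ..., y_M.  Shape: (M, n_methods, p).
estimates = theta_true + rng.normal(scale=0.2, size=(M, n_methods, p))

# Average squared error per (method, parameter): mean over the M simulation runs.
ase = ((estimates - theta_true) ** 2).mean(axis=0)  # shape (n_methods, p)

# "Best" row: the method with the smallest average squared error for each theta_j.
best = ase.argmin(axis=0)

for i, row in enumerate(ase, start=1):
    print(f"Method{i}", " ".join(f"{v:.3f}" for v in row))
print("Best   ", " ".join(f"Method{j + 1}" for j in best))
```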
In the statistical methodology literature this seems to be the standard way of presenting results, but I find it quite hard to interpret. In particular, it is not clear which method is the overall winner.
Drawing conclusions is even more difficult when you try the methods on several models, i.e. when you have a second table:
Model 2 Average Squared Errors
          theta_1    theta_2    theta_3    theta_4
Method1   0.5        0.14       0.01       0.3
Method2   0.1        0.03       0.007      0.7
Method3   0.14       0.023      0.1        0.8
Method4   0.05       0.1        0.02       0.4
Best      Method4    Method2    Method2    Method1
In my case I am using 5 methods and 6 models, so I can't ask the reader to go through page after page of tables. If you really wanted to rank the methods, how would you pool the results together to produce such a ranking?
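To make the question concrete, one naive pooling I can think of is to rank the methods within each (model, parameter) column by average squared error and then average the ranks over all columns and models, as in the rough sketch below. The rank-averaging rule itself is just one arbitrary choice, not something I have seen justified anywhere.

```python
# Rough sketch of one possible pooling: average the within-column ranks of the
# average squared errors over all models and parameters.  The numbers are the
# two tables above; the pooling rule itself is just one option.
import numpy as np
from scipy.stats import rankdata

ase_model1 = np.array([[0.10, 0.010, 0.091, 1.11],
                       [0.30, 0.003, 0.047, 0.79],
                       [0.24, 0.030, 0.042, 0.08],
                       [0.65, 0.010, 0.007, 0.40]])
ase_model2 = np.array([[0.50, 0.140, 0.010, 0.30],
                       [0.10, 0.030, 0.007, 0.70],
                       [0.14, 0.023, 0.100, 0.80],
                       [0.05, 0.100, 0.020, 0.40]])

# Stack the per-model tables: shape (n_models, n_methods, p).
ase = np.stack([ase_model1, ase_model2])

# Rank the methods within every (model, parameter) column (1 = smallest error).
ranks = np.apply_along_axis(rankdata, 1, ase)

# Average rank of each method over all models and parameters.
mean_rank = ranks.mean(axis=(0, 2))
for i, r in enumerate(mean_rank, start=1):
    print(f"Method{i}: mean rank {r:.2f}")
```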
Finally, this kind of table is really boring to read, and there must be more entertaining ways of presenting such results!
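For instance, I imagine something like a heatmap of the tables, one panel per model and on a log scale (since the errors span a couple of orders of magnitude), might be easier to scan than raw numbers. A rough sketch (Python/Matplotlib, again just a first idea) follows, but I would like to hear what is actually used in practice.

```python
# Rough sketch of one possible display: heatmaps of log10 average squared errors,
# one panel per model.  The data are the two tables above; the plot is only a first idea.
import numpy as np
import matplotlib.pyplot as plt

ase_model1 = np.array([[0.10, 0.010, 0.091, 1.11],
                       [0.30, 0.003, 0.047, 0.79],
                       [0.24, 0.030, 0.042, 0.08],
                       [0.65, 0.010, 0.007, 0.40]])
ase_model2 = np.array([[0.50, 0.140, 0.010, 0.30],
                       [0.10, 0.030, 0.007, 0.70],
                       [0.14, 0.023, 0.100, 0.80],
                       [0.05, 0.100, 0.020, 0.40]])

# Shared color scale so the two panels and the single colorbar are comparable.
log_tables = [np.log10(ase_model1), np.log10(ase_model2)]
vmin = min(t.min() for t in log_tables)
vmax = max(t.max() for t in log_tables)

fig, axes = plt.subplots(1, 2, figsize=(9, 3.5), sharey=True)
for ax, table, title in zip(axes, log_tables, ["Model 1", "Model 2"]):
    im = ax.imshow(table, cmap="viridis", vmin=vmin, vmax=vmax)
    ax.set_xticks(range(4))
    ax.set_xticklabels([f"theta_{j + 1}" for j in range(4)])
    ax.set_yticks(range(4))
    ax.set_yticklabels([f"Method{i + 1}" for i in range(4)])
    ax.set_title(title)
fig.colorbar(im, ax=axes, label="log10 average squared error")
plt.show()
```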