I am using repeated 10-fold CV to estimate the accuracy of my ordinal regression model. I have 6 predictors, 10 ordered response categories, and a total of 1166 data points.

For the ordinal model, I have defined accuracy as 1 - loss, with loss being a simple linear function of the distance between observed and predicted classes, assuming classes are equidistant.
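A rough sketch of this metric in Python (the function name and the normalization by the maximum possible class distance are one way to keep the value between 0 and 1; the exact scaling is not essential to the question):

```python
import numpy as np

def ordinal_accuracy(y_true, y_pred, n_classes=10):
    """1 minus the mean absolute class distance, scaled so that the worst
    possible prediction (9 classes off) gives an accuracy of 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    loss = np.abs(y_true - y_pred) / (n_classes - 1)   # linear loss, equidistant classes
    return 1.0 - loss.mean()

# e.g. predictions that are on average ~0.7 classes off still score ~0.93
print(ordinal_accuracy([10, 9, 3], [9, 9, 4]))   # 0.9259...
```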

I chose the number of repetitions for the 10-fold CV by examining the stability of the results as advised here, and decided to use 5 repetitions.
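Roughly the kind of stability check I mean, as a sketch only: the data and the LogisticRegression estimator below are placeholders, not my actual predictors or ordinal model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression   # placeholder for the ordinal model
from sklearn.metrics import make_scorer
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1166, 6))        # placeholder for the 6 predictors
y = rng.integers(1, 11, size=1166)    # placeholder for the 10 damage classes

model = LogisticRegression(max_iter=1000)       # placeholder estimator
scorer = make_scorer(ordinal_accuracy)          # distance-based accuracy defined above

for n_repeats in (1, 2, 5, 10):
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=n_repeats, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring=scorer)
    # when the mean and spread stop changing with more repeats, the estimate is stable
    print(n_repeats, round(scores.mean(), 3), round(scores.std(), 3))
```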

Finally, I am using this measure of accuracy to compare the quality of the predictions given different predictor values. More precisely, I have a range of possible storm wave conditions and associated parameters, and 10 levels of increasing damage for ships caught in the storm, and I am looking to find the wave conditions which most likely triggered the damage. So I run the model for each set of conditions and compare the accuracies.

My problem is that for all conditions tested my accuracy is suspiciously high, i.e. above 80%. I am not expecting this kind of accuracy given the quality of some of my predictors (for example, the wave velocity has been estimated numerically on a coarse grid with a number of simplifying assumptions).

Also, there is not much variation in accuracy between the model outputs for different wave conditions, at most +/- 1%. Again, I would have expected more variation than this.

I am wondering if there is something I am missing here, regarding my estimation of accuracy. What could be the cause of this issue?

Neodyme
  • Is one of your categories particularly common? – probabilityislogic Mar 05 '14 at 10:14
  • Also, it's probably better to use your likelihood function to assess accuracy. I think it's something like cross entropy for categorical models. Take the "residual" to be $-\log(\hat{p}_{ki})$, where $k$ is the observed response category for the $i$th observation, and $\hat{p}_{ki}$ is the model probability that the $i$th observation is in the $k$th category. – probabilityislogic Mar 05 '14 at 10:21
  • Actually yes, 65% of the data lies in the last category so I guess particularly common... – Neodyme Mar 06 '14 at 03:05
  • Also, I found this article http://jamesmccaffrey.wordpress.com/2013/11/05/why-you-should-use-cross-entropy-error-instead-of-classification-error-or-mean-squared-error-for-neural-network-classifier-training/ on cross entropy, but could you suggest some more formal sources I could check, i.e. papers? – Neodyme Mar 06 '14 at 03:23
  • What would be the effect of having a particularly common category on accuracy results? – Neodyme Mar 06 '14 at 06:15
  • 1
    Well, if I just predict the most common category, I'll get 65% accuracy - so in this case 80% is not necessarily a large gain. – probabilityislogic Mar 06 '14 at 11:02
  • 1
    You can check whether this messes up your results (and the classifier) by looking at accuracy for each class (in medical context this is sensitivity). – cbeleites unhappy with SX Mar 06 '14 at 13:15
  • 1
    Another common problem that causes suspiciously goodlooking CV results is a "leak" between training and test data, e.g. pre-processing calculations that are done on the whole data set (centering, scaling, etc.) – cbeleites unhappy with SX Mar 06 '14 at 13:23
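Following up on the comments above about per-class accuracy and cross-entropy, a minimal sketch of such checks (with placeholder data; in practice `y_true`, `y_pred` and `proba` would be the pooled out-of-fold observations, class predictions and predicted probabilities):

```python
import numpy as np
from sklearn.metrics import log_loss, recall_score

# placeholder out-of-fold results; replace with the real pooled CV output
rng = np.random.default_rng(0)
y_true = rng.integers(1, 11, size=1166)            # observed damage classes
proba = rng.dirichlet(np.ones(10), size=1166)      # predicted class probabilities
y_pred = proba.argmax(axis=1) + 1                  # predicted classes

# per-class accuracy (recall): a majority-class predictor gets ~1 on the
# dominant category and ~0 on the others, despite a decent overall accuracy
per_class = recall_score(y_true, y_pred, average=None, labels=list(range(1, 11)))
print(dict(zip(range(1, 11), per_class.round(2))))

# cross-entropy / log-loss: the mean of -log(p_hat) of the observed category
print(log_loss(y_true, proba, labels=list(range(1, 11))))
```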

0 Answers