I have collected, and am now analyzing, data from a game with 20 contestants, played over 20 games.

I am running a regression to try to model the final position of a player in the game.

The position of a player is determined by when he/she gets voted out by the other players.

A player's position is 20th if he/she is voted out first, and 1st (winner) if he/she is voted out last (or equivalently, when the 1st runner-up is voted out).

I first assumed the data to be numerical and hence ran a linear regression. But do I really have numerical data, or do I have ordinal data with 20 levels?

3 Answers


Technically you have ordinal outcome data. Then again, continuous data are also ordinal.

On one hand, as @David says in his answer, sometimes a linear regression will work OK if there are many levels.

On the other hand, continuous outcomes can sometimes be hard to model with linear regression if there isn't a simple linear relation between the outcome and the predictor values. In that case, a type of ordinal regression called a cumulative probability model provides a flexible way to proceed. With that type of ordinal regression the intercepts (one per boundary between adjacent outcome levels) are estimated from the data themselves, so you don't need to assume a particular distribution or transformation for your outcome variable.
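
For illustration, here is a minimal sketch of fitting such a cumulative probability (proportional-odds) model in Python with statsmodels' OrderedModel. All data and predictor names (challenge_wins, alliance_size) are made-up placeholders for this example, not anything from the question.

```python
# Minimal sketch: cumulative-probability (proportional-odds) ordinal regression.
# All data and predictor names below are hypothetical stand-ins.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 400  # e.g. 20 contestants x 20 games

X = pd.DataFrame({
    "challenge_wins": rng.poisson(2, n),      # hypothetical predictor
    "alliance_size": rng.integers(1, 8, n),   # hypothetical predictor
})
# Final position as an ordered categorical: 1 (winner) .. 20 (first voted out)
position = pd.Series(
    pd.Categorical(rng.integers(1, 21, n),
                   categories=list(range(1, 21)), ordered=True)
)

model = OrderedModel(position, X, distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())  # 2 slopes plus 19 threshold intercepts
```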

So if a linear regression seems to work OK in your case, that's fine. But ordinal regression can potentially provide more flexibility while maintaining efficiency.

EdM

As per your explanation, the target variable isn't continuous in nature, so linear regression is not a good fit for it. I would rather treat it as a classification problem with a discrete class for each rank. So, your model's output layer should have 20 units followed by a softmax operation to convert the outputs into class/rank probabilities. With this setup you can use cross-entropy loss to optimize the network.
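
To make that concrete, here is a minimal sketch of such a network in PyTorch. The feature count and hidden-layer size are arbitrary placeholders; note that PyTorch's CrossEntropyLoss applies the softmax internally, so the model outputs raw logits during training.

```python
# Minimal sketch of the 20-class rank classifier; sizes are placeholders.
import torch
import torch.nn as nn

n_features = 10  # hypothetical number of predictors per contestant
model = nn.Sequential(
    nn.Linear(n_features, 64),
    nn.ReLU(),
    nn.Linear(64, 20),  # one logit per rank (1st .. 20th)
)
# CrossEntropyLoss combines log-softmax and negative log-likelihood,
# so no explicit softmax layer is needed at training time.
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(32, n_features)   # dummy batch of 32 contestants
y = torch.randint(0, 20, (32,))   # ranks encoded as classes 0..19
loss = loss_fn(model(X), y)
loss.backward()

probs = torch.softmax(model(X), dim=1)  # rank probabilities at inference
```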

bytestorm

Since there are quite a lot of levels, it's not a big mistake to treat your data as numerical (we do something similar with age all the time).

Ordinal regression would be a great idea if we had a ton of data but, since you would have to estimate 19 intercepts (one per threshold between adjacent ranks) rather than one, I don't think it is really worth it.

My approach would be to try fitting the regression model first, then validate it. If it is good enough, there is no reason to overcomplicate things.
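
As a hedged sketch of that fit-then-validate workflow (using statsmodels; the predictors and simulated data below are placeholders for your real variables):

```python
# Minimal sketch: treat position as numeric, fit OLS, then check errors.
# The predictor names and simulated data are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 400  # 20 contestants x 20 games
df = pd.DataFrame({
    "challenge_wins": rng.poisson(2, n),      # hypothetical predictor
    "alliance_size": rng.integers(1, 8, n),   # hypothetical predictor
    "position": rng.integers(1, 21, n),       # stand-in for final rank 1..20
})

X = sm.add_constant(df[["challenge_wins", "alliance_size"]])
ols = sm.OLS(df["position"].astype(float), X).fit()
print(ols.summary())

# Simple validation: in-sample prediction error
resid, fitted = ols.resid, ols.fittedvalues
print("RMSE:", np.sqrt(np.mean(resid**2)))
print("MAE:", np.mean(np.abs(resid)))
```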

David
  • I have fit the regression and looked at the diagnostic plots. 1. The QQ plot showed that the residuals are close to normal, 2. the residuals vs fitted plot showed that the linearity assumption was not violated and that the expectation is 0, 3. the scale-location plot showed circular clustering from positions 6 to 14 - I don't know how to interpret the homo/heteroskedasticity characteristic here. Similar clustering also exists in the residuals vs fitted plot. What do you think? (The sketch after these comments reproduces these plots.) – plstellmewhyitisso Jun 26 '19 at 13:24
  • Can you share those graphs? From what you're describing, it seems that everything is pretty normal – David Jun 26 '19 at 13:25
  • I can't figure out how to share it O.O :[ – plstellmewhyitisso Jun 26 '19 at 13:30
  • OK, so for the homo/heteroskedasticity thing, I would be OK if the residuals vs fitted values plot (maybe try squares of residuals vs fitted values) looks like noise. This is no guarantee, but dangerous heteroskedasticity is often reflected here. How big are the prediction errors? – David Jun 26 '19 at 13:33
  • They look like noise, but clustered noise, and only around positions 6-14. I can't say for sure; now when I look at it, maybe there is a funnel (?). Summary of the residuals: Min. -11.490, 1st Qu. -3.269, Median 0.022, Mean 0.000, 3rd Qu. 3.160, Max. 10.695. What do you think? – plstellmewhyitisso Jun 26 '19 at 13:44
  • I suspect your model is predicting a lot of values in the 6-14 range and very few outside of it. Am I right? – David Jun 26 '19 at 13:50
  • yes it seems like it. Why is it so? – plstellmewhyitisso Jun 26 '19 at 13:52
  • It could be because the effects of the predictors are small and therefore predictions tend to go closer to the mean. Is it at least the case that good rankings tend to be associated with good predictions? (Root-mean-squared error or mean absolute error may give us a hint there) – David Jun 26 '19 at 13:55
  • Sorry, I don't understand what you mean by your question about good rankings tending to be associated with good predictions. RMSE: 4.66, MAE: 3.83. How do I tell if good rankings are associated with good predictions? – plstellmewhyitisso Jun 26 '19 at 14:07
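
For reference, here is a minimal sketch reproducing the three diagnostic plots discussed in these comments (QQ, residuals vs fitted, and an approximate scale-location plot), assuming `ols` is the fitted model from the sketch in this answer:

```python
# Sketch of the standard OLS diagnostic plots; assumes `ols` is the fitted
# statsmodels OLS result from the sketch above.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

resid, fitted = ols.resid, ols.fittedvalues
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

stats.probplot(resid, dist="norm", plot=axes[0])  # QQ plot of residuals
axes[0].set_title("Normal Q-Q")

axes[1].scatter(fitted, resid, s=10)              # residuals vs fitted
axes[1].axhline(0, color="gray")
axes[1].set_title("Residuals vs Fitted")

# Simple approximation to the scale-location plot: sqrt(|standardized resid|)
std_resid = resid / resid.std()
axes[2].scatter(fitted, np.sqrt(np.abs(std_resid)), s=10)
axes[2].set_title("Scale-Location")

plt.tight_layout()
plt.show()
```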