I am trying to build a model that predicts football (soccer) results from a performance variable. It doesn't really matter how this performance is calculated, since any reasonable performance variable is an adequate proxy for how superior a team really is; for example, ManU always scores highly on any performance measure...
So the result is very disappointing when I fit a probit model with a performance variable as the predictor: the R2 is below 0.1. Interestingly, using the odds, or the inverse of the odds, which is an implied probability (in decimal notation, as used in Germany, odds of 1.1 imply a 1/1.1 ≈ 91% chance of winning), does not yield a better R2. So are odds just random? Again, this seems counterintuitive, since teams like ManU always have low odds and those odds seem fair. What does the model then predict if the R2 is so bad? I also looked for structural issues in the odds, such as a bias in certain ranges, but found none. In general, the average loss from betting at any odds is about the bookie's margin.
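To make the odds notation concrete, here is a minimal sketch of how decimal odds turn into implied probabilities and a bookie margin. The function name `implied_probabilities` and the three example odds (home, draw, away) are made up for illustration; only the 1.1 comes from the example above.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds into raw and margin-free implied probabilities.

    Returns (raw, margin, fair):
      raw    - 1 / odds for each outcome (sums to more than 1 if there is a margin)
      margin - how far the raw probabilities exceed 1 (the bookie's overround)
      fair   - raw probabilities rescaled so that they sum to exactly 1
    """
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)
    fair = [p / total for p in raw]
    return raw, total - 1.0, fair


# Hypothetical odds for home win, draw, away win (assumed numbers).
raw, margin, fair = implied_probabilities([1.1, 9.0, 21.0])
print("raw implied probabilities:", [round(p, 3) for p in raw])
print("bookie margin:", round(margin, 3))
print("margin-free probabilities:", [round(p, 3) for p in fair])
```

Rescaling proportionally is only one simple way to remove the margin; it assumes the bookie spreads the margin evenly over all outcomes, which may not hold in practice.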
So I found this new model for prediction. It is based on a simple performance variable (goal difference). Again, the choice of performance variable does not matter much for a probit model, since the R2s are always horrible. The beauty of this model is that the explanatory variable takes only integer values, ranging from -32 to +32. For every possible value (-32 to +32), the model calculates the share of matches won. If you code a win as 1 and anything else as 0, you end up with something like 0.42 matches won on average for a goal difference of 0, or 0.75 for a goal difference of +12. This is quite intuitive: you could say the probability of winning any match when the goal difference is 0 is about 42%. Now, using the goal difference as the explanatory variable and the calculated average share of matches won as the dependent variable in an OLS model, you get an R2 of 0.84. This confused me a lot, because the prediction now seems very good.
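The two setups can be put side by side in a small simulation. This is only a sketch under assumed numbers: the win probability is taken to be linear in goal difference (0.42 at 0, rising by 0.0275 per goal, which roughly matches 0.75 at +12), goal differences are drawn from a rounded normal with standard deviation 5, and bins with fewer than 30 matches are dropped. The match-level fit is a plain linear regression on the 0/1 outcome, used as a stand-in for the probit; none of this is my actual data or model.

```python
import random


def r_squared(x, y):
    """R2 of a simple OLS regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def simulate(n_matches=20000, seed=0, min_matches_per_bin=30):
    """Return (match-level R2, binned R2) for simulated matches."""
    rng = random.Random(seed)
    goal_diff, won = [], []
    for _ in range(n_matches):
        # Assumed distribution of goal difference, clipped to -32..+32.
        gd = max(-32, min(32, round(rng.gauss(0, 5))))
        # Assumed true win probability, linear in goal difference.
        p_win = min(0.98, max(0.02, 0.42 + 0.0275 * gd))
        goal_diff.append(gd)
        won.append(1 if rng.random() < p_win else 0)

    # Setup 1: one row per match, outcome is 0 or 1.
    r2_matches = r_squared(goal_diff, won)

    # Setup 2: one row per goal-difference value, outcome is the win rate.
    bins = {}
    for gd, w in zip(goal_diff, won):
        bins.setdefault(gd, []).append(w)
    kept = sorted(gd for gd, ws in bins.items() if len(ws) >= min_matches_per_bin)
    win_rate = [sum(bins[gd]) / len(bins[gd]) for gd in kept]
    r2_bins = r_squared(kept, win_rate)

    return r2_matches, r2_bins


r2_matches, r2_bins = simulate()
print("R2 on individual matches:", round(r2_matches, 3))
print("R2 on binned win rates:  ", round(r2_bins, 3))
```

Under these assumptions the match-level R2 should stay small while the binned R2 comes out far higher, even though both regressions use the same simulated matches and the same underlying relationship. Averaging within each bin removes the win/loss noise of single matches, so the two R2 values measure different things.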
To sum up: I don't really understand how the R2 of a probit model compares to the R2 of an OLS model.
So if odds would be a bad predictor