Curve fitting in the presence of prior beliefs about the relationship between x and y

Question

In the figure which follows each dot represents a game of a particular sport.

The x-axis represents the home team's margin of victory, and so around the top-right we can see a game where a home team won by almost 150 points. At the bottom left there's a game where a home team lost by about 140 points.

The y-axis represents the home team's pre-game probability of victory, as judged by their bookmaker odds before the game started (or rather by a model applied to those bookmaker odds).

I'm interested in the question of what is a 'par' victory score, given a team with a particular probability of victory. Ultimately I want to add those to an Elo rating system where teams can be rewarded for achieving a margin of victory above par (based on the respective ratings of the two teams), and punished for achieving a margin of victory below par.

To that end I fitted a logistic curve in SPSS. It is the black line in the figure: y = 1 / (1 + 0.825 * 0.981^x).

The black curve looks vaguely plausible but there are some things that bother me about it.

The black curve doesn't pass through the point (x = 0, y = 0.5). Indeed, when the home team's probability of winning is 0.55 its margin of victory is 0. Intuitively, it would seem that when a team's probability of winning is estimated at 0.5 its 'par' margin of victory should be 0.
At the far-left and far-right the black curve indicates that (for example) a team with 0.95 probability of victory should be expected to win by about 150 points. Because I have some prior knowledge of this game, I think this is unrealistic and that a team would need more like a .999 probability of victory before a victory of 150 points would be 'par' for them.
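
Both concerns can be checked directly from the fitted equation quoted above. The following is a minimal sketch in Python rather than SPSS; the function names are made up for illustration, and the inverse is just the quoted equation solved for x. It evaluates the curve at margins of 0 and 150, then reads off the margin the curve implies for a few win probabilities.

```python
import math

# Coefficients of the black curve as quoted: y = 1 / (1 + 0.825 * 0.981^x)
A, B = 0.825, 0.981

def fitted_prob(margin):
    """Win probability the fitted curve assigns to a given margin."""
    return 1.0 / (1.0 + A * B ** margin)

def implied_par_margin(prob):
    """Margin at which the fitted curve equals prob (the equation solved for x)."""
    return math.log((1.0 / prob - 1.0) / A) / math.log(B)

for m in (0, 150):
    print("margin", m, "-> probability", round(fitted_prob(m), 3))

for p in (0.5, 0.95, 0.999):
    print("probability", p, "-> implied margin", round(implied_par_margin(p), 1))
```

At a margin of 0 the curve gives 1 / 1.825, or about 0.548, which is the 0.55 mentioned above. Equivalently, the margin it implies for a 0.5 probability is negative rather than zero.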

In red I've drawn by hand what I regard as a more appropriate curve. I also wondered whether curve fitting was appropriate in the first place, given my goal of generating a par score for a team with a given win probability. I also drew a straight line in green. Again, it was done just from intuition but it seems like it would approximate a simple regression line if we used probability to predict margin of victory.

I realize that given I have fairly strong preconceptions about where the curve should be, it might be asked why I don't make my own curve without any consideration of these data points. Indeed, it seems to me that many people doing this sort of modelling do ignore all empirical data and do just that. However, I am hoping to find some compromise between prior beliefs and the data.

My question is what I should do differently to obtain a curve that indicates a reasonable 'par' score for a team with a given probability of victory. As Glen_b notes here, there are many other sorts of curves that 'level off' and I picked logistic purely because it was an option in SPSS. I'm not wedded to doing this in SPSS and could use MATLAB or R instead if it was more appropriate.

score 3 · Accepted Answer · answered Oct 24 '18 at 08:54

Your hand-drawn line appears to be roughly attempting to "go through the middle" of the data, and is thus roughly minimising some aggregate of the orthogonal residuals between the points and the curve. This is similar to formal fitting methods like total least squares. In contrast, the line generated for the logistic curve is based on the more orthodox approach of minimising some aggregate of the vertical error between the points and the curve. If you would like a curve that is more like the one you have drawn then you should use a fitting procedure based on orthogonal residuals. There are some discussions of the distinction between fitting methods using orthogonal residuals and vertical residuals here, here and here.
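
To make the distinction concrete, here is a minimal sketch for the simpler case of a straight line, where all three fits have closed forms. The data are made up and assumed to be on comparable scales, and `line_slopes` is a hypothetical helper, not part of any library. It compares the slope from minimising vertical residuals (regressing y on x), horizontal residuals (regressing x on y, re-expressed as a slope in the same axes), and orthogonal residuals (total least squares).

```python
import math

def line_slopes(xs, ys):
    """Slopes dy/dx of three straight-line fits to the same points."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

    vertical = sxy / sxx    # ordinary least squares of y on x
    horizontal = syy / sxy  # least squares of x on y, written as dy/dx
    # Total least squares: direction of the major axis of the point cloud.
    orthogonal = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return vertical, orthogonal, horizontal

# Made-up, positively correlated points, for illustration only.
xs = [-2, -1, 0, 1, 2, 3]
ys = [-1.5, -1.2, 0.3, 0.4, 1.6, 1.9]

vertical, orthogonal, horizontal = line_slopes(xs, ys)
print("vertical:", vertical, "orthogonal:", orthogonal, "horizontal:", horizontal)
```

For positively correlated data the orthogonal slope falls between the other two, which is why a line drawn "through the middle" by eye tends to look steeper than the y-on-x regression line. Note that orthogonal fits depend on the units of each axis, so with margins in points and probabilities between 0 and 1 the variables would need rescaling before such a fit is meaningful.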

In answer to your posted question, yes, it is possible to fit a curve that incorporates prior beliefs about the relationship between the variables. This is generally done by postulating a Bayesian model with a curve described by some parameters, and setting a prior for those parameters. However, I don't think that is what you need here. Your "prior" belief has clearly been formed from seeing the data, and its divergence from the formal fitted model you have used is essentially due to your visual sense that you should be minimising orthogonal residuals rather than vertical residuals. What you need to do here is consider if that is really what you want. If so, then you should move to a fitting method that uses orthogonal residuals. If not, you have learned an important deficiency of your visual intuition!
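
For completeness, the Bayesian route could look something like the sketch below. Everything in it is an assumption made for illustration: the data points are made up, the curve is a one-parameter logistic forced through (0, 0.5), the noise and prior standard deviations are arbitrary, and the estimate is a simple grid search for the posterior mode rather than a full posterior. The prior mean for the steepness k is chosen so that a 150-point margin corresponds to a 0.999 probability, as suggested in the question.

```python
import math

# Made-up (margin, pre-game probability) pairs, for illustration only.
margins = [-120, -80, -40, -10, 10, 40, 80, 120]
probs = [0.10, 0.15, 0.33, 0.44, 0.57, 0.67, 0.85, 0.90]

def curve(margin, k):
    """One-parameter logistic that passes through (0, 0.5) by construction."""
    return 1.0 / (1.0 + math.exp(-k * margin))

def sse(k):
    """Sum of squared vertical residuals for steepness k."""
    return sum((p - curve(m, k)) ** 2 for m, p in zip(margins, probs))

# Prior mean: the steepness at which a 150-point margin maps to 0.999.
K_PRIOR = math.log(999) / 150
K_PRIOR_SD = 0.01  # assumed: how firmly the prior belief is held
NOISE_SD = 0.05    # assumed: scatter of the points around the curve

def neg_log_posterior(k):
    """Gaussian likelihood plus Gaussian prior on k, constants dropped."""
    return sse(k) / (2 * NOISE_SD ** 2) + (k - K_PRIOR) ** 2 / (2 * K_PRIOR_SD ** 2)

grid = [i / 10000 for i in range(1, 1001)]  # candidate k from 0.0001 to 0.1
k_ls = min(grid, key=sse)                   # data only (flat prior)
k_map = min(grid, key=neg_log_posterior)    # compromise between data and prior
print("least squares k:", k_ls, "posterior mode k:", k_map)
```

The posterior mode sits between the data-only estimate and the prior mean, and how far it moves toward the prior is governed by the two assumed standard deviations. Under this curve the par margin for a win probability p would be log(p / (1 - p)) / k.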

I now realize I was using the term prior belief in an ambiguous way. I had just meant that the black curve didn't match my expectations, inasmuch as it isn't useful for generating predictions about what a typical margin of victory is for a team with a given probability of victory. I will look into the orthogonal methods you mentioned, but do you think that - given my aims - there is any reason to be fitting a curve at all to this data, as opposed to a straight line? I've edited the original post to impose a straight line over the data. — user1205901 - Reinstate Monica, Oct 24 '18 at 11:09
The data pattern appears curved, so a straight line is unlikely to fit it well. — Ben, Oct 24 '18 at 22:20
