Measuring quality of a timeseries model

Question

I want to create a model predicting whether an event will happen (let's say if Player A wins a match of ping-pong) conditional on a state (current score of the match).

I have already developed a Markov chain model (implementing the rules of ping-pong, assuming individual points are i.i.d., and taking the probability that Player A wins a single point to be p) that can be used to calculate $$P[A_{win} | State = s_i].$$
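For illustration, here is a minimal sketch of what such a Markov chain calculation could look like. It is not the actual model from the question: it assumes a single game to 11 points that must be won by two clear points, ignores who is serving, and the name `win_probability` is hypothetical. Tied scores from 10-10 onwards use the closed form p^2 / (p^2 + q^2), which follows from solving P = p^2 + 2pq * P.

```python
from functools import lru_cache


def win_probability(a, b, p, target=11):
    """P[A wins the game | score is a-b] with i.i.d. points.

    Assumptions (not from the original model): one game to `target`
    points, win by two, and P[A wins a point] = p regardless of server.
    """
    q = 1.0 - p
    # From any tied score at or beyond target-1, A must win two points
    # in a row before B does: P = p^2 + 2pq * P  =>  P = p^2 / (p^2 + q^2).
    deuce = p * p / (p * p + q * q)

    @lru_cache(maxsize=None)
    def win(x, y):
        if x >= target and x - y >= 2:
            return 1.0  # A has already won
        if y >= target and y - x >= 2:
            return 0.0  # B has already won
        if x == y and x >= target - 1:
            return deuce  # closed form stops the otherwise infinite recursion
        # One-step transition of the chain: A wins or loses the next point.
        return p * win(x + 1, y) + q * win(x, y + 1)

    return win(a, b)


if __name__ == "__main__":
    for state in [(0, 0), (5, 8), (10, 10), (10, 9)]:
        print(state, round(win_probability(*state, p=0.55), 4))
```

Evaluating this function at every score of a recorded game gives one predicted probability per state, which is the kind of sequence the metrics asked about below would be computed on.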

My plan is to experiment a little with machine learning and try decision trees or neural networks to predict the same thing, to see whether my predictions can get any better. I have a dataset of matches (each one recorded point by point, i.e. as a sequence of scores) and their outcomes (whether Player A won).

My questions are:

1. How would you measure the quality of such a model? Does the average over all matches of (the average log-loss over all states of one match) make any sense? What other metrics are possible?
2. If I train the model to predict the outcome, will it learn the distribution (i.e. will it return a probability)?
3. Could RNNs be of any use?

Any recommendations/sources to read are welcome!

Thanks :)

score 0 · Accepted Answer · answered May 13 '20 at 08:10

1. You are looking for proper scoring rules, which do precisely what you want: they map probabilistic predictions and actual outcomes to scores, and their expected value is optimized when the predicted probabilities equal the true ones. (Note that there are different conventions as to whether larger or smaller is better for scoring rules.) Take a look at Wikipedia and at our scoring-rules tag. And yes, log-loss is a proper scoring rule, the so-called "logarithmic score".
2. That will depend very much on your model and on the target function it optimizes. You should be fine if you use a proper scoring rule as the target function. Don't use accuracy: it is not a proper scoring rule, so optimizing it does not reward well-calibrated probabilities.
3. RNNs may well be useful.
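As a rough sketch of the metric proposed in the question, the code below averages a proper scoring rule first within each match and then across matches, and compares that with pooling all states together. The data layout (a list of `(predicted_probabilities, outcome)` pairs) and all function names are assumptions made for illustration. The Brier score is included as a second proper scoring rule. Both are written in the "smaller is better" convention.

```python
import math


def log_loss(prob, outcome, eps=1e-15):
    """Logarithmic score for one prediction; outcome is 1 if Player A won, else 0."""
    prob = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(outcome * math.log(prob) + (1 - outcome) * math.log(1.0 - prob))


def brier(prob, outcome):
    """Brier score for one prediction: squared error of the probability."""
    return (prob - outcome) ** 2


def match_averaged_score(matches, score=log_loss):
    """Average the score within each match, then average over matches.

    Every match gets the same weight, however many states it contains.
    """
    per_match = [
        sum(score(p, outcome) for p in probs) / len(probs)
        for probs, outcome in matches
    ]
    return sum(per_match) / len(per_match)


def pooled_score(matches, score=log_loss):
    """Average the score over all states of all matches at once.

    Longer matches contribute more states and therefore more weight.
    """
    scores = [score(p, outcome) for probs, outcome in matches for p in probs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical data: per-state predictions for two matches and who won.
    matches = [
        ([0.50, 0.55, 0.70, 0.90], 1),  # Player A won
        ([0.50, 0.40], 0),              # Player A lost
    ]
    print("match-averaged log-loss:", match_averaged_score(matches))
    print("pooled log-loss:        ", pooled_score(matches))
    print("match-averaged Brier:   ", match_averaged_score(matches, brier))
```

The two averaging schemes differ only in how states are weighted, so either can be used with a proper scoring rule. Which one fits better depends on whether each match or each state should count equally.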
