Sequential classification, combining predictions

Question

What is the best way to combine outputs from a binary classifier, which outputs probabilities, and is applied to a sequence of non-iid inputs?

Here's a scenario: Say I have a classifier which does an OK, but not great, job of classifying whether or not a cat is in an image. I feed the classifier frames from a video, and get as output a sequence of probabilities, near one if a cat is present, near zero if not.

The inputs are clearly not independent: if a cat is present in one frame, it will most likely be present in the next frame as well. Say I have the following sequence of predictions from the classifier (obviously there are more than six frames in one hour of video):

12pm to 1pm: $[0.1, 0.3, 0.6, 0.4, 0.2, 0.1]$
1pm to 2pm: $[0.1, 0.2, 0.45, 0.45, 0.48, 0.2]$
2pm to 3pm: $[0.1, 0.1, 0.2, 0.1, 0.2, 0.1]$

The classifier answers the question "What is the probability that a cat is present in this video frame?" But can I use these outputs to answer the following questions?

1. What is the probability there was a cat in the video between 12 and 1pm? Between 1 and 2pm? Between 2 and 3pm?
2. Given, say, a day of video, what is the probability that we have seen a cat at least once? What is the probability that we have seen a cat exactly twice?

My first attempt at this problem is simply to threshold the classifier at, say, 0.5. In that case, for question 1, we would decide there was a cat between 12 and 1pm but not between 1 and 3pm, even though the sum of the probabilities between 1 and 2pm is much higher than between 2 and 3pm (1.88 versus 0.80).
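A quick check of the thresholding rule against the per-hour sums, using the six example values per hour from above. "Declare a cat if any frame exceeds 0.5" is my reading of the rule, and the names are just for illustration:

```python
# Classifier outputs for each hour, copied from the example above.
hourly = {
    "12pm-1pm": [0.1, 0.3, 0.6, 0.4, 0.2, 0.1],
    "1pm-2pm": [0.1, 0.2, 0.45, 0.45, 0.48, 0.2],
    "2pm-3pm": [0.1, 0.1, 0.2, 0.1, 0.2, 0.1],
}

THRESHOLD = 0.5

def cat_by_threshold(probs, threshold=THRESHOLD):
    """Declare a cat present if any single frame exceeds the threshold."""
    return any(p > threshold for p in probs)

for hour, probs in hourly.items():
    print(f"{hour}: thresholded={cat_by_threshold(probs)}, sum={sum(probs):.2f}")
```

Only 12pm-1pm comes out as `True`, although 1pm-2pm has the largest sum (1.88, versus 1.70 and 0.80).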

I could also model this as a sequence of Bernoulli trials, where one sample is drawn for each probability output by the classifier. Given a sequence, one could simulate this to answer these questions. Maybe this is unsatisfactory, though, because it treats the frames as independent? I think a sequence of high probabilities should provide more evidence for the presence of a cat than the same high probabilities in a random order.
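Under the independent-Bernoulli model, the probability of at least one cat frame in an hour has a closed form, $1 - \prod_i (1 - p_i)$, and the simulation should converge to it. A minimal sketch (the function names are hypothetical, and the model deliberately assumes independent frames):

```python
import random

def p_at_least_one_exact(probs):
    """P(at least one cat frame) if frames are independent Bernoulli(p_i)."""
    prob_none = 1.0
    for p in probs:
        prob_none *= 1.0 - p
    return 1.0 - prob_none

def p_at_least_one_mc(probs, n_sims=100_000, seed=0):
    """Monte Carlo estimate: simulate one Bernoulli draw per frame,
    stopping early within a run once a cat frame appears."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if any(rng.random() < p for p in probs):
            hits += 1
    return hits / n_sims

hour = [0.1, 0.3, 0.6, 0.4, 0.2, 0.1]  # 12pm-1pm from the example
shuffled = hour[:]
random.Random(1).shuffle(shuffled)

print(p_at_least_one_exact(hour))      # about 0.891
print(p_at_least_one_exact(shuffled))  # same up to rounding: order is ignored
print(p_at_least_one_mc(hour))         # close to the exact value
```

The shuffled sequence gives the same answer, which is exactly the order-blindness described above. Even the 2pm-3pm hour, where no frame exceeds 0.2, comes out at about 0.58 under this model, and the number only grows as more frames are added.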

Is there a reason not to train a classifier on a sequence of frames (more specifically, on the vector of probabilities from the binary classifier)? If you are using a fixed number of frames, as in the examples above, that would be unproblematic. Are there enough frame sequences with and without the cat? — Jacques Wainer, Oct 18 '20 at 17:48
The problem is contrived; there may be other ways to frame this particular description. — bill_e, Oct 20 '20 at 19:01

Guillem · Answer 1 · 2020-10-14T21:01:23.913

This is an interesting problem. My intuition is that if we wait long enough, we may treat the inputs as independent, and from there it is much easier to answer your two questions. For instance, if we can derive a single probability for each hour (or every 2, 3, ... hours) of the day, then we can answer question 2 analytically using a Poisson binomial distribution.
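The Poisson binomial distribution describes the number of successes among independent Bernoulli trials with different success probabilities, and its PMF can be computed with a short dynamic program. The per-window probabilities below are hypothetical placeholders, and "seen a cat exactly twice" is read here as "exactly two windows contain a cat":

```python
def poisson_binomial_pmf(probs):
    """Return [P(K=0), ..., P(K=n)] for K = number of successes among
    independent Bernoulli trials with success probabilities `probs`."""
    pmf = [1.0]  # with zero trials, K = 0 with certainty
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)  # this trial fails: count stays k
            new[k + 1] += mass * p      # this trial succeeds: count becomes k + 1
        pmf = new
    return pmf

# Hypothetical per-window probabilities (e.g. one per hour), assumed independent.
window_probs = [0.6, 0.48, 0.2]
pmf = poisson_binomial_pmf(window_probs)

print("P(at least once):", 1.0 - pmf[0])
print("P(exactly twice):", pmf[2])
```

With these placeholder values the two answers are about 0.834 and 0.331. The dynamic program is $O(n^2)$ in the number of windows, which is negligible for a day split into hourly windows.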

Then the question is: how long is long enough? I don't think it is easy to answer this from the data, but we can elicit your domain expertise: how fast is a cat? If there is a cat, how long does it stay on average? And so on.

Once we have a number, let's call it the characteristic time $\tau$, my strategy would be to combine the output probabilities with a moving average: for example, a simple moving average with a window length of $\tau$ (that is, about $\tau / \Delta T$ frames), or exponential smoothing with a smoothing factor $\alpha = 1 - \exp(-\Delta T / \tau)$, where $\Delta T$ is the time between frames.
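The two smoothers can be sketched as follows. The window length in frames is taken as $\tau / \Delta T$ rounded to the nearest integer, and the exponential smoother uses the recursion $s_t = \alpha p_t + (1 - \alpha) s_{t-1}$ with $\alpha = 1 - \exp(-\Delta T / \tau)$. The values of $\tau$ and $\Delta T$ are hypothetical, and seeding the recursion with the first frame is one choice among several:

```python
import math

def simple_moving_average(probs, window):
    """Trailing mean over the last `window` frames (shorter at the start)."""
    out = []
    running = 0.0
    for i, p in enumerate(probs):
        running += p
        if i >= window:
            running -= probs[i - window]  # drop the frame leaving the window
        out.append(running / min(i + 1, window))
    return out

def exponential_smoothing(probs, tau, dt):
    """s_t = alpha * p_t + (1 - alpha) * s_{t-1}, alpha = 1 - exp(-dt / tau)."""
    alpha = 1.0 - math.exp(-dt / tau)
    smoothed = [probs[0]]  # seed with the first observation
    for p in probs[1:]:
        smoothed.append(alpha * p + (1.0 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical settings: one frame every 10 s, characteristic time 30 s.
dt, tau = 10.0, 30.0
probs = [0.1, 0.2, 0.45, 0.45, 0.48, 0.2]  # 1pm-2pm values from the question
window = max(1, round(tau / dt))           # 3 frames

print(simple_moving_average(probs, window))
print(exponential_smoothing(probs, tau, dt))
```

The exponential smoother needs no window buffer, and its $\alpha$ comes straight from $\tau$, so the two approaches share a single tuning parameter.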

Finally, I guess you can represent the probability for a specific time window of length $\tau$ by the expected value or the midpoint value.
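One way to read this step, under my own assumptions: cut the smoothed series into consecutive, non-overlapping blocks of about $\tau / \Delta T$ frames and summarize each block either by its mean (one reading of "expected value") or by the value at its middle frame. The function name, the block-splitting convention and the smoothed values are hypothetical:

```python
from statistics import mean

def window_probabilities(smoothed, frames_per_window, how="mean"):
    """Summarize consecutive non-overlapping windows of a smoothed series.

    how="mean"     -> average smoothed value in the window
    how="midpoint" -> smoothed value at the window's middle frame
    A trailing partial window is kept as-is.
    """
    out = []
    for start in range(0, len(smoothed), frames_per_window):
        block = smoothed[start:start + frames_per_window]
        if how == "mean":
            out.append(mean(block))
        elif how == "midpoint":
            out.append(block[len(block) // 2])
        else:
            raise ValueError(f"unknown summary: {how!r}")
    return out

smoothed = [0.1, 0.15, 0.3, 0.4, 0.45, 0.3]  # hypothetical smoothed values
print(window_probabilities(smoothed, 3, "mean"))
print(window_probabilities(smoothed, 3, "midpoint"))
```

The resulting per-window values are the quantities the answer then treats as approximately independent from one window to the next.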

I am making a lot of assumptions here, so it would be interesting to investigate how sensitive the results are to them. For instance, you can assume a prior distribution for $\tau$ and propagate the uncertainty to your final estimates using Monte Carlo simulation.
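A rough sketch of that sensitivity analysis: draw $\tau$ from a prior, rerun the exponential smoothing and per-window averaging for each draw, and look at the spread of the resulting "at least one cat" probability. The lognormal prior and its parameters, the 10 s frame interval, the reuse of the question's values as one frame sequence, and combining windows as $1 - \prod_w (1 - p_w)$ are all assumptions made for illustration:

```python
import math
import random
import statistics

def exp_smooth(probs, tau, dt):
    """Exponential smoothing with alpha = 1 - exp(-dt / tau)."""
    alpha = 1.0 - math.exp(-dt / tau)
    s = [probs[0]]
    for p in probs[1:]:
        s.append(alpha * p + (1.0 - alpha) * s[-1])
    return s

def p_cat_at_least_once(probs, tau, dt):
    """Smooth, average over windows of about tau/dt frames, then treat the
    windows as independent: P(at least one) = 1 - prod(1 - p_window)."""
    smoothed = exp_smooth(probs, tau, dt)
    w = max(1, round(tau / dt))
    prob_none = 1.0
    for start in range(0, len(smoothed), w):
        block = smoothed[start:start + w]
        prob_none *= 1.0 - statistics.fmean(block)
    return 1.0 - prob_none

# Hypothetical frame sequence: the question's values reused, 10 s apart.
probs = [0.1, 0.3, 0.6, 0.4, 0.2, 0.1,
         0.1, 0.2, 0.45, 0.45, 0.48, 0.2,
         0.1, 0.1, 0.2, 0.1, 0.2, 0.1]
dt = 10.0

rng = random.Random(42)
# Hypothetical lognormal prior on tau (seconds) with a median of 30 s.
taus = [rng.lognormvariate(math.log(30.0), 0.5) for _ in range(2000)]
estimates = [p_cat_at_least_once(probs, tau, dt) for tau in taus]

q = statistics.quantiles(estimates, n=20)  # cut points at 5%, 10%, ..., 95%
print(f"median {statistics.median(estimates):.3f}, "
      f"90% interval [{q[0]:.3f}, {q[-1]:.3f}]")
```

A wide interval would indicate that the answer depends strongly on $\tau$, and therefore on how much weight the domain-knowledge estimate of $\tau$ deserves.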
