
I am curious to know whether there are methods that exist for sequential modeling of binary outputs. Let me give an example to help clarify the question:

I have a problem with binary outcomes and some covariate information that I want to model using some sort of statistical model for binary outputs (think logistic-regression-type models). My setup is the following: I start with an initial set of data, say of size $n_0$, at time $t=0$, and at each subsequent time point $t=1,\dots,k$ I run an experiment and receive one additional binary outcome. In my situation, however, the initial $n_0$ outcomes are almost surely all equal to 0 (the outcome is coded as either 0 or 1), and even as I collect more data I may not observe a 1 for some time. This makes logistic regression infeasible, because the model cannot be fit when only one type of outcome has been observed.

Now that the problem setup has some context: is there a sequential method appropriate for modeling this type of data scheme, or a suggestion for how this kind of data should be handled? Ultimately, at each iteration of the algorithm I want to fit a model, use that model to make a prediction, and then receive a new data outcome based on that prediction, so building a model that predicts well is also essential.
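To make the infeasibility concrete: with constant labels, the maximum-likelihood estimate of a logistic regression does not exist (the intercept diverges to $-\infty$), while a prior keeps the estimate finite, which is what the Bayesian suggestion in the comments below exploits. Here is a minimal sketch in Python; the Gaussian prior scale `tau` is an arbitrary illustrative choice:

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: 10 observations, 2 covariates, and labels that are all 0 --
# exactly the "only one type of outcome observed so far" situation.
X = np.random.default_rng(0).normal(size=(10, 2))
y = np.zeros(10)

def neg_log_posterior(beta, X, y, tau=1.0):
    """Negative Bernoulli log-likelihood plus a N(0, tau^2) prior on beta.

    The prior keeps the problem identifiable even when y is constant;
    plain maximum likelihood would push the intercept to -infinity.
    """
    Xb = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    eta = Xb @ beta
    log_lik = np.sum(y * eta - np.logaddexp(0.0, eta))
    log_prior = -0.5 * np.sum(beta ** 2) / tau ** 2
    return -(log_lik + log_prior)

fit = minimize(neg_log_posterior, np.zeros(X.shape[1] + 1), args=(X, y))
print(fit.x)  # finite MAP estimate despite all-zero outcomes
```

Plain maximum likelihood corresponds to letting `tau` go to infinity, which is exactly where the fit diverges on one-class data.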

RustyStatistician
  • Did you consider online learning? Vowpal Wabbit can sequentially pass through your data and learn as you go. – Vladislavs Dovgalecs Apr 18 '16 at 18:40
  • @xeon Does it perform well on binary data, as in my case? – RustyStatistician Apr 18 '16 at 19:03
  • It sounds like you want sequential updates with stratified random sampling? I.e., biasing toward the rarer class. – Alex R. Aug 04 '16 at 20:02
  • I find your problem description still somewhat abstract. Besides online learning, it also sounds like it could relate to (contextual) multi-armed bandit problems. – Sven Aug 04 '16 at 20:58
  • First thought off the top of my head: Bayesian logistic regression with observation number as a covariate. Bayes allows you to avoid the identifiability problems when you start with constant data. – Kodiologist Aug 08 '16 at 01:32
  • So I guess you are saying that the distribution of your samples changes over time following some pattern? This is the common case of *time series* prediction. – caveman Aug 08 '16 at 04:36
  • @Sven Please elaborate on what else you would like me to include in the problem description. – RustyStatistician Aug 08 '16 at 15:20
  • @Kodiologist Does Bayesian logistic regression work when all of the outputs (responses) take only one value? Think binary outcomes where all I have observed are 0s and no 1s. – RustyStatistician Aug 08 '16 at 15:21
  • @RustyStatistician Yes, that's what I meant by "constant data". – Kodiologist Aug 08 '16 at 15:24
  • @RustyStatistician It might be helpful to know what the exact problem to solve is (that is, "I am trying to detect spam" vs. "I have a binary output variable"). I find such context very helpful for getting a better idea of what you are looking for and what the limitations are. – Sven Aug 08 '16 at 18:17

1 Answer


Two views ...

Reinforcement Learning

Here is why:

  • You get new data based on the last prediction / action of the model
  • You start off with no clue what is leading to the positive outcome, so you have to explore
  • The available data for the next prediction changes after each iteration, so the data can be treated as the state.

Framed as a Reinforcement Learning problem, you try to learn the state-action value function $Q(s,a)$, where

  • $s$ is the data available at that point in time
  • $a$ is the prediction/action to make (1 or 0).

I have described an approach to learning $Q(s,a)$ with value function approximation in this answer (How to fit weights into Q-values with linear function approximation), where you will also find helpful references, above all the excellent book Reinforcement Learning: An Introduction by Sutton and Barto.
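For concreteness, here is a minimal sketch of such a value-function-approximation update (my own illustration, not code from the linked answer; the feature vector $\phi(s)$, step size, and discount factor are placeholder assumptions):

```python
import numpy as np

def q_value(w, phi_s, a):
    """Linear Q: one weight vector per action, Q(s, a) = w[a] . phi(s)."""
    return w[a] @ phi_s

def td_update(w, phi_s, a, r, phi_s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference step for linearly approximated Q(s, a).

    target = r + gamma * max_a' Q(s', a'); the gradient of Q(s, a)
    with respect to w[a] is simply phi(s), hence the update below.
    """
    target = r + gamma * max(q_value(w, phi_s_next, b) for b in range(len(w)))
    w[a] += alpha * (target - q_value(w, phi_s, a)) * phi_s
    return w

# Two actions (predict 0 or predict 1) and a 3-dimensional state feature.
w = np.zeros((2, 3))
phi_s = np.array([1.0, 0.2, -0.5])
phi_next = np.array([1.0, 0.3, -0.4])
w = td_update(w, phi_s, a=1, r=1.0, phi_s_next=phi_next)
```

Each new observation yields one such update, so the model improves continuously as the experiments come in.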

Some remarks:

  • In general, $Q(s,a)$ is not learned separately for each point in time; instead, a single model is learned, based on a careful formulation of the state
  • As long as not a single positive outcome has been observed, one has to explore (i.e. try out actions/predictions) to find positive outcomes. In the long run, less and less exploration is needed, and the model can fully exploit the collected information by relying on its predictions alone; see the sketch after this list. Having to explore is unfortunate, but without information there is little to learn.
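A common way to implement this exploration/exploitation trade-off is an $\epsilon$-greedy policy with a decaying $\epsilon$. A minimal sketch; the schedule `eps0 / (1 + decay * t)` is just one reasonable assumed choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, t, eps0=1.0, decay=0.05):
    """Explore with probability eps(t), exploit otherwise.

    eps(t) shrinks as t grows: early rounds try out both predictions,
    later rounds mostly rely on the learned Q-values.
    """
    eps = eps0 / (1.0 + decay * t)
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))  # explore: random action
    return int(np.argmax(q_values))              # exploit: greedy action
```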

Classical View

In my earlier days we tried to predict who would respond to mailing campaigns, with mails sent out in batches. The recipients of the next batch were selected based on the responders of the previous batches. The overall goal was to get as many responses as possible while minimizing the number of mails sent. We had some success with a combination of broad sampling at the start (so that the model generalizes well) and a fast-learning algorithm.

This view assumes a data set where only the labels are missing, so no additional rows come in after each iteration. In this case, you may try One-Class Classification to improve the model even faster. I have been told that Transductive Learning also helps here, but I have not tried it yet.
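To make the batch-selection loop above concrete, here is a hedged sketch; the simulated pool, the batch size of 50, and scikit-learn's `SGDClassifier` are my illustrative stand-ins for whatever fast learner is actually used:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Hypothetical pool of 1000 candidate recipients with 5 covariates each.
# true_p (unknown in practice) is used here only to simulate responses.
pool_X = rng.normal(size=(1000, 5))
true_p = 1 / (1 + np.exp(-pool_X @ np.array([1.0, -0.5, 0.3, 0.0, 0.2])))

model = SGDClassifier(loss="log_loss")   # fast incremental learner
mailed = np.zeros(len(pool_X), dtype=bool)

for batch in range(5):
    remaining = np.flatnonzero(~mailed)
    if batch == 0:
        # broad random sampling at the start, so the model generalizes
        idx = rng.choice(remaining, size=50, replace=False)
    else:
        # later batches: mail the recipients the model scores highest
        scores = model.decision_function(pool_X[remaining])
        idx = remaining[np.argsort(scores)[-50:]]
    mailed[idx] = True
    responded = (rng.random(len(idx)) < true_p[idx]).astype(int)
    model.partial_fit(pool_X[idx], responded, classes=[0, 1])
```

Declaring `classes=[0, 1]` up front lets `partial_fit` accept early batches that contain only non-responders, which mirrors the all-zeros start described in the question.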

mlwida