Sequential Prediction: Data Modeling and Classical Algorithms

Question

I have data that can be called demographic data.

Raw data

Person 0001

\begin{array}{|c|c|} \hline Feb\,1981- Apr\,85 & engaged\,\,in\,\,\underline{activity}\,\,\textit{A}\,\,of \,\,\underline{type}\,\,\textbf{square}\,\\ \hline Apr\,1985- July\,86 & engaged\,\,in\,\,\underline{activity}\,\,\textbf{$x_1$}\,\,at \,\,\underline{location}\,\,\textbf{beta}\,\,of \,\,\underline{kind}\,\,\textbf{red}\,\,\\ \hline July\,1986- Nov\,87 & engaged\,\,in\,\,\underline{activity}\,\,\textbf{$x_2$}\,\,at \,\,\underline{location}\,\,\textbf{beta}\,\,of \,\,\underline{kind}\,\,\textbf{red}\,\,\\ \hline Nov\,1987- Apr\,88 & engaged\,\,in\,\,\underline{activity}\,\,\textbf{$x_3$}\,\,at \,\,\underline{location}\,\,\textbf{beta}\,\,of \,\,\underline{kind}\,\,\textbf{red}\,\,\\ \hline Apr\,1988- June\,91 & engaged\,\,in\,\,\underline{activity}\,\,\textbf{$y_1$}\,\,at \,\,\underline{location}\,\,\textbf{gamma}\,\,of \,\,\underline{kind}\,\,\textbf{red}\,\,\\ \hline June\,1991- Sep\,92 & engaged\,\,in\,\,\underline{activity}\,\,\textbf{$y_2$}\,\,at \,\,\underline{location}\,\,\textbf{gamma}\,\,of \,\,\underline{kind}\,\,\textbf{red}\,\,\\ \hline ...\,....- ....\,.... &............\\ \hline Present\,\,time & engaged\,\,in\,\,\underline{activity}\,\,\textbf{$z_1$}\,\,at \,\,\underline{location}\,\,\textbf{kappa}\,\,of \,\,\underline{kind}\,\,\textbf{red}\,\,\\ \hline \end{array}

The data is available for many thousands of persons.
The start date differs from person to person.
The first activity, A (which can be a series of chronological activities), is essentially different from the other activities.
Activity A has a bearing on how the subsequent activities change.
Activity A, the type, and the kind can each take dozens of categorical values, each drawn from its own set.
The subsequent activities $x_i, y_i, z_i, \dots$ can take thousands of categorical values, all from the same set.
The data for different persons can be assumed to be iid:
- that is, the data for person 1, person 2, person 3, ... arise from the same random process.
The data within an individual person, however, is interdependent:
- the value of activity $x_2$ depends on $x_1$, which in turn is influenced by the value of activity A, and so on.
That is to say, the process is not first-order Markovian.

Desired Outcome

While I would like to predict both when the location changes and when the activity changes, predicting the location change is more important at the moment.

Ideally the outcome will be in the form of probabilities:

Given that in Sep 92 the activity was $y_2$, what is the probability that it will still be $y_2$ in Oct, Nov, Dec, ...?

If the activity changes, can we predict what it will be?
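
To make the desired output concrete, here is a minimal sketch of one way such probabilities could be estimated: an empirical "still in the same state" curve computed from spell durations pooled across persons. Everything here is an assumption for illustration: the function name `prob_still_in_state` is hypothetical, the durations are made-up numbers, only completed spells are used (ongoing spells are ignored), and the dependence on activity A and on earlier history is not modelled.

```python
def prob_still_in_state(durations, elapsed, horizon):
    """Empirical P(spell lasts >= elapsed + k months | it has lasted >= elapsed).

    durations: lengths in months of completed spells (assumed pooled over persons).
    elapsed:   months the current spell has already lasted.
    Returns one probability for each k = 1..horizon.
    """
    at_risk = sum(1 for d in durations if d >= elapsed)
    if at_risk == 0:
        raise ValueError("no observed spell has lasted this long")
    return [
        sum(1 for d in durations if d >= elapsed + k) / at_risk
        for k in range(1, horizon + 1)
    ]


# Made-up spell lengths in months, for illustration only.
durations = [15, 16, 5, 38, 15]
probs = prob_still_in_state(durations, elapsed=15, horizon=3)
```

Each entry of `probs` is the estimated probability that the state is unchanged 1, 2, 3, ... months ahead. A real solution would also need to handle spells that are still ongoing at the present time.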

Training

I want to be able to train a model on the data from all the many thousands of persons and then make predictions on new data from a new person.

Solution Proposed

Index the data by time in the following manner:

Let January 1980 ($m_{1980.1}$) be the arbitrary starting point for all the data.

\begin{array}{|c|c|c|c|c|c|c|c|c|c|} \hline & ... & m_{1981.2} & m_{1981.3} & ... & m_{1985.4} & m_{1985.5} & ... & m_{1986.7} & ... \\ \hline person\,\,0001 & ... & A\,\,of\,\,type\,\,square & A\,\,of\,\,type\,\,square & ... & x_1\,\,at\,\,location\,\,beta\,\,of\,\,kind\,\,red & x_1\,\,at\,\,location\,\,beta\,\,of\,\,kind\,\,red & ... & x_2\,\,at\,\,location\,\,beta\,\,of\,\,kind\,\,red & ... \\ \hline \end{array}

This turns each person's data into a time-indexed, ordered sequence.
While the sequences (one per person) can be assumed to be iid,
the sub-sequences within a sequence very clearly are not iid.
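
The monthly indexing can be sketched as follows. This is only a minimal illustration under assumptions: the helper names `month_index` and `expand_spells` are hypothetical, each spell is given as (start, end, state) with (year, month) pairs, and the end month is treated as exclusive so that the month in which a change happens belongs to the new state (as in the table, where $m_{1985.4}$ already holds $x_1$).

```python
def month_index(year, month, origin=(1980, 1)):
    """Months elapsed since the origin month (the origin itself maps to 0)."""
    return (year - origin[0]) * 12 + (month - origin[1])


def expand_spells(spells, origin=(1980, 1)):
    """Expand [(start, end, state), ...] into a {month_index: state} dict.

    start and end are (year, month) pairs; end is exclusive (assumption).
    """
    seq = {}
    for start, end, state in spells:
        first = month_index(*start, origin=origin)
        last = month_index(*end, origin=origin)
        for m in range(first, last):
            seq[m] = state
    return seq


# First three spells of person 0001, as (activity, location, type or kind).
person_0001 = [
    ((1981, 2), (1985, 4), ("A", None, "square")),
    ((1985, 4), (1986, 7), ("x1", "beta", "red")),
    ((1986, 7), (1987, 11), ("x2", "beta", "red")),
]
monthly = expand_spells(person_0001)
```

Months before a person's first record are simply absent from the dictionary, which is one way to cope with the different start dates.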

The problem then becomes one of:
- training on thousands of sequences
- predicting the next dozen or more sub-sequences of new sequences.

Further comments:

The very first activity A (or a series of activities, in other cases) is in a way different from the subsequent activities $x_i, y_i, \dots$.
Similarly, type, location, and kind are different from one another.
At the moment the intention is to model them all as part of the sub-sequence. Is there a different way? For instance, since activity A occurs only at the outset, maybe we can include it as a different kind of parameter?
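
One possible reading of "a different kind of parameter" is to treat activity A and its type as fixed, per-person context attached to every training example, while only the later states form the sequence. The sketch below is an assumption about how that could look, not an established method for this data: the name `make_examples`, the window length, and the padding token are all hypothetical choices.

```python
def make_examples(static, sequence, window=3, pad="<pad>"):
    """Build (static, history, target) triples from one person's sequence.

    static:   per-person context, e.g. (activity A, type); constant over time.
    sequence: the monthly states that follow activity A.
    history:  the `window` states before the target, left-padded when short.
    """
    examples = []
    for t in range(len(sequence)):
        history = list(sequence[max(0, t - window):t])
        history = [pad] * (window - len(history)) + history
        examples.append((static, tuple(history), sequence[t]))
    return examples


# Illustrative monthly locations for one person whose first activity was A.
examples = make_examples(("A", "square"), ["beta", "beta", "gamma"], window=2)
```

Because every example has the same shape regardless of how long the person's history is, this layout also sidesteps the variable sequence length, at the cost of ignoring anything older than the window.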

Algorithms

In the present modelling of the data, the algorithms that look most suitable are the ones used in:
- PoS tagging in NLP
- predicting the next word in NLP: what will be the next word given the previous sequence of words.
- Object tracking: where will the object move next, given its history?
The following are the algorithms that I have been able to research. This application is quite novel, so I am seeking help on how to adapt them for my purpose.
- Conditional random fields: they permit dependence between sub-sequences within a sequence, but I haven't seen any practical implementation in this area.
- An n-th order Markov model, with n = the number of months from the arbitrary starting point: I couldn't find any example.
- Kalman filters: again, I couldn't find any practical example outside of signal processing.
- Anomaly detection: in most months the location remains the same, so a change can be considered anomalous, although the system then remains in that "anomalous" state in the future!
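
For reference, a higher-order Markov model over categorical states can be estimated by plain counting. The sketch below uses a small fixed order rather than n = the number of months, because the number of possible contexts grows very quickly with the order. The names `fit_markov` and `predict_next` are hypothetical, the toy sequences are made up, and no smoothing is applied for contexts that were never observed.

```python
from collections import Counter, defaultdict


def fit_markov(sequences, order=2):
    """Count how often each state follows each context of `order` states."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for t in range(order, len(seq)):
            context = tuple(seq[t - order:t])
            counts[context][seq[t]] += 1
    return counts


def predict_next(counts, context):
    """Relative frequencies of the next state; empty dict for unseen contexts."""
    following = counts.get(tuple(context))
    if not following:
        return {}
    total = sum(following.values())
    return {state: c / total for state, c in following.items()}


# Toy monthly location sequences for two persons (made up for illustration).
sequences = [
    ["beta", "beta", "gamma", "gamma"],
    ["beta", "beta", "beta", "gamma"],
]
model = fit_markov(sequences, order=2)
p = predict_next(model, ["beta", "beta"])
```

Sequences shorter than the order contribute nothing, and sequences of any greater length are handled the same way, so variable length is not a problem for this kind of model.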

Requested help:

Is this way of modelling the data the most fit for the purpose?
Do the proposed algorithms serve the purpose? In particular:
- Do they suitably deal with the variable length of the sequences?