Estimating Markov transition probabilities from sequence data

Question

I have a full set of sequences (432 observations to be precise) of 4 states $A-D$: eg

$$Y=\left(\begin{array}{c c c c c c c} A& C& D&D & B & A &C\\ B& A& A&C & A&- &-\\ \vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\ B& C& A&D & A & B & A\\ \end{array}\right)$$

EDIT: The observation sequences are of unequal lengths! Does this change anything?

Is there a way of calculating the transition matrix $$P_{ij}(Y_{t}=j|Y_{t-1}=i)$$ in Matlab or R or similar? I think the HMM package might help. Any thoughts?

eg: Estimating Markov chain probabilities

You have $4$ states: $S=\{1:=A,2:=B,3:=C,4:=D\}$. Let $n_{ij}$ be the number of times the chain made a transition from state $i$ to state $j$, for $ij,=1,2,3,4$. Compute the $n_{ij}$'s from your sample and estimate the transition matrix $(p_{ij})$ by maximum likelihood using the estimates $\hat{p}_{ij}=n_{ij}/\sum_{j=1}^4 n_{ij}$. — Zen, Sep 11 '12 at 16:29
These notes derive the MLE estimates: http://www.stat.cmu.edu/~cshalizi/462/lectures/06/markov-mle.pdf — Zen, Sep 11 '12 at 16:30
Similar question:http://stats.stackexchange.com/questions/26722/calculate-transition-matrix-markov-in-r — B_Miner, Sep 11 '12 at 20:18
@B_Miner could you write your code in pseudo-code form for me? Or explain it in lay terms... However I see it works in my R console. — HCAI, Sep 11 '12 at 21:41
I have a question: I understand your implementation and it lokks fine to me, but i was wondering why can't i simply use the Matlab hmmestimate function to compute the T matrix? Something like: states=[1,2,3,4] [T,E]= hmmestimate ( x, states); where T is the transition matrix i'm interested in. I'm new to Markov chains and HMM so I'd like to understand the difference between the two implementations (if there is any). — Any, Nov 20 '13 at 11:53
I have a time series of dry, wet and neutral conditions for 4 seasons during 60 years (240 conditions). My data has text file format with 3 columns (year, season and rainfall conditions). How can I do markov chain by R. I want to calculate the transition probability matrix for the concurrency of wet, dry and neutral spells. — Habib, Dec 08 '14 at 12:07
If you have a new question, please ask it by clicking the [Ask Question](http://stats.stackexchange.com/questions/ask) button. Include a link to this question if it helps provide context. — Xi'an, Dec 08 '14 at 12:44

Zen · Accepted Answer · 2019-08-01T16:45:03.300

19

Please, check the comments above. Here is a quick implementation in R.

x <- c(1,2,1,1,3,4,4,1,2,4,1,4,3,4,4,4,3,1,3,2,3,3,3,4,2,2,3)
p <- matrix(nrow = 4, ncol = 4, 0)
for (t in 1:(length(x) - 1)) p[x[t], x[t + 1]] <- p[x[t], x[t + 1]] + 1
for (i in 1:4) p[i, ] <- p[i, ] / sum(p[i, ])

Results:

> p
          [,1]      [,2]      [,3]      [,4]
[1,] 0.1666667 0.3333333 0.3333333 0.1666667
[2,] 0.2000000 0.2000000 0.4000000 0.2000000
[3,] 0.1428571 0.1428571 0.2857143 0.4285714
[4,] 0.2500000 0.1250000 0.2500000 0.3750000

A (probably dumb) implementation in MATLAB (which I have never used, so I don't know if this is going to work. I've just googled "declare vector matrix MATLAB" to get the syntax):

x = [ 1, 2, 1, 1, 3, 4, 4, 1, 2, 4, 1, 4, 3, 4, 4, 4, 3, 1, 3, 2, 3, 3, 3, 4, 2, 2, 3 ]
n = length(x) - 1
p = zeros(4,4)
for t = 1:n
  p(x(t), x(t + 1)) = p(x(t), x(t + 1)) + 1
end
for i = 1:4
  p(i, :) = p(i, :) / sum(p(i, :))
end

edited Aug 01 '19 at 16:45

answered Sep 11 '12 at 17:05

Zen

21,786
3
72
114

Looks great! I'm not sure what the 3rd line does in your code though (mainly because I'm familiar with Matlab). Any chance you could write it in matlab or pseudo-code? I'd be much obliged. – HCAI Sep 11 '12 at 17:22
2

The third line does this: the chain values are $x_1,\dots,x_n$. For $t=1,\dots,n-1$, increment $p_{x_t,x_{t+1}}$. – Zen Sep 11 '12 at 19:32
The fourth line normalizes each line of the matrix $(p_{ij})$. – Zen Sep 11 '12 at 19:34
Bare with my slowness here. I do appreciate the MATLAB code translation although I still can't see what it's attempting to do in your first `for` loop. The 3rd line from the original code is counting the number of times $x$ goes from state $x_i$ to state $x_j$? If you could say it in words I'd appreciate that a lot. Cheers – HCAI Sep 11 '12 at 20:35
I have implemented your R code and it works just as you explained. Which is great! My query is now with regards to the $x$ vector... My observation sequences are of uneven lengths. Eg the matrix I provided above. How would I change your code to handle this? – HCAI Sep 11 '12 at 21:44
A realization of the chain is something like $1, 1, 2, 1, 2, 1, 2, 4, 3, 1$. By inspection, it jumped from $1$ to $1$ one time, from $1$ to $2$ three times, and so on. What exactly you don't understand? – Zen Sep 12 '12 at 00:41
Can you clarify for me that your vector x is the same as concatenating my matrix rows? Ie one long sequence instead of many short ones? – HCAI Sep 15 '12 at 22:41
1

No, $x$ is just one row. Don't concatenate because you will introduce "false" transitions: last state of one line $\to$ first state of the next line. You have to change the code to loop through the lines of your matrix and count the transitions. At the end, normalize each line of the transition matrix. – Zen Sep 15 '12 at 23:57
I'd really appreciate it if you could comment on whether you think it is possible to create a confidence interval for the transition probabilities? – HCAI Jun 30 '13 at 08:38
@Zen Suppose instead of finding state transition matrix for 1 history step I wish to calculate for 2 history step. P(State1|(State1, State2)): Probability of getting state1 given t-1 was state1 and t-2 was state2. How do I calculate these probabilities? – Ankit Aug 30 '13 at 07:09
If it is a Markov chain, then $P(1\mid 1,2)=P(1\mid 2)$. So, there is nothing new. If it is not a Markov chain, keep a record of the $n^3$ triple of three consecutive transitions, simulate the process and compute the fraction of each triple. – Zen Aug 30 '13 at 11:47

score 11 · Answer 2 · answered Oct 23 '13 at 20:06

11

Here is my implementation in R

x <- c(1,2,1,1,3,4,4,1,2,4,1,4,3,4,4,4,3,1,3,2,3,3,3,4,2,2,3)
xChar<-as.character(x)
library(markovchain)
mcX<-markovchainFit(xChar)$estimate
mcX

answered Oct 23 '13 at 20:06

Giorgio Spedicato

3,444
4
29
39

1

[user32041](http://stats.stackexchange.com/users/32041/user32041)'s request (posted as an edit instead of a comment since he/she lacks reputation): How can I coerce the transitionMatrix of the markovchainFit result to a data.frame? – chl Oct 29 '13 at 14:33
You can convert to $data.frame$ using $as(mcX,"data.frame")$ – Giorgio Spedicato Nov 14 '13 at 23:23
@GiorgioSpedicato can you comment on how to deal with sequences of unequal lengths (I cannot concatenate) please in your package? – HCAI Aug 12 '19 at 18:52
@HCAI, please see the current vignette page 35-36 – Giorgio Spedicato Aug 12 '19 at 20:54
@GiorgioSpedicato thank you for the reference https://cran.r-project.org/web/packages/markovchain/vignettes/an_introduction_to_markovchain_package.pdf. I still have n transition matrices, one for each sequence. What I’m after is one general one that takes into account all the sequence observations. Is there something I’m missing? – HCAI Aug 13 '19 at 03:37

score 2 · Answer 3 · answered Jun 01 '17 at 20:26

Here is a way to do it in Matlab:

x = [1,2,1,1,3,4,4,1,2,4,1,4,3,4,4,4,3,1,3,2,3,3,3,4,2,2,3];
counts_mat = full(sparse(x(1:end-1),x(2:end),1));
trans_mat = bsxfun(@rdivide,counts_mat,sum(counts_mat,2))

Acknowledgement owed to SomptingGuy: http://www.eng-tips.com/viewthread.cfm?qid=236532

Estimating Markov transition probabilities from sequence data

3 Answers3