
I have an empirical transition count matrix $Q$ and a theoretical first-order Markov chain with transition matrix $P$. Let $N$ be the total number of transitions and $K$ the number of states. I would like to test whether $Q$ is compatible with $P$. Is it correct to form the theoretical transition count matrix ($N P$), compute the chi-square statistic $\sum_{i,j=1}^{K} \frac{(Q_{ij}-N P_{ij})^2}{N P_{ij}}$, and then compute the p-value from a $\chi^2$ distribution with $K(K-1)$ degrees of freedom?

GeoMatt22
Giorgio Spedicato
  • I am not very familiar with chi-square tests, but skimming around, it appears to be commonly used for multinomial data (e.g. [here](http://stats.stackexchange.com/q/248558/127790)). I would think each row of $P$ should correspond to a multinomial distribution? So then you might use $n_i$ for row $i$, that is, the number of transitions "from $i$". That is, "$N$" might vary depending on the starting state? – GeoMatt22 Dec 08 '16 at 23:01

1 Answer


Assuming your matrices are something like $$P_{ij}=\Pr[j\mid\!i] \,,\, Q_{ij}=\sum_{t=1}^N\big[x_t=i\,\&\,x_{t+1}=j\,\big]$$ then you could interpret each row $i$ as a multinomial distribution with parameters $$p_i=P_{i,:} \,,\, n_i=\sum_{j=1}^{K}Q_{ij}$$

I am not sure that you can lump all of the rows together, because the "number of trials" will vary between rows.

For example, say $K=3$ and your data is $x=[1,1,2,1,2,3,1,2]$. So there are $N=7$ transitions, with $n_1=4$ coming from $x=1$, but $n_2=2$ from $x=2$ and only $n_3=1$ from $x=3$. So I would think your confidence in $\hat{p}_1$ should generally be higher than your confidence in $\hat{p}_3$.
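As a quick check of the counts in this example, here is a minimal Python sketch that tallies the transitions. The function name `count_transitions` is made up for illustration, and it assumes the states are labelled $1,\dots,K$.

```python
def count_transitions(x, K):
    """Return the K-by-K matrix of transition counts Q for a state sequence x.

    Assumes states are labelled 1..K (an assumption of this sketch).
    """
    Q = [[0] * K for _ in range(K)]
    for a, b in zip(x, x[1:]):  # consecutive pairs (x_t, x_{t+1})
        Q[a - 1][b - 1] += 1
    return Q


x = [1, 1, 2, 1, 2, 3, 1, 2]
Q = count_transitions(x, K=3)
n = [sum(row) for row in Q]  # n_i = number of transitions leaving state i
N = sum(n)                   # total number of transitions

print(Q)  # [[1, 3, 0], [1, 0, 1], [1, 0, 0]]
print(n)  # [4, 2, 1]
print(N)  # 7
```

The row sums $n_i$ are the per-row "number of trials", and they add up to $N$.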

(In the extreme case, maybe for this example $K$ was actually $4$, but you have no data at all on those transitions, as $n_4=0$. Treating "absence of evidence as evidence of absence" would seem problematic to me here.)

I am not very familiar with chi-squared tests, but this suggests you might want to treat the rows independently (i.e. sum only over $j$, and use $n_i$ rather than $N$). This reasoning does not seem specific to the chi-squared test, so should also apply to any other significance test you might use (e.g. exact multinomial).
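One way to see that $N P_{ij}$ cannot be the right expected count: the expected counts ought to add up to the number of observed transitions. Since each row of $P$ sums to one, $$\sum_{i,j} N P_{ij} = N\sum_{i=1}^{K}\sum_{j=1}^{K} P_{ij} = NK \neq N = \sum_{i,j} Q_{ij} \quad (K>1),$$ whereas with the row totals $$\sum_{i,j} n_i P_{ij} = \sum_{i=1}^{K} n_i = N.$$ So, conditional on the row totals $n_i$, the natural expected count for entry $(i,j)$ is $n_i P_{ij}$.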

The key issue is that the transition probabilities are conditional, so for each matrix-entry only the transitions which satisfy its pre-condition are relevant. Indeed, presumably the transition matrix will satisfy $\sum_jP_{ij}=1$, hence the "empirical transition matrix" should be $\hat{P}_{ij}=Q_{ij}/n_i$.
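A minimal sketch of this row normalisation in Python, using the counts from the example above plus an unvisited fourth state. The function name is hypothetical, and returning `None` for a row with $n_i=0$ is just one possible convention chosen for this sketch.

```python
def empirical_transition_matrix(Q):
    """Row-normalise counts: P_hat[i][j] = Q[i][j] / n_i.

    Rows with n_i == 0 are returned as None, since no estimate is possible
    without data (a convention chosen for this sketch).
    """
    P_hat = []
    for row in Q:
        n_i = sum(row)
        P_hat.append([q / n_i for q in row] if n_i > 0 else None)
    return P_hat


# Counts for x = [1, 1, 2, 1, 2, 3, 1, 2], with an unvisited fourth state
Q = [[1, 3, 0, 0],
     [1, 0, 1, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 0]]
P_hat = empirical_transition_matrix(Q)
print(P_hat[0])  # [0.25, 0.75, 0.0, 0.0]
print(P_hat[3])  # None
```

Leaving the empty row undefined, rather than filling it with zeros, avoids treating absence of evidence as evidence of absence.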


Update: In response to query by OP, a clarification on the "test parameters".

If there are $K$ states in the Markov chain, i.e. $P\in\mathbb{R}^{K\times{K}}$, then for row $i$, the corresponding multinomial distribution will have probability vector $p_i\in\mathbb{R}^K$ and number of trials $n_i\in\mathbb{N}$, given above.

So there will be $K$ categories, and the probability vector $p_i$ will have $K-1$ degrees of freedom, as $\sum_{j=1}^K(p_i)_j=1$. So for row $i$ the corresponding $\chi^2$ statistic would be $$\chi^2_i=\sum_j\frac{\left(Q_{ij}-n_iP_{ij}\right)^2}{n_iP_{ij}}$$ which will asymptotically follow a chi-squared distribution with $K-1$ degrees of freedom (as stated here and here). See also here for a discussion of when the $\chi^2$ test is appropriate, and alternative tests which may be more appropriate.
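A rough sketch of the row-wise test in plain Python, under some assumptions: $P$ is fully specified in advance (nothing estimated from the data), every $P_{ij}>0$, and rows with $n_i=0$ are simply skipped. The function names and the example matrices are invented for illustration. The tail probability is computed with a textbook power series for the incomplete gamma function; with SciPy available, `scipy.stats.chi2.sf(stat, df)` would be the usual choice.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability Pr[X >= x] for X ~ chi-squared(df).

    Uses the power series for the regularised lower incomplete gamma
    function P(a, z) with a = df/2 and z = x/2 (adequate for moderate x).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= z / (a + k)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def rowwise_chi2(Q, P):
    """Return (row, chi2_i, df, p_value) for each row i with n_i > 0.

    Assumes every P[i][j] > 0, so that no expected count is zero.
    """
    results = []
    for i, (q_row, p_row) in enumerate(zip(Q, P), start=1):
        n_i = sum(q_row)
        if n_i == 0:
            continue  # no transitions observed from state i: nothing to test
        stat = sum((q - n_i * p) ** 2 / (n_i * p) for q, p in zip(q_row, p_row))
        df = len(p_row) - 1
        results.append((i, stat, df, chi2_sf(stat, df)))
    return results


# Hypothetical example (values invented for illustration only)
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
Q = [[48, 33, 19],
     [22, 58, 20],
     [27, 33, 40]]
for i, stat, df, p in rowwise_chi2(Q, P):
    print(f"row {i}: chi2 = {stat:.3f}, df = {df}, p = {p:.3f}")
```

Each row gets its own p-value, so a poor fit can be traced back to a particular starting state.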

It may be possible to do a "lumped test", assuming $\chi^2_P=\sum_i\chi^2_i$ follows a chi-squared distribution with $K(K-1)$ degrees of freedom (i.e. summing the degrees of freedom over rows). However, I am not certain whether the $\chi^2_i$ can be treated as independent. In any case, the row-wise tests would seem to be more informative, so may be preferable to a lumped test.
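If one does go with the lumped version, it could be sketched as below. This takes the $K(K-1)$ degrees of freedom as given (which is exactly the independence assumption I am unsure about), and it requires every $n_i>0$ and every $P_{ij}>0$. The function names and example values are made up. Since $K(K-1)$ is always even, the chi-squared tail probability has a simple closed form, so no statistics library is needed.

```python
import math


def chi2_sf_even(x, df):
    """Pr[X >= x] for X ~ chi-squared(df), valid for even df only.

    For df = 2m: exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    z = max(x, 0.0) / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= z / i
        total += term
    return math.exp(-z) * total


def lumped_chi2(Q, P):
    """Sum the row-wise statistics; df = K(K-1) is an ASSUMPTION (independent rows)."""
    K = len(P)
    stat = 0.0
    for q_row, p_row in zip(Q, P):
        n_i = sum(q_row)
        if n_i == 0:
            raise ValueError("every state needs at least one observed transition")
        stat += sum((q - n_i * p) ** 2 / (n_i * p) for q, p in zip(q_row, p_row))
    df = K * (K - 1)
    return stat, df, chi2_sf_even(stat, df)


# Hypothetical example (values invented for illustration only)
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
Q = [[48, 33, 19],
     [22, 58, 20],
     [27, 33, 40]]
stat, df, p = lumped_chi2(Q, P)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p:.3f}")
```

Note the expected counts are still $n_iP_{ij}$ rather than $NP_{ij}$; only the summation is lumped.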

GeoMatt22
  • Clever idea to treat it as a multinomial distribution. The sum of two chi-squared variables is chi-squared, so the test statistics for each row can be computed separately and summed together to yield a new chi-squared test statistic. This will have $N-K$ degrees of freedom – Hugh Dec 09 '16 at 00:00
  • @Hugh I am not familiar enough to evaluate, but this could very well be reasonable. My main point was more that the "row by row" approach seems justifiable, and more informative, than the "lumped" approach. (I guess a secondary point is that all the work on chi-square for multinomials, e.g. asymptotic convergence, could be a good starting point. All I know on these topics I learned just now from skimming CV posts though, so that's about all I can offer!) You might consider posting a short answer addressing the chi-square aspect more directly. – GeoMatt22 Dec 09 '16 at 00:15
  • @GeoMatt22 ... So is it OK for the number of degrees of freedom of the chi-square test to be equal to $N^2-N$, with N being the size of the DTMC? – Giorgio Spedicato Dec 12 '16 at 13:16
  • Giorgio, see my update. – GeoMatt22 Dec 12 '16 at 14:53
  • @Hugh please see my updated answer. Note that [Wikipedia says](https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test#Other_distributions) "It should be noted that the degrees of freedom are not based on the number of observations". I am not sure if my $K(K-1)$ dof's for a "lumped test" is correct, but also uncertain where your $N-K$ dof's would come from! Any clarification? – GeoMatt22 Dec 12 '16 at 14:57