
I have a system whose state is described by a vector $v=(a, b, c)$, where $a$, $b$ and $c$ can take any value between $0$ and $100$, and where $a+b+c \le 100$. I have observations of the state of the system for about 10 years (one per year, so 10 observations).

How can I model the states of the system?

Can I accurately model the state of the system using a Markov chain, or should I use another technique?

The first approach I used was to cluster the possible states of the system (for example, one cluster contains all the observations where $a<33$, $b<33$ and $c<33$). I then fitted a transition matrix (assumed homogeneous) and made predictions. This works fine, but the predictions are of course not accurate because of the clustering.

Giulia Martini

1 Answer


If $A+B+C \le 100$ and all three are non-negative, then you can pretend there's a fourth non-negative variable $D$ whose value you don't include in the output, so that $A+B+C+D = 100$.

Next, we can divide by 100 to rescale, so that $a+b+c+d = 1$, where $a = A/100$, $b = B/100$, etc.
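As a minimal sketch of the two steps above (the values of $A$, $B$, $C$ here are made up for illustration):

```python
import numpy as np

# One observation with A + B + C <= 100 (illustrative values).
A, B, C = 40.0, 25.0, 15.0
D = 100.0 - (A + B + C)             # slack variable, also non-negative

v = np.array([A, B, C, D]) / 100.0  # rescaled state on the simplex
```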

This gives us something that looks a lot like a Dirichlet distribution with $K=4$.

That, you can definitely fit: throw it into a Gibbs sampler or some variational approach at the very least.
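As an even simpler alternative to Gibbs sampling or variational inference, a method-of-moments fit of the Dirichlet parameters can be sketched in a few lines (the helper name and the synthetic data below are my own, not part of the question; for a Dirichlet, the ratio of first and second moments of any one coordinate recovers the total concentration $\alpha_0$):

```python
import numpy as np

def fit_dirichlet_moments(X):
    """Method-of-moments estimate of Dirichlet parameters.
    X: (n, K) array, each row a point on the simplex (sums to 1)."""
    m = X.mean(axis=0)            # E[x_k], estimates alpha_k / alpha_0
    m2 = (X ** 2).mean(axis=0)    # E[x_k^2]
    # For a Dirichlet, (E[x] - E[x^2]) / (E[x^2] - E[x]^2) = alpha_0;
    # using the first coordinate's moments here.
    alpha_0 = (m[0] - m2[0]) / (m2[0] - m[0] ** 2)
    return m * alpha_0

# Sanity check on synthetic data with known parameters.
rng = np.random.default_rng(42)
samples = rng.dirichlet([2.0, 3.0, 4.0, 1.0], size=20000)
alpha_hat = fit_dirichlet_moments(samples)
```

With only ~10 real observations the estimate will of course be noisy; this is just the cheapest possible fitting procedure for the rescaled $(a,b,c,d)$ data.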

If you find a stationary distribution, all that's left is to remember to transform the 'lowercase' probabilities back into 'uppercase' values of the state by multiplying them by 100 again.


Re: comment:

Bayesian updates are asymmetric by design; if you're conditioning on time, they're time-asymmetric.

For a time-homogeneous chain, by a simple application of Bayes' rule:

$$p(V_{T+dT} \mid V_T) = \frac{p(V_T \mid V_{T+dT})\, p(V_{T+dT})}{p(V_T)}$$

where $V_T$ is your pick of $(a,b,c,d)$ at a point in time $T$, with $dT$ as a delay.

Until my time machine is fixed, the past is independent of the present, so $p(V_T \mid V_{T+dT}) = 1$. This leaves us with the problem of finding the ratio $p(V_{T+dT})/p(V_T)$, which corresponds to a transition matrix for some time skip between two states.

For $T_0$, you'd use your best guess, e.g. $0.50$ each for two variables and $0.25$ each for four, if you have no good reason to favor any one of them. Then find a transition matrix $M \sim Dir$, satisfying $V_{T_1} = V_{T_0} M_V$, plug it into $V_{T_2} = V_{T_1} M_V = V_{T_0} {M_V}^2$, etc.

You'll want to use whatever time step your data supports. E.g. if you have daily aggregate data, this will give you a change over 1 day; however, as in the $T_2$ case above, you can trivially deal with missing datapoints by applying the matrix more than once.
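The fit-and-iterate procedure above can be sketched with plain least squares (the synthetic data is mine; with only 10 yearly observations you'd have just 9 consecutive pairs to fit a $4 \times 4$ matrix from, so treat this strictly as an illustration of the mechanics):

```python
import numpy as np

rng = np.random.default_rng(0)

# An illustrative "true" transition matrix (rows sum to 1).
M_true = rng.dirichlet(np.ones(4), size=4)

# Fake observations: 10 states on the simplex, plus their one-step evolution.
V_now = rng.dirichlet(np.ones(4), size=10)   # states (a, b, c, d) at time T
V_next = V_now @ M_true                      # states at time T + dT

# Fit M over the consecutive pairs: V_next ~ V_now @ M.
M_hat, *_ = np.linalg.lstsq(V_now, V_next, rcond=None)

# Multi-step prediction, e.g. 3 steps ahead from the last observed state:
forecast = V_now[-1] @ np.linalg.matrix_power(M_hat, 3)
```

Since each row of `M_true` sums to 1, the predicted state stays on the simplex; with real, noisy data you'd want to re-project or constrain the fit accordingly.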

The same procedure works for higher-order chains, including chains with conditional dependencies between pairs of nodes, but I'm not going to write it all up here for now.

jkm
  • Thanks! Just one question, how would this method take into account the time component? Sorry but I am not that familiar with the topic – Giulia Martini Nov 23 '19 at 15:28
  • Edited to explain, too long to fit a comment. – jkm Nov 23 '19 at 18:53
  • -1 There's nothing dynamic about the model presented, except the process of discovery of the distribution FIXED in time. But there is a confusion of state space dimension with the number of states. – Konstantin Nov 24 '19 at 10:46
  • There absolutely is something dynamic. $M$ acts as a derivative of parameters over lag $dT$. Perhaps I should have been clearer: I've described an algorithm to *fit* the model, where the $p(V_{T+dT}|V_T)$ *is a given*, based on your training dataset; the desideratum is the $M$, which you can then apply N times to any starting state to predict its evolution for lag N time. Otherwise, I don't see what you're objecting to, exactly. – jkm Nov 24 '19 at 12:02
  • I would love to be proven wrong. But to me many things seemed to be off in your write up and I even slept on my comment before posting it. Maybe you could add a reference, that inspired you for this approach to the problem, some resource to help me understand? – Konstantin Nov 24 '19 at 16:54
  • @jkm until then one thing that seems off: you say that the transition matrix $M$'s entries sum up to 1, which is too restrictive for markovian models. Before that you start talking about probabilities of states (which is ok) and then switch to fitting a ***linear*** model $V_{T+dT}=MV_T$ (which is hard to connect to the probabilistic reasoning at the start). Again, I will be glad to be proven terribly wrong. – Konstantin Nov 24 '19 at 17:07
  • I cannot prove you wrong or right, because I don't see what point you're trying to make. I've laid out my assumptions: 1) with the addition of a pseudovariable $D$, the sum of the four real-valued variables is a rescaled Dirichlet (same as every Gaussian is a scaled and translated Standard Normal); 2) Bayes rule holds true; 3) the prior of Dirichlet parameters is also Dirichlet; 4) whatever is explicitly stated in the model. – jkm Nov 24 '19 at 17:08
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/101440/discussion-between-konstantin-and-jkm). – Konstantin Nov 24 '19 at 17:13
  • I said nothing about the entries of $M$ summing up to 1. The probabilities of the prior and posterior distributions do, but if $p(A|T_1)=0.3$, $p(B|T_1)=0.7$ and $p(A|T_2)=0.9$, $p(B|T_2)=0.1$, then $M_A=3, M_B=(1/7)$. – jkm Nov 24 '19 at 17:13
  • Wait, there is $M\sim Dir$ in the text there, then it must sum up to 1. – Konstantin Nov 24 '19 at 17:17
  • ...yeah, that's not right, my bad, sorry. It doesn't really affect anything, it's just both wrong and superfluous. However, you've made a good point in the chat re: posterior. – jkm Nov 24 '19 at 18:09