
I have multiple sequences for each of two states. I'd like to train an HMM on these to predict the state of unknown sequences.

Here is an example for this problem:

states <- c("good", "bad")
good_obs<- list(
  c("a","b","c")
  ,c("a","b","c","c")
  ,c("a","c","c")
)
bad_obs<- list(
  c("d","b","c")
  ,c("b","c","c","a")
  ,c("c","c","a")
  ,c("c","c","a","a")
)
unknown_obs<- list(
  c("d","b","c")
  ,c("c","a")
  ,c("c","c","c","a")
  ,c("c","a","a")
)

So how would I use `hmm <- initHMM(States, Symbols)` and `baumWelch(hmm, observation)` for this?

carlos
  • Are your observations out of sync? How come the lengths of your vectors are different? (e.g. `list( c("a","b","c") ,c("a","b","c","c") )` ) – Zhubarb Jul 02 '14 at 17:08
  • It's like a different number of actions per time frame. – carlos Jul 02 '14 at 17:11
  • So have you got 3 or 4 time frames? This is a bit confusing. – Zhubarb Jul 03 '14 at 07:24
  • They could be even longer. Maybe a solution is to train two HMMs, one for the good sequences and one for the bad ones? – carlos Jul 03 '14 at 08:50
  • I could probably help if I understood the question better, but in its current form I am not sure I follow. In the meantime have a look at this [link](http://stackoverflow.com/questions/17487356/hidden-markov-model-for-multiple-observed-variables), which may be helpful. – Zhubarb Jul 03 '14 at 08:53
  • The goal is to train a model on some good and bad sequences in order to classify unknown sequences as good or bad. – carlos Jul 03 '14 at 09:13

2 Answers


I don't think you mean what you're saying. I don't think you are trying to predict the "state" of each sequence. A sequence of length, say $N$, will have $N$ states. And these are hidden, so you will only have several different ways of getting probability distributions over them.

Judging by one of the tags, I think you want to use HMMs to "classify" different time series. See this thread. I suspect you will have a possibility of a bad (or good) state at each time point for each sequence. And in addition to that, each sequence, in its entirety, can be thought of as "good" or "bad." I know I'm going out on a limb here, but maybe in trying to abstract away some details of your application you accidentally introduced an equivocation here.

Also, I don't think you mean to call your time series "obs"; if you do, it's unclear. Each element of those lists is a time series, and each element (letter) of each series is an observation.

Otherwise, you do mean what you say and the whole list is one time series. In that case, each observation (of each series) needs to be the same length/dimension. You don't have this, so I'll give you the benefit of the doubt and assume you're just using terminology I am unaccustomed to.
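To make the "classify whole sequences" reading concrete, here is a minimal base-R sketch: one HMM per class, with an unknown sequence assigned to whichever model gives it the higher log-likelihood under the forward algorithm. Everything named here (`hmm_loglik`, `make_hmm`, `classify_hmm`, the hidden states `s1`/`s2`, and all the probabilities) is a hypothetical illustration and not fitted to the question's data. In practice the parameters of each model would be estimated from that class's training sequences, for example with Baum-Welch, which this sketch does not do.

```r
# Log-likelihood of a symbol sequence under a discrete HMM,
# computed with the scaled forward algorithm (base R only).
hmm_loglik <- function(obs, start, trans, emis) {
  alpha <- start * emis[, obs[1]]          # joint prob. of state and first symbol
  loglik <- 0
  for (t in seq_along(obs)) {
    if (t > 1) {
      # propagate one step through the transition matrix, then emit obs[t]
      alpha <- as.vector(alpha %*% trans) * emis[, obs[t]]
    }
    scale <- sum(alpha)                    # P(obs[t] | obs[1..t-1])
    if (scale == 0) return(-Inf)           # sequence impossible under this model
    loglik <- loglik + log(scale)
    alpha <- alpha / scale                 # rescale to avoid numerical underflow
  }
  loglik
}

symbols <- c("a", "b", "c", "d")
hidden  <- c("s1", "s2")                   # hypothetical hidden states

# Helper that attaches state/symbol names to the parameter matrices
make_hmm <- function(start, trans, emis) {
  list(
    start = setNames(start, hidden),
    trans = matrix(trans, 2, byrow = TRUE, dimnames = list(hidden, hidden)),
    emis  = matrix(emis,  2, byrow = TRUE, dimnames = list(hidden, symbols))
  )
}

# Hand-picked (NOT fitted) parameters, for illustration only
good_hmm <- make_hmm(
  start = c(0.9, 0.1),
  trans = c(0.2, 0.8,
            0.1, 0.9),
  emis  = c(0.70, 0.10, 0.10, 0.10,
            0.05, 0.30, 0.60, 0.05)
)
bad_hmm <- make_hmm(
  start = c(0.9, 0.1),
  trans = c(0.7, 0.3,
            0.1, 0.9),
  emis  = c(0.05, 0.20, 0.55, 0.20,
            0.80, 0.05, 0.10, 0.05)
)

# Assign the label of the model with the higher log-likelihood
classify_hmm <- function(obs) {
  ll <- c(
    good = hmm_loglik(obs, good_hmm$start, good_hmm$trans, good_hmm$emis),
    bad  = hmm_loglik(obs, bad_hmm$start,  bad_hmm$trans,  bad_hmm$emis)
  )
  names(which.max(ll))
}

classify_hmm(c("a", "b", "c"))
classify_hmm(c("c", "c", "a"))
```

Note that the hidden states inside each model are not the "good"/"bad" labels; the label belongs to the whole sequence and comes from comparing the two models. Sequences of different lengths are not a problem for this comparison, although longer sequences naturally have lower log-likelihoods under both models.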

Taylor

One way to reach the goal without an HMM, using a plain Markov chain per class instead, would be the following:

library('markovchain')

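trainMc<-function(sequences){
  # Concatenate all training sequences into one vector, wrapping each
  # in "START"/"END" markers so sequence boundaries become states
  sequence<-c()
  for (i in seq_along(sequences)){
    sequence<-c(sequence,"START",unlist(sequences[i]),"END")
  }
  # Fit a first-order Markov chain to the combined sequence
  mcFit<-markovchainFit(data=sequence)
  Mc<-as(mcFit$estimate, "markovchain")
  return(Mc)
}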

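sequenceprobability<-function(Mc, unknown_seq, min_prob=0.01){
  unknown_seq<-c("START",unknown_seq,"END")
  seq_prob<-0  # running log-probability; must be initialized before the loop
  for (i in 2:length(unknown_seq)){
    # Floor each transition probability at min_prob so unseen transitions do not give log(0)
    trans_prob<-log(max(transitionProbability(Mc,unknown_seq[i-1], unknown_seq[i]),min_prob, na.rm = TRUE))
    seq_prob<-seq_prob+trans_prob
  }
  return(seq_prob)
}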

classify<-function(Mc1, Mc2,sequence){
  if (sequenceprobability(Mc1,sequence)>=sequenceprobability(Mc2,sequence)){
    return(1)
  } else {
    return(2)
  }
}

mybadMC<-trainMc(bad_obs)
mygoodMC<-trainMc(good_obs)
classify(mygoodMC,mybadMC,unlist(good_obs[1]))
classify(mygoodMC,mybadMC,unlist(good_obs[2]))
classify(mygoodMC,mybadMC,unlist(good_obs[3]))
classify(mygoodMC,mybadMC,unlist(bad_obs[1]))
classify(mygoodMC,mybadMC,unlist(bad_obs[2]))
classify(mygoodMC,mybadMC,unlist(bad_obs[3]))
classify(mygoodMC,mybadMC,unlist(unknown_obs[1]))
classify(mygoodMC,mybadMC,unlist(unknown_obs[2]))
classify(mygoodMC,mybadMC,unlist(unknown_obs[3]))

Output would be:

[1] 1
[1] 1
[1] 1
[1] 2
[1] 2
[1] 2
[1] 2
[1] 2
[1] 2
[1] 2

Not really what I was looking for, but it's one way to go.
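For comparison, the same idea can be sketched in base R without the markovchain package. This is a variant, not a reproduction of the code above: the names `fit_chain`, `chain_loglik` and `classify_chain` are made up for illustration, add-one (Laplace) smoothing replaces the `min_prob` floor, and each sequence is padded separately, so no END-to-START transitions are counted.

```r
# Training data copied from the question
good_obs <- list(c("a","b","c"), c("a","b","c","c"), c("a","c","c"))
bad_obs  <- list(c("d","b","c"), c("b","c","c","a"),
                 c("c","c","a"), c("c","c","a","a"))
symbols  <- c("a", "b", "c", "d")

# Estimate a first-order transition matrix from a list of sequences.
# Every count starts at 1 (add-one smoothing), so transitions never
# seen in training still get a small nonzero probability.
fit_chain <- function(sequences, symbols) {
  states <- c("START", symbols, "END")
  counts <- matrix(1, length(states), length(states),
                   dimnames = list(states, states))
  for (s in sequences) {
    padded <- c("START", s, "END")
    for (i in 2:length(padded)) {
      counts[padded[i - 1], padded[i]] <- counts[padded[i - 1], padded[i]] + 1
    }
  }
  counts / rowSums(counts)   # normalize each row to sum to 1
}

# Log-probability of one sequence: sum of log transition probabilities
chain_loglik <- function(P, s) {
  padded <- c("START", s, "END")
  from <- padded[-length(padded)]
  to   <- padded[-1]
  sum(log(P[cbind(from, to)]))   # look up each (from, to) pair by name
}

P_good <- fit_chain(good_obs, symbols)
P_bad  <- fit_chain(bad_obs,  symbols)

# Label a sequence by the chain that assigns it the higher log-probability
classify_chain <- function(s) {
  if (chain_loglik(P_good, s) >= chain_loglik(P_bad, s)) "good" else "bad"
}

sapply(good_obs, classify_chain)
sapply(bad_obs,  classify_chain)
```

Because the smoothing differs from the `min_prob` floor, borderline sequences may be classified differently than with the markovchain version.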

carlos