I am trying to learn about machine learning using Accord.Net.
I created a project that has a number of labeled states that represent screens a user visited in a sequence. I created some unit tests to submit a series of historical sequences and a new observed sequence and calculate the probability that new sequence fits.

My basic unit tests work fine, but once I expand them to my real test data they throw errors. Each historical sequence contains 50-60 items and the observed sequence contains 26 items. My method finds 30 distinct symbols (screens) in the dataset.

However, my final assertion, which calls "actual = _engine.CheckConfidence(history, observed)", throws an "Index out of Bounds" error inside the Learn() method, and it appears to be related to the symbols provided. I must be misunderstanding the usage here, but if there are 30 screens then I should have 30 symbols and 30 potential states, correct?

My implementation...

    public int CheckConfidence(int[][] historical, int[] observations)
    {
        IEnumerable<int> symbols = observations.Distinct();
        foreach (int[] array in historical)
        {
            symbols = symbols.Union(array.Distinct()).Distinct();
        }

        double probability = getLikelihood(historical, observations, symbols.Count(), symbols.Count());

        // calculate confidence on a percent scale
        return (int)(Math.Round(probability, precision) * 100);
    }

    private double getLikelihood(int[][] historical, int[] observations, int states, int symbols)
    {
        HiddenMarkovModel hmm = new HiddenMarkovModel(states, symbols);
        BaumWelchLearning teacher = new BaumWelchLearning(hmm) { Tolerance = 0.001, };
        teacher.Learn(historical);
        return Math.Exp(hmm.LogLikelihood(observations));
    }
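A side note on the value returned above, offered as a sketch rather than a fix: the likelihood of a whole sequence is a product of per-step probabilities, so for a 26-item sequence `Math.Exp(logLikelihood)` can round to 0 even when the sequence fits the model well. One common way to put sequences of different lengths on a comparable scale is the geometric-mean per-symbol likelihood, `exp(logLikelihood / length)`. The helper name `LikelihoodScale.PerSymbol` is hypothetical and is not part of Accord.NET.

```csharp
using System;

// Hypothetical helper (not part of Accord.NET).
public static class LikelihoodScale
{
    // Geometric-mean per-symbol likelihood: exp(logLikelihood / length).
    // Stays in a readable range where exp(logLikelihood) would underflow toward 0.
    public static double PerSymbol(double logLikelihood, int length)
    {
        if (length <= 0)
            throw new ArgumentOutOfRangeException(nameof(length));

        return Math.Exp(logLikelihood / length);
    }
}
```

With this, `getLikelihood` could return `LikelihoodScale.PerSymbol(hmm.LogLikelihood(observations), observations.Length)` instead of the raw likelihood; whether that scale suits the confidence score is a design choice.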

...and my unit tests.

    public void GetProbabilityTest()
    {
        try
        {
            //test perfect case
            int[][] history = new int[][]
            {
                new int[] { 0, 1, 0, 1 },
                new int[] { 0, 1, 0, 1 },
                new int[] { 0, 1, 0, 1 },
                new int[] { 0, 1, 0, 1 },
            };

            int[] observed = new int[] { 0, 1, 0, 1 };
            double actual = _engine.CheckConfidence(history, observed);
            Assert.AreEqual(100, actual); //100%

            //test perfectly WRONG case
            history = new int[][]
            {
                new int[] { 0, 1, 0, 1 },
                new int[] { 0, 1, 0, 1 },
                new int[] { 0, 1, 0, 1 },
                new int[] { 0, 1, 0, 1 },
            };

            observed = new int[] { 2, 2, 2, 2 };
            actual = _engine.CheckConfidence(history, observed);
            Assert.AreEqual(0, actual); //0%

            //do again with real numbers
            history = _engine.GetHistoricalData(_historicalData.ToList());
            observed = _engine.GetNewObservations(_newObservations.ToList());

            IEnumerable<int> symbols = observed.Distinct();
            foreach (int[] array in history)
            {
                symbols = symbols.Union(array.Distinct()).Distinct();
            }
            actual = _engine.CheckConfidence(history, observed);
            Assert.AreEqual(0, actual); //0%
        }
        catch (Exception ex)
        {
            Assert.Fail("Exception caught: " + ex.Message);
        }
    }

StackTrace...

   at Accord.Statistics.Distributions.Univariate.GeneralDiscreteDistribution.Fit(Int32[] observations, Double[] weights, GeneralDiscreteOptions options)
   at Accord.Statistics.Distributions.Univariate.GeneralDiscreteDistribution.Fit(Int32[] observations, Double[] weights)
   at Accord.Statistics.Models.Markov.Learning.BaseBaumWelchLearning`4.Fit(Int32 index, TObservation[] values, Double[] weights)
   at Accord.Statistics.Models.Markov.Learning.BaseBaumWelchLearning`4.UpdateEmissions()
   at Accord.Statistics.Models.Markov.Learning.BaseBaumWelchLearning`4.Learn(TObservation[][] x, Double[] weights)
   at AnomalyDetector.Engines.BehaviorEngine.getLikelihood(Int32[][] historical, Int32[] observations, Int32 states, Int32 symbols) in C:\Git\anomalydetector\AnomolyDetector.Core\Engines\BehaviorEngine.cs:line 134

EDIT: Through unit testing I have found that it begins to fail at 20 symbols. I haven't been able to get Accord.Net to build locally yet, but looking at the code I don't see why it would be constrained to 20.

EDIT: Still struggling to figure out whether I'm providing the right input or this is an Accord bug.

    for (int i = 0; i < observations.Length; i++)
        p[observations[i] - start] += weights[i] * observations.Length;

I got it building locally, and the error is in Accord.Statistics\Distributions\Univariate\Discrete\GeneralDiscreteDistribution.Fit(int[] observations, double[] weights, GeneralDiscreteOptions options). There, observations has length 150 while p has length 29, so when the loop reaches index = 29 it blows up. I was under the impression that symbols should be the number of unique (distinct) symbols in the set, i.e. the alphabet. Setting symbols to 150 does avoid this error, but then my results are useless: I never seem to get a positive likelihood, even when passing in one of the historical arrays as the new observations.
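One thing worth checking, sketched here under an assumption I have not confirmed as the cause: in the loop quoted above, p is indexed by the symbol value (`observations[i] - start`), not by the position `i`. If the screen IDs are not dense, zero-based indices, any ID at or above the symbol count would fall outside p. The sketch below remaps arbitrary IDs to 0..N-1 before training; `SymbolEncoder`, `BuildMap`, and `Encode` are hypothetical names, not Accord.NET APIs, and the sample IDs are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Accord.NET): maps arbitrary screen IDs
// to dense, zero-based symbol indices 0..N-1.
public static class SymbolEncoder
{
    // Builds the ID -> index map from every sequence the model will see.
    public static Dictionary<int, int> BuildMap(IEnumerable<int[]> sequences)
    {
        var map = new Dictionary<int, int>();
        foreach (int id in sequences.SelectMany(s => s).Distinct().OrderBy(x => x))
            map.Add(id, map.Count); // next free index
        return map;
    }

    // Rewrites one sequence using the map; throws if an ID was never mapped.
    public static int[] Encode(int[] sequence, IReadOnlyDictionary<int, int> map)
    {
        return sequence.Select(id => map[id]).ToArray();
    }
}

public static class Demo
{
    public static void Main()
    {
        // Made-up screen IDs with gaps.
        int[][] history = { new[] { 10, 40, 10 }, new[] { 40, 75 } };
        int[] observed = { 75, 10 };

        var map = SymbolEncoder.BuildMap(history.Concat(new[] { observed }));
        int[][] encodedHistory = history.Select(h => SymbolEncoder.Encode(h, map)).ToArray();
        int[] encodedObserved = SymbolEncoder.Encode(observed, map);

        Console.WriteLine(map.Count);                        // alphabet size: 3
        Console.WriteLine(string.Join(",", encodedObserved)); // 2,0
    }
}
```

The encoded arrays and `map.Count` would then be what gets passed to `getLikelihood`, so that every symbol value is guaranteed to be below the symbol count.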

  • I know this has been asked a long time ago, but if by any chance, you still have the code and data example that was causing this issue, please register it to the project's issue tracker so, if this was indeed an issue, it could get fixed in future releases. – Cesar Aug 18 '17 at 20:46
  • I used it for a presentation for a class project so I had to work around it somehow, I still have the code but I have to see if I can reproduce the issue or remember what I did – ShawnDigital Aug 19 '17 at 14:57
  • Sorry, forgot about this, but while I could provide the existing code I updated for my project, I don't have a record of the exact state it was in at this point. – ShawnDigital Oct 03 '17 at 20:49
