Questions tagged [sequential-pattern-mining]

Finding statistically relevant patterns between data examples where the values are delivered in a sequence [Wikipedia]

67 questions
10 votes, 3 answers

High autocorrelation when taking the L-th order of difference of a sequence of independent random numbers

To explain this question in more detail, I'll first describe my approach: I simulate a sequence of independent random numbers $X = \{x_1,\dots,x_N\}$. I then difference the sequence $L$ times; i.e., I create the variables: $dX_{1} =…
10 votes, 2 answers

Multivariate time series clustering

I am collecting a group of multivariate time sequences. For example, there are 2000 time series. Each time series is of 12 dimensions. Are there any systematic models/algorithms that can cluster multivariate time series? For instance, I would like…
9 votes, 2 answers

Best use of LSTM for within sequence event prediction

Assume the following one-dimensional sequence: A, B, C, Z, B, B, #, C, C, C, V, $, W, A, % ... Letters A, B, C, ... here represent 'ordinary' events. Symbols #, $, %, ... here represent 'special' events. The temporal spacing between all events is…
8 votes, 2 answers

Identifying sequential patterns

I am working with sequence data consisting of long lists of malware Windows API calls. I am trying to cast the problem of identifying 'malware behavior' as one of finding sequential patterns. I treat each API call as a single-item itemset. The number of…
chet
6 votes, 1 answer

What's the algorithm for finding sequences used by TraMineR?

I'm working on an analysis to find frequent sequences in an event-state dataset using the R package TraMineR (and arulesSequences too). In arulesSequences, the algorithm used to find frequent sequences is the cSPADE algorithm. But what is the…
Stefan
6 votes, 2 answers

Machine learning on non-fixed-length sequential data?

I have a problem to which I'd like to apply machine learning (supervised classification); however, the data is sequential and each row in the data vector has its own length. This implies that the number of features in each row is non-constant (think…
6 votes, 3 answers

Cluster Sequences of data with different length

I need to cluster sequences of data that have different lengths. I am using Matlab, and my first question is related to the method. Is KMeans sufficient to achieve this? In KMeans I have to use the following command to cluster a set of data stored in…
user55534
5 votes, 1 answer

Binary Pattern Prediction

I have 400K observations, each of which is a growing binary pattern where order matters. The goal is to predict the likelihood of the next character. For example, 10101010101? - as humans we can eyeball and say with high confidence that the next…
Josh
4 votes, 1 answer

Identify changepoints in 1/0 sequence

Say I have a sequence that looks like this: 0 0 0 0 0 1 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 In general terms, I am interested in determining when there is an over-abundance of 1's in close proximity. Not necessarily in a row, but within a…
4 votes, 3 answers

pattern recognition for sequence data

I have many data sequences, and I am trying to find pairs of sequences that are similar to each other. Trivially, we can define some distance measure and compare each pair of sequences in terms of this measure. Or we could even solve this problem in a…
3 votes, 1 answer

Sequential classification, combining predictions

What is the best way to combine outputs from a binary classifier, which outputs probabilities, and is applied to a sequence of non-iid inputs? Here's a scenario: Say I have a classifier which does an OK, but not great, job of classifying whether or…
3 votes, 0 answers

Which distance metric to use to cluster categorical sequences (clickstreams or clickpaths)?

For my research, I want to cluster website visitors based on their clickstreams to understand different information behavior patterns (i.e., customer/visitor journeys). The data can be characterized as a number of sequences of predefined states…
3 votes, 0 answers

Sequential Prediction: Data Modeling and Classical Algorithms

I have data that can be called demographic data. Raw data, Person 0001: Feb 1981 - Apr 85: engaged in activity A of type square; Apr 1985 - July 86: …
3 votes, 1 answer

Studying fluctuations / trajectories over time - objective methods of grouping?

I'm trying to study fluctuations of a disease's activity over time, for example the fluctuation in severity of chronic pain (in the absence of obvious triggers). Individuals generally demonstrate one of a number of trajectories; e.g. those with…
bobmcpop
3 votes, 1 answer

How to detect variable seasonality pattern

We have predicted and actual (daily) data for the past 3 years. We use 90 days of data for prediction. Generally our predictions are very accurate, but we receive unusual traffic for a few days or weeks (e.g., Thanksgiving for a few days, Christmas for around 2…