Questions tagged [sequence-analysis]

Analysis of a DNA, RNA, or peptide sequence to understand its features, function, structure, or evolution.

In bioinformatics, methodologies used include sequence alignment, searches against biological databases, and others.

The term "sequence analysis" also occurs in chemistry (identifying the order of monomers in a polymer) and marketing (analytical customer relationship management applications, such as Next Product to Buy (NPTB) models).

-- http://en.wikipedia.org/wiki/Sequence_analysis

147 questions
106
votes
17 answers

What is the role of the logarithm in Shannon's entropy?

Shannon's entropy is the negative of the sum of the probabilities of each outcome multiplied by the logarithm of probabilities for each outcome. What purpose does the logarithm serve in this equation? An intuitive or visual answer (as opposed to a…
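
As a minimal sketch of the definition quoted in this excerpt (not taken from any answer in the thread), the snippet below computes entropy in bits and shows one property the logarithm provides: the entropies of independent events add. The function name `shannon_entropy` and the coin example are illustrative assumptions.

```python
import math


def shannon_entropy(probs):
    """Return -sum(p * log2(p)) in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# A fair coin (illustrative example, not from the question).
fair_coin = [0.5, 0.5]

# Joint distribution of two independent fair coins: probabilities multiply.
two_coins = [p * q for p in fair_coin for q in fair_coin]

h_one = shannon_entropy(fair_coin)
h_two = shannon_entropy(two_coins)

# Probabilities multiply, but the log turns the product into a sum,
# so the entropy of the pair is twice the entropy of one coin.
print(h_one, h_two)
```

Swapping `math.log2` for `math.log` would give the same quantity in nats rather than bits.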
42
votes
4 answers

Is LSTM (Long Short-Term Memory) dead?

From my own experience, LSTM has a long training time and does not improve performance significantly in many real-world tasks. To make the question more specific, I want to ask when LSTM will work better than other deep NNs (maybe with real-world…
Haitao Du
  • 32,885
  • 17
  • 118
  • 213
17
votes
4 answers

Framing the negative binomial distribution for DNA sequencing

The negative binomial distribution has become a popular model for count data (specifically the expected number of sequencing reads within a given region of the genome from a given experiment) in bioinformatics. Explanations vary: Some explain it as…
10
votes
2 answers

How to prove cooperation from behavioural sequences

Situation: Two birds (male and female) protect the eggs in their nest against an intruder. Each bird can use either attack or threat for protection, and can be either present or absent. There is a pattern emerging from the data that behaviour may be…
Ladislav Naďo
  • 2,202
  • 4
  • 21
  • 45
9
votes
2 answers

similarity measure between two different ordered sequences

I know we can quantify the similarity between two sequences with the same length and the same elements by rank order correlation. But how can one measure similarity between two sequences of different lengths that have only some elements in common? For…
sgyf
  • 93
  • 1
  • 3
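
One possible measure for sequences of different lengths with partly shared elements, offered as a hedged sketch and not as the thread's accepted answer, is a normalized longest-common-subsequence (LCS) score. The names `lcs_length` and `lcs_similarity` and the two example sequences are assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence (order-preserving, gaps allowed)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """Scale the LCS length to [0, 1]; 1 means identical, 0 means nothing shared in order."""
    if not a and not b:
        return 1.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


# Hypothetical sequences of different lengths with some common elements.
seq_a = ["A", "B", "C", "D", "E"]
seq_b = ["B", "X", "D", "E", "Y", "Z"]

score = lcs_similarity(seq_a, seq_b)
print(score)
```

This score rewards elements that appear in the same relative order in both sequences; whether that matches the notion of similarity intended in the question is not clear from the excerpt.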
9
votes
4 answers

Sequential pattern mining on single sequence

Can someone give me a hint about a good approach to finding frequent patterns in a single sequence? For example, take the single sequence 3 6 1 2 7 3 8 9 7 2 2 0 2 7 2 8 4 8 9 7 2 4 1 0 3 2 7 2 0 3 8 9 7 2 0. I am looking for a method that can…
MikeHuber
  • 1,119
  • 3
  • 13
  • 23
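
A simple baseline for this kind of problem, sketched here under the assumption that "pattern" means a contiguous run of items, is to count every n-gram in the sequence. The helper name `count_ngrams` and the choice of n = 4 are illustrative; the sequence is the one shown in the excerpt.

```python
from collections import Counter

# The single sequence as it appears in the question excerpt.
sequence = [3, 6, 1, 2, 7, 3, 8, 9, 7, 2, 2, 0, 2, 7, 2, 8, 4, 8,
            9, 7, 2, 4, 1, 0, 3, 2, 7, 2, 0, 3, 8, 9, 7, 2, 0]


def count_ngrams(seq, n):
    """Count every contiguous subsequence of length n."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


# n = 4 is an arbitrary choice for illustration.
four_grams = count_ngrams(sequence, 4)
top_pattern, top_count = four_grams.most_common(1)[0]
print(top_pattern, top_count)
```

Contiguous counting will miss patterns with gaps between items, which dedicated sequential pattern mining methods are designed to handle.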
8
votes
1 answer

When and how to use weights for sequence analysis in social science?

Weighting in sequence analysis: So far, I have found few papers that address the issue of weighting for sequence analysis (using for example the optimal matching algorithm). Sequence analysis normally involves several steps: setting or…
8
votes
1 answer

How do I statistically rephrase this question

I am analyzing a dataset containing observations from n attempts by players in a game. If I am building a regression model to predict the outcome of each attempt given one or more descriptors of each player's attempt, how do I measure…
8
votes
2 answers

Identifying sequential patterns

I am working with sequence data consisting of long lists of malware win-api calls. I am trying to cast the problem of identifying 'malware behavior' as one of finding sequential patterns. I treat each API call as a single-item itemset. The number of…
chet
  • 285
  • 1
  • 5
7
votes
0 answers

traditional state-space models and LSTMs

I am trying to understand the nature of LSTMs in relation to intuitions from traditional state-space models (e.g., Kalman filtering). The code below aims to simulate a simple univariate linear state-space + observation model; next, the simulated…
7
votes
1 answer

Optimum number of epochs and neurons for an LSTM network

I wanted to know if there's a way to select an optimum number of epochs and neurons to forecast a certain time series using LSTM, the motive being automation of the forecasting problem, i.e. the algorithm selects the right number of epochs and…
6
votes
2 answers

Other substitution matrices for missing value state in sequence analysis with TraMineR?

We have a question about how to deal with missing values/gaps within sequences. We would like to set up our own substitution-cost matrix for the Optimal Matching process. As far as we know, TraMineR allows creating custom cost matrices - but only in case…
Oliver
  • 71
  • 2
6
votes
1 answer

predicting tree structure

This topic is actually rather hard to google for, as 'tree' has been overloaded in this domain to refer to decision trees. I'd be interested in having a learning algorithm produce code, such as that used in Microsoft Power BI's feature to query databases…
6
votes
2 answers

Sums-of-Squares (total, between, within): how to compute them from a Distance Matrix?

I am having trouble understanding the concept of Sum of Squares in the context of distance matrices (Studer et al. 2010). The Sum of Squares I am familiar with is the classical $SS$ from ANOVA, performed on a contingency table, such as sex FE…
giac
  • 821
  • 5
  • 20
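
As a hedged illustration of the link between the classical sum of squares and pairwise distances, the sketch below checks the standard identity $SS = \frac{1}{n}\sum_{i<j} d_{ij}^2$ for Euclidean distances on a small made-up sample. The function names and the data are assumptions; how Studer et al. (2010) apply the idea to other dissimilarities is not covered here.

```python
def ss_from_values(values):
    """Classical total sum of squares: squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def ss_from_distances(dist):
    """Same quantity from a distance matrix: (1/n) * sum over pairs i<j of d_ij^2."""
    n = len(dist)
    return sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n


# Hypothetical one-dimensional sample.
values = [1.0, 2.0, 4.0, 7.0]

# Euclidean distance matrix built from the sample.
dist = [[abs(a - b) for b in values] for a in values]

ss_direct = ss_from_values(values)
ss_pairwise = ss_from_distances(dist)
print(ss_direct, ss_pairwise)
```

The point of the second form is that it never uses the mean, so it can be evaluated when only a distance matrix is available.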
6
votes
1 answer

Using Keras LSTM RNN for variable length sequence prediction

I have a set of sequences. Each sequence has the form $\{(s_1,l_1),(s_2,l_2),\ldots\}$ where the $s_i$ are real-valued numbers and the $l_i$ are labels from a fixed alphabet. It is important to note that the sequences may be of different lengths.…