Questions tagged [lstm]

Long Short-Term Memory (LSTM) is a neural network architecture built from recurrent blocks (cells) that can remember a value for an arbitrary length of time.

An LSTM has the following core components, which are not present in plain RNNs:

  1. Forget gate: allows the LSTM to forget its past state, or to selectively retain elements of it.
  2. Input gate: decides what part of the new input arriving at the current step is allowed to influence the cell's state.
  3. Output gate: decides what part of the cell's state is allowed to flow out, typically to be consumed as a prediction.

A "cell" is the term used for an individual LSTM unit; a minimal sketch of one cell's forward step is given below.
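For concreteness, here is that sketch in NumPy, following the standard LSTM formulation; the function name, variable names and weight layout are illustrative choices, not taken from any particular library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One forward step of a standard LSTM cell.

    x      : current input,          shape (input_dim,)
    h_prev : previous hidden state,  shape (hidden_dim,)
    c_prev : previous cell state,    shape (hidden_dim,)
    W      : stacked gate weights,   shape (4 * hidden_dim, input_dim + hidden_dim)
    b      : stacked gate biases,    shape (4 * hidden_dim,)
    """
    hd = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0 * hd:1 * hd])   # forget gate: how much of c_prev to keep
    i = sigmoid(z[1 * hd:2 * hd])   # input gate: how much of the candidate to admit
    o = sigmoid(z[2 * hd:3 * hd])   # output gate: how much of the state to expose
    g = np.tanh(z[3 * hd:4 * hd])   # candidate cell state from the current input
    c = f * c_prev + i * g          # new cell state
    h = o * np.tanh(c)              # new hidden state / output
    return h, c

# Tiny usage example with random parameters.
D, H = 3, 5
rng = np.random.default_rng(0)
h, c = lstm_step(rng.standard_normal(D), np.zeros(H), np.zeros(H),
                 rng.standard_normal((4 * H, D + H)), np.zeros(4 * H))
```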
760 questions
55 votes, 4 answers

How does LSTM prevent the vanishing gradient problem?

LSTM was invented specifically to avoid the vanishing gradient problem. It is supposed to do that with the Constant Error Carousel (CEC), which on the diagram below (from Greff et al.) corresponds to the loop around the cell. (source:…
TheWalkingCube
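The recurrence the CEC refers to can be written in one line. With $f_t$ the forget gate, $i_t$ the input gate and $\tilde{c}_t$ the candidate state, and treating the gate activations as constants (the usual first-order argument), the gradient along the cell-state path is scaled only by the forget gate rather than by repeated squashing-function derivatives:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad \frac{\partial c_t}{\partial c_{t-1}} \approx \operatorname{diag}(f_t)$$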
51 votes, 6 answers

Understanding LSTM units vs. cells

I have been studying LSTMs for a while. I understand at a high level how everything works. However, when going to implement them in TensorFlow, I've noticed that BasicLSTMCell requires a number-of-units (i.e. num_units) parameter. From this very…
user124589
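A quick illustration of what that parameter means, assuming the present-day Keras API rather than the old BasicLSTMCell (the argument is simply called units there): it is the dimensionality of the hidden and cell state, not a count of separate cells or of time steps.

```python
from tensorflow.keras.layers import LSTM

# units (num_units in the old BasicLSTMCell) is the size of the hidden state h
# and the cell state c that the layer carries from one time step to the next.
layer = LSTM(units=128)
# For an input of shape (batch, timesteps, features), the default output per
# sample is a 128-dimensional vector (the last hidden state).
```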
42 votes, 4 answers

Is LSTM (Long Short-Term Memory) dead?

From my own experience, LSTMs have long training times and do not improve performance significantly in many real-world tasks. To make the question more specific, I want to ask when an LSTM will work better than other deep NNs (maybe with real world…
Haitao Du
40 votes, 1 answer

Training loss goes down and up again. What is happening?

My training loss goes down and then up again. It is very weird. The cross-validation loss tracks the training loss. What is going on? I have two stacked LSTMs as follows (in Keras): model = Sequential() model.add(LSTM(512, return_sequences=True,…
patapouf_ai
40 votes, 4 answers

What are the advantages of stacking multiple LSTMs?

What are the advantages of using multiple LSTMs stacked on top of one another in a deep network? I am using an LSTM to represent a sequence of inputs as a single input. So once I have that single representation, why would I pass it through…
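For reference, a hedged sketch of what "stacking" looks like in Keras (the layer sizes are made up): every LSTM except the last must return its full output sequence so the next LSTM has a sequence to consume.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    # First LSTM returns the whole sequence so the next LSTM sees one
    # vector per time step rather than only the final summary.
    LSTM(64, return_sequences=True, input_shape=(30, 8)),  # 30 steps, 8 features
    LSTM(32),   # second LSTM returns only its last hidden state
    Dense(1),   # e.g. a single regression target
])
model.compile(optimizer="adam", loss="mse")
```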
38 votes, 3 answers

Understanding input_shape parameter in LSTM with Keras

I'm trying to use the example described in the Keras documentation named "Stacked LSTM for sequence classification" (see code below) and can't figure out the input_shape parameter in the context of my data. I have as input a matrix of sequences of…
mazieres
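A small sketch of how input_shape relates to the data under the usual Keras convention (the numbers here are illustrative): input_shape excludes the batch dimension and is (time steps, features per step).

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# 200 sequences, each 25 steps long, with 3 features observed at every step.
X = np.random.rand(200, 25, 3)
y = np.random.rand(200, 1)

model = Sequential([
    LSTM(32, input_shape=(25, 3)),  # (time steps, features); batch size is omitted
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=1, batch_size=16, verbose=0)
```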
37 votes, 5 answers

Difference between feedback RNN and LSTM/GRU

I am trying to understand different Recurrent Neural Network (RNN) architectures to be applied to time series data and I am getting a bit confused with the different names that are frequently used when describing RNNs. Is the structure of Long…
Josie
31 votes, 1 answer

What are attention mechanisms exactly?

Attention mechanisms have been used in various deep learning papers in the last few years. Ilya Sutskever, head of research at OpenAI, has enthusiastically praised them: https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c74ce0 Eugenio…
DeltaIV
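As a rough anchor for the terminology, here is a minimal sketch of plain dot-product attention (nothing specific to any paper mentioned in the question): score each encoder state against a query, normalize the scores, and take the weighted sum.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

states = np.random.rand(10, 32)    # 10 time steps of 32-dim encoder outputs
query  = np.random.rand(32)        # e.g. the current decoder hidden state

scores  = states @ query           # one relevance score per time step
weights = softmax(scores)          # attention weights, sum to 1
context = weights @ states         # weighted summary of the sequence, shape (32,)
```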
31 votes, 5 answers

Why are the weights of RNN/LSTM networks shared across time?

I've recently become interested in LSTMs and I was surprised to learn that the weights are shared across time. I know that if you share the weights across time, then your input time sequences can be of variable length. With shared weights you…
beeCwright
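One way to see the sharing concretely (a sketch using the Keras API; the sizes are arbitrary): the same layer applied to sequences of very different lengths has exactly the same number of parameters, because one set of weights is reused at every time step.

```python
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

for timesteps in (10, 1000):
    inp = Input(shape=(timesteps, 8))   # 8 features per step
    out = LSTM(16)(inp)
    model = Model(inp, out)
    # Prints the same parameter count for both sequence lengths.
    print(timesteps, model.count_params())
```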
28 votes, 4 answers

What are "residual connections" in RNNs?

In Google's paper Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, it is stated: "Our LSTM RNNs have $8$ layers, with residual connections between layers ..." What are residual connections? Why…
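For orientation, a hedged sketch of what a residual (skip) connection between two stacked LSTM layers can look like in Keras (the sizes are made up, and the paper's exact wiring may differ): the second layer's output is added back to its input, so it only has to learn a correction.

```python
from tensorflow.keras.layers import Input, LSTM, Add
from tensorflow.keras.models import Model

inp = Input(shape=(None, 64))                 # variable-length sequences, 64 features
h1  = LSTM(64, return_sequences=True)(inp)
h2  = LSTM(64, return_sequences=True)(h1)
out = Add()([h1, h2])                         # residual connection: h1 + LSTM(h1)
model = Model(inp, out)
```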
27 votes, 3 answers

Difference between samples, time steps and features in neural network

I am going through the following blog on LSTM neural network: http://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/ The author reshapes the input vector X as [samples, time steps, features] for…
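A tiny illustration of that reshape (the numbers are invented, not taken from the blog post): a set of univariate sequences stored as a 2-D array becomes the 3-D tensor Keras expects.

```python
import numpy as np

# 100 univariate sequences, each 10 steps long, stored one row per sequence.
X = np.arange(100 * 10, dtype="float32").reshape(100, 10)

# Keras LSTMs expect [samples, time steps, features]; here each step has 1 feature.
X = X.reshape(100, 10, 1)
print(X.shape)  # (100, 10, 1)
```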
25 votes, 2 answers

What optimization methods work best for LSTMs?

I've been using Theano to experiment with LSTMs, and was wondering what optimization methods (SGD, Adagrad, Adadelta, RMSprop, Adam, etc.) work best for LSTMs. Are there any research papers on this topic? Also, does the answer depend on the type of…
applecider
23 votes, 1 answer

What is a feasible sequence length for an RNN to model?

I'm looking into using an LSTM (long short-term memory) version of a recurrent neural network (RNN) for modeling time-series data. As the sequence length of the data increases, the complexity of the network increases. I am therefore curious what…
pir
22 votes, 4 answers

RNN for irregular time intervals?

RNNs are remarkably good for capturing the time-dependence of sequential data. However, what happens when the sequence elements aren't equally spaced in time? E.g., the first input to the LSTM cell happens on Monday, then no data from Tuesday to…
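One common workaround (not the only one, and not something the question itself proposes) is to feed the elapsed time between observations to the network as an extra feature, so the model can learn to account for the gaps:

```python
import numpy as np

values = np.array([1.0, 1.4, 0.9, 1.1])      # the observed measurements
times  = np.array([0.0, 1.0, 4.0, 4.5])      # irregular observation times (e.g. days)
deltas = np.diff(times, prepend=times[0])    # time elapsed since the previous sample

# One sample, 4 time steps, 2 features per step: (value, time since last observation).
X = np.stack([values, deltas], axis=-1)[np.newaxis, ...]
print(X.shape)  # (1, 4, 2)
```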
20 votes, 1 answer

How to train LSTM model on multiple time series data?

How do I train an LSTM model on multiple time series? Use case: I have weekly sales of 20,000 agents for the last 5 years and need to forecast upcoming weekly sales for each agent. Do I need to follow a batch-processing technique - take one agent at a…
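A hedged sketch of one common way to set this up: pool all agents into a single training set by sliding a fixed window over each series. The array sizes below are small placeholders, not the 20,000-agent data from the question.

```python
import numpy as np

n_agents, n_weeks, window = 50, 260, 52          # placeholder sizes (5 years of weeks)
sales = np.random.rand(n_agents, n_weeks)        # stand-in for the real weekly sales

X, y = [], []
for series in sales:
    for t in range(n_weeks - window):
        X.append(series[t:t + window])           # 52 weeks of history ...
        y.append(series[t + window])             # ... predicts the next week
X = np.asarray(X)[..., np.newaxis]               # [samples, time steps, 1 feature]
y = np.asarray(y)
print(X.shape, y.shape)                          # (50 * 208, 52, 1) and (50 * 208,)
```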