Questions tagged [lstm]

Long Short-Term Memory (LSTM) is a neural network architecture built from recurrent blocks (cells) that can remember a value for an arbitrary length of time.

An LSTM has the following core components, which are not present in plain RNNs:

  1. Forget gate: allows the LSTM to forget its past state, or to selectively retain elements of it.
  2. Input gate: decides what part of the new input arriving at the current step is allowed to influence the cell's state.
  3. Output gate: decides what part of the cell's state is allowed to flow out, typically to be consumed as a prediction.

A "cell" is the term used for an individual LSTM unit; a minimal sketch of one cell's forward step is given below.
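For concreteness, here is that sketch in NumPy, following the standard LSTM formulation; the function name, variable names and weight layout are illustrative choices, not taken from any particular library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One forward step of a standard LSTM cell.

    x      : current input,          shape (input_dim,)
    h_prev : previous hidden state,  shape (hidden_dim,)
    c_prev : previous cell state,    shape (hidden_dim,)
    W      : stacked gate weights,   shape (4 * hidden_dim, input_dim + hidden_dim)
    b      : stacked gate biases,    shape (4 * hidden_dim,)
    """
    hd = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0 * hd:1 * hd])   # forget gate: how much of c_prev to keep
    i = sigmoid(z[1 * hd:2 * hd])   # input gate: how much of the candidate to admit
    o = sigmoid(z[2 * hd:3 * hd])   # output gate: how much of the state to expose
    g = np.tanh(z[3 * hd:4 * hd])   # candidate cell state from the current input
    c = f * c_prev + i * g          # new cell state
    h = o * np.tanh(c)              # new hidden state / output
    return h, c

# Tiny usage example with random parameters.
D, H = 3, 5
rng = np.random.default_rng(0)
h, c = lstm_step(rng.standard_normal(D), np.zeros(H), np.zeros(H),
                 rng.standard_normal((4 * H, D + H)), np.zeros(4 * H))
```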
760 questions
55 votes, 4 answers

How does LSTM prevent the vanishing gradient problem?

LSTM was invented specifically to avoid the vanishing gradient problem. It is supposed to do that with the Constant Error Carousel (CEC), which on the diagram below (from Greff et al.) corresponds to the loop around the cell. (source:…
TheWalkingCube
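The recurrence the CEC refers to can be written in one line. With $f_t$ the forget gate, $i_t$ the input gate and $\tilde{c}_t$ the candidate state, and treating the gate activations as constants (the usual first-order argument), the gradient along the cell-state path is scaled only by the forget gate rather than by repeated squashing-function derivatives:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad \frac{\partial c_t}{\partial c_{t-1}} \approx \operatorname{diag}(f_t)$$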
51 votes, 6 answers

Understanding LSTM units vs. cells

I have been studying LSTMs for a while. I understand at a high level how everything works. However, when going to implement them in TensorFlow, I've noticed that BasicLSTMCell requires a number-of-units (i.e. num_units) parameter. From this very…
user124589
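A quick illustration of what that parameter means, assuming the present-day Keras API rather than the old BasicLSTMCell (the argument is simply called units there): it is the dimensionality of the hidden and cell state, not a count of separate cells or of time steps.

```python
from tensorflow.keras.layers import LSTM

# units (num_units in the old BasicLSTMCell) is the size of the hidden state h
# and the cell state c that the layer carries from one time step to the next.
layer = LSTM(units=128)
# For an input of shape (batch, timesteps, features), the default output per
# sample is a 128-dimensional vector (the last hidden state).
```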
42 votes, 4 answers

Is LSTM (Long Short-Term Memory) dead?

From my own experience, LSTMs have long training times and do not improve performance significantly in many real-world tasks. To make the question more specific, I want to ask when an LSTM will work better than other deep NNs (maybe with real world…
Haitao Du
40 votes, 1 answer

Training loss goes down and up again. What is happening?

My training loss goes down and then up again. It is very weird. The cross-validation loss tracks the training loss. What is going on? I have two stacked LSTMs as follows (in Keras): model = Sequential() model.add(LSTM(512, return_sequences=True,…
patapouf_ai
40 votes, 4 answers

What are the advantages of stacking multiple LSTMs?

What are the advantages of using multiple LSTMs stacked on top of one another in a deep network? I am using an LSTM to represent a sequence of inputs as a single input. So once I have that single representation, why would I pass it through…
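For reference, a hedged sketch of what "stacking" looks like in Keras (the layer sizes are made up): every LSTM except the last must return its full output sequence so the next LSTM has a sequence to consume.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    # First LSTM returns the whole sequence so the next LSTM sees one
    # vector per time step rather than only the final summary.
    LSTM(64, return_sequences=True, input_shape=(30, 8)),  # 30 steps, 8 features
    LSTM(32),   # second LSTM returns only its last hidden state
    Dense(1),   # e.g. a single regression target
])
model.compile(optimizer="adam", loss="mse")
```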
38 votes, 3 answers

Understanding input_shape parameter in LSTM with Keras

I'm trying to use the example described in the Keras documentation named "Stacked LSTM for sequence classification" (see code below) and can't figure out the input_shape parameter in the context of my data. I have as input a matrix of sequences of…
mazieres
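A small sketch of how input_shape relates to the data under the usual Keras convention (the numbers here are illustrative): input_shape excludes the batch dimension and is (time steps, features per step).

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# 200 sequences, each 25 steps long, with 3 features observed at every step.
X = np.random.rand(200, 25, 3)
y = np.random.rand(200, 1)

model = Sequential([
    LSTM(32, input_shape=(25, 3)),  # (time steps, features); batch size is omitted
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=1, batch_size=16, verbose=0)
```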
37 votes, 5 answers

Difference between feedback RNN and LSTM/GRU

I am trying to understand different Recurrent Neural Network (RNN) architectures to be applied to time series data and I am getting a bit confused with the different names that are frequently used when describing RNNs. Is the structure of Long…
Josie
31 votes, 1 answer

What are attention mechanisms exactly?

Attention mechanisms have been used in various deep learning papers in the last few years. Ilya Sutskever, head of research at OpenAI, has enthusiastically praised them: https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c74ce0 Eugenio…
DeltaIV
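As a rough anchor for the terminology, here is a minimal sketch of plain dot-product attention (nothing specific to any paper mentioned in the question): score each encoder state against a query, normalize the scores, and take the weighted sum.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

states = np.random.rand(10, 32)    # 10 time steps of 32-dim encoder outputs
query  = np.random.rand(32)        # e.g. the current decoder hidden state

scores  = states @ query           # one relevance score per time step
weights = softmax(scores)          # attention weights, sum to 1
context = weights @ states         # weighted summary of the sequence, shape (32,)
```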
31 votes, 5 answers

Why are the weights of RNN/LSTM networks shared across time?

I've recently become interested in LSTMs and I was surprised to learn that the weights are shared across time. I know that if you share the weights across time, then your input time sequences can be of variable length. With shared weights you…
beeCwright
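One way to see the sharing concretely (a sketch using the Keras API; the sizes are arbitrary): the same layer applied to sequences of very different lengths has exactly the same number of parameters, because one set of weights is reused at every time step.

```python
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

for timesteps in (10, 1000):
    inp = Input(shape=(timesteps, 8))   # 8 features per step
    out = LSTM(16)(inp)
    model = Model(inp, out)
    # Prints the same parameter count for both sequence lengths.
    print(timesteps, model.count_params())
```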
28 votes, 4 answers

What are "residual connections" in RNNs?

In Google's paper Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, it is stated: "Our LSTM RNNs have $8$ layers, with residual connections between layers ..." What are residual connections? Why…
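For orientation, a hedged sketch of what a residual (skip) connection between two stacked LSTM layers can look like in Keras (the sizes are made up, and the paper's exact wiring may differ): the second layer's output is added back to its input, so it only has to learn a correction.

```python
from tensorflow.keras.layers import Input, LSTM, Add
from tensorflow.keras.models import Model

inp = Input(shape=(None, 64))                 # variable-length sequences, 64 features
h1  = LSTM(64, return_sequences=True)(inp)
h2  = LSTM(64, return_sequences=True)(h1)
out = Add()([h1, h2])                         # residual connection: h1 + LSTM(h1)
model = Model(inp, out)
```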
27 votes, 3 answers

Difference between samples, time steps and features in neural network

I am going through the following blog on LSTM neural network: http://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/ The author reshapes the input vector X as [samples, time steps, features] for…
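A tiny illustration of that reshape (the numbers are invented, not taken from the blog post): a set of univariate sequences stored as a 2-D array becomes the 3-D tensor Keras expects.

```python
import numpy as np

# 100 univariate sequences, each 10 steps long, stored one row per sequence.
X = np.arange(100 * 10, dtype="float32").reshape(100, 10)

# Keras LSTMs expect [samples, time steps, features]; here each step has 1 feature.
X = X.reshape(100, 10, 1)
print(X.shape)  # (100, 10, 1)
```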
25 votes, 2 answers

What optimization methods work best for LSTMs?

I've been using Theano to experiment with LSTMs, and was wondering what optimization methods (SGD, Adagrad, Adadelta, RMSprop, Adam, etc.) work best for LSTMs. Are there any research papers on this topic? Also, does the answer depend on the type of…
applecider
23 votes, 1 answer

What is a feasible sequence length for an RNN to model?

I'm looking into using an LSTM (long short-term memory) version of a recurrent neural network (RNN) for modeling time-series data. As the sequence length of the data increases, the complexity of the network increases. I am therefore curious what…
pir
22 votes, 4 answers

RNN for irregular time intervals?

RNNs are remarkably good for capturing the time-dependence of sequential data. However, what happens when the sequence elements aren't equally spaced in time? E.g., the first input to the LSTM cell happens on Monday, then no data from Tuesday to…
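One common workaround (not the only one, and not something the question itself proposes) is to feed the elapsed time between observations to the network as an extra feature, so the model can learn to account for the gaps:

```python
import numpy as np

values = np.array([1.0, 1.4, 0.9, 1.1])      # the observed measurements
times  = np.array([0.0, 1.0, 4.0, 4.5])      # irregular observation times (e.g. days)
deltas = np.diff(times, prepend=times[0])    # time elapsed since the previous sample

# One sample, 4 time steps, 2 features per step: (value, time since last observation).
X = np.stack([values, deltas], axis=-1)[np.newaxis, ...]
print(X.shape)  # (1, 4, 2)
```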
20 votes, 1 answer

How to train LSTM model on multiple time series data?

How do I train an LSTM model on multiple time series? Use case: I have weekly sales of 20,000 agents for the last 5 years and need to forecast upcoming weekly sales for each agent. Do I need to follow a batch-processing technique - take one agent at a…
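A hedged sketch of one common way to set this up: pool all agents into a single training set by sliding a fixed window over each series. The array sizes below are small placeholders, not the 20,000-agent data from the question.

```python
import numpy as np

n_agents, n_weeks, window = 50, 260, 52          # placeholder sizes (5 years of weeks)
sales = np.random.rand(n_agents, n_weeks)        # stand-in for the real weekly sales

X, y = [], []
for series in sales:
    for t in range(n_weeks - window):
        X.append(series[t:t + window])           # 52 weeks of history ...
        y.append(series[t + window])             # ... predicts the next week
X = np.asarray(X)[..., np.newaxis]               # [samples, time steps, 1 feature]
y = np.asarray(y)
print(X.shape, y.shape)                          # (50 * 208, 52, 1) and (50 * 208,)
```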