
I am going through the following blog on LSTM neural network: http://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/

The author reshapes the input vector X as [samples, time steps, features] for different configurations of LSTMs.

The author writes:

"Indeed, the sequences of letters are time steps of one feature rather than one time step of separate features. We have given more context to the network, but not more sequence as it expected."

What does this mean?

Vipul Jain

3 Answers


I found this just below the [samples, time_steps, features] line you are concerned with:

X = numpy.reshape(dataX, (len(dataX), seq_length, 1))

Samples - This is len(dataX), i.e. the number of data points you have.

Time steps - This is the number of time steps you run your recurrent neural network for. If you want your network to have a memory of 60 characters, this number should be 60.

Features - This is the number of features in every time step. If you are processing pictures, this is the number of pixels. In this case you seem to have 1 feature per time step.
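Putting the three together, a minimal sketch (the toy data, layer sizes, and seq_length value here are assumptions for illustration, not from the blog post):

```python
import numpy
from keras.models import Sequential
from keras.layers import LSTM, Dense

seq_length = 60                                   # memory of 60 characters
dataX = [[0.0] * seq_length for _ in range(100)]  # 100 toy data points

# [samples, time steps, features] = (100, 60, 1)
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))

model = Sequential()
# input_shape omits the sample dimension: (time steps, features)
model.add(LSTM(32, input_shape=(seq_length, 1)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
```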

Joonatan Samuel
  • Can you explain the difference between `X = numpy.reshape(dataX, (len(dataX), 3, 1))` and `X = numpy.reshape(dataX, (len(dataX), 1, 3))`? How does this affect the LSTM? – Vipul Jain Feb 28 '17 at 13:39
  • `(len(dataX), 3, 1)` runs the LSTM for 3 iterations, inputting an input vector of shape (1,). `(len(dataX), 1, 3)` runs the LSTM for 1 iteration, which makes it quite useless to even have recurrent connections, since there can't be any feedback from previous iterations. In this case the input to the RNN has shape (3,). – Joonatan Samuel Feb 28 '17 at 14:24
  • _"(len(dataX), 3, 1) runs LSTM for 3 iterations"_ dont we use epoch for that. does it means same as epoch=3? – Vipul Jain Feb 28 '17 at 16:02
  • One epoch, in rough translation, means that we have trained once on every data point in our data set. While training, having looked through len(dataX) examples counts as 1 epoch. However, RNNs take data in sequentially: on every training example, you have to feed the network data over multiple iterations. E.g., take the word "car"; on every iteration I feed it one letter, let it complete its computation, and then feed the next letter. It needs 3 iterations to process the whole word letter by letter. – Joonatan Samuel Feb 28 '17 at 16:25
  • @JoonatanSamuel hi, I know a lot of time has passed but I am having a lot of trouble understanding the same topic. Your answer is very clear, but I am still a bit confused. Imagine we had a time series describing sales in each month for several years (say). Imagine len(data) = 3000, data.shape = (3000,1) for instance, so we have 3000 months recorded. **Predict the next**: if I want to use the N previous observations to predict the next one (only the next one!), what should the shape of the input data to the LSTM be? For instance if we wanted to use `t-n,..., t-2, t-1` to predict `t`. – Euler_Salter Oct 13 '17 at 09:28
  • I have two ideas but I am not sure. One idea would be to use a function to build a new array `newdata` that has the first `n` columns containing the data but shifted. So the first column would contain `t-n` values, the second `t-n+1` values etc and the last column would contain the value at `t`. Then the shape of `newdata` would be `(3000-n, n+1)` since I need to eliminate some values to make a proper array. So basically I would have `3000-n` samples, `n` features. Each of these columns has a lag of 1 with respect to the next. – Euler_Salter Oct 13 '17 at 09:32
  • So after dividing `newdata` into `xdata` and `ydata` by doing something like `xdata = newdata[:, :-1]` and `ydata = newdata[:, -1]`, I could feed it into the LSTM. How should I reshape `xdata`, though? My understanding is that I should do something like `xdata = xdata.reshape(xdata.shape[0], timesteps, xdata.shape[1])`. Is this correct? But in this context, what exactly would `timesteps` be? Each column (i.e. each feature) is lagged by 1 month (timestep?). Does it mean I HAVE to write `1` for timesteps? Or can I choose any number there? What would that change? – Euler_Salter Oct 13 '17 at 09:36
  • As a more concrete example, imagine I have the `xdata` and `ydata` described above. Imagine that `xdata` has `3` columns. This means I want to predict the value at `t` using the values at `t-3`, `t-2`, `t-1`. I am using the window method. Then it means I would have `xdata.shape` equal to `(2997, 3)` and `ydata.shape` equal to `(2997, 1)`. – Euler_Salter Oct 13 '17 at 09:38
  • Then I could have `model = Sequential(); model.add(LSTM(number_units, input_shape=(a, b, c))); model.add(Dense(1)); model.compile(loss='mse', optimizer='adam')` – Euler_Salter Oct 13 '17 at 09:39
  • But what values of `a`, `b` and `c` should I use? The number of samples is `2997`, so I guess `a = 2997`. What would the number of timesteps be? Would it have to be 1 in this example, or can I choose any number? And how do I choose? – Euler_Salter Oct 13 '17 at 09:41
  • Finally, on the same topic, how do I choose the batch size? What is the difference between the `lag`, the `timestep` and the `batch_size`? When I fit the model, I think it should be something like `model.fit(xtrain, ytrain, epochs = e, batch_size = bs)`. What would `bs` represent, and what would `e` be? – Euler_Salter Oct 13 '17 at 09:44
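For what it's worth, a minimal sketch of the sliding-window setup asked about in the comments above (the toy series and `n = 3` are assumptions; the names `data`, `xdata`, `ydata` follow the comments). Each window of `n` lagged values becomes `n` time steps of 1 feature, so `timesteps` is `n` rather than 1, and `input_shape` is `(n, 1)` (Keras's `input_shape` excludes the sample dimension, so `a = 2997` does not appear in it):

```python
import numpy as np

data = np.arange(3000, dtype=float).reshape(3000, 1)  # stand-in for 3000 monthly sales
n = 3  # predict t from t-3, t-2, t-1 (window method)

# build 3000 - n windows; each row holds n consecutive lagged values
xdata = np.array([data[i:i + n, 0] for i in range(len(data) - n)])
ydata = data[n:, 0]

# reshape to [samples, time steps, features] = (2997, 3, 1)
xdata = xdata.reshape(xdata.shape[0], n, 1)
print(xdata.shape, ydata.shape)  # (2997, 3, 1) (2997,)
```

Under this reading, `batch_size` is independent of the window length: `model.fit(xdata, ydata, epochs=e, batch_size=bs)` would use `bs` windows per gradient update and make `e` full passes over all 2997 of them.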

It's a bit too late, but just in case:
A sample refers to an individual training example. The `batch_size` variable is hence the number of samples you send to the neural network at once, i.e. how many different examples you feed to the network at a time.

Time steps are ticks of time: how long in time each of your samples is. For example, a sample can contain 128 time steps, where each time step could be a 30th of a second in signal processing. In natural language processing (NLP), a time step may be associated with a character, a word, or a sentence, depending on the setup.

Features are simply the number of dimensions we feed at each time step. For example, in NLP a word could be represented by 300 features using word2vec. In the case of signal processing, let's pretend that your signal is 3D: you have an X, a Y, and a Z signal, such as an accelerometer's measurements on each axis. This means you would send 3 features at each time step for each sample.
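A small sketch of the accelerometer case (all concrete numbers below are invented for illustration, assuming Keras's Sequential API):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

n_samples, n_timesteps, n_features = 10, 128, 3   # 3 features: X, Y, Z axes
signals = np.random.randn(n_samples, n_timesteps, n_features)
targets = np.random.randn(n_samples)

model = Sequential()
model.add(LSTM(16, input_shape=(n_timesteps, n_features)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

# batch_size=5: the network sees 5 of the 10 samples at once
model.fit(signals, targets, epochs=1, batch_size=5)
```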

By Guillaume

Green
  • Since you mentioned NLP, how would you construct your input if you wanted to take into account the raw words as well as their Part-of-Speech (POS) tags? I.e. in my project I think that the word sequence as well as the POS sequence would be good features for the model. Any recommendation on how to merge them? E.g. word-sequence followed by POS-sequence, or w1 pos1 w2 pos2 w3 pos3...? – Alaa M. Sep 05 '21 at 11:50

My answer with an example: ["hello this is xyz","how are you doing","great man..."]

In this case, "[samples, time steps, features]" means:

  • samples: 3, because there are 3 elements in the list
  • time steps: here you can take max_length = 4, counting words per sentence: length("hello this is xyz") = 4; length("how are you doing") = 4; length("great man...") = 2 (after removing the punctuation). The reason this counts as time steps is that in the first element, "hello this is xyz" ==> t0("hello"), t1("this"), t2("is") and t3("xyz")
  • features: the size of the embedding for each word, e.g. "hello": 50D array, "this": 50D array, and so on
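A sketch of the resulting tensor (the random vectors below stand in for real word2vec lookups, and the shorter sentence is zero-padded to max_length; both choices are assumptions):

```python
import numpy as np

sentences = [["hello", "this", "is", "xyz"],
             ["how", "are", "you", "doing"],
             ["great", "man"]]
max_length, emb_dim = 4, 50

# [samples, time steps, features] = (3, 4, 50); short sentences stay zero-padded
X = np.zeros((len(sentences), max_length, emb_dim))
for i, words in enumerate(sentences):
    for t, word in enumerate(words):
        X[i, t] = np.random.randn(emb_dim)  # stand-in for an embedding lookup
print(X.shape)  # (3, 4, 50)
```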
tintin
  • How would you construct your input if you wanted to take into account the raw words as well as their Part of Speech tagging (POS)? I.e. in my project I think that the word sequences as well as the POS sequences would be good features for the model. Any recommendation on how to merge them? E.g. words-sequence followed by POS-sequence, or w1 pos1 w2 pos2 w3 pos3... ? – Alaa M. Sep 05 '21 at 11:52
  • Following your example, if we perform text classification to predict the next word, what will the shape of the RNN output $\hat{y}$ be? (n_sample, n_steps, then what??) – siegfried Nov 27 '21 at 01:36