I recently learned about CNN-LSTM architectures for time series, where the CNN part of the architecture acts as a feature extractor. However, I struggle to grasp why there is still a 'time-related' aspect to the data after the CNN, because I thought the output of the CNN layers is just 'features'?
For example, sound identification can be done by such an architecture. You provide training data of sounds with shape (2048, 1) and a corresponding class label. First, the CNN extracts features from that time series, and then an LSTM followed by a dense layer predicts the label. Can somebody perhaps explain how an LSTM can still make sense of the data coming out of the CNN?
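To make my question concrete, here is a minimal Keras sketch of the kind of model I mean (my own illustration, not taken from any paper; the layer sizes and `num_classes` are hypothetical). The shape comments show what I find confusing: `Conv1D` and pooling shorten the time axis but never remove it, so the CNN output is `(timesteps', filters)`, i.e. still a sequence of feature vectors rather than a flat feature set.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 10  # hypothetical; depends on how many sound classes you have

model = keras.Sequential([
    layers.Input(shape=(2048, 1)),  # (timesteps, channels): one raw waveform
    # Conv1D slides its filters along the time axis, so the output is still
    # a sequence: one 32-dim feature vector per (strided) time step.
    layers.Conv1D(32, kernel_size=8, strides=2, activation="relu"),  # -> (1021, 32)
    layers.MaxPooling1D(pool_size=4),                                # -> (255, 32)
    layers.Conv1D(64, kernel_size=4, strides=2, activation="relu"),  # -> (126, 64)
    layers.MaxPooling1D(pool_size=4),                                # -> (31, 64)
    # The LSTM reads this shorter sequence of 64-dim feature vectors in order,
    # so the temporal structure is still there for it to model.
    layers.LSTM(64),
    layers.Dense(num_classes, activation="softmax"),
])
model.summary()
```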
An example image of how this works: [architecture diagram omitted]
An example paper where I found another take on this idea: see Figure 2 of "A CNN–LSTM model for gold price time-series forecasting" by Ioannis E. Livieris et al.