
I am currently working on an auto-encoder to create latent vectors from multivariate time series. The architectures I've tested so far all revolve around various flavours and combinations of 1D convolutions and residual blocks (WaveNet style). Basically, my most basic building block is

1DConv()->ActivationFunction()

So far I've tested ReLUs, tanh gates, and sigmoid gates. I train all networks on the reconstruction error of my time series, using MSE or variants thereof.
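In code, that block looks roughly like this (a minimal PyTorch sketch; the channel counts and kernel size are placeholders, not my actual values):

```python
# Minimal sketch of the Conv1d -> activation building block (PyTorch assumed;
# channel counts and kernel size are illustrative placeholders).
import torch.nn as nn

block = nn.Sequential(
    nn.Conv1d(in_channels=8, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),  # swap for nn.Tanh() or nn.Sigmoid() to test the other gates
)
```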

So far, I've observed that removing non-linear activations completely from my networks actually tends to improve both training time and the reconstruction capability of my nets. This seems counterintuitive to me, as I thought it is precisely the activation functions that give neural networks their expressive power. How can this be?

Some additional info:

- My data is rather noisy, and I suspect that there is only little structural information to be learned and compressed by my auto-encoders.

user3641187

2 Answers


In general, polynomial regression yields better results than neural networks, but it suffers from a combinatorial explosion of terms; to tackle this, we use neural networks with non-linear activations.
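To give a rough feel for that explosion (an illustrative Python count; the exact numbers depend on how you build the feature set), the number of monomials of degree at most p in d input variables is C(d + p, p), which grows very quickly:

```python
# Number of polynomial terms (monomials of degree <= p) for d input variables,
# illustrating the combinatorial explosion of plain polynomial regression.
from math import comb

for d in (10, 50, 100):
    for p in (2, 3, 5):
        print(f"d={d:>3}, degree<={p}: {comb(d + p, p):,} terms")
```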

Sometimes linear models are sufficient, so if a neural network with only linear activations does the job, then use linear/logistic regression and compare the results. I would expect these to be very close.

One remark: convolutional layers help with dimensionality reduction, and this is independent of the activation functions.
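For example (an illustrative PyTorch sketch; the shapes are chosen arbitrarily), a strided 1D convolution already halves the temporal dimension with no non-linearity involved:

```python
# A strided Conv1d reduces the temporal dimension without any activation.
import torch
import torch.nn as nn

x = torch.randn(1, 8, 128)   # (batch, channels, time)
down = nn.Conv1d(in_channels=8, out_channels=4, kernel_size=4, stride=2, padding=1)
print(down(x).shape)         # torch.Size([1, 4, 64]) -- time axis halved
```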

quester
  • While I agree that the performance of neural networks is exaggerated, saying that "in general" polynomial regression outperforms them is simply not true. For example, for real-life image or language data this would be unlikely to happen. – Tim Dec 28 '19 at 19:08
  • IMHO 2(+)D conv-nets and RNNs are just different kinds of models, but for tabular data, trees and linear models should be more than enough – quester Dec 29 '19 at 15:42
  • They all fall into the neural network category, and you didn't make this distinction in your answer. Moreover, embeddings + 1D conv layers would probably work better for language data than polynomial regression. – Tim Dec 29 '19 at 16:06

A deep dense (fully connected) neural network with only linear activation functions reduces to a single-layer network, as explained in the following answer on StackOverflow. Using non-linear activation functions is generally a good idea, because it lets the network use more complicated features and be more efficient (both in terms of performance and convergence time).
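To see why the collapse happens (a minimal NumPy sketch with arbitrary layer sizes): composing two linear layers is just a matrix product, so the stack is equivalent to one linear layer with weight W1 @ W2.

```python
# Two stacked linear layers without activations collapse to a single linear map.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))       # batch of 5 inputs with 16 features
W1 = rng.normal(size=(16, 32))     # "hidden layer" weights
W2 = rng.normal(size=(32, 8))      # "output layer" weights

two_layers = (x @ W1) @ W2         # deep network, no activations
one_layer = x @ (W1 @ W2)          # equivalent single-layer network
print(np.allclose(two_layers, one_layer))  # True
```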

More advanced network structures, like convolutional or recurrent networks, let us process data in a more flexible way, so using such layers even without activation functions can work better for some data than a simple dense network.

Recently there have been many papers shaking some of the commonly held beliefs about neural networks. It can be the case that, for some problem, an atypical network structure works well; such "crazy" ideas have worked in the past. Maybe your data is simple enough that you don't need to add further complications to the network structure? Still, if you obtain such results, you should check whether you need a deep network at all, as this could suggest that you do not need many layers, but maybe just more neurons and a single hidden layer.

In every case when you get results that seem strange, double-check your code for potential bugs, and maybe ask someone for a code review. You can check the What should I do when my neural network doesn't learn? thread for some hints on debugging neural network code. Try different regularization, learning rates, optimizers, etc.

Tim
  • Thanks for the extensive answer. I've generally been quite extensive so far in terms of architectures and depths (I rely on single-batch overfitting to measure the max expressiveness of the networks). – user3641187 Jan 02 '20 at 16:05