
I'm currently developing an RNN with Python and Keras/TensorFlow for time series prediction. My final objective is to port the trained RNN to an embedded system, which should collect the input time series from sensors, run them through the RNN, and output a prediction time series in real time, or at least approximately in real time.

So, my questions are: (1) is it possible to port a trained RNN (a Python-based Keras/TensorFlow model) to an embedded system? And (2) if so, which embedded platforms would you recommend for such a task?

If you could share references related to these questions, that would also help me a lot.

Thank you.

Note: the RNN is composed of 5 layers with 32, 16, 8, 4 and 2 LSTM units, respectively. The inputs of the network are 2 time series, with sizes ranging from 10000 to 200000 samples; on the PC I handle them using Python generators, training and running the network with batches of 128 samples.

I think that reducing the size of the network would not cause too many issues regarding the precision of the output series.

Jose Bueno

5 Answers


This would be my approach to it:

First, estimate the memory needed for the LSTM: How to calculate the number of weights in an LSTM

Using the standard Keras formula of 4 × units × (input_dim + units + 1) per layer, that calculation works out to 8680 parameters across all your layers. Now consider that each parameter will need to be stored as a floating-point value. If we assume 32-bit resolution of these parameters is OK, you will need 4 bytes per parameter, so your lower memory bound will be:

lstm_parameter_memory_size = parameters * 4

lstm_parameter_memory_size = 34720 bytes
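A quick sketch of that per-layer arithmetic in Python (assuming standard Keras LSTM layers with biases, and the 2 input series from the question):

    # Standard Keras LSTM parameter count per layer: 4 gates, each with
    # input weights, recurrent weights and a bias vector.
    def lstm_params(input_dim, units):
        return 4 * units * (input_dim + units + 1)

    layer_units = [32, 16, 8, 4, 2]
    input_dim = 2  # two input time series
    total = 0
    for units in layer_units:
        total += lstm_params(input_dim, units)
        input_dim = units  # next layer's input is this layer's output

    print(total)      # 8680 parameters
    print(total * 4)  # 34720 bytes at 32-bit floats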

Of course you will need more memory than this, but for the LSTM this is the dominating factor. The exact timing behaviour would require quite a bit of analysis, so I suggest either using a tool to profile the Keras/TensorFlow execution or making a rough estimate of the number of instructions per LSTM unit.

A more pragmatic and less theoretical approach: buy something like an ESP32, implement the LSTM in C and just try it. It is a powerful and inexpensive device: https://en.wikipedia.org/wiki/ESP32. This approach also lets you scale your hardware solution down or up, since most platforms will run your C code with minor adjustments.
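To give an idea of what such a port has to compute, here is a minimal NumPy sketch of a single LSTM timestep (the standard formulation, with gates in Keras' i, f, c, o order; the names W, U and b are mine, and note that Keras stores its kernels transposed relative to this):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h, c, W, U, b):
        # One timestep. Shapes: W is (4*units, input_dim),
        # U is (4*units, units), b is (4*units,).
        units = h.shape[0]
        z = W @ x + U @ h + b
        i = sigmoid(z[0*units:1*units])   # input gate
        f = sigmoid(z[1*units:2*units])   # forget gate
        g = np.tanh(z[2*units:3*units])   # candidate cell state
        o = sigmoid(z[3*units:4*units])   # output gate
        c = f * c + i * g                 # new cell state
        h = o * np.tanh(c)                # new hidden state
        return h, c

Translating this function into fixed-size C arrays, once per layer, is essentially the whole port.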

Nenovrak
  • 71
  • 6
  • I just edited the question to include the size of the RNN. Hope this helps in getting more detailed answers. – Jose Bueno May 18 '18 at 13:05
  • It helps, but please also include the input and output vector sizes. Also, do you have any size constraints on your system? Battery constraints? Price? There is a wide range of embedded systems, from puny rudimentary devices to almost full-fledged mini desktops... – Nenovrak May 18 '18 at 13:28
  • I have no constraints yet on the system characteristics (like size and battery) or price. One of the purposes of the question is to obtain guidance in selecting the system. – Jose Bueno May 18 '18 at 13:38
  • Do I understand correctly that your RNN reads 2 inputs at each timestep and outputs 1 prediction? (This is what I mean by input and output vectors) – Nenovrak May 18 '18 at 13:48
  • Yes, that is correct. – Jose Bueno May 18 '18 at 14:04
  • Do you know of any references on implementing LSTMs or translating the Keras models to C? Thank you for the answer! – Jose Bueno May 18 '18 at 15:16
  • This seems to be a small LSTM implementation in C with few dependencies (I haven't tried it or looked at the code, though): https://github.com/tmbdev/clstm . Provided that Keras lets you dump the weights into a file, you should be able to "move" your model to another lib by just reformatting that file to fit the style of whatever ML lib you choose; see the sketch below. – Nenovrak May 21 '18 at 07:38
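Following up on that last comment, dumping the weights from Keras is short; a minimal sketch (the flat float32 binary layout here is just an assumption, to be adapted to whatever the target library expects):

    import numpy as np

    # model is assumed to be the trained Keras model from the question
    with open("weights.bin", "wb") as f:
        for w in model.get_weights():
            np.asarray(w, dtype=np.float32).tofile(f)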

Empirical results often indicate (e.g. [1]) that simpler gated RNNs (e.g. MGU or GRU) perform comparably to LSTMs, and since memory consumption matters in microcontroller applications, I would suggest working with simpler gated RNN models instead of LSTMs. [2] is my old implementation of a GRU for microcontrollers (e.g. the Teensy 3.6).

[1] https://arxiv.org/pdf/1412.3555v1.pdf
[2] https://github.com/povidanius/gru_neurocontroller
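To put a number on the savings, a rough sketch comparing parameter counts with the classic formulas (3 gates for a GRU vs. 4 for an LSTM, biases included; Keras' reset_after option adds a few extra bias terms):

    def lstm_params(input_dim, units):
        return 4 * units * (input_dim + units + 1)

    def gru_params(input_dim, units):
        return 3 * units * (input_dim + units + 1)

    # Same 2-input, 32-unit first layer as in the question
    print(lstm_params(2, 32))  # 4480
    print(gru_params(2, 32))   # 3360, i.e. 25% fewer parameters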

povidanius

Just to extend the possibilities stated in the accepted answer:

1) Yes, it is possible to port RNNs to microcontrollers. An example of a library for doing so is CMSIS-NN. They even have a GRU example and more detailed procedures on their web page, such as how to convert a network.

Another possibility is TFMicro, but be aware that currently you will, I can assure you, run into problems quantizing RNNs. However, if you do not quantize the model, there shouldn't be any problem.

2) Which platform I would choose depends on how easy a path you need. For example, if you want an easy path, you can choose a platform supported by TFLite for Microcontrollers. If you want to work on quantization patterns and the like, I would, as stated by @Nenovrak, compute the model requirements in terms of storage size but also in terms of RAM, and then choose an appropriate microcontroller with an ARM CPU, just to use CMSIS on it.
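As a concrete starting point for the TFLite path, the unquantized conversion is short; a sketch (model is assumed to be the trained Keras network from the question):

    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    # No quantization: keep float32 weights to sidestep the RNN
    # quantization issues mentioned above. Depending on the TF version,
    # LSTM ops may need converter.target_spec.supported_ops relaxed
    # to include tf.lite.OpsSet.SELECT_TF_OPS.
    tflite_model = converter.convert()

    with open("model.tflite", "wb") as f:
        f.write(tflite_model)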

BCJuan

In addition to the tools mentioned by @BCJuan, there are also:

https://github.com/majianjia/nnom, a platform-independent inference engine for neural networks. It takes a Keras model as input.

X-CUBE-AI from STMicroelectronics. It is, however, proprietary and only supports their devices.

Jon Nordby

I haven't tried it, but it seems the CMSIS examples have been ported to the ESP32:

https://github.com/UT2UH/ML-KWS-for-ESP32

Forked from ARM's Keyword Spotting for Microcontrollers, using Frankie "whyengineer"'s fork of ARM's CMSIS for the ESP32:

This repository consists of the TensorFlow models and training scripts used in the paper Hello Edge: Keyword Spotting on Microcontrollers. The scripts are adapted from TensorFlow examples, and some are repeated here for the sake of making them self-contained.