
I have 12 input variables from a sensor (IMU) to predict 1 output variable (the speed of a boat). Is it possible to use regression (or something else?) in this case, where the data are a continuous stream from the sensor? If so, does anyone have suggestions for regression methods to try to get me started, and please state why the method makes sense to try. Suggestions of literature to read are greatly appreciated too.

Thanks for any answers!

hoddy
  • This seems to be about programming. Also, what is your objective? – user2974951 Feb 06 '19 at 12:34
  • Before you run any regressions, I suggest visually inspecting scatterplots of each input variable versus speed to see if there is any really obvious data transform, such as log or exp, that might help fit the data. This is usually fast to perform. – James Phillips Feb 06 '19 at 19:54
  • @JamesPhillips thanks for the suggestion. Sadly the sea trials to collect the IMU data will not be performed before late this month, so I'm kinda stuck regarding that. So basically I'm trying to find algorithms in the meantime that make sense to test, so I can practice the coding – hoddy Feb 06 '19 at 20:04

2 Answers


Yes, it is possible to use regression where we have a continuous data stream. In particular, we can get a nifty little solution by turning an otherwise static regression estimator into an online estimator: train a simple linear regression model by stochastic/mini-batch gradient descent. Instead of sampling our existing data as we would with standard SGD, we use the incoming data stream to update our estimated parameters. When new data are recorded, our estimated parameters are adjusted on the fly.
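As a rough sketch of that idea (not your exact setup), scikit-learn's SGDRegressor exposes a partial_fit method that performs one mini-batch gradient step at a time; the imu_stream generator and the 12-feature shapes below are just stand-ins for a real sensor feed:

```python
# Minimal sketch of online linear regression on a stream, assuming 12 IMU
# features arrive in mini-batches together with the measured boat speed.
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

model = SGDRegressor(learning_rate="constant", eta0=0.01)
scaler = StandardScaler()

def imu_stream():
    # Hypothetical generator standing in for the real sensor feed;
    # each batch is (X_batch of shape (32, 12), y_batch of boat speeds).
    while True:
        X_batch = np.random.randn(32, 12)
        y_batch = X_batch @ np.arange(12) + 0.1 * np.random.randn(32)
        yield X_batch, y_batch

for i, (X_batch, y_batch) in enumerate(imu_stream()):
    X_batch = scaler.partial_fit(X_batch).transform(X_batch)  # update running mean/std
    model.partial_fit(X_batch, y_batch)                       # one SGD step per mini-batch
    if i >= 100:                                              # stop the toy loop
        break

print(model.coef_)  # current parameter estimates, updated on the fly
```

The point is simply that each incoming batch nudges the coefficients, so the estimate keeps adapting as new readings arrive.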

CV.SE has some great answers on: How could stochastic gradient descent save time comparing to standard gradient descent? and Batch gradient descent versus stochastic gradient descent that can really help build up one's intuition on the matter. If you are interested in something more formal, Zhang (2004) Solving large scale linear prediction problems using stochastic gradient descent algorithms and Shalev-Shwartz et al. (2007) Pegasos: Primal Estimated sub-Gradient Solver for SVM are two standard references on the matter.

usεr11852
  • I am glad I could help! If you find this answer helpful you could consider upvoting it or if it answers your question, accept it as an answer. If you need further clarifications you are welcome to ask. – usεr11852 Feb 12 '19 at 10:47

There is a plethora of modelling approaches for predicting a quantitative output.

There is KNN (k-nearest neighbours), which can be used for regression even though it is better known for classification. You just have to use the right object from the library you are using; for instance, scikit-learn has both KNeighborsClassifier and KNeighborsRegressor. The main problem with this algorithm is that it is slow, and that becomes a real issue with high-dimensional datasets (more than 100 features), which is not your case.
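A minimal sketch with placeholder data (the random X and y below just stand in for the 12 IMU inputs and the measured speed); KNeighborsRegressor predicts by averaging the target of the nearest training points:

```python
# KNN regression sketch; X and y are placeholders for the real IMU data.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

X = np.random.randn(500, 12)               # stand-in IMU features
y = 2.0 * X[:, 0] + np.random.randn(500)   # stand-in speed signal

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsRegressor(n_neighbors=5)   # averages the speed of the 5 nearest points
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))           # R^2 on held-out data
```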

There are linear regressions. Many kinds are available, such as OLS, Ridge (OLS with an added penalty), Lasso and so on. These are generally among the fastest to train, and they are certainly the easiest to explain too.
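A small sketch comparing the three on the same placeholder data; the alpha values are arbitrary and would need tuning, e.g. by cross-validation:

```python
# OLS vs Ridge vs Lasso on stand-in data; coefficients are directly interpretable.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

X = np.random.randn(500, 12)
y = X @ np.arange(12) + np.random.randn(500)

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X, y)
    print(type(model).__name__, model.coef_.round(2))
```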

Decision trees (in practice a random forest, which aggregates many decision trees into one model and is used more often than a single tree). They are very fast to train and are good for explaining a phenomenon, less easily than linear regression but still possible. However, as far as I know, a single tree generalises poorly, so it is not good for prediction in a regression case (a random forest does not have this problem, because its trees correct each other). A big pro is that you do not need to preprocess the data beforehand, standardisation and so on; the con is that it is slower than linear models.
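A sketch with RandomForestRegressor on placeholder data; note that the features are fed in raw, with no standardisation:

```python
# Random forest regression sketch; X and y are placeholders for the IMU data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.random.randn(500, 12)                                    # stand-in IMU features
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * np.random.randn(500)

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X, y)
print(rf.feature_importances_.round(2))   # rough view of which inputs matter most
```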

For literature suggestions, I do not know your level, so it is hard to recommend anything specific. The available literature from researchers who publish their work usually requires strong maths, and not every ML user has that (or actually needs that very high level of maths). Besides, you have a time constraint because you mentioned streaming data, and there the choice of language and library can make a huge difference in execution time for the same method; academic papers will not guide you on that point. And even with streaming data, we do not know whether you want an explanation of the phenomenon or just a prediction of it, no matter why. That can totally change the model you will use.

AvyWam