16

Alternatively, an algorithm to predict foreign exchange markets would also work. I know this can get pretty complicated, so as an introduction, I'm looking for a simple prediction algorithm that has some accuracy.

(It's for an M.Sc. university project that lasts four months.)

I've read that a multi-layer neural network might be useful. Any thoughts on that? In addition, semantic analysis of social media may provide insight into the investor behavior that drives the stock market. However, semantic analysis is a bit outside the scope of the project at the moment.

siamii
    If one believes the efficient-market hypothesis, it is impossible to consistently achieve higher-than-average market returns (without insider knowledge), because all publicly available information is already built into current rates/prices. A lot of people disagree with this, but almost everyone agrees that it is true for a casual investor. In other words, a 3-line model based on rand() is probably almost as good as the typical investor :P – rm999 Jan 20 '12 at 16:12
    It seems unlikely that anyone would be willing to share an algorithm that has *any* out-of-sample accuracy. Except, perhaps, some published academic work where the anomaly is small and doesn't cover transaction costs. – NPE Jan 20 '12 at 16:47
    For academic work, it might be more worthwhile to model the prices rather than try to predict them. Prediction will probably prove unsuccessful, but modeling might at least provide some insight into how things actually work, and could theoretically be extended to prediction. – highBandWidth Jan 20 '12 at 17:31
  • @highBandWidth, I never understood this about statistics... if you assume prediction will fail, why use it in the first place? Why model at all? What good does it do, except for academic papers? Please read: http://stats.stackexchange.com/questions/18896/practical-thoughts-on-explanatory-vs-predictive-modeling – Dov Jan 20 '12 at 20:44
    @Dov: In some sense prediction is the easier task since you may end up with a model with nonsensical (in the real world) coefficients that still ends up predicting rather well. Black boxes like Neural Networks and SVMs aren't even designed to provide interpretable coefficients. On the other hand, a model can take into account "the future" (i.e. data after the point in time you're analyzing), while actual predictions of course cannot. Perhaps that's what is meant here. – Wayne Jan 20 '12 at 22:02
    Mine, but for obvious reasons I'm keeping it all to myself! – babelproofreader Jan 20 '12 at 15:47
  • +1 I don't usually upvote casual, short answers like this, but it's spot on. – whuber Jan 20 '12 at 17:28
  • @wayne, thanks for this interesting comment!! Coming from the machine learning world, maybe prediction does indeed end up with a nonsensical model, but if it successfully predicts stock market trends, what do I care? ... I'm not sure prediction is that much easier. I see that highBandWidth suggests modeling because he realizes prediction is a hard task... – Dov Jan 21 '12 at 07:30
  • @Dov: I'm just trying to make sense of it myself. On the one hand, it sounds like highBandWidth was really saying, "Overfit your data with detailed, plausible factors and 'explain' what's going on". On the other, perhaps he's talking about generative versus discriminative models, or perhaps he's talking about the bias-variance tradeoff: academics often want unbiasedness, so that their coefficients are more accurate, even if the result is variance that makes prediction (in the stock market sense) useless. If the advice makes sense, I think it's one of these two concepts. – Wayne Jan 21 '12 at 15:06
    @wayne I don't think it's about overfitting, it's about allowing predictors that cannot be used for predictions, for example variables that occur during/after stock movements - if you find that apple and microsoft stock tend to correlate, this fact cannot be used to predict msft stock but can be very informative. – rm999 Jan 22 '12 at 07:43
  • @rm999, informative for what ? – Dov Jan 22 '12 at 07:48
  • @dov - that they correlate :P It was a made-up example, but it could indicate that there is a shared causation of their stock movements. – rm999 Jan 22 '12 at 08:10

7 Answers

18

As babelproofreader mentioned, those that have a successful algorithm tend to be very secretive about it. Thus it's unlikely that any widely available algorithm is going to be very useful out of the box unless you are doing something clever with it (at which point it sort of stops being widely available since you are adding to it).

That said, learning about autoregressive integrated moving average (ARIMA) models might be a useful start for forecasting time-series data. Don't expect better-than-random results, though.
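To make the autoregressive part of ARIMA concrete, here is a minimal sketch in Python on simulated (not real market) data: it fits an AR(1) model, x_t = phi * x_{t-1} + noise, by least squares and makes a one-step-ahead forecast. A full ARIMA implementation (e.g. R's arima() or a dedicated statistics library) also handles differencing and moving-average terms; this toy only illustrates the AR idea.

```python
import random

def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + noise (mean-centered)."""
    m = sum(series) / len(series)
    x = [v - m for v in series]
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den, m

def forecast_ar1(series, phi, mean):
    """One-step-ahead forecast: pull the last value toward the mean by factor phi."""
    return mean + phi * (series[-1] - mean)

# Simulate an AR(1) series with known phi = 0.6 to sanity-check the estimator.
random.seed(0)
x = [0.0]
for _ in range(500):
    x.append(0.6 * x[-1] + random.gauss(0, 1))

phi_hat, mean_hat = fit_ar1(x)
next_val = forecast_ar1(x, phi_hat, mean_hat)
```

On simulated data the estimate recovers phi reasonably well; on real prices, as the answer says, don't expect better-than-random forecasts.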

Michael McGowan
    +1: I've lost count of how many times I've been reading, or sitting in a class, and heard people who believe that if they had a complicated-enough algorithm, they could get rich in the stock/electricity/commodities markets. You try to explain overfitting, etc., but to no avail. Heck, as far as I know, not being an insider, successful stock-trading software has depended on no transaction fees, arbitrage, and high speed. The cutting edge now is to use loopholes in automated trading rules and high-speed proposal/withdrawal of bids to sucker-punch other automated traders. – Wayne Jan 21 '12 at 15:12
    The other issue is when multiple people end up with the same algorithm because they trained on exactly the same data, and then put high-volume sales/purchases through it. Would any algorithm be expected to have long-run accuracy? – Michelle Jan 24 '12 at 19:33
  • @Wayne there are strategies for reducing over-fitting, though they're difficult to implement on time series data. – Zach Jan 27 '12 at 16:53
    @Zach: yes, there are ways to penalize overfitting, but it's the attitude I'm reflecting on: the folks who have done some basic (probably erroneous) curve fitting in Excel and feel that they could've made money with their secret sauce, but what they really need is one of those cutting-edge, sophisticated algorithms that the professor just won't share with the class. That algorithm would fit the data like a glove, and then predict so much better than all those other speculators using Excel spreadsheets... but the professor keeps droning on about overfitting and the limitations of data. Sigh. – Wayne Jan 27 '12 at 19:32
12

I think for your purposes, you should pick a machine learning algorithm you find interesting and try it.

Regarding efficient-market theory: the markets are not efficient, on any time scale. Also, some people (both academics and real-life quants) are motivated by the intellectual challenge, not just by getting rich quick, and they do publish interesting results (and I count a failed result as an interesting one). But treat everything you read with a pinch of salt; if the results are really good, perhaps the scientific method behind them isn't.

Data Mining with R might be a useful book for you; it is pricey, so try to find it in your university library. Chapter 2 covers just what you want to do, and he gets his best results with a neural net. But be warned that even those results are poor, and he spends a lot of CPU time to get them. The Amazon reviews point out that the book costs $20 more because that chapter mentions the word finance; when reading it, I got the impression the publisher had pushed him to write it. He's done his homework (read the docs, perused the right mailing lists), but his heart was not in it. I got some useful R knowledge from it, but I won't be beating the market with it :-)

Darren Cook
  • @Darren - I like your style. – rolando2 Jan 28 '12 at 00:05
    A draft version (May 2003) of *Data Mining with R* can be found [here](http://datamining.dongguk.ac.kr/lectures/2009-2/dm/DataMiningWithR.pdf). (I don't have the book, so I can't say what's the gap between the two versions.) – chl Jan 29 '12 at 11:40
  • @chl Thanks! I took a quick look, and only two of the four chapters are there. But the bigger difference is that the _Predicting Stock Market Returns_ chapter is _very_ different. No mention of xts or quantmod; instead it uses the ts package, with acf and the MARS package for predictions. It is almost like a bonus chapter, and I'm going to make time to read it properly. He is still using neural nets, but not comparing them to SVMs as in the published book. – Darren Cook Jan 30 '12 at 03:08
11

To my mind, any run-of-the-mill strong AI that could do all of the following might easily produce a statistically significant prediction:

  • Gather and understand rumours

  • Access and interpret all government knowledge

  • Do so in every relevant country

  • Make relevant predictions about:

    • Weather conditions

    • Terrorist activity

    • Thoughts and feelings of individuals

    • Everything else that affects trade

Statistical analysis is the least of your worries, really.

Jon Purdy
4

You could try the auto.arima and ets functions in R. You might also have some success with the rugarch package, but there are no existing functions for automated parameter selection there. Maybe you could get the parameters for the mean model from auto.arima, then pass them to rugarch and add a garch(1,1) component?

There are all sorts of blogs out there that claim some success doing this. Here's a system using an arima model (and later a garch model) and a system using an SVM model. You'll find a lot of good info on FOSS Trading, particularly if you start reading the blogs on his blogroll.

Whatever model you use, be sure to cross-validate and benchmark! I'd be very surprised if you found an arima, ets, or even garch model that could consistently beat a naive model out-of-sample. Examples of time series cross-validation can be found here and here. Keep in mind that what you REALLY want to forecast is returns, not prices.
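To illustrate that advice outside of R, here is a hedged Python sketch on made-up prices (not a real backtest): compute log returns from prices, then evaluate forecasters with rolling-origin ("walk-forward") cross-validation, comparing a naive zero-return forecast against a rolling-mean forecast.

```python
import math

def log_returns(prices):
    """Convert a price series to log returns, the quantity you actually forecast."""
    return [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]

def walk_forward_mse(returns, window, forecaster):
    """Rolling-origin evaluation: fit on [t - window, t), predict the return at t."""
    errs = []
    for t in range(window, len(returns)):
        pred = forecaster(returns[t - window:t])
        errs.append((returns[t] - pred) ** 2)
    return sum(errs) / len(errs)

naive = lambda hist: 0.0                         # naive benchmark: expect zero return
mean_model = lambda hist: sum(hist) / len(hist)  # rolling mean of past returns

# Synthetic prices: a slow drift plus a periodic bump (illustration only).
prices = [100 * (1.001 ** t) + (3 if t % 7 == 0 else 0) for t in range(200)]
r = log_returns(prices)
mse_naive = walk_forward_mse(r, 30, naive)
mse_mean = walk_forward_mse(r, 30, mean_model)
```

The point is the evaluation protocol, not these toy forecasters: every prediction uses only data strictly before the target point, which is what keeps the benchmark honest.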

Zach
3

I know of one machine learning approach which is currently in use by at least one hedge fund. numer.ai is using an ensemble of user-provided machine learning algorithms to direct the actions of the fund.

In other words: A hedge fund provides open access to an encrypted version of data on a couple of hundred investment vehicles, most likely stocks. Thousands of data scientists and the like train all sorts of machine learning algorithms against that data and upload the results to a scoreboard. The highest scorers get a small amount of money depending on the accuracy of their results and how long their result has been available online.

The best predictions are supposedly made by ensembles of algorithms.

So you have a lot of scientists providing trained guesses, some of which are themselves ensembles of guesses, and the hedge fund uses the ensemble of all provided guesses to direct its investments.

This rather interesting hedge fund's results taught me two things:

  1. Ensembles are often viewed as a good way of making predictions on the stock market.
  2. Good predictions require more ensembles than I'm willing to build myself...
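The variance-reduction argument behind point 1 fits in a few lines of Python (the forecasters below are made-up numbers, not real models): pointwise, the squared error of the averaged forecast never exceeds the average of the individual squared errors, so the ensemble's MSE is at most the mean of the members' MSEs.

```python
def ensemble(preds_list):
    """Average several models' forecasts elementwise."""
    return [sum(p) / len(p) for p in zip(*preds_list)]

def mse(pred, truth):
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)

truth = [1.0, 2.0, 3.0, 4.0]
model_a = [0.8, 2.3, 2.7, 4.4]   # three hypothetical forecasters with
model_b = [1.3, 1.6, 3.2, 3.8]   # different, partly offsetting errors
model_c = [0.9, 2.1, 3.4, 4.1]

avg = ensemble([model_a, model_b, model_c])
errs = [mse(m, truth) for m in (model_a, model_b, model_c)]
ens_err = mse(avg, truth)
```

The guarantee (via Jensen's inequality) is only relative to the average member, of course; an ensemble of uniformly bad forecasters is still bad.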

If you want to have a go, visit https://numer.ai/. No, I'm NOT affiliated with them; I'd most likely not spend my days online were I connected to a hedge fund that employs thousands of people but pays only those who provide measurable results :)

The numer.ai community has a forum where they discuss their approach so you CAN learn from others who are trying to do the same.

Personally I think anyone with a good algorithm is going to keep it very, very secret.

Beyer
1

You should try GMDH-type neural networks. I know that some successful commercial packages for stock market prediction use them, but mention it only in the depths of their documentation. In a nutshell, GMDH is a multilayered iterative neural network, so you are on the right track.
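For a flavor of the GMDH selection step, here is a heavily simplified Python sketch. Real GMDH units fit a full quadratic polynomial in each input pair and stack selected units into further layers; this reduced unit just regresses y on the product of an input pair, and the data is made up. The core idea shown is that candidate units are fit on a training split and kept or discarded by their error on a separate validation split.

```python
from itertools import combinations

def fit_simple(z, y):
    """Closed-form simple linear regression y ≈ a + b*z."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    b = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / \
        sum((zi - mz) ** 2 for zi in z)
    return my - b * mz, b

def gmdh_select(X_train, y_train, X_val, y_val):
    """Try a unit for every pair of inputs; keep the pair with lowest validation MSE."""
    best = None
    for i, j in combinations(range(len(X_train[0])), 2):
        z_tr = [row[i] * row[j] for row in X_train]       # reduced GMDH unit input
        a, b = fit_simple(z_tr, y_train)
        z_va = [row[i] * row[j] for row in X_val]
        err = sum((a + b * zv - yv) ** 2
                  for zv, yv in zip(z_va, y_val)) / len(y_val)
        if best is None or err < best[0]:
            best = (err, (i, j), (a, b))
    return best

# Toy data where y depends only on features 0 and 1, so selection should find them.
X = [[1, 2, 5], [2, 3, 1], [3, 1, 4], [4, 5, 2], [5, 4, 3], [2, 2, 2]]
y = [row[0] * row[1] for row in X]
err, pair, coef = gmdh_select(X[:4], y[:4], X[4:], y[4:])
```

A full GMDH network would repeat this: feed the surviving units' outputs into the next layer and stop when validation error stops improving.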

0

I think hidden Markov models are popular for stock market prediction. The most important thing to keep in mind is that you want an algorithm that preserves the temporal aspect of your data.
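A full hidden Markov model is beyond a short sketch, but the temporal point can be illustrated with its simpler cousin, an observable two-state Markov chain over up/down moves (toy prices, Python): unlike an i.i.d. model, the estimated transition matrix conditions each move on the previous one.

```python
def updown_states(prices):
    """Map each price change to a state: 1 = up, 0 = down/flat."""
    return [1 if prices[t] > prices[t - 1] else 0 for t in range(1, len(prices))]

def transition_matrix(states):
    """Estimate P(next state | current state) from consecutive-pair counts."""
    counts = [[0, 0], [0, 0]]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    return [[c / max(sum(row), 1) for c in row] for row in counts]

prices = [10, 11, 12, 11, 12, 13, 12, 11, 12, 13, 14, 13]
P = transition_matrix(updown_states(prices))
# P[1][1] is the estimated probability of an up move following an up move.
```

An HMM adds a layer on top of this: the states (e.g. market regimes) are not observed directly and must be inferred from the price moves, typically with the Baum-Welch and Viterbi algorithms.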

Roronoa Zoro