0

In machine learning, each input instance comes with an output that is observed at the same time. In the stock market, however, you have to predict the next price from previous inputs. So if you want to predict the next price (the output) with machine learning, how do you do it without the new input instances (for example: high price, low price, open price, close price, volume, etc.)?
I want to use a simple example to make clear what I am trying to understand.

For example:

I use high price, low price, open price and volume as inputs and close price as output. I train the algorithm on input-output samples. Then I want to predict an output (the close price). The problem is that the inputs and the output appear together, so I can't predict the next price this way: by the time the inputs are available, the close price has already appeared with them. So how is machine learning applied in this setting?

Richard Hardy
  • 4
    Learn this one weird kernel trick. Stockbrokers hate him! – Kodiologist Aug 20 '16 at 05:31
  • @Kodiologist, I've read a few times about this "weird kernel trick" joke. Is it a meme among statisticians? Ps I know what's a "kernel trick". I'm specifically referring to the joke. – DeltaIV Aug 20 '16 at 06:21
  • The title makes this sound to be overly broad, about finance, and a duplicate of http://stats.stackexchange.com/questions/21395, but the actual issue seems on-topic. Can you edit this to clarify that the question is about how to predict future outputs when future inputs have not appeared (though that may be duplicate too, but finding a duplicate would give you useful answers). Also, I don't understand in the example how the next price "has already appeared with inputs". – Juho Kokkala Aug 20 '16 at 08:54
  • 1
    @JuhoKokkala, good points. I think the question is pretty simple, and I answered it as such. Let us see if that satisfies the OP. – Richard Hardy Aug 20 '16 at 08:58
  • 2
    @DeltaIV See https://en.wikipedia.org/wiki/One_weird_trick_advertisements, http://knowyourmeme.com/memes/trainers-hate-him. – Kodiologist Aug 20 '16 at 14:15
  • Do you really believe one would give you a decent answer to this problem for free had one known a profitable solution? Read about Game Theory and you will know this question is simply useless from a practical point of view. Bear in mind I'm not stating **there isn't** a solution (though I'm sure someone will step in and say the "stock market" is a chaotic system and therefore, not predictable by ANNs, or by anything for that matter). I'm just saying that anything that anyone posts here most likely would NOT work and would NOT make you money after all transaction costs and fees are taken into a – Leonardo Cordeiro Dec 31 '16 at 17:53
  • I fail to see how this, in any reasonable sense, can be considered an answer to the question. Your line of reasoning would apply to any prediction question on the site, and hence no answers should exist, yet they somehow do. – Repmat Dec 31 '16 at 19:42
  • Well, it is more of a meta-answer. I've tried to answer the question by stating that if there is an answer it would not be credible nor reliable and, consequently, useless. If you're not familiar with basic Game Theory you will naturally fail to see this. And no, this line of reasoning would not apply to any prediction question on this site. Finding a prognostic gene expression signature of breast cancer or a better way to match drugs to specific mutations in cell membrane receptors is entirely different from finding a trading rule that would generate abnormal risk-adjusted returns. – Leonardo Cordeiro Dec 31 '16 at 20:07
  • I guess it is pretty easy to argue people will genuinely try to help on the former cases. But I may be wrong and people may give money away for free on this site. It is Holiday Season, after all! – Leonardo Cordeiro Dec 31 '16 at 20:10
  • You build your analysis on the unjustified assertion that no one will give away knowledge for free; the entire concept of Stack Exchange seems to undermine that assumption. Conceptually there is no difference in prediction of X or Y, it is telling you something about the future that others would not know... This will always have a value, and by your answer should not be found anywhere for free. – Repmat Dec 31 '16 at 20:43
  • Again, you fail to see it depends on the problem domain. No one rational enough would give away knowledge that can be readily acted upon for free. Curing cancer is one thing; helping someone making money with little effort and without generating value to society is something completely different, is it not? In any case, unless you come up with an "economics" argument, this is my stop-loss post. – Leonardo Cordeiro Dec 31 '16 at 21:01
  • I was going through my old answers and noticed this one was not accepted. Do you perhaps need further clarification? – Richard Hardy Feb 20 '17 at 18:56
  • waste of time, forget about it – Aksakal Mar 18 '18 at 19:35

2 Answers

3

If you want to predict future values knowing only the current and past values, that is also how you specify the model. If the variable of interest is $y$ and the variables that can be used for prediction are $x_1,\dotsc,x_K$, you formulate the model as

$$ y_{t+1} = f(x_{1,t},\dotsc,x_{K,t},\dotsc,x_{1,t-p},\dotsc,x_{K,t-p}) + \varepsilon_{t+1} $$

for some maximum lag $p$, where $f$ is the function the machine learning algorithm is trying to learn and $\varepsilon$ is something unpredictable. This way you can predict future $y_{t+1}$ using data that is available today (at time $t$).

When training your model, you will have a sample spanning $1,\dotsc,T$. Then your $t$ runs from $p+1$ to $T-1$ in the training sample. You can verify that by noticing that for $t<p+1$ or $t>T-1$ you would have to use data that you do not have within $1,\dotsc,T$.

Once you have formulated the model, training and prediction go as usual.
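
As a rough illustration of the indexing above, here is a minimal sketch in plain Python that pairs each $y_{t+1}$ with the inputs from times $t,\dotsc,t-p$. The function name `make_lagged_dataset` and the toy numbers are made up for this example; the learner $f$ is left out, and any regression algorithm could be trained on the resulting rows.

```python
def make_lagged_dataset(inputs, target, p):
    """Build (features, label) pairs for y_{t+1} = f(x_t, ..., x_{t-p}).

    inputs[t] is the list of K predictor values observed at time t and
    target[t] is the value of y at time t (0-based, so t = 0, ..., T-1).
    """
    T = len(inputs)
    if len(target) != T:
        raise ValueError("inputs and target must cover the same time steps")
    features, labels = [], []
    # 0-based t runs from p to T-2, i.e. p+1 to T-1 in the 1-based notation above.
    for t in range(p, T - 1):
        row = []
        for lag in range(p + 1):
            row.extend(inputs[t - lag])  # x_{1,t-lag}, ..., x_{K,t-lag}
        features.append(row)
        labels.append(target[t + 1])     # the next-period value to predict
    return features, labels


# Made-up toy data (an assumption): [high, low, open, volume] per day, plus the close.
toy_inputs = [
    [10.5, 9.8, 10.0, 1200],
    [10.9, 10.1, 10.2, 1500],
    [11.2, 10.6, 10.8, 1100],
    [11.0, 10.3, 10.9, 1700],
    [10.8, 10.2, 10.4, 1300],
]
toy_close = [10.2, 10.8, 10.9, 10.4, 10.6]

X, y = make_lagged_dataset(toy_inputs, toy_close, p=1)
print(len(X), "training rows, each with", len(X[0]), "features")
```

With $T=5$ and $p=1$ only three rows survive, because the first $p$ time points lack a full lag history and the last one has no $y_{t+1}$ to learn from. Nothing prevents adding the lagged close price itself to the inputs, since it is also known at time $t$.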

Richard Hardy
-1

Recurrent Neural Networks are quite well suited for this job.

Let's say we have an LSTM recurrent neural network with 1 layer and 128 hidden units. Rather than predicting the price at t+1 from the OHLC data up to t, and then the price at t+2 from that same data plus the predicted value for t+1, you can add a dense layer that maps the 128 hidden units to, say, 48 outputs, where 48 is the number of hours we want to predict. The network then produces all 48 future prices in one step, using only data available at time t.

I am using this approach with a pretty good success rate in terms of accuracy.
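
The network itself is not reproduced here, but the data layout that such a multi-output model would be trained on can be sketched in plain Python. This is only an illustration under assumptions: the helper name `make_multi_horizon_targets`, the window length and the toy series are invented, and the example uses a horizon of 2 instead of 48 to keep the output readable.

```python
def make_multi_horizon_targets(series, window, horizon):
    """Pair each input window with the next `horizon` values as one target vector.

    Every target vector lies strictly after its input window, so the model
    never needs its own earlier predictions as inputs.
    """
    inputs, targets = [], []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        end = start + window
        inputs.append(series[start:end])           # values up to time t
        targets.append(series[end:end + horizon])  # values at t+1, ..., t+horizon
    return inputs, targets


# Made-up hourly prices (an assumption, for illustration only).
toy_prices = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0]

windows, futures = make_multi_horizon_targets(toy_prices, window=3, horizon=2)
print(windows[0], "->", futures[0])
```

Setting `horizon=48` would give target vectors matching the 48 dense outputs described above; the recurrent layer and the dense layer would then be fitted to these pairs with whatever deep learning library is in use.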