As the title suggests, how do you know if there exists a machine learning solution to a problem set? Earlier today I was working on building a neural network to predict whether stock prices will go up or down using data from the past 150 candlesticks (if the price moves up by 20 points first, the label is 1; if it moves down by 20 points first, the label is 0).

The dataset was fairly intuitive and easy to normalize and clean. The entire problem just seems pretty straightforward: we feed in chart data, and the model finds patterns. It sounded simple in my head. However, after trying a CNN, an ANN and an RNN, the validation accuracy seems to be stuck at 50% no matter what tweaks I make to the network architecture. It's almost like the model is unable to find any pattern at all.

The dataset has a 50/50 split between my 2 labels (bullish vs bearish). The dataset is moderately large as well, sitting at 4784 instances, with each instance having the OHLC (+ other indicator) values of the past 500 candles.

Theoretically I don't see any problem with the way I'm approaching this problem, and the fact that accuracy only budges up and down ever so slightly tells me that the moment a conclusion is formed by the model, the next set of data disproves it, bringing the accuracy back to 50%. It just feels like a never-ending back-and-forth.

TLDR: So, how do you know if there exists a machine learning solution for a problem set? Conversely, how do you know if there isn't?

kneecaps
  • You're far from the first person to try and apply machine learning to stock trading (https://stats.stackexchange.com/search?q=stock+%5Bmachine-learning%5D+answers%3A1+score%3A1) or prices generally (https://stats.stackexchange.com/search?q=stock+price+answers%3A1+score%3A1). There's a **lot** of literature about this that you can find using Google Scholar, but making accurate predictions will be hard and making money from it will be even harder. – Sycorax Jul 27 '21 at 14:09
  • I don't think this is a duplicate of the "hopeless" question, and have voted to reopen. Yes, I do link to the "hopeless" question in the answer I had already typed up :(, but there are wider aspects that can be meaningfully addressed here. – Stephan Kolassa Jul 27 '21 at 14:14

1 Answer

There are various possible answers to your question. One way to know is: if someone is willing to pay for your ML solution, then it is a solution to the problem at hand, at least to some extent.

More generally, any approach that creates value is a solution. "Value" here needs to be defined specifically for your problem. If your problem is "forecasting supermarket sales to improve the stock position", then there is an ML solution (I apologize for the self-promotion). If your problem is "predict a fair coin toss more accurately than 50%", then there is likely no ML solution.

It always makes sense to compare your ML solution to a realistic benchmark. If your ML solution always predicts "heads" to the coin toss, then it will beat my drunken uncle who randomly predicts "heads" or "tails" - but that doesn't tell you that "there is an ML solution", only that you did not use a reasonable benchmark. (And that accuracy is not a good evaluation measure.)
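As a rough illustration of comparing against a benchmark, here is a minimal Python sketch. The function names and the hold-out size of 957 are my own assumptions, not something from the question, and accuracy is used only because that is what the question reports. It computes the accuracy of a naive "always predict the majority class" benchmark and a normal-approximation z-score for how far a model's accuracy sits from that benchmark.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return accuracy(y_test, [majority] * len(y_test))


def z_score_vs_baseline(acc, baseline_acc, n):
    """Normal-approximation z-score of acc against baseline_acc on n test cases."""
    standard_error = math.sqrt(baseline_acc * (1 - baseline_acc) / n)
    return (acc - baseline_acc) / standard_error


# Hypothetical numbers: 52% accuracy on 957 held-out cases (assumed 20% of 4784)
# against a 50% benchmark.
z = z_score_vs_baseline(0.52, 0.50, 957)
print(f"z = {z:.2f}")
```

With these assumed numbers the z-score comes out a little above 1, which is well within what chance alone produces. Small wobbles around 50% on a balanced dataset of this size are therefore not evidence of a pattern.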

Of course, it's very hard to prove the absence of something. Perhaps you are just unsuccessful because you didn't feed in the right predictors, because you don't know your problem well enough. (Yet more shameless self-promotion here.)

In your particular case, I recommend reading up on the Efficient Markets Hypothesis. In a nutshell: if it's easy to predict that a stock will go up tomorrow, then people will figure this out and buy the stock today to take advantage of the price rise tomorrow. But of course, when they do this, they bid up the price today, to the extent that late arrivals are faced with a stock price that is identical to the one they predict for tomorrow - and the predicted gains have already evaporated.
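To see what this would look like in your setup, here is a small simulation sketch. It assumes an idealized price that moves up or down by one point with equal probability at every step. That is a deliberate oversimplification of the hypothesis, not a claim about real markets, and the function names and the momentum rule are hypothetical choices of mine. Labels are built the way you describe (up 20 points first gives 1, down 20 points first gives 0), and the "model" simply guesses 1 if the previous 150 steps were net positive.

```python
import random


def first_passage_label(steps, start, barrier=20):
    """Label 1 if the walk gains `barrier` points after `start` before losing
    that many, 0 if it loses them first, None if neither happens in time."""
    level = 0
    for step in steps[start:]:
        level += step
        if level >= barrier:
            return 1
        if level <= -barrier:
            return 0
    return None


def momentum_accuracy(n_samples=1000, lookback=150, horizon=2000, seed=0):
    """Accuracy of a 'follow the recent trend' rule on simulated random walks."""
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(n_samples):
        steps = [rng.choice((-1, 1)) for _ in range(lookback + horizon)]
        label = first_passage_label(steps, lookback)
        if label is None:  # neither barrier reached; skip this sample
            continue
        guess = 1 if sum(steps[:lookback]) > 0 else 0
        hits += guess == label
        total += 1
    return hits / total


print(momentum_accuracy())
```

Under this assumption the printed accuracy should land close to 0.5, because the past steps carry no information about the future ones. A validation accuracy stuck at 50% is what you would see if your data behaved like this, although it does not prove that it does.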

No, that does not mean that it's impossible to predict stock price movements. Only that it's very hard, and that you would need to beat thousands of people with lots of experience, impressive credentials, even more impressive computing power, and dedicated lines to the stock exchange computing centers, in order to shave a few milliseconds off their reaction times.

Stephan Kolassa