I am trying to build a stock trend prediction model (the target variable is 1 if the price goes up the next day and 0 if it goes down). The dataset consists of 2724 samples and 60 features. The data was split 80% training / 20% test, with the target variable balanced in both the training and test sets.
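For reference, the split was done along these lines (the data below is a synthetic placeholder, since the real features come from my own pipeline, and `random_state` is illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder for my real data: 60 engineered features,
# y = 1 for an up day, y = 0 for a down day
rng = np.random.default_rng(0)
X = rng.normal(size=(2724, 60))
y = rng.integers(0, 2, size=2724)

# 80% training / 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```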
I tried predicting with some SVMs and Random Forests, and the best-performing model scored as follows:
- Precision = 0.531
- Recall = 0.868
- Accuracy = 0.537
- Negative Precision (NPV) = 0.570
- Specificity = 0.186
This performance was achieved with an RBF kernel SVM with C = 100 and gamma = 1; its confusion matrix is:

[[ 49 215]
 [ 37 243]]
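For clarity, the metrics above come from the confusion matrix in the standard sklearn layout, computed roughly like this:

```python
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

clf = SVC(kernel="rbf", C=100, gamma=1).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# sklearn convention for binary labels {0, 1}: [[tn, fp], [fn, tp]]
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

precision = tp / (tp + fp)            # 243 / (215 + 243) ≈ 0.531
recall = tp / (tp + fn)               # 243 / (37 + 243)  ≈ 0.868
accuracy = (tp + tn) / (tn + fp + fn + tp)
negative_precision = tn / (tn + fn)   # NPV: 49 / (49 + 37) ≈ 0.570
specificity = tn / (tn + fp)          # 49 / (49 + 215)    ≈ 0.186
```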
Edit: I would also like to add the performance of sigmoid kernel SVMs.
Sigmoid kernel, C = 1
[[120 144]
[134 147]]
Sigmoid kernel, C = 10
[[123 141]
[134 147]]
Sigmoid kernel, C = 100
[[123 141]
[135 146]]
I include these confusion matrices to show that the sigmoid kernel is at least trying to classify samples as belonging to class 0.
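Roughly, those runs look like this (only C is varied; gamma stays at the sklearn default):

```python
for C in (1, 10, 100):
    clf = SVC(kernel="sigmoid", C=C).fit(X_train, y_train)
    print(f"C = {C}")
    print(confusion_matrix(y_test, clf.predict(X_test)))
```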
This is really bad for a balanced dataset. And that is only one part of the problem: the models (SVMs and RFs) are not just performing poorly, they don't appear to be learning anything. As the Recall and Specificity scores show, the models classify a lot of cases as 1 and very few as 0.
What could be the reason(s) behind this? Since the dataset is balanced, I don't understand why the models are not learning at least something. There are 2180 training samples, of which 1119 are up days (51.33%) and 1061 are down days (48.67%). The test set has 545 samples, of which 294 are up days (53.94%) and 251 are down days (46.06%). I think this is quite balanced for a real-world scenario.
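A quick sanity check of the class distribution looks like this:

```python
import numpy as np

for name, labels in (("train", y_train), ("test", y_test)):
    counts = np.bincount(labels)                 # [down days, up days]
    print(name, counts, np.round(counts / counts.sum(), 4))
```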
I tried different kernels and various values of C and gamma for the SVMs, and varied n_estimators for the RFs, but I keep seeing the same results: poor accuracy, with the majority of samples classified as class 1.
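In GridSearchCV form, that sweep would look roughly like this (the grid values and cv=5 are illustrative, not necessarily the exact settings I used):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# SVM sweep over kernel, C, and gamma
svm_search = GridSearchCV(
    SVC(),
    param_grid={
        "kernel": ["rbf", "sigmoid", "linear"],
        "C": [0.1, 1, 10, 100],
        "gamma": ["scale", 0.01, 0.1, 1],
    },
    scoring="accuracy",
    cv=5,
).fit(X_train, y_train)

# RF sweep over n_estimators
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300, 500]},
    scoring="accuracy",
    cv=5,
).fit(X_train, y_train)

print(svm_search.best_params_, svm_search.best_score_)
print(rf_search.best_params_, rf_search.best_score_)
```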
Am I right to arrive at either or both of the following conclusions?
- This is the best performance a simple model can achieve
- The problem lies in the dataset and not in the model
As Dave pointed out in the comments, the reason I think this performance can be improved a lot (rather than it necessarily being bad) is that I came across a few papers in which people achieved over 65% accuracy in one-day-ahead trend prediction.
Their models were more complex (an SVM with a quasi-linear kernel, an SVM with a GA for feature selection, an MLP with a GA for feature selection), but I wonder how much of a performance difference those complexities should make.
As nxglogic suggested in the comments following his answer, I tried swapping the labels, i.e. up days are now 0s and down days are now 1s. Now the model classifies the majority of samples as 0s, whereas before it classified the majority as 1s; in other words, it keeps predicting the same physical class (up days) whichever label it carries. I do not understand why this is happening.
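The swap itself is nothing more elaborate than relabelling and refitting, something like:

```python
# Swap labels: up days become 0, down days become 1
clf = SVC(kernel="rbf", C=100, gamma=1).fit(X_train, 1 - y_train)
print(confusion_matrix(1 - y_test, clf.predict(X_test)))
```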
If this has something to do with feature engineering and requires domain knowledge, then there is always room for improvement. What I would like to know is whether I am failing to experiment properly with these baseline models.