I am using XGBoost for a 3-class classification problem where the 6 features are (unscaled) time series. I tried forward-chained cross-validation, but my model's accuracy is very low, 30-40%.
Here is the code:
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import accuracy_score, confusion_matrix
from xgboost import XGBClassifier

tscv = TimeSeriesSplit()  # 5 forward-chained splits by default
def evaluate(model):
    # forward-chained evaluation: each fold trains on the past and tests on the future
    for train_index, test_index in tscv.split(X2):
        X_train, X_test = X2[train_index], X2[test_index]
        y_train, y_test = Y[train_index], Y[test_index]
        model.fit(X_train, y_train)
        # predict() already returns integer class labels, so no rounding is needed
        y_predict = model.predict(X_test)
        # accuracy for this fold
        accuracy = accuracy_score(y_test, y_predict)
        print("Accuracy: %.2f%%" % (accuracy * 100.0))
        # confusion matrix for this fold
        print("Confusion Matrix:")
        print(confusion_matrix(y_test, y_predict))
seed = 7
mxgbc = XGBClassifier(
    max_depth=20,
    learning_rate=0.1,
    n_estimators=100,
    objective='multi:softprob',
    booster='gbtree',
    n_jobs=1,
    gamma=0.1,
    min_child_weight=4,
    tree_method='hist',
    max_delta_step=0,
    subsample=1,
    colsample_bytree=1,
    colsample_bylevel=1,
    colsample_bynode=1,
    reg_alpha=0.5,
    reg_lambda=1,
    base_score=0.5,
    random_state=seed,
)
print("## XGBClassifier:")
evaluate(mxgbc)
The accuracy I get is low, 30-40%. I also compared against scaling the features with MinMaxScaler(), but the results are the same. Before moving on to parameter tuning, I need to know whether I am doing something wrong in the model specification or the time-series validation above. Help much appreciated.
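For context, with 3 classes an accuracy of about 33% is chance level, so one sanity check is to run the same forward-chained loop with a majority-class baseline and see whether the real model beats it. A minimal sketch, using synthetic data as a stand-in for my actual X2 and Y (which I cannot share):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(7)
X_syn = rng.normal(size=(300, 6))       # stand-in for X2: 6 unscaled features
y_syn = rng.integers(0, 3, size=300)    # stand-in for Y: 3 classes

tscv = TimeSeriesSplit()  # 5 forward-chained splits by default
for train_index, test_index in tscv.split(X_syn):
    # most-frequent-class baseline, fit per fold on the training window only
    baseline = DummyClassifier(strategy="most_frequent")
    baseline.fit(X_syn[train_index], y_syn[train_index])
    acc = accuracy_score(y_syn[test_index], baseline.predict(X_syn[test_index]))
    print("Baseline accuracy: %.2f%%" % (acc * 100.0))
```

If the XGBoost model's per-fold accuracy is not clearly above this baseline, the problem is more likely the features or labels than the booster parameters.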