I'm currently working on sales forecasting, using a random forest regressor (Spark MLlib on Databricks). I'm trying to find out which features are useful for my forecast. One thing bothers me: the standard deviation (STDDEV) of my predictions is much lower than that of the real data. For a forecast period of 65 working days:
- Real data: STDDEV = 79, max value (MV) = 403; prediction: STDDEV = 50, MV = 253
- Real data: STDDEV = 88, MV = 492; prediction: STDDEV = 39, MV = 225
- Real data: STDDEV = 58, MV = 268; prediction: STDDEV = 27, MV = 137
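To put the gap in relative terms, here is a minimal sketch that compares the spread of a forecast with the spread of the real data. `spread_summary` is a hypothetical helper, not part of MLlib. It uses `statistics.stdev`, the sample standard deviation, which is also what Spark SQL's `stddev` computes. The second part only divides the numbers reported above.

```python
from statistics import stdev


def spread_summary(actual, predicted):
    """Hypothetical helper: compare forecast spread with real-data spread.

    Returns (stddev ratio, max ratio). Values well below 1.0 mean the
    forecast is flatter and peaks lower than the real data.
    """
    return stdev(predicted) / stdev(actual), max(predicted) / max(actual)


# Summary numbers from the three periods above:
# (real STDDEV, real max, predicted STDDEV, predicted max)
reported = [
    (79, 403, 50, 253),
    (88, 492, 39, 225),
    (58, 268, 27, 137),
]

# Ratio of prediction to real data, rounded to two decimals
ratios = [
    (round(pred_sd / real_sd, 2), round(pred_max / real_max, 2))
    for real_sd, real_max, pred_sd, pred_max in reported
]

for sd_ratio, max_ratio in ratios:
    print(f"stddev ratio {sd_ratio:.2f}, max ratio {max_ratio:.2f}")
```

For these three periods the ratios range from about 0.44 to 0.63, so the forecast keeps only roughly half of the real variability and peak level.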
I always use the same parameters for my forest:
- `.setNumTrees(60).setMaxDepth(25).setMaxBins(100)`
In every case, the maximum value of my predictions is smaller than the maximum of the real data.
Is there a way to increase the standard deviation of my predictions? Should I add more features? Should I try changing numTrees and maxDepth?