I have a dataset of 45k data points. The dependent variable is continuous (the time to resolve a ticket), and most of the independent variables are categorical. I tried multiple linear regression and a random forest. The accuracy of both models is pretty bad (around 6%).

How should I approach this kind of problem to get a better-performing model?

  • How do you know a better accuracy is even possible? – Spacedman May 09 '17 at 08:48
  • [Sometimes your machine learning or statistical problem simply is hopeless.](https://stats.stackexchange.com/q/222179/1352) If you have a lot of residual noise, or equivalently, driving factors that you cannot capture (or *too many* driving factors, so you run into the bias-variance dilemma), then you simply won't be able to predict as well as you'd like. You can't predict a tossed coin with more than 50% accuracy, nor a [twenty-sided die](https://en.wikipedia.org/wiki/Dice#Polyhedral_dice) with more than 5% accuracy. – Stephan Kolassa May 09 '17 at 09:26
  • Those are appropriate methods for your case. – Peter Flom May 09 '17 at 10:45

1 Answer

Can you transform the categorical regressors into dummy variables, for example with the dummies package? Be aware that this increases the amount of memory the new model needs.
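As a rough illustration of the dummy-variable idea, here is a minimal sketch in plain Python rather than the dummies package mentioned above. The column names (`priority`, `team`, `hours`) and the three sample tickets are hypothetical, made up only to show the mechanics. A categorical column with k levels becomes k - 1 indicator columns (the first level is dropped as the reference), which is where the extra memory comes from.

```python
def dummy_encode(rows, column, drop_first=True):
    """Replace one categorical column with 0/1 indicator columns."""
    # Collect the distinct levels of the column in a stable order.
    levels = sorted({row[column] for row in rows})
    # Dropping the first level avoids perfect collinearity in linear regression.
    kept = levels[1:] if drop_first else levels
    encoded = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        # Add one indicator per retained level.
        for level in kept:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


# Hypothetical ticket data, for illustration only.
tickets = [
    {"priority": "high", "team": "network", "hours": 4.0},
    {"priority": "low", "team": "billing", "hours": 30.5},
    {"priority": "medium", "team": "network", "hours": 12.0},
]

encoded = tickets
for col in ("priority", "team"):
    encoded = dummy_encode(encoded, col)

print(encoded[0])
```

With many high-cardinality columns the number of indicator columns can grow quickly, so it may be worth counting the distinct levels per column before encoding all 45k rows.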