Approach to be used when the independent variables are categorical

Question

I have a dataset of 45k data points. The dependent variable is continuous (the time to resolve a ticket), and most of the independent variables are categorical. I tried multiple linear regression and a random forest. Both models perform poorly (accuracy around 6%).

Can anyone on this forum suggest how to approach this kind of problem to get a better-performing model?

[Sometimes your machine learning or statistical problem simply is hopeless.](https://stats.stackexchange.com/q/222179/1352) If you have a lot of residual noise, or equivalently, driving factors that you cannot capture (or *too many* driving factors, so you run into the bias-variance dilemma), then you simply won't be able to predict as well as you'd like. You can't predict a tossed coin with more than 50% accuracy, nor a [twenty-sided die](https://en.wikipedia.org/wiki/Dice#Polyhedral_dice) with more than 5% accuracy. — Stephan Kolassa, May 09 '17 at 09:26
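To make the noise-ceiling point concrete for a continuous target, here is a minimal sketch with simulated data. Everything in it is an assumption for illustration (the Gaussian signal and noise, their standard deviations, and the helper name `r_squared`); none of it comes from the ticket dataset. It scores an "oracle" that knows the predictable part of the target exactly, and even that oracle only reaches an R² of roughly 0.1 when the noise variance is nine times the signal variance.

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

n = 20000
# Hypothetical setup: a predictable part with variance 1 ...
signal = [random.gauss(0, 1) for _ in range(n)]
# ... plus unpredictable noise with variance 9 (standard deviation 3).
noise = [random.gauss(0, 3) for _ in range(n)]
y = [s + e for s, e in zip(signal, noise)]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# The oracle predicts the signal perfectly; only the noise is left over.
oracle_r2 = r_squared(y, signal)
print(f"oracle R^2: {oracle_r2:.3f}")  # close to 1 / (1 + 9)
```

If the best possible predictor is capped this low, no choice of model or encoding will lift the score much; only new, informative predictors can.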

score 0 · Answer 1 · answered Dec 19 '17 at 22:26


Can you transform the categorical regressors into dummy variables, for example with the dummies package? (Be aware that the dummy-coded model needs more memory.)
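As a rough illustration of what dummy coding does, here is a minimal sketch in plain Python. The helper `dummy_encode`, the column names, and the toy ticket rows are all hypothetical, and this is not the dummies package itself; in practice a library routine such as `pandas.get_dummies` does the same job. Each categorical column is replaced by 0/1 indicator columns, one per level, with the first level dropped as the reference category.

```python
def dummy_encode(rows, column, drop_first=True):
    """Replace one categorical column with 0/1 indicator columns.

    rows: list of dicts, one dict per observation (hypothetical layout).
    """
    levels = sorted({row[column] for row in rows})
    # Dropping one level avoids perfect collinearity with the intercept
    # in a linear regression; the dropped level becomes the baseline.
    kept = levels[1:] if drop_first else levels
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in kept:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


# Toy data, made up for illustration only.
tickets = [
    {"priority": "high", "team": "network", "hours": 4.0},
    {"priority": "low", "team": "billing", "hours": 30.5},
    {"priority": "medium", "team": "network", "hours": 12.0},
    {"priority": "low", "team": "network", "hours": 26.0},
]

encoded = dummy_encode(dummy_encode(tickets, "priority"), "team")
print(encoded[0])
```

A variable with k levels becomes k - 1 columns, so a few high-cardinality variables can widen the design matrix considerably. That is where the extra memory goes.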

Luigi Biagini


Seems like a question rather than an answer. – Michael R. Chernick Dec 19 '17 at 23:06
