I want to build a binary classification tree to classify whether a person is working or not, and then use the model for prediction. I have read that unbalanced data can be a problem. At what threshold does the imbalance become large enough to cause problems for the tree? Below are the table outputs for the variable I am trying to predict, showing how unbalanced it is. I want to build two trees, one for each of two different years. Is the data too unbalanced, or is it okay?
> table(testSet2002$Partizipation)
0 1
1031 2229
> table(trainSet2002$Partizipation)
0 1
2361 5246
> table(testSet2015$Partizipation)
0 1
1040 2210
> table(trainSet2015$Partizipation)
0 1
2352 5265
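
To make the imbalance easier to judge, the counts can be turned into class shares. This is only a minimal sketch: the counts are typed in by hand from the outputs above, and the object names `counts`, `shares`, and `ratio` are illustrative, not part of my actual script.

```r
# Counts copied by hand from the table() outputs above (class 0, class 1).
counts <- rbind(
  test2002  = c(`0` = 1031, `1` = 2229),
  train2002 = c(`0` = 2361, `1` = 5246),
  test2015  = c(`0` = 1040, `1` = 2210),
  train2015 = c(`0` = 2352, `1` = 5265)
)

# Row-wise class shares; the same idea as prop.table(table(x)) per data set.
shares <- prop.table(counts, margin = 1)
print(round(shares, 3))

# Ratio of the majority class (1) to the minority class (0) per data set.
ratio <- counts[, "1"] / counts[, "0"]
print(round(ratio, 2))
```

In all four sets class 0 makes up roughly 31 to 32 percent of the observations, so the majority class is a bit more than twice as frequent as the minority class.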