I am trying to build a decision tree model. My dataset has 2,000,000 data points in total, including duplicates: 700,000 are true and 1,300,000 are false. Does this count as an imbalanced dataset for a decision tree model?
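To make the class split concrete, the proportions implied by the counts above can be checked with a quick sketch (the variable names are only illustrative). It works out to roughly 35% true and 65% false:

```python
# Class counts as stated in the question
n_true = 700_000
n_false = 1_300_000
n_total = n_true + n_false

# Share of each class and the majority-to-minority ratio
true_share = n_true / n_total
false_share = n_false / n_total
imbalance_ratio = n_false / n_true

print(f"total: {n_total}")
print(f"true share: {true_share:.1%}")
print(f"false share: {false_share:.1%}")
print(f"false-to-true ratio: {imbalance_ratio:.2f}")
```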
If this is an imbalanced dataset, can I build the decision tree from 1,000,000 true values and 700,000 false values (allowing duplicates) to balance the classes, instead of using all of the data?
I also don't know exactly what effect the duplicates would have on the model.
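The resampling described above could be sketched roughly as follows. This is only a minimal illustration, assuming NumPy and using row positions as stand-ins for the real data; the index layout, the seed, and all variable names are hypothetical. It shows where the duplicates come from: drawing more true rows than exist forces sampling with replacement.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for repeatability

n_true, n_false = 700_000, 1_300_000  # counts from the question

# Row positions standing in for the real data (hypothetical layout:
# true rows first, then false rows)
true_idx = np.arange(n_true)
false_idx = np.arange(n_true, n_true + n_false)

# Draw 1,000,000 true rows WITH replacement, so some rows must repeat
sampled_true = rng.choice(true_idx, size=1_000_000, replace=True)
# Draw 700,000 false rows WITHOUT replacement, so each row appears at most once
sampled_false = rng.choice(false_idx, size=700_000, replace=False)

# Combine and shuffle the row positions that would form the training set
train_idx = np.concatenate([sampled_true, sampled_false])
rng.shuffle(train_idx)

n_unique_true = np.unique(sampled_true).size
print("training rows:", train_idx.size)
print("distinct true rows used:", n_unique_true)
print("true share after resampling:", sampled_true.size / train_idx.size)
```

Comparing `n_unique_true` with the 1,000,000 sampled rows gives a sense of how many of the true rows in the training set are repeats introduced by the resampling itself.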