I have a dataset which has Quantity ordered (along with other variables like product type, product price, customer group, etc.). The target variable is whether the customer churned or not. I am using supervised binning to convert this continuous variable into categorical values like high, med, and low based on the quantity ordered.
However, my question is not about the dataset itself but about the technique called supervised binning.
Doesn't supervised binning qualify as data leakage? After all, we create the bins using the target variable (on the train data only). Later, we take that bin information, which was derived from the target column, and feed it to the model as an input.
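To make the question concrete, here is a minimal sketch of the kind of supervised binning I mean. It assumes a decision-tree-based approach with scikit-learn, which is only one of several ways to do it; the data is synthetic and the variable names (`qty`, `churn`, `edges`) are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, made-up data: quantity ordered and a churn flag (illustration only).
rng = np.random.default_rng(0)
qty = rng.integers(1, 100, size=1000).astype(float)
churn_prob = 1.0 / (1.0 + np.exp((qty - 40.0) / 10.0))
churn = (rng.random(1000) < churn_prob).astype(int)

qty_train, qty_test, y_train, y_test = train_test_split(
    qty, churn, test_size=0.3, random_state=0
)

# Supervised binning: a shallow tree chooses cut points on the feature
# using the target. It is fitted on the training split only.
tree = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0)
tree.fit(qty_train.reshape(-1, 1), y_train)

# Internal nodes hold the split thresholds; leaves have a negative feature index.
edges = np.sort(tree.tree_.threshold[tree.tree_.feature >= 0])

# The same train-derived edges are applied to both splits.
# right=True mirrors the tree's "x <= threshold goes left" rule.
bins_train = np.digitize(qty_train, edges, right=True)
bins_test = np.digitize(qty_test, edges, right=True)
```

The test labels `y_test` are never used to choose the edges here; only `y_train` is. My concern is whether using `y_train` this way already counts as leakage.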
Can you share some insights on whether it is recommended to do this?
If yes, why so?
If not, why so? I ask because I see a lot of tutorials and posts that use supervised binning to discretize continuous variables during data preparation. Should I only use unsupervised binning?