I have a simple dataset with a balanced target y (0 or 1) and an imbalanced feature x (many 0s, few 1s).
I aim to get high precision (don't care about recall).
I can get a high precision of 0.53 if I just predict y=1 whenever x=1, but when I train DecisionTree, XGBoost, and RandomForest, they all produce a model which just outputs 1 for any feature value, i.e. they can't find that simple rule (predict y=1 iff x=1). The precision I get with these algorithms is only 0.38.
Which algorithm should I use, and how can I make an ML algorithm learn that simple rule to maximize precision instead of degenerating to always outputting 1?
Note that the actual problem will involve many features, so I need a robust ML algorithm.
import numpy as np
import pandas as pd

# Sample synthetic data on which DecisionTree fails to find the simple rule
df = pd.DataFrame({'x': np.random.choice([0, 1], size=10000, p=[.99, .01])})
df['y'] = np.random.randint(0, 2, 10000)  # balanced random target
df.loc[df.x == 1, 'y'] = 1  # force y=1 wherever x=1
# Precision of the rule: predict y=1 if x==1 else y=0
df.query('x==1')['y'].mean()  # = 1.0
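
For reference, here is a minimal sketch of how the comparison could be reproduced on the synthetic data above. It is my own illustration, not a confirmed fix: it assumes scikit-learn's DecisionTreeClassifier, a fixed seed, and an arbitrarily chosen cutoff of 0.9 on predict_proba (the names threshold, precision_default and precision_thresholded are made up for this example). Everything is evaluated in-sample, so it only shows whether the tree's leaf probabilities contain the rule, not how well it would generalize.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import precision_score
from sklearn.tree import DecisionTreeClassifier

# Same synthetic data as above, with a fixed seed (an assumption) for repeatability
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({"x": rng.choice([0, 1], size=n, p=[0.99, 0.01])})
df["y"] = rng.integers(0, 2, size=n)
df.loc[df.x == 1, "y"] = 1

clf = DecisionTreeClassifier(random_state=0).fit(df[["x"]], df["y"])

# Default predict() picks the majority class in each leaf (a 0.5 cutoff)
precision_default = precision_score(df["y"], clf.predict(df[["x"]]))

# Predict 1 only where the leaf's estimated P(y=1) is high.
# The 0.9 cutoff is an illustrative assumption, not a tuned value.
threshold = 0.9
proba = clf.predict_proba(df[["x"]])[:, 1]
pred = (proba >= threshold).astype(int)
precision_thresholded = precision_score(df["y"], pred)

print(precision_default, precision_thresholded)
```

With the stricter cutoff, the thresholded predictions coincide with x on this data, since the x=0 leaf has an estimated probability near 0.5 and the x=1 leaf is pure.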