
I'm training models (Random Forest, GLM, and a DNN) to predict molecular function (a binary target) from a set of descriptors. The data is, of course, highly imbalanced (3/97) and will probably be even more imbalanced in the actual dataset to which I plan to transfer the model. My question is mostly about the DNN. I downsampled the data to 50/50 and used weighted cross entropy (90/10). This improves my model a lot, and it can actually outperform Random Forest at high sensitivity (e.g., at sensitivity 0.8 the precision is 0.2, while RF's is 0.1). However, I don't care much about sensitivity, because this is a discovery model; what I care a lot about is precision, and Random Forest reaches much better precision at low sensitivity (0.6 compared to 0.5 for the NN). So my question is this: which tricks can I try to improve precision? What comes to my mind is changing the downsampling ratio to have a higher proportion of hits, using NN ensembles, and tuning the cross-entropy weights (which strangely has had no effect so far). Are there other well-established solutions to this problem?
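For context, here is a minimal NumPy sketch of the two knobs mentioned above: a weighted cross-entropy loss that up-weights the rare positive class, and decision-threshold tuning, which directly trades sensitivity for precision. All numbers (class sizes, score distributions, weights) are synthetic and purely illustrative, not from my actual data:

```python
import numpy as np

# Synthetic scores for an imbalanced set (~3/97 positives, illustrative only).
rng = np.random.default_rng(0)
n_neg, n_pos = 970, 30
y = np.concatenate([np.zeros(n_neg), np.ones(n_pos)])
# Fake model scores: positives tend to score higher than negatives.
scores = np.concatenate([rng.beta(2, 5, n_neg), rng.beta(5, 2, n_pos)])

def weighted_bce(y_true, p, w_pos=0.9, w_neg=0.1, eps=1e-12):
    """Weighted cross entropy: the positive (minority) class term is
    multiplied by a larger weight than the negative term."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(w_pos * y_true * np.log(p)
                    + w_neg * (1 - y_true) * np.log(1 - p))

def precision_recall_at(y_true, p, thr):
    """Precision and recall (sensitivity) for a given decision threshold."""
    pred = p >= thr
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

# Raising the threshold generally trades sensitivity for precision.
for thr in (0.5, 0.7, 0.9):
    prec, rec = precision_recall_at(y, scores, thr)
    print(f"thr={thr:.1f}  precision={prec:.2f}  recall={rec:.2f}")
```

Sweeping the threshold on held-out data and picking the operating point by precision (rather than keeping the default 0.5) is often the cheapest way to move along the precision/sensitivity curve without retraining.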

Sergej Andrejev
