I am trying to predict the most successful contact method (e.g., social media, telephone, email) for getting people to buy a product, given input variables X such as age, race, and gender. Our current approach is simply to use a training set with columns for age, race, gender, and contact method. Everyone in the training set is a success, i.e., they all bought a product in the end. However, is this the best way to make this prediction? Because of survivorship bias, shouldn't we consider the unsuccessful observations as well? That is, do we have to consider the contact methods that resulted in people not buying anything?

However, I am unable to think of a solution. There does not seem to be a 'weighted' regression that would let me predict Y (the contact method) from X while using the outcome (bought or not) as a weight.

  • Yes: you should "consider the unsuccessful observations". If you do not have those, you should at least consider the overall numbers of types of contact and likely demographic features of those being contacted. – Henry Oct 21 '20 at 07:58
  • @Henry Yes, after reflecting more, I came to the same conclusion. However, I am lost as to specifically how to code that in R. Obviously, I cannot just use regression on the overall dataset as I will only end up predicting the contact method, not the MOST SUCCESSFUL contact method. Do you have any idea on how I may proceed? – Chee Jia Yuan Oct 21 '20 at 08:03
  • This post gives an overview of Wald's approach to properly estimating the likelihood of a plane going down: https://stats.stackexchange.com/questions/465188/abraham-wald-survivorship-bias-intuition (Would also be interested to – Bryan Shalloway May 20 '21 at 16:26
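
One way to act on the advice in the comments, as a minimal sketch rather than a definitive method: keep the non-buyers, estimate P(buy | demographics, contact method), and then recommend the method with the highest estimated probability for each demographic profile. The sketch below is in Python rather than R so that it needs no packages. The `contacts` rows are made-up illustrative data, and the function names `most_common_among_buyers` and `best_conversion_rate` are hypothetical. It also assumes that who received which contact method is not itself driven by an unobserved factor that affects buying.

```python
from collections import Counter, defaultdict

# Hypothetical contact log: (age_group, gender, method, bought).
# Made-up illustrative rows that include the people who did NOT buy.
contacts = (
    [("18-34", "F", "email", 1)] * 30 + [("18-34", "F", "email", 0)] * 270
    + [("18-34", "F", "social", 1)] * 20 + [("18-34", "F", "social", 0)] * 30
    + [("18-34", "F", "phone", 1)] * 5 + [("18-34", "F", "phone", 0)] * 45
)


def most_common_among_buyers(rows):
    """Survivors-only view: the method seen most often among buyers."""
    counts = defaultdict(Counter)
    for age, gender, method, bought in rows:
        if bought:
            counts[(age, gender)][method] += 1
    return {seg: c.most_common(1)[0][0] for seg, c in counts.items()}


def best_conversion_rate(rows, prior_success=1, prior_failure=1):
    """Full-data view: smoothed P(buy | segment, method), then take the max.

    The two prior counts are an assumed add-one style smoothing so that
    tiny cells do not produce rates of exactly 0 or 1.
    """
    tally = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [buys, contacts]
    for age, gender, method, bought in rows:
        cell = tally[(age, gender)][method]
        cell[0] += bought
        cell[1] += 1
    best = {}
    for seg, methods in tally.items():
        rates = {
            m: (buys + prior_success) / (n + prior_success + prior_failure)
            for m, (buys, n) in methods.items()
        }
        best[seg] = (max(rates, key=rates.get), rates)
    return best


survivors_pick = most_common_among_buyers(contacts)
full_pick = best_conversion_rate(contacts)

segment = ("18-34", "F")
print("buyers only :", survivors_pick[segment])
print("with failures:", full_pick[segment][0], full_pick[segment][1])
```

In this toy data, email dominates among buyers only because it was used far more often, while its conversion rate is the lowest of the three. With continuous covariates such as age, the lookup table would typically be replaced by a model of the purchase outcome (for example a logistic regression with contact method, the demographics, and their interactions as predictors), with the recommendation still taken as the method that maximizes the predicted purchase probability.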

0 Answers