I am facing a challenging problem:
Say I have shirts of three different colors (all the same price). And say I am running a strange kind of store: people come in one by one, I can show each of them only one shirt, and they decide whether or not to buy it before they leave.
I wish to optimize my sales.
Up to this point, this looks to me like a classic multi-armed bandit problem, which would be classified as a reinforcement-learning problem.
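To make the baseline concrete, here is a minimal sketch of the feature-free bandit I have in mind, using an epsilon-greedy rule. Everything in it is an assumption for illustration: the function name `epsilon_greedy`, the simulated purchase probabilities, and the value of `epsilon` are made up, and epsilon-greedy is only one of several possible bandit strategies.

```python
import random

def epsilon_greedy(buy_prob, n_customers=50_000, epsilon=0.1, seed=0):
    """Simulate showing one shirt per customer with an epsilon-greedy rule.

    buy_prob maps each color to a hypothetical purchase probability,
    used here only to simulate customers.
    """
    rng = random.Random(seed)
    colors = list(buy_prob)
    shown = {c: 0 for c in colors}  # how often each color was shown
    sold = {c: 0 for c in colors}   # how often each color was bought

    for _ in range(n_customers):
        # Explore at random with probability epsilon, or while some
        # color has never been shown; otherwise exploit the color with
        # the best observed purchase rate so far.
        if rng.random() < epsilon or 0 in shown.values():
            color = rng.choice(colors)
        else:
            color = max(colors, key=lambda c: sold[c] / shown[c])

        shown[color] += 1
        if rng.random() < buy_prob[color]:  # simulated customer decision
            sold[color] += 1

    return shown, sold

if __name__ == "__main__":
    # Made-up rates of around 1%, as in the setting described above.
    shown, sold = epsilon_greedy({"red": 0.010, "green": 0.012, "blue": 0.008})
    for color in shown:
        print(color, shown[color], sold[color])
```

Note that this rule treats every customer identically, which is exactly the limitation I would like to get past.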
Now say that I have a number of features about each person that comes into my store: age/gender and so on (a fairly large mix of categorical and non-categorical measures).
How can I use this information to optimize my sales?
I thought about taking the following approach. Assume that the expected purchase rate for each color is low and, though NOT equal, roughly the same (say around 1%). Then I can treat this as a classification problem (supervised learning): I ignore the people who did not buy a shirt and ask, among those who did buy, which of the three groups of buyers this new person most resembles. I would train a classification algorithm on the buyers and show each new person the shirt color that the trained classifier predicts.
Even though I do not know whether the buyers would have bought a shirt of another color had it been shown to them, my hope is that with a large number of buyers and reasonably relevant features, a classifier can find the similarities within each group of buyers.
I know that this is far from a classic classification problem, but I think it can do better than ignoring the features I have about each person and using a plain multi-armed-bandit approach.
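As a rough illustration of the idea, here is a minimal sketch that trains on buyers only and picks the color whose buyers a new person most resembles. It uses a nearest-centroid rule purely as a stand-in for whatever classifier would actually be used. The names `fit_centroids` and `pick_color` and the toy data are hypothetical, and the sketch assumes the features are already numeric (categorical ones would need to be encoded first) and on comparable scales.

```python
from collections import defaultdict
from math import dist

def fit_centroids(buyers):
    """Average the feature vectors of the buyers of each color.

    buyers: list of (features, color_bought) pairs, where features is a
    list of numbers. People who did not buy are simply not included.
    """
    sums = {}
    counts = defaultdict(int)
    for features, color in buyers:
        if color not in sums:
            sums[color] = [0.0] * len(features)
        for i, x in enumerate(features):
            sums[color][i] += x
        counts[color] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def pick_color(centroids, features):
    """Return the color whose average buyer is closest to this person."""
    return min(centroids, key=lambda c: dist(centroids[c], features))

if __name__ == "__main__":
    # Toy data: (age, gender encoded as 0/1) -> color bought.
    buyers = [
        ([20, 0], "red"),
        ([22, 0], "red"),
        ([50, 1], "blue"),
        ([54, 1], "blue"),
    ]
    centroids = fit_centroids(buyers)
    print(pick_color(centroids, [25, 0]))  # a new 25-year-old visitor
```

The sketch also quietly assumes that the three colors were shown about equally often in the past; otherwise the buyer groups reflect what I showed as much as what people prefer.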
My questions:
Is my approach reasonable?
Are there other ways to approach this?