Problem description: Every 3 weeks a fashion company sends out an expensive booklet with descriptions of clothes to each customer on their electronic records. There exists a purchase history what each customer bought in the 3 weeks after receiving the booklet and pricing information about clothes.
The problem is to find those customers to whom it makes sense to send the booklet, so that fewer booklets are sent.
Additionally to this data, there is also a small dataset containing information what customers bought in a period of 9 weeks when no booklet was sent out.
In order to solve this problem, I need to turn it into a machine learning problem - but this is the step I'm struggling with. I need to define some criterion for customer that determines when it makes sense to send out a booklet to them!
Question: Could you please help me figure out a sufficiently good criterion to use and which (probably unsupervised) algorithm will find the customers fulfilling that criterion?
I think the best criterion would be if sending out the booklet to a specific customer results in expected gains for the company that are greater than the cost of one booklet - but I have no idea how to model this. Could you give me any advice?
A much less ideal criterion would be to simply find those customers that bought stuff that was in the booklet. I know how to implement a model solving this problem - but I would like a less crude criterion than this one.
More precise description of the dataset: Pictures tell a 1000 words, so there is a simplified toy example about the process of sending out booklets and user behavior, that is contained in the dataset:
So, for example, on 21 Jan 2012 to all customer a booklet with 3 items was send. In the ensuing 3 weeks, customer 000001 bought one item from the booklet (the shirt), customer 000002 bought something that was not in that booklet (a different type of shirt than advertised) and customer 000003 bought nothing.
Looking at all the data in this toy example of the dataset, it seems that customer 000001 usually bought things from the booklet (and more), while customer 000002 sometimes bought this, though they were not directly related to the booklet he last received and customer 000003 never engaged.
Booklets contain only a subset of all products customers can buy and they don't all contain the same number of products.
But remember that a small portion of the dataset also contains information what customers bought when no booklet was sent out.