Advertisement decision making based on customer past behaviour

Question

Problem description: Every 3 weeks a fashion company sends an expensive booklet with descriptions of clothes to every customer in its electronic records. The data contains a purchase history showing what each customer bought in the 3 weeks after receiving each booklet, plus pricing information for the clothes. The problem is to identify the customers to whom it makes sense to send the booklet, so that fewer booklets are sent.
In addition to this data, there is a small dataset showing what customers bought during a period of 9 weeks in which no booklet was sent out.

In order to solve this problem, I need to turn it into a machine learning problem, but this is the step I'm struggling with. I need to define a criterion that determines, for each customer, when it makes sense to send them a booklet!

Question: Could you please help me figure out a sufficiently good criterion to use and which (probably unsupervised) algorithm will find the customers fulfilling that criterion?
I think the best criterion would be whether sending the booklet to a specific customer results in expected gains for the company that are greater than the cost of one booklet, but I have no idea how to model this. Could you give me any advice?
A much less ideal criterion would be to simply find those customers that bought stuff that was in the booklet. I know how to implement a model solving this problem - but I would like a less crude criterion than this one.
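One way to make the first criterion precise, as a sketch rather than a definitive formulation: write $Y_i(1)$ and $Y_i(0)$ for the gain the company makes from customer $i$ in the 3 weeks after a mailing date with and without a booklet, $x_i$ for that customer's purchase history, and $c$ for the cost of one booklet. These symbols are introduced here for illustration and do not appear in the original post. The rule would then be to send the booklet to customer $i$ only if

$$\tau(x_i) = \mathbb{E}[Y_i(1) \mid x_i] - \mathbb{E}[Y_i(0) \mid x_i] > c.$$

Only one of $Y_i(1)$ and $Y_i(0)$ is ever observed for a given customer and period, which is why the small no-booklet dataset matters: it is the only direct source of information about the second term.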

More precise description of the dataset: A picture is worth a thousand words, so here is a simplified toy example of the process of sending out booklets and of the customer behaviour contained in the dataset:

So, for example, on 21 Jan 2012 a booklet with 3 items was sent to all customers. In the ensuing 3 weeks, customer 000001 bought one item from the booklet (the shirt), customer 000002 bought something that was not in that booklet (a different type of shirt than the one advertised) and customer 000003 bought nothing.
Looking at all the data in this toy example, it seems that customer 000001 usually bought things from the booklet (and more), customer 000002 sometimes bought things, though they were not directly related to the booklet he had last received, and customer 000003 never engaged.
Booklets contain only a subset of all the products customers can buy, and they do not all contain the same number of products.
But remember that a small portion of the dataset also contains information about what customers bought when no booklet was sent out.

If you have enough data on sending vs. not sending the booklet to different sets of customers, then it's a supervised learning problem, e.g. predict average spend in the next 90 days, given past history and whether or not they received a booklet. See e.g. 'causality in machine learning' on the Unofficial Google Data Science Blog. Other keywords you might search for are database marketing... – seanv507, Oct 06 '18 at 20:35
@seanv507 I very much like the idea of not analysing what happens to individual products (or groups of similar products, such as "shirts"), but just focusing on average spend. The problem with your approach, though (as with the answer below), is that I do not have much data about customers who did not receive a booklet, so it's very hard to compute the average spend for them. Would you know any techniques that would help? – MyCatsHat, Oct 07 '18 at 07:58
Maybe I could make some assumptions (e.g. if X was not in the past 5 booklets and in this time, say, 70% of all the people bought X, then this X should be considered to be bought as if no booklet had been sent out) that allow me to infer how customers would have behaved if no booklet had been sent out, and thereby to estimate the average spend? But how could I check that the behaviour in the assumption above was indeed not correlated with or dependent on the sending of the booklets in any way? – MyCatsHat, Oct 07 '18 at 08:08
This approach of making a long list of plausible assumptions and then testing them does not "feel" like the right way to go about this; there must be a more automatic way. – MyCatsHat, Oct 07 '18 at 08:09
This is exactly why people A/B test. You need to test what happens if you don't send a booklet. Now, as discussed in the causality in machine learning article, you need a regression model and you use all the data. – seanv507, Oct 07 '18 at 08:27
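A minimal sketch of the regression idea in the comment above, assuming a single hypothetical feature (a customer's past average spend per 3-week period) and invented, noise-free numbers. Fitting one line to booklet periods and one to no-booklet periods gives the same coefficients as a single pooled least-squares regression with a booklet indicator and its interaction with the feature. The names `fit_line`, `predicted_uplift` and `BOOKLET_COST` are illustrative and do not come from the thread.

```python
# Toy records: (past_avg_spend, got_booklet, spend_in_following_period).
# All numbers are invented; spend is assumed to be on the same time scale
# (e.g. per 3 weeks) for booklet and no-booklet periods.
RECORDS = [
    (10.0, 1, 15.0), (40.0, 1, 30.0), (80.0, 1, 50.0),  # booklet sent
    (10.0, 0, 6.0), (40.0, 0, 18.0), (80.0, 0, 34.0),   # no booklet sent
]
BOOKLET_COST = 10.0  # hypothetical cost of one booklet


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_group(records, flag):
    """Fit the spend model on the records with the given booklet flag."""
    xs = [x for x, t, _ in records if t == flag]
    ys = [y for _, t, y in records if t == flag]
    return fit_line(xs, ys)


SENT = fit_group(RECORDS, 1)      # (intercept, slope) with booklet
NOT_SENT = fit_group(RECORDS, 0)  # (intercept, slope) without booklet


def predicted_uplift(past_avg_spend):
    """Predicted spend with a booklet minus predicted spend without one."""
    with_booklet = SENT[0] + SENT[1] * past_avg_spend
    without_booklet = NOT_SENT[0] + NOT_SENT[1] * past_avg_spend
    return with_booklet - without_booklet


def should_send(past_avg_spend):
    """Send only if the predicted extra spend exceeds the booklet cost."""
    return predicted_uplift(past_avg_spend) > BOOKLET_COST


for spend in (5.0, 50.0):
    print(spend, round(predicted_uplift(spend), 2), should_send(spend))
```

On real data the no-booklet fit rests on very few observations, so the uplift estimates would be noisy and should be read with wide error bars.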
Welcome to science! Making hypotheses and testing them is normal. – seanv507, Oct 07 '18 at 08:30
Thanks for the book - that will be a great read for later; right now I need some quick tips to be able to start coding so that I don't miss the deadline. – MyCatsHat, Oct 07 '18 at 15:54
I know how science works :p But the problem is that if I start making lists of hypotheses and testing them individually, I might miss some hypothesis. That is why I would rather not go down that route at all, if there is an alternative. And even if there were no alternative, I can't get more data, so I can't use A/B testing to figure out which assumptions were good. I need to work with the data I have now. Thus I return to the question in my first comment: Do you know any general techniques, given my (fixed) data, to factor out the influence of the booklet on the customers [...] – MyCatsHat, Oct 07 '18 at 15:58
[...] so that I can ultimately predict the average spend when no booklet has been sent? Or maybe something like bootstrapping (I have never used it) to increase the amount of data on customers who did not receive the booklet? – MyCatsHat, Oct 07 '18 at 16:01
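On the bootstrapping idea: resampling does not create new information about customers who received no booklet, but it can show how uncertain an estimate from the small no-booklet sample is. A minimal sketch of a percentile bootstrap for the mean spend, with invented numbers and hypothetical names (`bootstrap_mean_ci`, `no_booklet_spend`):

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample the observed values with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Invented spend per customer during the period without a booklet.
no_booklet_spend = [0.0, 0.0, 12.5, 0.0, 40.0, 8.0, 0.0, 25.0, 0.0, 15.0]

sample_mean = statistics.fmean(no_booklet_spend)
lower, upper = bootstrap_mean_ci(no_booklet_spend)
print(f"mean={sample_mean:.2f}, 95% interval=({lower:.2f}, {upper:.2f})")
```

A wide interval here is a signal that any comparison against the booklet periods is correspondingly uncertain.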
P.S. I haven't had time yet to read the Google blog; I'll try to do so by tomorrow. – MyCatsHat, Oct 07 '18 at 18:28
@seanv507 So I read the Google blog; unfortunately the section "Using randomization in training", which is the interesting bit telling me how, was a bit too vague and at the same time too technical for me to make much sense of it. Therefore I posted a follow-up question to this one: https://stats.stackexchange.com/questions/370777/unofficial-google-data-science-blog-problem-application – MyCatsHat, Oct 08 '18 at 15:02

score 2 · Accepted Answer · answered Oct 06 '18 at 17:05

This does not seem to be a direct machine learning problem. Basically, in machine learning, you have two main types of approaches:

supervised learning: train a model to predict something based on labelled data; to perform supervised learning on your problem, you would need to know a set of customers to whom it is worth sending the booklet
unsupervised learning: analyze your data to find patterns, clusters, outliers... this does not tell you if you should send the booklet or not.

I also think that your problem may be more complicated than what you explained. For instance, a customer could buy something from a booklet they received more than three weeks ago.

The key here is the behaviour of customers when the booklet is not sent. You wrote that you have a small dataset; depending on its size, you may need to grow it for it to be significant. Also, the data might be skewed if there is any contextual side effect (for instance, if the 9-week off period happened during the summer holidays).

Machine learning could actually help improve your data on what happens when the booklet is not sent. Using unsupervised learning techniques, you could cluster your customers according to their characteristics (what is in the records) and their behaviour towards the booklet (for instance, the fraction of their total order that was advertised in a recent booklet). You would expect customers from the same cluster to behave similarly: then you just need to stop sending the booklet to a few of them and compare their behaviour with that of the other customers.
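The clustering step could be sketched as follows. This is a bare-bones k-means written out by hand, assuming two hypothetical features per customer that both lie between 0 and 1: the share of 3-week periods with at least one purchase, and the share of order value that was advertised in a recent booklet. The feature values, the customer IDs beyond the toy example and the choice of k = 3 are invented, and k-means is only one possible algorithm, not something this answer prescribes.

```python
# customer id -> (purchase_rate, booklet_share); all values are invented.
CUSTOMERS = {
    "000001": (0.9, 0.8),
    "000002": (0.5, 0.1),
    "000003": (0.0, 0.0),
    "000004": (0.8, 0.9),
    "000005": (0.6, 0.0),
    "000006": (0.1, 0.0),
}


def squared_distance(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, n_iter=20):
    """Plain k-means with a naive initialisation (the first k points)."""
    centroids = list(points[:k])
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    # Final assignment against the last set of centroids.
    return [nearest(p, centroids) for p in points]


ids = list(CUSTOMERS)
labels = kmeans([CUSTOMERS[i] for i in ids], k=3)
CLUSTER_OF = dict(zip(ids, labels))
print(CLUSTER_OF)
```

With real data, features on different scales would need standardising first, and the initialisation would normally be randomised and repeated.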

Then a good metric for deciding whether you should send the booklet to some type of customer would be the average difference between the order amounts of those who receive the booklet and of those who don't.
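This metric can be sketched per cluster, assuming each observation carries a cluster label, a flag for whether a booklet was sent, and the amount ordered in the following period. The rows, the cluster names and `BOOKLET_COST` are invented for illustration.

```python
from collections import defaultdict
from statistics import fmean

BOOKLET_COST = 5.0  # hypothetical cost of one booklet

# (cluster, got_booklet, order_amount); all values are invented.
ROWS = [
    ("A", True, 60.0), ("A", True, 40.0), ("A", False, 20.0), ("A", False, 30.0),
    ("B", True, 22.0), ("B", True, 18.0), ("B", False, 19.0), ("B", False, 17.0),
    ("C", True, 0.0), ("C", True, 0.0),  # cluster with no no-booklet data
]


def average_difference(rows):
    """Mean order amount with a booklet minus mean without, per cluster."""
    amounts = defaultdict(lambda: {True: [], False: []})
    for cluster, got_booklet, amount in rows:
        amounts[cluster][got_booklet].append(amount)
    result = {}
    for cluster, groups in amounts.items():
        if groups[True] and groups[False]:
            result[cluster] = fmean(groups[True]) - fmean(groups[False])
        else:
            result[cluster] = None  # cannot compare without both groups
    return result


DIFFS = average_difference(ROWS)
# Send only where the estimated extra order amount exceeds the booklet cost;
# None means the data cannot answer the question for that cluster.
SEND = {c: (d > BOOKLET_COST if d is not None else None)
        for c, d in DIFFS.items()}
print(DIFFS)
print(SEND)
```

Cluster C illustrates the limitation raised in the comments: where a cluster has no observations without a booklet, the difference cannot be estimated from the data alone.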

answered Oct 06 '18 at 17:05

Romain Reboulleau

This approach would of course be the best, but I have no way of obtaining more data (at least the 9 months have no contextual problems associated with them). I need to deal with what I have, so I'm afraid I see no possibility of using your metric. The metric you mention focuses on a single customer. That is why I think the bold metric I mentioned may in theory be better, because it makes no assumption about the actual behaviour of customers; it just measures the global increase in revenue. [...] – MyCatsHat Oct 07 '18 at 07:49
Thus it includes cases where, for example, customer A is sent a booklet but then customer B (who may not even have been in the database, as he is a new customer) buys a lot of things from the booklet because he, for example, lives in the same house as A and thus had access to it. Such effects, where sending a booklet to one customer triggers a different customer to buy things, would not be taken into account by your metric. But I don't know how to implement my metric either. Do you have any ideas about this? Or could you think of a metric I could use, given the data that I have? – MyCatsHat Oct 07 '18 at 07:49
In statistics, either you have data and make use of it, or you don't, and then you have to make assumptions. I can only recommend that you make assumptions based on your prior knowledge of the problem. But still, even if you can't improve your data, clustering your customers remains a good idea. The lack of data (on what happens without a booklet) would be partially balanced by the fact that you analyze a set of customers. – Romain Reboulleau Oct 07 '18 at 12:08
But could you please also go into a bit of detail regarding the 2 questions? – MyCatsHat Oct 07 '18 at 15:03
Regarding clustering, I don't understand why that would help me in the scenario where I can control the data I will have? Also, what would a "good" clustering look like? Should customers be similar if they bought the same stuff at the same time, or just if they bought things in the same price range? I'm not sure, since I don't see how clustering would improve my data (and not just help me make a good decision on whom not to send the booklet, since I cannot do that). – MyCatsHat Oct 07 '18 at 15:12
Clustering lets you bin your data and get rid of single-customer side effects. Also, you can study a whole group by taking just a few samples from it. There is no such thing as a good *a priori* clustering; it all depends on your knowledge of your problem. It could be price range, home town, age... – Romain Reboulleau Oct 08 '18 at 17:39
