I have 3 different methods of generating product recommendations for users. I want to use Vowpal Wabbit (VW) to learn a context-specific policy that chooses the optimal action (there are 3 actions, one per recommendation method).
I followed the contextual bandit (CB) example that John Langford has described here. I also looked at this and this.
The context in my case is defined by 3 user features and 5 product features. However, I am not clear on how to specify the cost as described on VW's CB example page linked above. Our goal is to maximize the click-through rate (CTR).
I have two questions regarding this situation:
1. Do I need to present VW with all the data (clicks as well as non-clicks), or can I provide only the click data along with the probability with which each action was selected?
2. How should I define the cost for each click and non-click?
My current thinking is to treat every generated recommendation as an event, with cost = 1 if it does not result in a click and cost = 0 if it does, and to provide all of these events to VW. I would appreciate any suggestions.
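To make the proposal concrete, here is a minimal sketch of how I would turn each logged recommendation into a line of VW's `--cb` input, which uses the label form `action:cost:probability` followed by the features. The helper `to_vw_cb_line`, the namespaces `user` and `product`, and the feature names `u1`, `p1`, etc. are hypothetical placeholders, and the cost assignment (0 for a click, 1 otherwise) is just my assumption from above, not something I have confirmed is right.

```python
def to_vw_cb_line(action, clicked, probability, user_features, product_features):
    """Format one logged recommendation as a VW contextual bandit example.

    action:      1-based index of the recommendation method that was used (1..3)
    clicked:     True if the recommendation resulted in a click
    probability: probability with which the logging policy chose `action`
    """
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")

    # Assumed cost scheme: click -> cost 0, no click -> cost 1.
    cost = 0 if clicked else 1

    # Hypothetical namespaces "user" and "product", features written as name:value.
    user = " ".join(f"{name}:{value}" for name, value in user_features.items())
    product = " ".join(f"{name}:{value}" for name, value in product_features.items())

    return f"{action}:{cost}:{probability} |user {user} |product {product}"


# Example: method 2 was chosen with probability 0.5 and the user clicked.
print(to_vw_cb_line(2, True, 0.5, {"u1": 1.0}, {"p1": 3.0}))
# 2:0:0.5 |user u1:1.0 |product p1:3.0
```

With one such line per event written to a file (say `train.dat`, a name I made up), I would then train with something like `vw -d train.dat --cb 3`, following the CB example.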