I have three different methods of generating product recommendations for users. I want to use Vowpal Wabbit (VW) to learn a context-specific policy that chooses the optimal action, where the three actions correspond to the three recommendation methods.

I followed the contextual bandit (CB) example that John Langford describes here, and I also looked at this and this.

The context in my case is defined by 3 user features and 5 product features. However, I am not clear on how to specify the cost described on VW's CB example page linked above. Our goal is to maximize the click-through rate (CTR).

I have two questions regarding this situation.

  1. Do I need to present VW with all the data (clicks as well as non-clicks), or can I provide only the click data along with the probability with which each action was selected?

  2. How should I define the cost for each click and non-click?

I think I should use cost = 1 for each generated recommendation that is not clicked and cost = 0 for each one that results in a click, and that I should provide all of these events to VW. I would appreciate any suggestions.
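
For concreteness, here is a minimal sketch of how such logged events might be written in VW's `--cb` input format, where each line starts with `action:cost:probability` followed by the features. The helper name `to_vw_cb_line`, the namespace names `user` and `product`, and the feature names are all hypothetical; the 0/1 cost is the scheme proposed above, not a confirmed recommendation.

```python
def to_vw_cb_line(action, clicked, probability, user_features, product_features):
    """Format one logged recommendation as a VW contextual-bandit example.

    action:      1-based index of the recommendation method that was shown
    clicked:     True if the recommendation was clicked
    probability: probability with which the logging policy chose `action`
    """
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    # Assumed cost scheme from the question: click -> 0, non-click -> 1.
    cost = 0 if clicked else 1
    # Hypothetical namespaces: numeric features are written as name:value.
    user = " ".join(f"{name}:{value}" for name, value in user_features.items())
    product = " ".join(f"{name}:{value}" for name, value in product_features.items())
    return f"{action}:{cost}:{probability} |user {user} |product {product}"


# One clicked and one non-clicked event; both are written to the training file.
events = [
    (2, True, 0.5, {"age": 34}, {"price": 19.99}),
    (1, False, 0.25, {"age": 51}, {"price": 5.0}),
]
lines = [to_vw_cb_line(*event) for event in events]
print("\n".join(lines))
```

Assuming the lines are saved to a file such as `train.dat`, they could then be passed to something like `vw -d train.dat --cb 3`, with 3 being the number of actions.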

Nik
    I guess click/no-click should map to a cost of 0/1, but I'm not from the click industry myself. Generally speaking, you need to provide all of the history: the actions where the user didn't click (with a cost of, say, 1) are equally valid feedback. In the online scenario, you need to provide feedback (e.g. a cost of 1) for every "non-click". – matt May 11 '18 at 08:40
