I am looking at this paper on posterior sampling.
The algorithm is on page 8 (image below):
- Let’s say I have 3 arms and on line 22 arm 3 is the best, followed by arm 2, then arm 1.
- Line 24 calculates the number of candidate arm samples to draw. There is a sum over all the sub-optimal arms to produce N_t+1. Let’s say that turns out to be 100.
- How are lines 25 and 26 calculated?
- When it says “Draw N_t+1 candidate arm samples”, I will draw 100 samples, but from what distribution? Cat(…) takes as its argument the probability that arm “a” will be the optimal arm at time t+1, but how does Cat(…) lead to selecting arm 1, 2, or 3? I am confused as to what Cat(…) is doing. Can you explain with this example?
Basically, what is $$\mathrm{Cat}(p_{\hat{a},t+1})$$ doing in line 25, and what does it produce that is used in line 26?
- Then on line 26, it looks like there will be a sequence of arm numbers (1111…2222…33333), and the mode will be selected as the arm to play?
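To make my reading concrete, here is a minimal Python sketch of what I think lines 25 and 26 might be doing. This is only my guess, not the paper's code: I am assuming Cat(…) is an ordinary categorical distribution over the arm labels, the probabilities in `p_next` are numbers I made up, and `draw_candidates` and `mode_arm` are names I invented for illustration.

```python
import random
from collections import Counter


def draw_candidates(probs, n_samples, seed=0):
    """Draw n_samples arm labels from a categorical distribution.

    probs[i] is the assumed probability that arm i+1 is optimal.
    """
    rng = random.Random(seed)
    arms = list(range(1, len(probs) + 1))  # arms are labelled 1, 2, 3, ...
    return rng.choices(arms, weights=probs, k=n_samples)


def mode_arm(samples):
    """Return the arm label that appears most often in the samples."""
    return Counter(samples).most_common(1)[0][0]


# Hypothetical probabilities that arms 1, 2, 3 are optimal at time t+1
# (made-up values, not taken from the paper).
p_next = [0.1, 0.3, 0.6]
n_next = 100  # the N_t+1 from my example

samples = draw_candidates(p_next, n_next)  # my reading of line 25
print(Counter(samples))                    # how often each arm was drawn
print("arm to play:", mode_arm(samples))   # my reading of line 26
```

If this reading is right, each draw is a single arm label, so the 100 draws form a list of 1s, 2s, and 3s in random order, and the arm played is whichever label shows up most often.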
This is from page 8: