This is a follow-up to this question, which concerns handling arbitrary (potentially unbounded) reward distributions in the multi-armed bandit problem. Given a sequence of observed rewards $r_t \in \mathbb{R}$ for arm $i$, one could approximate the true reward distribution $R_i$ with a Gaussian posterior. It also occurred to me that one could instead use adaptive kernel density estimation (KDE) to approximate the reward distribution. Has the application of adaptive KDE to Thompson sampling been studied? How does it compare empirically with the Gaussian approach, or with other possible approaches?
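
To make the idea concrete, here is roughly what I have in mind as a minimal sketch. A few assumptions baked in: SciPy's fixed-bandwidth `gaussian_kde` (Scott's rule) stands in for a true adaptive KDE, `kde_thompson_step` is just a name I made up, and treating a single draw from each arm's estimated reward density as a posterior-style sample is a loose, bootstrap-like interpretation of Thompson sampling rather than an exact one:

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_thompson_step(rewards_per_arm):
    """One round of KDE-flavoured Thompson sampling: draw one value
    from each arm's fitted KDE and play the arm with the largest draw."""
    draws = []
    for i, rewards in enumerate(rewards_per_arm):
        if len(rewards) < 2:
            return i  # too little data to fit a KDE; force exploration
        kde = gaussian_kde(np.asarray(rewards))  # fixed bandwidth stand-in
        draws.append(kde.resample(1)[0, 0])     # one sample from the density
    return int(np.argmax(draws))

# Toy usage: two arms with heavy-tailed (Student-t) rewards.
rng = np.random.default_rng(0)
arms = [lambda: rng.standard_t(df=2) + 0.5,  # better arm
        lambda: rng.standard_t(df=2)]
history = [[], []]
for _ in range(200):
    a = kde_thompson_step(history)
    history[a].append(arms[a]())
print([len(h) for h in history])  # the better arm should be pulled more often
```

Replacing `gaussian_kde` with an adaptive-bandwidth estimator would presumably sharpen the density where data is plentiful while keeping heavy tails wide, which is what makes me wonder how it compares to a Gaussian posterior in practice.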