I read on Wikipedia that Thompson sampling consists of playing the action $a \in \mathcal{A}$ according to the probability that this action maximizes the expected reward.
This probability seems to be:
$$\int \mathbb{I}\left[\mathbb{E}(r \mid a, \theta) = \max_{a'} \mathbb{E}(r \mid a', \theta)\right] P(\theta \mid \mathcal{D}) \, d\theta$$
How does one derive this equation? That is, why does the value of the integral above equal the probability that action $a$ maximizes the expected reward?
This equation can also be found in papers on Thompson sampling, e.g. the first equation here.
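To make the quantity concrete, here is a minimal Monte Carlo sketch of how I currently read the integral. It assumes a Bernoulli bandit in which each action has an independent Beta posterior, so that $\mathbb{E}(r \mid a, \theta) = \theta_a$. That setup, and the names `optimal_action_probabilities` and `thompson_step`, are my own illustrative assumptions, not something taken from the Wikipedia article. The idea is that averaging the indicator over draws of $\theta$ from $P(\theta \mid \mathcal{D})$ estimates the integral, and that a single draw followed by an argmax would select action $a$ with exactly that probability.

```python
import random


def optimal_action_probabilities(posteriors, n_samples=100_000, seed=0):
    """Estimate, for each action, the posterior probability that it is optimal.

    posteriors: list of (alpha, beta) Beta parameters, one pair per action
    (assumed Bernoulli rewards with independent Beta posteriors).
    """
    rng = random.Random(seed)
    counts = [0] * len(posteriors)
    for _ in range(n_samples):
        # Draw theta ~ P(theta | D); here E(r | a, theta) is simply theta[a].
        theta = [rng.betavariate(a, b) for a, b in posteriors]
        # The indicator in the integral is 1 only for the maximizing action.
        best = max(range(len(theta)), key=theta.__getitem__)
        counts[best] += 1
    # Averaging the indicator over the draws approximates the integral.
    return [c / n_samples for c in counts]


def thompson_step(posteriors, rng):
    """One Thompson sampling choice: draw theta once, play its argmax."""
    theta = [rng.betavariate(a, b) for a, b in posteriors]
    return max(range(len(theta)), key=theta.__getitem__)


if __name__ == "__main__":
    # Hypothetical posteriors for two actions.
    print(optimal_action_probabilities([(2, 1), (1, 1)]))
```

For the hypothetical posteriors Beta(2, 1) and Beta(1, 1), the first entry should come out close to $\int_0^1 2x \cdot x \, dx = 2/3$, which matches evaluating the integral by hand for this special case.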