Why is pseudo-regret, and not regret, used in adversarial bandits?

Asked Feb 09 '19 at 11:58

In adversarial settings, the pseudo-regret rather than the actual regret is used. The explanation I have been given is that with the actual regret the problem is no longer learnable (that is, the adversary can generate losses so that the regret is no longer sublinear).
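
For reference, here is a sketch of the usual definitions of the two quantities for a loss-based game. The notation is an assumption introduced for illustration: $K$ arms, horizon $n$, loss $\ell_{i,t}$ of arm $i$ at round $t$, and $I_t$ the arm the player pulls at round $t$. Expectations are over the player's randomization and, where applicable, the adversary's.

$$
\begin{aligned}
\text{expected regret:}\quad \mathbb{E}[R_n] &= \mathbb{E}\left[\sum_{t=1}^{n} \ell_{I_t,t} - \min_{i \in \{1,\dots,K\}} \sum_{t=1}^{n} \ell_{i,t}\right], \\
\text{pseudo-regret:}\quad \overline{R}_n &= \mathbb{E}\left[\sum_{t=1}^{n} \ell_{I_t,t}\right] - \min_{i \in \{1,\dots,K\}} \mathbb{E}\left[\sum_{t=1}^{n} \ell_{i,t}\right].
\end{aligned}
$$

The only difference is whether the minimum over arms sits inside or outside the expectation. Since $\mathbb{E}[\min_i X_i] \le \min_i \mathbb{E}[X_i]$, this gives $\overline{R}_n \le \mathbb{E}[R_n]$, so pseudo-regret is the weaker notion.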

But I don't see how the adversary can do such a thing. Can you give a strategy for the adversary that makes it impossible for any policy to achieve sublinear regret?

asked Feb 09 '19 at 11:58

Inspired_Blue

Could you clarify exactly what you're referring to by pseudo-regret and "actual regret"? – JP Trawinski Feb 15 '19 at 16:26

0 Answers