I am studying the Bandit Algorithms book by Tor Lattimore and Csaba Szepesvári, and I have read the chapter on the adversarial bandit problem. However, I do not understand the mechanism of the adversarial bandit. The book says that the regret of any deterministic policy is linear because, in the adversarial setting, we assume the policy is revealed to the adversary, who then chooses the rewards based on it; the regret is therefore linear. In this setting, does "deterministic" mean, for example, that we always play arm 2, or just that the policy is any deterministic function of the history? I would be thankful if someone could explain the mechanism of the adversarial bandit, and also its contextual form.
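To make my reading of the mechanism concrete, here is a small simulation I sketched myself (it is not from the book): since the adversary knows the deterministic rule, it can simulate it, predict the arm about to be played, and assign that arm reward 0 and every other arm reward 1. This forces regret of order $n$ no matter which deterministic rule is used. The specific policy below is just an example of a deterministic rule; any other deterministic rule suffers the same fate.

    import numpy as np

    K, n = 3, 1000

    def deterministic_policy(history):
        # Example deterministic rule: play the arm with the highest
        # observed cumulative reward so far (ties broken by index).
        totals = np.zeros(K)
        for arm, reward in history:
            totals[arm] += reward
        return int(np.argmax(totals))

    history = []
    learner_total = 0.0
    arm_totals = np.zeros(K)  # hindsight reward of each fixed arm

    for t in range(n):
        arm = deterministic_policy(history)  # the adversary can simulate this
        x = np.ones(K)                       # adversary's reward vector x_t
        x[arm] = 0.0                         # punish the predicted arm
        learner_total += x[arm]
        arm_totals += x
        history.append((arm, x[arm]))

    regret = arm_totals.max() - learner_total
    print(f"regret after n={n} rounds: {regret:.0f}")  # grows like n, i.e. linear

Running this, the learner collects reward 0 in every round, while the best fixed arm in hindsight collects close to $n$, so the regret is linear in $n$.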
Moreover, I am wondering why the regret for the contextual bandit is defined as follows:
$$R_{n}= \mathbf{E} \left[ \sum_{c \in C} \max_{i \in [K]} \sum_{t \in [n]: \: c_t=c} (x_{it}-X_t) \right]$$
Does it mean that the benchmark fixes one arm per context, applied across all the periods in which that context appears? I thought the regret should instead be defined as follows:
$$ R_n= \mathbf{E} \left[ \max_{i \in [K]} \sum_{t \in [n]} \max_{c_t \in C} (x_{it}-X_t) \right] $$
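To make my confusion concrete, here is a toy numerical comparison I wrote (the numbers are my own, and I am reading my alternative formula as a single-best-fixed-arm benchmark over all rounds): with two contexts in which different arms are optimal, the book's per-context benchmark is strictly larger than a single-fixed-arm benchmark.

    # Toy illustration (my own numbers, not from the book). Two contexts A and B,
    # two arms; in context A arm 0 always pays 1 and arm 1 pays 0, in context B
    # it is the other way around. Suppose the learner collected 0 reward overall
    # and each context occurred in 50 of the n = 100 rounds.

    n_A = n_B = 50
    learner_total = 0.0

    # Book's benchmark: pick the best arm separately for each context.
    benchmark_per_context = max(1 * n_A, 0 * n_A) + max(0 * n_B, 1 * n_B)  # = 100

    # Alternative benchmark: one fixed arm for all rounds, which is how I
    # read my proposed definition. The single best arm wins in only one context.
    benchmark_single_arm = max(1 * n_A + 0 * n_B, 0 * n_A + 1 * n_B)       # = 50

    print("per-context regret:", benchmark_per_context - learner_total)   # 100.0
    print("single-arm regret :", benchmark_single_arm - learner_total)    # 50.0

So the two definitions measure against different benchmarks, and the book's version is the stronger one. Is that the intended reason for defining the contextual regret with the sum over contexts outside the max?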