I am referring to pages 130-131 of Sutton and Barto's book on Reinforcement Learning, available here: book
I don't understand the slight difference between the two procedural algorithms described on page 130 (Sarsa) and on page 131 (Q-learning).
In the first case (Sarsa), the $\varepsilon$-greedy choice of action $A$ is inside the loop for each episode but before the loop for each step of the episode, while in the second case (Q-learning), the $\varepsilon$-greedy choice of action $A$ is inside the loop for each step of the episode. Does this imply any real difference between the two algorithms (apart from the update rule for $Q(S,A)$, of course), or is it only a formal one?
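For reference, here is a minimal runnable sketch of the two loop structures as I read them from the book. The toy `ChainEnv`, the `epsilon_greedy` helper, and the hyperparameters `alpha`, `gamma`, `eps` are my own placeholders (not from the book); only the position of the action choice is meant to mirror the pseudocode on pages 130-131:

```python
import numpy as np

class ChainEnv:
    """Hypothetical toy MDP: walk right from state 0 to state n-1; reaching the end gives reward 1."""
    def __init__(self, n_states=6):
        self.n_states = n_states
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        # action 0 = left, action 1 = right
        self.s = max(0, self.s - 1) if a == 0 else min(self.n_states - 1, self.s + 1)
        done = self.s == self.n_states - 1
        return self.s, (1.0 if done else 0.0), done

def epsilon_greedy(Q, s, eps, rng):
    """Pick a random action with probability eps, otherwise the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

n_actions, alpha, gamma, eps = 2, 0.1, 0.95, 0.1
rng = np.random.default_rng(0)
env = ChainEnv()

# --- Sarsa, as on p. 130: A is chosen once per episode, before the step loop ---
Q = np.zeros((env.n_states, n_actions))
for episode in range(200):
    s = env.reset()
    a = epsilon_greedy(Q, s, eps, rng)        # choice OUTSIDE the step loop
    done = False
    while not done:
        s2, r, done = env.step(a)
        a2 = epsilon_greedy(Q, s2, eps, rng)  # next action, reused on the next step
        Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])
        s, a = s2, a2

# --- Q-learning, as on p. 131: A is chosen at the top of every step ---
Q = np.zeros((env.n_states, n_actions))
for episode in range(200):
    s = env.reset()
    done = False
    while not done:
        a = epsilon_greedy(Q, s, eps, rng)    # choice INSIDE the step loop
        s2, r, done = env.step(a)
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2
```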
To be more precise: can I move the $\varepsilon$-greedy choice of action $A$ inside the loop for each step of the episode in the Sarsa algorithm as well?
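For concreteness, this is the rearrangement I have in mind (continuing the sketch above, with the same `ChainEnv`, `epsilon_greedy`, and hyperparameters); I am asking whether this is still the same algorithm as the Sarsa pseudocode in the book:

```python
# Proposed rearrangement of Sarsa: choose A at the top of every step, like the
# Q-learning skeleton, instead of carrying A' over from the previous iteration.
Q = np.zeros((env.n_states, n_actions))
for episode in range(200):
    s = env.reset()
    done = False
    while not done:
        a = epsilon_greedy(Q, s, eps, rng)    # choice now INSIDE the step loop
        s2, r, done = env.step(a)
        a2 = epsilon_greedy(Q, s2, eps, rng)  # used only in the update target here
        Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])
        s = s2                                # note: a2 is NOT reused on the next step
```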