In the RL book by Sutton and Barto, the authors use the policy improvement theorem to show that any $\epsilon$-greedy policy with respect to $q_{\pi}$ is an improvement over any $\epsilon$-soft policy $\pi$. Let $\pi^{'}$ be the $\epsilon$-greedy policy. In this derivation, I couldn't understand how the authors went from equation 1 to equation 2.
Equation 1: $q_{\pi}(s,\pi^{'}(s)) = \sum_{a}\pi^{'}(a|s)\,q_{\pi}(s,a)$
Equation 2: $q_{\pi}(s,\pi^{'}(s)) = \frac{\epsilon}{|A(s)|}\sum_{a} q_{\pi}(s,a) + (1 - \epsilon)\max_{a}q_{\pi}(s,a)$
As far as I understand, we choose a non-greedy action with probability $\epsilon$ and the greedy action with probability $1 - \epsilon$. But then how did we end up with $\frac{\epsilon}{|A(s)|}$ as the weight for the non-greedy actions? Shouldn't it be $\frac{\epsilon}{\text{number of non-greedy actions}}$, so that the weights sum to 1, as they are probabilities after all?
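To make the two weightings concrete, here is a small Python sketch. The function names and the Q-values are made up for illustration. It assumes the book's definition of $\epsilon$-greedy, in which every action receives probability $\frac{\epsilon}{|A(s)|}$ and the greedy action receives an additional $1 - \epsilon$; the second function is the weighting I had in mind, and it assumes at least two actions.

```python
def book_epsilon_greedy(q_values, epsilon):
    """Assumed book definition: every action gets epsilon/|A(s)|,
    and the greedy action gets an extra 1 - epsilon on top."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def alternative_epsilon_greedy(q_values, epsilon):
    """The weighting from my question: epsilon is split over the
    non-greedy actions only, the greedy action gets exactly 1 - epsilon.
    Requires at least two actions."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / (n - 1)] * n
    probs[greedy] = 1.0 - epsilon
    return probs


def expected_q(probs, q_values):
    """Right-hand side of Equation 1: sum over a of pi'(a|s) * q(s, a)."""
    return sum(p * q for p, q in zip(probs, q_values))


def equation_2(q_values, epsilon):
    """Right-hand side of Equation 2."""
    n = len(q_values)
    return epsilon / n * sum(q_values) + (1.0 - epsilon) * max(q_values)


# Hypothetical action values for a single state with three actions.
q = [1.0, 2.0, 4.0]
eps = 0.3

book = book_epsilon_greedy(q, eps)
alt = alternative_epsilon_greedy(q, eps)

print("book weights:", book, "sum =", sum(book))
print("alternative weights:", alt, "sum =", sum(alt))
print("Equation 1 with book weights:", expected_q(book, q))
print("Equation 1 with alternative weights:", expected_q(alt, q))
print("Equation 2:", equation_2(q, eps))
```

Under these assumptions both sets of weights sum to 1, but only the first one makes Equation 1 agree with Equation 2 numerically.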
Am I missing something here? Please help me out; I am a beginner in RL. Thanks.