I found two different versions of the $\epsilon$-greedy policy, one for Monte Carlo and one for Q-learning:
For Monte Carlo: $\pi(a|s) = \epsilon/m + 1 - \epsilon$ for the best action and $\pi(a|s) = \epsilon/m$ for every other action, where $m$ is the number of actions.
For Q-learning: with probability $1-\epsilon$ choose the best action, and with probability $\epsilon$ choose an action uniformly at random from the possible actions.
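To make the comparison concrete, here is a minimal sketch (my own code, not from either textbook) that samples an action under each formulation exactly as stated above:

```python
import numpy as np

def eps_greedy_mc(q_values, epsilon, rng):
    # Monte Carlo formulation: explicit per-action probabilities.
    # pi(a|s) = eps/m + 1 - eps for the greedy action, eps/m otherwise.
    m = len(q_values)
    probs = np.full(m, epsilon / m)
    probs[np.argmax(q_values)] += 1.0 - epsilon
    return int(rng.choice(m, p=probs))

def eps_greedy_q(q_values, epsilon, rng):
    # Q-learning formulation: with probability 1 - eps act greedily,
    # with probability eps pick uniformly among all actions
    # (here the uniform draw includes the greedy action).
    m = len(q_values)
    if rng.random() < epsilon:
        return int(rng.integers(m))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
q = np.array([0.1, 0.5, 0.2])
print(eps_greedy_mc(q, 0.1, rng), eps_greedy_q(q, 0.1, rng))
```

Note that whether the uniform draw in the second version includes or excludes the greedy action changes the resulting per-action probabilities, so that choice matters for the comparison.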
Both are referred to as the epsilon-greedy policy. Are they different? (I think they are.) Am I missing something here, or do they really just share the same name?
P/S: I am pretty sure they are different now; I am just a little confused about the names and what each one means in the two different methods (Monte Carlo and Q-learning).