There is a two-player game (discrete, deterministic, perfect information, and so on) where, in some but not all states, several moves are equally good: they are symmetric, and an expert player would expect the same outcome from any of them. How can a neural network agent be trained to treat those moves equally, and how should accuracy be measured for such an approach?
Or does such an approach not make sense at all? Should the policy instead stick to a single variant (e.g. "go left side"), so that exactly one move can be designated as the valid one for a given game state?
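
To make the two options concrete, here is a minimal sketch in plain Python of what I have in mind. It assumes the policy outputs one logit per move and that the set of equally good moves is known for each training state. All function names (`uniform_target`, `set_loss`, `set_accuracy`) are hypothetical and only serve as illustration; they are not taken from any library.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of move logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def uniform_target(num_moves, good_moves):
    """Soft label: spread probability evenly over the equally good moves."""
    p = 1.0 / len(good_moves)
    return [p if i in good_moves else 0.0 for i in range(num_moves)]

def cross_entropy(logits, target):
    """Cross-entropy against a soft target; pushes the policy to split
    its probability mass evenly between the good moves."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def set_loss(logits, good_moves):
    """Alternative: negative log of the total probability on any good move.
    The policy may put all its mass on a single variant (e.g. always 'left')."""
    probs = softmax(logits)
    return -math.log(sum(probs[i] for i in good_moves))

def set_accuracy(batch_logits, batch_good_moves):
    """A prediction counts as correct if the top-scoring move
    is any member of the set of equally good moves."""
    hits = 0
    for logits, good in zip(batch_logits, batch_good_moves):
        best = max(range(len(logits)), key=lambda i: logits[i])
        hits += best in good
    return hits / len(batch_logits)

# Hypothetical example: 4 possible moves, moves 0 and 3 are symmetric.
target = uniform_target(4, {0, 3})
acc = set_accuracy([[2.0, 0.1, 0.0, 1.9], [0.0, 3.0, 0.0, 0.0]],
                   [{0, 3}, {2}])
```

With `cross_entropy` and a uniform target the network is asked to treat the symmetric moves equally, whereas `set_loss` only asks it to pick some good move, which is closer to the "stick with one variant" idea. In both cases `set_accuracy` seems like a reasonable metric, since it does not penalize the agent for choosing a different but equally good move.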