I am wondering if there is an interpretation of the Bandit Problem with more than one state. I know there are versions that view each slot machine as an independent Markovian machine, so that an arm's state evolves when that arm is pulled.
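To be explicit about the version I mean (this is my own notation, so please correct me if it is off): each arm $i$ carries its own state $s_i$, and only the pulled arm's state moves,

$$
\text{pull arm } i:\qquad r_t = r_i\!\left(s_i^{t}\right), \qquad s_i^{t+1} \sim P_i\!\left(\cdot \mid s_i^{t}\right), \qquad s_j^{t+1} = s_j^{t} \;\text{ for } j \neq i .
$$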
However, I cannot seem to find any discussion of incorporating a state that is based, more or less, on the player's psychological/belief state. What I mean is that there should be some distinction between the scenario where I have won \$5000 after ten trials and the scenario where I have lost \$5000 after ten trials. The way I see it, whether I have won or lost a bunch of money would certainly affect how I make decisions.
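To make concrete what I mean, here is a toy sketch (the payout distributions and the \$5000 threshold are made up purely for illustration): the "state" the policy reacts to is the player's cumulative winnings, not just per-arm reward estimates.

```python
import random

# Toy illustration only: the payout distributions and the -5000 threshold
# are invented to show a policy that depends on the player's wealth "state".
ARM_PAYOUTS = [
    lambda: random.gauss(0, 100),   # arm 0: zero-mean, high variance
    lambda: random.gauss(-1, 10),   # arm 1: slightly losing, low variance
]

def risk_sensitive_policy(cumulative_winnings):
    """Pick an arm based on how much the player is up or down so far."""
    if cumulative_winnings < -5000:
        return 0   # deep in the red: gamble on the high-variance arm
    return 1       # otherwise play it safe

def run(n_trials=10):
    wealth = 0.0
    for _ in range(n_trials):
        arm = risk_sensitive_policy(wealth)
        wealth += ARM_PAYOUTS[arm]()   # pull the arm, update the wealth state
    return wealth

if __name__ == "__main__":
    print(run())
```

In the standard formulation the decision rule would depend only on the observed rewards of each arm; here it also depends on the running total, which is the kind of variation I am asking about.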
The apparent lack of such variations of the Bandit Problem seems to imply that they are not particularly useful or practical, so I would very much appreciate it if someone could shed some light on why.