In a previous question I asked about using OpenAI Gym as a vehicle for modeling business problems as MDPs. A comment suggested that I start a new question with a more refined scope. In general, I'm interested in RL for combinatorial optimization. As an example, I'd like to see a solution to the Traveling Salesman Problem (TSP) implemented in OpenAI Gym.
I picked TSP because it's an established problem in combinatorial optimization, so I wouldn't be surprised if someone already has an implementation available. However, any combinatorial optimization problem framed as an MDP and implemented in OpenAI Gym would meet the "ask." The goal is to get enough context to know how to frame my own problems as MDPs in this powerful API.
Edit: Per request, an MDP (Markov Decision Process) is a bit like a Hidden Markov Model, but rather than states and emissions $\{X,Y\}$, there are states, actions, and rewards $\{S,A,R\}$. An action taken in a given state influences the state transition probability of $s\to s'$, and a given action in the context of a specific state yields a reward with distribution $P(R=r|S=s,A=a)$. These problems are typically solved via dynamic programming (as the discrete-time HMM is).
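To make the dynamic-programming route concrete, here is a minimal sketch of value iteration on a tiny, entirely hypothetical two-state MDP. The table `P`, the function `value_iteration`, and the discount `gamma` are illustrative names and assumptions, not part of any library or of my actual problem.

```python
# Hypothetical toy MDP: 2 states, 2 actions.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Repeat Bellman optimality backups until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
print(policy)
```

This only works because the full transition table is written out explicitly, which is what a custom solver needs and a black-box method does not.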
However, I'm looking for a unified framework approach: specifying an MDP as a function in OpenAI Gym and solving it with flexible, black-box methods, as opposed to potentially writing a custom MDP solver.
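For illustration, this is roughly the shape of environment I have in mind, as a sketch only. It mirrors the classic Gym `reset()` / `step()` convention, where `step` returns `(observation, reward, done, info)`, but deliberately does not import `gym`, so it runs standalone. A real version would presumably subclass `gym.Env` and declare `observation_space` and `action_space`. The class name `TSPEnv`, the state encoding, the reward design (negative distance travelled), and the helpers `rollout` and `nearest_neighbor` are all my own assumptions.

```python
import math
import random


class TSPEnv:
    """Gym-style TSP sketch: state = (current city, visited flags)."""

    def __init__(self, n_cities=5, seed=0):
        # Assumes n_cities >= 2; cities are random points in the unit square.
        rng = random.Random(seed)
        self.n = n_cities
        self.coords = [(rng.random(), rng.random()) for _ in range(n_cities)]

    def dist(self, i, j):
        (x1, y1), (x2, y2) = self.coords[i], self.coords[j]
        return math.hypot(x1 - x2, y1 - y2)

    def _obs(self):
        return (self.current, tuple(self.visited))

    def reset(self):
        # The tour always starts at city 0.
        self.current = 0
        self.visited = [False] * self.n
        self.visited[0] = True
        return self._obs()

    def step(self, action):
        # An action is the index of the next city to visit.
        if self.visited[action]:
            raise ValueError("city already visited")
        reward = -self.dist(self.current, action)
        self.current = action
        self.visited[action] = True
        done = all(self.visited)
        if done:
            # Close the tour by returning to the start city.
            reward -= self.dist(action, 0)
        return self._obs(), reward, done, {}


def nearest_neighbor(env, obs):
    """Baseline policy: go to the closest unvisited city."""
    current, visited = obs
    unvisited = (j for j in range(env.n) if not visited[j])
    return min(unvisited, key=lambda j: env.dist(current, j))


def rollout(env, policy):
    """Run one episode and return the total reward (negative tour length)."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(env, obs))
        total += reward
    return total


env = TSPEnv(n_cities=6, seed=1)
print("tour length:", -rollout(env, nearest_neighbor))
```

The hand-written `nearest_neighbor` policy is only a stand-in here; the point of the question is that a black-box RL agent would take its place and interact with the environment through the same `reset()` and `step()` calls.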