1

In a previous question I asked about use of Open AI Gym as a vehicle for modeling business problems as MDPs. A comment suggested that I start a new question with more refined scope. In general, I'm interested in RL for combinatorial optimization. As an example, I'd like to see a solution to the Traveling Salesman Problem (TSP) implemented in Open AI Gym.

I picked TSP because it's an established problem in combinatorial optimization and so I wouldn't be surprised if someone already has an implementation available. However, any combinatorial optimization problem, framed as an MDP, implemented in Open AI Gym would meet the "ask." The goal is getting enough context to know how to frame my own problems as MDPs in this powerful API.

Edit: Per request, an MDP or Markov Decision Process is a bit like a Hidden Markov Model. But rather than states and emissions $\{X,Y\}$, there are states, actions, and rewards $\{S,A,R\}$. An action in a given state influences the state transition probability of $s\to s'$ and a given action in the context of a specific state will culminate in a reward. $P(R=r|S=s,A=a)$. These problems are typically solved via dynamic programming (as the discrete-time HMM is.)

However, I'm looking for a unified framework approach to specifying an MDP as a function in OpenAIGym and solving through flexible, black box methods, as opposed to potentially writing a custom MDP solver.

Carl
  • 11,532
  • 7
  • 45
  • 102
jbuddy_13
  • 1,578
  • 3
  • 22
  • 1
    Are you sure you want to start with combinatorial optimization? Whilst you can use RL to approach them, its performance is not typically as good as purpose built solvers. It is highly unlikely you could build a RL solver that competed with Lin-Kernigan for example. An online stochastic optimisation problem might be more suitable? – Neil Slater Mar 02 '21 at 21:30
  • Hi @NeilSlater, my current solution(s) are genetic algorithms and Bayesian optimization (via GPs.) At this point, I'm interested in exploring what RL could do for a textbook "business problem" and thought Com. Op. was a good enough example. However, I'm curious- by online stochastic optimization, did you have something other than BO in mind? I'm open to exploring new ideas – jbuddy_13 Mar 03 '21 at 18:16
  • If you want to compare with GAs and Bayesian optimisation, then RL should compare well. Specifically for TSP I was thinking of this: http://webhotel4.ruc.dk/~keld/research/LKH/LKH-2.0/DOC/LKH_REPORT.pdf which gives fast, state-of-the-art results and will beat all three of the learning/search approaches – Neil Slater Mar 03 '21 at 19:47

0 Answers0