I'm working on an implementation of the MAXQ framework for offline (batch) hierarchical RL and am looking for a data generator for reinforcement learning. I've seen scikit-learn's (and similar libraries') data generation methods, but they don't seem to be aimed at RL models.
Any tips on where to look for this?