Stable version; learning confirmed with PPO, TRPO, and MMPO