OpenAI Gym Beta for developing and comparing reinforcement learning algorithms

We’re releasing the public beta of OpenAI Gym, a toolkit for developing and comparing reinforcement learning (RL) algorithms. It consists of a growing suite of environments (from simulated robots to Atari games), and a site for comparing and reproducing results. OpenAI Gym is compatible with algorithms written in any framework, such as TensorFlow and Theano. The environments are written in Python, but we’ll soon make them easy to use from any language.

We originally built OpenAI Gym as a tool to accelerate our own RL research. We hope it will be just as useful for the broader community.

Getting started

If you’d like to dive in right away, you can work through our tutorial. You can also help out while learning by reproducing a result.
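
If you just want a quick feel for the interface before the tutorial, here is a minimal sketch of the agent-environment loop, assuming the CartPole-v0 environment from the beta release and an agent that simply samples random actions:

```python
import gym

# Create an environment and run a single episode with a random policy.
env = gym.make('CartPole-v0')
observation = env.reset()
for t in range(200):
    env.render()
    action = env.action_space.sample()  # sample a random action
    observation, reward, done, info = env.step(action)
    if done:
        print("Episode finished after {} timesteps".format(t + 1))
        break
```

Each call to env.step returns the next observation, the reward, a done flag marking the end of the episode, and a dictionary of diagnostic info.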

Why RL?

Reinforcement learning (RL) is the subfield of machine learning concerned with decision making and motor control. It studies how an agent can learn to achieve goals in a complex, uncertain environment. It’s exciting for two reasons:

  • RL is very general, encompassing all problems that involve making a sequence of decisions: for example, controlling a robot’s motors so that it’s able to run and jump, making business decisions like pricing and inventory management, or playing video games and board games. RL can even be applied to supervised learning problems with sequential or structured outputs.
  • RL algorithms have started to achieve good results in many difficult environments. RL has a long history, but until recent advances in deep learning, it required lots of problem-specific engineering. DeepMind’s Atari results, BRETT from Pieter Abbeel’s group, and AlphaGo all used deep RL algorithms that make few assumptions about their environment and thus can be applied in other settings.

However, RL research is also slowed down by two factors:

  • The need for better benchmarks. In supervised learning, progress has been driven by large labeled datasets like ImageNet. In RL, the closest equivalent would be a large and diverse collection of environments. However, the existing open-source collections of RL environments don’t have enough variety, and they are often difficult to even set up and use.
  • Lack of standardization of environments used in publications. Subtle differences in the problem definition, such as the reward function or the set of actions, can drastically alter a task’s difficulty. This issue makes it difficult to reproduce published research and compare results from different papers.

OpenAI Gym is an attempt to fix both problems.

The Environments

OpenAI Gym provides a diverse suite of environments that range from easy to difficult and involve many different kinds of data. We’re starting out with the following collections:

  • Classic control and toy text: complete small-scale tasks, mostly from the RL literature. They’re here to get you started.
  • Algorithmic: perform computations such as adding multi-digit numbers and reversing sequences. One might object that these tasks are easy for a computer. The challenge is to learn these algorithms purely from examples. These tasks have the nice property that it’s easy to vary the difficulty by varying the sequence length.
  • Atari: play classic Atari games. We’ve integrated the Arcade Learning Environment (which has had a big impact on reinforcement learning research) in an easy-to-install form.
  • Board games: play Go on 9×9 and 19×19 boards. Two-player games are fundamentally different from the other settings we’ve included, because there is an adversary playing against you. In our initial release, there is a fixed opponent provided by Pachi, and we may add other opponents later (patches welcome!). We’ll also likely expand OpenAI Gym to have first-class support for multi-player games.
  • 2D and 3D robots: control a robot in simulation. These tasks use the MuJoCo physics engine, which was designed for fast and accurate robot simulation. Included are some environments from a recent benchmark by UC Berkeley researchers (who incidentally will be joining us this summer). MuJoCo is proprietary software, but offers free trial licenses.
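
All of these collections share the same Python interface; only the observation and action spaces differ from task to task. A minimal sketch, assuming a few environment IDs from the beta release (the Atari environment additionally requires the optional Atari dependencies):

```python
import gym

# The same make/reset/step interface across collections; only the
# observation and action spaces differ. Environment IDs are assumed
# from the beta release.
for env_id in ['FrozenLake-v0', 'Copy-v0', 'MsPacman-v0']:
    env = gym.make(env_id)
    print(env_id, env.observation_space, env.action_space)
```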

Over time, we plan to greatly expand this collection of environments. Contributions from the community are more than welcome.

Each environment has a version number (such as Hopper-v0). If we need to change an environment, we’ll bump the version number, defining an entirely new task. This ensures that results on a particular environment are always comparable.
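
The version suffix is part of the environment ID itself, so an experiment always names the exact task definition it ran against. A small sketch, assuming the envs.registry helper from the beta release:

```python
from gym import envs

# List registered environment IDs; the '-v0' suffix is part of each ID,
# so a reported result always refers to one exact task definition.
# (Assumes the envs.registry helper from the beta release.)
all_ids = sorted(spec.id for spec in envs.registry.all())
print(len(all_ids), 'registered environments')
print([env_id for env_id in all_ids if env_id.startswith('Hopper')])
```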

Evaluations

We’ve made it easy to upload results to OpenAI Gym. However, we’ve opted not to create traditional leaderboards. What matters for research isn’t your score (it’s possible to overfit or hand-craft solutions to particular tasks), but instead the generality of your technique.
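
As a rough sketch of the record-and-upload flow, assuming the beta-era env.monitor wrapper and gym.upload helper (and an API key from your OpenAI Gym account):

```python
import gym

# Record an experiment to a local directory, then upload it for evaluation.
# (Assumes the beta-era env.monitor and gym.upload helpers; replace the
# API key placeholder with your own.)
env = gym.make('CartPole-v0')
env.monitor.start('/tmp/cartpole-experiment-1')
for episode in range(20):
    observation = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()  # replace with your agent's policy
        observation, reward, done, info = env.step(action)
env.monitor.close()

gym.upload('/tmp/cartpole-experiment-1', api_key='YOUR_API_KEY')
```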

We’re starting out by maintaining a curated list of contributions that say something interesting about algorithmic capabilities. Long-term, we want this curation to be a community effort rather than something owned by us. We’ll necessarily have to figure out the details over time, and we’d love your help in doing so.

We want OpenAI Gym to be a community effort from the beginning. We’ve started working with partners to put together resources around OpenAI Gym:

  • NVIDIA: technical Q&A with John.
  • Nervana: implementation of a DQN OpenAI Gym agent.
  • Amazon Web Services (AWS): $250 credit vouchers for select OpenAI Gym users. If you have an evaluation demonstrating the promise of your algorithm and are resource-constrained from scaling it up, ping us for a voucher. (While supplies last!)

During the public beta, we’re looking for feedback on how to make this into an even better tool for research. If you’d like to help, you can try your hand at improving the state-of-the-art on each environment, reproducing other people’s results, or even implementing your own environments. Also, please join us in the community chat!
