A Survey of Open-Source Code Related to (Meta-)Reinforcement Learning

Local code: https://github.com/lucifer2859/meta-RL

Introduction to meta-reinforcement learning: https://www.cnblogs.com/lucifer1997/p/13603979.html

 

I. Meta-RL

1. Learning to Reinforcement Learn: CogSci 2017

  • https://github.com/awjuliani/Meta-RL
    • Environment: TensorFlow, CPU;
    • Tasks: Dependent (Easy, Medium, Hard, Uniform) / Independent / Restless Bandit, Contextual Bandit, GridWorld
      • A3C-Meta-Bandit - Set of bandit tasks described in the paper, including Independent, Dependent, and Restless bandits.
      • A3C-Meta-Context - Rainbow bandit task using randomized colors to indicate the reward-giving arm in each episode.
      • A3C-Meta-Grid - Rainbow Gridworld task; a variation of gridworld in which goal colors are randomized each episode and must be learned "on the fly."
    • Model: one-layer LSTM A3C [Figure 1(a), without the encoder layer]; a minimal sketch of the recurrent input scheme follows this entry;
    • Experiments: runs successfully with no bugs; training converges; results roughly match the paper; performance does not reach the paper's reported level under the current hyperparameters; the local code modifies the repo slightly, see https://github.com/lucifer2859/meta-RL/tree/master/Meta-RL
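
The core trick in this paper is that the previous action and reward are fed back into a recurrent actor-critic, so fast adaptation happens in the LSTM state rather than in the weights. Below is a minimal PyTorch sketch of that input/recurrence scheme only; the repo above is written in TensorFlow, and the layer sizes and names here are illustrative, not the repo's actual values.

```python
import torch
import torch.nn as nn

class MetaRLBanditAgent(nn.Module):
    """Illustrative one-layer LSTM actor-critic for bandit meta-RL:
    each step's input is (one-hot previous action, previous reward, timestep)."""

    def __init__(self, n_arms: int, hidden_size: int = 48):
        super().__init__()
        self.lstm = nn.LSTM(n_arms + 2, hidden_size, num_layers=1, batch_first=True)
        self.policy_head = nn.Linear(hidden_size, n_arms)  # actor logits
        self.value_head = nn.Linear(hidden_size, 1)         # critic baseline

    def forward(self, prev_action_onehot, prev_reward, timestep, hidden):
        x = torch.cat([prev_action_onehot, prev_reward, timestep], dim=-1)
        out, hidden = self.lstm(x.unsqueeze(1), hidden)      # one recurrent step
        out = out.squeeze(1)
        return self.policy_head(out), self.value_head(out), hidden

# The hidden state is carried across steps within an episode and reset between
# episodes, so adaptation to a new bandit lives in the recurrent activations.
agent = MetaRLBanditAgent(n_arms=2)
hidden = None
logits, value, hidden = agent(torch.zeros(1, 2), torch.zeros(1, 1), torch.zeros(1, 1), hidden)
```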

2. RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning (RL2): ICLR 2017

  • https://github.com/mwufi/meta-rl-bandits
    • Environment: PyTorch, CPU;
    • Task: Independent Bandit;
    • Model: two-layer LSTM REINFORCE;
    • Experiments: runs successfully with no bugs; the model does not match the paper, whose recurrent cell is a GRU; training does not converge under the current hyperparameters (see the sketch after this entry for the RL^2 trial structure);
  • https://github.com/VashishtMadhavan/rl2
    • Environment: TensorFlow, CPU;
    • Task: Dependent Bandit;
    • Model: one-layer LSTM A3C [without the encoder layer];
    • Experiments: fails to run with gym.error.UnregisteredEnv: No registered env with id: MediumBandit-v0;
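
For reference, the RL^2 setup keeps the recurrent hidden state across the several episodes that make up one trial of the same task and resets it only between trials; slow gradient updates of the weights then yield a fast learner encoded in the recurrent dynamics. The following self-contained PyTorch sketch illustrates only that trial structure; the toy bandit, the GRU sizes, and all names are made up and match neither repo above (no training loop is shown).

```python
import torch
import torch.nn as nn

class TwoArmedBandit:
    """Toy task whose arm probabilities are resampled once per trial."""
    def __init__(self):
        self.p = torch.rand(2)
    def pull(self, arm: int) -> float:
        return float(torch.bernoulli(self.p[arm]))

act_dim, hidden_size = 2, 64
gru = nn.GRU(act_dim + 1, hidden_size, batch_first=True)  # input = (prev action one-hot, prev reward)
policy = nn.Linear(hidden_size, act_dim)

def run_trial(n_episodes: int = 2, horizon: int = 10):
    task = TwoArmedBandit()
    h = None                                 # hidden state is reset only between trials
    prev_a = torch.zeros(1, act_dim)
    prev_r = torch.zeros(1, 1)
    for _ in range(n_episodes):              # episodes of the same task share the hidden state
        for _ in range(horizon):
            x = torch.cat([prev_a, prev_r], dim=-1).unsqueeze(1)
            out, h = gru(x, h)
            a = torch.distributions.Categorical(logits=policy(out.squeeze(1))).sample()
            r = task.pull(a.item())
            prev_a = nn.functional.one_hot(a, act_dim).float()
            prev_r = torch.full((1, 1), r)

run_trial()
```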

3. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (MAML): ICML 2017

4. Evolved Policy Gradients (EPG): NeurIPS 2018

5. A Simple Neural Attentive Meta-Learner: ICLR 2018

6. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables (PEARL): ICML 2019

  • https://github.com/katerakelly/oyster
    • Environment: PyTorch, GPU;
    • Task: MuJoCo;
    • Model: PEARL (SAC-based); a minimal sketch of the probabilistic context encoder follows this entry;
    • Experiments: docker build . -t pearl failed during Docker setup; after abandoning Docker and installing the required packages locally, the code runs successfully; when installing the packages locally, run conda config --set restore_free_channel true first, otherwise most of the pinned package versions cannot be found and creating the environment fails; related questions can be directed to Chains朱朱的主页 - 博客园 (cnblogs.com)
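
For context, the heart of PEARL is an inference network that maps a batch of context transitions (s, a, r, s') to a Gaussian posterior over a latent task variable z, on which the SAC actor and critics are conditioned. The sketch below only illustrates that product-of-Gaussians encoder; the dimensions, layer sizes, and names are made up, and it is not the oyster repo's actual code.

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Illustrative PEARL-style context encoder: each context transition is mapped to
    a Gaussian factor over the latent task variable z, and the posterior is their product."""

    def __init__(self, transition_dim: int, latent_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),      # per-factor mean and log-variance
        )

    def forward(self, context):                     # context: (N, transition_dim)
        mu, logvar = self.net(context).chunk(2, dim=-1)
        var = logvar.exp().clamp(min=1e-7)
        post_var = 1.0 / (1.0 / var).sum(dim=0)     # product of N Gaussian factors
        post_mu = post_var * (mu / var).sum(dim=0)
        return torch.distributions.Normal(post_mu, post_var.sqrt())

# Usage: sample z with the reparameterization trick and condition the SAC networks on it.
encoder = ContextEncoder(transition_dim=10, latent_dim=5)
posterior = encoder(torch.randn(32, 10))            # 32 context transitions
z = posterior.rsample()                             # shape: (latent_dim,)
```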

7. Improving Generalization in Meta Reinforcement Learning using Learned Objectives (MetaGenRL): ICLR 2020

  • http://louiskirsch.com/code/metagenrl
    • Environment: TensorFlow, GPU;
    • Task: MuJoCo;
    • Model: MetaGenRL;
    • Experiments: running python ray_experiments.py train fails with bugs under both tensorflow-gpu==1.14.0 and tensorflow==1.13.2;

 

II. RL-Adventure

1. Deep Q-Learning:

2. Policy Gradients:

  • https://github.com/haarnoja/sac
    • Environment: TensorFlow, GPU;
    • Tasks: Continuous Control Tasks (MuJoCo);
    • Model: Soft Actor-Critic (SAC, first version, with a state-value function V);
    • Experiments: not run;
  • https://github.com/denisyarats/pytorch_sac
    • Environment: PyTorch, GPU;
    • Tasks: Continuous Control Tasks (MuJoCo);
    • Model: Soft Actor-Critic (SAC, first version, with a state-value function V);
    • Experiments: not run;
  • http://github.com/rail-berkeley/softlearning/
    • Environment: TensorFlow, GPU;
    • Tasks: Continuous Control Tasks (MuJoCo);
    • Model: Soft Actor-Critic (SAC, second version, with the state-value function V removed; see the sketch after this list for the difference in critic targets);
    • Experiments: not run;
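
The "first version vs. second version" distinction noted above refers to how the critic target is computed: the original SAC trains a separate state-value network V and bootstraps the Q target from it, while the later version drops V and bootstraps directly from the twin target Q networks and the entropy term. A minimal sketch of the two targets (tensor names here are illustrative, not taken from any of the repos):

```python
import torch

def sac_v1_q_target(reward, done, next_v, gamma=0.99):
    # First version: the Q target bootstraps from the (target) state-value network V(s').
    return reward + gamma * (1.0 - done) * next_v

def sac_v2_q_target(reward, done, next_q1, next_q2, next_logp, alpha=0.2, gamma=0.99):
    # Second version: no V network; bootstrap from the min of the target Qs at an action
    # freshly sampled from the current policy at s', minus the entropy term.
    soft_value = torch.min(next_q1, next_q2) - alpha * next_logp
    return reward + gamma * (1.0 - done) * soft_value
```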

3. Both of the above:

  • https://github.com/ShangtongZhang/DeepRL
    • Environment: PyTorch, GPU;
    • Tasks: Atari, MuJoCo;
    • Models: (Double/Dueling/Prioritized) DQN, C51, QR-DQN, (Continuous/Discrete) Synchronous Advantage A2C, N-Step DQN, DDPG, PPO, OC, TD3, COF-PAC, GradientDICE, Bi-Res-DDPG, DAC, Geoff-PAC, QUOTA, ACE;
    • Experiments: runs successfully;
  • https://github.com/astooke/rlpyt
    • Environment: PyTorch, GPU;
    • Task: Atari;
    • Model: Modular, optimized implementations of common deep RL algorithms in PyTorch, with unified infrastructure supporting all three major families of model-free algorithms: policy gradient, deep Q-learning, and Q-function policy gradient.
      • Policy Gradient: A2C, PPO.
      • Replay Buffers: (supporting both DQN + QPG) non-sequence and sequence (for recurrent) replay, n-step returns, uniform or prioritized replay, full-observation or frame-based buffer (e.g. for Atari, stores only unique frames to save memory and reconstructs multi-frame observations; a minimal sketch of the frame-based idea appears at the end of this list).
      • Deep Q-Learning: DQN + variants: Double, Dueling, Categorical (up to Rainbow minus Noisy Nets), Recurrent (R2D2-style).
      • Q-Function Policy Gradient: DDPG, TD3, SAC.
    • Experiments: runs successfully with no bugs;
  • https://github.com/vitchyr/rlkit
    • Environment: PyTorch, GPU;
    • Tasks: gym[all];
    • Models: Skew-Fit, RIG, TDM, HER, DQN, SAC (new version), TD3, AWAC;
    • Experiments: not run;
  • https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch
    • Environment: PyTorch;
    • Tasks: CartPole, MountainCar, Bit Flipping, Four Rooms, Long Corridor, Ant-[Maze, Push, Fall];
    • Models: DQN, DQN with Fixed Q-Targets, DDQN, DDQN with Prioritised Experience Replay, Dueling DDQN, REINFORCE, DDPG, TD3, SAC, SAC-Discrete, A3C, A2C, PPO, DQN-HER, DDPG-HER, h-DQN, Stochastic NN-HRL, DIAYN;
    • Experiments: some models run successfully on some tasks (e.g. SAC-Discrete fails to run on Atari);
  • https://github.com/hill-a/stable-baselines
    • Environment: TensorFlow;
  • https://github.com/openai/baselines
    • Environment: TensorFlow;
  • https://github.com/openai/spinningup
    • Environment: TensorFlow / PyTorch;
    • Description: This is an educational resource produced by OpenAI that makes it easier to learn about deep reinforcement learning (deep RL). For the unfamiliar: reinforcement learning (RL) is a machine learning approach for teaching agents how to solve tasks by trial and error. Deep RL refers to the combination of RL with deep learning. This module contains a variety of helpful resources, including:
      • a short introduction to RL terminology, kinds of algorithms, and basic theory,
      • an essay about how to grow into an RL research role,
      • a curated list of important papers organized by topic,
      • a well-documented code repo of short, standalone implementations of key algorithms,
      • and a few exercises to serve as warm-ups.
    • Experiments: TD3 runs successfully on MuJoCo tasks;
  • https://github.com/quantumiracle/Popular-RL-Algorithms
    • Environment: PyTorch / TensorFlow 2.0 + TensorLayer 2.0;
    • Description: state-of-the-art model-free reinforcement learning algorithms implemented in PyTorch and TensorFlow 2.0 on OpenAI Gym environments and a self-implemented Reacher environment. Algorithms include SAC, DDPG, TD3, AC/A2C, PPO, QT-Opt (including the cross-entropy method), PointNet, Transporter, Recurrent Policy Gradient, Soft Decision Tree, Probabilistic Mixture-of-Experts, and more. Note that this repo is more a personal collection of algorithms the author implemented and tested during research and study than an official open-source library/package intended for general use. Still, the author considers it worth sharing and welcomes discussion of the implementations, but has not spent much time cleaning up or structuring the code. As you may notice, there can be several implementation versions of the same algorithm, shown deliberately for reference and comparison. Also, this repo contains only the PyTorch implementations; for official libraries of RL algorithms, the author provides the following two options built on TensorFlow 2.0 + TensorLayer 2.0:
      • RL Tutorial (Status: Released) contains RL algorithms implementation as tutorials with simple structures.

      • RLzoo (Status: Released) is a baseline implementation with high-level API supporting a variety of popular environments, with more hierarchical structures for simple usage.

      Since TensorFlow 2.0 uses dynamic graph construction rather than static graph construction, porting the RL code between TensorFlow and PyTorch is straightforward.

    • Experiments: PPO runs on Atari tasks but performance fails to converge;
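
As a footnote to the rlpyt entry above: its frame-based replay buffer for Atari stores each frame only once and rebuilds the stacked observation at sampling time. The sketch below illustrates that idea only; it is not rlpyt's actual API, and episode-boundary handling is omitted.

```python
import numpy as np

capacity, frame_stack = 1_000, 4
frames = np.zeros((capacity, 84, 84), dtype=np.uint8)     # one 84x84 frame per slot

def store(t: int, frame: np.ndarray) -> None:
    frames[t % capacity] = frame                          # each frame stored exactly once

def sample_observation(t: int) -> np.ndarray:
    # Rebuild the 4-frame observation ending at time t instead of storing it 4x over.
    idx = [(t - k) % capacity for k in reversed(range(frame_stack))]
    return frames[idx]                                    # shape: (frame_stack, 84, 84)
```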

 

III. Meta-Learning (Learning to Learn)

1. Platform:

