Reinforcement Learning


Reinforcement Learning Notes (1)

1 Overview of Reinforcement Learning

With the success of AlphaGo, reinforcement learning (RL) has become one of the hottest research areas in machine learning. Unlike the more familiar supervised and unsupervised learning, reinforcement learning emphasizes the interaction between an agent and its environment: based on its current state, the agent chooses an action to take; after executing the action, it transitions to the next state and receives from the environment a reward for that state transition.
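
To make this interaction loop concrete, here is a minimal, self-contained Python sketch. The toy environment and the random placeholder policy are illustrative assumptions only, not part of any standard RL library.

```python
import random

class ToyEnv:
    """Illustrative environment: the agent walks on cells 0..4 and is
    rewarded only when it reaches the rightmost cell."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = move left, 1 = move right
        self.state = max(0, min(4, self.state + (1 if action == 1 else -1)))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

# The agent-environment interaction loop described above.
env = ToyEnv()
state = env.reset()
done, total_reward = False, 0.0
while not done:
    action = random.choice([0, 1])           # placeholder policy: act randomly
    state, reward, done = env.step(action)   # environment returns next state and reward
    total_reward += reward
print("episode return:", total_reward)
```

In practice the environment is given by the task (a game, a robot simulator, and so on), and the random placeholder policy is replaced by a learned one.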

The goal of reinforcement learning is to extract information from this agent-environment interaction and learn a mapping from states to actions that guides the agent to make the best decision in each state, maximizing the reward it accumulates.

2 Elements of Reinforcement Learning

Reinforcement learning is usually described with a Markov Decision Process (MDP). Mathematically, an MDP is typically written as a five-tuple consisting of a set of states, a set of actions, a state-transition function, a reward function, and a discount factor.
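
As a minimal illustration, the five-tuple can be written down as a small Python data structure; the state and action names below are made up for the example:

```python
from typing import Dict, List, NamedTuple, Tuple

class MDP(NamedTuple):
    states: List[str]                                     # S: set of states
    actions: List[str]                                    # A: set of actions
    transition: Dict[Tuple[str, str], Dict[str, float]]   # P(s' | s, a)
    reward: Dict[Tuple[str, str], float]                  # R(s, a)
    gamma: float                                          # discount factor

# A two-state example: from s0 the action "go" leads to s1 and pays reward 1.
mdp = MDP(
    states=["s0", "s1"],
    actions=["stay", "go"],
    transition={("s0", "stay"): {"s0": 1.0}, ("s0", "go"): {"s1": 1.0},
                ("s1", "stay"): {"s1": 1.0}, ("s1", "go"): {"s1": 1.0}},
    reward={("s0", "stay"): 0.0, ("s0", "go"): 1.0,
            ("s1", "stay"): 0.0, ("s1", "go"): 0.0},
    gamma=0.9,
)
```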

In recent years, research has applied reinforcement learning to more complex MDP variants, such as the Partially Observable Markov Decision Process (POMDP), the Parameterized Action Markov Decision Process (PAMDP), and Stochastic Games (SG).

State (S): a task can have many states, and we assume the states are equally spaced in time;

Action (A): in every state there should be at least one action to choose from;

Reward (R): for every state, the environment gives a numerical feedback at the next state; the higher this value, the more desirable the state;

Policy (π): given a state s, the policy π always produces a single action a, i.e. a = π(s); π can be a lookup table or a function;
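
A policy in either form can be sketched in a few lines of Python; the state names, action names, and Q-table below are illustrative assumptions:

```python
# Policy as a lookup table: state -> action.
policy_table = {"s0": "go", "s1": "stay"}

def pi_table(state):
    return policy_table[state]

# Policy as a function, e.g. greedy with respect to a (hypothetical) Q-table.
def pi_greedy(state, q_table, actions):
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))
```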


3 Categories of Reinforcement Learning Algorithms

There are many categories of reinforcement learning algorithms; commonly encountered examples include algorithms built on the Markov Decision Process (MDP) framework, Q-learning, and others. AlphaGo's matches against human players likewise benefited from reinforcement learning algorithms.
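
As a concrete example of the Q-learning mentioned above, here is a minimal tabular sketch. It assumes an environment exposing the same reset()/step() interface as the toy environment in Section 1, and the hyperparameter values are arbitrary illustrations rather than recommendations:

```python
import random
from collections import defaultdict

def q_learning(env, actions=(0, 1), episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning; env follows the toy reset()/step() interface above."""
    q = defaultdict(float)                     # Q(s, a), defaults to 0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```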

At the same time, reinforcement learning has sparked discussion in game theory: reinforcement learning algorithms are used to solve game-theoretic problems, and game theory in turn guides the design of reinforcement learning algorithms. The two complement each other, and game-theoretic ideas can be found throughout these reinforcement learning algorithms.

4 Applications of Reinforcement Learning

Classic applications of reinforcement learning include the nonlinear double-pendulum system (a nonlinear control problem), board games, robots learning to stand and walk, autonomous driving, machine translation, human-machine dialogue, game theory, and more. In short, the problems reinforcement learning addresses are sequential decision problems: problems in which the final goal can only be reached by making decisions continuously over time. Compared with other machine learning methods, reinforcement learning focuses on goal-directed learning from interaction.

Figure: Reinforcement learning applied to autonomous driving (illustration)

5 Related Papers on Reinforcement Learning

I. The pioneering DQN

1. Playing Atari with Deep Reinforcement Learning,V. Mnih et al., NIPS Workshop, 2013.

2. Human-level control through deep reinforcement learning, V. Mnih et al., Nature, 2015.

II. Improved variants of DQN (focusing on algorithmic improvements)

1. Dueling Network Architectures for Deep Reinforcement Learning. Z. Wang et al., arXiv, 2015.

2. Prioritized Experience Replay, T. Schaul et al., ICLR, 2016.

3. Deep Reinforcement Learning with Double Q-learning, H. van Hasselt et al., arXiv, 2015.

4. Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI, 2016.

5. Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop, 2016.

6. Deep Exploration via Bootstrapped DQN, I. Osband et al., arXiv, 2016.

7. How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, V. François-Lavet et al., NIPS Workshop, 2015.

8. Learning functions across many orders of magnitudes, H. van Hasselt, A. Guez, M. Hessel, D. Silver

9. Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al., ICML Workshop, 2015.

10. State of the Art Control of Atari Games using shallow reinforcement learning

11. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening (updated 11.13)

12. Deep Reinforcement Learning with Averaged Target DQN (updated 11.14)

13. Safe and Efficient Off-Policy Reinforcement Learning (updated 12.20)

14. The Predictron: End-To-End Learning and Planning (updated 1.3)

III. Improved variants of DQN (focusing on model improvements)

1. Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht and P. Stone, arXiv, 2015.

2. Deep Attention Recurrent Q-Network

3. Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML, 2016.

4. Progressive Neural Networks

5. Language Understanding for Text-based Games Using Deep Reinforcement Learning

6. Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks

7. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

8. Recurrent Reinforcement Learning: A Hybrid Approach

9. Value Iteration Networks, NIPS, 2016 (updated 12.20)

10. MazeBase: A sandbox for learning from games (updated 12.20)

11. Strategic Attentive Writer for Learning Macro-Actions (updated 12.20)

IV. Deep reinforcement learning based on policy gradients

Deep policy gradients:

1. End-to-End Training of Deep Visuomotor Policies

2. Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search

3. Trust Region Policy Optimization

Deep actor-critic algorithms:

1. Deterministic Policy Gradient Algorithms

2. Continuous control with deep reinforcement learning

3. High-Dimensional Continuous Control Using Generalized Advantage Estimation

4. Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies

5. Deep Reinforcement Learning in Parameterized Action Space

6. Memory-based control with recurrent neural networks

7. Terrain-adaptive locomotion skills using deep reinforcement learning

8. Sample Efficient Actor-Critic with Experience Replay (updated 11.13)

Search and supervision:

1. End-to-End Training of Deep Visuomotor Policies

2. Interactive Control of Diverse Complex Characters with Neural Networks

Improved exploration in continuous action spaces:

1. Curiosity-driven Exploration in DRL via Bayesian Neural Networks

Combining policy gradients and Q-learning:

1. Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic (updated 11.13)

2. PGQ: Combining Policy Gradient and Q-Learning (updated 11.13)

Other policy-gradient papers:

1. Gradient Estimation Using Stochastic Computation Graphs

2. Continuous Deep Q-Learning with Model-based Acceleration

3. Benchmarking Deep Reinforcement Learning for Continuous Control

4. Learning Continuous Control Policies by Stochastic Value Gradients

5. Generalizing Skills with Semi-Supervised Reinforcement Learning (updated 12.20)

V. Hierarchical DRL

1. Deep Successor Reinforcement Learning

2. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

3. Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks

4. Stochastic Neural Networks for Hierarchical Reinforcement Learning – Authors: Carlos Florensa, Yan Duan, Pieter Abbeel (updated 11.14)

VI. Multi-task and transfer learning in DRL

1. ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources

2. A Deep Hierarchical Approach to Lifelong Learning in Minecraft

3. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

4. Policy Distillation

5. Progressive Neural Networks

6. Universal Value Function Approximators

7. Multi-task learning with deep model based reinforcement learning (updated 11.14)

8. Modular Multitask Reinforcement Learning with Policy Sketches (updated 11.14)

VII. DRL models based on external memory modules

1. Control of Memory, Active Perception, and Action in Minecraft

2. Model-Free Episodic Control

VIII. Exploration and exploitation in DRL

1. Action-Conditional Video Prediction using Deep Networks in Atari Games

2. Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks

3. Deep Exploration via Bootstrapped DQN

4. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

5. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

6. Unifying Count-Based Exploration and Intrinsic Motivation

7. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning (updated 11.14)

8. Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning (updated 11.14)

9. VIME: Variational Information Maximizing Exploration (updated 12.20)



IX. Multi-agent DRL

1. Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks

2. Multiagent Cooperation and Competition with Deep Reinforcement Learning

X. Inverse DRL

1. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

2. Maximum Entropy Deep Inverse Reinforcement Learning

3. Generalizing Skills with Semi-Supervised Reinforcement Learning (updated 11.14)

XI. Exploration + supervised learning

1. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning

2. Better Computer Go Player with Neural Network and Long-term Prediction

3. Mastering the game of Go with deep neural networks and tree search, D. Silver et al., Nature, 2016.

XII. Asynchronous DRL

1. Asynchronous Methods for Deep Reinforcement Learning

2. Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU (updated 11.14)

XIII: For more difficult game scenarios

1. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv, 2016.

2. Strategic Attentive Writer for Learning Macro-Actions

3. Unifying Count-Based Exploration and Intrinsic Motivation

XIV: Playing multiple games with a single network

1. Policy Distillation

2. Universal Value Function Approximators

3. Learning values across many orders of magnitude

XV: Texas hold'em poker

1. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

2. Fictitious Self-Play in Extensive-Form Games

3. Smooth UCT search in computer poker

XVI: The Doom game

1. ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning

2. Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning

3. Playing FPS Games with Deep Reinforcement Learning

4. Learning to Act by Predicting the Future (updated 11.13)

5. Deep Reinforcement Learning From Raw Pixels in Doom (updated 11.14)

XVII: Large action spaces

1. Deep Reinforcement Learning in Large Discrete Action Spaces

XVIII: Parameterized continuous action spaces

1. Deep Reinforcement Learning in Parameterized Action Space

XIX: Deep Model

1. Learning Visual Predictive Models of Physics for Playing Billiards

2. On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models, J. Schmidhuber, arXiv, 2015.

3. Learning Continuous Control Policies by Stochastic Value Gradients

4. Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models

5. Action-Conditional Video Prediction using Deep Networks in Atari Games

6. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

XX: Applications of DRL

Robotics:

1. Trust Region Policy Optimization

2. Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control

3. Path Integral Guided Policy Search

4. Memory-based control with recurrent neural networks

5. Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection

6. Learning Deep Neural Network Policies with Continuous Memory States

7. High-Dimensional Continuous Control Using Generalized Advantage Estimation

8. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

9. End-to-End Training of Deep Visuomotor Policies

10. DeepMPC: Learning Deep Latent Features for Model Predictive Control

11. Deep Visual Foresight for Planning Robot Motion

12. Deep Reinforcement Learning for Robotic Manipulation

13. Continuous Deep Q-Learning with Model-based Acceleration

14. Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search

15. Asynchronous Methods for Deep Reinforcement Learning

16. Learning Continuous Control Policies by Stochastic Value Gradients

Machine translation:

1. Simultaneous Machine Translation using Deep Reinforcement Learning

Object localization:

1. Active Object Localization with Deep Reinforcement Learning

Target-driven visual navigation:

1. Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

Automatic parameter tuning:

1. Using Deep Q-Learning to Control Optimization Hyperparameters

Human-machine dialogue:

1. Deep Reinforcement Learning for Dialogue Generation

2. SimpleDS: A Simple Deep Reinforcement Learning Dialogue System

3. Strategic Dialogue Management via Deep Reinforcement Learning

4. Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning

Video prediction:

1. Action-Conditional Video Prediction using Deep Networks in Atari Games

Text-to-speech:

1. WaveNet: A Generative Model for Raw Audio

Text generation:

1. Generating Text with Deep Reinforcement Learning

Text-based games:

1. Language Understanding for Text-based Games Using Deep Reinforcement Learning

Radio control and signal monitoring:

1. Deep Reinforcement Learning Radio Control and Signal Detection with KeRLym, a Gym RL Agent

Using DRL to learn to perform physics experiments:

1. Learning to Perform Physics Experiments via Deep Reinforcement Learning (updated 11.13)

Accelerating convergence with DRL:

1. Deep Reinforcement Learning for Accelerating the Convergence Rate (updated 11.14)

Using DRL to design neural networks:

1. Designing Neural Network Architectures using Reinforcement Learning (updated 11.14)

2. Tuning Recurrent Neural Networks with Reinforcement Learning (updated 11.14)

3. Neural Architecture Search with Reinforcement Learning (updated 11.14)

Traffic signal control:

1. Using a Deep Reinforcement Learning Agent for Traffic Signal Control (updated 11.14)

Autonomous driving:

1. CARMA: A Deep Reinforcement Learning Approach to Autonomous Driving (updated 12.20)

2. Deep Reinforcement Learning for Simulated Autonomous Vehicle Control (updated 12.20)

3. Deep Reinforcement Learning framework for Autonomous Driving (updated 12.20)

XXI: Other directions

Avoiding dangerous states:

1. Combating Deep Reinforcement Learning's Sisyphean Curse with Intrinsic Fear (updated 11.14)

On-policy vs. off-policy updates in DRL:

1. On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning (updated 11.14)

 

