Using Reinforcement Learning in MATLAB

Environment setup

In MATLAB, install Deep Learning Toolbox first, then install Reinforcement Learning Toolbox.
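
One way to confirm that both toolboxes are present is to look them up in the output of ver. This is a minimal sketch; it assumes the product names match the strings shown in your own ver listing.

```matlab
% List installed products and check for the two required toolboxes.
installed = ver;                       % struct array, one entry per product
names = {installed.Name};

required = {'Deep Learning Toolbox', 'Reinforcement Learning Toolbox'};
isInstalled = ismember(required, names);

for k = 1:numel(required)
    fprintf('%s installed: %d\n', required{k}, isInstalled(k));
end
```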

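What is reinforcement learning

The goal of reinforcement learning is to train an agent in an unknown environment. The agent receives observations and a reward from the environment and sends actions back to it; the reward measures how much the current action contributes to the task goal.
The agent consists of two parts: a policy and a reinforcement learning algorithm.

The policy plays the role of the controller in a closed-loop control system.
The reinforcement learning algorithm adjusts the policy parameters based on the observations, actions, and rewards. Its goal is to find an optimal policy that maximizes the cumulative reward.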
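
To make "cumulative reward" concrete, the sketch below computes a discounted return for one short episode. The reward sequence and the discount factor are made-up values for illustration, and discounting itself is an assumption here: it is the usual way the cumulative reward is defined, but the paragraph above does not specify it.

```matlab
% Hypothetical rewards from one short episode: three good steps, then a fall.
rewards = [1 1 1 -5];
gamma = 0.99;                          % discount factor (assumed value)

% Discounted return: G = sum over t of gamma^t * r_t, with t starting at 0.
t = 0:numel(rewards)-1;
G = sum((gamma .^ t) .* rewards);

fprintf('Discounted return: %.6f\n', G);   % prints -1.881395
```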

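Getting started with reinforcement learning in MATLAB

Cart-pole environment simulation

The environment is defined as follows:

The upright pole position is an angle of 0, and hanging straight down is an angle of pi.
The pole starts at an initial angle between -0.05 and 0.05 rad.
The force signal from the agent to the environment is between -10 and 10 N.
The observations are the cart position and velocity and the pole angle and angular velocity.
The episode terminates if the pole angle deviates by more than 12 degrees or the cart moves more than 2.4 m.
The reward is +1 for every time step the pole stays upright and -5 when the pole falls.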
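
The termination and reward rules above can be sketched in a few lines of plain MATLAB. This is a simplified reading of the rules, not the toolbox's internal code; isDone and rewardFor are hypothetical helper names.

```matlab
% Thresholds taken from the environment description above.
thetaThreshold = 12 * pi / 180;        % 12 degrees in radians (about 0.2094)
xThreshold = 2.4;                      % metres

% An episode ends when either threshold is exceeded.
isDone = @(x, theta) abs(theta) > thetaThreshold || abs(x) > xThreshold;

% +1 while the pole is still up, -5 on the step where it falls.
rewardFor = @(done) 1 * (~done) + (-5) * done;

done1 = isDone(0.3, 0.05);             % within both limits
done2 = isDone(0.3, 0.25);             % angle too large
fprintf('reward = %d, %d\n', rewardFor(done1), rewardFor(done2));
```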

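Create the predefined environment

Use rlPredefinedEnv to load one of MATLAB's predefined environments. The environment provides two functions, reset and step, which define how it behaves in detail.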

env = rlPredefinedEnv("CartPole-Discrete");

env =
CartPoleDiscreteAction with properties:

              Gravity: 9.8000
             MassCart: 1
             MassPole: 0.1000
               Length: 0.5000
             MaxForce: 10
                   Ts: 0.0200
ThetaThresholdRadians: 0.2094
           XThreshold: 2.4000
  RewardForNotFalling: 1
    PenaltyForFalling: -5
                State: [4x1 double]

obsInfo = getObservationInfo(env);

obsInfo =
rlNumericSpec with properties:

 LowerLimit: -Inf
 UpperLimit: Inf
       Name: "CartPole States"
Description: "x, dx, theta, dtheta"
  Dimension: [4 1]
   DataType: "double"

actInfo = getActionInfo(env);

rng(0); % fix the random seed for reproducibility
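
The reset and step functions mentioned earlier can also be called by hand, which is a convenient way to see what one time step of the environment returns. The sketch below assumes the predefined environment accepts these calls directly and that the action must be one of the values in actInfo.Elements. Calling reset draws random numbers, so run it in a separate session if you want the seeded results to stay unchanged.

```matlab
% Requires Reinforcement Learning Toolbox.
env = rlPredefinedEnv("CartPole-Discrete");
actInfo = getActionInfo(env);

obs = reset(env);                      % initial observation: [x; dx; theta; dtheta]
action = actInfo.Elements(1);          % pick one of the allowed discrete forces

% Advance the simulation by one sample time (Ts).
[nextObs, reward, isDone] = step(env, action);

disp(nextObs')
fprintf('reward = %g, isDone = %d\n', reward, isDone);
```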

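Create the DQN agent

Next, create the DQN agent. First define the neural network for the critic, which approximates the Q-value function; the capacity of this network limits how well the agent can perform. Then wrap the network in a critic representation with rlQValueRepresentation, and finally create the agent with rlDQNAgent.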

dnn = [
    featureInputLayer(obsInfo.Dimension(1),'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(24,'Name','CriticStateFC2')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(length(actInfo.Elements),'Name','output')];

Plot the network structure:

figure
plot(layerGraph(dnn))
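
As a rough sense of the network's size, the learnable parameters can be counted by hand from the layer widths. The sketch assumes 4 observations and 2 discrete actions, as in this environment.

```matlab
% Layer widths of the critic network: 4 observations, two hidden layers of 24,
% and one output per discrete action (2 for this environment).
widths = [4 24 24 2];

% Each fully connected layer has (inputs * outputs) weights plus one bias per output.
weights = widths(1:end-1) .* widths(2:end);
biases = widths(2:end);
numParams = sum(weights + biases);

fprintf('Learnable parameters: %d\n', numParams);   % prints 770
```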

Set the critic representation options with rlRepresentationOptions.

criticOpts = rlRepresentationOptions('LearnRate',0.001,'GradientThreshold',1);

Create the critic.

critic = rlQValueRepresentation(dnn,obsInfo,actInfo,'Observation',{'state'},criticOpts);
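
The critic outputs one Q-value per discrete action, and a DQN agent normally picks the action with the highest value while occasionally exploring at random (epsilon-greedy). The sketch below illustrates that selection rule in plain MATLAB. The Q-values and epsilon are made-up numbers, and this is not the toolbox's internal code.

```matlab
actions = [-10 10];                    % discrete forces in newtons
qValues = [1.2 0.7];                   % hypothetical critic outputs, one per action
epsilon = 0.1;                         % exploration probability (assumed value)

if rand < epsilon
    idx = randi(numel(actions));       % explore: pick a random action
else
    [~, idx] = max(qValues);           % exploit: pick the highest-valued action
end
action = actions(idx);

fprintf('Selected force: %d N\n', action);
```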

Set the agent options with rlDQNAgentOptions, then create the agent with rlDQNAgent.

agentOpts = rlDQNAgentOptions(...
    'UseDoubleDQN',false, ...    
    'TargetSmoothFactor',1, ...
    'TargetUpdateFrequency',4, ...   
    'ExperienceBufferLength',100000, ...
    'DiscountFactor',0.99, ...
    'MiniBatchSize',256);
agent = rlDQNAgent(critic,agentOpts);
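
With TargetSmoothFactor set to 1 and TargetUpdateFrequency set to 4, the target critic is overwritten with the critic's parameters every 4 updates rather than blended gradually. The sketch below illustrates that schedule, together with the standard DQN target value, using stand-in numbers. The parameter vectors, the experience, and the fake "gradient step" are all hypothetical.

```matlab
gamma = 0.99;                          % DiscountFactor
tau = 1;                               % TargetSmoothFactor
updateEvery = 4;                       % TargetUpdateFrequency

% Hypothetical one-step experience and target-critic outputs for the next state.
reward = 1;
isDone = false;
qTargetNext = [3.0 2.5];

% Standard DQN target: no bootstrapping from terminal states.
y = reward + (~isDone) * gamma * max(qTargetNext);

% Stand-in parameter vectors for the critic and its target copy.
criticParams = [0.5 -0.2];
targetParams = [0.1 0.1];
for k = 1:8
    criticParams = criticParams + 0.01;          % pretend a gradient step happened
    if mod(k, updateEvery) == 0
        % With tau = 1 this is a plain copy of the critic parameters.
        targetParams = tau * criticParams + (1 - tau) * targetParams;
    end
end
fprintf('y = %.2f, target = [%.2f %.2f]\n', y, targetParams);
```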

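Train the agent

To train the agent, first specify the training options. This example uses the following settings:

Run the training for at most 1000 episodes, with each episode lasting at most 500 time steps.
Display the training progress in the Episode Manager dialog box and disable the command-line display.
Stop training when the agent's average reward is greater than 480.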

trainOpts = rlTrainingOptions(...
    'MaxEpisodes',1000, ...
    'MaxStepsPerEpisode',500, ...
    'Verbose',false, ...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',480);
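
The stopping rule can be pictured as a moving average over recent episode rewards. The sketch below is an illustration only: the reward log is invented, the window of 5 episodes is an assumed value, and treating an average equal to the threshold as sufficient is also an assumption.

```matlab
episodeRewards = [120 310 455 490 500 500 498 500];   % hypothetical training log
window = 5;                            % averaging window (assumed)
stopValue = 480;                       % StopTrainingValue

stopEpisode = NaN;
for ep = window:numel(episodeRewards)
    avgReward = mean(episodeRewards(ep-window+1:ep));
    if avgReward >= stopValue          % assumption: equality also counts
        stopEpisode = ep;
        break
    end
end
fprintf('Training would stop after episode %d\n', stopEpisode);
```

With this log the averages are 375, 451, and 488.6 for episodes 5, 6, and 7, so the loop stops at episode 7.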

You can visualize the cart-pole system during training or simulation with plot.

plot(env)

Train the agent with the train function.

doTraining = false;
if doTraining    
    % Train the agent.
    trainingStats = train(agent,env,trainOpts);
else
    % Load the pretrained agent for the example.
    load('MATLABCartpoleDQNMulti.mat','agent')
end
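
If you do run the training yourself, it can be worth saving the result so that later sessions can load it instead of retraining. A minimal sketch, in which the file name is a hypothetical choice:

```matlab
% Hypothetical file name; choose any path you like.
agentFile = 'myCartPoleDQN.mat';

% Save only if an agent variable exists in the workspace.
if exist('agent', 'var')
    save(agentFile, 'agent');
end
```

A later session could then restore it with load(agentFile,'agent').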

Simulate the DQN agent

To validate the performance of the trained agent, simulate it in the environment using rlSimulationOptions and sim.

simOptions = rlSimulationOptions('MaxSteps',500);
experience = sim(env,agent,simOptions);

totalReward = sum(experience.Reward)

totalReward = 500
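
A total of 500 means the pole stayed up for all 500 steps at +1 each. A single run can be lucky, so one option is to repeat the simulation several times and compare the totals. The sketch below assumes that the NumSimulations option makes sim return one experience structure per run and that each Reward field is a timeseries with a Data property. It also reloads the pretrained agent used above.

```matlab
% Requires Reinforcement Learning Toolbox and the pretrained agent used above.
env = rlPredefinedEnv("CartPole-Discrete");
load('MATLABCartpoleDQNMulti.mat','agent')

simOptions = rlSimulationOptions('MaxSteps',500,'NumSimulations',5);
experiences = sim(env,agent,simOptions);

% One total reward per simulation run.
totalRewards = arrayfun(@(e) sum(e.Reward.Data), experiences);
disp(totalRewards(:)')
```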
