Reinforcement Learning, Part 1: Start from TensorFlow

This article is a repost. 2018-01-01 12:46, Reinforcement Learning

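This article is based on a personal translation (by tomqianmaple@outlook.com) of the official TensorFlow tutorial into Chinese, provided for study and reference.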

Getting Started With TensorFlow
Tensors
TensorFlow Core tutorial
tf.estimator
- Basic usage
- A custom model
Next steps

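Getting Started With TensorFlow

This tutorial gets you started programming in TensorFlow. Before using it, install TensorFlow. To get the most out of the tutorial, you should know the following:

How to program in Python
At least a little about arrays
Ideally, something about machine learning. However, if you know little or nothing about machine learning, this is still the first tutorial you should read.

TensorFlow provides multiple APIs. The lowest-level API, TensorFlow Core, gives you complete programming control. We recommend TensorFlow Core for machine learning researchers and others who need fine-grained control over their models. The higher-level APIs are built on top of TensorFlow Core and are typically easier to learn and use. They also make repetitive tasks easier and more consistent between different users. A high-level API such as tf.estimator helps you manage data sets, estimators, training, and inference.

This tutorial begins with a walkthrough of TensorFlow Core. Later, we show how to implement the same model with tf.estimator. Knowing the principles of TensorFlow Core gives you a clear mental model of how things work internally when you use the more compact higher-level API.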

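Tensors

The central unit of data in TensorFlow is the tensor. A tensor consists of a set of primitive values shaped into an array of any number of dimensions. A tensor's rank is its number of dimensions. Here are some examples of tensors: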

3 # a rank 0 tensor; a scalar with shape []
[1., 2., 3.] # a rank 1 tensor; a vector with shape [3]
[[1., 2., 3.], [4., 5., 6.]] # a rank 2 tensor; a matrix with shape [2, 3]
[[[1., 2., 3.]], [[7., 8., 9.]]] # a rank 3 tensor with shape [2, 1, 3]
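
To make the relation between nesting depth, shape, and rank concrete, here is a minimal sketch in plain Python. The helpers shape_of and rank_of are hypothetical names invented for this illustration, not TensorFlow functions, and the sketch assumes every nested list is non-empty and regular (all sub-lists at one level have the same length).

```python
def shape_of(value):
    """Return the shape of a regularly nested list as a list of ints."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0]  # assumes all sub-lists at this level have equal length
    return shape


def rank_of(value):
    """The rank is simply the number of dimensions in the shape."""
    return len(shape_of(value))


examples = [
    3,
    [1., 2., 3.],
    [[1., 2., 3.], [4., 5., 6.]],
    [[[1., 2., 3.]], [[7., 8., 9.]]],
]
for example in examples:
    print(rank_of(example), shape_of(example))
```

The four printed lines correspond to the four tensors listed above: a scalar has an empty shape, and each extra level of nesting adds one dimension.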

TensorFlow Core tutorial

Importing TensorFlow

The canonical import statement for TensorFlow programs is as follows:

import tensorflow as tf

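This gives Python access to all of TensorFlow's classes, methods, and symbols. Most of the documentation assumes you have already done this.

The Computational Graph

You might think of TensorFlow Core programs as consisting of two discrete sections:

Building the computational graph.
Running the computational graph.

A computational graph is a series of TensorFlow operations arranged into a graph of nodes. Let's build a simple computational graph. Each node takes zero or more tensors as inputs and produces a tensor as an output. One type of node is a constant. Like all TensorFlow constants, it takes no inputs, and it outputs a value it stores internally. We can create two floating-point tensors, node1 and node2, as follows: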

node1 = tf.constant(3.0, dtype=tf.float32)
node2 = tf.constant(4.0) # also tf.float32 implicitly
print(node1, node2)

The final print statement produces:

Tensor("Const:0", shape=(), dtype=float32) Tensor("Const_1:0", shape=(), dtype=float32)

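Notice that printing the nodes does not output the values 3.0 and 4.0 as you might expect. Instead, they are nodes that, when evaluated, would produce 3.0 and 4.0, respectively. To actually evaluate the nodes, we must run the computational graph within a session. A session encapsulates the control and state of the TensorFlow runtime.

The following code creates a Session object and then invokes its run method to run enough of the computational graph to evaluate node1 and node2: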

sess = tf.Session()
print(sess.run([node1, node2]))

We see the expected values of 3.0 and 4.0:

[3.0, 4.0]

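We can build more complicated computations by combining tensor nodes with operations (operations are also nodes). For example, we can add our two constant nodes and produce a new graph as follows: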

# A __future__ import must be the first statement in a script; it is only needed on Python 2.
from __future__ import print_function
node3 = tf.add(node1, node2)
print("node3:", node3)
print("sess.run(node3):", sess.run(node3))

The last two print statements produce:

node3: Tensor("Add:0", shape=(), dtype=float32)
sess.run(node3): 7.0
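
The split between building a graph and running it can be mimicked in a few lines of plain Python. This is only a toy sketch of deferred evaluation: the classes Constant and Add and the function run are hypothetical names made up for the illustration, and they say nothing about how TensorFlow is implemented internally.

```python
class Constant:
    """A node with no inputs that outputs a value it stores internally."""

    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class Add:
    """An operation node that adds the outputs of two other nodes."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        return self.left.evaluate() + self.right.evaluate()


def run(node):
    """Toy counterpart of running the graph: evaluate the requested node."""
    return node.evaluate()


# Building the graph: no arithmetic happens on these three lines.
node1 = Constant(3.0)
node2 = Constant(4.0)
node3 = Add(node1, node2)

# Running the graph: the addition is carried out only now.
result = run(node3)
print(result)
```

As in the session example, node3 is merely a description of a computation until it is run, at which point it yields 7.0.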

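TensorFlow provides a utility called TensorBoard that can display a picture of the computational graph. Here is a screenshot showing how TensorBoard visualizes the graph:

As it stands, this graph is not especially interesting because it always produces a constant result. A graph can be parameterized to accept external inputs, known as placeholders. A placeholder is a promise to provide a value later: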

a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
adder_node = a + b  # + provides a shortcut for tf.add(a, b)

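The preceding three lines are a bit like a function or a lambda in which we define two input parameters (a and b) and then an operation on them. We can evaluate this graph with multiple inputs by using the feed_dict argument of the run method to feed concrete values to the placeholders: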

print(sess.run(adder_node, {a: 3, b: 4.5}))
print(sess.run(adder_node, {a: [1, 3], b: [2, 4]}))

The output is:

7.5
[ 3.  7.]
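
The function analogy can be written out directly. The sketch below is plain Python, not TensorFlow: adder_node is an ordinary function whose parameters play the role of the placeholders, and the dictionary named feed plays the role of feed_dict. The element-wise handling of lists is an assumption added to mirror the vector example.

```python
def adder_node(a, b):
    """Add two scalars, or add two equal-length lists element by element."""
    if isinstance(a, list):
        return [x + y for x, y in zip(a, b)]
    return a + b


# Concrete values are supplied only when the "graph" is evaluated.
feed = {"a": 3, "b": 4.5}
scalar_result = adder_node(**feed)
vector_result = adder_node(a=[1, 3], b=[2, 4])

print(scalar_result)  # 7.5
print(vector_result)  # [3, 7]
```

The same definition is reused with different inputs, which is what feeding different values to the placeholders achieves.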

In TensorBoard, the graph looks like this:

We can make the computational graph more complex by adding another operation. For example:

add_and_triple = adder_node * 3.
print(sess.run(add_and_triple, {a: 3, b: 4.5}))

This produces the output:

22.5

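The preceding computational graph looks as follows in TensorBoard:

In machine learning we typically want a model that can take arbitrary inputs, such as the one above. To make the model trainable, we need to be able to modify the graph so that it produces new outputs for the same input. Variables allow us to add trainable parameters to a graph. They are constructed with a type and an initial value: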

W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W*x + b

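Constants are initialized when you call tf.constant, and their value can never change. By contrast, variables are not initialized when you call tf.Variable. To initialize all the variables in a TensorFlow program, you must explicitly call a special operation as follows: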

init = tf.global_variables_initializer()
sess.run(init)

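It is important to realize that init is a handle to the TensorFlow sub-graph that initializes all the global variables. Until we call sess.run on it, the variables are uninitialized.

Since x is a placeholder, we can evaluate linear_model for several values of x simultaneously as follows: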

print(sess.run(linear_model, {x: [1, 2, 3, 4]}))

This produces the output:

[ 0.          0.30000001  0.60000002  0.90000004]

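We have created a model, but we do not know how good it is yet. To evaluate the model on training data, we need a y placeholder to provide the desired values, and we need to write a loss function.

A loss function measures how far the current model's output is from the desired output for the given inputs. We will use a standard loss model for linear regression, which sums the squares of the deltas between the model output and the desired output. linear_model - y creates a vector in which each element is the error delta of the corresponding example. We call tf.square to square that error. Then we sum all the squared errors with tf.reduce_sum to create a single scalar that abstracts the error of all examples: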

y = tf.placeholder(tf.float32)
squared_deltas = tf.square(linear_model - y)
loss = tf.reduce_sum(squared_deltas)
print(sess.run(loss, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]}))

This produces the loss value:

23.66
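
The value 23.66 can be checked by hand. The following sketch repeats the same arithmetic in plain Python, with ordinary floats standing in for the float32 tensors; sum_of_squares_loss is a hypothetical helper name used only here.

```python
def sum_of_squares_loss(W, b, xs, ys):
    """Sum over all examples of (W*x + b - y) squared."""
    return sum((W * x + b - y) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4]
ys = [0, -1, -2, -3]

# With W = 0.3 and b = -0.3 the model outputs are 0, 0.3, 0.6, 0.9,
# so the deltas are 0, 1.3, 2.6, 3.9 and their squares sum to about 23.66.
loss = sum_of_squares_loss(0.3, -0.3, xs, ys)
print(round(loss, 2))  # 23.66
```

Written out, the sum is 0 + 1.69 + 6.76 + 15.21 = 23.66, matching the value reported by the session.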

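We could improve this manually by reassigning W and b to the perfect values of -1 and 1. A variable is initialized to the value provided to tf.Variable but can be changed using operations like tf.assign. For example, W=-1 and b=1 are the optimal parameters for our model, so we can change W and b accordingly: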

fixW = tf.assign(W, [-1.])
fixb = tf.assign(b, [1.])
sess.run([fixW, fixb])
print(sess.run(loss, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]}))

The final print shows that the loss is now zero:

0.0

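We guessed the "perfect" values of W and b, but the whole point of machine learning is to find the correct model parameters automatically. We show how to accomplish this next.

tf.train API

A complete discussion of machine learning is outside the scope of this tutorial. However, TensorFlow provides optimizers that slowly change each variable in order to minimize the loss function. The simplest optimizer is gradient descent. It modifies each variable according to the magnitude of the derivative of the loss with respect to that variable. In general, computing symbolic derivatives by hand is tedious and error-prone. TensorFlow can therefore produce derivatives automatically, given only a description of the model, using the function tf.gradients. For simplicity, optimizers typically do this for you. For example: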

optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
sess.run(init) # reset values to incorrect defaults.
for i in range(1000):
  sess.run(train, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]})

print(sess.run([W, b]))

This results in the final model parameters:

[array([-0.9999969], dtype=float32), array([ 0.99999082], dtype=float32)]
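
To see what the optimizer is doing, the same update rule can be written out in plain Python with hand-derived gradients. This is a sketch of ordinary gradient descent under the tutorial's settings (learning rate 0.01, 1000 steps, starting from W = 0.3 and b = -0.3); loss_and_gradients is a hypothetical helper, and the code is not how TensorFlow computes gradients internally.

```python
# Training data from the tutorial.
x_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]


def loss_and_gradients(W, b):
    """Sum-of-squares loss for W*x + b, together with its gradients."""
    loss = grad_W = grad_b = 0.0
    for x, y in zip(x_train, y_train):
        delta = W * x + b - y
        loss += delta ** 2
        grad_W += 2 * delta * x  # d(loss)/dW
        grad_b += 2 * delta      # d(loss)/db
    return loss, grad_W, grad_b


W, b = 0.3, -0.3      # same starting values as the tutorial
learning_rate = 0.01  # same step size as GradientDescentOptimizer(0.01)
for _ in range(1000):
    _, grad_W, grad_b = loss_and_gradients(W, b)
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

final_loss, _, _ = loss_and_gradients(W, b)
print(W, b, final_loss)
```

After 1000 steps W and b end up very close to -1 and 1, in line with the values printed by the TensorFlow program; the last digits differ because this sketch uses Python's double-precision floats rather than float32.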

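Now we have done actual machine learning! Although this simple linear regression model does not require much TensorFlow Core code, more complicated models and methods for feeding data into your models require more code. For that reason, TensorFlow provides higher-level abstractions for common patterns, structures, and functionality. We learn how to use some of these abstractions in the next section.

Complete program

The completed trainable linear regression model is shown here: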


import tensorflow as tf

# Model parameters
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
# Model input and output
x = tf.placeholder(tf.float32)
linear_model = W*x + b
y = tf.placeholder(tf.float32)

# loss
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares
# optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

# training data
x_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]
# training loop
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # reset values to wrong
for i in range(1000):
  sess.run(train, {x: x_train, y: y_train})

# evaluate training accuracy
curr_W, curr_b, curr_loss = sess.run([W, b, loss], {x: x_train, y: y_train})
print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss))

When run, it produces:

W: [-0.9999969] b: [ 0.99999082] loss: 5.69997e-11

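Notice that the loss is a very small number (very close to zero). If you run this program, your loss may not be exactly the same as the value above. W and b start from the fixed values 0.3 and -0.3 here, so any difference would come from floating-point details of the platform or TensorFlow version rather than from pseudorandom initialization.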

This more complicated program can still be visualized in TensorBoard:

tf.estimator

tf.estimator is a high-level TensorFlow library that simplifies the mechanics of machine learning, including the following:

Running training loops
Running evaluation loops
Managing data sets

tf.estimator defines many common models.

Basic usage

Notice how much simpler the linear regression program becomes with tf.estimator:

# NumPy is often used to load, manipulate and preprocess data.
import numpy as np
import tensorflow as tf

# Declare list of features. We only have one numeric feature. There are many
# other types of columns that are more complicated and useful.
feature_columns = [tf.feature_column.numeric_column("x", shape=[1])]

# An estimator is the front end to invoke training (fitting) and evaluation
# (inference). There are many predefined types like linear regression,
# linear classification, and many neural network classifiers and regressors.
# The following code provides an estimator that does linear regression.
estimator = tf.estimator.LinearRegressor(feature_columns=feature_columns)

# TensorFlow provides many helper methods to read and set up data sets.
# Here we use two data sets: one for training and one for evaluation.
# We have to tell the function how many passes over the data (num_epochs)
# we want and how big each batch should be.
x_train = np.array([1., 2., 3., 4.])
y_train = np.array([0., -1., -2., -3.])
x_eval = np.array([2., 5., 8., 1.])
y_eval = np.array([-1.01, -4.1, -7, 0.])
input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=None, shuffle=True)
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=1000, shuffle=False)
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_eval}, y_eval, batch_size=4, num_epochs=1000, shuffle=False)


# We can invoke 1000 training steps by invoking the method and passing the
# training data set.
estimator.train(input_fn=input_fn, steps=1000)

# Here we evaluate how well our model did.
train_metrics = estimator.evaluate(input_fn=train_input_fn)
eval_metrics = estimator.evaluate(input_fn=eval_input_fn)
print("train metrics: %r"% train_metrics)
print("eval metrics: %r"% eval_metrics)

When run, it produces:

train metrics: {'average_loss': 1.4833182e-08, 'global_step': 1000, 'loss': 5.9332727e-08}
eval metrics: {'average_loss': 0.0025353201, 'global_step': 1000, 'loss': 0.01014128}

Notice that our eval data has a higher loss than the training data, but it is still close to zero. That means the model is learning properly.
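
The two loss figures in each metrics dictionary are related. In the printed numbers, loss is almost exactly four times average_loss, which is consistent with average_loss being a per-example mean and loss being the total for a batch of batch_size=4 examples. That reading is an inference from the numbers shown here, not something the text states, so treat the short check below as a sketch.

```python
batch_size = 4  # the batch_size passed to numpy_input_fn in the program above

# Metric values copied from the output shown above.
reported = {
    "train": {"average_loss": 1.4833182e-08, "loss": 5.9332727e-08},
    "eval": {"average_loss": 0.0025353201, "loss": 0.01014128},
}

ratios = {}
for name, metrics in reported.items():
    # How many times larger is the batch loss than the per-example loss?
    ratios[name] = metrics["loss"] / metrics["average_loss"]
    print(name, round(ratios[name], 3))
```

Both ratios round to 4.0, the batch size, so the per-example figure is the easier one to compare across data sets of different sizes.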

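A custom model

tf.estimator does not lock you into its predefined models. Suppose we wanted to create a custom model that is not built into TensorFlow. We can still retain the high-level abstraction of data sets, feeding, training, and so on that tf.estimator provides. For illustration, we show how to implement our own equivalent of LinearRegressor using the lower-level TensorFlow API.

To define a custom model that works with tf.estimator, we need to use tf.estimator.Estimator. tf.estimator.LinearRegressor is actually a subclass of tf.estimator.Estimator. Instead of subclassing Estimator, we simply provide Estimator with a function model_fn that tells tf.estimator how it can evaluate predictions, training steps, and loss. The code is as follows: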

import numpy as np
import tensorflow as tf

# Declare list of features, we only have one real-valued feature
def model_fn(features, labels, mode):
  # Build a linear model and predict values
  W = tf.get_variable("W", [1], dtype=tf.float64)
  b = tf.get_variable("b", [1], dtype=tf.float64)
  y = W*features['x'] + b
  # Loss sub-graph
  loss = tf.reduce_sum(tf.square(y - labels))
  # Training sub-graph
  global_step = tf.train.get_global_step()
  optimizer = tf.train.GradientDescentOptimizer(0.01)
  train = tf.group(optimizer.minimize(loss),
                   tf.assign_add(global_step, 1))
  # EstimatorSpec connects subgraphs we built to the
  # appropriate functionality.
  return tf.estimator.EstimatorSpec(
      mode=mode,
      predictions=y,
      loss=loss,
      train_op=train)

estimator = tf.estimator.Estimator(model_fn=model_fn)
# define our data sets
x_train = np.array([1., 2., 3., 4.])
y_train = np.array([0., -1., -2., -3.])
x_eval = np.array([2., 5., 8., 1.])
y_eval = np.array([-1.01, -4.1, -7., 0.])
input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=None, shuffle=True)
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=1000, shuffle=False)
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_eval}, y_eval, batch_size=4, num_epochs=1000, shuffle=False)

# train
estimator.train(input_fn=input_fn, steps=1000)
# Here we evaluate how well our model did.
train_metrics = estimator.evaluate(input_fn=train_input_fn)
eval_metrics = estimator.evaluate(input_fn=eval_input_fn)
print("train metrics: %r"% train_metrics)
print("eval metrics: %r"% eval_metrics)

When run, it produces:

train metrics: {'loss': 1.227995e-11, 'global_step': 1000}
eval metrics: {'loss': 0.01010036, 'global_step': 1000}

Notice how the contents of the custom model_fn() function are very similar to our manual model training loop from the lower-level API.
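
The idea of handing a function to a generic training driver, instead of subclassing it, can be shown without TensorFlow. In the sketch below, ToyEstimator is a hypothetical stand-in that shares only the general shape of tf.estimator.Estimator: the parameter dictionary, the hand-written gradients, and the zero starting values are all assumptions made for the illustration.

```python
def model_fn(params, xs, ys):
    """Describe a linear model: predictions, loss, and gradients of the loss."""
    predictions = [params["W"] * x + params["b"] for x in xs]
    deltas = [p - y for p, y in zip(predictions, ys)]
    return {
        "predictions": predictions,
        "loss": sum(d * d for d in deltas),
        "grads": {
            "W": sum(2 * d * x for d, x in zip(deltas, xs)),
            "b": sum(2 * d for d in deltas),
        },
    }


class ToyEstimator:
    """Hypothetical driver: it owns the loops, model_fn owns the model."""

    def __init__(self, model_fn, learning_rate=0.01):
        self.model_fn = model_fn
        self.learning_rate = learning_rate
        self.params = {"W": 0.0, "b": 0.0}  # arbitrary starting values

    def train(self, xs, ys, steps):
        for _ in range(steps):
            spec = self.model_fn(self.params, xs, ys)
            for name, grad in spec["grads"].items():
                self.params[name] -= self.learning_rate * grad

    def evaluate(self, xs, ys):
        return {"loss": self.model_fn(self.params, xs, ys)["loss"]}


estimator = ToyEstimator(model_fn)
estimator.train([1., 2., 3., 4.], [0., -1., -2., -3.], steps=1000)
train_metrics = estimator.evaluate([1., 2., 3., 4.], [0., -1., -2., -3.])
eval_metrics = estimator.evaluate([2., 5., 8., 1.], [-1.01, -4.1, -7., 0.])
print(train_metrics, eval_metrics)
```

The driver never needs to know that the model is linear; swapping in a different model_fn changes the model while the training and evaluation loops stay the same. With the tutorial's data, the eval loss lands near 0.0101, as in the output above.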

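Next steps

Now you have a working knowledge of the basics of TensorFlow. There are several more tutorials you can look at to learn more. If you are a beginner in machine learning, see the MNIST tutorial for beginners; otherwise, see the Deep MNIST tutorial for experts.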
