TensorFlow Estimator Explained in Detail

Reposted article, 2019-07-23 16:59. Tags: TensorFlow, machine learning, estimator, EstimatorSpec, RunConfig

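A first look at Estimator

Framework structure

Before introducing Estimator, it helps to have a rough picture of where it sits in the overall TensorFlow framework (the original post shows a diagram of the API layers at this point).

Estimator belongs to the high-level APIs, while the mid-level APIs are:

Layers: for building the network structure
Datasets: for building the data input pipeline
Metrics: for evaluating network performance

So when using Estimator we only need to care about these three parts and can leave the finer details alone. We also no longer have to deal with the tiresome Session.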

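Steps for using Estimator

Create one or more input functions, i.e. input_fn
Define the model's feature columns, i.e. feature_columns
Instantiate an Estimator, specifying the feature columns and the various hyperparameters
Call one or more methods on the Estimator object (train, evaluate, predict), passing the appropriate input function as the data source

The following pseudocode walks through how to use Estimator:

Create one or more input functions, i.e. input_fn: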

import tensorflow as tf  # the snippets in this article use the TensorFlow 1.x API

def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle, repeat, and batch the examples.
    return dataset.shuffle(1000).repeat().batch(batch_size)

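Note that features must be a dict. (The word "feature" here is not quite what we usually mean by extracted features; it can also refer to raw image data or any other unprocessed input.) The my_feature_columns list defined below is passed to the Estimator, which uses it to parse features.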
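
To make the shape of the data concrete, here is a minimal pure-Python sketch of what the pipeline above yields. It does not use TensorFlow: `slice_examples` and `batches` are hypothetical helpers that only mimic `from_tensor_slices` and `batch` on plain lists, shuffling and repeating are left out, and the numbers are placeholders.

```python
from itertools import islice

def slice_examples(features, labels):
    """Yield one (feature_dict, label) pair per row, like from_tensor_slices."""
    keys = list(features)
    for i, label in enumerate(labels):
        yield {k: features[k][i] for k in keys}, label

def batches(examples, batch_size):
    """Group consecutive examples into column-wise batches."""
    it = iter(examples)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        feats = {k: [ex[0][k] for ex in chunk] for k in chunk[0][0]}
        yield feats, [ex[1] for ex in chunk]

# Toy columns keyed by feature name; values are placeholders.
features = {'SepalLength': [5.1, 5.9, 6.9], 'SepalWidth': [3.3, 3.0, 3.1]}
labels = [0, 1, 2]
result = list(batches(slice_examples(features, labels), batch_size=2))
for batch_features, batch_labels in result:
    print(batch_features, batch_labels)
```

Each batch is still a dict keyed by feature name, which is why the input must be a dict in the first place: the keys are what the feature columns later refer to.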

Define the model's feature columns, i.e. feature_columns:

# Feature columns describe how to use the input.
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

Instantiate an Estimator, specifying the feature columns and the various hyperparameters:

# Build a DNN with 2 hidden layers and 10 nodes in each hidden layer.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 10 nodes each.
    hidden_units=[10, 10],
    # The model must choose between 3 classes.
    n_classes=3)

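Note that no data is passed in when the Estimator is instantiated. You only pass feature_columns, which tells the Estimator which feature values to parse; the dataset itself is supplied later, when the model is trained and evaluated.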
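
As a rough illustration of "telling the Estimator which features to parse", the sketch below picks only the listed keys out of a features dict and lays them out as dense rows. This is a pure-Python stand-in, not TensorFlow's implementation; `dense_input` is a hypothetical helper, and sorting the keys is simply a choice made here to get a stable column order.

```python
def dense_input(features, column_keys):
    """Build one numeric row per example from the columns named in column_keys."""
    keys = sorted(column_keys)  # fixed order so every row lines up
    n = len(features[keys[0]])
    return [[float(features[k][i]) for k in keys] for i in range(n)]

features = {
    'SepalLength': [5.1, 5.9],
    'SepalWidth': [3.3, 3.0],
    'Id': [17, 42],  # present in the data but not declared as a feature column
}
rows = dense_input(features, ['SepalWidth', 'SepalLength'])
print(rows)
```

The `Id` column never reaches the model because no feature column names it.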

Call one or more methods on the Estimator object, passing the appropriate input function as the data source

train

# Train the model.
classifier.train(
    input_fn=lambda: iris_data.train_input_fn(train_x, train_y, args.batch_size),
    steps=args.train_steps)

evaluate

# Evaluate the model.
eval_result = classifier.evaluate(
    input_fn=lambda: iris_data.eval_input_fn(test_x, test_y, args.batch_size))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

predict

# Generate predictions from the model
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

# No labels are available at prediction time, so labels=None is passed.
predictions = classifier.predict(
    input_fn=lambda: iris_data.eval_input_fn(predict_x,
                                             labels=None,
                                             batch_size=args.batch_size))
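
The snippet stops at the call to predict, so here is a hedged sketch of how the result might be consumed. It assumes predict yields one dict per example with 'class_ids' and 'probabilities' keys; `fake_predict` is a stand-in generator and its probabilities are invented placeholders, not output from a trained model.

```python
def fake_predict():
    """Stand-in for classifier.predict(...): yields one dict per example."""
    yield {'class_ids': [0], 'probabilities': [0.9, 0.08, 0.02]}
    yield {'class_ids': [1], 'probabilities': [0.1, 0.8, 0.1]}
    yield {'class_ids': [2], 'probabilities': [0.05, 0.15, 0.8]}

SPECIES = ['Setosa', 'Versicolor', 'Virginica']
expected = ['Setosa', 'Versicolor', 'Virginica']

lines = []
for pred, expec in zip(fake_predict(), expected):
    class_id = pred['class_ids'][0]           # predicted class index
    prob = pred['probabilities'][class_id]    # probability of that class
    lines.append('Prediction is "{}" ({:.1f}%), expected "{}"'.format(
        SPECIES[class_id], 100 * prob, expec))

print('\n'.join(lines))
```

Because the result is consumed lazily, nothing is computed until the loop starts pulling items from the generator.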

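Understanding Estimator in depth

The example above gave a brief introduction to Estimator using the pre-made DNNClassifier. Other pre-made estimators are available as well (listed in a figure in the original post).

Of course, in real tasks these networks often cannot meet our needs, so we need to be able to use a custom network structure. How do we do that? When I read the tutorial on the official site I was rather confused, because every so often a new parameter would pop up to implement some other feature, and I kept wondering how many parameters there actually are. There was nothing for it but to start gnawing on the hard bone of the source code (which is not that hard, really... I had just been lazy).

Understanding Estimator from the source code

The source code of Estimator is as follows (trimmed at both ends for clarity):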

class Estimator(object):
  def __init__(self, model_fn, model_dir=None, config=None, params=None, warm_start_from=None):
    ...

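The arguments to pass in are:

model_dir: the directory where checkpoints and other logs are stored.
model_fn: the network model function we have to define ourselves; described in detail below.
config: controls runtime internals, checkpointing and so on. If model_fn also declares a config argument, this config is passed on to model_fn.
params: its value is passed on to model_fn.
warm_start_from: the path to a checkpoint; training starts from the weights loaded from that checkpoint.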
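
The rule "config is passed to model_fn only if model_fn declares it" can be sketched with signature inspection. This is a simplified pure-Python illustration of the idea, not the library's code; `call_model_fn` is a hypothetical helper and the dict stands in for a RunConfig.

```python
import inspect

def call_model_fn(model_fn, features, labels, mode, params, config):
    """Call model_fn, passing only the optional arguments it declares."""
    declared = inspect.signature(model_fn).parameters
    optional = {'labels': labels, 'mode': mode, 'params': params, 'config': config}
    kwargs = {k: v for k, v in optional.items() if k in declared}
    return model_fn(features=features, **kwargs)

def model_without_config(features, labels, mode, params):
    return 'no config seen'

def model_with_config(features, labels, mode, params, config):
    return config['model_dir']

run_config = {'model_dir': '/tmp/model'}  # placeholder for a RunConfig object
a = call_model_fn(model_without_config, {}, [], 'train', {}, run_config)
b = call_model_fn(model_with_config, {}, [], 'train', {}, run_config)
print(a, b)
```

Matching arguments by name rather than by position is also why a model function is free to list its parameters in any order.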

Building model_fn

A model function is generally defined as follows:

def my_model_fn(
    features,     # batch of features from input_fn: a `Tensor` or dict of `Tensor`s
    labels,       # batch of labels from input_fn
    mode,         # one of the tf.estimator.ModeKeys values
    params,       # additional configuration
    config=None   # optional RunConfig
    ):
    ...  # build the graph and return a tf.estimator.EstimatorSpec

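The first two arguments are the batches of features and labels returned by the input function; in other words, features and labels are the data the model will use.
params is a dict that can carry any number of parameters for building the network or defining how it is trained. For example, params['n_classes'] can set the number of output nodes.
config is usually used to control checkpointing or distributed training; we will not go into it here.
mode indicates whether the caller is requesting training, evaluation or prediction, expressed as tf.estimator.ModeKeys.TRAIN / EVAL / PREDICT. As the source code of DNNClassifier shows, mode is never passed in by hand, because the Estimator sets it automatically. For example, when you call estimator.train(...), mode is set to tf.estimator.ModeKeys.TRAIN.

model_fn must handle each mode differently, and in every case it must return an instance of tf.estimator.EstimatorSpec.

That may sound cryptic at first. In plain language: a model goes through three phases (training, evaluation and prediction), and the data is handled differently in each. During training we feed data to the model, the model produces predictions from the input, we compute the loss from the predictions and the true values, and finally we use the loss to update the network parameters. During evaluation there is no backpropagation and no parameter update. In other words, model_fn needs three code paths, one for each mode.

And what must model_fn return? Estimator requires it to return a tf.estimator.EstimatorSpec, so that it can handle the result in a uniform way.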
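
The division of labour between the Estimator and model_fn can be mimicked in a few lines of plain Python. `MiniEstimator` and `Spec` below are toy stand-ins invented for this sketch (they are not TensorFlow classes); the mode strings follow tf.estimator.ModeKeys, where PREDICT is spelled 'infer', and the string used as train_op is only a placeholder.

```python
from collections import namedtuple

TRAIN, EVAL, PREDICT = 'train', 'eval', 'infer'
Spec = namedtuple('Spec', ['mode', 'predictions', 'loss', 'train_op'])

def my_model_fn(features, labels, mode, params):
    """One function, three code paths selected by mode."""
    predictions = [params['scale'] * x for x in features['x']]
    if mode == PREDICT:
        return Spec(mode, predictions, None, None)
    loss = sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)
    if mode == EVAL:
        return Spec(mode, predictions, loss, None)
    return Spec(mode, predictions, loss, 'apply-gradients')  # placeholder train_op

class MiniEstimator:
    """Each public method fixes the mode, so the caller never passes it."""
    def __init__(self, model_fn, params):
        self._model_fn = model_fn
        self._params = params

    def _run(self, input_fn, mode):
        features, labels = input_fn()
        return self._model_fn(features, labels, mode, self._params)

    def train(self, input_fn):
        return self._run(input_fn, TRAIN)

    def evaluate(self, input_fn):
        return self._run(input_fn, EVAL)

    def predict(self, input_fn):
        return self._run(input_fn, PREDICT)

def input_fn():
    return {'x': [1.0, 2.0]}, [2.0, 5.0]

estimator = MiniEstimator(my_model_fn, params={'scale': 2.0})
train_spec = estimator.train(input_fn)
eval_spec = estimator.evaluate(input_fn)
predict_spec = estimator.predict(input_fn)
print(train_spec, eval_spec, predict_spec, sep='\n')
```

The caller only ever chooses a method; the mode and params reach model_fn through the wrapper.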

Config

The config here must be a tf.estimator.RunConfig, whose source code is:

class RunConfig(object):
  """This class specifies the configurations for an `Estimator` run."""

  def __init__(self,
               model_dir=None,
               tf_random_seed=None,
               save_summary_steps=100,
               save_checkpoints_steps=_USE_DEFAULT,
               save_checkpoints_secs=_USE_DEFAULT,
               session_config=None,
               keep_checkpoint_max=5,
               keep_checkpoint_every_n_hours=10000,
               log_step_count_steps=100,
               train_distribute=None,
               device_fn=None,
               protocol=None,
               eval_distribute=None,
               experimental_distribute=None,
               experimental_max_worker_delay_secs=None,
               session_creation_timeout_secs=7200):

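model_dir: the directory where model parameters, the graph and so on are stored.
save_summary_steps: save summaries every this many steps. Summaries are the scalar and histogram records that TensorBoard displays.
save_checkpoints_steps: save a checkpoint every this many steps.
save_checkpoints_secs: save a checkpoint every this many seconds. It cannot be set together with save_checkpoints_steps. If neither is specified, the default is used, i.e. a checkpoint every 600 seconds. If both are set to None, no checkpoints are saved.

Note that the three save_* parameters above control the saving of checkpoints (model structure and parameters) and event files (used by TensorBoard). If you do not want to save anything, set all three of them to None.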
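
The interplay between the two checkpoint parameters is easy to get wrong, so here is a small sketch of the rules as described above. It is a pure-Python illustration under those stated rules; `resolve_checkpoint_policy` is a hypothetical function and the sentinel only imitates the `_USE_DEFAULT` marker seen in the signature.

```python
_USE_DEFAULT = object()  # sentinel meaning "the argument was not given"

def resolve_checkpoint_policy(save_checkpoints_steps=_USE_DEFAULT,
                              save_checkpoints_secs=_USE_DEFAULT):
    """Return (steps, secs) after applying the rules described in the text."""
    if (save_checkpoints_steps is _USE_DEFAULT
            and save_checkpoints_secs is _USE_DEFAULT):
        return None, 600  # neither given: checkpoint every 600 seconds
    steps = None if save_checkpoints_steps is _USE_DEFAULT else save_checkpoints_steps
    secs = None if save_checkpoints_secs is _USE_DEFAULT else save_checkpoints_secs
    if steps is not None and secs is not None:
        raise ValueError('set at most one of save_checkpoints_steps '
                         'and save_checkpoints_secs')
    return steps, secs  # (None, None) means checkpointing is switched off

print(resolve_checkpoint_policy())
print(resolve_checkpoint_policy(save_checkpoints_steps=1000))
print(resolve_checkpoint_policy(None, None))
```

The sentinel is what lets "not specified" (use the default) be told apart from an explicit None (disable saving).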

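keep_checkpoint_max: the maximum number of checkpoints to keep; once the limit is exceeded, the oldest checkpoints are deleted. If set to None or 0, all checkpoints are kept.
keep_checkpoint_every_n_hours: number of hours between each checkpoint to be saved. The default value of 10000 hours effectively disables the feature.
log_step_count_steps: how many steps (counted in global steps) pass between each log of the training loss. The global steps/s rate is logged at the same time, which tells you how fast the model is training. (Good grief, I finally found this parameter... When I measured model speed on a TPU I always had to wait ages for each global steps/s line. So sad.)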
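
The retention rule for keep_checkpoint_max can be pictured as a sliding window over the saved checkpoints. The sketch below is a pure-Python illustration of that rule only; `retained_checkpoints` is a hypothetical helper and the file names are illustrative.

```python
def retained_checkpoints(saved_steps, keep_checkpoint_max=5):
    """Return the checkpoints left on disk after saving at each step in order."""
    kept = []
    for step in saved_steps:
        kept.append('model.ckpt-{}'.format(step))
        # None or 0 disables the limit, so nothing is ever dropped.
        if keep_checkpoint_max and len(kept) > keep_checkpoint_max:
            kept.pop(0)  # delete the oldest checkpoint
    return kept

steps = [100, 200, 300, 400]
print(retained_checkpoints(steps, keep_checkpoint_max=2))
print(retained_checkpoints(steps, keep_checkpoint_max=None))
```

With a limit of 2, only the two most recent checkpoints survive, so a long run can only be resumed from its latest states.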

The remaining parameters relate to distributed training; I will look into them when I have time.

train_distribute
device_fn
protocol
eval_distribute
experimental_distribute
experimental_max_worker_delay_secs

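What is `tf.estimator.EstimatorSpec`?

Arguments

It is a class. An instance of it is created inside model_fn and returned by model_fn, and the Estimator uses that instance to run training, evaluation or prediction. Its source code is: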

class EstimatorSpec():
  def __new__(cls,
              mode,
              predictions=None,
              loss=None,
              train_op=None,
              eval_metric_ops=None,
              export_outputs=None,
              training_chief_hooks=None,
              training_hooks=None,
              scaffold=None,
              evaluation_hooks=None,
              prediction_hooks=None):

Important arguments:

mode: a ModeKeys value specifying whether this is training, evaluation or prediction.
predictions: Predictions Tensor or dict of Tensor.
loss: Training loss Tensor. Must be either scalar, or with shape [1].
train_op: the op that runs one training step.
eval_metric_ops: Dict of metric results keyed by name.
The values of the dict can be one of the following:
- (1) An instance of the Metric class.
- (2) The result of calling a metric function, namely a (metric_tensor, update_op) tuple. metric_tensor should be evaluated without any impact on state (typically it is a pure computation based on variables). For example, it should not trigger the update_op or require any input fetching.

See the docstrings in the source code for what the other arguments do.

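Different modes require different arguments

Which arguments are required depends on the value of mode:

For mode == ModeKeys.TRAIN: the required fields are loss and train_op.
For mode == ModeKeys.EVAL: the required field is loss.
For mode == ModeKeys.PREDICT: the required field is predictions.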
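
The three rules above amount to a small validation table. Here is a hedged pure-Python sketch of such a check; `check_spec` is a hypothetical function written for illustration, not the validation code inside EstimatorSpec, and the mode strings follow tf.estimator.ModeKeys.

```python
def check_spec(mode, predictions=None, loss=None, train_op=None):
    """Raise ValueError if a field required for `mode` is missing."""
    required = {
        'train': {'loss': loss, 'train_op': train_op},
        'eval': {'loss': loss},
        'infer': {'predictions': predictions},
    }
    if mode not in required:
        raise ValueError('unknown mode: {!r}'.format(mode))
    missing = sorted(name for name, value in required[mode].items()
                     if value is None)
    if missing:
        raise ValueError('mode {!r} is missing: {}'.format(mode, ', '.join(missing)))
    return True

print(check_spec('infer', predictions=[0, 1]))
print(check_spec('eval', loss=0.3))
try:
    check_spec('train', loss=0.3)  # train_op was forgotten
except ValueError as err:
    print(err)
```

Fields that are not required for a mode may still be supplied; they are simply not checked.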

The parameter descriptions above may still be hard to follow, so here are some examples:

The simplest case: predict

Only mode and predictions need to be passed in:

# Compute predictions.
predicted_classes = tf.argmax(logits, 1)
if mode == tf.estimator.ModeKeys.PREDICT:
    predictions = {
        'class_ids': predicted_classes[:, tf.newaxis],
        'probabilities': tf.nn.softmax(logits),
        'logits': logits,
    }
    return tf.estimator.EstimatorSpec(mode, predictions=predictions)
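
To see what ends up in that predictions dict, the same computation can be written out on plain lists. This is a pure-Python sketch with made-up logits; `softmax` and `build_predictions` are helpers defined here, and wrapping each class id in a one-element list mirrors the extra axis added by `[:, tf.newaxis]`.

```python
import math

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def build_predictions(logits):
    """Mirror the dict above: class ids with shape [batch, 1], probabilities, logits."""
    class_ids = [[max(range(len(row)), key=row.__getitem__)] for row in logits]
    return {
        'class_ids': class_ids,
        'probabilities': [softmax(row) for row in logits],
        'logits': logits,
    }

logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 3.0]]  # made-up scores for two examples
predictions = build_predictions(logits)
print(predictions['class_ids'])
print(predictions['probabilities'])
```

Each probabilities row sums to 1, and the class id is simply the index of the largest logit.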

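Evaluation mode: eval

mode, loss and eval_metric_ops need to be passed in.

When the Estimator's evaluate method is called, model_fn receives mode = ModeKeys.EVAL. In this case the model function must return a tf.estimator.EstimatorSpec containing the model's loss and, optionally, one or more metrics.

An example loss: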

# Compute loss.
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
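
For readers who want to see what this loss computes, here is a pure-Python sketch of sparse softmax cross-entropy: for each example, minus the log of the softmax probability assigned to the true class. Averaging over the batch is an assumption made here (unweighted examples); the function below is a hand-written illustration, not the TensorFlow implementation.

```python
import math

def sparse_softmax_cross_entropy(labels, logits):
    """Mean over the batch of -log(softmax(logits)[label])."""
    total = 0.0
    for label, row in zip(labels, logits):
        m = max(row)  # log-sum-exp with the max factored out for stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[label]
    return total / len(labels)

# With two equal logits the model is 50/50, so the loss is log(2).
loss = sparse_softmax_cross_entropy([0, 1], [[0.0, 0.0], [0.0, 0.0]])
print(loss)
```

"Sparse" refers to the labels being integer class indices rather than one-hot vectors.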

TensorFlow provides a metrics module, tf.metrics, for computing common metrics. Here accuracy is used as an example:

# Compute evaluation metrics.
accuracy = tf.metrics.accuracy(labels=labels,
                               predictions=predicted_classes,
                               name='acc_op')
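
A metric of this kind is a running statistic over all evaluation batches, which is why it comes as a (value, update_op) pair. The class below is a pure-Python sketch of that idea; `StreamingAccuracy` is a hypothetical name and the labels are made up.

```python
class StreamingAccuracy:
    """Keeps running totals, like the (value, update_op) pair of a metric."""
    def __init__(self):
        self.correct = 0
        self.count = 0

    def update(self, labels, predicted_classes):
        """Counterpart of update_op: fold one batch into the totals."""
        self.correct += sum(int(l == p) for l, p in zip(labels, predicted_classes))
        self.count += len(labels)

    def value(self):
        """Counterpart of the value tensor: read only, no state change."""
        return self.correct / self.count if self.count else 0.0

acc = StreamingAccuracy()
acc.update([0, 1, 2], [0, 1, 1])  # first batch: 2 of 3 correct
acc.update([2], [2])              # second batch: 1 of 1 correct
print(acc.value())
```

Reading the value never changes the totals, which matches the requirement that metric_tensor be evaluated without side effects.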

It is returned like this:

metrics = {'accuracy': accuracy}

if mode == tf.estimator.ModeKeys.EVAL:
    return tf.estimator.EstimatorSpec(
        mode, loss=loss, eval_metric_ops=metrics)

Training mode: train

mode, loss and train_op need to be passed in.

The loss is the same as in eval mode:

# Compute loss.
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

An example train_op:

optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
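
Running train_op once performs one optimizer update and advances the global step. The sketch below writes out the textbook Adagrad rule in plain Python on a toy objective; it is an illustration under stated assumptions (a starting accumulator of 0.1 and no epsilon term), not a copy of TensorFlow's kernel, and `adagrad_step` is a name invented here.

```python
import math

def adagrad_step(weights, grads, accum, learning_rate=0.1):
    """Apply one Adagrad update in place and return the weights."""
    for i, g in enumerate(grads):
        accum[i] += g * g  # running sum of squared gradients
        weights[i] -= learning_rate * g / math.sqrt(accum[i])
    return weights

weights = [1.0, -2.0]
accum = [0.1, 0.1]  # assumed non-zero starting accumulator
global_step = 0
for _ in range(3):
    grads = [2 * w for w in weights]  # gradient of the toy loss sum(w**2)
    adagrad_step(weights, grads, accum)
    global_step += 1  # what passing global_step to minimize takes care of

print(weights, global_step)
```

Passing the global step to minimize matters because checkpointing and step-based logging are keyed to that counter.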

Return value:

return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

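Mode-independent form

model_fn can populate all arguments independently of the mode, in which case the Estimator ignores the ones that do not apply. In eval and infer modes, train_op is ignored. For example: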

def my_model_fn(mode, features, labels):
  predictions = ...
  loss = ...
  train_op = ...
  return tf.estimator.EstimatorSpec(
      mode=mode,
      predictions=predictions,
      loss=loss,
      train_op=train_op)
