After TensorFlow 1.0, Google released a library called slim. TF-Slim is a lightweight, high-level API for TensorFlow, introduced in 2016 with the main goal of so-called "code slimming". Like the tf.contrib.layers module introduced earlier in the TensorFlow chapters, it wraps many common TensorFlow functions so that code becomes much more concise. It is particularly well suited to building deep neural networks with complex structures, and it can be used to define, train, and evaluate complex models.
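To make the "code slimming" idea concrete, here is a minimal sketch (not from the original article) comparing a convolution layer written with the raw TensorFlow API against the same stack written with slim; slim.conv2d, slim.arg_scope, and slim.repeat are real TF-Slim calls, while the layer sizes are arbitrary example values.

import tensorflow as tf

slim = tf.contrib.slim

# Plain TensorFlow: every layer spells out its own variables and ops.
def plain_tf_conv(inputs):
    weights = tf.get_variable('weights', [3, 3, 64, 128],
                              initializer=tf.truncated_normal_initializer(stddev=0.01))
    biases = tf.get_variable('biases', [128], initializer=tf.zeros_initializer())
    conv = tf.nn.conv2d(inputs, weights, strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(tf.nn.bias_add(conv, biases))

# TF-Slim: the same layer is one call, and arg_scope shares defaults
# (padding, regularizer, ...) across every conv2d inside the scope.
def slim_conv_stack(inputs):
    with slim.arg_scope([slim.conv2d], padding='SAME',
                        weights_regularizer=slim.l2_regularizer(0.0005)):
        net = slim.conv2d(inputs, 128, [3, 3], scope='conv1')
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
        return net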
Why cover this topic here? Mainly because TensorFlow's models repository contains a large number of network architectures written with slim, together with checkpoint files trained from that code, which can serve as our pre-trained models. For that reason we need to know how to use the slim library.
I. Getting the slim module code from models
To use the code in models, first verify that your TensorFlow installation includes the slim module, then download the models code from GitHub:
1. Verifying the slim library
Before using slim, test that the local tf.contrib.slim module works by running the following on the command line:
python -c "import tensorflow.contrib.slim as slim; eval = slim.evaluation.evaluate_once"
If no error is reported, TF-Slim is working.
2. Downloading the models module
To use TF-Slim for image classification, you also have to install the TF-Slim image models library, which is not part of the core TF library. To do this, check out the tensorflow/models repository as follows:
cd $HOME/workspace
git clone https://github.com/tensorflow/models/
This will put the TF-Slim image models library in $HOME/workspace/models/research/slim. (It will also create a directory called models/inception, which contains an older version of slim; you can safely ignore this.)
To verify that this has worked, execute the following commands; they should run without raising any errors.
cd $HOME/workspace/models/research/slim
python -c "from nets import cifarnet; mynet = cifarnet.cifarnet"
I am using Windows, so I simply downloaded the module from https://github.com/tensorflow/models/:

II. Directory structure of slim in models
slim lives under \models-master\research\slim and contains five folders:

- datasets: code for handling the supported datasets.
- deployment: deployment code. It implements distributed training across machines by creating clones, and supports synchronous or asynchronous computation on multiple CPUs and GPUs.
- nets: the various network models.
- preprocessing: image preprocessing functions suited to the various networks.
- scripts: example shell scripts for running the network models; they can only be used on systems that support a shell.
Here we focus on the datasets, nets, and preprocessing folders.
1. The datasets module
datasets holds the code for the commonly used image training sets; the main supported datasets are cifar10, flowers, mnist, and imagenet.
The code files are named after the corresponding datasets, and they can be used to download or fetch the data. Taking imagenet as an example, the following function fetches the imagenet labels from the web:
imagenet_map = imagenet.create_readable_names_for_imagenet_labels()
The code above returns the human-readable names of the 1000 ImageNet classes, keyed by label index (matching the sample ordering).
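A quick usage sketch (the exact label string is illustrative): the returned dictionary maps label indices to readable names, and index 0 is reserved for a background class, so there are 1001 entries in total.

from datasets import imagenet

imagenet_map = imagenet.create_readable_names_for_imagenet_labels()
print(len(imagenet_map))  # 1001: index 0 is 'background', 1..1000 are the classes
print(imagenet_map[1])    # e.g. 'tench, Tinca tinca'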
2. The nets module
This folder contains the various network models:

Each network model file is named after the model it implements, and the code in every file follows roughly the same structure. Take inception_resnet_v2 as an example:
# Copyright 2016 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== """Contains the definition of the Inception Resnet V2 architecture. As described in http://arxiv.org/abs/1602.07261. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi """ from __future__ import absolute_import from __future__ import division from __future__ import print_function import tensorflow as tf slim = tf.contrib.slim def block35(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None): """Builds the 35x35 resnet block.""" with tf.variable_scope(scope, 'Block35', [net], reuse=reuse): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 32, 1, scope='Conv2d_1x1') with tf.variable_scope('Branch_1'): tower_conv1_0 = slim.conv2d(net, 32, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1_0, 32, 3, scope='Conv2d_0b_3x3') with tf.variable_scope('Branch_2'): tower_conv2_0 = slim.conv2d(net, 32, 1, scope='Conv2d_0a_1x1') tower_conv2_1 = slim.conv2d(tower_conv2_0, 48, 3, scope='Conv2d_0b_3x3') tower_conv2_2 = slim.conv2d(tower_conv2_1, 64, 3, scope='Conv2d_0c_3x3') mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_1, tower_conv2_2]) up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None, activation_fn=None, scope='Conv2d_1x1') scaled_up = up * scale if activation_fn == tf.nn.relu6: # Use clip_by_value to simulate bandpass activation. scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0) net += scaled_up if activation_fn: net = activation_fn(net) return net def block17(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None): """Builds the 17x17 resnet block.""" with tf.variable_scope(scope, 'Block17', [net], reuse=reuse): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 192, 1, scope='Conv2d_1x1') with tf.variable_scope('Branch_1'): tower_conv1_0 = slim.conv2d(net, 128, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1_0, 160, [1, 7], scope='Conv2d_0b_1x7') tower_conv1_2 = slim.conv2d(tower_conv1_1, 192, [7, 1], scope='Conv2d_0c_7x1') mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_2]) up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None, activation_fn=None, scope='Conv2d_1x1') scaled_up = up * scale if activation_fn == tf.nn.relu6: # Use clip_by_value to simulate bandpass activation. 
scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0) net += scaled_up if activation_fn: net = activation_fn(net) return net def block8(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None): """Builds the 8x8 resnet block.""" with tf.variable_scope(scope, 'Block8', [net], reuse=reuse): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 192, 1, scope='Conv2d_1x1') with tf.variable_scope('Branch_1'): tower_conv1_0 = slim.conv2d(net, 192, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1_0, 224, [1, 3], scope='Conv2d_0b_1x3') tower_conv1_2 = slim.conv2d(tower_conv1_1, 256, [3, 1], scope='Conv2d_0c_3x1') mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_2]) up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None, activation_fn=None, scope='Conv2d_1x1') scaled_up = up * scale if activation_fn == tf.nn.relu6: # Use clip_by_value to simulate bandpass activation. scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0) net += scaled_up if activation_fn: net = activation_fn(net) return net def inception_resnet_v2_base(inputs, final_endpoint='Conv2d_7b_1x1', output_stride=16, align_feature_maps=False, scope=None, activation_fn=tf.nn.relu): """Inception model from http://arxiv.org/abs/1602.07261. Constructs an Inception Resnet v2 network from inputs to the given final endpoint. This method can construct the network up to the final inception block Conv2d_7b_1x1. Args: inputs: a tensor of size [batch_size, height, width, channels]. final_endpoint: specifies the endpoint to construct the network up to. It can be one of ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3', 'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3', 'MaxPool_5a_3x3', 'Mixed_5b', 'Mixed_6a', 'PreAuxLogits', 'Mixed_7a', 'Conv2d_7b_1x1'] output_stride: A scalar that specifies the requested ratio of input to output spatial resolution. Only supports 8 and 16. align_feature_maps: When true, changes all the VALID paddings in the network to SAME padding so that the feature maps are aligned. scope: Optional variable_scope. activation_fn: Activation function for block scopes. Returns: tensor_out: output tensor corresponding to the final_endpoint. end_points: a set of activations for external use, for example summaries or losses. Raises: ValueError: if final_endpoint is not set to one of the predefined values, or if the output_stride is not 8 or 16, or if the output_stride is 8 and we request an end point after 'PreAuxLogits'. 
""" if output_stride != 8 and output_stride != 16: raise ValueError('output_stride must be 8 or 16.') padding = 'SAME' if align_feature_maps else 'VALID' end_points = {} def add_and_check_final(name, net): end_points[name] = net return name == final_endpoint with tf.variable_scope(scope, 'InceptionResnetV2', [inputs]): with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], stride=1, padding='SAME'): # 149 x 149 x 32 net = slim.conv2d(inputs, 32, 3, stride=2, padding=padding, scope='Conv2d_1a_3x3') if add_and_check_final('Conv2d_1a_3x3', net): return net, end_points # 147 x 147 x 32 net = slim.conv2d(net, 32, 3, padding=padding, scope='Conv2d_2a_3x3') if add_and_check_final('Conv2d_2a_3x3', net): return net, end_points # 147 x 147 x 64 net = slim.conv2d(net, 64, 3, scope='Conv2d_2b_3x3') if add_and_check_final('Conv2d_2b_3x3', net): return net, end_points # 73 x 73 x 64 net = slim.max_pool2d(net, 3, stride=2, padding=padding, scope='MaxPool_3a_3x3') if add_and_check_final('MaxPool_3a_3x3', net): return net, end_points # 73 x 73 x 80 net = slim.conv2d(net, 80, 1, padding=padding, scope='Conv2d_3b_1x1') if add_and_check_final('Conv2d_3b_1x1', net): return net, end_points # 71 x 71 x 192 net = slim.conv2d(net, 192, 3, padding=padding, scope='Conv2d_4a_3x3') if add_and_check_final('Conv2d_4a_3x3', net): return net, end_points # 35 x 35 x 192 net = slim.max_pool2d(net, 3, stride=2, padding=padding, scope='MaxPool_5a_3x3') if add_and_check_final('MaxPool_5a_3x3', net): return net, end_points # 35 x 35 x 320 with tf.variable_scope('Mixed_5b'): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 96, 1, scope='Conv2d_1x1') with tf.variable_scope('Branch_1'): tower_conv1_0 = slim.conv2d(net, 48, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1_0, 64, 5, scope='Conv2d_0b_5x5') with tf.variable_scope('Branch_2'): tower_conv2_0 = slim.conv2d(net, 64, 1, scope='Conv2d_0a_1x1') tower_conv2_1 = slim.conv2d(tower_conv2_0, 96, 3, scope='Conv2d_0b_3x3') tower_conv2_2 = slim.conv2d(tower_conv2_1, 96, 3, scope='Conv2d_0c_3x3') with tf.variable_scope('Branch_3'): tower_pool = slim.avg_pool2d(net, 3, stride=1, padding='SAME', scope='AvgPool_0a_3x3') tower_pool_1 = slim.conv2d(tower_pool, 64, 1, scope='Conv2d_0b_1x1') net = tf.concat( [tower_conv, tower_conv1_1, tower_conv2_2, tower_pool_1], 3) if add_and_check_final('Mixed_5b', net): return net, end_points # TODO(alemi): Register intermediate endpoints net = slim.repeat(net, 10, block35, scale=0.17, activation_fn=activation_fn) # 17 x 17 x 1088 if output_stride == 8, # 33 x 33 x 1088 if output_stride == 16 use_atrous = output_stride == 8 with tf.variable_scope('Mixed_6a'): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 384, 3, stride=1 if use_atrous else 2, padding=padding, scope='Conv2d_1a_3x3') with tf.variable_scope('Branch_1'): tower_conv1_0 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1_0, 256, 3, scope='Conv2d_0b_3x3') tower_conv1_2 = slim.conv2d(tower_conv1_1, 384, 3, stride=1 if use_atrous else 2, padding=padding, scope='Conv2d_1a_3x3') with tf.variable_scope('Branch_2'): tower_pool = slim.max_pool2d(net, 3, stride=1 if use_atrous else 2, padding=padding, scope='MaxPool_1a_3x3') net = tf.concat([tower_conv, tower_conv1_2, tower_pool], 3) if add_and_check_final('Mixed_6a', net): return net, end_points # TODO(alemi): register intermediate endpoints with slim.arg_scope([slim.conv2d], rate=2 if use_atrous else 1): net = slim.repeat(net, 20, block17, 
scale=0.10, activation_fn=activation_fn) if add_and_check_final('PreAuxLogits', net): return net, end_points if output_stride == 8: # TODO(gpapan): Properly support output_stride for the rest of the net. raise ValueError('output_stride==8 is only supported up to the ' 'PreAuxlogits end_point for now.') # 8 x 8 x 2080 with tf.variable_scope('Mixed_7a'): with tf.variable_scope('Branch_0'): tower_conv = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') tower_conv_1 = slim.conv2d(tower_conv, 384, 3, stride=2, padding=padding, scope='Conv2d_1a_3x3') with tf.variable_scope('Branch_1'): tower_conv1 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') tower_conv1_1 = slim.conv2d(tower_conv1, 288, 3, stride=2, padding=padding, scope='Conv2d_1a_3x3') with tf.variable_scope('Branch_2'): tower_conv2 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1') tower_conv2_1 = slim.conv2d(tower_conv2, 288, 3, scope='Conv2d_0b_3x3') tower_conv2_2 = slim.conv2d(tower_conv2_1, 320, 3, stride=2, padding=padding, scope='Conv2d_1a_3x3') with tf.variable_scope('Branch_3'): tower_pool = slim.max_pool2d(net, 3, stride=2, padding=padding, scope='MaxPool_1a_3x3') net = tf.concat( [tower_conv_1, tower_conv1_1, tower_conv2_2, tower_pool], 3) if add_and_check_final('Mixed_7a', net): return net, end_points # TODO(alemi): register intermediate endpoints net = slim.repeat(net, 9, block8, scale=0.20, activation_fn=activation_fn) net = block8(net, activation_fn=None) # 8 x 8 x 1536 net = slim.conv2d(net, 1536, 1, scope='Conv2d_7b_1x1') if add_and_check_final('Conv2d_7b_1x1', net): return net, end_points raise ValueError('final_endpoint (%s) not recognized', final_endpoint) def inception_resnet_v2(inputs, num_classes=1001, is_training=True, dropout_keep_prob=0.8, reuse=None, scope='InceptionResnetV2', create_aux_logits=True, activation_fn=tf.nn.relu): """Creates the Inception Resnet V2 model. Args: inputs: a 4-D tensor of size [batch_size, height, width, 3]. Dimension batch_size may be undefined. If create_aux_logits is false, also height and width may be undefined. num_classes: number of predicted classes. If 0 or None, the logits layer is omitted and the input features to the logits layer (before dropout) are returned instead. is_training: whether is training or not. dropout_keep_prob: float, the fraction to keep before final layer. reuse: whether or not the network and its variables should be reused. To be able to reuse 'scope' must be given. scope: Optional variable_scope. create_aux_logits: Whether to include the auxilliary logits. activation_fn: Activation function for conv2d. Returns: net: the output of the logits layer (if num_classes is a non-zero integer), or the non-dropped-out input to the logits layer (if num_classes is 0 or None). end_points: the set of end_points from the inception model. 
""" end_points = {} with tf.variable_scope(scope, 'InceptionResnetV2', [inputs], reuse=reuse) as scope: with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training): net, end_points = inception_resnet_v2_base(inputs, scope=scope, activation_fn=activation_fn) if create_aux_logits and num_classes: with tf.variable_scope('AuxLogits'): aux = end_points['PreAuxLogits'] aux = slim.avg_pool2d(aux, 5, stride=3, padding='VALID', scope='Conv2d_1a_3x3') aux = slim.conv2d(aux, 128, 1, scope='Conv2d_1b_1x1') aux = slim.conv2d(aux, 768, aux.get_shape()[1:3], padding='VALID', scope='Conv2d_2a_5x5') aux = slim.flatten(aux) aux = slim.fully_connected(aux, num_classes, activation_fn=None, scope='Logits') end_points['AuxLogits'] = aux with tf.variable_scope('Logits'): # TODO(sguada,arnoegw): Consider adding a parameter global_pool which # can be set to False to disable pooling here (as in resnet_*()). kernel_size = net.get_shape()[1:3] if kernel_size.is_fully_defined(): net = slim.avg_pool2d(net, kernel_size, padding='VALID', scope='AvgPool_1a_8x8') else: net = tf.reduce_mean(net, [1, 2], keep_dims=True, name='global_pool') end_points['global_pool'] = net if not num_classes: return net, end_points net = slim.flatten(net) net = slim.dropout(net, dropout_keep_prob, is_training=is_training, scope='Dropout') end_points['PreLogitsFlatten'] = net logits = slim.fully_connected(net, num_classes, activation_fn=None, scope='Logits') end_points['Logits'] = logits end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions') return logits, end_points inception_resnet_v2.default_image_size = 299 def inception_resnet_v2_arg_scope(weight_decay=0.00004, batch_norm_decay=0.9997, batch_norm_epsilon=0.001, activation_fn=tf.nn.relu): """Returns the scope with the default parameters for inception_resnet_v2. Args: weight_decay: the weight decay for weights variables. batch_norm_decay: decay for the moving average of batch_norm momentums. batch_norm_epsilon: small float added to variance to avoid dividing by zero. activation_fn: Activation function for conv2d. Returns: a arg_scope with the parameters needed for inception_resnet_v2. """ # Set weight_decay for weights in conv2d and fully_connected layers. with slim.arg_scope([slim.conv2d, slim.fully_connected], weights_regularizer=slim.l2_regularizer(weight_decay), biases_regularizer=slim.l2_regularizer(weight_decay)): batch_norm_params = { 'decay': batch_norm_decay, 'epsilon': batch_norm_epsilon, 'fused': None, # Use fused batch norm if possible. } # Set activation_fn and parameters for batch_norm. with slim.arg_scope([slim.conv2d], activation_fn=activation_fn, normalizer_fn=slim.batch_norm, normalizer_params=batch_norm_params) as scope: return scope
The module exposes the following interface (a usage sketch follows the list):
- inception_resnet_v2.default_image_size: the default input image size.
- inception_resnet_v2_base: builds the base structure of inception_resnet_v2 and outputs the network's raw feature maps. By default its output is consumed by the inception_resnet_v2 function, and its internals are rarely changed; when you want a custom output head, you can feed its output into your own function instead of inception_resnet_v2.
- inception_resnet_v2: builds the full inception_resnet_v2 network. It returns two values: the prediction logits, and a dictionary of end points (including AuxLogits) that is useful for display or analysis, for example for summaries and losses.
- inception_resnet_v2_arg_scope: returns an arg_scope holding the model's default parameters (weight decay, batch-norm settings). When using or modifying the model from outside, build it inside this same scope so the layers get the defaults the model expects.
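A minimal usage sketch (not code from the repository): build the network inside its arg_scope and read the class probabilities from the returned end points. The placeholder shape and num_classes=1001 are assumptions that match the published ImageNet checkpoints.

import tensorflow as tf
from nets import inception_resnet_v2

slim = tf.contrib.slim

size = inception_resnet_v2.inception_resnet_v2.default_image_size  # 299
images = tf.placeholder(tf.float32, [None, size, size, 3])

# Build the graph inside the model's arg_scope so every layer picks up the
# default weight decay and batch-norm parameters.
with slim.arg_scope(inception_resnet_v2.inception_resnet_v2_arg_scope()):
    logits, end_points = inception_resnet_v2.inception_resnet_v2(
        images, num_classes=1001, is_training=False)

probabilities = end_points['Predictions']  # softmax over the 1001 classes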
3. The preprocessing module
This module contains several image preprocessing files, also named after the models they belong to. slim collects the preprocessing functions commonly used by one family of models into a single file named after that family, and the files all follow a similar structure. For example, the call into inception_preprocessing looks like this:
inception_preprocessing.preprocess_image
This function resizes the incoming image to the model's input size and normalizes it.
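A hedged sketch of how it is typically called (the placeholder stands in for a decoded uint8 RGB image, such as the image tensor returned by provider.get later in this article; the 299 size matches the Inception family):

import tensorflow as tf
from preprocessing import inception_preprocessing

image_raw = tf.placeholder(tf.uint8, [480, 640, 3])  # a decoded RGB image (example size)

image_size = 299  # e.g. inception_resnet_v2.default_image_size
processed = inception_preprocessing.preprocess_image(
    image_raw, image_size, image_size, is_training=False)
# processed is a float32 tensor of shape [299, 299, 3], scaled to roughly [-1, 1]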
III. Dataset handling in slim
1. Preparing a dataset
As part of this library, we've included scripts to download several popular image datasets (listed below) and convert them to slim format.

2. Downloading a dataset and converting it to TFRecord format
TFRecord is the dataset format recommended by TensorFlow and is tightly integrated with the framework. TensorFlow provides a set of interfaces for reading TFRecord files; the format exists mainly to support reading data from disk while training is running, which becomes necessary when the sample set is too large to hold in memory. Converting the raw files to TFRecord and then reading them at run time with multiple threads takes load off the main training thread and makes training more efficient. For details on the TFRecord format, see the earlier article
Section 12: Several ways of reading data in TensorFlow and how to use queues.
For each dataset, we'll need to download the raw data and convert it to TensorFlow's native TFRecord format. Each TFRecord contains a TF-Example protocol buffer. Below we demonstrate how to do this for the Flowers dataset.
$ DATA_DIR=/tmp/data/flowers
$ python download_and_convert_data.py \
    --dataset_name=flowers \
    --dataset_dir="${DATA_DIR}"
There are two key arguments here: the dataset name (flowers in this example) and the download directory (here the data is stored under /tmp/data/flowers).
When the script finishes you will find several TFRecord files created:

These represent the training and validation data, sharded over 5 files each. You will also find the $DATA_DIR/labels.txt file which contains the mapping from integer labels to class names.
You can use the same script to create the mnist and cifar10 datasets. However, for ImageNet, you have to follow the instructions here. Note that you first have to sign up for an account at image-net.org. Also, the download can take several hours, and could use up to 500GB.
Let me walk through the code that actually runs. Open the download_and_convert_data.py file; its contents are as follows:
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Downloads and converts a particular dataset.

Usage:
```shell

$ python download_and_convert_data.py \
    --dataset_name=mnist \
    --dataset_dir=/tmp/mnist

$ python download_and_convert_data.py \
    --dataset_name=cifar10 \
    --dataset_dir=/tmp/cifar10

$ python download_and_convert_data.py \
    --dataset_name=flowers \
    --dataset_dir=/tmp/flowers
```
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from datasets import download_and_convert_cifar10
from datasets import download_and_convert_flowers
from datasets import download_and_convert_mnist

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_string(
    'dataset_name',
    None,
    'The name of the dataset to convert, one of "cifar10", "flowers", "mnist".')

tf.app.flags.DEFINE_string(
    'dataset_dir',
    None,
    'The directory where the output TFRecords and temporary files are saved.')


def main(_):
  if not FLAGS.dataset_name:
    raise ValueError('You must supply the dataset name with --dataset_name')
  if not FLAGS.dataset_dir:
    raise ValueError('You must supply the dataset directory with --dataset_dir')

  if FLAGS.dataset_name == 'cifar10':
    download_and_convert_cifar10.run(FLAGS.dataset_dir)
  elif FLAGS.dataset_name == 'flowers':
    download_and_convert_flowers.run(FLAGS.dataset_dir)
  elif FLAGS.dataset_name == 'mnist':
    download_and_convert_mnist.run(FLAGS.dataset_dir)
  else:
    raise ValueError(
        'dataset_name [%s] was not recognized.' % FLAGS.dataset_name)

if __name__ == '__main__':
  tf.app.run()
- The program is launched through tf.app.run(), which parses the command-line arguments and hands them to FLAGS. Running the command above is therefore equivalent to setting FLAGS.dataset_name='flowers' and FLAGS.dataset_dir='/tmp/data/flowers' (see the stripped-down sketch after this list).
- main() then runs and calls download_and_convert_flowers.run(FLAGS.dataset_dir). That function downloads the dataset, unpacks it, converts it to TFRecord format, and finally deletes the original dataset files.
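To isolate the flag-parsing mechanism described above, here is a stripped-down sketch that uses the same pattern as the file, reduced to its essentials:

import tensorflow as tf

FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string('dataset_name', None, 'Dataset to convert.')
tf.app.flags.DEFINE_string('dataset_dir', None, 'Where to write the TFRecords.')


def main(_):
    # By the time main() runs, tf.app.run() has already parsed sys.argv,
    # so --dataset_name=flowers is available as FLAGS.dataset_name.
    print(FLAGS.dataset_name, FLAGS.dataset_dir)


if __name__ == '__main__':
    tf.app.run()  # parses the command line, fills FLAGS, then calls main()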
The download_and_convert_flowers.run function lives in download_and_convert_flowers.py; the run() function looks like this:
def run(dataset_dir):
  """Runs the download and conversion operation.

  Args:
    dataset_dir: The dataset directory where the dataset is stored.
  """
  if not tf.gfile.Exists(dataset_dir):
    tf.gfile.MakeDirs(dataset_dir)

  if _dataset_exists(dataset_dir):
    print('Dataset files already exist. Exiting without re-creating them.')
    return

  dataset_utils.download_and_uncompress_tarball(_DATA_URL, dataset_dir)
  photo_filenames, class_names = _get_filenames_and_classes(dataset_dir)
  class_names_to_ids = dict(zip(class_names, range(len(class_names))))

  # Divide into train and test:
  random.seed(_RANDOM_SEED)
  random.shuffle(photo_filenames)
  training_filenames = photo_filenames[_NUM_VALIDATION:]
  validation_filenames = photo_filenames[:_NUM_VALIDATION]

  # First, convert the training and validation sets.
  _convert_dataset('train', training_filenames, class_names_to_ids,
                   dataset_dir)
  _convert_dataset('validation', validation_filenames, class_names_to_ids,
                   dataset_dir)

  # Finally, write the labels file:
  labels_to_class_names = dict(zip(range(len(class_names)), class_names))
  dataset_utils.write_label_file(labels_to_class_names, dataset_dir)

  _clean_up_temporary_files(dataset_dir)
  print('\nFinished converting the Flowers dataset!')
Here is a rough outline of its execution flow:
- Check whether the dataset_dir folder exists; create it if it does not.
- Check whether all the TFRecord files already exist under dataset_dir; if they do, exit.
- Download the dataset from the _DATA_URL address and unpack it into dataset_dir.
- Collect the full paths and class names of all images. Note that the sub-folders are named after the classes, so every full path already encodes the class.

- Build the label -> class-name mapping dictionary.
- Shuffle the file names, then split them into a validation set and a training set.
- Write each training sample to the TFRecord files in TF-Example format.
- Write each validation sample to the TFRecord files in TF-Example format (see the sketch after this list).
def image_to_tfexample(image_data, image_format, height, width, class_id):
  return tf.train.Example(features=tf.train.Features(feature={
      'image/encoded': bytes_feature(image_data),
      'image/format': bytes_feature(image_format),
      'image/class/label': int64_feature(class_id),
      'image/height': int64_feature(height),
      'image/width': int64_feature(width),
  }))
- Generate the labels file (labels.txt); each line has the format label:class_name followed by a newline (\n).
- Clean up the downloaded .tgz archive and the unpacked files.
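As a rough illustration of the two writing steps, here is a condensed sketch (not the repository's _convert_dataset, which also decodes the image to get its real height and width and shards the output across five files) of how a single sample could be written; the 240x320 size and the paths are placeholder values.

import tensorflow as tf
from datasets import dataset_utils

def write_one_example(tfrecord_path, image_path, class_id):
    # Read the raw JPEG bytes; the image is stored encoded, not as a pixel array.
    image_data = tf.gfile.FastGFile(image_path, 'rb').read()
    example = dataset_utils.image_to_tfexample(
        image_data, b'jpg', 240, 320, class_id)  # height/width are placeholders
    with tf.python_io.TFRecordWriter(tfrecord_path) as writer:
        writer.write(example.SerializeToString())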
3. Reading the TFRecord data with slim
Now that the TFRecord files have been created, we can read the data back.
# -*- coding: utf-8 -*-
"""
Created on Fri Jun 8 08:52:30 2018

@author: zy
"""

'''
Load the flowers dataset
'''
from datasets import download_and_convert_flowers
from preprocessing import vgg_preprocessing
from datasets import flowers
import tensorflow as tf

slim = tf.contrib.slim


def read_flower_image_and_label(dataset_dir, is_training=False):
    '''
    Download the flower_photos.tgz dataset, split it into training and
    validation sets, and convert the data to TFRecord format: 5 training
    files (3320 samples), 5 validation files (350 samples), plus a labels
    file mapping each numeric label to its class name.

    args:
        dataset_dir: directory where the dataset is stored
        is_training: True to load the training split, otherwise the
                     validation split
    return:
        image, label: one randomly read image and its corresponding label
    '''
    download_and_convert_flowers.run(dataset_dir)

    '''
    Read the TFRecord data with slim
    '''
    # Select the split
    if is_training:
        dataset = flowers.get_split(split_name='train', dataset_dir=dataset_dir)
    else:
        dataset = flowers.get_split(split_name='validation', dataset_dir=dataset_dir)

    # Create a data provider
    provider = slim.dataset_data_provider.DatasetDataProvider(dataset)

    # provider.get returns two tensors holding one randomly read sample
    [image, label] = provider.get(['image', 'label'])

    return image, label
In the code above we first import the required modules, then create a provider and call get to obtain the image and label tensors. No data has actually been read at this point; we have only built the graph. The real data only becomes available after the queue threads are started inside a session.
So next we start a session and read the data.
import matplotlib.pyplot as plt  # needed for plt.imshow below

if __name__ == '__main__':
    # test()
    # Read one image and its corresponding label
    image, label = read_flower_image_and_label('./datasets/data/flowers')

    '''
    Start a session and read the data
    '''
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # Create a coordinator to manage the threads
        coord = tf.train.Coordinator()
        # Start the QueueRunners; only now do the file names enter the queue
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        img, lab = sess.run([image, label])
        plt.imshow(img)
        plt.title('Original image')
        plt.show()

        # Stop the threads
        coord.request_stop()
        coord.join(threads)

What if we want to read several images at once?
Each sample row of the TFRecord file is defined as:
def image_to_tfexample(image_data, image_format, height, width, class_id):
  return tf.train.Example(features=tf.train.Features(feature={
      'image/encoded': bytes_feature(image_data),
      'image/format': bytes_feature(image_format),
      'image/class/label': int64_feature(class_id),
      'image/height': int64_feature(height),
      'image/width': int64_feature(width),
  }))
Suppose that at training time we want to read data from the five generated TFRecord files and assemble it into batches. The steps are:
- Deserialize each Example back into the format it was stored in; this is handled by TensorFlow.
keys_to_features = {
    'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
    'image/format': tf.FixedLenFeature((), tf.string, default_value='png'),
    'image/class/label': tf.FixedLenFeature(
        [], tf.int64, default_value=tf.zeros([], dtype=tf.int64)),
}
- Assemble the deserialized data into a higher-level format; this is handled by slim.
items_to_handlers = {
    'image': slim.tfexample_decoder.Image('image/encoded', 'image/format'),
    'label': slim.tfexample_decoder.Tensor('image/class/label'),
}
- Build the decoder that performs the decoding.

decoder = slim.tfexample_decoder.TFExampleDecoder(
    keys_to_features, items_to_handlers)

- The dataset object defines the dataset's metadata: file locations, decoding method, and so on.
dataset = slim.dataset.Dataset(
    data_sources=file_pattern,
    reader=tf.TFRecordReader,
    decoder=decoder,
    num_samples=SPLITS_TO_SIZES[split_name],  # total number of samples in the split
    items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
    num_classes=_NUM_CLASSES,
    labels_to_names=labels_to_names  # a dict of the form id: class_name
)
- The provider object reads the data according to the dataset's information.
provider = slim.dataset_data_provider.DatasetDataProvider(
    dataset,
    num_readers=FLAGS.num_readers,
    common_queue_capacity=20 * FLAGS.batch_size,
    common_queue_min=10 * FLAGS.batch_size)
- Fetch the data. What comes back is a single sample, which still needs to be preprocessed and assembled into batches.
[image, label] = provider.get(['image', 'label'])

# Image preprocessing
image = preprocessing_image(image, train_image_size, train_image_size)

images, labels = tf.train.batch(
    [image, label],
    batch_size=FLAGS.batch_size,
    num_threads=FLAGS.num_preprocessing_threads,
    capacity=5 * FLAGS.batch_size)
labels = slim.one_hot_encoding(
    labels, dataset.num_classes - FLAGS.labels_offset)
Because the samples returned by DatasetDataProvider already come in random order, there is no need to use tf.train.shuffle_batch when batching afterwards. The code for reading batch_size samples at a time looks like this:
def get_batch_images_and_label(dataset_dir, batch_size, num_classes,
                               is_training=False, output_height=224,
                               output_width=224, num_threads=10):
    '''
    Fetch batch_size samples at a time.

    Note: the preprocessing here uses slim's image preprocessing functions.
    For example, if you use a VGG network, call the VGG preprocessing
    function; if you use a network you defined yourself, you can write a
    preprocessing function suited to your own images (e.g. normalization),
    or reuse a preprocessing function another network already provides.

    args:
        dataset_dir: directory where the dataset is stored
        batch_size: number of samples to fetch at a time
        num_classes: number of output classes, used to one-hot encode the labels
        is_training: True to load the training split, otherwise the validation split
        output_height: output image height
        output_width: output image width
    return:
        images, labels: batch_size randomly read images and their one-hot
                        encoded labels
    '''
    # Get a single image and its label
    image, label = read_flower_image_and_label(dataset_dir, is_training)

    # Image preprocessing; the image data must be tf.float32 here
    image = vgg_preprocessing.preprocess_image(image, output_height, output_width,
                                               is_training=is_training)

    # Alternative resizing:
    # image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    # image = tf.image.resize_image_with_crop_or_pad(image, output_height, output_width)

    # shuffle_batch would shuffle the sample order;
    # batch keeps the order in which the provider delivers the samples
    images, labels = tf.train.batch(
        [image, label],
        batch_size=batch_size,
        capacity=5 * batch_size,
        num_threads=num_threads)

    # one-hot encoding
    labels = slim.one_hot_encoding(labels, num_classes)

    return images, labels
At this point images can be fed into a neural network as its input, and labels can be used to compute the loss, and so on.
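For example, a minimal training sketch under the following assumptions: the vgg_16 model from nets/ is used (matching the vgg_preprocessing call and the 224x224 output size above), the flowers dataset has 5 classes, and the hyper-parameters are arbitrary.

import tensorflow as tf
from nets import vgg

slim = tf.contrib.slim

# Batch of preprocessed images and one-hot labels from the function above.
images, labels = get_batch_images_and_label('./datasets/data/flowers',
                                            batch_size=32, num_classes=5,
                                            is_training=True)

with slim.arg_scope(vgg.vgg_arg_scope()):
    logits, end_points = vgg.vgg_16(images, num_classes=5, is_training=True)

# labels are already one-hot encoded, so use the softmax cross-entropy loss.
tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=logits)
total_loss = tf.losses.get_total_loss()  # also picks up the regularization losses
optimizer = tf.train.AdamOptimizer(1e-3)
train_op = slim.learning.create_train_op(total_loss, optimizer)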
IV. Training models in slim
The slim module also shares the model-training code, so you no longer have to write training code yourself; training, fine-tuning, and evaluation can all be driven from the command line.
For Linux users, the scripts folder under slim additionally provides complete end-to-end shell scripts for downloading a model, training, pretraining, fine-tuning, and testing. On Windows you can still copy the commands one by one into the command line and run them.
1. Training from scratch
The training code lives in the train_image_classifier.py file under slim. From the directory containing that file, the following command trains an Inception_v3 model on the flowers dataset:
python train_image_classifier.py \
    --train_dir=./log/train_logs \
    --dataset_name=flowers \
    --dataset_split_name=train \
    --dataset_dir=./datasets/data/flowers \
    --model_name=inception_v3
2. Using a pretrained model
Pretraining here means training further on top of a model someone else has already trained in order to obtain the model you want, which can save a great deal of time; high-quality models are usually trained on very large numbers of samples. GitHub provides many models pretrained on the ImageNet dataset; they can be downloaded from https://github.com/tensorflow/models/tree/master/research/slim/#Pretrained.
Neural nets work best when they have many parameters, making them powerful function approximators. However, this means they must be trained on very large datasets. Because training models from scratch can be a very computationally intensive process requiring days or even weeks, we provide various pre-trained models, as listed below. These CNNs have been trained on the ILSVRC-2012-CLS image classification dataset.
In the table below, we list each model, the corresponding TensorFlow model file, the link to the model checkpoint, and the top 1 and top 5 accuracy (on the imagenet test set). Note that the VGG and ResNet V1 parameters have been converted from their original caffe formats (here and here), whereas the Inception and ResNet V2 parameters have been trained internally at Google. Also be aware that these accuracies were computed by evaluating using a single image crop. Some academic papers report higher accuracy by using multiple crops at multiple scales.

After downloading a pretrained model, you only need to add one argument, checkpoint_path, to the command from the previous section:
--checkpoint_path=<path to the pretrained checkpoint>
The model at checkpoint_path is only used to initialize the parameters before training; it is not changed during training. The newly produced model is saved under the --train_dir path.
Note: the samples used for this continued training must match the original model's input size and number of output classes. The downloadable models all classify into 1000 classes; if you do not want that many classes, use the fine-tuning method below.
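Under the hood, the initialization from --checkpoint_path comes down to slim's checkpoint-restore helpers; a hedged sketch of the idea (the path matches the fine-tuning example below, and the network graph is assumed to have been built already):

import tensorflow as tf

slim = tf.contrib.slim

# Map the checkpoint's weights onto the model variables once, before training.
variables_to_restore = slim.get_variables_to_restore()
init_fn = slim.assign_from_checkpoint_fn(
    './inception_v3/inception_v3.ckpt', variables_to_restore)

with tf.Session() as sess:
    init_fn(sess)  # loads the pretrained weights; the checkpoint file itself is never modified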
3. Fine-tuning
The pretrained models above were all trained on imagenet and end in a 1000-way classification layer. If we want to use a pretrained model on our own dataset, we have to fine-tune it.
During fine-tuning, the last layer of the original model is removed and replaced with a classification layer for our own dataset. For example, to train on the flowers dataset we need to replace the 1000 outputs with 5 outputs.
Concretely:
- Use the --checkpoint_exclude_scopes argument to specify which layers' weights are not loaded from the pretrained checkpoint.
- Use the --trainable_scopes argument to specify which layers' parameters are trained; when --trainable_scopes is given, every parameter not listed is frozen during training.
Example: fine-tune the inception_v3 model so that it can be trained on the flowers dataset. Unpack the downloaded inception_v3.ckpt into the inception_v3 folder in the current directory, open a command prompt, cd into the slim folder, and run:
python train_image_classifier.py \
    --train_dir=./log/in3 \
    --dataset_dir=./datasets/data/flowers \
    --dataset_name=flowers \
    --dataset_split_name=train \
    --model_name=inception_v3 \
    --checkpoint_path=./inception_v3/inception_v3.ckpt \
    --checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits \
    --trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits
In this example the model at --checkpoint_path is loaded and its parameters are used to initialize the weights, while --checkpoint_exclude_scopes prevents the last layers from being initialized from the checkpoint. --trainable_scopes specifies that only the newly added last layers are trained: the frozen parameters keep the well-trained values from the original model, while the new layers keep optimizing their own parameters over the iterations.
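A hedged sketch of what these two flags amount to (slim.get_variables_to_restore and the trainable-variables collection are real APIs; the scope names are the ones from the command above):

import tensorflow as tf

slim = tf.contrib.slim

# Restore every variable except the two logits scopes, which get fresh weights.
exclude = ['InceptionV3/Logits', 'InceptionV3/AuxLogits']
variables_to_restore = slim.get_variables_to_restore(exclude=exclude)

# Only the variables in these scopes are handed to the optimizer; everything
# else stays frozen at its pretrained value.
trainable_scopes = ['InceptionV3/Logits', 'InceptionV3/AuxLogits']
variables_to_train = []
for scope in trainable_scopes:
    variables_to_train += tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope)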
During fine-tuning you can also add
--max_number_of_steps=500
to the command above to set the number of training steps. If it is not specified, training runs indefinitely. For more arguments, read the train_image_classifier.py source. The scripts folder also contains examples of using a trained model to classify images.
4. Evaluating a model
To evaluate the performance of a model (whether pretrained or your own), you can use the eval_image_classifier.py script, as shown below.
Below we give an example of downloading the pretrained inception model and evaluating it on the imagenet dataset.
python eval_image_classifier.py \
    --alsologtostderr \
    --checkpoint_path=./log/in3/model.ckpt \
    --dataset_dir=./datasets/data/flowers \
    --dataset_name=flowers \
    --dataset_split_name=validation \
    --model_name=inception_v3
The ./log/in3/model.ckpt given here is the model file produced by the fine-tuning step above.
5. Packaging the model
A trained model can be packaged for use on different platforms, whether iOS, Android, or Linux; this is done with the open-source build tool bazel. For details see: https://github.com/tensorflow/models/tree/master/research/slim/#Export
