TF: An introduction to the convolution function tf.nn.conv2d

Reposted on 2017-05-09 19:50

Reposted from http://www.cnblogs.com/welhzh/p/6607581.html

What follows is that blogger's own translation, together with notes from their testing.

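tf.nn.conv2d is the function that implements convolution in TensorFlow. The reference documentation does not describe it in much detail, yet it is one of the core methods for building a convolutional neural network, so it is worth understanding well.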

`tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)`

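Apart from the name parameter, which sets the name of the operation, five parameters affect the method:

First parameter, input: the input image to convolve. It must be a Tensor of shape [batch, in_height, in_width, in_channels], that is, [number of images in a training batch, image height, image width, number of image channels]. Note that this is a 4-D Tensor, and its type must be float32 or float64.

Second parameter, filter: the equivalent of the convolution kernel in a CNN. It must be a Tensor of shape [filter_height, filter_width, in_channels, out_channels], that is, [kernel height, kernel width, number of image channels, number of kernels], and its type must match that of input. One point to watch: the third dimension, in_channels, must equal the fourth dimension of input.

Third parameter, strides: the stride along each dimension of the input during convolution. It is a one-dimensional vector of length 4.

Fourth parameter, padding: a string that must be either 'SAME' or 'VALID'. This value selects how the convolution treats the image borders (covered in the examples below).

Fifth parameter, use_cudnn_on_gpu: a bool that controls whether cuDNN acceleration is used. It defaults to true.

The result is a Tensor, the output commonly called a feature map. Its shape is still of the form [batch, height, width, channels].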
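To make the relationship between these parameters and the output shape concrete, here is a minimal pure-Python sketch. The helper name conv2d_output_shape is made up for illustration and is not part of TensorFlow; it assumes the default [batch, height, width, channels] layout and uses the size rules from the TensorFlow documentation: ceil(in / stride) for 'SAME' and ceil((in - filter + 1) / stride) for 'VALID'.

```python
import math


def conv2d_output_shape(input_shape, filter_shape, strides, padding):
    """Return [batch, out_height, out_width, out_channels] (hypothetical helper)."""
    batch, in_h, in_w, in_c = input_shape
    f_h, f_w, f_in_c, out_c = filter_shape
    # The filter's third dimension must match the input's fourth dimension.
    if in_c != f_in_c:
        raise ValueError("filter in_channels must equal input in_channels")
    _, s_h, s_w, _ = strides
    if padding == "SAME":
        out_h = math.ceil(in_h / s_h)
        out_w = math.ceil(in_w / s_w)
    elif padding == "VALID":
        out_h = math.ceil((in_h - f_h + 1) / s_h)
        out_w = math.ceil((in_w - f_w + 1) / s_w)
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")
    return [batch, out_h, out_w, out_c]


# A 5x5 five-channel image, seven 3x3 kernels, stride 2, 'SAME' padding.
print(conv2d_output_shape([1, 5, 5, 5], [3, 3, 5, 7], [1, 2, 2, 1], "SAME"))  # [1, 3, 3, 7]
```

The batch size passes through unchanged and the last dimension becomes the number of kernels; only the height and width depend on strides and padding.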

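So how does TensorFlow's convolution actually work? A few examples explain it:

1. Consider the simplest case: a 3×3 single-channel image (shape: [1, 3, 3, 1]) convolved with a 1×1 kernel (shape: [1, 1, 1, 1]). The result is a 3×3 feature map.

2. Increase the number of channels: take a 3×3 five-channel image (shape: [1, 3, 3, 5]) and convolve it with a 1×1 kernel (shape: [1, 1, 5, 1]). The result is still a 3×3 feature map. At each pixel, the kernel is applied to every channel of that pixel and the per-channel products are summed into one value.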

input = tf.Variable(tf.random_normal([1,3,3,5]))
filter = tf.Variable(tf.random_normal([1,1,5,1]))

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
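What the 1×1 kernel does in case 2 can be sketched without TensorFlow. The function conv1x1 below is a hypothetical illustration, not a TensorFlow API: for each pixel it multiplies the channel values by the matching kernel weights and adds them up. All-ones values are assumed here so that the result is easy to check by hand.

```python
def conv1x1(image, weights):
    """Apply one 1x1 kernel to an H x W x C image given as nested lists.

    weights holds one value per input channel (the [1, 1, C, 1] filter, flattened).
    """
    return [[sum(p * w for p, w in zip(pixel, weights)) for pixel in row]
            for row in image]


# A 3x3 image with 5 channels, every value 1.0, and a kernel of five 1.0 weights.
image = [[[1.0] * 5 for _ in range(3)] for _ in range(3)]
weights = [1.0] * 5

feature_map = conv1x1(image, weights)
for row in feature_map:
    print(row)  # each row is [5.0, 5.0, 5.0]
```

Each output pixel is 5.0 because five channel values of 1.0 are summed, which matches the "case 2" output of the program at the end of this article.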

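3. Enlarge the kernel: convolve with a 3×3 kernel instead. The output is now a single value. When every weight in both kernels has the same value, as with the all-ones tensors in the program at the end of this article, that value equals the sum of all pixel values in the feature map from case 2.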

input = tf.Variable(tf.random_normal([1,3,3,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')

4. Use a larger image: enlarge the image from case 2 to 5×5 and keep the 3×3 kernel. With a stride of 1, the output is a 3×3 feature map.

input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')

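Note that this case can be seen as an intermediate state between case 2 and case 3: the kernel slides across the whole image with a stride of 1. In the diagram below, each x marks a position where the center of the kernel comes to rest, and each such stop produces one pixel of the output feature map.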

.....
.xxx.
.xxx.
.xxx.
.....
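The sliding-window behavior of cases 3 and 4 can be sketched as a naive pure-Python loop. This is only an illustration for a single image and a single kernel without padding; conv2d_valid and ones are hypothetical helper names, and the real TensorFlow implementation is of course far more optimized.

```python
def conv2d_valid(image, kernel, stride=1):
    """Naive convolution of one H x W x C image with one kH x kW x C kernel.

    No padding is applied, so the kernel only stops where it fits entirely
    inside the image.
    """
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    channels = len(kernel[0][0])
    out = []
    for top in range(0, in_h - k_h + 1, stride):
        out_row = []
        for left in range(0, in_w - k_w + 1, stride):
            total = 0.0
            # Multiply the window by the kernel element-wise and sum everything,
            # including across channels.
            for i in range(k_h):
                for j in range(k_w):
                    for c in range(channels):
                        total += image[top + i][left + j][c] * kernel[i][j][c]
            out_row.append(total)
        out.append(out_row)
    return out


def ones(height, width, channels):
    """Build a height x width x channels nested list filled with 1.0."""
    return [[[1.0] * channels for _ in range(width)] for _ in range(height)]


print(conv2d_valid(ones(3, 3, 5), ones(3, 3, 5)))  # case 3: [[45.0]]
for row in conv2d_valid(ones(5, 5, 5), ones(3, 3, 5)):
    print(row)  # case 4: three rows of [45.0, 45.0, 45.0]
```

Every window covers 3 × 3 × 5 = 45 values of 1.0, which is why each output pixel is 45.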

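5. So far padding has always been 'VALID'. When it is 'SAME', the image is zero-padded at its borders so that the kernel can also stop on the edge pixels, as shown below, and the output is a 5×5 feature map.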

input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,1]))

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')

xxxxx
xxxxx
xxxxx
xxxxx
xxxxx
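A rough way to see what 'SAME' does here: for a 3×3 kernel with stride 1 it amounts to adding one ring of zeros around the image and then convolving without padding. The sketch below assumes exactly that setup plus an all-ones image and an all-ones kernel, so each output value is simply the sum of one window; zero_pad and window_sum are hypothetical helper names.

```python
def zero_pad(image, pad):
    """Surround an H x W x C nested-list image with `pad` rows/columns of zeros."""
    width, channels = len(image[0]), len(image[0][0])

    def zero_pixels(count):
        return [[0.0] * channels for _ in range(count)]

    padded = [zero_pixels(width + 2 * pad) for _ in range(pad)]
    for row in image:
        padded.append(zero_pixels(pad) + row + zero_pixels(pad))
    padded += [zero_pixels(width + 2 * pad) for _ in range(pad)]
    return padded


def window_sum(image, k_size):
    """Sum every k x k window (all channels) with stride 1 and no padding.

    This equals a convolution with an all-ones k x k kernel.
    """
    out_h = len(image) - k_size + 1
    out_w = len(image[0]) - k_size + 1
    return [[sum(value
                 for row in image[top:top + k_size]
                 for pixel in row[left:left + k_size]
                 for value in pixel)
             for left in range(out_w)]
            for top in range(out_h)]


image = [[[1.0] * 5 for _ in range(5)] for _ in range(5)]  # 5x5, 5 channels
result = window_sum(zero_pad(image, 1), 3)
for row in result:
    print(row)
# [20.0, 30.0, 30.0, 30.0, 20.0]
# [30.0, 45.0, 45.0, 45.0, 30.0]
# [30.0, 45.0, 45.0, 45.0, 30.0]
# [30.0, 45.0, 45.0, 45.0, 30.0]
# [20.0, 30.0, 30.0, 30.0, 20.0]
```

Corner windows overlap only 2 × 2 real pixels (2 × 2 × 5 = 20), edge windows overlap 2 × 3 (30), and interior windows overlap the full 3 × 3 (45).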

6. With more than one kernel

input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))

op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')

The output is now seven 5×5 feature maps.
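Because the kernels are stored along the last dimension of the output, the seven feature maps are interleaved per pixel rather than laid out one after another. As a small sketch of how one might pull a single map back out of a [batch, height, width, channels] result, here is a pure-Python version working on nested lists; the function name feature_map and the tiny sample values are made up for illustration.

```python
def feature_map(output, image_index, kernel_index):
    """Extract one H x W feature map from a [batch, height, width, channels] nested list."""
    return [[pixel[kernel_index] for pixel in row] for row in output[image_index]]


# A hypothetical [1, 2, 2, 3] output: 1 image, 2x2 pixels, 3 kernels.
output = [[[[1.0, 10.0, 100.0], [2.0, 20.0, 200.0]],
           [[3.0, 30.0, 300.0], [4.0, 40.0, 400.0]]]]

print(feature_map(output, 0, 1))  # [[10.0, 20.0], [30.0, 40.0]]
```

This is also why each printed line for cases 6 to 8 in the output at the end of this article holds seven numbers: one line is one pixel, with one value per kernel.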

7. A stride other than 1. The documentation notes that for images, which have only two spatial dimensions, strides is usually [1, stride, stride, 1].

input = tf.Variable(tf.random_normal([1,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))

op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')

The output is now seven 3×3 feature maps.

x.x.x
.....
x.x.x
.....
x.x.x
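The stop positions in this diagram can be reproduced with a few lines of arithmetic. The sketch below follows the 'SAME' padding rule from the TensorFlow documentation (total padding is max((out - 1) * stride + filter - in, 0), with the smaller half placed before the image); same_stop_positions and stop_diagram are hypothetical names, and the notion of a kernel "center" assumes an odd kernel size.

```python
def same_stop_positions(in_size, k_size, stride):
    """Indices along one axis where the kernel center lands with 'SAME' padding."""
    out_size = -(-in_size // stride)  # ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + k_size - in_size, 0)
    pad_before = pad_total // 2  # any extra padding goes after the image
    # Window o starts at o * stride - pad_before; its center is k_size // 2 further.
    return [o * stride - pad_before + k_size // 2 for o in range(out_size)]


def stop_diagram(in_size, k_size, stride):
    """Draw a square image, marking kernel-center stops with 'x' and other pixels with '.'."""
    stops = set(same_stop_positions(in_size, k_size, stride))
    return ["".join("x" if r in stops and c in stops else "." for c in range(in_size))
            for r in range(in_size)]


print(same_stop_positions(5, 3, 2))  # [0, 2, 4]
print("\n".join(stop_diagram(5, 3, 2)))
# x.x.x
# .....
# x.x.x
# .....
# x.x.x
```

Calling stop_diagram(5, 3, 1) gives five rows of xxxxx, the diagram from case 5.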

8. A batch size other than 1: feed in 10 images at once

input = tf.Variable(tf.random_normal([10,5,5,5]))
filter = tf.Variable(tf.random_normal([3,3,5,7]))

op = tf.nn.conv2d(input, filter, strides=[1, 2, 2, 1], padding='SAME')

Each image yields seven 3×3 feature maps, so the output shape is [10, 3, 3, 7].

Finally, here is a program that pulls the cases together:

import tensorflow as tf

# Written against the TensorFlow 1.x API (tf.Session, tf.global_variables_initializer).
#
# tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)
# Apart from name, which sets the name of the operation, five parameters affect the method:
#
# input: the image to convolve, a 4-D Tensor of shape
#   [batch, in_height, in_width, in_channels], of type float32 or float64.
#
# filter: the convolution kernel, a Tensor of shape
#   [filter_height, filter_width, in_channels, out_channels], of the same type as input.
#   Its third dimension, in_channels, must equal the fourth dimension of input.
#
# strides: the stride along each dimension of the input, a 1-D vector of length 4.
#
# padding: a string, either "SAME" or "VALID", that selects how image borders are handled.
#
# use_cudnn_on_gpu: a bool, whether to use cuDNN acceleration; defaults to true.
#
# The result is a Tensor, the output commonly called a feature map.

oplist=[]
# [batch, in_height, in_width, in_channels]
input_arg  = tf.Variable(tf.ones([1, 3, 3, 5]))
# [filter_height, filter_width, in_channels, out_channels]
filter_arg = tf.Variable(tf.ones([1 ,1 , 5 ,1]))

op2 = tf.nn.conv2d(input_arg, filter_arg, strides=[1,1,1,1], use_cudnn_on_gpu=False, padding='VALID')
oplist.append([op2, "case 2"])

# [batch, in_height, in_width, in_channels]
input_arg  = tf.Variable(tf.ones([1, 3, 3, 5]))
# [filter_height, filter_width, in_channels, out_channels]
filter_arg = tf.Variable(tf.ones([3 ,3 , 5 ,1]))

op2 = tf.nn.conv2d(input_arg, filter_arg, strides=[1,1,1,1], use_cudnn_on_gpu=False, padding='VALID')
oplist.append([op2, "case 3"])

# [batch, in_height, in_width, in_channels]
input_arg  = tf.Variable(tf.ones([1, 5, 5, 5]))
# [filter_height, filter_width, in_channels, out_channels]
filter_arg = tf.Variable(tf.ones([3 ,3 , 5 ,1]))

op2 = tf.nn.conv2d(input_arg, filter_arg, strides=[1,1,1,1], use_cudnn_on_gpu=False, padding='VALID')
oplist.append([op2, "case 4"])

# [batch, in_height, in_width, in_channels]
input_arg  = tf.Variable(tf.ones([1, 5, 5, 5]))
# [filter_height, filter_width, in_channels, out_channels]
filter_arg = tf.Variable(tf.ones([3 ,3 , 5 ,1]))
op2 = tf.nn.conv2d(input_arg, filter_arg, strides=[1,1,1,1], use_cudnn_on_gpu=False, padding='SAME')
oplist.append([op2, "case 5"])

# [batch, in_height, in_width, in_channels]
input_arg  = tf.Variable(tf.ones([1, 5, 5, 5]))
# [filter_height, filter_width, in_channels, out_channels]
filter_arg = tf.Variable(tf.ones([3 ,3 , 5 ,7]))
op2 = tf.nn.conv2d(input_arg, filter_arg, strides=[1,1,1,1], use_cudnn_on_gpu=False, padding='SAME')
oplist.append([op2, "case 6"])


# [batch, in_height, in_width, in_channels]
input_arg  = tf.Variable(tf.ones([1, 5, 5, 5]))
# [filter_height, filter_width, in_channels, out_channels]
filter_arg = tf.Variable(tf.ones([3 ,3 , 5 ,7]))
op2 = tf.nn.conv2d(input_arg, filter_arg, strides=[1,2,2,1], use_cudnn_on_gpu=False, padding='SAME')
oplist.append([op2, "case 7"])


# [batch, in_height, in_width, in_channels]
input_arg  = tf.Variable(tf.ones([4, 5, 5, 5]))
# [filter_height, filter_width, in_channels, out_channels]
filter_arg = tf.Variable(tf.ones([3 ,3 , 5 ,7]))
op2 = tf.nn.conv2d(input_arg, filter_arg, strides=[1,2,2,1], use_cudnn_on_gpu=False, padding='SAME')
oplist.append([op2, "case 8"])

with tf.Session() as a_sess:
    a_sess.run(tf.global_variables_initializer())
    for aop in oplist:
        print("----------{}---------".format(aop[1]))
        print(a_sess.run(aop[0]))
        print('---------------------\n\n')

The output looks like this:

----------case 2---------
[[[[ 5.]
[ 5.]
[ 5.]]

[[ 5.]
[ 5.]
[ 5.]]

[[ 5.]
[ 5.]
[ 5.]]]]
---------------------

----------case 3---------
[[[[ 45.]]]]
---------------------

----------case 4---------
[[[[ 45.]
[ 45.]
[ 45.]]

[[ 45.]
[ 45.]
[ 45.]]

[[ 45.]
[ 45.]
[ 45.]]]]
---------------------

----------case 5---------
[[[[ 20.]
[ 30.]
[ 30.]
[ 30.]
[ 20.]]

[[ 30.]
[ 45.]
[ 45.]
[ 45.]
[ 30.]]

[[ 30.]
[ 45.]
[ 45.]
[ 45.]
[ 30.]]

[[ 30.]
[ 45.]
[ 45.]
[ 45.]
[ 30.]]

[[ 20.]
[ 30.]
[ 30.]
[ 30.]
[ 20.]]]]
---------------------

----------case 6---------
[[[[ 20. 20. 20. 20. 20. 20. 20.]
[ 30. 30. 30. 30. 30. 30. 30.]
[ 30. 30. 30. 30. 30. 30. 30.]
[ 30. 30. 30. 30. 30. 30. 30.]
[ 20. 20. 20. 20. 20. 20. 20.]]

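[[ 30. 30. 30. 30. 30. 30. 30.]
[ 45. 45. 45. 45. 45. 45. 45.]
[ 45. 45. 45. 45. 45. 45. 45.]
[ 45. 45. 45. 45. 45. 45. 45.]
[ 30. 30. 30. 30. 30. 30. 30.]]

[[ 30. 30. 30. 30. 30. 30. 30.]
[ 45. 45. 45. 45. 45. 45. 45.]
[ 45. 45. 45. 45. 45. 45. 45.]
[ 45. 45. 45. 45. 45. 45. 45.]
[ 30. 30. 30. 30. 30. 30. 30.]]

[[ 30. 30. 30. 30. 30. 30. 30.]
[ 45. 45. 45. 45. 45. 45. 45.]
[ 45. 45. 45. 45. 45. 45. 45.]
[ 45. 45. 45. 45. 45. 45. 45.]
[ 30. 30. 30. 30. 30. 30. 30.]]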

[[ 20. 20. 20. 20. 20. 20. 20.]
[ 30. 30. 30. 30. 30. 30. 30.]
[ 30. 30. 30. 30. 30. 30. 30.]
[ 30. 30. 30. 30. 30. 30. 30.]
[ 20. 20. 20. 20. 20. 20. 20.]]]]
---------------------

----------case 7---------
[[[[ 20. 20. 20. 20. 20. 20. 20.]
[ 30. 30. 30. 30. 30. 30. 30.]
[ 20. 20. 20. 20. 20. 20. 20.]]

[[ 30. 30. 30. 30. 30. 30. 30.]
[ 45. 45. 45. 45. 45. 45. 45.]
[ 30. 30. 30. 30. 30. 30. 30.]]

[[ 20. 20. 20. 20. 20. 20. 20.]
[ 30. 30. 30. 30. 30. 30. 30.]
[ 20. 20. 20. 20. 20. 20. 20.]]]]
---------------------

----------case 8---------
[[[[ 20. 20. 20. 20. 20. 20. 20.]
[ 30. 30. 30. 30. 30. 30. 30.]
[ 20. 20. 20. 20. 20. 20. 20.]]

[[ 30. 30. 30. 30. 30. 30. 30.]
[ 45. 45. 45. 45. 45. 45. 45.]
[ 30. 30. 30. 30. 30. 30. 30.]]

[[ 20. 20. 20. 20. 20. 20. 20.]
[ 30. 30. 30. 30. 30. 30. 30.]
[ 20. 20. 20. 20. 20. 20. 20.]]]

[[[ 20. 20. 20. 20. 20. 20. 20.]
[ 30. 30. 30. 30. 30. 30. 30.]
[ 20. 20. 20. 20. 20. 20. 20.]]