I haven't even properly gotten started, but work requires implementing text classification with a CNN, so I'm using the off-the-shelf cnn-text-classification-tf code from GitHub and learning as I read.
The source consists of four .py files:
- text_cnn.py: network structure definition
- train.py: network training
- eval.py: prediction & evaluation
- data_helpers.py: data preprocessing
Each of them is annotated below.

```python
import tensorflow as tf
import numpy as np


# Define the network structure
class TextCNN(object):
    """
    A CNN for text classification.
    Uses an embedding layer, followed by a convolutional, max-pooling and softmax layer.
    """
    def __init__(
      self, sequence_length, num_classes, vocab_size,
      embedding_size, filter_sizes, num_filters, l2_reg_lambda=0.0):

        # Placeholders for input, output and dropout
        self.input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x")
        self.input_y = tf.placeholder(tf.float32, [None, num_classes], name="input_y")
        self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")

        # Keeping track of l2 regularization loss (optional)
        l2_loss = tf.constant(0.0)

        # Embedding layer
        with tf.device('/cpu:0'), tf.name_scope("embedding"):
            self.W = tf.Variable(
                tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
                name="W")
            self.embedded_chars = tf.nn.embedding_lookup(self.W, self.input_x)
            self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)

        # Create a convolution + maxpool layer for each filter size
        pooled_outputs = []
        for i, filter_size in enumerate(filter_sizes):
            with tf.name_scope("conv-maxpool-%s" % filter_size):
                # Convolution Layer
                filter_shape = [filter_size, embedding_size, 1, num_filters]
                W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
                b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")
                conv = tf.nn.conv2d(
                    self.embedded_chars_expanded,
                    W,
                    strides=[1, 1, 1, 1],
                    padding="VALID",
                    name="conv")
                # Apply nonlinearity
                h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
                # Maxpooling over the outputs
                pooled = tf.nn.max_pool(
                    h,
                    ksize=[1, sequence_length - filter_size + 1, 1, 1],
                    strides=[1, 1, 1, 1],
                    padding='VALID',
                    name="pool")
                pooled_outputs.append(pooled)

        # Combine all the pooled features
        num_filters_total = num_filters * len(filter_sizes)
        self.h_pool = tf.concat(pooled_outputs, 3)
        self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])

        # Add dropout
        with tf.name_scope("dropout"):
            self.h_drop = tf.nn.dropout(self.h_pool_flat, self.dropout_keep_prob)

        # Final (unnormalized) scores and predictions
        with tf.name_scope("output"):
            W = tf.get_variable(
                "W",
                shape=[num_filters_total, num_classes],
                initializer=tf.contrib.layers.xavier_initializer())
            b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
            l2_loss += tf.nn.l2_loss(W)
            l2_loss += tf.nn.l2_loss(b)
            self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
            self.predictions = tf.argmax(self.scores, 1, name="predictions")

        # Calculate mean cross-entropy loss
        with tf.name_scope("loss"):
            losses = tf.nn.softmax_cross_entropy_with_logits(logits=self.scores, labels=self.input_y)
            self.loss = tf.reduce_mean(losses) + l2_reg_lambda * l2_loss

        # Accuracy
        with tf.name_scope("accuracy"):
            correct_predictions = tf.equal(self.predictions, tf.argmax(self.input_y, 1))
            self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"), name="accuracy")
```
As you can see, the class TextCNN defines the structure of the neural network and is initialized through several constructor arguments:
- sequence_length: fixed sentence length (shorter sentences are padded, longer ones truncated)
- num_classes: number of classes
- vocab_size: vocabulary size
- embedding_size: word-vector dimensionality
- filter_sizes: convolution kernel sizes
- num_filters: number of kernels per kernel size
- l2_reg_lambda=0.0: L2 regularization coefficient
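To make the arguments concrete, here is a rough sketch of how the class might be instantiated; the values are illustrative only, not quoted from the repo (in practice they come from train.py's flags and the corpus):

```python
# Illustrative values only.
cnn = TextCNN(
    sequence_length=56,      # all sentences padded/truncated to 56 tokens
    num_classes=2,           # e.g. positive vs. negative
    vocab_size=18000,        # number of distinct words in the vocabulary
    embedding_size=128,      # dimensionality of each word vector
    filter_sizes=[3, 4, 5],  # convolve over windows of 3, 4 and 5 words
    num_filters=128,         # kernels per window size
    l2_reg_lambda=0.0)       # L2 regularization disabled
```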
Now let's walk through how the network is built, one statement at a time.
```python
self.input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x")
self.input_y = tf.placeholder(tf.float32, [None, num_classes], name="input_y")
self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")

l2_loss = tf.constant(0.0)
```
The placeholder input_x holds the sentence matrix: its width is sequence_length and its length adapts to the batch (= number of sentences); input_y holds the corresponding labels, with width num_classes and a length that also adapts to the batch.
The placeholder dropout_keep_prob holds the dropout keep probability, while l2_loss starts as the constant 0.0 and later accumulates the L2 regularization losses (the actual regularization hyperparameter is l2_reg_lambda).
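As a minimal sketch of how these placeholders are used (this mirrors what train.py does; sess, x_batch and y_batch are assumed to exist), values are supplied at run time through a feed_dict:

```python
# x_batch: word ids of shape [batch, sequence_length]
# y_batch: one-hot labels of shape [batch, num_classes]
feed_dict = {
    cnn.input_x: x_batch,
    cnn.input_y: y_batch,
    cnn.dropout_keep_prob: 0.5,   # keep probability used while training
}
loss, acc = sess.run([cnn.loss, cnn.accuracy], feed_dict)
```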
```python
# Embedding layer
with tf.device('/cpu:0'), tf.name_scope("embedding"):
    self.W = tf.Variable(
        tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
        name="W")
    self.embedded_chars = tf.nn.embedding_lookup(self.W, self.input_x)
    self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)
```
Embedding layer
self.W can be thought of as the word-vector dictionary: it stores vocab_size word vectors of size embedding_size, randomly initialized with values between -1 and 1.
self.embedded_chars is the word-vector representation of the input input_x; its shape is [number of sentences, sequence_length, embedding_size].
self.embedded_chars_expanded is that representation with one extra dimension appended (embedded_chars * 1), giving shape [number of sentences, sequence_length, embedding_size, 1], which matches the 4-D input that tf.nn.conv2d expects (see below).
The function tf.expand_dims(input, axis=None, name=None, dim=None) inserts a dimension of size 1 at position axis of input (dim is equivalent to axis and has been deprecated in the official docs).
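A small shape sketch of the embedding lookup and the expand_dims call (the batch size, vocabulary size and so on are made up for illustration):

```python
import tensorflow as tf

W = tf.random_uniform([18000, 128], -1.0, 1.0)   # [vocab_size, embedding_size]
ids = tf.zeros([32, 56], dtype=tf.int32)         # a batch of 32 padded sentences, 56 word ids each
emb = tf.nn.embedding_lookup(W, ids)             # shape (32, 56, 128)
emb4 = tf.expand_dims(emb, -1)                   # shape (32, 56, 128, 1): 4-D input for tf.nn.conv2d
```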
```python
# Create a convolution + maxpool layer for each filter size
pooled_outputs = []
for i, filter_size in enumerate(filter_sizes):
    with tf.name_scope("conv-maxpool-%s" % filter_size):
        # Convolution Layer
        filter_shape = [filter_size, embedding_size, 1, num_filters]
        W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
        b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")
        conv = tf.nn.conv2d(
            self.embedded_chars_expanded,
            W,
            strides=[1, 1, 1, 1],
            padding="VALID",
            name="conv")
        # Apply nonlinearity
        h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
        # Maxpooling over the outputs
        pooled = tf.nn.max_pool(
            h,
            ksize=[1, sequence_length - filter_size + 1, 1, 1],
            strides=[1, 1, 1, 1],
            padding='VALID',
            name="pool")
        pooled_outputs.append(pooled)
```
Convolution layer (all parameter names below live under the conv-maxpool-i name scope)
Convolution:
conv-maxpool-i/filter_shape: the shape of the kernel tensor; there are num_filters kernels (output channels), each of size filter_size * embedding_size, with 1 input channel. Because the kernel width equals embedding_size, the kernel slides only along the word sequence; it never slides across the embedding dimension.
conv-maxpool-i/W: the convolution kernels, with shape filter_shape, randomly initialized from a truncated normal distribution.
conv-maxpool-i/b: the biases; there are num_filters kernels, hence num_filters bias values.
conv-maxpool-i/conv: the convolution of conv-maxpool-i/W with self.embedded_chars_expanded.
The function tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None) performs the convolution (for background see http://blog.csdn.net/mao_xiao_feng/article/details/78004522); the arguments used here are listed below, with a small shape sketch after the list:
- input: the input word vectors, [number of sentences (cf. number of images) batch, fixed sentence length (cf. image height), word-vector dimensionality (cf. image width), 1 (cf. image channels)]
- filter: the convolution kernel, [kernel height, word-vector dimensionality (kernel width), 1 (input channels), number of kernels (output channels)]
- strides: the stride along each dimension, a 1-D vector of length 4; for images this is usually [1, x, x, 1]
- padding: the convolution mode, 'SAME' for same-length (wide) convolution, 'VALID' for narrow convolution
- the output feature map has a shape of the form [batch, height, width, channels]
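The shape sketch promised above, with illustrative sizes (batch 32, sequence_length 56, embedding_size 128, filter_size 3, num_filters 128):

```python
import tensorflow as tf

inputs = tf.zeros([32, 56, 128, 1])   # [batch, height, width, channels]
W = tf.zeros([3, 128, 1, 128])        # [filter_height, filter_width, in_channels, out_channels]
conv = tf.nn.conv2d(inputs, W, strides=[1, 1, 1, 1], padding="VALID")
print(conv.shape)                     # (32, 54, 1, 128): height 56 - 3 + 1 = 54, width collapses to 1
```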
Activation:
conv-maxpool-i/h: the result of applying the nonlinear activation to WX + b.
The function tf.nn.bias_add(value, bias, name=None) adds the bias term bias to value with broadcasting; bias must be 1-D and match the size of value's last dimension, while value may have any rank.
The function tf.nn.relu(features, name=None) applies the ReLU nonlinearity.
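A tiny sketch of the broadcasting behaviour of bias_add followed by relu (the shapes reuse the illustrative numbers above):

```python
conv = tf.zeros([32, 54, 1, 128])        # stands in for the convolution output above
b = tf.constant(0.1, shape=[128])        # one bias per output channel
h = tf.nn.relu(tf.nn.bias_add(conv, b))  # bias broadcast over the last dim, then ReLU; shape unchanged
```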
Pooling:
conv-maxpool-i/pooled: the result after pooling.
The function tf.nn.max_pool(value, ksize, strides, padding, name=None) max-pools value:
- value: the 4-D tensor to be pooled, with dimensions [batch, height, width, channels]
- ksize: the pooling window size, a list of 4 ints matching the dimensions of value, usually [1, height, width, 1]; no pooling is done over batch or channels
- strides: analogous to the convolution strides
- padding: analogous to the convolution padding argument
- the returned shape again has the form [batch, height, width, channels]
The pooled result is appended to pooled_outputs. The loop repeats this for every kernel size, so pooled_outputs ends up with len(filter_sizes) elements.
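Continuing the shape sketch for a single filter size (filter_size = 3, so the pooling window height is 56 - 3 + 1 = 54; values remain illustrative):

```python
pooled = tf.nn.max_pool(
    h,                                   # (32, 54, 1, 128) from the activation sketch above
    ksize=[1, 54, 1, 1],                 # pool over the entire remaining height
    strides=[1, 1, 1, 1],
    padding="VALID")
print(pooled.shape)                      # (32, 1, 1, 128): one max feature per kernel
```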
```python
# Combine all the pooled features
num_filters_total = num_filters * len(filter_sizes)
self.h_pool = tf.concat(pooled_outputs, 3)
self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])
```
tf.concat(values, axis) concatenates the tensors in values along dimension axis (counting from 0): if values[i].shape = [D0, D1, ..., Daxis(i), ..., Dn], the result has shape [D0, D1, ..., Raxis, ..., Dn], where Raxis is the sum of the Daxis(i). Recall what is stored in pooled_outputs: it is a list of len(filter_sizes) pooled tensors, each of shape [batch, height=1, width=1, channels=num_filters]. Concatenating along dimension 3 therefore stacks the channels; in other words, for each sentence the features produced by the different kernel sizes are joined into a single vector, and the reshape then flattens it to [batch, num_filters_total].
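A hedged shape sketch of the concatenation and reshape, assuming three filter sizes and num_filters = 128:

```python
# Each pooled tensor has shape [batch, 1, 1, num_filters].
pooled_outputs = [tf.zeros([32, 1, 1, 128]) for _ in range(3)]
h_pool = tf.concat(pooled_outputs, 3)            # (32, 1, 1, 384): stacked along the channel axis
h_pool_flat = tf.reshape(h_pool, [-1, 3 * 128])  # (32, 384): one flat feature vector per sentence
```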
```python
# Add dropout
with tf.name_scope("dropout"):
    self.h_drop = tf.nn.dropout(self.h_pool_flat, self.dropout_keep_prob)

# Final (unnormalized) scores and predictions
with tf.name_scope("output"):
    W = tf.get_variable(
        "W",
        shape=[num_filters_total, num_classes],
        initializer=tf.contrib.layers.xavier_initializer())
    b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
    l2_loss += tf.nn.l2_loss(W)
    l2_loss += tf.nn.l2_loss(b)
    self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
    self.predictions = tf.argmax(self.scores, 1, name="predictions")

# Calculate mean cross-entropy loss
with tf.name_scope("loss"):
    losses = tf.nn.softmax_cross_entropy_with_logits(logits=self.scores, labels=self.input_y)
    self.loss = tf.reduce_mean(losses) + l2_reg_lambda * l2_loss

# Accuracy
with tf.name_scope("accuracy"):
    correct_predictions = tf.equal(self.predictions, tf.argmax(self.input_y, 1))
    self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"), name="accuracy")
```
——————————————————————————————————————————————————————————————————
By request, here are the remaining comments; it has been a while, so corrections are welcome if anything is wrong.
Dropout layer
tf.nn.dropout(self.h_pool_flat, self.dropout_keep_prob)
The dropout layer applies dropout to the pooled result h_pool_flat with keep probability dropout_keep_prob, to reduce overfitting.
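One practical note (this follows the usual convention and is assumed here rather than quoted from the repo): dropout_keep_prob is fed as something like 0.5 during training and 1.0 at evaluation, which effectively disables dropout when predicting:

```python
h = tf.zeros([32, 384])                    # stands in for h_pool_flat
h_drop = tf.nn.dropout(h, keep_prob=0.5)   # training: keep each unit with prob 0.5 (and scale by 1/0.5)
# At evaluation time dropout_keep_prob is fed as 1.0, making the layer a no-op.
```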
Everything above constitutes the hidden layers of the network.
Output layer (+ softmax layer)
W and b are the linear (fully connected) parameters; because two more parameters are introduced here, their L2 penalties are added to l2_loss.
self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores") computes WX + b, the final score for each class.
self.predictions = tf.argmax(self.scores, 1, name="predictions") takes the class with the largest score as the model's prediction.
Note that scores is not normalized.
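If normalized class probabilities are wanted (the model itself never computes them), tf.nn.softmax can be applied to scores; a minimal sketch:

```python
probabilities = tf.nn.softmax(cnn.scores)   # each row sums to 1 across the num_classes columns
predictions = tf.argmax(cnn.scores, 1)      # index of the largest score = predicted class
```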
tf.nn.softmax_cross_entropy_with_logits(logits=self.scores, labels=self.input_y) then computes the cross-entropy loss between the predicted scores and the true labels input_y.
The final loss is the cross-entropy loss plus the L2 regularization loss, with l2_reg_lambda as the regularization coefficient.
Training uses this loss value as the objective that gradient descent minimizes.
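In the spirit of train.py (the exact optimizer and learning rate here are assumptions, shown only as a sketch), the training op minimizes this loss:

```python
global_step = tf.Variable(0, name="global_step", trainable=False)
optimizer = tf.train.AdamOptimizer(1e-3)                 # learning rate is illustrative
grads_and_vars = optimizer.compute_gradients(cnn.loss)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
```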
Model evaluation
This part is not really part of the model structure; it just computes the model's accuracy: tf.equal checks whether each prediction matches the true label, and tf.reduce_mean computes the fraction of predictions that agree with the ground truth.
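A tiny worked example of the accuracy computation (the numbers are invented for illustration):

```python
predictions = tf.constant([1, 0, 1, 1], dtype=tf.int64)                        # argmax of scores
labels = tf.argmax(tf.constant([[0., 1.], [1., 0.], [1., 0.], [0., 1.]]), 1)   # -> [1, 0, 0, 1]
correct = tf.equal(predictions, labels)                                        # [True, True, False, True]
accuracy = tf.reduce_mean(tf.cast(correct, "float"))                           # evaluates to 0.75
```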