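This post uses TensorFlow's basic API (no high-level libraries) to build a neural network that classifies images.
The workflow has four steps: building the dataset, building and training a CNN, validating on the test set, and predicting on a single real-world image.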
Development environment: Anaconda + Python 3.5 + TensorFlow 1.3
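The file structure is shown in the figure: data holds the images, net stores the trained model, TFrecorder builds the dataset and preprocesses the images, resnet defines the 18-layer ResNet architecture, and the remaining files are named for what they do.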
1. Building the dataset
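The images for the dataset can be crawled from Baidu with Python and downloaded into folders, one folder per class. We then use TensorFlow's tfrecord format to bring the images to a uniform format and attach a label to every image of each class. The resulting tfrecord files are the input to the neural network.
As shown in the figure, bottle and paper hold the two classes of images (bottles and paper boxes), record holds the tfrecord files, and testdata holds the images used for the final prediction.
The dataset code therefore consists of three functions: one that writes the tfrecord files, one that reads a tfrecord file, and one that groups the images read from it into a batch. The dataset is split into a training set and a test set: the first 70% of the images in each class go into the training tfrecord file, and the remaining 30% go into the test tfrecord file. Here is my implementation: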
import os

import tensorflow as tf
from PIL import Image
import matplotlib.pyplot as plt  # only used by the commented-out check at the bottom

# Raw string, so that "\r" in "\resnet" is not read as a carriage return.
path = r"D:\code\resnet\data"
train_record_path = "data/record/train.tfrecords"
test_record_path = "data/record/test.tfrecords"
# A list rather than a set, so the index (the label) is stable: bottle=0, paper=1.
classes = ['bottle', 'paper']


def _byteslist(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64list(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def create_train_record():
    # Write the first 70% of every class folder to the training tfrecord file.
    writer = tf.python_io.TFRecordWriter(train_record_path)
    NUM = 1
    for index, name in enumerate(classes):
        class_path = path + "/" + name + '/'
        # Sorted once, so the train and test splits see the same order.
        image_names = sorted(os.listdir(class_path))
        l = int(len(image_names) * 0.7)
        print("create tf " + str(index))
        for image_name in image_names[:l]:
            image_path = class_path + image_name
            # convert("RGB") guarantees 3 channels, matching the reshape in read_record.
            img = Image.open(image_path).convert("RGB")
            img = img.resize((224, 224))
            img_raw = img.tobytes()
            example = tf.train.Example(
                features=tf.train.Features(feature={
                    'label': _int64list(index),
                    'img_raw': _byteslist(img_raw)}))
            writer.write(example.SerializeToString())
            print('create train record in ', NUM)
            NUM += 1
    writer.close()
    print('create_train_record success !')


def create_test_record():
    # Write the remaining 30% of every class folder to the test tfrecord file.
    writer = tf.python_io.TFRecordWriter(test_record_path)
    NUM = 1
    for index, name in enumerate(classes):
        class_path = path + "/" + name + "/"
        image_names = sorted(os.listdir(class_path))
        l = int(len(image_names) * 0.7)
        for image_name in image_names[l:]:
            image_path = class_path + image_name
            img = Image.open(image_path).convert("RGB")
            img = img.resize((224, 224))
            img_raw = img.tobytes()
            example = tf.train.Example(
                features=tf.train.Features(feature={
                    'label': _int64list(index),
                    'img_raw': _byteslist(img_raw)}))
            writer.write(example.SerializeToString())
            print('create test record in', NUM)
            NUM += 1
    writer.close()
    print('create_test_record success !')


def read_record(filename, img_w, img_h):
    # Read and decode a single (image, label) example from a tfrecord file.
    filename_queue = tf.train.string_input_producer([filename])
    reader = tf.TFRecordReader()
    _, serialize_example = reader.read(filename_queue)
    feature = tf.parse_single_example(
        serialize_example,
        features={
            'label': tf.FixedLenFeature([], tf.int64),
            'img_raw': tf.FixedLenFeature([], tf.string)})
    label = feature['label']
    img = feature['img_raw']
    img = tf.decode_raw(img, tf.uint8)
    img = tf.reshape(img, (224, 224, 3))
    # The arguments are (image, target_height, target_width); both are 224 here.
    img = tf.image.resize_image_with_crop_or_pad(img, img_w, img_h)
    img = tf.cast(img, tf.float32) / 255  # scale pixel values to [0, 1]
    label = tf.cast(label, tf.int32)
    return img, label


def get_batch_record(filename, batch_size, img_W, img_H):
    # Group decoded examples into shuffled batches with one-hot labels.
    image, label = read_record(filename, img_W, img_H)
    image_batch, label_batch = tf.train.shuffle_batch([image, label],
                                                      batch_size=batch_size,
                                                      capacity=30,
                                                      min_after_dequeue=10)
    label_batch = tf.one_hot(label_batch, depth=len(classes))
    return image_batch, label_batch


# Quick visual check of the pipeline (uncomment to run):
# if __name__ == '__main__':
#     img, label = get_batch_record(test_record_path, 2, 224, 224)
#     with tf.Session() as sess:
#         sess.run(tf.global_variables_initializer())
#         sess.run(tf.local_variables_initializer())
#         coord = tf.train.Coordinator()
#         threads = tf.train.start_queue_runners(sess, coord)
#         for i in range(200):
#             image, l = sess.run([img, label])
#             print(image[0].shape)
#             print(l[0])
#             plt.imshow(image[0])
#             plt.show()
#         coord.request_stop()
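The 70/30 split described above can be looked at in isolation. The following is a minimal sketch in plain Python, not the author's code: the helper name split_train_test and the example file names are made up for illustration, and sorting the names is an added assumption so that both halves always see the same order.

```python
def split_train_test(file_names, train_fraction=0.7):
    """Return (train, test): the first train_fraction of the names, then the rest."""
    # Sorting is an assumption added here; os.listdir does not promise an order.
    ordered = sorted(file_names)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical file names standing in for the contents of one class folder.
names = ["img_%02d.jpg" % i for i in range(10)]
train, test = split_train_test(names)
print(len(train), len(test))
```

Because the cut index is computed the same way for both halves, no image can end up in both the training and the test file.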
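Notes on the code above: when the tfrecord files are written, the label of each class is its index in the classes collection. Since classes is {bottle, paper}, the images in the bottle folder get the label 0. Preprocessing is essentially two steps: first resize and crop the image to (224, 224, 3), then normalize the pixel values to the range 0 to 1. When a batch is formed, the labels are additionally converted to one-hot format.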
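To make the two label and pixel conventions concrete, here is a small sketch in plain Python. It only mirrors what tf.one_hot and the division by 255 do in the pipeline; the helper names normalize_pixels and one_hot are illustrative and not part of the original code.

```python
CLASS_NAMES = ["bottle", "paper"]  # index in this list is the label


def normalize_pixels(pixels):
    """Scale 8-bit pixel values (0..255) to floats in [0, 1]."""
    return [p / 255.0 for p in pixels]


def one_hot(label, depth=len(CLASS_NAMES)):
    """Return a list of length depth with 1.0 at position label and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(depth)]


for label, name in enumerate(CLASS_NAMES):
    print(name, label, one_hot(label))
print(normalize_pixels([0, 51, 255]))
```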
2. Building the network and training the model
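I use the 18-layer variant of the ResNet architecture and stack the layers according to the structure in the ResNet paper. To help my own understanding, I did not call any high-level library. The main difficulty is implementing the residual structure: when the main convolution path and the shortcut have the same number of channels, the two can be added directly; when they differ, the shortcut first goes through a 1×1 convolution that produces the required number of channels, and the two are added afterwards. The figure illustrates this.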
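The shortcut rule can be checked on a toy feature map. This is only a sketch in plain Python with nested lists (height × width × channels), not how TensorFlow computes it; the helper names conv1x1, add_relu and shape are hypothetical, and the main-path output is a hand-written stand-in rather than a real convolution result.

```python
def conv1x1(fmap, weights, stride=1):
    """1x1 convolution: apply the C_in x C_out matrix `weights` at every stride-th pixel."""
    out = []
    for row in fmap[::stride]:
        out_row = []
        for pixel in row[::stride]:
            out_row.append([
                sum(pixel[c] * weights[c][k] for c in range(len(pixel)))
                for k in range(len(weights[0]))
            ])
        out.append(out_row)
    return out


def add_relu(a, b):
    """Element-wise a + b followed by ReLU; both inputs must have the same shape."""
    return [[[max(0.0, x + y) for x, y in zip(pa, pb)]
             for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def shape(fmap):
    return (len(fmap), len(fmap[0]), len(fmap[0][0]))


# Block input: 4 x 4 spatial size, 2 channels.
x = [[[1.0, 1.0] for _ in range(4)] for _ in range(4)]
# Stand-in for the main path output after a stride-2 block: 2 x 2 spatial, 4 channels.
main = [[[-0.25] * 4 for _ in range(2)] for _ in range(2)]
# Projection weights mapping 2 channels to 4 channels.
w = [[0.5] * 4 for _ in range(2)]

shortcut = conv1x1(x, w, stride=2)   # now 2 x 2 x 4, same as the main path
out = add_relu(main, shortcut)
print(shape(x), shape(shortcut), shape(out))
```

Without the projection, a 4×4×2 input could not be added to a 2×2×4 output, which is why the shortcut needs both the channel change and the stride.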
The forward pass that builds the resnet18 network is as follows:
import tensorflow as tf


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(input, filter, strides, padding="SAME"):
    # Pass the padding argument through (it was previously ignored).
    return tf.nn.conv2d(input, filter, strides, padding=padding)


def resnet18(input):
    # Stem: 7x7 convolution with stride 2, then 3x3 max pooling with stride 2.
    kernel_1 = weight_variable([7, 7, 3, 64])
    bias_1 = bias_variable([64])
    layer_1 = tf.nn.relu(conv2d(input, kernel_1, strides=[1, 2, 2, 1]) + bias_1)
    Maxpool_1 = tf.nn.max_pool(layer_1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding="SAME")

    # Stage 1: two residual blocks, 64 channels, identity shortcuts.
    kernel_2 = weight_variable([3, 3, 64, 64])
    layer_2 = tf.nn.relu(conv2d(Maxpool_1, kernel_2, strides=[1, 1, 1, 1]))
    kernel_3 = weight_variable([3, 3, 64, 64])
    layer_3 = conv2d(layer_2, kernel_3, strides=[1, 1, 1, 1])
    res1 = tf.nn.relu(Maxpool_1 + layer_3)

    kernel_4 = weight_variable([3, 3, 64, 64])
    layer_4 = tf.nn.relu(conv2d(res1, kernel_4, strides=[1, 1, 1, 1]))
    kernel_5 = weight_variable([3, 3, 64, 64])
    layer_5 = conv2d(layer_4, kernel_5, strides=[1, 1, 1, 1])
    res2 = tf.nn.relu(res1 + layer_5)

    # Stage 2: 128 channels; the first block uses a 1x1 stride-2 projection shortcut.
    kernel_6 = weight_variable([3, 3, 64, 128])
    layer_6 = tf.nn.relu(conv2d(res2, kernel_6, strides=[1, 2, 2, 1]))
    kernel_7 = weight_variable([3, 3, 128, 128])
    layer_7 = conv2d(layer_6, kernel_7, strides=[1, 1, 1, 1])
    kernel_line_1 = weight_variable([1, 1, 64, 128])
    res3 = tf.nn.relu(conv2d(res2, kernel_line_1, strides=[1, 2, 2, 1]) + layer_7)

    kernel_8 = weight_variable([3, 3, 128, 128])
    layer_8 = tf.nn.relu(conv2d(res3, kernel_8, strides=[1, 1, 1, 1]))
    kernel_9 = weight_variable([3, 3, 128, 128])
    layer_9 = conv2d(layer_8, kernel_9, strides=[1, 1, 1, 1])
    res4 = tf.nn.relu(res3 + layer_9)

    # Stage 3: 256 channels.
    kernel_10 = weight_variable([3, 3, 128, 256])
    layer_10 = tf.nn.relu(conv2d(res4, kernel_10, strides=[1, 2, 2, 1]))
    kernel_11 = weight_variable([3, 3, 256, 256])
    layer_11 = conv2d(layer_10, kernel_11, strides=[1, 1, 1, 1])
    kernel_line_2 = weight_variable([1, 1, 128, 256])
    res5 = tf.nn.relu(conv2d(res4, kernel_line_2, strides=[1, 2, 2, 1]) + layer_11)

    kernel_12 = weight_variable([3, 3, 256, 256])
    layer_12 = tf.nn.relu(conv2d(res5, kernel_12, strides=[1, 1, 1, 1]))
    kernel_13 = weight_variable([3, 3, 256, 256])
    layer_13 = conv2d(layer_12, kernel_13, strides=[1, 1, 1, 1])
    res6 = tf.nn.relu(res5 + layer_13)

    # Stage 4: 512 channels.
    kernel_14 = weight_variable([3, 3, 256, 512])
    layer_14 = tf.nn.relu(conv2d(res6, kernel_14, strides=[1, 2, 2, 1]))
    kernel_15 = weight_variable([3, 3, 512, 512])
    layer_15 = conv2d(layer_14, kernel_15, strides=[1, 1, 1, 1])
    kernel_line_3 = weight_variable([1, 1, 256, 512])
    res7 = tf.nn.relu(conv2d(res6, kernel_line_3, strides=[1, 2, 2, 1]) + layer_15)

    kernel_16 = weight_variable([3, 3, 512, 512])
    layer_16 = tf.nn.relu(conv2d(res7, kernel_16, strides=[1, 1, 1, 1]))
    kernel_17 = weight_variable([3, 3, 512, 512])
    layer_17 = conv2d(layer_16, kernel_17, strides=[1, 1, 1, 1])
    res8 = tf.nn.relu(layer_17 + res7)

    # Head: 7x7 average pooling, flatten, fully connected layer with 2 outputs (logits).
    avgpool = tf.nn.avg_pool(res8, ksize=[1, 7, 7, 1], strides=[1, 1, 1, 1], padding="VALID")
    line = tf.reshape(avgpool, [-1, 512])
    fc_18 = weight_variable([512, 2])
    bias_18 = bias_variable([2])
    layer_18 = tf.matmul(line, fc_18) + bias_18
    return layer_18
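The 7×7 average pooling at the end only works because the feature map is exactly 7×7 by that point. A quick way to convince yourself is to trace the spatial size through the strides. The sketch below is plain Python; it assumes "SAME" padding throughout, where the output size is ceil(input / stride), and the stage names are just labels for the listing above.

```python
import math


def same_out(size, stride):
    """Output size of a "SAME"-padded convolution or pooling layer."""
    return int(math.ceil(size / float(stride)))


def trace_resnet18(size=224):
    """Return [(stage name, spatial size after the stage)] for the network above."""
    stages = [
        ("conv 7x7, stride 2", 2),
        ("max_pool 3x3, stride 2", 2),
        ("64-channel blocks", 1),
        ("128-channel blocks", 2),
        ("256-channel blocks", 2),
        ("512-channel blocks", 2),
    ]
    sizes = []
    for name, stride in stages:
        size = same_out(size, stride)
        sizes.append((name, size))
    return sizes


sizes = trace_resnet18(224)
for name, size in sizes:
    print("%-24s -> %d x %d" % (name, size, size))
```

If the input size were changed from 224, the final size would change as well, and the ksize of the average pooling and the reshape to [-1, 512] would have to be adjusted to match.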
The backward pass, which feeds in images and labels and minimizes the loss, is as follows:
import tensorflow as tf
import Ipynb_importer  # local helper (presumably for importing notebooks as modules)
from TFrecorder import get_batch_record
from resnet import resnet18

batch_size = 20
filename = "data/record/train.tfrecords"
filename_test = "data/record/test.tfrecords"
num_classes = 2
img_w = 224
img_h = 224

x = tf.placeholder(tf.float32, [None, img_w, img_h, 3])
y = tf.placeholder(tf.float32, [None, num_classes])
prediction = resnet18(x)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=prediction))
train_step = tf.train.AdamOptimizer(0.001).minimize(loss)

# argmax picks the class with the highest score
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(prediction, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

image_batch, label_batch = get_batch_record(filename, batch_size, img_w, img_h)
image_batch_test, label_batch_test = get_batch_record(filename_test, batch_size, img_w, img_h)
init = tf.global_variables_initializer()

saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    # Resume from the last checkpoint. Comment this line out on the very first run,
    # when no checkpoint has been saved yet.
    saver.restore(sess, "net/my_resnet18.ckpt")
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess, coord)
    for i in range(302):
        image, label = sess.run([image_batch, label_batch])
        # image_test, label_test = sess.run([image_batch_test, label_batch_test])
        # One run call performs the update and returns the loss for this batch.
        _, l = sess.run([train_step, loss], feed_dict={x: image, y: label})
        # acc = sess.run(accuracy, feed_dict={x: image_test, y: label_test})
        if i % 20 == 0:
            print("iter: " + str(i) + " loss " + str(l))
            saver.save(sess, "net/my_resnet18.ckpt")
    # Stop the reader threads and wait for them to finish.
    coord.request_stop()
    coord.join(threads)
Notes on the code: the input pipeline reads data with multiple threads, so the threads have to be stopped at the end. The script also supports resuming training: before training starts, it loads the saved model parameters. On the first run no checkpoint exists yet, so this step raises an error; commenting out the saver.restore line fixes that.
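Instead of commenting the restore line in and out by hand, the script could check whether a checkpoint is present. The following is a sketch under one assumption: that the Saver writes the default V2 checkpoint format, which places a file named "<prefix>.index" next to the data files. The helper checkpoint_exists is hypothetical, and the demo uses a temporary directory rather than the real net folder.

```python
import os
import tempfile


def checkpoint_exists(prefix):
    """Return True if an index file for this checkpoint prefix is on disk."""
    # Assumption: V2-format checkpoints, which write "<prefix>.index".
    return os.path.exists(prefix + ".index")


# Demo against a temporary directory instead of the real "net/" folder.
demo_dir = tempfile.mkdtemp()
demo_prefix = os.path.join(demo_dir, "my_resnet18.ckpt")

before = checkpoint_exists(demo_prefix)
open(demo_prefix + ".index", "w").close()  # empty stand-in for a saved checkpoint
after = checkpoint_exists(demo_prefix)
print(before, after)

# In the training script this could guard the restore call:
# if checkpoint_exists("net/my_resnet18.ckpt"):
#     saver.restore(sess, "net/my_resnet18.ckpt")
```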
3. Validating on the test set
There is nothing difficult in this part, so here is the code directly:
import tensorflow as tf
import Ipynb_importer  # local helper (presumably for importing notebooks as modules)
from TFrecorder import get_batch_record
from resnet import resnet18

batch_size = 20
filename = "data/record/test.tfrecords"
num_classes = 2
img_w = 224
img_h = 224

x = tf.placeholder(tf.float32, [None, img_w, img_h, 3])
y = tf.placeholder(tf.float32, [None, num_classes])
prediction = resnet18(x)

# argmax picks the class with the highest score
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(prediction, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

image_batch, label_batch = get_batch_record(filename, batch_size, img_w, img_h)

init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    # Load the trained weights saved by the training script.
    saver.restore(sess, "net/my_resnet18.ckpt")
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess, coord)

    for i in range(21):
        image, label = sess.run([image_batch, label_batch])
        acc = sess.run(accuracy, feed_dict={x: image, y: label})
        if i % 2 == 0:
            print(" acc: " + str(acc))

    # Stop the reader threads and wait for them to finish.
    coord.request_stop()
    coord.join(threads)
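The accuracy node compares the argmax of the one-hot label with the argmax of the network output and averages the hits over the batch. The same computation can be sketched in plain Python; the logits and labels below are invented values for illustration, not results from the model.

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def batch_accuracy(logits, one_hot_labels):
    """Fraction of rows whose predicted class matches the labelled class."""
    hits = [argmax(row) == argmax(label)
            for row, label in zip(logits, one_hot_labels)]
    return sum(hits) / float(len(hits))


# Made-up batch of 4 examples: column 0 is bottle, column 1 is paper.
logits = [[2.0, 0.1], [0.3, 1.2], [1.5, 0.2], [0.1, 0.4]]
labels = [[1, 0], [0, 1], [0, 1], [0, 1]]
acc = batch_accuracy(logits, labels)
print("acc:", acc)
```

Since the test script prints the accuracy of individual batches of 20 images, the numbers fluctuate from batch to batch; averaging them over all batches gives a steadier estimate.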
4. Predicting a single image
A single image is read from a folder and then classified.
The code is as follows:
import tensorflow as tf
import Ipynb_importer  # local helper (presumably for importing notebooks as modules)
import matplotlib.pyplot as plt
from skimage import io, transform
from resnet import resnet18

image_dir = "data/testdata/4.jpg"
MODEL_SAVE_PATH = "net/my_resnet18.ckpt"


def load_image(path):
    # Read the image from the given path.
    img = io.imread(path)
    # Normalize the pixel values to [0, 1].
    img = img / 255.0
    re_img = transform.resize(img, (224, 224))
    plt.imshow(re_img)
    plt.show()
    # Add the batch dimension expected by the network.
    img_ready = re_img.reshape((1, 224, 224, 3))
    return img_ready


def prediction_result():
    x = tf.placeholder(tf.float32, [None, 224, 224, 3])
    image = load_image(image_dir)
    y = resnet18(x)
    # Use softmax to turn the logits into probabilities.
    # Index 0 is the bottle class, index 1 is the paper box class.
    probabilities = tf.nn.softmax(y)
    # Position of the most probable label.
    predicted_label = tf.argmax(y, 1)
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(init)
        saver.restore(sess, MODEL_SAVE_PATH)
        # Separate names for the results, so the tensors are not overwritten.
        probs, label = sess.run([probabilities, predicted_label], feed_dict={x: image})
        print(probs)
        if label[0] == 0:
            print("this is a bottle")
        else:
            print("this is a paper box")


def main():
    prediction_result()


if __name__ == '__main__':
    main()
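The last step of the prediction, turning the two logits into probabilities and a class name, can be reproduced without TensorFlow. This is a minimal sketch in plain Python: the logits are invented, describe is a hypothetical helper, and the class order (0 for bottle, 1 for paper) follows the convention stated in the code comments.

```python
import math

CLASS_NAMES = ["bottle", "paper"]  # index 0 is bottle, index 1 is paper box


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logits):
    """Return (class name, probability) for the most likely class."""
    probs = softmax(logits)
    index = probs.index(max(probs))
    return CLASS_NAMES[index], probs[index]


# Made-up logits for one image.
name, prob = describe([0.2, 1.7])
print("this is a %s (p = %.3f)" % (name, prob))
```

Softmax does not change which entry is largest, which is why the script can take the argmax of the raw logits and still agree with the printed probabilities.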
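The last part is an application scenario. The scenario can be changed, in which case the dataset images have to be replaced accordingly. For example, embedded hardware can serve as a terminal that takes pictures and uploads them to a server for recognition. In my graduation project, a Raspberry Pi controls a camera that photographs an object and uploads the picture to the server; the recognition result is returned and output by voice and on a display, to teach kindergarten children to read and to recognize objects. I will post an update on this scenario when I have time. I wrote this post for two reasons: to summarize what I have learned about image classification, and to offer an entry-level reference for anyone who wants to learn how to apply it.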