Image Classification with TensorFlow: A Worked Example


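This article uses TensorFlow's basic API (no high-level libraries) to build a neural network that classifies images.

The workflow has four main parts: building the dataset, building and training a CNN, validating on the test set, and predicting on a single real-world image.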

Development environment: Anaconda + Python 3.5 + TensorFlow 1.3

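The file structure is shown in the figure: data holds the images, net stores the trained model, TFrecorder builds the dataset and preprocesses the images, resnet defines the 18-layer ResNet network structure, and the remaining files are named for what they do.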

 

1. Building the dataset

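The images for the dataset can be scraped from Baidu with Python and downloaded into folders, one folder per class. We use TensorFlow's TFRecord format to bring the images to a uniform format and attach a label to the images of each class. The resulting TFRecord files serve as the input to the neural network.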

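As shown in the figure, bottle and paper hold the two classes of images (bottles and paper boxes), record stores the TFRecord files, and testdata holds the images used for the final prediction.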

 

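The dataset code can therefore be split into three functions: one that generates the TFRecord files, one that reads a TFRecord file, and one that groups the images read from it into batches. The dataset is divided into a training set and a test set: for each class, the first 70% of the images go into the training TFRecord file and the remaining 30% go into the test TFRecord file. My implementation is given below.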
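As a small aside before the full listing, the 70/30 split rule can be sketched on its own in plain Python. This is only an illustration under stated assumptions: the helper name `split_train_test` and the sample file names are hypothetical, and the list is sorted first (an addition made here, not something the original code does) so that separate calls always agree on where the boundary lies.

```python
def split_train_test(file_names, train_ratio=0.7):
    """Split one class's file names into (train, test) lists.

    Sorting first makes the split deterministic, so the code that writes
    the training record and the code that writes the test record see the
    same boundary and never share an image.
    """
    ordered = sorted(file_names)
    cut = int(len(ordered) * train_ratio)  # same truncation as int(len(...) * 0.7)
    return ordered[:cut], ordered[cut:]


# Hypothetical file names for one class folder.
names = ["img_%02d.jpg" % i for i in range(10)]
train, test = split_train_test(names)
print(len(train), "training images,", len(test), "test images")
```

Because `int()` truncates, very small folders put proportionally more images into the test set; a folder with a single image would contribute nothing to training.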

import os

import tensorflow as tf
from PIL import Image
import matplotlib.pyplot as plt  # only needed for the optional preview at the bottom

# Raw string, so that "\r" in the Windows path is not read as a carriage return.
path = r"D:\code\resnet\data"
train_record_path = "data/record/train.tfrecords"
test_record_path = "data/record/test.tfrecords"
# A list rather than a set, so that every class keeps a fixed index (its label).
classes = ['bottle', 'paper']  # two manually chosen classes


def _byteslist(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64list(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def _create_record(record_path, is_train):
    """Write the first 70% (train) or the last 30% (test) of every class."""
    writer = tf.python_io.TFRecordWriter(record_path)
    num = 1
    for index, name in enumerate(classes):
        class_path = os.path.join(path, name)
        # Sort so that the train and test calls see the files in the same order.
        names = sorted(os.listdir(class_path))
        split = int(len(names) * 0.7)
        names = names[:split] if is_train else names[split:]
        print("create tf " + str(index))
        for image_name in names:
            # Force 3 channels so that the reshape in read_record always works.
            img = Image.open(os.path.join(class_path, image_name)).convert('RGB')
            img = img.resize((224, 224))
            example = tf.train.Example(
                features=tf.train.Features(feature={
                    'label': _int64list(index),
                    'img_raw': _byteslist(img.tobytes())}))
            writer.write(example.SerializeToString())
            print('wrote record', num, 'to', record_path)
            num += 1
    writer.close()


def create_train_record():
    _create_record(train_record_path, is_train=True)
    print('create_train_record success!')


def create_test_record():
    _create_record(test_record_path, is_train=False)
    print('create_test_record success!')


def read_record(filename, img_w, img_h):
    filename_queue = tf.train.string_input_producer([filename])
    reader = tf.TFRecordReader()
    _, serialize_example = reader.read(filename_queue)
    feature = tf.parse_single_example(
        serialize_example,
        features={
            'label': tf.FixedLenFeature([], tf.int64),
            'img_raw': tf.FixedLenFeature([], tf.string)})
    label = feature['label']
    img = feature['img_raw']
    # The records store raw 224x224 RGB bytes, so decode and restore that shape.
    img = tf.decode_raw(img, tf.uint8)
    img = tf.reshape(img, (224, 224, 3))
    # The arguments are (image, target_height, target_width).
    img = tf.image.resize_image_with_crop_or_pad(img, img_h, img_w)
    # Normalise the pixel values to [0, 1].
    img = tf.cast(img, tf.float32) / 255
    label = tf.cast(label, tf.int32)
    return img, label


def get_batch_record(filename, batch_size, img_W, img_H):
    image, label = read_record(filename, img_W, img_H)
    image_batch, label_batch = tf.train.shuffle_batch([image, label],
                                                      batch_size=batch_size,
                                                      capacity=30,
                                                      min_after_dequeue=10)
    # Convert the integer labels to one-hot vectors.
    label_batch = tf.one_hot(label_batch, depth=len(classes))
    return image_batch, label_batch


if __name__ == '__main__':
    # Build both record files when this module is run directly.
    create_train_record()
    create_test_record()

    # Optional preview: read a few test images back and display them.
    # img, label = get_batch_record(test_record_path, 2, 224, 224)
    # with tf.Session() as sess:
    #     sess.run(tf.global_variables_initializer())
    #     sess.run(tf.local_variables_initializer())
    #     coord = tf.train.Coordinator()
    #     threads = tf.train.start_queue_runners(sess, coord)
    #     for i in range(200):
    #         image, l = sess.run([img, label])
    #         print(image[0].shape)
    #         print(l[0])
    #         plt.imshow(image[0])
    #         plt.show()
    #     coord.request_stop()

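Notes on the code above: when the TFRecord files are generated, the label of each class is its index in `classes`, which lists bottle and then paper, so the images in the bottle folder get label 0 and those in the paper folder get label 1. This relies on `classes` having a fixed order, as a list does; a Python set does not guarantee one. Preprocessing has essentially two steps: first the image is resized to (224, 224, 3), then its pixel values are normalised to the range 0 to 1. When a batch is formed, the labels are additionally converted to one-hot format.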
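The labelling and preprocessing rules described above can be illustrated without TensorFlow. The sketch below is a plain-Python stand-in, not the pipeline itself: `CLASSES`, `label_of`, `one_hot` and `normalize` are hypothetical names chosen for this example, and `one_hot` only mimics what `tf.one_hot` does for a single label.

```python
CLASSES = ["bottle", "paper"]  # a list, so the index order is fixed


def label_of(class_name):
    """Return the integer label: the position of the class in CLASSES."""
    return CLASSES.index(class_name)


def one_hot(label, depth=len(CLASSES)):
    """Plain-Python equivalent of a one-hot encoding for a single label."""
    return [1.0 if i == label else 0.0 for i in range(depth)]


def normalize(pixels):
    """Scale 8-bit pixel values (0..255) into the range [0, 1]."""
    return [p / 255 for p in pixels]


for name in CLASSES:
    print(name, label_of(name), one_hot(label_of(name)))
print(normalize([0, 128, 255]))
```

Keeping the class names in a list matters: the label written into the record and the index checked at prediction time must refer to the same class.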

2. Building the network and training the model

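I use the 18-layer variant of the ResNet architecture and stack the layers following the structure given in the ResNet paper. To help my own understanding, I did not call any high-level library. The main difficulty is implementing the residual structure: when the main convolution path and the shortcut have the same number of channels, the two can be added directly; when they differ, the shortcut first goes through a 1×1 convolution that outputs the required number of channels, and the results are then added. This is shown in the figure.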
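To make the shortcut rule concrete, the following sketch traces the feature-map size and channel count through an 18-layer ResNet for a 224×224 input and reports which kind of shortcut each stage needs. It is a shape calculation only, with no TensorFlow involved; the helper names `same_out` and `shortcut_kind` are hypothetical, and the size formula assumes "SAME" padding, where the output size is the input size divided by the stride, rounded up.

```python
import math


def same_out(size, stride):
    """Spatial output size of a SAME-padded conv or pool: ceil(size / stride)."""
    return math.ceil(size / stride)


def shortcut_kind(in_channels, out_channels, stride):
    """Decide how the shortcut joins the main path of a residual block."""
    if in_channels == out_channels and stride == 1:
        return "identity"         # shapes already match: add directly
    return "1x1 convolution"      # project channels (and apply the stride) first


size, channels = 224, 3
size, channels = same_out(size, 2), 64   # 7x7 convolution, stride 2
size = same_out(size, 2)                  # 3x3 max pooling, stride 2
print("stem ->", size, "x", size, "x", channels)

# (output channels, stride of the first block) for the four stages
for out_channels, stride in [(64, 1), (128, 2), (256, 2), (512, 2)]:
    kind = shortcut_kind(channels, out_channels, stride)
    size, channels = same_out(size, stride), out_channels
    print(out_channels, "channels:", kind, "-> feature map", size, "x", size)
```

The trace ends with a 7×7 feature map of 512 channels, which is why a 7×7 average pooling with "VALID" padding reduces it to a single 512-element vector per image.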

 

The forward pass, which builds the ResNet-18 structure, is implemented as follows:

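import tensorflow as tf


def weight_variable(shape):
    # Weights start from a truncated normal distribution.
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    # Biases start from a small positive constant.
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(inputs, kernel, strides, padding="SAME"):
    # Pass the padding argument through instead of hard-coding "SAME".
    return tf.nn.conv2d(inputs, kernel, strides, padding=padding)


def resnet18(inputs):
    # Stem: 7x7 convolution with stride 2, then 3x3 max pooling with stride 2.
    kernel_1 = weight_variable([7, 7, 3, 64])
    bias_1 = bias_variable([64])  # a bias, so create it with bias_variable
    layer_1 = tf.nn.relu(conv2d(inputs, kernel_1, strides=[1, 2, 2, 1]) + bias_1)
    Maxpool_1 = tf.nn.max_pool(layer_1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding="SAME")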
    
    # Stage 1: two residual blocks with 64 channels and identity shortcuts.
    kernel_2 = weight_variable([3,3,64,64])
    layer_2 = tf.nn.relu(conv2d(Maxpool_1,kernel_2,strides=[1,1,1,1]))
    
    kernel_3 = weight_variable([3,3,64,64])
    layer_3 = conv2d(layer_2,kernel_3,strides=[1,1,1,1]) 
    res1 = tf.nn.relu(Maxpool_1+layer_3)
    
    kernel_4 = weight_variable([3,3,64,64])
    layer_4 = tf.nn.relu(conv2d(res1,kernel_4,strides=[1,1,1,1]))
    
    kernel_5 = weight_variable([3,3,64,64])
    layer_5 = conv2d(layer_4,kernel_5,strides=[1,1,1,1])
    res2 = tf.nn.relu(res1+layer_5)
    
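    # Stage 2: 128 channels. The first block halves the feature map,
    # so its shortcut is a 1x1 convolution with stride 2.
    kernel_6 = weight_variable([3,3,64,128])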
    layer_6 = tf.nn.relu(conv2d(res2,kernel_6,strides=[1,2,2,1]))
    
    kernel_7 = weight_variable([3,3,128,128])
    layer_7 = conv2d(layer_6,kernel_7,strides=[1,1,1,1])
    kernel_line_1 = weight_variable([1,1,64,128])
    res3 = tf.nn.relu(conv2d(res2,kernel_line_1,strides=[1,2,2,1]) + layer_7)
    
    kernel_8 = weight_variable([3,3,128,128])
    layer_8 = tf.nn.relu(conv2d(res3,kernel_8,strides=[1,1,1,1]))
    
    kernel_9 = weight_variable([3,3,128,128])
    layer_9 = conv2d(layer_8,kernel_9,strides=[1,1,1,1])
    res4 = tf.nn.relu(res3+layer_9)
    
    # Stage 3: 256 channels, again with a 1x1 stride-2 shortcut in the first block.
    kernel_10 = weight_variable([3,3,128,256])
    layer_10 = tf.nn.relu(conv2d(res4,kernel_10,strides=[1,2,2,1]))
    
    kernel_11 = weight_variable([3,3,256,256])
    layer_11 = conv2d(layer_10,kernel_11,strides=[1,1,1,1])
    kernel_line_2 = weight_variable([1,1,128,256]) 
    res5 = tf.nn.relu(conv2d(res4,kernel_line_2,strides=[1,2,2,1])+layer_11)
    
    kernel_12 = weight_variable([3,3,256,256])
    layer_12 = tf.nn.relu(conv2d(res5,kernel_12,strides=[1,1,1,1]))
    
    kernel_13 = weight_variable([3,3,256,256])
    layer_13 = conv2d(layer_12,kernel_13,strides=[1,1,1,1])
    res6 = tf.nn.relu(res5+layer_13)
    
    # Stage 4: 512 channels, again with a 1x1 stride-2 shortcut in the first block.
    kernel_14 = weight_variable([3,3,256,512])
    layer_14 = tf.nn.relu(conv2d(res6,kernel_14,strides=[1,2,2,1]))
    
    kernel_15 = weight_variable([3,3,512,512])
    layer_15 = conv2d(layer_14,kernel_15,strides=[1,1,1,1])
    kernel_line_3 = weight_variable([1,1,256,512])
    res7 = tf.nn.relu(conv2d(res6,kernel_line_3,strides=[1,2,2,1])+ layer_15)
    
    kernel_16 = weight_variable([3,3,512,512])
    layer_16 = tf.nn.relu(conv2d(res7,kernel_16,strides=[1,1,1,1]))
    
    kernel_17 = weight_variable([3,3,512,512])
    layer_17 = conv2d(layer_16,kernel_17,strides=[1,1,1,1])
    res8 = tf.nn.relu(layer_17+res7)
    
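    # Global average pooling: the feature map is 7x7 here for a 224x224 input.
    avgpool = tf.nn.avg_pool(res8, ksize=[1, 7, 7, 1], strides=[1, 1, 1, 1], padding="VALID")

    # Flatten to [batch, 512] for the fully connected layer.
    line = tf.reshape(avgpool, [-1, 512])

    fc_18 = weight_variable([512, 2])
    bias_18 = bias_variable([2])
    # Return raw logits; softmax is applied in the loss or at prediction time.
    layer_18 = tf.matmul(line, fc_18) + bias_18

    return layer_18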

The backward pass, which takes images and labels as input and minimises the loss, is implemented as follows:

import tensorflow as tf
# Local helper from the original project, presumably for importing .ipynb
# notebooks as modules; drop it if your modules are plain .py files.
import Ipynb_importer
from TFrecorder import get_batch_record
from resnet import resnet18

batch_size = 20
filename = "data/record/train.tfrecords"
filename_test = "data/record/test.tfrecords"
num_classes = 2
img_w = 224
img_h = 224

# NHWC layout: [batch, height, width, channels].
x = tf.placeholder(tf.float32, [None, img_h, img_w, 3])
y = tf.placeholder(tf.float32, [None, num_classes])
prediction = resnet18(x)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=prediction))
train_step = tf.train.AdamOptimizer(0.001).minimize(loss)

# argmax picks the class with the highest score.
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(prediction, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

image_batch, label_batch = get_batch_record(filename, batch_size, img_w, img_h)
image_batch_test, label_batch_test = get_batch_record(filename_test, batch_size, img_w, img_h)
init = tf.global_variables_initializer()

saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    # Resume from the last checkpoint. Comment this line out on the very
    # first run, when no checkpoint exists yet.
    saver.restore(sess, "net/my_resnet18.ckpt")
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess, coord)
    for i in range(302):
        image, label = sess.run([image_batch, label_batch])
        # Run the update and fetch the loss in a single pass.
        _, l = sess.run([train_step, loss], feed_dict={x: image, y: label})
        # Optional: evaluate on a test batch as well.
        # image_test, label_test = sess.run([image_batch_test, label_batch_test])
        # acc = sess.run(accuracy, feed_dict={x: image_test, y: label_test})
        if i % 20 == 0:
            print("iter: " + str(i) + " loss " + str(l))
        # Saved after every iteration so that training can always resume;
        # move this under the `if` above to save less often.
        saver.save(sess, "net/my_resnet18.ckpt")
    # Stop the reader threads and wait for them to finish.
    coord.request_stop()
    coord.join(threads)

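Notes on the code: the input pipeline reads the data with multiple threads, which must be shut down at the end (`coord.request_stop()` followed by `coord.join(threads)`). The script also supports resuming training: it loads the saved model parameters before training starts. On the first run there is no checkpoint yet, so if the restore raises an error, comment out the `saver.restore(...)` line.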
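Instead of commenting the restore line in and out by hand, the script could check whether a checkpoint is present before restoring. The sketch below shows only that check, in plain Python. It rests on an assumption: `tf.train.Saver` wrote the checkpoint in its default V2 format, which stores an index file named `<prefix>.index` next to the data files. The helper name `checkpoint_exists` is hypothetical.

```python
import os


def checkpoint_exists(prefix):
    """Return True if a V2-format checkpoint with this prefix is on disk.

    Assumption: the checkpoint was written in the default V2 format,
    which creates '<prefix>.index' alongside the data files.
    """
    return os.path.exists(prefix + ".index")


CKPT = "net/my_resnet18.ckpt"
if checkpoint_exists(CKPT):
    print("checkpoint found: call saver.restore(sess, CKPT) to resume")
else:
    print("no checkpoint yet: train from the freshly initialised variables")
```

In the training script, the `saver.restore(...)` call would then sit inside the `if` branch, so the same script works for both the first run and later runs.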

3. Validating on the test set

There is nothing difficult in this part, so here is the code directly:

import tensorflow as tf
# Local helper from the original project, presumably for importing .ipynb
# notebooks as modules; drop it if your modules are plain .py files.
import Ipynb_importer
from TFrecorder import get_batch_record
from resnet import resnet18

batch_size = 20
filename = "data/record/test.tfrecords"
num_classes = 2
img_w = 224
img_h = 224

# NHWC layout: [batch, height, width, channels].
x = tf.placeholder(tf.float32, [None, img_h, img_w, 3])
y = tf.placeholder(tf.float32, [None, num_classes])
prediction = resnet18(x)

# argmax picks the class with the highest score.
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(prediction, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

image_batch, label_batch = get_batch_record(filename, batch_size, img_w, img_h)

init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    # Load the trained weights saved by the training script.
    saver.restore(sess, "net/my_resnet18.ckpt")
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess, coord)

    for i in range(21):
        image, label = sess.run([image_batch, label_batch])
        acc = sess.run(accuracy, feed_dict={x: image, y: label})
        # Print the accuracy of every second batch.
        if i % 2 == 0:
            print(" acc: " + str(acc))

    # Stop the reader threads and wait for them to finish.
    coord.request_stop()
    coord.join(threads)
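The validation loop reports an accuracy per batch. To summarise the test set with a single number, the per-batch values can be averaged, which is exact as long as every batch has the same size. The sketch below shows the arithmetic in plain Python; the function names and the sample labels are hypothetical and stand in for the values returned by `sess.run`.

```python
def batch_accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true one."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def mean_accuracy(batch_accuracies):
    """Average per-batch accuracies; valid when all batches have equal size."""
    return sum(batch_accuracies) / len(batch_accuracies)


# Hypothetical (predicted, actual) labels for two batches of four images.
batches = [([0, 1, 1, 0], [0, 1, 0, 0]),
           ([1, 1, 0, 0], [1, 1, 0, 0])]
accs = [batch_accuracy(p, a) for p, a in batches]
print("per batch:", accs, "overall:", mean_accuracy(accs))
```

Note that a shuffled, repeating input queue may serve some test images more than once across 21 batches, so the average is an estimate rather than an exact pass over the test set.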

4. Predicting a single image

An image is read from a folder and then classified.

The code is as follows:

 

import tensorflow as tf
# Local helper from the original project, presumably for importing .ipynb
# notebooks as modules; drop it if your modules are plain .py files.
import Ipynb_importer
import matplotlib.pyplot as plt
from resnet import resnet18
from skimage import io, transform

image_dir = "data/testdata/4.jpg"
MODEL_SAVE_PATH = "net/my_resnet18.ckpt"


def load_image(path):
    # Read the image from the given path (assumed to be a 3-channel RGB file).
    img = io.imread(path)
    # Normalise the pixel values to [0, 1].
    img = img / 255.0
    re_img = transform.resize(img, (224, 224))
    plt.imshow(re_img)
    plt.show()
    # Add the batch dimension expected by the network.
    img_ready = re_img.reshape((1, 224, 224, 3))
    return img_ready


def prediction_result():
    x = tf.placeholder(tf.float32, [None, 224, 224, 3])
    image = load_image(image_dir)
    y = resnet18(x)
    # Softmax turns the logits into probabilities.
    # Index 0 stands for the bottle, index 1 for the paper box.
    probabilities = tf.nn.softmax(y)
    # Index of the most probable class.
    predicted_label = tf.argmax(y, 1)
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()

    with tf.Session() as sess:
        sess.run(init)
        saver.restore(sess, MODEL_SAVE_PATH)
        probs, label = sess.run([probabilities, predicted_label], feed_dict={x: image})
        print(probs)
        if label[0] == 0:
            print("this is a bottle")
        else:
            print("this is a paper box")


def main():
    prediction_result()


if __name__ == '__main__':
    main()
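The prediction step turns the network's two output values into probabilities with softmax and then picks the more likely class. The same arithmetic can be sketched in plain Python. The names `CLASS_NAMES`, `softmax` and `describe` are hypothetical, and the logits are made-up numbers standing in for the network output of one image.

```python
import math

CLASS_NAMES = ["bottle", "paper box"]  # index 0 = bottle, index 1 = paper box


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logits):
    """Return (class name, probability) for the most likely class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


# Hypothetical logits, standing in for the network output of one image.
name, prob = describe([0.3, 2.1])
print("this is a %s (p=%.2f)" % (name, prob))
```

Looking the name up in a list, rather than testing for one specific index, gives every class a message and extends naturally if more classes are added later.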

 

 

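Finally, an application scenario can be built on top of this. The scenario can be changed, in which case the dataset images must be changed to match. For example, embedded hardware can act as a terminal that takes pictures and uploads them to a server for recognition. In my graduation project, a Raspberry Pi controls a camera that photographs objects and uploads the photos to the server; the recognition result is returned and output through speech and a display, to teach kindergarten children to read and to recognise objects. I will post an update on this scenario when I have time. I wrote this post for two reasons: to summarise what I have learned about image classification, and to offer an entry-level reference for anyone who wants to learn how to apply it.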

 

