I. Dataset Selection and Implementation Approach
1. Dataset description: the dataset comes from the public datasets on the Baidu AI Studio platform. It is an experimental dataset and fairly small, which limits the accuracy the network can ultimately reach. Dataset link: [https://aistudio.baidu.com/aistudio/datasetdetail/8325]
2. Usage notes: after extraction the dataset contains four classes of labeled images. Here we use only two of them for a simple binary classification; modify the training code if you need more classes. I use the "jiangwen" and "zhangziyi" classes.
As shown below:
(Note: my face dataset folder sits inside the project folder, which is named cascadeFace.)
3. Implementation approach: use the Haar cascade classifier provided by OpenCV to detect faces, crop the face regions it finds, and feed them into a trained AlexNet convolutional model to get the recognition result (we will build and train that AlexNet ourselves). Haar usage is covered in the test-code section later; for the theory behind Haar features, see [https://www.cnblogs.com/zyly/p/9410563.html].
II. Data Preprocessing
The code is as follows:
import os
import sys
import cv2
import numpy as np

"""Preprocessing"""

IMAGE_SIZE = 64

def resize_image(image, height=IMAGE_SIZE, width=IMAGE_SIZE):
    """Resize an image to the given dimensions, padding it square first."""
    top, bottom, left, right = (0, 0, 0, 0)
    h, w, _ = image.shape
    # Find the longest edge (for images whose height and width differ)
    longest_edge = max(h, w)
    # Work out how many pixels the short edge needs to match the long edge
    if h < longest_edge:
        dh = longest_edge - h
        top = dh // 2
        bottom = dh - top
    elif w < longest_edge:
        dw = longest_edge - w
        left = dw // 2
        right = dw - left
    # RGB color of the border
    BLACK = [0, 0, 0]
    # Pad the image so height and width are equal; cv2.BORDER_CONSTANT fills
    # the border with the color given by value
    constant = cv2.copyMakeBorder(image, top, bottom, left, right, cv2.BORDER_CONSTANT, value=BLACK)
    return cv2.resize(constant, (height, width))

# Read the training data
images = []
labels = []

def read_path(path_name):
    for dir_item in os.listdir(path_name):
        # Join with the base path to get a usable absolute path
        full_path = os.path.abspath(os.path.join(path_name, dir_item))
        if os.path.isdir(full_path):
            read_path(full_path)
        else:
            if dir_item.endswith('.jpg') or dir_item.endswith('.png'):
                image = cv2.imread(full_path)
                image = resize_image(image, IMAGE_SIZE, IMAGE_SIZE)
                # cv2.imwrite('1.jpg', image)
                images.append(image)
                labels.append(path_name)
    return images, labels

# Load the training data from the given path
def load_dataset(path_name):
    images, labels = read_path(path_name)
    # Convert the images to a four-dimensional array of shape
    # (number of images) * IMAGE_SIZE * IMAGE_SIZE * 3:
    # each image is 64*64 pixels with 3 color values per pixel
    images = np.array(images)
    print(images.shape)
    # Label the data: 0 for jiangwen, 1 for zhangziyi
    labels = np.array([0 if label.endswith('jiangwen') else 1 for label in labels])
    return images, labels

if __name__ == '__main__':
    # Load from the path given on the command line, defaulting to ./face
    path = sys.argv[1] if len(sys.argv) == 2 else './face'
    images, labels = load_dataset(path)
Note: resize_image() first checks whether the image is square; if not, it pads the short side to match the long one before calling cv2.resize() for the actual scaling, so the image is scaled proportionally and never distorted.
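As a quick sanity check of the padding step, here is the same logic on a hypothetical 100×50 image, with NumPy's np.pad standing in for cv2.copyMakeBorder (the image and sizes are illustrative, not from the dataset):

```python
import numpy as np

# Hypothetical 100x50 image: taller than wide, so the width gets padded
img = np.zeros((100, 50, 3), dtype=np.uint8)
h, w = img.shape[:2]
longest_edge = max(h, w)
dw = longest_edge - w
left = dw // 2
right = dw - left
# Same effect as cv2.copyMakeBorder(..., cv2.BORDER_CONSTANT, value=BLACK)
padded = np.pad(img, ((0, 0), (left, right), (0, 0)), constant_values=0)
print(padded.shape)  # (100, 100, 3): square, ready for cv2.resize
```

The width grows by 25 pixels on each side, after which the square image can be scaled to 64×64 without stretching.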
III. Model Construction and Training
1. Model structure
Note: the reference diagram illustrates the AlexNet architecture well. Although AlexNet is by now a relatively simple, foundational convolutional model, its parameter count is still large; used for classification, the millions of parameters in its fully connected layers are a burden on training, so we use Dropout to drop half of the nodes. Also note that the input size in this experiment differs from the one shown in the architecture diagram; see the model-construction and training code below for the details.
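The Dropout(0.5) layers used below zero out roughly half the units of each fully connected layer during training. A minimal NumPy sketch of the idea (inverted dropout; the variable names are illustrative, not Keras internals):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.ones(4096)                         # activations of one 4096-unit FC layer
keep_prob = 0.5
mask = rng.random(4096) < keep_prob       # keep each unit with probability 0.5
out = np.where(mask, x / keep_prob, 0.0)  # rescale survivors so the expected sum is unchanged
print(mask.mean())                        # close to 0.5
```

At inference time no units are dropped; the rescaling during training keeps the expected activations consistent between the two modes.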
2. Model parameter table
(alexnet)
Layer (type) Output Shape Param #
conv2d_31 (Conv2D) (None, 55, 55, 96) 28896
_________________________________________________________________
activation_47 (Activation) (None, 55, 55, 96) 0
_________________________________________________________________
max_pooling2d_19 (MaxPooling (None, 27, 27, 96) 0
_________________________________________________________________
conv2d_32 (Conv2D) (None, 27, 27, 256) 614656
_________________________________________________________________
activation_48 (Activation) (None, 27, 27, 256) 0
_________________________________________________________________
max_pooling2d_20(MaxPooling) (None, 13, 13, 256) 0
_________________________________________________________________
conv2d_33 (Conv2D) (None, 13, 13, 384) 885120
_________________________________________________________________
activation_49 (Activation) (None, 13, 13, 384) 0
_________________________________________________________________
conv2d_34 (Conv2D) (None, 13, 13, 384) 1327488
_________________________________________________________________
activation_50 (Activation) (None, 13, 13, 384) 0
_________________________________________________________________
conv2d_35 (Conv2D) (None, 13, 13, 256) 884992
_________________________________________________________________
activation_51 (Activation) (None, 13, 13, 256) 0
_________________________________________________________________
max_pooling2d_21(MaxPooling) (None, 6, 6, 256) 0
_________________________________________________________________
flatten_5 (Flatten) (None, 9216) 0
_________________________________________________________________
dense_17 (Dense) (None, 4096) 37752832
_________________________________________________________________
activation_52 (Activation) (None, 4096) 0
_________________________________________________________________
dropout_15 (Dropout) (None, 4096) 0
_________________________________________________________________
dense_18 (Dense) (None, 4096) 16781312
_________________________________________________________________
activation_53 (Activation) (None, 4096) 0
_________________________________________________________________
dropout_16 (Dropout) (None, 4096) 0
_________________________________________________________________
dense_19 (Dense) (None, 2) 8194
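The parameter counts in the table can be checked by hand: a Conv2D layer has kernel_h * kernel_w * in_channels * filters weights plus filters biases, and a Dense layer has in * out weights plus out biases. A few spot checks against the summary above:

```python
conv1 = 10 * 10 * 3 * 96 + 96         # 10x10 kernels over 3 input channels
conv2 = 5 * 5 * 96 * 256 + 256
conv5 = 3 * 3 * 384 * 256 + 256
fc1 = 9216 * 4096 + 4096              # Flatten gives 6*6*256 = 9216 inputs
fc3 = 4096 * 2 + 2                    # two output classes
print(conv1, conv2, conv5, fc1, fc3)  # 28896 614656 884992 37752832 8194
```

The two 4096-unit dense layers alone account for over 54 million of the model's parameters, which is why Dropout is applied there.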
3. Model construction and training code
import random
from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Dropout
from keras.layers import Activation
from keras.layers import Flatten
from keras.layers import Dense
from keras.optimizers import SGD
from keras.utils import np_utils
from keras.models import load_model
from keras import backend
from load_data import load_dataset, resize_image, IMAGE_SIZE  # load_data is the preprocessing module above

class Dataset:
    def __init__(self, path_name):
        # Training set
        self.train_images = None
        self.train_labels = None
        # Validation set
        self.valid_images = None
        self.valid_labels = None
        # Test set
        self.test_images = None
        self.test_labels = None
        # Path the dataset is loaded from
        self.path_name = path_name
        # Dimension ordering used by the current backend
        self.input_shape = None

    # Load the dataset, split it for cross-validation, then preprocess it
    def load(self, img_rows=IMAGE_SIZE, img_cols=IMAGE_SIZE, img_channels=3, nb_classes=2):
        images, labels = load_dataset(self.path_name)
        train_images, valid_images, train_labels, valid_labels = train_test_split(images,
                                                                                  labels,
                                                                                  test_size=0.2,
                                                                                  random_state=random.randint(0, 100))
        _, test_images, _, test_labels = train_test_split(images,
                                                          labels,
                                                          test_size=0.3,
                                                          random_state=random.randint(0, 100))
        # If the current dimension ordering is 'th', image data is ordered as
        # channels, rows, cols; otherwise as rows, cols, channels.
        # Reshape the data to the ordering the keras backend expects
        if backend.image_dim_ordering() == 'th':
            train_images = train_images.reshape(train_images.shape[0], img_channels, img_rows, img_cols)
            valid_images = valid_images.reshape(valid_images.shape[0], img_channels, img_rows, img_cols)
            test_images = test_images.reshape(test_images.shape[0], img_channels, img_rows, img_cols)
            self.input_shape = (img_channels, img_rows, img_cols)
        else:
            train_images = train_images.reshape(train_images.shape[0], img_rows, img_cols, img_channels)
            valid_images = valid_images.reshape(valid_images.shape[0], img_rows, img_cols, img_channels)
            test_images = test_images.reshape(test_images.shape[0], img_rows, img_cols, img_channels)
            self.input_shape = (img_rows, img_cols, img_channels)
        # Report the sizes of the training, validation and test sets
        print(train_images.shape[0], 'train samples')
        print(valid_images.shape[0], 'valid samples')
        print(test_images.shape[0], 'test samples')
        # categorical_crossentropy is used as the loss, so the class labels must
        # be one-hot encoded according to nb_classes; with 2 classes the encoded
        # labels are 2-dimensional
        train_labels = np_utils.to_categorical(train_labels, nb_classes)
        valid_labels = np_utils.to_categorical(valid_labels, nb_classes)
        test_labels = np_utils.to_categorical(test_labels, nb_classes)
        # Convert pixel data to float so it can be normalized
        train_images = train_images.astype('float32')
        valid_images = valid_images.astype('float32')
        test_images = test_images.astype('float32')
        # Normalize to [0, 1]
        train_images /= 255
        valid_images /= 255
        test_images /= 255
        self.train_images = train_images
        self.valid_images = valid_images
        self.test_images = test_images
        self.train_labels = train_labels
        self.valid_labels = valid_labels
        self.test_labels = test_labels
"""CNN construction"""

class CNNModel:
    def __init__(self):
        self.model = None

    # Build the model
    def build_model(self, dataset, nb_classes=2):
        # Build an empty sequential model (a linear stack of layers)
        self.model = Sequential()
        self.model.add(Convolution2D(96, 10, 10, input_shape=dataset.input_shape))
        self.model.add(Activation('relu'))
        self.model.add(MaxPooling2D(pool_size=(3, 3), strides=2))
        self.model.add(Convolution2D(256, 5, 5, border_mode='same'))
        self.model.add(Activation('relu'))
        self.model.add(MaxPooling2D(pool_size=(3, 3), strides=2))
        self.model.add(Convolution2D(384, 3, 3, border_mode='same'))
        self.model.add(Activation('relu'))
        self.model.add(Convolution2D(384, 3, 3, border_mode='same'))
        self.model.add(Activation('relu'))
        self.model.add(Convolution2D(256, 3, 3, border_mode='same'))
        self.model.add(Activation('relu'))
        self.model.add(MaxPooling2D(pool_size=(3, 3), strides=2))
        self.model.add(Flatten())
        self.model.add(Dense(4096))
        self.model.add(Activation('relu'))
        self.model.add(Dropout(0.5))
        self.model.add(Dense(4096))
        self.model.add(Activation('relu'))
        self.model.add(Dropout(0.5))
        self.model.add(Dense(nb_classes))
        self.model.add(Activation('softmax'))
        # Print a summary of the model
        self.model.summary()
    def train(self, dataset, batch_size=10, nb_epoch=5, data_augmentation=True):
        sgd = SGD(lr=0.01, decay=1e-6, momentum=0.7, nesterov=True)  # SGD + momentum optimizer
        self.model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])  # configure the model
        # Without data augmentation
        if not data_augmentation:
            self.model.fit(dataset.train_images,
                           dataset.train_labels,
                           batch_size=batch_size,
                           nb_epoch=nb_epoch,
                           validation_data=(dataset.valid_images, dataset.valid_labels),
                           shuffle=True)
        # With real-time data augmentation
        else:
            # Define a generator for data augmentation; it returns a generator
            # object datagen that yields one batch of data per call, saving
            # memory -- it is essentially an ordinary Python generator
            datagen = ImageDataGenerator(featurewise_center=False,             # center the input data (zero mean)?
                                         samplewise_center=False,              # zero the mean of each sample?
                                         featurewise_std_normalization=False,  # divide inputs by the dataset std?
                                         samplewise_std_normalization=False,   # divide each sample by its own std?
                                         zca_whitening=False,                  # apply ZCA whitening?
                                         rotation_range=20,                    # random rotation angle (range 0~180)
                                         width_shift_range=0.2,                # horizontal shift (fraction of width, 0~1)
                                         height_shift_range=0.2,               # vertical shift
                                         horizontal_flip=True,                 # random horizontal flips?
                                         vertical_flip=False                   # random vertical flips?
                                         )
            datagen.fit(dataset.train_images)
            self.model.fit_generator(datagen.flow(dataset.train_images,
                                                  dataset.train_labels,
                                                  batch_size=batch_size),
                                     samples_per_epoch=dataset.train_images.shape[0],
                                     nb_epoch=nb_epoch,
                                     validation_data=(dataset.valid_images, dataset.valid_labels)
                                     )
    MODEL_PATH = './cascadeface.model.h5'

    def save_model(self, file_path=MODEL_PATH):
        self.model.save(file_path)

    def load_model(self, file_path=MODEL_PATH):
        self.model = load_model(file_path)

    def evaluate(self, dataset):
        score = self.model.evaluate(dataset.test_images, dataset.test_labels, verbose=1)
        print('%s: %.2f%%' % (self.model.metrics_names[1], score[1] * 100))

    # Recognize a face
    def face_predict(self, image):
        # Order the dimensions according to the backend
        if backend.image_dim_ordering() == 'th' and image.shape != (1, 3, IMAGE_SIZE, IMAGE_SIZE):
            image = resize_image(image)  # the size must match the training set: IMAGE_SIZE * IMAGE_SIZE
            image = image.reshape((1, 3, IMAGE_SIZE, IMAGE_SIZE))  # unlike training, we predict a single image here
        elif backend.image_dim_ordering() == 'tf' and image.shape != (1, IMAGE_SIZE, IMAGE_SIZE, 3):
            image = resize_image(image)
            image = image.reshape((1, IMAGE_SIZE, IMAGE_SIZE, 3))
        # Normalize
        image = image.astype('float32')
        image /= 255
        # Print the probability of the input belonging to each class
        result = self.model.predict_proba(image)
        print('result:', result)
        result = self.model.predict_classes(image)
        # Return the predicted class
        return result[0]
if __name__ == '__main__':
    dataset = Dataset('./face/')
    dataset.load()
    # Build and train the model, then save it
    model = CNNModel()
    model.build_model(dataset)
    model.train(dataset)
    model.save_model(file_path='./model/cascadeface.model.h5')
    # Evaluate the trained model on the test set
    model.evaluate(dataset)
Note: see the code comments for the hyperparameters, and pay attention to the data-augmentation methods used in the code.
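For reference, the one-hot encoding that np_utils.to_categorical performs on the 0/1 labels is equivalent to indexing an identity matrix; a NumPy sketch with made-up labels:

```python
import numpy as np

labels = np.array([0, 1, 1, 0])   # illustrative binary labels
nb_classes = 2
one_hot = np.eye(nb_classes)[labels]
print(one_hot)
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]
```

Each row has a single 1 in the column of its class, which is the form categorical_crossentropy expects.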
4. Training results
Note: because the dataset is small, the unsatisfying final accuracy is expected; if a better dataset becomes available, the model can be retrained.
The saved model file:
IV. Model Testing
1. Haar face detection: now let's look at how to use the Haar cascade classifier. The OpenCV source distribution ships .xml files for face detection; these files encapsulate face features already extracted via the Haar cascade, and live under opencv/sources/data/haarcascades. The location of my files is shown below.
Since we are detecting faces in static images here, copy haarcascade_frontalface_default.xml from that folder into the project folder; we will use it to detect faces below. See the test code for the detailed usage.
2. Face detection and recognition test code
import cv2
from 人臉檢測與識別 import CNNModel  # import the model class from the training code above

if __name__ == '__main__':
    # Load the trained model
    model = CNNModel()
    model.load_model(file_path='./cascadeface.model.h5')
    # Color of the face bounding box
    color = (0, 255, 0)
    # Path to the face detection cascade
    cascade_path = './haarcascade_frontalface_default.xml'
    image = cv2.imread('jiangwen.jpg')
    image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Load the cascade classifier
    cascade = cv2.CascadeClassifier(cascade_path)
    # Use the classifier to find the face regions
    faceRects = cascade.detectMultiScale(image_gray, scaleFactor=1.2, minNeighbors=5, minSize=(32, 32))
    if len(faceRects) > 0:
        for faceRect in faceRects:
            x, y, w, h = faceRect
            # Crop the face region (clamped to the image) and hand it to the recognition model
            img = image[max(y - 10, 0): y + h + 10, max(x - 10, 0): x + w + 10]
            faceID = model.face_predict(img)
            if faceID == 0:
                cv2.rectangle(image, (x - 10, y - 10), (x + w + 10, y + h + 10), color, thickness=2)
                cv2.putText(image,
                            'jiangwen',
                            (x + 30, y + 30),          # position
                            cv2.FONT_HERSHEY_SIMPLEX,  # font
                            1,                         # font scale
                            (0, 0, 255),               # color
                            1)                         # line width
            elif faceID == 1:
                cv2.rectangle(image, (x - 10, y - 10), (x + w + 10, y + h + 10), color, thickness=2)
                cv2.putText(image,
                            'zhangziyi',
                            (x + 30, y + 30),
                            cv2.FONT_HERSHEY_SIMPLEX,
                            1,
                            (0, 0, 255),
                            1)
    cv2.imshow('image', image)
    cv2.waitKey(0)
The output looks like this: