MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices
The paper briefly analyzes the weaknesses of common mobile networks when applied to face verification. These weaknesses are overcome by the authors' specially designed MobileFaceNets, a class of extremely efficient CNN models tailored for high-accuracy, real-time face verification on mobile and embedded devices.
The paper's main contributions are:
After the last (non-global) convolutional layer of the face feature embedding CNN, a global depthwise convolution layer is used, rather than a global average pooling layer or a fully connected layer, to output a discriminative feature vector. The advantages of this choice are analyzed both theoretically and experimentally.
We design a class of face feature embedding CNNs, named MobileFaceNets, that are extremely efficient on mobile and embedded devices.
Experiments on LFW, AgeDB and MegaFace show that, for face verification, MobileFaceNets achieve significantly improved efficiency compared with previous state-of-the-art mobile CNNs.
The Weakness of Common Mobile Networks for Face Verification
Current state-of-the-art mobile networks for general visual recognition tasks, such as MobileNetV1, ShuffleNet and MobileNetV2, all employ a global average pooling layer.
For face verification and recognition, several studies have observed that CNNs without a global average pooling layer perform somewhat better than those with one, but none of them gives a theoretical explanation for this phenomenon.
Here the phenomenon is briefly explained in terms of receptive field theory:
A typical deep face verification pipeline consists of preprocessing face images, extracting face features with a trained deep network, and finally comparing two faces by the similarity or distance of their features. In the preprocessing step, MTCNN is used to detect the face and 5 facial landmarks in each image. The face is then aligned by a simple transformation based on these 5 landmarks. The aligned face crops have size 112×112, and each pixel of the RGB image is normalized by subtracting 127.5 and dividing by 128. Finally, the face feature embedding CNN maps each aligned face to a feature vector, as shown in Figure 1:
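The per-pixel normalization step can be sketched as follows (a minimal illustration, not the paper's code; `img` here is just a random stand-in for an aligned RGB face crop):

```python
import numpy as np

# Stand-in for a 112x112 aligned RGB face crop with uint8 pixel values.
img = np.random.randint(0, 256, (112, 112, 3), dtype=np.uint8)

# Normalization used in preprocessing: (x - 127.5) / 128,
# which maps pixel values into roughly [-1, 1).
x = (img.astype(np.float32) - 127.5) / 128.0
```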

In the following discussion, we use MobileNetV2 as the face feature embedding CNN, without loss of generality. To keep the output feature map size the same as in the original MobileNetV2 with 224×224 input (our input is 112×112), we use stride = 1 instead of stride = 2 in the first convolutional layer; setting stride = 2 there leads to poor accuracy.
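The spatial arithmetic behind this choice can be checked directly (a quick sanity check, not from the paper):

```python
# Original MobileNetV2 reaches its final feature map after five stride-2
# downsamplings: 224 / 2^5 = 7.
orig = 224 // 2 ** 5

# With a 112x112 face crop and stride 1 in the first conv, one halving is
# removed, so four stride-2 stages remain: 112 / 2^4 = 7 as well.
ours = 112 // 2 ** 4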
For convenience, the feature map output by the last convolutional layer before the global average pooling layer is referred to as FMap-end; its spatial resolution is 7×7. Although the center units and the corner units of FMap-end theoretically have receptive fields of the same size on the input image, they correspond to different positions of the input image.
The receptive field of a corner unit of FMap-end is centered at a corner of the input image, while the receptive field of a center unit is centered at the image center, as shown in Figure 1. According to [24], pixels near the center of a receptive field have a larger impact on the output, and the importance distribution over a receptive field is roughly Gaussian.
The effective receptive fields of FMap-end's corner units are therefore much smaller than those of its center units. When the input image is an aligned face, a corner unit of FMap-end carries less information than a center unit. Consequently, when extracting a face feature vector, different units of FMap-end have different importance.
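This border effect can be observed directly with a toy network (an illustrative sketch, not the paper's experiment): backpropagating from a single output unit shows how much of its theoretical receptive field actually lies inside the image.

```python
import torch
import torch.nn as nn

# Toy stack of four 3x3 stride-1 convolutions: each output unit has a
# 9x9 theoretical receptive field on the input.
torch.manual_seed(0)
net = nn.Sequential(*[nn.Conv2d(1, 1, 3, padding=1) for _ in range(4)])

x = torch.ones(1, 1, 16, 16, requires_grad=True)
y = net(x)

def rf_support(i, j):
    # Count the input pixels with nonzero gradient w.r.t. output unit
    # (i, j), i.e. the part of its theoretical receptive field that
    # falls inside the image.
    x.grad = None
    y[0, 0, i, j].backward(retain_graph=True)
    return int((x.grad != 0).sum())

center = rf_support(8, 8)   # the full 9x9 window fits inside the image
corner = rf_support(0, 0)   # the window is clipped at the border
```

The corner unit sees far fewer input pixels than the center unit, in line with the analysis above (the importance-weighting argument of [24] makes the gap even larger).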
In MobileNetV2, the flattened FMap-end is unsuitable for direct use as a face feature vector because its dimensionality is too high (62720). A natural choice is to use the output of the global average pooling layer (GAPool) as the face feature vector, but several studies [14, 5] show that this yields lower verification accuracy (as shown in Table 2). The reason is that global average pooling assigns the same importance to every unit of FMap-end, which is unreasonable according to the analysis above. Another option is a fully connected layer that projects FMap-end to a compact face feature vector, but this adds a large number of parameters to the model: even when the face feature vector is only 128-dimensional, a fully connected layer after FMap-end adds an extra 8 million parameters to MobileNetV2. This option is therefore ruled out when a smaller model size is the goal.
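The dimension and parameter counts above are easy to verify (simple arithmetic, shown for clarity):

```python
# FMap-end of MobileNetV2 is 7x7 spatially with 1280 channels,
# so flattening it gives a 62720-dimensional vector.
flattened = 7 * 7 * 1280

# A (bias-free) fully connected layer projecting it to a 128-d
# embedding needs 62720 * 128 = 8,028,160 weights, i.e. ~8 million
# extra parameters.
fc_params = flattened * 128
```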
Global Depthwise Convolution
The global average pooling layer is replaced with a global depthwise convolution (GDConv) layer: a depthwise convolution layer whose kernel size equals the input size, with padding = 0 and stride = 1. This guarantees an output of spatial size H×W = 1×1; it corresponds to the conv_6_dw layer in the implementation below.
The output is computed as:
$G_m = \sum_{i,j} K_{i,j,m} \cdot F_{i,j,m}$
Here F is the input feature map of size W×H×M, and K is the depthwise convolution kernel of size W×H×M (one kernel per channel, since groups = M). The output G has size 1×1×M: each channel m contains only a single element G_m (because the output spatial size H×W is 1×1). (i, j) denotes a spatial position over the W×H extent of F and K, and m denotes the channel index.
The computational cost is therefore:
$W \cdot H \cdot M$
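A minimal GDConv sketch in PyTorch (assuming the 7×7×512 FMap size used later in MobileFaceNet; this mirrors the conv_6_dw layer without its batch norm):

```python
import torch
import torch.nn as nn

# GDConv: a depthwise conv whose kernel covers the entire 7x7 input
# feature map (padding 0, stride 1, groups = M), so the output is 1x1xM.
M = 512
gdconv = nn.Conv2d(M, M, kernel_size=7, stride=1, padding=0, groups=M, bias=False)

F = torch.randn(1, M, 7, 7)
G = gdconv(F)                      # shape (1, 512, 1, 1)

# Check against the formula G_m = sum_{i,j} K_{i,j,m} * F_{i,j,m}.
K = gdconv.weight                  # shape (512, 1, 7, 7): one kernel per channel
G_manual = (K[:, 0] * F[0]).sum(dim=(1, 2))

# Multiply-add cost is just W * H * M = 7 * 7 * 512 = 25088, negligible
# compared with the rest of the network.
madds = 7 * 7 * M
```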
The model design is:

Implementation, following https://github.com/TreB1eN/InsightFace_Pytorch:
```python
# coding:utf-8
import torch
from torch.nn import (Linear, Conv2d, BatchNorm1d, BatchNorm2d, PReLU,
                      Sequential, Module)


def l2_norm(input, axis=1):
    # L2-normalize the embedding (taken from the reference repository;
    # the original snippet used it without defining it)
    norm = torch.norm(input, 2, axis, True)
    return torch.div(input, norm)


class Flatten(Module):
    def forward(self, input):
        return input.view(input.size(0), -1)


class Conv_block(Module):
    def __init__(self, in_c, out_c, kernel=(1, 1), stride=(1, 1), padding=(0, 0), groups=1):
        super(Conv_block, self).__init__()
        self.conv = Conv2d(in_c, out_channels=out_c, kernel_size=kernel, groups=groups,
                           stride=stride, padding=padding, bias=False)
        self.bn = BatchNorm2d(out_c)
        self.prelu = PReLU(out_c)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.prelu(x)
        return x


class Linear_block(Module):
    def __init__(self, in_c, out_c, kernel=(1, 1), stride=(1, 1), padding=(0, 0), groups=1):
        super(Linear_block, self).__init__()
        self.conv = Conv2d(in_c, out_channels=out_c, kernel_size=kernel, groups=groups,
                           stride=stride, padding=padding, bias=False)
        self.bn = BatchNorm2d(out_c)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return x


class Depth_Wise(Module):
    def __init__(self, in_c, out_c, residual=False, kernel=(3, 3), stride=(2, 2), padding=(1, 1), groups=1):
        super(Depth_Wise, self).__init__()
        # pointwise expansion -> depthwise conv -> pointwise (linear) projection
        self.conv = Conv_block(in_c, out_c=groups, kernel=(1, 1), padding=(0, 0), stride=(1, 1))
        self.conv_dw = Conv_block(groups, groups, groups=groups, kernel=kernel, padding=padding, stride=stride)
        self.project = Linear_block(groups, out_c, kernel=(1, 1), padding=(0, 0), stride=(1, 1))
        self.residual = residual

    def forward(self, x):
        if self.residual:
            short_cut = x
        x = self.conv(x)
        x = self.conv_dw(x)
        x = self.project(x)
        if self.residual:
            output = short_cut + x
        else:
            output = x
        return output


class Residual(Module):
    def __init__(self, c, num_block, groups, kernel=(3, 3), stride=(1, 1), padding=(1, 1)):
        super(Residual, self).__init__()
        modules = []
        for _ in range(num_block):
            modules.append(Depth_Wise(c, c, residual=True, kernel=kernel, padding=padding,
                                      stride=stride, groups=groups))
        self.model = Sequential(*modules)

    def forward(self, x):
        return self.model(x)


class MobileFaceNet(Module):
    def __init__(self, embedding_size):
        super(MobileFaceNet, self).__init__()
        self.conv1 = Conv_block(3, 64, kernel=(3, 3), stride=(2, 2), padding=(1, 1))
        self.conv2_dw = Conv_block(64, 64, kernel=(3, 3), stride=(1, 1), padding=(1, 1), groups=64)
        self.conv_23 = Depth_Wise(64, 64, kernel=(3, 3), stride=(2, 2), padding=(1, 1), groups=128)
        self.conv_3 = Residual(64, num_block=4, groups=128, kernel=(3, 3), stride=(1, 1), padding=(1, 1))
        self.conv_34 = Depth_Wise(64, 128, kernel=(3, 3), stride=(2, 2), padding=(1, 1), groups=256)
        self.conv_4 = Residual(128, num_block=6, groups=256, kernel=(3, 3), stride=(1, 1), padding=(1, 1))
        self.conv_45 = Depth_Wise(128, 128, kernel=(3, 3), stride=(2, 2), padding=(1, 1), groups=512)
        self.conv_5 = Residual(128, num_block=2, groups=256, kernel=(3, 3), stride=(1, 1), padding=(1, 1))
        self.conv_6_sep = Conv_block(128, 512, kernel=(1, 1), stride=(1, 1), padding=(0, 0))
        # GDConv: 7x7 depthwise kernel over the whole 7x7 feature map
        self.conv_6_dw = Linear_block(512, 512, groups=512, kernel=(7, 7), stride=(1, 1), padding=(0, 0))
        self.conv_6_flatten = Flatten()
        self.linear = Linear(512, embedding_size, bias=False)
        self.bn = BatchNorm1d(embedding_size)

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2_dw(out)
        out = self.conv_23(out)
        out = self.conv_3(out)
        out = self.conv_34(out)
        out = self.conv_4(out)
        out = self.conv_45(out)
        out = self.conv_5(out)
        out = self.conv_6_sep(out)
        out = self.conv_6_dw(out)
        out = self.conv_6_flatten(out)
        out = self.linear(out)
        out = self.bn(out)
        return l2_norm(out)
```
Inspecting the model:
```python
if __name__ == '__main__':
    model = MobileFaceNet(512)
    # for model in model.modules():
    #     print(model)
    for child in model.children():
        print(child)
```
This prints:
```
Conv_block(
  (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (prelu): PReLU(num_parameters=64)
)
Conv_block(
  (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=64, bias=False)
  (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (prelu): PReLU(num_parameters=64)
)
Depth_Wise(
  (conv): Conv_block(
    (conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (prelu): PReLU(num_parameters=128)
  )
  (conv_dw): Conv_block(
    (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=128, bias=False)
    (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (prelu): PReLU(num_parameters=128)
  )
  (project): Linear_block(
    (conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
Residual(
  (model): Sequential(
    (0): Depth_Wise(
      (conv): Conv_block(
        (conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=128)
      )
      (conv_dw): Conv_block(
        (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=128)
      )
      (project): Linear_block(
        (conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Depth_Wise(
      (conv): Conv_block(
        (conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=128)
      )
      (conv_dw): Conv_block(
        (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=128)
      )
      (project): Linear_block(
        (conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (2): Depth_Wise(
      (conv): Conv_block(
        (conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=128)
      )
      (conv_dw): Conv_block(
        (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=128)
      )
      (project): Linear_block(
        (conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (3): Depth_Wise(
      (conv): Conv_block(
        (conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=128)
      )
      (conv_dw): Conv_block(
        (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=128, bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=128)
      )
      (project): Linear_block(
        (conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
)
Depth_Wise(
  (conv): Conv_block(
    (conv): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (prelu): PReLU(num_parameters=256)
  )
  (conv_dw): Conv_block(
    (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=256, bias=False)
    (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (prelu): PReLU(num_parameters=256)
  )
  (project): Linear_block(
    (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
Residual(
  (model): Sequential(
    (0): Depth_Wise(
      (conv): Conv_block(
        (conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (conv_dw): Conv_block(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (project): Linear_block(
        (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Depth_Wise(
      (conv): Conv_block(
        (conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (conv_dw): Conv_block(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (project): Linear_block(
        (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (2): Depth_Wise(
      (conv): Conv_block(
        (conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (conv_dw): Conv_block(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (project): Linear_block(
        (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (3): Depth_Wise(
      (conv): Conv_block(
        (conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (conv_dw): Conv_block(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (project): Linear_block(
        (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (4): Depth_Wise(
      (conv): Conv_block(
        (conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (conv_dw): Conv_block(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (project): Linear_block(
        (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (5): Depth_Wise(
      (conv): Conv_block(
        (conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (conv_dw): Conv_block(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (project): Linear_block(
        (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
)
Depth_Wise(
  (conv): Conv_block(
    (conv): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (prelu): PReLU(num_parameters=512)
  )
  (conv_dw): Conv_block(
    (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=512, bias=False)
    (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (prelu): PReLU(num_parameters=512)
  )
  (project): Linear_block(
    (conv): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
Residual(
  (model): Sequential(
    (0): Depth_Wise(
      (conv): Conv_block(
        (conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (conv_dw): Conv_block(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (project): Linear_block(
        (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Depth_Wise(
      (conv): Conv_block(
        (conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (conv_dw): Conv_block(
        (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256, bias=False)
        (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (prelu): PReLU(num_parameters=256)
      )
      (project): Linear_block(
        (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
)
Conv_block(
  (conv): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (prelu): PReLU(num_parameters=512)
)
Linear_block(
  (conv): Conv2d(512, 512, kernel_size=(7, 7), stride=(1, 1), groups=512, bias=False)
  (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
Flatten()
Linear(in_features=512, out_features=512, bias=False)
BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
```
Model:

The Residual layers in the model are the inverted residual blocks (the bottleneck structure introduced by MobileNetV2).
