[Paper Notes] CVPR2017_Joint Detection and Identification Feature Learning for Person Search

Reposted from the original post. 2019-07-03 13:03. Category: paper reading / person search

Title: Joint Detection and Identification Feature Learning for Person Search

The first version of this paper on arXiv was titled End-to-End Deep Learning for Person Search.

Authors: Tong Xiao^1*; Shuang Li^1*; Bochao Wang^2; Liang Lin^2; Xiaogang Wang^1

Affiliations: 1. The Chinese University of Hong Kong; 2. Sun Yat-Sen University

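On my first read I looked at the first version, only glanced at the architecture figure, and concluded that it was a minor modification of Faster R-CNN; that version had no OIM loss, so the novelty seemed limited. I later noticed that several follow-up papers use the OIM loss, went back to read the paper carefully, and found that it contains many interesting ideas. My mistake!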

Motivation

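Person re-id is usually posed as retrieval over already cropped pedestrian patches: decide whether the query and a gallery image show the same identity. This setup has several problems:

  ① in practice, retrieval has to work directly on raw scene images, not on images cropped after detection;

  ② many datasets use manually annotated boxes, whereas in reality the detector's accuracy and any missed detections affect the re-identification result.

The authors therefore propose end-to-end person search, which merges detection and re-id into one problem.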

Model

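The input to the network is the whole image.
pedestrian proposal net: the input passes through the first part of ResNet-50 (conv1 to conv4_3), which outputs 1024-channel feature maps at 1/16 of the input resolution. As in an RPN, the feature maps first go through a $512\times3\times3$ convolution; the 9 anchors at each location of the resulting features are then fed to a softmax classifier (person/non-person) and a linear layer (bbox regression). After NMS, 128 boxes are kept as the final proposals.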
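identification net: each proposal goes through RoI pooling to produce a $1024\times14\times14$ feature, which is passed to the second part of ResNet-50 (conv4_4 to conv5_3) and then through global average pooling (GAP) to give a 2048-d feature vector (conv5 of ResNet-50 has 2048 output channels). This vector feeds three branches: ① a softmax binary classifier; ② a linear regression that refines the box location; ③ a projection, implemented as an FC layer, into a 256-d L2-normalized subspace, which yields the 256-d id-feat. At inference the id-feat is used to compute cosine similarity; during training it is used to compute the OIM loss.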
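The id-feat branch can be illustrated with a minimal pure-Python sketch: a pooled feature vector is projected by a linear layer, L2-normalized, and compared by a dot product. Everything here is an assumption made for illustration: the weights are random toy values, the sizes are shrunk (8 to 4) to stand in for the real feature and id-feat dimensions, and the names `project`, `id_feat` and `cosine_similarity` are hypothetical, not taken from the authors' code.

```python
import math
import random

def l2_normalize(v):
    """Scale a vector to unit L2 norm (returns a new list)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def project(weights, feat):
    """Apply a bias-free linear layer: one output per weight row."""
    return [sum(w * f for w, f in zip(row, feat)) for row in weights]

def id_feat(weights, pooled):
    """Pooled feature -> L2-normalized id-feat."""
    return l2_normalize(project(weights, pooled))

def cosine_similarity(a, b):
    """For unit-length vectors the dot product is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

random.seed(0)
IN_DIM, OUT_DIM = 8, 4  # toy sizes, assumed for illustration only
W = [[random.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(OUT_DIM)]

query = id_feat(W, [random.gauss(0, 1) for _ in range(IN_DIM)])
gallery = [id_feat(W, [random.gauss(0, 1) for _ in range(IN_DIM)])
           for _ in range(3)]
scores = [cosine_similarity(query, g) for g in gallery]
best = max(range(len(scores)), key=scores.__getitem__)  # best-matching proposal
```

Because every id-feat has unit length, ranking proposals by dot product is the same as ranking them by cosine similarity.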

Online Instance Matching Loss (OIM Loss)

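Note that the OIM loss is computed on the 256-d id-feat of all the final proposals.

The training set contains $L$ labeled identities, which are assigned class-ids from 1 to $L$; it also contains many unlabeled identities, as well as many background regions and false alarms. OIM only considers the first two kinds.

Approach: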

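For labeled identities: denote the feature of a labeled identity inside a mini-batch by $x\in\mathbb{R}^D$, where $D$ is the feature dimension. A lookup table (LUT) $V\in\mathbb{R}^{D \times L}$ is kept outside the network parameters and stores the id-feat of all labeled identities.

In the forward pass, $V^Tx$ gives the cosine similarities between the mini-batch sample and all labeled identities.
In the backward pass, if the target class-id is $t$, the $t$-th column of the LUT is updated by $v_t \leftarrow \gamma v_t+(1-\gamma)x$, where $\gamma\in[0,1]$. I am not sure why the update takes this form; it reads like an exponential moving average of the features observed for identity $t$.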
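The forward lookup and the update rule can be sketched in a few lines of pure Python. This is only a toy under stated assumptions: the LUT is stored as a list of columns, indices are 0-based (the text uses 1 to $L$), `gamma=0.5` is an arbitrary illustrative value, and the re-normalization after the update is my assumption, added so that the dot product stays a cosine similarity. The function names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm (returns a new list)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def lut_scores(lut, x):
    """Forward pass: V^T x, one similarity per labeled identity.
    `lut` holds the L columns of V as unit-length vectors."""
    return [sum(a * b for a, b in zip(v, x)) for v in lut]

def lut_update(lut, x, t, gamma=0.5):
    """Backward pass: v_t <- gamma * v_t + (1 - gamma) * x.
    The final re-normalization is an assumption of this sketch."""
    blended = [gamma * a + (1.0 - gamma) * b for a, b in zip(lut[t], x)]
    lut[t] = l2_normalize(blended)

lut = [[1.0, 0.0], [0.0, 1.0]]     # L = 2 identities, D = 2
x = l2_normalize([1.0, 1.0])       # a sample labeled as identity t = 0
before = lut_scores(lut, x)[0]
lut_update(lut, x, t=0, gamma=0.5)
after = lut_scores(lut, x)[0]      # v_0 has moved toward x
```

With a larger `gamma` the stored column changes more slowly, so each entry behaves like a smoothed summary of the features seen for that identity rather than a copy of the latest one.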

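For unlabeled identities: because their number is not fixed, the authors store them in a circular queue $U\in\mathbb{R}^{D \times Q}$, where $Q$ is the queue size. $U^Tx$ likewise gives the cosine similarities between the mini-batch sample and the unlabeled identities in the queue. At each iteration the new feature vectors are pushed and the oldest ones are popped, so the queue size stays constant.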
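A fixed-size circular queue of this kind could look like the following pure-Python sketch. The class name `CircularQueue`, the zero initialization of the slots, and the toy sizes are assumptions for illustration; the notes above do not say how the queue is initialized.

```python
class CircularQueue:
    """Fixed-size buffer of unlabeled features; the oldest entry is overwritten."""

    def __init__(self, size, dim):
        # U stored as Q vectors; zero initialization is an assumption.
        self.slots = [[0.0] * dim for _ in range(size)]
        self.tail = 0  # index of the next slot to overwrite

    def push(self, feat):
        """Insert a new feature, dropping the oldest one."""
        self.slots[self.tail] = list(feat)
        self.tail = (self.tail + 1) % len(self.slots)

    def scores(self, x):
        """U^T x: similarity of x to every queued feature."""
        return [sum(a * b for a, b in zip(u, x)) for u in self.slots]

queue = CircularQueue(size=3, dim=2)
for feat in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]):
    queue.push(feat)  # the 4th push overwrites the 1st feature
```

Overwriting in place keeps push and pop at constant cost and means the queue always reflects the most recent mini-batches.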
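Based on this structure, the probability that $x$ is recognized as the identity with class-id $i$ is given by a softmax, where $v_j$ and $u_k$ are the columns of $V$ and $U$ and $\tau$ is a temperature that controls how peaked the distribution is:

$$p_i = \frac{\exp(v_i^T x/\tau)}{\sum_{j=1}^{L}\exp(v_j^T x/\tau)+\sum_{k=1}^{Q}\exp(u_k^T x/\tau)}$$

Similarly, the probability that it is recognized as the $i$-th unlabeled identity in the queue is

$$q_i = \frac{\exp(u_i^T x/\tau)}{\sum_{j=1}^{L}\exp(v_j^T x/\tau)+\sum_{k=1}^{Q}\exp(u_k^T x/\tau)}$$

The OIM objective maximizes the expected log-likelihood, where $t$ is the target class-id of $x$:

$$\mathcal{L} = \mathrm{E}_x[\log p_t]$$

For a single sample, its gradient with respect to $x$ is

$$\frac{\partial \mathcal{L}}{\partial x} = \frac{1}{\tau}\Big[(1-p_t)v_t-\sum_{j=1,\,j\neq t}^{L}p_j v_j-\sum_{k=1}^{Q}q_k u_k\Big]$$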
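As a sanity check on the probabilities and the gradient, here is a small pure-Python sketch that computes a softmax over the similarities to both the LUT and the queue, divided by a temperature `tau`, and compares the analytic gradient of $\log p_t$ with finite differences. The function names, the toy vectors, and `tau=0.1` are assumptions chosen for illustration; identity indices are 0-based.

```python
import math

def oim_probs(lut, queue, x, tau=0.1):
    """Return (p, q): softmax over [V^T x ; U^T x] / tau, split back in two."""
    logits = [sum(a * b for a, b in zip(v, x)) / tau for v in lut + queue]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs[:len(lut)], probs[len(lut):]

def oim_log_likelihood(lut, queue, x, t, tau=0.1):
    """log p_t for a sample x whose target identity is t."""
    p, _ = oim_probs(lut, queue, x, tau)
    return math.log(p[t])

def oim_grad(lut, queue, x, t, tau=0.1):
    """Analytic gradient of log p_t with respect to x."""
    p, q = oim_probs(lut, queue, x, tau)
    grad = [0.0] * len(x)
    for j, v in enumerate(lut):
        coef = (1.0 - p[j]) if j == t else -p[j]
        grad = [g + coef * a for g, a in zip(grad, v)]
    for k, u in enumerate(queue):
        grad = [g - q[k] * a for g, a in zip(grad, u)]
    return [g / tau for g in grad]

lut = [[1.0, 0.0], [0.0, 1.0]]   # two labeled identities
queue = [[-1.0, 0.0]]            # one unlabeled feature
x, t = [0.6, 0.8], 0
p, q = oim_probs(lut, queue, x)
grad = oim_grad(lut, queue, x, t)

# Central finite differences as a check of the analytic gradient.
eps = 1e-6
numeric = []
for d in range(len(x)):
    plus = list(x)
    plus[d] += eps
    minus = list(x)
    minus[d] -= eps
    numeric.append((oim_log_likelihood(lut, queue, plus, t)
                    - oim_log_likelihood(lut, queue, minus, t)) / (2 * eps))
```

The gradient pulls $x$ toward the target column $v_t$ and pushes it away from every other labeled column and from every queued unlabeled feature, each weighted by its probability.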

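Why not classify directly with a softmax loss?

First, there are too many classes and too few positive samples per class, which makes training difficult.
Second, it cannot exploit the unlabeled identities, because they have no labels.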

Dataset

The authors introduce a new person search dataset, CUHK-SYSU, which contains street-view images and video frames.

Evaluation Protocols and Metrics

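Person search naturally inherits its evaluation metrics from detection and re-ID: cumulative matching characteristics (CMC top-K) and mean averaged precision (mAP). Note how these two metrics are similar to, and differ from, their counterparts in person re-id.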

CMC

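Original text: a matching is counted if there is at least one of the top-K predicted bounding boxes overlaps with the ground truths with intersection-over-union (IoU) greater or equal to 0.5.

This one is relatively easy to understand: output boxes whose IoU with the ground truth is at least 0.5 count as candidates, and then, as in re-id, a query counts as matched if its top K results contain one of them. False detections and missed detections are not penalized.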
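My reading of the CMC top-K rule can be sketched as follows. This is a simplified illustration, not the official evaluation code: boxes are `(x1, y1, x2, y2)` tuples, each query comes with a single list of predicted boxes already sorted by similarity, and the function names and toy data are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def topk_hit(ranked_boxes, gt_boxes, k, thresh=0.5):
    """True if any of the k highest-ranked boxes overlaps a ground truth
    of the query identity with IoU >= thresh."""
    return any(iou(box, gt) >= thresh
               for box in ranked_boxes[:k] for gt in gt_boxes)

def cmc_topk(queries, k):
    """Fraction of queries with a hit among their top-k results."""
    hits = sum(topk_hit(ranked, gts, k) for ranked, gts in queries)
    return hits / len(queries)

# Two toy queries: (boxes sorted by similarity, ground-truth boxes).
queries = [
    ([(0, 0, 10, 10), (50, 50, 60, 60)], [(1, 1, 11, 11)]),    # hit at rank 1
    ([(0, 0, 10, 10), (20, 20, 30, 30)], [(21, 20, 31, 30)]),  # hit at rank 2
]
top1 = cmc_topk(queries, k=1)
top2 = cmc_topk(queries, k=2)
```

In this toy data the first query is matched at rank 1 and the second only at rank 2, so `top1` is 0.5 and `top2` is 1.0.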

mAP

Original text: (mAP) is inspired from the object detection tasks. We follow the ILSVRC object detection criterion [29] to judge the correctness of predicted bounding boxes. An averaged precision (AP) is calculated for each query based on the precision-recall curve, and then we average the APs across all the queries to get the final result.

This should differ considerably from the mAP used in re-id: each query is effectively treated as one class, and a detection-style AP is computed for it.
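One common way to compute a per-query AP and average it over queries is sketched below. It uses the non-interpolated definition (mean of the precision at each correct detection, divided by the number of ground-truth instances); the exact precision-recall interpolation used by ILSVRC or by the authors' evaluation code may differ, and the function names and toy data are hypothetical.

```python
def average_precision(is_correct, num_gt):
    """Non-interpolated AP for one query.

    is_correct: booleans for the ranked detections (best score first),
                True when the box matches a ground truth of the query.
    num_gt:     number of ground-truth instances of the query in the gallery;
                instances that were never detected lower the AP.
    """
    if num_gt == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, correct in enumerate(is_correct, start=1):
        if correct:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / num_gt

def mean_average_precision(per_query):
    """Average the per-query APs."""
    aps = [average_precision(flags, n) for flags, n in per_query]
    return sum(aps) / len(aps)

per_query = [
    ([True, False, True], 2),   # AP = (1/1 + 2/3) / 2
    ([False, True], 2),         # one instance missed: AP = (1/2) / 2
]
m_ap = mean_average_precision(per_query)
```

Unlike the CMC sketch, a missed detection does cost something here: the second query never finds one of its two instances, so its AP cannot exceed 0.5.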
