Study Notes: Top-push Video-based Person Re-identification

Reposted article, dated 2017-04-02 17:42

Introduction

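Top-push Video-based Person Re-identification [1] is a CVPR 2016 paper on person re-identification. It extracts features such as HOG3D from image sequences (videos) and proposes a distance metric learning method called TDL (Top-push Distance Learning).
Like many methods from recent years (for example KISSME [2]), TDL learns a Mahalanobis distance.
The Mahalanobis distance is defined as: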

$$ D(\vec{x}_i, \vec{x}_j) = (\vec{x}_i- \vec{x}_j)^\mathrm{T} \mathbf{M} (\vec{x}_i- \vec{x}_j) \tag{1} $$

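In the formula above, $\vec{x}_i$ and $\vec{x}_j$ are two feature vectors. When $\mathbf{M}$ is the identity matrix, the formula gives the squared Euclidean distance. The paper simplifies the expression by using $\mathbf{X}_{i,j}$ to denote the outer product of the difference vector with itself: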

$$ \mathbf{X}_{i,j} = (\vec{x}_i- \vec{x}_j) (\vec{x}_i- \vec{x}_j)^\mathrm{T} \tag{2} $$

The distance can then be written as:

$$ D(\vec{x}_i, \vec{x}_j) = tr(\mathbf{M} \mathbf{X}_{i,j}) \tag{3} $$
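To make the equivalence between Eq. (1) and Eq. (3) concrete, here is a minimal NumPy sketch. The function names, the random test vectors and the matrix `M` are my own illustrative assumptions and do not come from the paper or its demo.

```python
import numpy as np

def mahalanobis_sq(x_i, x_j, M):
    """Eq. (1): the quadratic form (x_i - x_j)^T M (x_i - x_j)."""
    d = x_i - x_j
    return float(d @ M @ d)

def mahalanobis_sq_trace(x_i, x_j, M):
    """Eq. (3): the same value written as tr(M X_ij), with X_ij from Eq. (2)."""
    d = x_i - x_j
    X_ij = np.outer(d, d)  # outer product of the difference vector with itself
    return float(np.trace(M @ X_ij))

# Hypothetical data: two random 4-dimensional feature vectors.
rng = np.random.default_rng(0)
x_i, x_j = rng.normal(size=4), rng.normal(size=4)
A = rng.normal(size=(4, 4))
M = A @ A.T  # any matrix of the form A A^T is positive semidefinite

d_quad = mahalanobis_sq(x_i, x_j, M)
d_trace = mahalanobis_sq_trace(x_i, x_j, M)
d_euclid = mahalanobis_sq(x_i, x_j, np.eye(4))  # M = I: squared Euclidean distance
```

The two functions agree up to floating-point error. The trace form is convenient later because it is linear in $\mathbf{M}$, which makes the gradient easy to write down.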

TDL objectives

The first objective is to minimize the intra-class distance:

$$ \min \sum_{\vec{x}_i,\vec{x}_j,y_i=y_j} D(\vec{x}_i, \vec{x}_j) \tag{4} $$

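The second is to make the intra-class distance smaller than the minimum inter-class distance by at least a margin $\rho$:

$$ D(\vec{x}_i, \vec{x}_j) + \rho < \min\limits_{y_k \ne y_i}D(\vec{x}_i, \vec{x}_k),\quad y_i = y_j \tag{5} $$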

Writing the constraint above as a loss to be minimized gives Eq. (6):

$$ \min \sum_{\vec{x}_i,\vec{x}_j,y_i=y_j} \max\{ D(\vec{x}_i, \vec{x}_j) - \min\limits_{y_k \ne y_i}D(\vec{x}_i, \vec{x}_k) + \rho, 0 \} \tag{6} $$

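In practice, the second objective only looks for the inter-class feature vector closest to $\vec{x}_i$. This reduces the amount of computation, but I do not know whether it affects accuracy.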

TDL loss function

From Eq. (4) and Eq. (6), the TDL loss function is:

$$ f(\mathbf{M}) = (1-\alpha)\sum_{\vec{x}_i,\vec{x}_j,y_i=y_j}tr(\mathbf{M} \mathbf{X}_{i,j}) + \alpha\sum_{\vec{x}_i,\vec{x}_j,y_i=y_j}\max\{ D(\vec{x}_i, \vec{x}_j) - \min\limits_{y_k \ne y_i}D(\vec{x}_i, \vec{x}_k) + \rho, 0 \} \tag{7} $$
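The loss in Eq. (7) can be evaluated on a small labelled set as in the sketch below. This is only my reading of the formula, not the authors' code: the function names, the toy data and the default values of `alpha` and `rho` are assumptions, and I sum over ordered pairs $(i, j)$ with $i \ne j$, which may differ from the original implementation by a constant factor.

```python
import numpy as np

def pairwise_dist(X, M):
    """D[i, j] = (x_i - x_j)^T M (x_i - x_j) for all rows x_i, x_j of X."""
    diff = X[:, None, :] - X[None, :, :]  # shape (n, n, d)
    return np.einsum("ijd,de,ije->ij", diff, M, diff)

def tdl_loss(X, y, M, alpha=0.5, rho=1.0):
    """Eq. (7), summed over ordered same-label pairs (i, j), i != j (an assumption)."""
    D = pairwise_dist(X, M)
    same = y[:, None] == y[None, :]
    pos = same & ~np.eye(len(y), dtype=bool)  # same label, excluding i == j
    # Hardest negative for each i: the closest sample with a different label.
    min_neg = np.where(same, np.inf, D).min(axis=1)
    intra = D[pos].sum()
    hinge = np.maximum(D - min_neg[:, None] + rho, 0.0)[pos].sum()
    return (1 - alpha) * intra + alpha * hinge

# Toy data (hypothetical): two classes, two samples each.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [1.0, 3.0]])
y = np.array([0, 0, 1, 1])
M = np.eye(2)

loss_small_margin = tdl_loss(X, y, M, alpha=0.5, rho=1.0)   # hinge inactive: 2.0
loss_large_margin = tdl_loss(X, y, M, alpha=0.5, rho=10.0)  # hinge active: 6.0
```

With $\rho = 1$ every intra-class distance (1) is already smaller than the nearest inter-class distance (9) by more than the margin, so only the first term contributes. With $\rho = 10$ each of the four ordered pairs violates the margin by 2.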

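Taking the partial derivative with respect to $\mathbf{M}$ gives the gradient, where $\mathcal{N}(\mathbf{M}_t)$ denotes the set of triplets $(i,j,k)$ whose hinge term in Eq. (7) is positive under $\mathbf{M}_t$:

$$ \mathbf{G}_t = \frac{\partial f}{\partial \mathbf{M}}\Big|_{\mathbf{M}=\mathbf{M}_t} = (1-\alpha)\sum_{\vec{x}_i,\vec{x}_j,y_i=y_j} \mathbf{X}_{i,j} + \alpha\sum_{(i,j,k) \in \mathcal{N}(\mathbf{M}_t)}(\mathbf{X}_{i,j}-\mathbf{X}_{i,k}) \tag{8} $$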
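A direct, unoptimized way to compute the gradient of Eq. (8) is sketched below. It is an illustration under my own assumptions (function name, toy data, summation over ordered pairs with $i \ne j$, and treating the hardest negative $k$ as fixed when differentiating), not the authors' implementation.

```python
import numpy as np

def tdl_gradient(X, y, M, alpha=0.5, rho=1.0):
    """Eq. (8): gradient of the TDL loss at M, with the hardest negative held fixed."""
    n, d = X.shape
    diff = X[:, None, :] - X[None, :, :]
    D = np.einsum("ijd,de,ije->ij", diff, M, diff)
    same = y[:, None] == y[None, :]
    pos = same & ~np.eye(n, dtype=bool)
    D_neg = np.where(same, np.inf, D)
    k_star = D_neg.argmin(axis=1)  # index of the hardest negative for each i
    min_neg = D_neg.min(axis=1)
    G = np.zeros((d, d))
    for i, j in zip(*np.nonzero(pos)):
        X_ij = np.outer(diff[i, j], diff[i, j])
        G += (1 - alpha) * X_ij
        # The triplet (i, j, k*) belongs to N(M_t) only if its hinge is positive.
        if D[i, j] - min_neg[i] + rho > 0:
            k = k_star[i]
            G += alpha * (X_ij - np.outer(diff[i, k], diff[i, k]))
    return G

# Toy data (hypothetical): two classes, two samples each.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [1.0, 3.0]])
y = np.array([0, 0, 1, 1])
M = np.eye(2)

G_small = tdl_gradient(X, y, M, alpha=0.5, rho=1.0)   # no active triplets
G_large = tdl_gradient(X, y, M, alpha=0.5, rho=10.0)  # all triplets active
```

On this toy set `G_small` is `[[2, 0], [0, 0]]` and `G_large` is `[[4, 0], [0, -18]]`. The negative entry means a descent step increases the weight of the second dimension, which is the one that separates the two classes.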

Algorithm

TDL also uses gradient descent, optimizing $\mathbf{M}$ through repeated iterative updates. This is my own summary of the TDL algorithm:

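Initialization: set $\mathbf{M}$ to the identity matrix.
Iteration: repeat until convergence or until the maximum number of iterations is reached:
1. Find the minimum intra-class distance $D(\vec{x}_i, \vec{x}_j)$
2. Find the minimum inter-class distance $D(\vec{x}_k, \vec{x}_i)$ for $\vec{x}_i$ and build the triggered set $\{i,j,k\}$
3. Compute the gradient $\mathbf{G}$
4. Update $\mathbf{M}_{t+1} = \mathbf{M}_{t} - \lambda \mathbf{G}_{t}$
5. Keep $\mathbf{M}$ positive semidefinite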
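Steps 4 and 5 above can be sketched as a single update function. The projection used here clips negative eigenvalues to zero, which is one common way to obtain a positive semidefinite matrix; it is a simple stand-in chosen for illustration and is not necessarily what the nearestPSD function or the authors' demo does. The names `project_psd` and `tdl_step`, the example gradient and the step size are all assumptions.

```python
import numpy as np

def project_psd(M):
    """Symmetrize M, then clip negative eigenvalues to zero."""
    M_sym = (M + M.T) / 2
    w, V = np.linalg.eigh(M_sym)            # eigenvalues w, eigenvectors in columns of V
    return (V * np.maximum(w, 0.0)) @ V.T   # V diag(max(w, 0)) V^T

def tdl_step(M, G, lr):
    """Step 4 (gradient descent update) followed by step 5 (PSD projection)."""
    return project_psd(M - lr * G)

# Hypothetical example: start from the identity and take one step.
M0 = np.eye(2)
G = np.array([[4.0, 0.0], [0.0, -18.0]])
M1 = tdl_step(M0, G, lr=0.5)  # M0 - 0.5 * G = diag(-1, 10), projected to diag(0, 10)
```

In a full training loop this step would be repeated, recomputing the gradient from the current $\mathbf{M}$ each time, until the loss stops decreasing or the iteration limit is reached.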

Implementation

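The original authors' homepage provides a demo, but the key parts are encrypted. The algorithm is not complicated, though, so it is easy to implement yourself.
For step 5 of the iteration, my implementation directly uses a function called nearestPSD that I found online.
For convenience the original authors implemented a single-shot version of TDL; mine is multi-shot, though it is slower than the original.
(Correction: the demo provided by the original authors should be multi-shot; it is only in this paper that the single-shot experimental setting is used. The paper states this. 2017-07-17)
I replaced the encrypted part of the demo with my own TDL, and the results differ little from the original ones.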

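After that I put my TDL into the test framework provided by the KISSME authors and tested it on the LFW dataset. The result was embarrassing: worse than the Euclidean distance.
I also put the original TDL into KISSME's learnPairwise method and tested it on LFW in the same way. Memory usage was huge (tens of GB), and after a full day it had produced no result, so I gave up.
My guess is that TDL needs to be paired with the features from the original paper to work well.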

My TDL implementation:
https://github.com/tyusr/CodeImplement/tree/master/TDL

References

[1] You J, Wu A, Li X, et al. Top-push video-based person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 1345-1353.
[2] Koestinger M, Hirzer M, Wohlhart P, et al. Large scale metric learning from equivalence constraints[C]//Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012: 2288-2295.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
