KCF: Translation and Analysis of High-Speed Tracking with Kernelized Correlation Filters (Part 1). Please credit the source when sharing or reposting. Author: 行於此路


 

 Translation and Analysis of High-Speed Tracking with Kernelized Correlation Filters

A high-speed object tracking method based on kernelized correlation filters, KCF for short

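  A note before starting: I am reading this paper closely because it is extremely important, a landmark paper in object tracking. Many KCF-based papers have since built on it, so it is like the foundation stone of a building. Modules from deep learning, long short-term memory and other network frameworks can also be added on top of KCF with good results. I will therefore study the paper in depth and share my notes here. My knowledge is limited and mistakes are inevitable, so corrections are welcome.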

High-Speed Tracking with Kernelized Correlation Filters:

Abstract---The core component of most modern trackers is a discriminative classifier, tasked with distinguishing between the target and the surrounding environment. To cope with natural image changes, this classifier is typically trained with translated and scaled sample patches. Such sets of samples are riddled with redundancies -- any overlapping pixels are constrained to be the same. Based on this simple observation, we propose an analytic model for datasets of thousands of translated patches. By showing that the resulting data matrix is circulant, we can diagonalize it with the Discrete Fourier Transform, reducing both storage and computation by several orders of magnitude. Interestingly, for linear regression our formulation is equivalent to a correlation filter, used by some of the fastest competitive trackers. For kernel regression, however, we derive a new Kernelized Correlation Filter (KCF), that unlike other kernel algorithms has the exact same complexity as its linear counterpart. Building on it, we also propose a fast multi-channel extension of linear correlation filters, via a linear kernel, which we call Dual Correlation Filter (DCF). Both KCF and DCF outperform top-ranking trackers such as Struck or TLD on a 50 videos benchmark, despite running at hundreds of frames-per-second, and being implemented in a few lines of code (Algorithm 1). To encourage further developments, our tracking framework was made open-source.

Index Terms---Visual tracking, circulant matrices, discrete Fourier transform, kernel methods, ridge regression, correlation filters.

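Abstract (translation): The core component of most modern trackers is a discriminative classifier that separates the target from its surrounding background. To cope with natural image changes, this classifier is usually trained on translated (shifted) and scaled sample patches. Such sample sets are full of redundancy, because any overlapping pixels are constrained to be identical. Starting from this simple observation, the authors propose an analytic model for datasets of thousands of translated patches. They show that the resulting data matrix is circulant, so it can be diagonalized with the discrete Fourier transform, which cuts both storage and computation by several orders of magnitude. Interestingly, for linear regression the formulation is equivalent to a correlation filter, as used by some of the fastest competitive trackers. For kernel regression, they derive a new Kernelized Correlation Filter (KCF) which, unlike other kernel algorithms, has exactly the same complexity as its linear counterpart. Building on it, they also propose a fast multi-channel extension of linear correlation filters via a linear kernel, called the Dual Correlation Filter (DCF). On a 50-video benchmark, both KCF and DCF outperform top-ranking trackers such as Struck and TLD while running at hundreds of frames per second and needing only a few lines of code. To encourage further development, the tracking framework was released as open source.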
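The circulant-matrix claim in the abstract can be checked numerically. Below is a minimal NumPy sketch, not the authors' code: it assumes a 1-D signal in place of an image patch, builds the data matrix from all cyclic shifts of one base sample, and confirms that the unitary DFT matrix diagonalizes it. The names `shift_matrix`, `x`, `v`, `F` and `D` are illustrative only.

```python
import numpy as np

def shift_matrix(x):
    """Data matrix whose row i is x cyclically shifted right by i elements."""
    return np.stack([np.roll(x, i) for i in range(len(x))])

rng = np.random.default_rng(0)
n = 8
x = rng.standard_normal(n)  # hypothetical 1-D base sample
v = rng.standard_normal(n)  # arbitrary vector, e.g. a filter

X = shift_matrix(x)         # n x n circulant matrix, O(n^2) storage

# Unitary DFT matrix; F^H X F should be diagonal,
# with the DFT of x on the diagonal.
F = np.fft.fft(np.eye(n)) / np.sqrt(n)
D = F.conj().T @ X @ F
print(np.allclose(D, np.diag(np.fft.fft(x))))

# Product with all shifted samples at once: dense O(n^2) version...
dense = X @ v
# ...versus the O(n log n) Fourier version, which only ever stores x.
fast = np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(v)).real
print(np.allclose(dense, fast))
```

Both checks should agree. The point of the sketch is that the n x n matrix never has to be formed: the base sample and its DFT carry the same information, which is where the storage and computation savings come from.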

1 Introduction

  Arguably one of the biggest breakthroughs in recent visual tracking research was the widespread adoption of discriminative learning methods. The task of tracking, a crucial component of many computer vision systems, can be naturally specified as an online learning problem. Given an initial image patch containing the target, the goal is to learn a classifier to discriminate between its appearance and that of the environment. This classifier can be evaluated exhaustively at many locations, in order to detect it in subsequent frames. Of course, each new detection provides a new image patch that can be used to update the model.

Introduction (translation):

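  One of the biggest breakthroughs in recent visual tracking research is the widespread adoption of discriminative learning methods. Tracking is a key component of many computer vision systems, and it can be posed naturally as an online learning problem. Given an initial image patch containing the target, the goal is to learn a classifier that separates the target's appearance from that of its surroundings. To detect the target in later frames, the classifier can be evaluated exhaustively at many locations. Each new detection in turn provides a new image patch that can be used to update the model.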

Image patches

 

It is tempting to focus on characterizing the object of interest - the positive samples for the classifier. However, a core tenet of discriminative methods is to give as much importance, or more, to the relevant environment - the negative samples. The most commonly used negative samples are image patches from different locations and scales, reflecting the prior knowledge that the classifier will be evaluated under those conditions.

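It is easy to concentrate on describing the object of interest, that is, the positive samples for the classifier. A core principle of discriminative methods, however, is to give equal or greater weight to the relevant environment, that is, the negative samples. The most common negative samples are image patches taken from different locations and scales, which reflects the prior knowledge that the classifier will be evaluated under those conditions.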

  An extremely challenging factor is the virtually unlimited amount of negative samples that can be obtained from an image. Due to the time-sensitive nature of tracking, modern trackers walk a fine line between incorporating as many samples as possible and keeping computational demand low. It is common practice to randomly choose only a few samples each frame [3], [4], [5], [6], [7].

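  One extremely challenging factor is that a practically unlimited number of negative samples can be extracted from a single image. Because tracking is time-sensitive, modern trackers walk a tightrope between incorporating as many samples as possible and keeping the computational load low. A very common practice is to randomly select only a few samples in each frame.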

  Although the reasons for doing so are understandable, we argue that undersampling negatives is the main factor inhibiting performance in tracking. In this paper, we develop tools to analytically incorporate thousands of samples at different relative translations, without iterating over them explicitly. This is made possible by the discovery that, in the Fourier domain, some learning algorithms actually become easier as we add more samples, if we use a specific model for translations.

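Although the reasons for doing so are understandable, the authors argue that undersampling the negatives is the main factor holding back tracking performance. The paper develops tools that analytically incorporate thousands of samples at different relative translations without iterating over them explicitly. This is possible because of the discovery that, in the Fourier domain, some learning algorithms actually become easier as more samples are added, provided a specific model for translations is used.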

  These analytical tools, namely circulant matrices, provide a useful bridge between popular learning algorithms and classical signal processing. The implication is that we are able to propose a tracker based on Kernel Ridge Regression that does not suffer from the "curse of kernelization", which is its larger asymptotic complexity, and even exhibits lower complexity than unstructured linear regression. Instead, it can be seen as a kernelized version of a linear correlation filter, which forms the basis for the fastest trackers available. We leverage the powerful kernel trick at the same computational complexity as linear correlation filters. Our framework easily incorporates multiple feature channels, and by using a linear kernel we show a fast extension of linear correlation filters to the multi-channel case.

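  These analytical tools, namely circulant matrices, provide a useful bridge between popular learning algorithms and classical signal processing. As a consequence, the authors can propose a tracker based on Kernel Ridge Regression that avoids the "curse of kernelization", that is, its larger asymptotic complexity, and that even exhibits lower complexity than unstructured linear regression. It can be seen as a kernelized version of a linear correlation filter, the kind of filter that underlies the fastest trackers available. The powerful kernel trick is used at the same computational complexity as linear correlation filters. The framework easily incorporates multiple feature channels, and with a linear kernel it yields a fast extension of linear correlation filters to the multi-channel case.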
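To illustrate why regression over all shifted samples can be cheaper than unstructured linear regression, here is a small NumPy sketch. It is not taken from this part of the paper: it assumes the standard ridge regression solution w = (X^T X + lam I)^-1 X^T y, a 1-D base sample, and a Gaussian-shaped target, and the names `lam`, `x`, `y` and `w_hat` are illustrative. With the shift direction and DFT convention used here, the solution reduces to one element-wise division per frequency. Other conventions move the complex conjugate to a different factor, so the sketch checks itself against the dense solve.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16
lam = 0.1                    # ridge regularization weight (illustrative value)
x = rng.standard_normal(n)   # hypothetical 1-D base sample

# Regression target: a Gaussian bump peaked at zero shift (an assumption).
dist = np.minimum(np.arange(n), n - np.arange(n))
y = np.exp(-0.5 * (dist / 2.0) ** 2)

# Unstructured solution: build every shifted sample and solve an n x n system.
X = np.stack([np.roll(x, i) for i in range(n)])
w_direct = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Structured solution: element-wise operations on DFTs, O(n log n).
x_hat = np.fft.fft(x)
y_hat = np.fft.fft(y)
w_hat = x_hat * y_hat / (np.abs(x_hat) ** 2 + lam)
w_fourier = np.fft.ifft(w_hat).real

print(np.allclose(w_direct, w_fourier))
```

The dense route costs a matrix factorization, while the Fourier route costs a few FFTs, and the gap widens quickly as the number of shifted samples grows.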

2 Related Work

2.1 On tracking-by-detection

  A comprehensive review of tracking-by-detection is outside the scope of this article, but we refer the interested reader to two excellent and very recent surveys. The most popular approach is to use a discriminative appearance model. It consists of training a classifier online, inspired by statistical machine learning methods, to predict the presence or absence of the target in an image patch. This classifier is then tested on many candidate patches to find the most likely location. Alternatively, the position can also be predicted directly. Regression with class labels can be seen as classification, so we use the two terms interchangeably.

2 Related Work (translation)

2.1 On tracking-by-detection (translation)

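  A comprehensive review of tracking-by-detection is beyond the scope of the article, but the authors point the interested reader to two excellent recent surveys. The most popular approach is to use a discriminative appearance model. Inspired by statistical machine learning methods, it trains a classifier online and uses that classifier to predict whether the target is present in an image patch. The classifier is then tested on many candidate patches to find the most likely target location. Alternatively, the position can be predicted directly. Regression with class labels can be regarded as classification, so the two terms are used interchangeably.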

  We will discuss some relevant trackers before focusing on the literature that is more directly related to our analytical methods. Canonical examples of the tracking-by-detection paradigm include those based on Support Vector Machines (SVM), Random Forest classifiers, or boosting variants. All the mentioned algorithms had to be adapted for online learning, in order to be useful for tracking. Zhang et al. propose a projection to a fixed random basis, to train a Naive Bayes classifier, inspired by compressive sensing techniques. Aiming to predict the target's location directly, instead of its presence in a given image patch, Hare et al. employed a Structured Output SVM and Gaussian kernels, based on a large number of image features. Examples of non-discriminative trackers include the work of Wu et al., who formulate tracking as a sequence of image alignment objectives, and of Sevilla-Lara and Learned-Miller, who propose a strong appearance descriptor based on distribution fields. Another discriminative approach by Kalal et al. uses a set of structural constraints to guide the sampling process of a boosting classifier. Finally, Bolme et al. employ classical signal processing analysis to derive fast correlation filters. We will discuss these last two works in more detail shortly.

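  Before turning to the literature most directly related to the paper's analytical methods, the authors discuss some relevant trackers. Canonical examples of tracking-by-detection include trackers based on SVMs, Random Forest classifiers, or boosting variants. All of these algorithms had to be adapted for online learning to be useful for tracking. Zhang Kaihua et al. propose a projection onto a fixed random basis to train a Naive Bayes classifier, inspired by compressive sensing. Hare et al. aim to predict the target's location directly, rather than its presence in a given image patch, using a Structured Output SVM with Gaussian kernels on a large number of image features. Non-discriminative trackers include the work of Wu et al., who formulate tracking as a sequence of image alignment objectives, and of Sevilla-Lara and Learned-Miller, who propose a strong appearance descriptor based on distribution fields. Another discriminative approach, by Kalal et al., uses a set of structural constraints to guide the sampling process of a boosting classifier. Finally, Bolme et al. use classical signal processing analysis to derive fast correlation filters. The last two works are discussed in more detail below.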

2.2 On sample translations and correlation filtering

  Recall that our goal is to learn and detect over translated image patches efficiently. Unlike our approach, most attempts so far have focused on trying to weed out irrelevant image patches. On the detection side, it is possible to use branch-and-bound to find the maximum of a classifier's response while avoiding unpromising candidate patches. Unfortunately, in the worst case the algorithm may still have to iterate over all patches. A related method finds the most similar patches of a pair of images efficiently, but is not directly translatable to our setting. Though it does not preclude an exhaustive search, a notable optimization is to use a fast but inaccurate classifier to select promising patches, and only apply the full, slower classifier on those.

2.2 On sample translations and correlation filtering (translation):

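  Recall that the goal is to learn from and detect over translated image patches efficiently. Unlike the paper's approach, most attempts so far have tried to weed out irrelevant image patches. On the detection side, branch-and-bound can find the maximum of a classifier's response while skipping unpromising candidate patches. Unfortunately, in the worst case the algorithm may still have to iterate over all patches. A related method efficiently finds the most similar patches in a pair of images, but it does not carry over directly to this setting. Another notable optimization, although it does not rule out an exhaustive search, is to use a fast but inaccurate classifier to select promising patches and then apply the full, slower classifier only to those.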

  On the training side, Kalal et al. propose using structural constraints to select relevant sample patches from each new image. This approach is relatively expensive, limiting the features that can be used, and requires careful tuning of the structural heuristics. A popular and related method, though it is mainly used in offline detector learning, is hard-negative mining. It consists of running an initial detector on a pool of images, and selecting any wrong detections as samples for re-training. Even though both approaches reduce the number of training samples, a major drawback is that the candidate patches have to be considered exhaustively, by running a detector.

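  On the training side, Kalal et al. propose using structural constraints to select relevant sample patches from each new image. This approach is relatively expensive, limits the features that can be used, and requires careful tuning of the structural heuristics. A popular related method, used mainly in offline detector learning, is hard-negative mining. It runs an initial detector on a pool of images and selects any wrong detections as samples for retraining. Although both approaches reduce the number of training samples, a major drawback is that a detector must be run to consider all candidate patches exhaustively.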

  The initial motivation for our line of research was the recent success of correlation filters in tracking. Correlation filters have proved to be competitive with far more complicated approaches, but using only a fraction of the computational power, at hundreds of frames-per-second. They take advantage of the fact that the convolution of two patches (loosely, their dot-product at different relative translations) is equivalent to an element-wise product in the Fourier domain. Thus, by formulating their objective in the Fourier domain, they can specify the desired output of a linear classifier for several translations, or image shifts, at once.

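  The initial motivation for this line of research was the recent success of correlation filters in tracking. Correlation filters have proved competitive with far more complicated approaches while using only a fraction of the computation, running at hundreds of frames per second. They exploit the fact that the convolution of two patches (loosely, their dot product at different relative translations) is equivalent to an element-wise product in the Fourier domain. By formulating the objective in the Fourier domain, they can therefore specify the desired output of a linear classifier for several translations, or image shifts, at once.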
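The equivalence between dot products at every relative translation and an element-wise product in the Fourier domain can be demonstrated in a few lines. The sketch below is an illustration under stated assumptions, not code from the paper: it uses random 16 x 16 arrays as stand-ins for image patches, treats translations as cyclic shifts, and computes cross-correlation (hence the complex conjugate on one factor; plain convolution would omit it). The names `a`, `b`, `direct` and `fast` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal((16, 16))          # hypothetical template patch
b = np.roll(a, shift=(3, 5), axis=(0, 1))  # same patch, cyclically shifted

# Direct version: one dot product per relative translation (dy, dx).
direct = np.empty_like(a)
for dy in range(a.shape[0]):
    for dx in range(a.shape[1]):
        shifted = np.roll(b, shift=(-dy, -dx), axis=(0, 1))
        direct[dy, dx] = np.sum(a * shifted)

# Fourier version: a single element-wise product covers every translation.
fast = np.fft.ifft2(np.conj(np.fft.fft2(a)) * np.fft.fft2(b)).real

print(np.allclose(direct, fast))

# The strongest response marks the translation that best aligns the patches.
peak = np.unravel_index(np.argmax(fast), fast.shape)
print(peak)
```

The response peaks at row 3, column 5, which is the shift applied to `b`. This is the mechanism a correlation-filter tracker relies on to localize the target: the double loop evaluates 256 shifts one by one, while the Fourier version obtains all of them from three FFTs.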

  A Fourier domain approach can be very efficient, and has several decades of research in signal processing to draw from. Unfortunately, it can also be extremely limiting. We would like to simultaneously leverage more recent advances in computer vision, such as more powerful features, large-margin classifiers or kernel methods.

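  A Fourier domain approach can be very efficient and can draw on several decades of signal processing research. Unfortunately, it can also be extremely limiting. The authors would like to leverage more recent advances in computer vision at the same time, such as more powerful features, large-margin classifiers, or kernel methods.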

  A few studies go in that direction, and attempt to apply kernel methods to correlation filters. In these works, a distinction must be drawn between two types of objective functions: those that do not consider the power spectrum or image translations, such as Synthetic Discriminant Function (SDF) filters, and those that do, such as Minimum Average Correlation Energy, Optimal Trade-Off and Minimum Output Sum of Squared Error (MOSSE) filters. Since the spatial structure can effectively be ignored, the former are easier to kernelize, and Kernel SDF filters have been proposed. However, lacking a clearer relationship between translated images, non-linear kernels and the Fourier domain, applying the kernel trick to other filters has proven much more difficult, with some proposals requiring significantly higher computation times and imposing strong limits on the number of image shifts that can be considered.

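  A few studies go in this direction and try to apply kernel methods to correlation filters. In these works, two types of objective function must be distinguished: those that do not consider the power spectrum or image translations, such as Synthetic Discriminant Function (SDF) filters, and those that do, such as Minimum Average Correlation Energy, Optimal Trade-Off, and Minimum Output Sum of Squared Error (MOSSE) filters. Because the spatial structure can effectively be ignored, the former are easier to kernelize, and Kernel SDF filters have been proposed. However, without a clearer relationship between translated images, non-linear kernels, and the Fourier domain, applying the kernel trick to the other filters has proved much harder; some proposals require significantly more computation time and impose strict limits on the number of image shifts that can be considered.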

  For us, this hinted that a deeper connection between translated image patches and training algorithms was needed, in order to overcome the limitations of direct Fourier domain formulations.

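  For the authors, this suggested that a deeper connection between translated image patches and training algorithms was needed in order to overcome the limitations of direct Fourier domain formulations.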

 

To be continued.

 

 

