Moving Object Detection Paper (1)

This article is a repost. 2018-11-09 21:57, Machine Vision / Literature Reading

The paper I am going to introduce is:

Zhou D, Frémont V, Quost B, et al. Moving Object Detection and Segmentation in Urban Environments from a Moving Platform [J]. Image & Vision Computing, 2017, 68.

This is a 2017 paper, available on HAL. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents.

Abstract:

This paper proposes an effective approach to detect and segment moving objects from two time-consecutive stereo frames, which leverages the uncertainties in camera motion estimation and in disparity computation. First, the relative camera motion and its uncertainty are computed by tracking and matching sparse features in four images (note: a stereo camera is used). Then, the motion likelihood at each pixel is estimated by taking into account the ego-motion uncertainty and disparity in computation procedure. Finally, the motion likelihood, color and depth cues are combined in the graph-cut framework for moving object segmentation. The efficiency of the proposed method is evaluated on the KITTI benchmarking datasets, and our experiments show that the proposed approach is robust against both global (camera motion) and local (optical flow) noise. Moreover, the approach is dense as it applies to all pixels in an image, and even partially occluded moving objects can be detected successfully. Without dedicated tracking strategy, our approach achieves high recall and comparable precision on the KITTI benchmarking sequences.

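In other words, the paper proposes a stereo-vision method for detecting and segmenting moving objects from two time-consecutive frames, exploiting the uncertainties in both camera motion estimation and disparity computation.

The pipeline has three stages. First, the relative camera motion and its uncertainty are computed by tracking and matching sparse features across the four images. Next, a motion likelihood is estimated at every pixel, taking the ego-motion uncertainty and the disparity estimate into account. Finally, the motion likelihood is combined with color and depth cues in a graph-cut framework to segment the moving objects. Evaluated on the KITTI benchmark, the method is reported to be robust to both global (camera motion) and local (optical flow) noise. It is dense, since it applies to every pixel, and it can detect even partially occluded moving objects. Without any dedicated tracking strategy, it achieves high recall and comparable precision on the KITTI sequences.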

Introduction

Making the vehicles to automatically perceive and understand their 3D environment is a challenging and important task. Due to the improvement of the sensor technologies, processing techniques and researchers’ contributions, several Advanced Driver Assistance Systems (ADASs) have been developed for various purposes such as forward collision warning systems, parking assist systems, blind spot detection systems and adaptive cruise control systems.

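The paper points out that enabling vehicles to perceive and understand their 3D environment has long been a challenging task for researchers. Thanks to steady progress in sensor technology and to researchers' contributions, ADAS has advanced considerably; the examples given are forward collision warning, parking assist, blind spot detection, and adaptive cruise control.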

Currently popular SLAM and SfM systems are widely used in ADAS and autonomous driving, for example the well-known ORB-SLAM:

R. Mur-Artal, J. Montiel, and J. D. Tardos, "ORB-SLAM: a versatile and accurate monocular SLAM system," Robotics, IEEE Transactions on, vol. 31, no. 5, pp. 1147-1163, 2015.

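However, these systems all assume a static environment, whereas in practice they must cope with complex urban scenes and dynamic objects. Detecting moving objects efficiently and effectively is therefore crucial for the accuracy of such systems.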

moving objects are considered as outliers and RANSAC strategy is applied to get rid of them efficiently. However, this strategy will fail when the moving objects are the dominant part of the image. Thus, efficiently and effectively detecting moving objects turns out to be a crucial issue for the accuracy of such systems.

In this article, we focus on the specific problem of moving object detection. We propose a detection and segmentation system based on two time-consecutive stereo images. The key idea is to detect the moving pixels by compensating the image changes caused by the global camera motion. The uncertainty of the camera motion is also considered to obtain reliable detection results. Furthermore, color and depth information is also employed to remove some false detection

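The paper focuses on the specific problem of moving object detection. It proposes a detection and segmentation system based on two time-consecutive stereo images. The key idea is to detect moving pixels by compensating for the image changes caused by the global camera motion. The uncertainty of the camera motion is also taken into account in order to obtain reliable detection results. In addition, color and depth information is used to remove some false detections. (Open question for me at this point: what exactly does it mean to detect moving pixels by compensating for the image changes caused by global camera motion?)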

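Moving object detection has long been an active research topic, and background subtraction is one of the most commonly used approaches. The paper reviews several monocular moving object detection methods, mainly the ones mentioned above.

This paper, however, uses a stereo vision system (SVS), which, unlike a monocular camera, provides depth and disparity information.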

Dense or sparse depth/disparity maps computed by global [10] or semi-global [11] matching approaches can be used to build 3D information on the environment. Theoretically, by obtaining the 3D information, any kind of motion can be detected, even the case of degenerate motion mentioned above. In [12], 3D point clouds are reconstructed from linear stereo vision systems first and then objects are detected based on a spectral clustering technique from the 3D points. Common used methods for Moving Object Detection (MOD) in stereo rig can be divided into sparse feature based [13, 14] and dense scene flow-based approaches [15, 16, 17]

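In short: dense or sparse depth/disparity maps computed by global [10] or semi-global [11] matching can be used to reconstruct 3D information about the environment. In theory, once 3D information is available, any kind of motion can be detected, including the degenerate-motion case. In [12], 3D point clouds are first reconstructed from a linear stereo vision system, and objects are then detected by spectral clustering on the 3D points. Common methods for Moving Object Detection (MOD) with a stereo rig fall into two groups: sparse-feature-based [13, 14] and dense scene-flow-based [15, 16, 17].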

[10] L. Wang and R. Yang, "Global stereo matching leveraged by sparse ground control points," in Computer Vision and Pattern Recognition (CVPR), Conference on, pp. 3033-3040, IEEE, 2011.

[11] H. Hirschmuller, "Accurate and efficient stereo processing by semi-global matching and mutual information," in Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, vol. 2, pp. 807-814, 2005.

[12] S. Moqqaddem, Y. Ruichek, R. Touahni, and A. Sbihi, "Objects detection and tracking using points cloud reconstructed from linear stereo vision," Current Advancements in Stereo Vision, p. 161, 2012.

[13] B. Kitt, B. Ranft, and H. Lategahn, "Detection and tracking of independently moving objects in urban environments," in Intelligent Transportation Systems, 13th International IEEE Conference on, pp. 1396-1401, IEEE, 2010.

[14] P. Lenz, J. Ziegler, A. Geiger, and M. Roser, "Sparse scene flow segmentation for moving object detection in urban environments," in Intelligent Vehicles Symposium (IV), IEEE, pp. 926-932, 2011.

[15] A. Talukder and L. Matthies, "Real-time detection of moving objects from moving vehicles using dense stereo and optical flow," in Intelligent Robots and Systems, Proceedings. International Conference on, vol. 4, pp. 3718-3725, IEEE, 2004.

[16] V. Romero-Cano and J. I. Nieto, "Stereo-based motion detection and tracking from a moving platform," in Intelligent Vehicles Symposium, IEEE, pp. 499 IEEE, 2013.

[17] C. Rabe, T. Müller, A. Wedel, and U. Franke, "Dense, robust, and accurate motion field estimation from stereo image sequences in real-time," in European Conference on Computer Vision, pp. 582-595, Springer, 2010.

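Sparse-feature-based methods fail when only a few features are detected on the moving object. In that case, dense-flow-based methods can be used. In [15], the optical flow between two consecutive frames is predicted from the current scene depth and the ego-motion, and large non-zero regions in the difference between the predicted and the measured flow fields are classified as potential moving objects. Although this scheme yields dense results, the noise involved in the perception task makes it prone to a large number of false detections. The improved approaches [18] and [16] limit such false detections by taking into account the uncertainty of the 3D scene flow [18] or of the 2D real optical flow [16]. However, these approaches model the ego-motion uncertainty only roughly, relying on other sensors (GPS or IMU).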

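Epipolar geometry from a monocular camera cannot detect a moving object when its motion is degenerate. (Degenerate motion means that the 3D point moves within the epipolar plane formed by the two camera centers and the point itself, so its 2D projection moves along the epipolar line.)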
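The degenerate case can be checked numerically. The sketch below is not from the paper; it assumes normalized image coordinates, the convention X_t = R X_{t-1} + t for the camera motion, and the standard essential matrix E = [t]_x R. The function names and the test values are made up for illustration.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ a == np.cross(t, a)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def epipolar_residual(P_prev, P_curr, R, t):
    """Return x2^T E x1 for a point seen at P_prev (time t-1) and P_curr (time t).

    Both positions are expressed in the camera frame at time t-1.
    A residual of zero means the monocular epipolar test sees no motion.
    """
    E = skew(t) @ R                 # essential matrix for normalized coordinates
    x1 = P_prev / P_prev[2]         # projection at time t-1
    X2 = R @ P_curr + t             # point expressed in the camera frame at time t
    x2 = X2 / X2[2]                 # projection at time t
    return float(x2 @ E @ x1)

R = np.eye(3)
t = np.array([0.0, 0.0, -1.0])      # camera drives 1 m forward
P = np.array([2.0, 1.0, 10.0])

static = epipolar_residual(P, P, R, t)
along_track = epipolar_residual(P, P + np.array([0.0, 0.0, 3.0]), R, t)  # degenerate
lateral = epipolar_residual(P, P + np.array([1.0, 0.0, 0.0]), R, t)      # detectable
```

Here the object moving parallel to the camera's own direction of travel gives the same zero residual as a static point, while the laterally moving object does not. This is why the extra depth information of a stereo rig is useful.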

Assume a calibrated stereo camera. We denote b as the calibrated baseline for the stereo head.

Additionally, the left and right rectified images have identical focal length f and principal point coordinates p0 = (u0, v0)^T.
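With b and f defined, the depth of a pixel follows from its disparity d through the standard rectified-stereo relation Z = f b / d. The helper below is a minimal sketch of that relation, not code from the paper; the function name and the `min_disp` cutoff are assumptions made for illustration.

```python
import numpy as np

def disparity_to_depth(disparity, f, b, min_disp=1e-6):
    """Depth Z = f * b / d for a rectified stereo pair.

    disparity: array of disparities in pixels
    f: focal length in pixels, b: baseline in meters
    Pixels with disparity <= min_disp are treated as invalid and get depth inf.
    """
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > min_disp
    depth[valid] = f * b / disparity[valid]
    return depth

depth = disparity_to_depth([35.0, 0.0, 7.0], f=700.0, b=0.5)
```

Depth error grows quickly as the disparity gets small, which is one reason the paper accounts for disparity uncertainty.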

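The figure below shows two consecutive stereo frames, from time t-1 to time t. The origin of the world coordinate system is assumed to coincide with the local coordinate frame of the left camera at time t-1.

The X-axis points to the right and the Y-axis points downwards.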

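At time t-1, the pixel position of a point from the static background is

and at time t its position is

where K is the camera intrinsic matrix, R and tr are the relative camera rotation and translation (the pose), and Z_{t-1} is the depth of the 3D point X in frame t-1.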

To detect moving objects in the image, a straightforward idea is to compensate for the camera motion according to equation

(1)
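Assuming the standard pinhole model and the symbols defined above (K, R, tr, Z_{t-1}), the relation that equation (1) refers to can be reconstructed as follows. This is a derivation from those definitions rather than a copy of the paper's typesetting; X is the static 3D point in the camera frame at time t-1, and Z_t, its depth at time t, is a name introduced here.

```latex
\begin{aligned}
\mathbf{p}_{t-1} &= \frac{1}{Z_{t-1}}\,\mathbf{K}\,\mathbf{X},
\qquad
\mathbf{p}_{t} = \frac{1}{Z_{t}}\,\mathbf{K}\,(\mathbf{R}\,\mathbf{X} + \mathbf{tr}), \\
\mathbf{X} &= Z_{t-1}\,\mathbf{K}^{-1}\,\mathbf{p}_{t-1}
\;\;\Longrightarrow\;\;
Z_{t}\,\mathbf{p}_{t} = Z_{t-1}\,\mathbf{K}\mathbf{R}\mathbf{K}^{-1}\,\mathbf{p}_{t-1} + \mathbf{K}\,\mathbf{tr}, \\
\mathbf{p}_{t} &\simeq \mathbf{K}\mathbf{R}\mathbf{K}^{-1}\,\mathbf{p}_{t-1} + \frac{\mathbf{K}\,\mathbf{tr}}{Z_{t-1}}
\end{aligned}
```

Here p_{t-1} and p_t are homogeneous pixel coordinates, and the last line holds up to the scale factor Z_t / Z_{t-1}, which is removed by dividing by the third component. The predicted position of a static pixel therefore depends only on the camera motion and on the pixel's depth.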

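The residual image is then computed as the difference between the current image and the motion-compensated previous image. It highlights both the pixels that belong to moving objects and the pixels affected by motion estimation errors. For clarity, we first define three flow-based quantities:

Global Image Motion Flow (GIMF): the predicted image change caused by the camera motion alone, which can be computed with equation (1).

Measured Optical Flow (MOF): the actual dense optical flow estimated with image processing techniques [23].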

Residual Image Motion Flow (RIMF): the difference between the MOF and the GIMF.
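The three definitions above can be sketched in a few lines of NumPy. This is an illustrative sketch under assumptions, not the authors' implementation: it takes a dense depth map, the intrinsic matrix K and a known motion (R, t) with X_t = R X_{t-1} + t, and it ignores the uncertainty handling that the paper adds on top. All function and variable names are made up.

```python
import numpy as np

def global_image_motion_flow(depth, K, R, t):
    """GIMF: per-pixel flow (H, W, 2) predicted from camera motion alone."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels (H, W, 3)
    rays = pix @ np.linalg.inv(K).T                    # K^-1 p for every pixel
    pts_prev = rays * depth[..., None]                 # 3D points in frame t-1
    pts_curr = pts_prev @ R.T + t                      # same points in frame t
    proj = pts_curr @ K.T                              # back to the image plane
    uv_curr = proj[..., :2] / proj[..., 2:3]           # divide by the depth at time t
    return uv_curr - pix[..., :2]

def residual_image_motion_flow(measured_flow, depth, K, R, t):
    """RIMF = MOF - GIMF; close to zero for static pixels."""
    return measured_flow - global_image_motion_flow(depth, K, R, t)

# Toy example: flat scene 10 m away, camera drives 1 m forward.
K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
depth = np.full((4, 6), 10.0)
R, t = np.eye(3), np.array([0.0, 0.0, -1.0])
gimf = global_image_motion_flow(depth, K, R, t)

mof = gimf.copy()
mof[1, 2] += [3.0, 0.0]                                # one pixel moves on its own
rimf = residual_image_motion_flow(mof, depth, K, R, t)
moving = np.linalg.norm(rimf, axis=-1) > 1.0           # naive fixed threshold
```

The fixed threshold in the last line is a simplification for the toy example. The paper instead converts the residual into a motion likelihood that accounts for the ego-motion and disparity uncertainties.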

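The RIMF can be used to distinguish pixels that belong to moving objects from pixels that belong to static ones. To compute the RIMF, the MOF and the GIMF must be computed first. Note that the latter requires the camera motion (ego-motion) and the depth of each pixel. The paper does not address how to compute the dense optical flow [23] and the disparity map [24]: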

[23] C. Liu, Beyond pixels: exploring new representations and applications for motion analysis. PhD thesis, Massachusetts Institute of Technology, 2009.

[24] A. Geiger, M. Roser, and R. Urtasun, "Efficient large-scale stereo matching," in Asian Conference on Computer Vision, pp. 25-38, Springer, 2010.

[25] C. Vogel, K. Schindler, and S. Roth, "3D scene flow estimation with a piecewise rigid scene model," International Journal of Computer Vision, vol. 115, no. 1, pp. 1-28, 2015.

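More precisely, the authors use the method proposed in [25] to compute the dense optical flow and the dense disparity map, and feed both directly into the system as inputs. The whole system can be summarized in three steps:

1. Moving pixel detection. Moving pixels are detected by compensating for the image changes caused by the camera motion. The uncertainty of the camera motion is taken into account to improve the detection results.

2. Moving object segmentation. After moving pixel detection, a graph-cut-based algorithm uses color and disparity information to remove false detections.

3. Bounding box generation. Finally, a bounding box is generated for each moving object by analyzing the U-V disparity maps.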

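Figure 1. Coordinate frames of the stereo vision setup

Figure 2. Framework of the moving object detection and segmentation system

The red part computes the motion likelihood of each pixel.

The blue part is the graph-cut-based moving object segmentation.

The green part is the post-processing step that generates a bounding box for each moving object.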

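We start with moving pixel detection.

Of the four images in the two consecutive stereo frames of Figure 1 (times t-1 and t), the left image at time t-1, I_(t-1,L), is taken as the reference image. The definitions are as follows:

Next come ego-motion estimation and uncertainty computation.

Given a set of corresponding points in the four images of two consecutive frames, the relative camera pose can be estimated by minimizing the sum of the reprojection errors with a nonlinear minimization method.

First, the 3D feature points of the previous frame are reconstructed by triangulation, using the camera intrinsic parameters. These 3D points are then reprojected onto the images of the current frame using the camera motion, as follows: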

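(2)

where

is the pixel position in the current frame predicted from the 3D point reconstructed in the previous frame,

is the corresponding pixel position in the previous frame,

Θ is the vector of the six-degree-of-freedom relative pose between the two frames, and

Pr^l and Pr^r are the projections of a 3D point onto the left and right cameras, in non-homogeneous pixel coordinates.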
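The triangulate-then-reproject step can be sketched as follows. This is a hedged illustration, not the paper's code: it assumes the pose vector is a rotation vector followed by a translation, that a feature is stored as (u, v, d) in the left image with disparity d, and that the right camera sits at +b along the X-axis. The function names are made up.

```python
import numpy as np

def rodrigues(rvec):
    """Rotation matrix from a rotation vector (axis times angle)."""
    rvec = np.asarray(rvec, dtype=float)
    angle = np.linalg.norm(rvec)
    if angle < 1e-12:
        return np.eye(3)
    k = rvec / angle
    Kx = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * Kx + (1.0 - np.cos(angle)) * (Kx @ Kx)

def triangulate(u, v, d, f, u0, v0, b):
    """3D point in the left camera frame from a left pixel (u, v) and disparity d."""
    Z = f * b / d
    return np.array([(u - u0) * Z / f, (v - v0) * Z / f, Z])

def reproject(theta, X_prev, f, u0, v0, b):
    """Predict (u_left, v_left, u_right, v_right) in the current frame.

    theta: assumed layout [rx, ry, rz, tx, ty, tz]
    """
    theta = np.asarray(theta, dtype=float)
    X = rodrigues(theta[:3]) @ X_prev + theta[3:]    # point in the current left frame
    u_l = f * X[0] / X[2] + u0
    v_l = f * X[1] / X[2] + v0
    u_r = f * (X[0] - b) / X[2] + u0                 # right camera shifted by the baseline
    return np.array([u_l, v_l, u_r, v_l])            # rectified pair: same row in both views

f, u0, v0, b = 700.0, 320.0, 240.0, 0.5
X_prev = triangulate(390.0, 240.0, 35.0, f, u0, v0, b)
same = reproject(np.zeros(6), X_prev, f, u0, v0, b)
closer = reproject([0, 0, 0, 0, 0, -5.0], X_prev, f, u0, v0, b)
```

With zero motion the feature reprojects onto itself, and moving the camera 5 m forward doubles its disparity, as expected from Z = f b / d.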

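In general, the optimal camera motion vector Θ̂ is obtained by minimizing the weighted squared error between the measurements and the predictions:

(3)

are the matched points in the current frame, obtained with a tracking-and-matching strategy, and

denotes the Mahalanobis distance with respect to the covariance matrix Σ.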
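The covariance-weighted error can be sketched in a few lines. This shows only the cost being minimized, under the assumption that every feature shares one measurement covariance Σ; the function names are made up, and the nonlinear solver that searches over Θ is left out.

```python
import numpy as np

def mahalanobis_sq(residual, Sigma):
    """Squared Mahalanobis distance r^T Sigma^-1 r."""
    residual = np.asarray(residual, dtype=float)
    return float(residual @ np.linalg.solve(Sigma, residual))

def weighted_reprojection_cost(measured, predicted, Sigma):
    """Sum over all features of ||x_i - x_hat_i||^2 weighted by Sigma^-1."""
    diffs = np.asarray(measured, dtype=float) - np.asarray(predicted, dtype=float)
    return sum(mahalanobis_sq(r, Sigma) for r in diffs)

Sigma = np.diag([9.0, 16.0])          # variances of 3^2 and 4^2 pixels
cost = weighted_reprojection_cost([[3.0, 4.0], [0.0, 0.0]],
                                  [[0.0, 0.0], [0.0, 2.0]],
                                  Sigma)
```

Dividing by the covariance means that a residual along a noisy measurement direction counts for less than the same residual along a precise one.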

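From the above, the optimal motion vector Θ̂ can be obtained from equation (3), but its accuracy depends on the precision of feature matching and tracking between the images.

(4)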

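For reasons of length, the rest is covered in "Moving Object Detection Paper (2)". With the framework diagram in mind, the idea of the paper can be summarized as follows. A stereo camera gives the depth of each feature point, so the formulas above predict where a feature point from the previous frame should appear in the current frame. The displacement of that point is the Global Image Motion Flow (GIMF) described earlier. An optical flow method such as KLT then provides the Measured Optical Flow (MOF). For static objects the two flows agree up to noise, whereas for moving objects they differ, and this difference is the Residual Image Motion Flow (RIMF). The main idea is to use this residual to decide whether a feature point is static or dynamic. The paper uses further techniques to improve the detection accuracy, but this is the core idea. The next post covers the remaining technical details of the paper.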
