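Distinctive Image Features from Scale-Invariant Keypoints is the classic paper on the SIFT algorithm in image recognition, and it was the first reading my advisor assigned me. I searched online for a Chinese translation and could not find one, so I decided to do it myself: take the time to translate it, work through it properly, and keep my notes here, in the hope that they help whoever comes next.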
--------------------------------------------------------------------------------------------------------
Distinctive Image Features from Scale-Invariant Keypoints
Abstract
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
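The paper shows how to extract distinctive, invariant features from an image so that different views of the same object or scene can be matched reliably. The features do not change under image scaling or rotation, and they still match robustly under a substantial range of affine distortion, a change of 3D viewpoint, added noise, and a change in illumination. Each feature is distinctive enough that, on its own, it can be matched correctly with high probability against a large database of features taken from many images. The paper also describes how to use the features for object recognition: individual features are matched against a database of features from known objects with a fast nearest-neighbor algorithm, a Hough transform then identifies the clusters of matches that belong to a single object, and a least-squares solution for consistent pose parameters verifies each cluster. Recognition done this way picks out objects amid clutter and occlusion while running in near real time.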
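[Note] SIFT can match images of an object effectively across different viewpoints, different lighting, and added noise, and the matching is done between one image and a whole collection of images. The paper also gives a recognition pipeline: match features with a fast nearest-neighbor search, cluster the matches with a Hough transform, then verify each cluster with a least-squares fit.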
1. Introduction
Image matching is a fundamental aspect of many problems in computer vision, including object or scene recognition, solving for 3D structure from multiple images, stereo correspondence, and motion tracking. This paper describes image features that have many properties that make them suitable for matching differing images of an object or scene. The features are invariant to image scaling and rotation, and partially invariant to change in illumination and 3D camera viewpoint. They are well localized in both the spatial and frequency domains, reducing the probability of disruption by occlusion, clutter, or noise. Large numbers of features can be extracted from typical images with efficient algorithms. In addition, the features are highly distinctive, which allows a single feature to be correctly matched with high probability against a large database of features, providing a basis for object and scene recognition.
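Image matching sits at the root of many computer vision problems, including object and scene recognition, recovering 3D structure from multiple images, stereo correspondence, and motion tracking. The paper describes image features whose properties make them well suited to matching different images of the same object or scene. They are invariant to image scaling and rotation, and partially invariant to changes in illumination and in 3D camera viewpoint. They are well localized in both the spatial and the frequency domain, which lowers the chance that occlusion, clutter, or noise disrupts them. Efficient algorithms can extract large numbers of them from a typical image. They are also highly distinctive, so a single feature can be matched correctly with high probability against a large database of features, and that is what provides a basis for object and scene recognition.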
The cost of extracting these features is minimized by taking a cascade filtering approach, in which the more expensive operations are applied only at locations that pass an initial test. Following are the major stages of computation used to generate the set of image features:
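A cascade filtering approach keeps the cost of extracting the features low: the more expensive operations run only at locations that have already passed an initial test. (Here "cascade" means a chain of successively stricter stages, not a convolution filter.) The main stages of computation that generate the set of image features are: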
1. Scale-space extrema detection: The first stage of computation searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation.
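1. Scale-space extrema detection: the first stage searches over all scales and image locations. A difference-of-Gaussian function makes this efficient, identifying potential interest points that are invariant to scale and orientation.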
2. Keypoint localization: At each candidate location, a detailed model is fit to determine location and scale. Keypoints are selected based on measures of their stability.
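2. Keypoint localization: at each candidate location, a detailed model is fitted to determine location and scale. Keypoints are then selected according to measures of their stability.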
3. Orientation assignment: One or more orientations are assigned to each keypoint location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations.
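3. Orientation assignment: one or more orientations are assigned to each keypoint location from the local image gradient directions. Every later operation works on image data that has been transformed relative to the feature's assigned orientation, scale, and location, which is what makes the features invariant to these transformations.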
4. Keypoint descriptor: The local image gradients are measured at the selected scale in the region around each keypoint. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination.
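4. Keypoint descriptor: the local image gradients are measured at the selected scale in the region around each keypoint, then transformed into a representation that tolerates significant local shape distortion and change in illumination.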
This approach has been named the Scale Invariant Feature Transform (SIFT), as it transforms image data into scale-invariant coordinates relative to local features.
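The approach is called the Scale Invariant Feature Transform (SIFT) because it transforms image data into scale-invariant coordinates relative to local features.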
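To make stage 1 concrete, here is a minimal sketch of difference-of-Gaussian extrema detection. It is deliberately simplified to a 1D signal in pure Python so that it stays short; the paper works on 2D images, and the function names (`gaussian_kernel`, `blur`, `dog_extrema`) and the choice of scales are my own assumptions for illustration, not the paper's implementation.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel, truncated at about 3 sigma (an assumption)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Convolve a 1D signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def dog_extrema(signal, sigmas):
    """Return (position, scale_index, value) for every point of the
    difference-of-Gaussian stack that is strictly larger or strictly smaller
    than all 8 of its neighbors in position and scale."""
    blurred = [blur(signal, s) for s in sigmas]
    # Each DoG level is the difference of two adjacent blur levels.
    dog = [[hi - lo for lo, hi in zip(a, b)]
           for a, b in zip(blurred, blurred[1:])]
    found = []
    for s in range(1, len(dog) - 1):            # interior scales only
        for x in range(1, len(signal) - 1):     # interior positions only
            value = dog[s][x]
            neighbors = [dog[s + ds][x + dx]
                         for ds in (-1, 0, 1) for dx in (-1, 0, 1)
                         if (ds, dx) != (0, 0)]
            if value > max(neighbors) or value < min(neighbors):
                found.append((x, s, value))
    return found
```

Feeding it a single smooth bump and a geometric series of scales such as `[1.6 ** k for k in range(6)]` should report its strongest extremum at the center of the bump. The later stages (localization, orientation, descriptor) would then run only at the points returned here, which is the cascade idea described earlier.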
An important aspect of this approach is that it generates large numbers of features that densely cover the image over the full range of scales and locations. A typical image of size 500x500 pixels will give rise to about 2000 stable features (although this number depends on both image content and choices for various parameters). The quantity of features is particularly important for object recognition, where the ability to detect small objects in cluttered backgrounds requires that at least 3 features be correctly matched from each object for reliable identification.
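An important property of the approach is that it generates large numbers of features that densely cover the image across the full range of scales and locations. A typical 500x500 pixel image yields about 2000 stable features, although the number depends on the image content and on the parameter choices. The quantity matters especially for object recognition: to detect small objects against a cluttered background, at least 3 features from each object must be matched correctly for the identification to be reliable.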
For image matching and recognition, SIFT features are first extracted from a set of reference images and stored in a database. A new image is matched by individually comparing each feature from the new image to this previous database and finding candidate matching features based on Euclidean distance of their feature vectors. This paper will discuss fast nearest-neighbor algorithms that can perform this computation rapidly against large databases.
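For image matching and recognition, SIFT features are first extracted from a set of reference images and stored in a database. A new image is matched by comparing each of its features individually against that database and picking candidate matches by the Euclidean distance between feature vectors. The paper goes on to discuss fast nearest-neighbor algorithms that perform this computation quickly against large databases.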
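The matching step can be sketched as a plain linear scan. This is only an illustration of "closest feature vector by Euclidean distance": the brute-force loop stands in for the fast nearest-neighbor algorithms the paper discusses later, and the function name and the `(label, descriptor)` database layout are hypothetical.

```python
import math

def nearest_neighbor_matches(query_descriptors, database):
    """For each query descriptor, find the database entry at the smallest
    Euclidean distance.

    database is a list of (label, descriptor) pairs (a hypothetical layout).
    Returns a list of (query_index, best_label, best_distance).
    """
    matches = []
    for query_index, query in enumerate(query_descriptors):
        best_label, best_distance = None, math.inf
        for label, descriptor in database:
            distance = math.dist(query, descriptor)  # Euclidean distance
            if distance < best_distance:
                best_label, best_distance = label, distance
        matches.append((query_index, best_label, best_distance))
    return matches
```

The scan costs one distance computation per database entry for every query feature, which is why a large database needs something faster than this.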
The keypoint descriptors are highly distinctive, which allows a single feature to find its correct match with good probability in a large database of features. However, in a cluttered image, many features from the background will not have any correct match in the database, giving rise to many false matches in addition to the correct ones. The correct matches can be filtered from the full set of matches by identifying subsets of keypoints that agree on the object and its location, scale, and orientation in the new image. The probability that several features will agree on these parameters by chance is much lower than the probability that any individual feature match will be in error. The determination of these consistent clusters can be performed rapidly by using an efficient hash table implementation of the generalized Hough transform.
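The keypoint descriptors are highly distinctive, so a single feature has a good probability of finding its correct match in a large database of features. In a cluttered image, however, many background features have no correct match in the database at all, and they produce many false matches alongside the correct ones. The correct matches can be filtered out of the full set by identifying subsets of keypoints that agree on the object and on its location, scale, and orientation in the new image. Several features agreeing on these parameters by chance is far less likely than any single feature match being wrong. These consistent clusters can be found rapidly with an efficient hash table implementation of the generalized Hough transform.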
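A rough sketch of the hash-table voting idea: every match predicts a pose for its object, the pose is quantized into a bin, and the bin key indexes a dictionary. The bin widths used here (32 pixels, one octave of scale, 30 degrees) and the tuple layout of a match are assumptions for illustration, and each match votes for a single bin only.

```python
import math
from collections import defaultdict

def hough_clusters(matches, loc_bin=32.0, ori_bin=30.0, min_votes=3):
    """Group matches whose predicted poses fall into the same bin.

    Each match is (object_id, x, y, scale, orientation_degrees), a
    hypothetical layout; scale must be positive.
    Returns lists of match indices, one list per bin with enough votes.
    """
    bins = defaultdict(list)  # the hash table: pose bin -> match indices
    for index, (obj, x, y, scale, orientation) in enumerate(matches):
        key = (
            obj,
            math.floor(x / loc_bin),
            math.floor(y / loc_bin),
            math.floor(math.log2(scale)),            # one bin per octave
            math.floor((orientation % 360.0) / ori_bin),
        )
        bins[key].append(index)
    return [members for members in bins.values() if len(members) >= min_votes]
```

Only the bins that actually receive votes are ever stored, so memory does not grow with the size of the full pose space. The default `min_votes=3` mirrors the requirement of at least 3 agreeing features.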
Each cluster of 3 or more features that agree on an object and its pose is then subject to further detailed verification. First, a least-squared estimate is made for an affine approximation to the object pose. Any other image features consistent with this pose are identified, and outliers are discarded. Finally, a detailed computation is made of the probability that a particular set of features indicates the presence of an object, given the accuracy of fit and number of probable false matches. Object matches that pass all these tests can be identified as correct with high confidence.
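Each cluster of 3 or more features that agree on an object and its pose then goes through more detailed verification. First, a least-squares estimate is made of an affine approximation to the object pose. Any other image features consistent with that pose are identified, and outliers are discarded. Finally, a detailed computation gives the probability that a particular set of features indicates the presence of an object, given the accuracy of the fit and the number of probable false matches. Object matches that pass all of these tests can be identified as correct with high confidence.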
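The least-squares step can be sketched as follows, assuming the affine model u = m1*x + m2*y + tx, v = m3*x + m4*y + ty that maps a model point (x, y) to an image point (u, v). The parameter names and the helper functions are my own, and the normal equations are solved with a small hand-written elimination only to keep the example free of dependencies.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (collinear points?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_affine(model_points, image_points):
    """Least-squares affine map from model (x, y) to image (u, v).

    Returns (m1, m2, m3, m4, tx, ty). Needs at least 3 correspondences.
    """
    if len(model_points) < 3 or len(model_points) != len(image_points):
        raise ValueError("need at least 3 matching point pairs")
    rows = [(x, y, 1.0) for x, y in model_points]
    # Normal equations A^T A p = A^T b; A^T A is shared by u and v.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    solved = []
    for coord in (0, 1):
        atb = [sum(r[i] * p[coord] for r, p in zip(rows, image_points))
               for i in range(3)]
        solved.append(solve3(ata, atb))
    (m1, m2, tx), (m3, m4, ty) = solved
    return (m1, m2, m3, m4, tx, ty)

def apply_affine(params, point):
    """Map a model point into the image with the fitted parameters."""
    m1, m2, m3, m4, tx, ty = params
    x, y = point
    return (m1 * x + m2 * y + tx, m3 * x + m4 * y + ty)
```

With the fit in hand, comparing `apply_affine(params, model_point)` against the matched image point gives a residual for each match. Matches with large residuals are the outliers to discard before refitting.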