1. Introduction
Image stitching is a direction where much of classical computer vision comes together. The main steps involved are feature point extraction, feature matching, image registration, and image blending. Figure 1.1 below shows the flow chart of OpenCV's stitching pipeline. Stitching touches many lines of research; for feature extraction alone there are the commonly used SIFT, SURF, and ORB, which are also applied very widely in SLAM. If you have the time, working out how these methods are implemented is well worth the effort for building up your own knowledge of the field.

2. opencv stitcher
OpenCV provides a ready-made stitching class, cv::Stitcher; essentially one interface call performs all of the stitching steps and returns the stitched image. The test images used below are included for reference.
2.1 Example code
The following code shows how to call the interface:
#include "opencv2/opencv.hpp"
#include "logging.hpp" // project-local logging header
#include <string>
#include <vector>

void stitchImg(const std::vector<cv::Mat>& imgs, cv::Mat& pano)
{
    // Warp mode of the stitcher; there are two modes, PANORAMA and SCANS.
    // PANORAMA: images are projected onto a sphere or cylinder before stitching.
    // SCANS: no exposure compensation or cylindrical projection by default;
    //        images are stitched with affine transforms only.
    cv::Stitcher::Mode mode = cv::Stitcher::PANORAMA;
    cv::Ptr<cv::Stitcher> stitcher = cv::Stitcher::create(mode);
    cv::Stitcher::Status status = stitcher->stitch(imgs, pano);
    if (cv::Stitcher::OK != status) {
        LOG(INFO) << "failed to stitch images, err code: " << (int)status;
    }
}

int main(int argc, char* argv[])
{
    std::string pic_path = "data/img/*";
    std::string pic_pattern = ".jpg";
    if (2 == argc) {
        pic_path = std::string(argv[1]);
    } else if (3 == argc) {
        pic_path = std::string(argv[1]);
        pic_pattern = std::string(argv[2]);
    } else {
        LOG(INFO) << "default value";
    }
    std::vector<cv::String> img_names;
    std::vector<cv::Mat> imgs;
    pic_pattern = pic_path + pic_pattern;
    cv::glob(pic_pattern, img_names);
    if (img_names.empty()) {
        LOG(INFO) << "no images";
        return -1;
    }
    for (size_t i = 0; i < img_names.size(); ++i) {
        cv::Mat img = cv::imread(img_names[i]);
        if (img.empty()) {
            LOG(INFO) << "failed to read " << img_names[i];
            continue;
        }
        imgs.push_back(img.clone());
    }
    cv::Mat pano;
    stitchImg(imgs, pano);
    if (!pano.empty()) {
        cv::imshow("pano", pano);
        cv::waitKey(0);
    }
    return 0;
}
2.2 Example results
- mode = PANORAMA
[Figure: CMU scene stitching 1]
- mode = SCANS
[Figure: CMU scene stitching 2]
The two CMU scene comparisons above illustrate the difference between PANORAMA and SCANS: the former projects the images onto a cylinder, so the resulting panorama shows visible bending, while SCANS applies only affine transforms, so straight lines and their parallelism in the source images are largely preserved.
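To reproduce the second result, only the mode passed to the factory changes; everything else in the example above stays the same:

// SCANS mode: affine-only stitching, no cylindrical projection by default.
cv::Ptr<cv::Stitcher> stitcher = cv::Stitcher::create(cv::Stitcher::SCANS);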
3. A simplified stitcher
This section lays some groundwork. Before looking at the internals of the OpenCV stitcher, let's imitate its SCANS mode with a bare-bones implementation and see how well it stitches. The basic idea:
- extract and match features to find the correspondences between the images;
- estimate the transform matrix that aligns the images: pick the ten strongest matches, draw them, and use three visually verified correct matches to estimate the affine transform;
- allocate a canvas whose width is the sum of all image widths and whose height is the maximum of all image heights, initialized to zero;
- project the strongest matching point onto the canvas and use it as the junction between the left and right images;
- take the right image as the reference, i.e. warp the left image and then blend it with the right one.
3.1 Feature extraction
The common feature extractors are SIFT, SURF, and ORB. ORB is the fastest and is also used a lot in other vision tasks, but its accuracy is lower than that of the other two.
#include "opencv2/opencv.hpp"
#include <algorithm>
#include <vector>

void featureExtract(const std::vector<cv::Mat> &imgs,
                    std::vector<std::vector<cv::KeyPoint>> &keyPoints,
                    std::vector<cv::Mat> &imageDescs)
{
    keyPoints.clear();
    imageDescs.clear();
    // Extract feature points; keep at most 800 per image.
    int maxFeatureNum = 800;
    cv::Ptr<cv::ORB> orbDetector = cv::ORB::create(maxFeatureNum);
    for (size_t i = 0; i < imgs.size(); ++i) {
        std::vector<cv::KeyPoint> keyPoint;
        // Convert to grayscale before detection.
        cv::Mat image;
        cv::cvtColor(imgs[i], image, cv::COLOR_BGR2GRAY);
        orbDetector->detect(image, keyPoint);
        keyPoints.push_back(keyPoint);
        cv::Mat imageDesc;
        orbDetector->compute(image, keyPoint, imageDesc);
        /* The descriptors must be converted to float, otherwise matching fails
         * with "Unsupported format or combination of formats in buildIndex"
         * when using the FLANN algorithm. */
        imageDesc.convertTo(imageDesc, CV_32F);
        imageDescs.push_back(imageDesc.clone());
    }
}
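To sanity-check the extraction before matching, the detected keypoints can be drawn onto the source image; a minimal sketch (not part of the pipeline itself):

// Visualize the keypoints of the first image.
cv::Mat vis;
cv::drawKeypoints(imgs[0], keyPoints[0], vis);
cv::imshow("keypoints", vis);
cv::waitKey(0);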
3.2 Feature matching
This step establishes the pairing between the feature points of the two images, from which the transform matrix H is solved. This H transforms the entire image; to deal with parallax, some work divides the image into a grid and computes a separate H for each cell.
// Number of strongest matches to keep; the value 10 follows the plan above.
constexpr int MAX_OPTIMAL_POINT_NUM = 10;

void featureMatching(const std::vector<cv::Mat> &imgs,
                     const std::vector<std::vector<cv::KeyPoint>> &keyPoints,
                     const std::vector<cv::Mat> &imageDescs,
                     std::vector<std::vector<cv::Point2f>> &optimalMatchePoint)
{
    optimalMatchePoint.clear();
    // Match the descriptors and keep the best pairs. The images are assumed
    // to come in left-to-right order; this test code assumes exactly two.
    cv::FlannBasedMatcher matcher;
    std::vector<cv::DMatch> matchePoints;
    matcher.match(imageDescs[0], imageDescs[1], matchePoints, cv::Mat());
    std::sort(matchePoints.begin(), matchePoints.end()); // sort by distance
    // Collect the N best matches.
    std::vector<cv::Point2f> imagePoints1, imagePoints2;
    for (int i = 0; i < MAX_OPTIMAL_POINT_NUM && i < (int)matchePoints.size(); i++) {
        imagePoints1.push_back(keyPoints[0][matchePoints[i].queryIdx].pt);
        imagePoints2.push_back(keyPoints[1][matchePoints[i].trainIdx].pt);
    }
    // Indices 0, 3 and 6 were verified visually to be correct matches.
    optimalMatchePoint.push_back(std::vector<cv::Point2f>{
        imagePoints1[0], imagePoints1[3], imagePoints1[6]});
    optimalMatchePoint.push_back(std::vector<cv::Point2f>{
        imagePoints2[0], imagePoints2[3], imagePoints2[6]});
}
With ORB features there are many mismatches here; the three points above are ones that the drawn matches showed to be correct, and they will be used to estimate the affine transform H. OpenCV internally estimates the transform with RANSAC; I have skipped that step in this simplified version.
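If you would rather not hand-pick inliers, the RANSAC step can be done directly on all of the top-N matches. A minimal sketch, assuming the imagePoints1 and imagePoints2 vectors collected inside featureMatching are made available:

// Robustly estimate a 2x3 affine matrix between the two point sets;
// 'inliers' marks which of the matches RANSAC accepted.
std::vector<uchar> inliers;
cv::Mat H = cv::estimateAffine2D(imagePoints1, imagePoints2, inliers,
                                 cv::RANSAC, 3.0 /* reprojection threshold in px */);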
3.3 Estimating the affine transform
The previous step produced the three strongest matches, from which H can be computed directly. Before computing it, the matching points of the right image are first shifted to the right side of the canvas.
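As a reminder of why exactly three point pairs are needed: a 2D affine transform has six unknowns, and each pair contributes two equations,

$$
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
+
\begin{pmatrix} t_x \\ t_y \end{pmatrix},
$$

so three non-collinear pairs determine the matrix exactly, which is what cv::getAffineTransform solves for.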
void getAffineMat(std::vector<std::vector<cv::Point2f>>& optimalMatchePoint,
                  int left_cols, std::vector<cv::Mat>& Hs)
{
    std::vector<cv::Point2f> newMatchingPt;
    for (size_t i = 0; i < optimalMatchePoint[1].size(); i++) {
        cv::Point2f pt = optimalMatchePoint[1][i];
        pt.x += left_cols;
        newMatchingPt.push_back(pt);
    }
    // Transform of the left image: after the right image's points are shifted,
    // the left image must map onto those shifted point positions on the canvas.
    cv::Mat homo1 = cv::getAffineTransform(optimalMatchePoint[0], newMatchingPt);
    // Transform of the right image: a pure shift to the right side of the canvas.
    cv::Mat homo2 = cv::getAffineTransform(optimalMatchePoint[1], newMatchingPt);
    Hs.push_back(homo1);
    Hs.push_back(homo2);
}
3.4 Compositing the images
Once the transforms are known, the strongest-response feature point is taken as the blend center of the two images; the canvas to its left comes from the left image and to its right from the right image. This is a very crude way to composite: for images captured with pure translation it more or less holds together, but with rotation, or when the optical centers are not aligned during capture, the misalignment becomes severe. Blending is the other weak point: a single hard seam line decides which source image each pixel comes from, so the transition is not smooth and the seam shows.
void getPano2(std::vector<cv::Mat> &imgs, const std::vector<cv::Mat> &H,
              cv::Point2f &optimalPt, cv::Mat &pano)
{
    // Take the right image as the reference: warp the left image so it lines up
    // with the right one, and use the strongest-response feature point as the
    // junction between the two images.
    // Default panorama canvas size:
    //   width  = left.width + right.width
    //   height = max(left.height, right.height)
    int pano_width = imgs[0].cols + imgs[1].cols;
    int pano_height = std::max(imgs[0].rows, imgs[1].rows);
    pano = cv::Mat::zeros(cv::Size(pano_width, pano_height), CV_8UC3);
    cv::Mat img_trans0 = cv::Mat::zeros(pano.size(), CV_8UC3);
    cv::Mat img_trans1 = cv::Mat::zeros(pano.size(), CV_8UC3);
    // After the affine warp, each source image sits at its place on the canvas.
    cv::warpAffine(imgs[0], img_trans0, H[0], pano.size());
    cv::warpAffine(imgs[1], img_trans1, H[1], pano.size());
    // Strongest-response feature point in homogeneous coordinates...
    cv::Mat trans_pt = (cv::Mat_<double>(3, 1) << optimalPt.x, optimalPt.y, 1.0);
    // ...projected onto the canvas (H[0] is 2x3, so the product is 2x1).
    trans_pt = H[0] * trans_pt;
    int cx = static_cast<int>(trans_pt.at<double>(0, 0));
    // The two regions tile the canvas exactly: their widths sum to pano_width.
    cv::Rect left_roi = cv::Rect(0, 0, cx, pano_height);
    cv::Rect right_roi = cv::Rect(cx, 0, pano_width - cx, pano_height);
    // Copy the selected region of each warped image onto the canvas.
    img_trans0(left_roi).copyTo(pano(left_roi));
    img_trans1(right_roi).copyTo(pano(right_roi));
    cv::imshow("pano", pano);
    cv::waitKey(0);
}
int main(int argc, char *argv[])
{
    cv::Mat image01 = cv::imread("data/img/medium11.jpg");
    cv::Mat image02 = cv::imread("data/img/medium12.jpg");
    if (image01.empty() || image02.empty()) {
        return -1;
    }
    // Stretch each input by one row (test-data preparation).
    cv::resize(image01, image01, cv::Size(image01.cols, image01.rows + 1));
    cv::resize(image02, image02, cv::Size(image02.cols, image02.rows + 1));
    std::vector<cv::Mat> imgs = {image01, image02};
    std::vector<std::vector<cv::KeyPoint>> keyPoints;
    std::vector<std::vector<cv::Point2f>> optimalMatchePoint;
    std::vector<cv::Mat> imageDescs;
    featureExtract(imgs, keyPoints, imageDescs);
    featureMatching(imgs, keyPoints, imageDescs, optimalMatchePoint);
    // Estimate the two affine transforms (see getAffineMat above).
    std::vector<cv::Mat> Hs;
    getAffineMat(optimalMatchePoint, imgs[0].cols, Hs);
    cv::Mat pano;
    getPano2(imgs, Hs, optimalMatchePoint[0][0], pano);
    return 0;
}
3.5 Results of the simplified stitcher
- Stitching result with translation-only motion
[Figure 3.5.1]
- Stitching result with rotation
[Figure 3.5.2]
Not much of a result: Figure 3.5.2 clearly shows the misalignment, and the whole left side of the panorama leans visibly. The red box on the left marks the region of the left image, and the red line in the middle marks the boundary between the two images. The misalignment has several causes: there is no proper blending transition, the camera's rotation is not modeled, and the seam position is poorly chosen. The tilted, unnatural look comes from picking a single image as the reference and transforming all the others into its coordinate frame.
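One of the missing pieces called out above is a smooth transition. A minimal feather-style sketch, assuming the warped canvases img_trans0 and img_trans1 from getPano2: instead of cutting hard at the seam column cx, blend linearly over a band around it.

// Linear (feather) blending over a band of 2*half_band columns centered on the
// seam column cx; outside the band the hard cut from getPano2 is kept.
void featherSeam(const cv::Mat &img_trans0, const cv::Mat &img_trans1,
                 int cx, int half_band, cv::Mat &pano)
{
    for (int y = 0; y < pano.rows; ++y) {
        for (int x = std::max(0, cx - half_band);
             x < std::min(pano.cols, cx + half_band); ++x) {
            // Weight for the left image: 1 at the left edge of the band,
            // falling to 0 at the right edge.
            float w = (cx + half_band - x) / (2.0f * half_band);
            const cv::Vec3b &l = img_trans0.at<cv::Vec3b>(y, x);
            const cv::Vec3b &r = img_trans1.at<cv::Vec3b>(y, x);
            for (int c = 0; c < 3; ++c) {
                pano.at<cv::Vec3b>(y, x)[c] =
                    cv::saturate_cast<uchar>(w * l[c] + (1.0f - w) * r[c]);
            }
        }
    }
}

A proper fix would weight by the distance to each image's valid region, as OpenCV's cv::detail::FeatherBlender does, but even this simple band removes the visible hard edge.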
4. The opencv stitcher module
OpenCV's sample code includes stitching_detailed.cpp, which walks through every module of the pipeline. Real applications usually demand real-time stitching, and simply calling the high-level interface cannot meet that, especially on embedded ARM targets, so we need to understand the implementation in order to find optimization points. I am only interested in parts of stitching_detailed.cpp here, so I stripped out the timing statistics and the scaled search for the blend region.
4.1 Parameter overview
stitching_detailed.cpp exposes a large number of configuration parameters. As the flow chart in Figure 1.1 shows, the main stages of the OpenCV stitcher are:
- registration
  - feature extraction
  - feature matching
  - image registration
  - camera parameter estimation
  - wave correction
- compositing
  - image warping
  - exposure compensation
  - seam finding
  - image blending
The registration part obtains the matching relations between the images, estimates the cameras' intrinsic and extrinsic parameters, and refines them with bundle adjustment; in essence it determines the stitching order and the transform matrices. The compositing part then warps and blends the images with those parameters and applies algorithms such as exposure compensation to improve the visual consistency of the result. The parameters are listed below:
static void printUsage(char** argv)
{
cout <<
"Rotation model images stitcher.\n\n"
<< argv[0] << " img1 img2 [...imgN] [flags]\n\n"
"Flags:\n"
" --preview\n"
" Run stitching in the preview mode. Works faster than usual mode,\n"
" but output image will have lower resolution.\n"
" --try_cuda (yes|no)\n"
" Try to use CUDA. The default value is 'no'. All default values\n"
" are for CPU mode.\n"
"\nMotion Estimation Flags:\n"
" --work_megapix <float>\n"
" Resolution for image registration step. The default is 0.6 Mpx.\n"
" --features (surf|orb|sift|akaze)\n"
" Type of features used for images matching.\n"
" The default is surf if available, orb otherwise.\n"
" --matcher (homography|affine)\n"
" Matcher used for pairwise image matching.\n"
" --estimator (homography|affine)\n"
" Type of estimator used for transformation estimation.\n"
" --match_conf <float>\n"
" Confidence for feature matching step. The default is 0.65 for surf and 0.3 for orb.\n"
" --conf_thresh <float>\n"
" Threshold for two images are from the same panorama confidence.\n"
" The default is 1.0.\n"
" --ba (no|reproj|ray|affine)\n"
" Bundle adjustment cost function. The default is ray.\n"
" --ba_refine_mask (mask)\n"
" Set refinement mask for bundle adjustment. It looks like 'x_xxx',\n"
" where 'x' means refine respective parameter and '_' means don't\n"
" refine one, and has the following format:\n"
" <fx><skew><ppx><aspect><ppy>. The default mask is 'xxxxx'. If bundle\n"
" adjustment doesn't support estimation of selected parameter then\n"
" the respective flag is ignored.\n"
" --wave_correct (no|horiz|vert)\n"
" Perform wave effect correction. The default is 'horiz'.\n"
" --save_graph <file_name>\n"
" Save matches graph represented in DOT language to <file_name> file.\n"
" Labels description: Nm is number of matches, Ni is number of inliers,\n"
" C is confidence.\n"
"\nCompositing Flags:\n"
" --warp (affine|plane|cylindrical|spherical|fisheye|stereographic|"
" compressedPlaneA2B1|compressedPlaneA1.5B1|compressedPlanePortraitA2B1|"
" compressedPlanePortraitA1.5B1|paniniA2B1|paniniA1.5B1|paniniPortraitA2B1|"
" paniniPortraitA1.5B1|mercator|transverseMercator)\n"
" Warp surface type. The default is 'spherical'.\n"
" --seam_megapix <float>\n"
" Resolution for seam estimation step. The default is 0.1 Mpx.\n"
" --seam (no|voronoi|gc_color|gc_colorgrad)\n"
" Seam estimation method. The default is 'gc_color'.\n"
" --compose_megapix <float>\n"
" Resolution for compositing step. Use -1 for original resolution.\n"
" The default is -1.\n"
" --expos_comp (no|gain|gain_blocks|channels|channels_blocks)\n"
" Exposure compensation method. The default is 'gain_blocks'.\n"
" --expos_comp_nr_feeds <int>\n"
" Number of exposure compensation feed. The default is 1.\n"
" --expos_comp_nr_filtering <int>\n"
" Number of filtering iterations of the exposure compensation gains.\n"
" Only used when using a block exposure compensation method.\n"
" The default is 2.\n"
" --expos_comp_block_size <int>\n"
" BLock size in pixels used by the exposure compensator.\n"
" Only used when using a block exposure compensation method.\n"
" The default is 32.\n"
" --blend (no|feather|multiband)\n"
" Blending method. The default is 'multiband'.\n"
" --blend_strength <float>\n"
" Blending strength from [0,100] range. The default is 5.\n"
" --output <result_img>\n"
" The default is 'result.jpg'.\n"
" --timelapse (as_is|crop) \n"
" Output warped images separately as frames of a time lapse movie, "
" with 'fixed_' prepended to input file names.\n"
" --rangewidth <int>\n"
" uses range_width to limit number of images to match with.\n";
}
4.2 Motion Estimation Flags
- work_megapix: during registration (feature extraction and the related steps), the images are downscaled to save time; this sets the target resolution for that scale;
- features: the feature type used for matching, (surf|orb|sift|akaze);
- matcher: the pairwise matching method, (homography|affine), corresponding to BestOf2NearestMatcher and AffineBestOf2NearestMatcher respectively; the latter finds the best matches under an affine transform;
- estimator: (homography|affine), the camera parameter estimation method;
- match_conf: float, the confidence threshold for declaring an inlier during matching;
- conf_thresh: the confidence threshold for deciding that two images belong to the same panorama;
- ba: the bundle adjustment cost function used to refine the camera parameters, (no|reproj|ray|affine);
- ba_refine_mask: lets bundle adjustment keep selected parameters fixed via a mask: 'x' means refine, '_' means keep fixed, in the order fx, skew, ppx, aspect, ppy;
- wave_correct: wave correction flag, (no|horiz|vert); constrains the panorama to the horizontal or vertical direction, avoiding the "spread-wings" bowing effect (see the sketch after this list);
- save_graph: saves the match graph between images in DOT format.
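Inside the pipeline, wave correction is a small post-processing step on the rotations estimated by bundle adjustment. A hedged sketch of how it is applied, assuming the camera parameters have already been estimated:

#include "opencv2/stitching.hpp"
#include <vector>

// Straighten the estimated camera rotations so the panorama stays level.
void applyWaveCorrection(std::vector<cv::detail::CameraParams> &cameras)
{
    std::vector<cv::Mat> rmats;
    for (const auto &cam : cameras)
        rmats.push_back(cam.R.clone());
    cv::detail::waveCorrect(rmats, cv::detail::WAVE_CORRECT_HORIZ);
    for (size_t i = 0; i < cameras.size(); ++i)
        cameras[i].R = rmats[i];
}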
4.3 Compositing Flags
- warp: the image warping method, including spherical, cylindrical, and the many other projections OpenCV supports;
- seam_megapix: images are downscaled before seam finding; together with work_scale this controls the scale used;
- seam: the seam finding method;
- compose_megapix: the resolution used during compositing and for the final panorama (-1 keeps the original resolution);
- expos_comp: the exposure compensation method;
- blend: the image blending method, commonly (feather|multiband).
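Most of these flags map directly onto setters of cv::Stitcher, so a stitching_detailed-style configuration can be assembled without the sample's argument parsing. A sketch against the OpenCV 4.x API, with illustrative values rather than the sample's exact defaults:

#include "opencv2/stitching.hpp"

cv::Ptr<cv::Stitcher> makeConfiguredStitcher()
{
    cv::Ptr<cv::Stitcher> st = cv::Stitcher::create(cv::Stitcher::PANORAMA);
    st->setRegistrationResol(0.6);    // --work_megapix 0.6
    st->setSeamEstimationResol(0.1);  // --seam_megapix 0.1
    st->setCompositingResol(cv::Stitcher::ORIG_RESOLUTION); // --compose_megapix -1
    st->setPanoConfidenceThresh(1.0); // --conf_thresh 1.0
    st->setFeaturesFinder(cv::ORB::create()); // --features orb
    st->setFeaturesMatcher(cv::makePtr<cv::detail::BestOf2NearestMatcher>(
        false, 0.3f));                // --matcher homography --match_conf 0.3
    st->setBundleAdjuster(cv::makePtr<cv::detail::BundleAdjusterRay>()); // --ba ray
    st->setWaveCorrection(true);      // --wave_correct horiz
    st->setWaveCorrectKind(cv::detail::WAVE_CORRECT_HORIZ);
    st->setWarper(cv::makePtr<cv::SphericalWarper>()); // --warp spherical
    st->setExposureCompensator(cv::detail::ExposureCompensator::createDefault(
        cv::detail::ExposureCompensator::GAIN_BLOCKS)); // --expos_comp gain_blocks
    st->setSeamFinder(cv::makePtr<cv::detail::GraphCutSeamFinder>(
        cv::detail::GraphCutSeamFinderBase::COST_COLOR)); // --seam gc_color
    st->setBlender(cv::makePtr<cv::detail::MultiBandBlender>()); // --blend multiband
    return st;
}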
4.4 Summary
If the number and resolution of the input images are not too large, the downscaling steps and the timing code in the sample source can be removed to simplify the stitching flow; in practice, real-time stitching applications generally do not use this flow directly anyway. The algorithms behind each configuration parameter are where the interesting details live, and they are what I plan to walk through step by step next.
