Image Stitching 1: OpenCV Stitcher


1. Introduction

  Image stitching is a culmination of classical computer vision techniques; its main steps are feature extraction, feature matching, image registration, and image blending. Figure 1.1 below shows the OpenCV stitching pipeline. Stitching touches many research areas: for feature extraction alone there are the widely used SIFT, SURF, and ORB, which also see broad use in SLAM, so if you have the time, working through their implementation details is well worth it for building up your own knowledge base.
Figure 1.1 OpenCV stitching pipeline

2. OpenCV Stitcher

  OpenCV ships a ready-made stitching class, Stitcher: essentially a single interface call runs all the stitching steps and returns the stitched image. The test images are from the referenced examples.

2.1 Example Code

Here is sample code that calls the interface:

#include "opencv2/opencv.hpp"
#include "logging.hpp"
#include <string>

void stitchImg(const std::vector<cv::Mat>& imgs, cv::Mat& pano)
{
    // Set the warp mode for stitching; there are two modes, PANORAMA and SCANS.
    // PANORAMA: images are projected onto a sphere or cylinder before stitching.
    // SCANS: no exposure compensation or cylindrical projection by default;
    //        images are stitched with plain affine transforms.
    cv::Stitcher::Mode mode = cv::Stitcher::PANORAMA;
    cv::Ptr<cv::Stitcher> stitcher = cv::Stitcher::create(mode);
    cv::Stitcher::Status status = stitcher->stitch(imgs, pano);
    if(cv::Stitcher::OK != status){
        LOG(INFO) << "failed to stitch images, err code: " << (int)status;
    }
}

int main(int argc, char* argv[])
{
    std::string pic_path = "data/img/*";
    std::string pic_pattern = ".jpg";

    if(2 == argc){
        pic_path = std::string(argv[1]);
    }else if(3 == argc){
        pic_path = std::string(argv[1]);
        pic_pattern = std::string(argv[2]);
    }else{
        LOG(INFO) << "default value";
    }
    std::vector<cv::String> img_names;
    std::vector<cv::Mat> imgs;
    pic_pattern = pic_path + pic_pattern;
    cv::glob(pic_pattern, img_names);
    if(img_names.empty()){
        LOG(INFO) << "no images";
        return -1;
    }
    for(size_t i = 0; i < img_names.size(); ++i){
        cv::Mat img = cv::imread(img_names[i]);
        if(img.empty()){
            LOG(INFO) << "failed to read " << img_names[i];
            continue;
        }
        imgs.push_back(img.clone());
    }
    cv::Mat pano;
    stitchImg(imgs, pano);
    if(!pano.empty()){
        cv::imshow("pano", pano);
        cv::waitKey(0);
    }
    return 0;
}

2.2 Example Results

  • mode = PANORAMA
    (image) CMU scene stitch 1
  • mode = SCANS
    (image) CMU scene stitch 2

  The two CMU comparisons above show the difference between PANORAMA and SCANS: the former projects the images onto a cylinder, so the panorama shows visible curvature, while SCANS applies only affine transforms, so the stitched result largely preserves the straight lines and parallelism of the originals.

3. A Simplified Stitcher

  This section leaves a few gaps to fill later. Before digging into the details of the OpenCV stitcher, let's roughly imitate SCANS-mode stitching and see how it performs. The basic idea is:

  • Extract and match features to find the correspondences between the images;
  • Estimate the transform matrix that aligns the images: select the ten strongest matches, draw them, and pick three correctly matched points to estimate an affine transform;
  • Allocate a canvas whose width is the sum of all image widths and whose height is the maximum image height, initialized to 0;
  • Project the strongest matching point onto the canvas and use it as the junction of the left and right images;
  • Use the right image as the reference, i.e. warp the left image and then blend it with the right image.

3.1 Feature Extraction

  The common feature extractors are SIFT, SURF, and ORB. ORB is fast and is widely used in other vision tasks as well, but its accuracy is lower than the other two.

void featureExtract(const std::vector<cv::Mat> &imgs,
                    std::vector<std::vector<cv::KeyPoint>> &keyPoints,
                    std::vector<cv::Mat> &imageDescs)
{
    keyPoints.clear();
    imageDescs.clear();
    // Extract feature points. For ORB, the create() argument is the maximum number
    // of features to retain (the name minHessian would only make sense for SURF).
    int nFeatures = 800;
    cv::Ptr<cv::ORB> orbDetector = cv::ORB::create(nFeatures);
    for (size_t i = 0; i < imgs.size(); ++i) {
        std::vector<cv::KeyPoint> keyPoint;
        // convert to grayscale
        cv::Mat image;
        cvtColor(imgs[i], image, cv::COLOR_BGR2GRAY);
        orbDetector->detect(image, keyPoint);
        keyPoints.push_back(keyPoint);
        cv::Mat imageDesc1;
        orbDetector->compute(image, keyPoint, imageDesc1);
        /* The descriptors must be converted to floating point; otherwise FLANN fails with
         * "Unsupported format or combination of formats in buildIndex using FLANN algorithm".
         */
        imageDesc1.convertTo(imageDesc1, CV_32F);
        imageDescs.push_back(imageDesc1.clone());
    }
}

3.2 Feature Matching

  This step establishes the feature-point correspondences between the images, from which the transform matrix H is computed. This H warps the whole image; to mitigate parallax, some methods divide the image into a grid and compute a separate H for each cell.

static const int MAX_OPTIMAL_POINT_NUM = 10; // keep the ten strongest matches

void featureMatching(const std::vector<cv::Mat> &imgs,
                     const std::vector<std::vector<cv::KeyPoint>> &keyPoints,
                     const std::vector<cv::Mat> &imageDescs,
                     std::vector<std::vector<cv::Point2f>> &optimalMatchePoint)
{
    optimalMatchePoint.clear();
    // Get the matched feature points and keep the best pairs. This assumes the
    // images are given in order; the test assumes exactly two images.
    cv::FlannBasedMatcher matcher;
    std::vector<cv::DMatch> matchePoints;
    matcher.match(imageDescs[0], imageDescs[1], matchePoints, cv::Mat());

    sort(matchePoints.begin(), matchePoints.end()); // sort matches by distance
    // keep the top-N best matches
    std::vector<cv::Point2f> imagePoints1, imagePoints2;
    for (int i = 0; i < MAX_OPTIMAL_POINT_NUM; i++) {
        imagePoints1.push_back(keyPoints[0][matchePoints[i].queryIdx].pt);
        imagePoints2.push_back(keyPoints[1][matchePoints[i].trainIdx].pt);
    }
    optimalMatchePoint.push_back(std::vector<cv::Point2f>{
            imagePoints1[0], imagePoints1[3], imagePoints1[6]});
    optimalMatchePoint.push_back(std::vector<cv::Point2f>{
            imagePoints2[0], imagePoints2[3], imagePoints2[6]});
}

  With ORB features there are many mismatched points here; the three points above were chosen by displaying the matches and picking ones that are visibly correct, and they will be used to estimate the affine transform H. Internally, OpenCV estimates this with the RANSAC algorithm; I have skipped that step here.

3.3 Estimating the Affine Transform

  The previous step produced the three strongest matches, from which H can be computed directly. Before computing it, move the right image to the right-hand side of the canvas.

void getAffineMat(std::vector<std::vector<cv::Point2f>>& optimalMatchePoint,
                  int left_cols, std::vector<cv::Mat>& Hs)
{
    std::vector<cv::Point2f> newMatchingPt;
    for (size_t i = 0; i < optimalMatchePoint[1].size(); i++) {
        cv::Point2f pt = optimalMatchePoint[1][i];
        pt.x += left_cols;
        newMatchingPt.push_back(pt);
    }
    // Transform for the left image: the right image's points have been shifted, so
    // the left image must map onto those shifted positions on the canvas.
    cv::Mat homo1 = getAffineTransform(optimalMatchePoint[0], newMatchingPt);
    // Transform for the right image, i.e. just move it to the right side of the canvas.
    cv::Mat homo2 = getAffineTransform(optimalMatchePoint[1], newMatchingPt);

    Hs.push_back(homo1);
    Hs.push_back(homo2);
}

3.4 Stitching the Images

  Once the transforms are known, the strongest-response feature point is taken as the blending center of the two images, with each side of it filled from the corresponding image. This is a very crude way to stitch: for images shot with pure translation it just about holds together, but with rotation, or when the optical centers are not aligned, the misalignment becomes severe. Blending is another weak point: a single dividing line decides which source supplies each pixel, so the transition is not smooth and seams remain visible.

void getPano2(std::vector<cv::Mat> &imgs, const std::vector<cv::Mat> &H, 
			  cv::Point2f &optimalPt, cv::Mat &pano)
{
    // Use the right image as the reference: warp the left image so it overlaps the
    // right one, and take the strongest-response feature point as the blending center.
    // Default panorama canvas size:
    //   width  = left.width + right.width
    //   height = std::max(left.height, right.height)
    int pano_width  = imgs[0].cols + imgs[1].cols;
    int pano_height = std::max(imgs[0].rows, imgs[1].rows);
    pano            = cv::Mat::zeros(cv::Size(pano_width, pano_height), CV_8UC3);
    cv::Mat img_trans0, img_trans1;
    img_trans0 = cv::Mat::zeros(pano.size(), CV_8UC3);
    img_trans1 = cv::Mat::zeros(pano.size(), CV_8UC3);
    // After the affine warp, each source image already sits at its panorama position
    cv::warpAffine(imgs[0], img_trans0, H[0], pano.size());
    cv::warpAffine(imgs[1], img_trans1, H[1], pano.size());

    // strongest-response feature point
    cv::Mat trans_pt = (cv::Mat_<double>(3, 1) << optimalPt.x, optimalPt.y, 1.0f);
    // its position on the canvas
    trans_pt = H[0]*trans_pt;

    // Determine the region to take from each image
    int seam_x = cvRound(trans_pt.at<double>(0, 0));
    cv::Rect left_roi  = cv::Rect(0, 0, seam_x, pano_height);
    cv::Rect right_roi = cv::Rect(seam_x, 0, pano_width - seam_x, pano_height);
    // Copy the selected regions onto the canvas
    img_trans0(left_roi).copyTo(pano(left_roi));
    img_trans1(right_roi).copyTo(pano(right_roi));
    cv::imshow("pano", pano);
    cv::waitKey(0);
}

int main(int argc, char *argv[])
{
    cv::Mat image01 = cv::imread("data/img/medium11.jpg");
    cv::resize(image01, image01, cv::Size(image01.cols, image01.rows + 1));
    cv::Mat image02 = cv::imread("data/img/medium12.jpg");
    cv::resize(image02, image02, cv::Size(image02.cols, image02.rows + 1));
    std::vector<cv::Mat> imgs = {image01, image02};
    std::vector<std::vector<cv::KeyPoint>> keyPoints;
    std::vector<std::vector<cv::Point2f>> optimalMatchePoint;
    std::vector<cv::Mat> imageDescs;
    featureExtract(imgs, keyPoints, imageDescs);
    featureMatching(imgs, keyPoints, imageDescs, optimalMatchePoint);

    std::vector<cv::Point2f> newMatchingPt;
    for (size_t i = 0; i < optimalMatchePoint[1].size(); i++) {
        cv::Point2f pt = optimalMatchePoint[1][i];
        pt.x += imgs[0].cols;
        newMatchingPt.push_back(pt);
    }
    cv::Mat homo1 = getAffineTransform(optimalMatchePoint[0], newMatchingPt);
    cv::Mat homo2 = getAffineTransform(optimalMatchePoint[1], newMatchingPt);

    std::vector<cv::Mat> Hs = {homo1, homo2};
    cv::Mat pano;
    //getPano1(imgs, Hs, pano);
    getPano2(imgs, Hs, optimalMatchePoint[0][0], pano);
    return 0;
}

3.5 Results of the Simplified Stitcher

  • Stitching images that differ only by translation
    (image) Figure 3.5.1 Snow scene, affine stitch
  • Stitching images with rotation
    (image) Figure 3.5.2 CMU scene, affine stitch

  The results are nothing to boast about. Figure 3.5.2 shows clear misalignment, and the whole left side of the panorama is noticeably tilted; the red box on the left marks the left image's region, and the red line in the middle marks the boundary between the two images. The misalignment has several causes: no proper blending transition, camera rotation not being modeled, and a poorly chosen seam position. The tilt, which makes the result look unnatural, comes from picking a single image as the reference and warping all the others into its coordinate frame.

4. The OpenCV Stitcher Module

  OpenCV's samples include stitching_detailed.cpp, which walks through each module of the pipeline. Real applications usually need real-time stitching, and calling the high-level interface directly generally cannot meet that requirement, especially on embedded ARM targets, so we need to understand the implementation details to find optimization points. I am only interested in parts of stitching_detailed.cpp, so I have stripped out the timing code and the resolution scaling used when searching for blend regions.

4.1 Parameter Overview

  OpenCV's stitching_detailed.cpp exposes a large number of configuration parameters. As Figure 1.1 (the OpenCV stitching pipeline) shows, the main steps of the stitcher are:

  • registration
    • feature extraction
    • feature matching
    • image registration
    • camera intrinsics estimation
    • wave correction
  • compositing
    • image warping
    • exposure compensation
    • seam finding
    • image blending

  The registration part derives the correspondences between the images, estimates the cameras' intrinsic and extrinsic parameters, and refines them with bundle adjustment (BA); it is mainly responsible for the stitching order and the transform estimation. The compositing part then uses those parameters to warp and blend the images, applying exposure compensation and similar algorithms to improve visual consistency. A preview of the parameters:

static void printUsage(char** argv)
{
    cout <<
         "Rotation model images stitcher.\n\n"
         << argv[0] << " img1 img2 [...imgN] [flags]\n\n"
                       "Flags:\n"
                       "  --preview\n"
                       "      Run stitching in the preview mode. Works faster than usual mode,\n"
                       "      but output image will have lower resolution.\n"
                       "  --try_cuda (yes|no)\n"
                       "      Try to use CUDA. The default value is 'no'. All default values\n"
                       "      are for CPU mode.\n"
                       "\nMotion Estimation Flags:\n"
                       "  --work_megapix <float>\n"
                       "      Resolution for image registration step. The default is 0.6 Mpx.\n"
                       "  --features (surf|orb|sift|akaze)\n"
                       "      Type of features used for images matching.\n"
                       "      The default is surf if available, orb otherwise.\n"
                       "  --matcher (homography|affine)\n"
                       "      Matcher used for pairwise image matching.\n"
                       "  --estimator (homography|affine)\n"
                       "      Type of estimator used for transformation estimation.\n"
                       "  --match_conf <float>\n"
                       "      Confidence for feature matching step. The default is 0.65 for surf and 0.3 for orb.\n"
                       "  --conf_thresh <float>\n"
                       "      Threshold for two images are from the same panorama confidence.\n"
                       "      The default is 1.0.\n"
                       "  --ba (no|reproj|ray|affine)\n"
                       "      Bundle adjustment cost function. The default is ray.\n"
                       "  --ba_refine_mask (mask)\n"
                       "      Set refinement mask for bundle adjustment. It looks like 'x_xxx',\n"
                       "      where 'x' means refine respective parameter and '_' means don't\n"
                       "      refine one, and has the following format:\n"
                       "      <fx><skew><ppx><aspect><ppy>. The default mask is 'xxxxx'. If bundle\n"
                       "      adjustment doesn't support estimation of selected parameter then\n"
                       "      the respective flag is ignored.\n"
                       "  --wave_correct (no|horiz|vert)\n"
                       "      Perform wave effect correction. The default is 'horiz'.\n"
                       "  --save_graph <file_name>\n"
                       "      Save matches graph represented in DOT language to <file_name> file.\n"
                       "      Labels description: Nm is number of matches, Ni is number of inliers,\n"
                       "      C is confidence.\n"
                       "\nCompositing Flags:\n"
                       "  --warp (affine|plane|cylindrical|spherical|fisheye|stereographic|"
                       "     compressedPlaneA2B1|compressedPlaneA1.5B1|compressedPlanePortraitA2B1|"
                       "      compressedPlanePortraitA1.5B1|paniniA2B1|paniniA1.5B1|paniniPortraitA2B1|"
                       "      paniniPortraitA1.5B1|mercator|transverseMercator)\n"
                       "      Warp surface type. The default is 'spherical'.\n"
                       "  --seam_megapix <float>\n"
                       "      Resolution for seam estimation step. The default is 0.1 Mpx.\n"
                       "  --seam (no|voronoi|gc_color|gc_colorgrad)\n"
                       "      Seam estimation method. The default is 'gc_color'.\n"
                       "  --compose_megapix <float>\n"
                       "      Resolution for compositing step. Use -1 for original resolution.\n"
                       "      The default is -1.\n"
                       "  --expos_comp (no|gain|gain_blocks|channels|channels_blocks)\n"
                       "      Exposure compensation method. The default is 'gain_blocks'.\n"
                       "  --expos_comp_nr_feeds <int>\n"
                       "      Number of exposure compensation feed. The default is 1.\n"
                       "  --expos_comp_nr_filtering <int>\n"
                       "      Number of filtering iterations of the exposure compensation gains.\n"
                       "      Only used when using a block exposure compensation method.\n"
                       "      The default is 2.\n"
                       "  --expos_comp_block_size <int>\n"
                       "      BLock size in pixels used by the exposure compensator.\n"
                       "      Only used when using a block exposure compensation method.\n"
                       "      The default is 32.\n"
                       "  --blend (no|feather|multiband)\n"
                       "      Blending method. The default is 'multiband'.\n"
                       "  --blend_strength <float>\n"
                       "      Blending strength from [0,100] range. The default is 5.\n"
                       "  --output <result_img>\n"
                       "      The default is 'result.jpg'.\n"
                       "  --timelapse (as_is|crop) \n"
                       "      Output warped images separately as frames of a time lapse movie, "
                       "      with 'fixed_' prepended to input file names.\n"
                       "  --rangewidth <int>\n"
                       "      uses range_width to limit number of images to match with.\n";
}

4.2 Motion Estimation Flags

  • work_megapix: during registration (feature extraction etc.) the images are downscaled to save time; this sets the target resolution;
  • features: which feature type to use (surf|orb|sift|akaze);
  • matcher: the pairwise matching method (homography|affine), corresponding to BestOf2NearestMatcher and AffineBestOf2NearestMatcher respectively; the latter finds the best matches under an affine model;
  • estimator: (homography|affine), the camera parameter estimation method;
  • match_conf: float, the inlier threshold used during matching;
  • conf_thresh: threshold for deciding that two images belong to the same panorama;
  • ba: the bundle-adjustment cost function (no|reproj|ray|affine);
  • ba_refine_mask: during bundle adjustment some parameters can be held fixed via a mask; 'x' means refine, '_' means keep fixed, in the order fx, skew, ppx, aspect, ppy;
  • wave_correct: wave-correction flag (no|horiz|vert); it constrains the panorama to the horizontal or vertical direction and avoids the "spread-wings" distortion;
    (image: wave-correction example)
  • save_graph: save the match graph between images in DOT format;

4.3 Compositing Flags

  • warp: the warping method; spherical, cylindrical, and many other projections are supported;
  • seam_megapix: images are downscaled during seam finding; together with work_scale this controls the scale factor;
  • seam: the seam-finding method;
  • compose_megapix: the resolution used during compositing and for the final panorama;
  • expos_comp: the exposure compensation method;
  • blend: the blending method; the common ones are feather and multiband;

4.4 Summary

  If the number and resolution of the input images are modest, the resolution-scaling steps and the timing code in the sample can be removed to simplify the pipeline; in practice this flow is rarely used as-is for real-time stitching anyway. The algorithms behind each configuration parameter help us understand the finer details, and they are what I plan to cover step by step in later posts.

