Image Retrieval (3): Implementing BoW

Reposted article. Originally published 2018-08-07 09:55, filed under 02-ImageRetrieval.

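The previous article, Image Retrieval (2): Means Clustering - Building BoF, briefly introduced how to build a BoW model from SIFT features, along with a simple implementation based on the lightweight open-source library vlfeat.
This article revisits the BoW model and presents several different implementations.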

OpenCV-based BoW implementation
- Using BoWTrainer
DBoW3, an open-source bag-of-words library

BoW

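The BoW model was originally proposed for document modeling, since text is naturally made up of words. It ignores word order, grammar, and syntax, treats a text simply as a collection of words, and assumes that the words are independent of one another. A document can then be described by how often each word occurs in it, which turns the document into a one-dimensional vector.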
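To make the text case concrete, here is a minimal sketch of a word-count vector. The function name textBow and the tiny vocabulary are made up for illustration; real text pipelines also handle punctuation, case, and stemming, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Count how often each vocabulary word occurs in a whitespace-separated text.
// Words outside the vocabulary are ignored; word order plays no role.
// (textBow is a hypothetical helper, not part of any library.)
std::vector<int> textBow(const std::string &text,
                         const std::vector<std::string> &vocabulary)
{
    assert(!vocabulary.empty());
    std::vector<int> bow(vocabulary.size(), 0);
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        for (std::size_t i = 0; i < vocabulary.size(); ++i) {
            if (word == vocabulary[i]) {
                ++bow[i];
                break;
            }
        }
    }
    return bow;
}
```

Because only counts are kept, "cat dog dog" and "dog cat dog" map to the same vector, which is exactly the order-independence described above.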

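Bringing BoW into computer vision means treating an image as a document and the distinct features in the image as the words that make it up. As with text, the image can then be described by a one-dimensional vector that records how often each feature occurs in it.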

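To represent an image as a BoW vector, the first step is to obtain the image “words”. Typically, local features (for example SIFT or ORB) are extracted from the whole image collection and then clustered so that similar features are merged. Each cluster center can be regarded as a visual word, and the set of visual words forms the visual vocabulary. Once the vocabulary is available, counting how often each visual word occurs in an image gives the image's BoW representation.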

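In summary:

Extract local features (for example SIFT) from every image in the collection to obtain the feature set \(F\).
Cluster the feature set \(F\) to obtain \(k\) cluster centers \(\{C_i|i = 1,\dots,k\}\). Each cluster center \(C_i\) represents one visual word, and the set of cluster centers is the visual vocabulary \(vocabulary = \{C_i|i = 1,\dots,k\}\).
BoW representation of a single image
- Extract the image's local features to obtain the feature set \(f = \{f_i |i = 1,\dots,n\}\).
- For each feature \(f_i\), determine which word \(C_j\) it belongs to (the center nearest to it).
- Count how many times each word \(C_j\) occurs in the image. The resulting one-dimensional vector of length \(k\) is the image's BoW representation.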
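The per-image steps above can be sketched without any library. This is only an illustration under simple assumptions: descriptors are plain float vectors, the distance is Euclidean, and the search is brute force. The names Feature, nearestWord, and encodeBow are invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Feature = std::vector<float>; // one local descriptor (hypothetical alias)

// Squared Euclidean distance between two descriptors of equal length.
float squaredDistance(const Feature &a, const Feature &b)
{
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const float diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Index j of the cluster center C_j closest to feature f.
std::size_t nearestWord(const Feature &f, const std::vector<Feature> &centers)
{
    assert(!centers.empty());
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t j = 0; j < centers.size(); ++j) {
        const float dist = squaredDistance(f, centers[j]);
        if (dist < bestDist) { bestDist = dist; best = j; }
    }
    return best;
}

// BoW vector: entry j counts the features assigned to word C_j.
std::vector<int> encodeBow(const std::vector<Feature> &features,
                           const std::vector<Feature> &centers)
{
    std::vector<int> bow(centers.size(), 0);
    for (const Feature &f : features) ++bow[nearestWord(f, centers)];
    return bow;
}
```

With two centers at (0, 0) and (10, 10) and the features (1, 0), (9, 9), (0, 2), the first word is hit twice and the second once.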

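Putting it together, obtaining the BoW vector of an image takes the following steps:

Build the visual vocabulary (Vocabulary) of the image collection
- Extract local features, such as SIFT, from every image in the collection.
- Cluster the extracted features, for example with k-means. The resulting cluster centers form the visual vocabulary of the collection.
Compute the BoW vector of an image
- Extract the image's local features.
- Count how often each visual word in the Vocabulary occurs in the image.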
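For the vocabulary-building half, a bare-bones k-means (Lloyd iterations) might look like the sketch below. It is an illustration only: the first k samples seed the centers so the result is deterministic, which is simpler than the random or k-means++ seeding a real library offers, and buildVocabulary is a name invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Feature = std::vector<float>; // one local descriptor (hypothetical alias)

// Plain Lloyd iterations; returns the k cluster centers (the vocabulary).
std::vector<Feature> buildVocabulary(const std::vector<Feature> &data,
                                     std::size_t k, int maxIter)
{
    assert(k > 0 && data.size() >= k);
    const std::size_t dim = data[0].size();
    // Assumption of this sketch: seed with the first k samples.
    std::vector<Feature> centers(data.begin(),
                                 data.begin() + static_cast<std::ptrdiff_t>(k));
    std::vector<std::size_t> label(data.size(), 0);

    for (int iter = 0; iter < maxIter; ++iter) {
        // Assignment step: attach every sample to its nearest center.
        bool changed = false;
        for (std::size_t i = 0; i < data.size(); ++i) {
            std::size_t best = 0;
            float bestDist = std::numeric_limits<float>::max();
            for (std::size_t j = 0; j < k; ++j) {
                float dist = 0.0f;
                for (std::size_t d = 0; d < dim; ++d) {
                    const float diff = data[i][d] - centers[j][d];
                    dist += diff * diff;
                }
                if (dist < bestDist) { bestDist = dist; best = j; }
            }
            if (label[i] != best) { label[i] = best; changed = true; }
        }
        // Update step: move each center to the mean of its samples.
        std::vector<Feature> sum(k, Feature(dim, 0.0f));
        std::vector<int> count(k, 0);
        for (std::size_t i = 0; i < data.size(); ++i) {
            ++count[label[i]];
            for (std::size_t d = 0; d < dim; ++d) sum[label[i]][d] += data[i][d];
        }
        for (std::size_t j = 0; j < k; ++j) {
            if (count[j] == 0) continue; // an empty cluster keeps its old center
            for (std::size_t d = 0; d < dim; ++d) centers[j][d] = sum[j][d] / count[j];
        }
        if (!changed && iter > 0) break; // assignments are stable
    }
    return centers;
}
```

On real SIFT data the feature count runs into the millions, so in practice a library implementation is the better choice; the sketch only shows what the clustering step computes.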

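OpenCV-based implementation

Native OpenCV implementation

The first step is to extract the SIFT features of the images. For a detailed discussion of SIFT, see the two other articles SIFT Revisited: A vlfeat-Based Implementation and SIFT Features in Detail. It is not repeated here; the feature extraction code is: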

void siftDetecotor::extractFeatures(const std::vector<std::string> &imageFileList,std::vector<cv::Mat> &features)
{
    int count = 0; // total number of descriptors over all images
    features.reserve(imageFileList.size());

    const auto size = imageFileList.size();
    for(size_t i = 0; i < size; i ++){
        const auto &str = imageFileList[i];
        Mat des;
        siftDetecotor::extractFeatures(str,des); // single-image overload
        features.emplace_back(des);
        count += des.rows;
    }
    // size is the number of images processed
    cout << "Extract #" << size << "# images features done!" << "Count of features:#" << count << endl;
}

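The argument imageFileList is the list of image paths, and vector<Mat> features returns the features extracted from all images.

Clustering to obtain the Vocabulary
The k-means interface in OpenCV is: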

double cv::kmeans    (    InputArray     data,
    int     K,
    InputOutputArray     bestLabels,
    TermCriteria     criteria,
    int     attempts,
    int     flags,
    OutputArray     centers = noArray() 
)

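data: the input data, one sample per row.
K: the number of clusters, which here is the size of the Vocabulary (the number of words).
bestLabels: for each input sample, the index of the cluster center it belongs to.
criteria: k-means is iterative, and this is the termination criterion. It can be a maximum number of iterations, a required accuracy, or both combined, in which case the algorithm stops as soon as either condition is met.
attempts: the number of times the algorithm is run, each time with a different initialization; the best result is kept.
flags: the initialization method, either random initialization with KMEANS_RANDOM_CENTERS or k-means++ with KMEANS_PP_CENTERS.
centers: the matrix of cluster centers, one row per center.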

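Once the features of all images in the collection have been extracted, they can be stacked into one large matrix and passed to kmeans. The resulting cluster centers are the Vocabulary.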

    Mat f;
    vconcat(features,f);
    vector<int> labels;
    kmeans(f,k,labels,TermCriteria(TermCriteria::COUNT + TermCriteria::EPS,100,0.01),3,cv::KMEANS_PP_CENTERS,m_voc);

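First, vconcat stacks the extracted descriptor matrices vertically (along the Y direction) into one matrix. The k-means termination criterion is TermCriteria(TermCriteria::COUNT + TermCriteria::EPS,100,0.01): the algorithm stops after 100 iterations or once the accuracy reaches 0.01, whichever comes first.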
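The "whichever comes first" behaviour of the combined criterion can be written down in a few lines. This is a simplified sketch, not OpenCV code: StopRule is a made-up type, and it assumes the accuracy is measured as the largest distance any center moved in the last iteration.

```cpp
#include <cassert>

// Mirrors the idea of TermCriteria::COUNT + TermCriteria::EPS: stop when
// either the iteration budget is used up or the centers have almost stopped moving.
// (StopRule is a hypothetical illustration, not an OpenCV type.)
struct StopRule {
    int maxCount;    // e.g. 100
    double epsilon;  // e.g. 0.01

    bool shouldStop(int iterationsDone, double maxCenterShift) const
    {
        assert(maxCount > 0 && epsilon >= 0.0);
        return iterationsDone >= maxCount || maxCenterShift <= epsilon;
    }
};
```

With StopRule{100, 0.01}, ten iterations with a shift of 0.5 continue, while either 100 iterations or a shift of 0.005 ends the loop.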

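BoW encoding of an image
With the Vocabulary in hand, the BoW encoding of an image follows directly from counting how often each visual word occurs in it.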

void Vocabulary::transform_bow(const cv::Mat &img,std::vector<int> &bow) // bow is an output parameter, so it is taken by reference
{
    auto fdetector = xfeatures2d:: SIFT ::create(0,3,0.2,10);
    vector<KeyPoint> kpts;
    Mat des;
    fdetector->detectAndCompute(img,noArray(),kpts,des);

    Mat f;
    rootSift(des,f);

    // Find the nearest center
    Ptr<FlannBasedMatcher> matcher = FlannBasedMatcher::create();
    vector<DMatch> matches;
    matcher->match(f,m_voc,matches);

    bow = vector<int>(m_k,0);
    // Frequency
    /*for( size_t i = 0; i < matches.size(); i++ ){
        int queryIdx = matches[i].queryIdx;
        int trainIdx = matches[i].trainIdx; // cluster index
        CV_Assert( queryIdx == (int)i );

        bow[trainIdx] ++; // Compute word frequency
    }*/

    // trainIdx => center index
    for_each(matches.begin(),matches.end(),[&bow](const DMatch &match){
        bow[match.trainIdx] ++; // Compute word frequency
    });
}

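Finding the cluster center a feature belongs to is essentially a nearest-neighbor search over vectors. It can be done by building a FLANN index tree, or with a feature-matching method; the code above uses FlannBasedMatcher. Counting how often each word occurs in the image then gives the image's BoW vector.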
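The rootSift helper called in the code above is not shown in this article. Assuming it follows the usual RootSIFT recipe (L1-normalize each descriptor, then take the element-wise square root), a single-descriptor version could look like this sketch; rootSiftDescriptor is a hypothetical name, and the actual helper operates on a cv::Mat.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// RootSIFT for one descriptor: L1-normalize, then element-wise square root.
// SIFT descriptors are non-negative, so the square root is well defined.
std::vector<float> rootSiftDescriptor(const std::vector<float> &sift)
{
    float l1 = 0.0f;
    for (float v : sift) {
        assert(v >= 0.0f);
        l1 += v;
    }
    std::vector<float> out(sift.size(), 0.0f);
    if (l1 <= 0.0f) return out; // an all-zero descriptor stays zero
    for (std::size_t i = 0; i < sift.size(); ++i) out[i] = std::sqrt(sift[i] / l1);
    return out;
}
```

A side effect of this transform is that the result has unit L2 norm, because the squared entries sum to the L1-normalized total of 1.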

BoWTrainer

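OpenCV provides three classes for BoW: BOWTrainer, BOWKMeansTrainer, and BOWImgDescriptorExtractor.

The abstract base class BOWTrainer builds the visual vocabulary (Vocabulary) from the feature set of an image collection.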

class CV_EXPORTS_W BOWTrainer
{
public:
    BOWTrainer();
    virtual ~BOWTrainer();
    CV_WRAP void add( const Mat& descriptors );
    CV_WRAP const std::vector<Mat>& getDescriptors() const;
    CV_WRAP int descriptorsCount() const;
    CV_WRAP virtual void clear();
    CV_WRAP virtual Mat cluster() const = 0;
    CV_WRAP virtual Mat cluster( const Mat& descriptors ) const = 0;

protected:
    std::vector<Mat> descriptors;
    int size;
};

The class BOWKMeansTrainer implements BOWTrainer with k-means clustering: it clusters the feature set to obtain the visual vocabulary. Its declaration is:

class CV_EXPORTS_W BOWKMeansTrainer : public BOWTrainer
{
public:
    CV_WRAP BOWKMeansTrainer( int clusterCount, const TermCriteria& termcrit=TermCriteria(),
                      int attempts=3, int flags=KMEANS_PP_CENTERS );
    virtual ~BOWKMeansTrainer();
    CV_WRAP virtual Mat cluster() const;
    CV_WRAP virtual Mat cluster( const Mat& descriptors ) const;

protected:

    int clusterCount;
    TermCriteria termcrit;
    int attempts;
    int flags;
};

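Using the class is straightforward. First construct a BOWKMeansTrainer instance. Its first argument, clusterCount, is the number of cluster centers, i.e. the size of the Vocabulary; the remaining arguments are passed on to kmeans and are described above.
Then call add to add the extracted features. There are two ways to do this. The first adds the descriptors of each image in turn: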

for(int i=0; i<numOfPictures; i++)
    bowTraining.add( descriptors[i] ); // descriptors[i]: descriptor matrix of image i

Alternatively, extract the features of all images first, merge them into a single matrix, and add that:

    Mat feature_list;
    vconcat(features,feature_list);

    BOWKMeansTrainer bow_trainer(k);
    bow_trainer.add(feature_list);

After adding the image features, call

vocabulary =  bow_trainer.cluster();

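to cluster the feature set. The resulting cluster centers are the visual vocabulary we are after.

With the Vocabulary available, an image can be encoded as a BoW vector. This is the job of BOWImgDescriptorExtractor, declared as follows: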

class BOWImgDescriptorExtractor
{
public:
    BOWImgDescriptorExtractor( const Ptr<DescriptorExtractor>& dextractor,
                               const Ptr<DescriptorMatcher>& dmatcher );
    virtual ~BOWImgDescriptorExtractor(){}

    void setVocabulary( const Mat& vocabulary );
    const Mat& getVocabulary() const;
    void compute( const Mat& image, vector<KeyPoint>& keypoints,
                  Mat& imgDescriptor,
                  vector<vector<int> >* pointIdxOfClusters = 0,
                  Mat* descriptors = 0 );
    int descriptorSize() const;
    int descriptorType() const;

protected:
    Mat vocabulary;
    Ptr<DescriptorExtractor> dextractor;
    Ptr<DescriptorMatcher> dmatcher;
};

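The class does three things:

Extracts the image's features with the given Extractor.
Finds the visual word nearest to each feature.
Computes the BoW representation of the image and normalizes it.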
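The normalization in the last step makes images with different numbers of features comparable. A minimal sketch, assuming the normalization divides each count by the total number of features in the image; normalizeBow is a name invented here and this is not OpenCV's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Turn raw word counts into relative frequencies that sum to 1.
std::vector<float> normalizeBow(const std::vector<int> &counts)
{
    long total = 0;
    for (int c : counts) {
        assert(c >= 0);
        total += c;
    }
    std::vector<float> bow(counts.size(), 0.0f);
    if (total == 0) return bow; // image without any features
    for (std::size_t i = 0; i < counts.size(); ++i)
        bow[i] = static_cast<float>(counts[i]) / static_cast<float>(total);
    return bow;
}
```

For the counts (2, 1, 1) this gives (0.5, 0.25, 0.25), and an image with twice as many features but the same proportions gets the same vector.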

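A usable BOWImgDescriptorExtractor needs three things:

The visual vocabulary (Vocabulary), set with setVocabulary.
An image feature extractor, DescriptorExtractor, passed to the constructor.
A feature-matching method, DescriptorMatcher, passed to the constructor and used to find the visual word nearest to each feature.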

Using it is just as convenient: construct a BOWImgDescriptorExtractor from an Extractor and a Matcher, then set the Vocabulary:

BOWImgDescriptorExtractor bowDE(extractor, matcher);
bowDE.setVocabulary(dictionary); // dictionary is the vocabulary obtained from the clustering above

To obtain the BoW of an image, call compute:

bowDE.compute(img, keypoints, bow);

Summary

BOWKMeansTrainer clusters the extracted image features to obtain the visual vocabulary.
BOWImgDescriptorExtractor then uses that vocabulary to encode an image as a BoW vector with very little code.

DBoW3

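DBoW3 is an open-source C++ bag-of-words library that makes it easy to convert images into a visual bag-of-words representation. It uses a hierarchical tree to group similar image features close together in storage and thereby builds a visual vocabulary. DBoW3 also provides an image database with a direct index and an inverted index, which makes retrieving and comparing image features very fast.
DBoW3 is an enhanced version of DBoW2 that depends only on OpenCV and is easy to use. The open-source SLAM project ORB_SLAM2 uses DBoW2 for loop-closure detection. For a detailed introduction to DBoW3, see A Brief Look at the Bag-of-Words Model in Loop-Closure Detection.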
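To show why an inverted index speeds up retrieval, here is a toy version: for each visual word it stores the ids of the images that contain it, so a query only touches images sharing at least one word. The class InvertedIndex is invented for this sketch and does not reflect DBoW3's internal data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Toy inverted index: for every visual word, the ids of the images containing it.
class InvertedIndex {
public:
    // bow[j] > 0 means word j occurs in the image. Returns the new image id.
    int add(const std::vector<int> &bow)
    {
        assert(!bow.empty());
        const int imageId = nextId_++;
        for (std::size_t word = 0; word < bow.size(); ++word)
            if (bow[word] > 0) postings_[word].push_back(imageId);
        return imageId;
    }

    // Images sharing at least one word with the query, found without scanning all images.
    std::set<int> candidates(const std::vector<int> &queryBow) const
    {
        std::set<int> result;
        for (std::size_t word = 0; word < queryBow.size(); ++word) {
            if (queryBow[word] == 0) continue;
            const auto it = postings_.find(word);
            if (it != postings_.end())
                result.insert(it->second.begin(), it->second.end());
        }
        return result;
    }

private:
    int nextId_ = 0;
    std::map<std::size_t, std::vector<int>> postings_;
};
```

Only the candidate images then need to be scored against the query, which matters once the vocabulary is large and each image uses a small fraction of the words.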

This article only covers how to use DBoW3. The source code is on GitHub at https://github.com/rmsalinas/DBow3. It is a CMake project, so once OpenCV is set up it can be built directly with cmake.

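The two most important classes in DBoW3 are Vocabulary and Database. Vocabulary represents the visual vocabulary of the image collection and can convert any image into its BoW representation. Database is an image database that makes image retrieval convenient.
The code that builds the Vocabulary is: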

void vocabulary(const vector<Mat> &features,const string &file_path,int k = 9,int l = 3){
    //Branching factor and depth levels
    const DBoW3::WeightingType weight = DBoW3::TF_IDF;
    const DBoW3::ScoringType score = DBoW3::L2_NORM;

    DBoW3::Vocabulary voc(k,l,weight,score);
    cout << "Creating a small " << k << "^" << l << " vocabulary..." << endl;
    voc.create(features);
    cout << "...done!" << endl;
    
    //cout << "Vocabulary infomation: " << endl << voc << endl << endl;

    // save the vocabulary to disk
    cout << endl << "Saving vocabulary..." << endl;
    stringstream ss;
    ss << file_path << "/small_voc.yml.gz";
    voc.save(ss.str());
    cout << "Done" << endl;
}

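Pass in the feature set of the images, set the branching factor (k) and the depth (l) of the clustering tree, and call create to cluster the features into a Vocabulary. The Vocabulary can be saved to a file for later use.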
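The two tree parameters determine both the vocabulary size and the lookup cost. The sketch below only does the arithmetic for a full tree; maxWords and descentComparisons are hypothetical helper names, and a real tree can end up with fewer words if some nodes receive too few features to split.

```cpp
#include <cassert>
#include <cstddef>

// Upper bound on the number of leaf nodes (visual words) of a tree with
// branching factor k and depth l: k multiplied by itself l times.
std::size_t maxWords(std::size_t k, std::size_t l)
{
    assert(k >= 1);
    std::size_t words = 1;
    for (std::size_t level = 0; level < l; ++level) words *= k;
    return words;
}

// Descending the tree compares a feature with k children at each of l levels,
// instead of with every one of the k^l words.
std::size_t descentComparisons(std::size_t k, std::size_t l)
{
    return k * l;
}
```

With the defaults used above (k = 9, l = 3) that is at most 729 words, reached with 27 distance computations per feature.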
With the Vocabulary in hand, a Database can be built to make image lookup convenient:

void database(const vector<Mat> &features,const string &file_path){
    // load the vocabulary from disk
    stringstream ss ;
    ss << file_path <<"/small_voc.yml.gz";
    DBoW3::Vocabulary voc(ss.str());

    DBoW3::Database db(voc, false, 0); // false = do not use direct index
    // add images to the database
    for(size_t i = 0; i < features.size(); i++)
        db.add(features[i]);

    cout << "... done!" << endl;

    cout << "Database information: " << endl << db << endl;

    // we can save the database. The created file includes the vocabulary
    // and the entries added
    cout << "Saving database..." << endl;
    db.save("small_db.yml.gz");
    cout << "... done!" << endl;
}

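Creating the Database requires the Vocabulary obtained earlier and the feature set of the images. Once created, it can also be saved to a local file for later use.

With a Database, call its query method to check whether the database contains similar images.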

    //auto fdetector=cv::xfeatures2d::SURF::create(400, 4, 2);
    auto fdetector = xfeatures2d::SIFT::create(0,3,0.2,10);
    vector<KeyPoint> kpts;
    Mat des;
    fdetector->detectAndCompute(img,noArray(),kpts,des);

    DBoW3::QueryResults ql;        // receives the ranked matches
    db.query(des,ql,max_results);  // max_results: maximum number of results to return

After extracting the image's features, call query; the matches are returned in a QueryResults object.
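Conceptually, a query scores every candidate image against the query vector and returns the best few. The sketch below illustrates that idea with a plain dot product between BoW vectors; it does not reproduce DBoW3's TF-IDF weighting or its L2_NORM score, and Result and rankImages are names made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Result {
    int id;       // index of the database image
    float score;  // higher means more similar
};

// Score = dot product of two BoW vectors (the cosine similarity when both
// vectors are L2-normalized). Returns at most maxResults entries, best first.
std::vector<Result> rankImages(const std::vector<std::vector<float>> &database,
                               const std::vector<float> &query,
                               std::size_t maxResults)
{
    std::vector<Result> results;
    for (std::size_t i = 0; i < database.size(); ++i) {
        assert(database[i].size() == query.size());
        float dot = 0.0f;
        for (std::size_t d = 0; d < query.size(); ++d) dot += database[i][d] * query[d];
        results.push_back({static_cast<int>(i), dot});
    }
    // Best score first; ties keep database order.
    std::stable_sort(results.begin(), results.end(),
                     [](const Result &a, const Result &b) { return a.score > b.score; });
    if (results.size() > maxResults) results.resize(maxResults);
    return results;
}
```

The maxResults argument plays the same role as the third argument of query in the snippet above: it caps how many matches come back.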

Summary

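This article revisited the BoW model and presented three implementations:

A native OpenCV implementation that calls OpenCV's feature extraction, clustering, and matching functions to obtain an image's BoW vector.
OpenCV's BOWTrainer and BOWImgDescriptorExtractor classes, which make implementing the BoW model simpler.
The open-source library DBoW3, which not only makes creating a Vocabulary easy but also builds an image Database, so that BoW vectors can conveniently be used to find similar images in the collection.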

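After several months of working on image retrieval, I am gradually writing up what I have learned. This article covered implementing BoW; the next one aims to build a complete image retrieval pipeline and is expected to cover: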

TF-IDF
root-sift
vlad

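The code for this image retrieval series will be pushed to GitHub. Early on I was mostly writing assorted samples, so the code is a bit messy. Forks and stars are welcome.
Address: https://github.com/brookicv/imageRetrieval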
