視覺SLAM（七）后端優化

本文轉載自查看原文 2020-07-24 23:40 798

第七章作業

作者：曾是少年

二 Bundle Adjustment

2.1 文獻閱讀(2 分)

我們在第五講中已經介紹了Bundle Adjustment，指明它可以用於解PnP 問題。現在，我們又在后端中說明了它可以用於解大規模的三維重構問題，但在實時SLAM 場合往往需要控制規模。事實上，Bundle Adjustment 的歷史遠⽐我們想象的要長。請閱讀Bill Triggs 的經典論文Bundle Adjustment: A Modern Synthesis（見paper/目錄）1，了解BA 的發展歷史，然后回答下列問題：

1. 為何說Bundle Adjustment is slow 是不對的？

答：可以利用H矩陣的稀疏性進行優化。

原文中如下所示：

“Optimization / bundle adjustment is slow”: Such statements often appear in papers introducing yet another heuristic Structure from Motion (SFM) iteration. The claimed slowness is almost always due to the unthinking use of a general-purpose optimization routine that completely ignores the problem structure and sparseness. Real bundle routines are much more efﬁcient than this, and usually considerably more efﬁcient and ﬂexible than the newly suggested method (§6, 7). That is why bundle adjustment remains the dominant structure reﬁnement technique for real applications, after 40 years of research.”

翻譯：“優化/束調整運行緩慢”：這種陳述經常出現在介紹啟發式SFM的論文中。他們把BA速度緩慢的原因歸咎於通用的優化流程，該例程完全忽略了問題結構的稀疏性。實際的BA例程要高效很多，並且通常比新提出的方法（第6、7節）更加高效靈活。這就是經過40年的研究后，BA依然是實際應用中占主導地位的結構精煉技術。

2. BA 中有哪些需要注意參數化的地方？Pose 和Point 各有哪些參數化⽅式？有何優缺點。

答：BA中需要被參數化的地方包括：相機位姿、相機內參、三維特征點P以及投影后的像素坐標；

pose

三維歐拉角十三維位移，四元數+三維位移，旋轉矩陣十三維位移，變換矩陣等。歐拉角-萬向鎖問題。

旋轉矩陣-直觀，但自由度過多，小旋轉給不出確切矩陣。

四元數參數簡潔，不直觀。

point

homogeneous affine(X,Y,Z,1)-直觀，但是需要參數有較大的改變才能明顯的影響到 cost function. homogeneous projective parametrization(X,Y,Z,W)-可以表示距離無窮遠的點。

總結一下BA過程首先選擇你想要的圖里的節點與邊的類型，確定它們的參數化形式；

往圖里加入實際的節點和邊；
選擇初值，開始迭代；
每一步迭代中，計算對應於當前估計值的雅可比矩陣和海塞矩陣；
求解稀疏線性方程\(H_kΔx=−b_k\)，得到梯度方向；
繼續用GN或LM進行迭代。如果迭代結束，返回優化值。

Point 參數化方式

視覺SLAM中點的參數化表示包括三維坐標XYZ和逆深度表示方法。

Open VINS文檔中給出了五種特征參數化表示：Global XYZ，Global Inverse Depth，Anchored XYZ，Anchored Inverse Depth，Anchored Inverse Depth (MSCKF Version)，區別在於：

Global vs Anchored：特征點的表示是全局坐標系的坐標還是局部相機坐標系的坐標。
XYZ vs Inverse Depth：使用的XYZ還是逆深度
Two different Inverse Depth：兩種不同類型的逆深度參數

三維坐標XYZ：優點在於簡單直觀；缺點在於不能表示無窮遠點；

逆深度：優點在於能夠建模無窮遠點；在實際應用中，逆深度也具有更好的數值穩定性。

Pose參數化方式

Pose的參數化表示包括歐拉角、四元數、變換矩陣。

歐拉角

優點：容易理解，形象直觀；三個值分別對應x、y、z軸的旋轉角度。

缺點：歐拉角這種方法是要按照一個固定的坐標軸的順序旋轉的，因此不同的順序會造成不同結果；歐拉角旋轉會造成萬向鎖現象，這種現象的發生就是由於上述固定的坐標軸旋轉順序造成的。由於萬向鎖的存在，歐拉旋轉無法實現球面平滑插值。
變換矩陣

優點：旋轉軸可以是任意向量

缺點：旋轉其實只需要知道一個向量+一個角度(共4自由度)，但矩陣卻用了16個元素(消耗時間和內存)
四元數

優點：可以避免萬向鎖問題；只需要一個4維的四元數就可以執行繞任意過原點的向量的旋轉，方便快捷，在某些實現下比旋轉矩陣效率更高；而且四元數旋轉可以提供平滑插值。

缺點：比歐拉旋轉稍微復雜了一點，因為多了一個維度，理解更困難，不直觀。帶有約束條件

3. 本文寫於2000 年，但是文中提到的很多內容在后面十幾年的研究中得到了印證。你能看到哪些方向在后續工作中有所體現？請舉例說明。

答：3.4節的 Intensity- based methods就是BA在直接法中的應用。第5節 Network Structure可以對應到SLAM中的圖優化模型；H的稀疏性可以實現BA實時，在07年的PTAM上實現。

2.2 BAL-dataset

BAL（Bundle Adjustment in large）數據集是⼀個⼤型BA 數據集，它提供了相機與點初始值與觀測，你可以⽤它們進⾏Bundle Adjustment。現在，請你使⽤g2o，自己定義Vertex 和Edge（不要使用自帶的頂點類型，也不要像本書例程那邊調用Ceres來求導），書寫BAL 上的BA 程序。你可以挑選其中⼀個數據，運行你的BA，並給出優化后的點雲圖。

本題不提供代碼框架，請獨立完成。

提示：

注意BAL 的投影模型⽐教材中介紹的多了個負號；

答：（本題我沒有思路，因此主要是參考了網絡上的代碼進行學習驗證的）

一般情況下，使用g2o的主要過程如下：

定義頂點和邊的類型
構建圖
選擇優化算法
調用g2o

具體編程過程如下：

打開BAL數據集文件problem-16-22106-pre.txt。描述問題

這一部分調用common.cpp中的BALProblem類來進行

//讀取BAL數據
string bal_file_name = "problem-16-22106-pre.txt";
//讀取BAL數據 構建BAL問題
BALProblem bal_problem(bal_file_name);
//調用Normalize接口對數據進行歸一化
bal_problem.Normalize();
//調用Perturb接口對數據添加噪聲
bal_problem.Perturb(0.1,0.5,0.5);

定義頂點（使用節點來表示相機和路標）

相機位姿節點

//相機位姿節點
//繼承的g2o::BaseVertex基類   <優化變量維度,數據類型>
/// 位姿加相機內參的頂點，9維，前三維為so3，接下去為t, f, k1, k2
class VertexPoseAndIntrinsics : public g2o::BaseVertex<9, PoseAndIntrinsics> {
public:
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW;

    VertexPoseAndIntrinsics() {}
    //重置
    virtual void setToOriginImpl() override {
        _estimate = PoseAndIntrinsics();
    }
    //更新
    virtual void oplusImpl(const double *update) override {
        _estimate.rotation = SO3d::exp(Vector3d(update[0], update[1], update[2])) * _estimate.rotation; //李代數的更新方式
        _estimate.translation += Vector3d(update[3], update[4], update[5]); //平移部分的更新
        _estimate.focal += update[6]; //更新焦距
        _estimate.k1 += update[7]; //更新內參
        _estimate.k2 += update[8]; //更新內參
    }

    /// 根據估計值投影一個點
    Vector2d project(const Vector3d &point) {
        Vector3d pc = _estimate.rotation * point + _estimate.translation;
        pc = -pc / pc[2];
        double r2 = pc.squaredNorm();
        double distortion = 1.0 + r2 * (_estimate.k1 + _estimate.k2 * r2);
        return Vector2d(_estimate.focal * distortion * pc[0],
                        _estimate.focal * distortion * pc[1]);
    }

    //讀盤和存盤
    virtual bool read(istream &in) {}

    virtual bool write(ostream &out) const {}
};

路標節點

class VertexPoint : public g2o::BaseVertex<3, Vector3d> {
public:
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW;

    VertexPoint() {}

    virtual void setToOriginImpl() override {
        _estimate = Vector3d(0, 0, 0);
    }

    virtual void oplusImpl(const double *update) override {
        _estimate += Vector3d(update[0], update[1], update[2]);
    }

    virtual bool read(istream &in) {}

    virtual bool write(ostream &out) const {}
};

投影邊

// 邊的定義
// 觀測邊
class EdgeProjection : public g2o::BaseBinaryEdge<2, Vector2d, VertexPoseAndIntrinsics, VertexPoint> {
public:
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW;

    virtual void computeError() override {
        auto v0 = (VertexPoseAndIntrinsics *) _vertices[0];
        auto v1 = (VertexPoint *) _vertices[1];
        auto proj = v0->project(v1->estimate()); //estimate 可以返回該節點的當前估計
        _error = proj - _measurement; //計算誤差
    }

    // use numeric derivatives
    virtual bool read(istream &in) {}
    virtual bool write(ostream &out) const {}
};

選擇優化算法（在solve算法中）

const int point_block_size = bal_problem.point_block_size();
const int camera_block_size = bal_problem.camera_block_size();
double *points = bal_problem.mutable_points(); //得到point參數的起始地址
double *cameras = bal_problem.mutable_cameras(); //得到camera參數的起始地址

// pose dimension 9, landmark is 3
typedef g2o::BlockSolver<g2o::BlockSolverTraits<9, 3>> BlockSolverType;
typedef g2o::LinearSolverCSparse<BlockSolverType::PoseMatrixType> LinearSolverType; //改了一處
// use LM
// 選擇使用LM算法進行優化
auto solver = new g2o::OptimizationAlgorithmLevenberg(
    g2o::make_unique<BlockSolverType>(g2o::make_unique<LinearSolverType>()));
g2o::SparseOptimizer optimizer;
optimizer.setAlgorithm(solver);
optimizer.setVerbose(true);

構建g2o問題

    /// build g2o problem
    const double *observations = bal_problem.observations();
    // 添加頂點 Vertex
    vector<VertexPoseAndIntrinsics *> vertex_pose_intrinsics;
    vector<VertexPoint *> vertex_points;
    // 添加相機位姿節點
    for (int i = 0; i < bal_problem.num_cameras(); ++i) {
        VertexPoseAndIntrinsics *v = new VertexPoseAndIntrinsics();
        double *camera = cameras + camera_block_size * i;
        v->setId(i);
        v->setEstimate(PoseAndIntrinsics(camera));
        optimizer.addVertex(v);
        vertex_pose_intrinsics.push_back(v);
    }
    // 添加路標節點
    for (int i = 0; i < bal_problem.num_points(); ++i) {
        VertexPoint *v = new VertexPoint();
        double *point = points + point_block_size * i;
        v->setId(i + bal_problem.num_cameras());
        v->setEstimate(Vector3d(point[0], point[1], point[2]));
        // g2o在BA中需要手動設置待Marg的頂點
        v->setMarginalized(true);
        optimizer.addVertex(v);
        vertex_points.push_back(v);
    }

    // 添加邊 edge
    for (int i = 0; i < bal_problem.num_observations(); ++i) {
        EdgeProjection *edge = new EdgeProjection;
        edge->setVertex(0, vertex_pose_intrinsics[bal_problem.camera_index()[i]]);
        edge->setVertex(1, vertex_points[bal_problem.point_index()[i]]);
        edge->setMeasurement(Vector2d(observations[2 * i + 0], observations[2 * i + 1]));
        edge->setInformation(Matrix2d::Identity());
        edge->setRobustKernel(new g2o::RobustKernelHuber());
        optimizer.addEdge(edge);
    }

調用g2o

optimizer.initializeOptimization();
optimizer.optimize(40);

// set to bal problem
for (int i = 0; i < bal_problem.num_cameras(); ++i) {
    double *camera = cameras + camera_block_size * i;
    auto vertex = vertex_pose_intrinsics[i];
    auto estimate = vertex->estimate();
    estimate.set_to(camera);
}
for (int i = 0; i < bal_problem.num_points(); ++i) {
    double *point = points + point_block_size * i;
    auto vertex = vertex_points[i];
    for (int k = 0; k < 3; ++k) point[k] = vertex->estimate()[k];
}

程序運行截圖如下：

使用MeshLab打開生成的兩個PLY文件，截圖如下：

初始的點雲文件

經過給g2o BA優化的點雲文件如下：

可以看出，通過給g2o BA優化的點雲結構更加緊湊條理也更加清晰

BuildGraph是核心函數

三直接法的 Bundle Adjustment (5 分，約 3 小時)

3.1 數學模型

特征點法的 BA 以最小化重投影誤差作為優化目標。相對的，如果我們以最小化光度誤差為目標，就得到了直接法的 BA。之前我們在直接法 VO 中，談到了如何用直接法去估計相機位姿。但是直接法亦可用來處理整個 Bundle Adjustment。下面，請你推導直接法 BA 的數學模型，並完成它的 g2o 實現。注意本題使用的參數化形式與實際的直接法還有⼀點不同，我們用 x, y, z 參數化每⼀個 3D 點，而實際的直接法多采用逆深度參數化 [1]。
本題給定 7 張圖片，記為 0.png 至 6.png，每張圖片對應的相機位姿初始值為 \(T_i\)，以 \(T_{cw}\) 形式存儲在 poses.txt 文件中，其中每⼀行代表⼀個相機的位姿，格式如之前作業那樣：
\(time, t_x, t_y, t_z, q_x, q_y, q_z, q_w\)
平移在前，旋轉（四元數形式）在后。同時，還存在⼀個 3D 點集 P，共 N 個點。其中每⼀個點的初始坐標記作 \(p_i = [x, y, z]\) 。每個點還有自己的固定灰度值，我們用 16 個數來描述，這 16 個數為該點周圍 4x4的小塊讀數，記作 \(I(p)_i\)，順序見圖 1 。換句話說，小塊從 u − 2, v − 2 取到 u + 1, v + 1，先迭代 v。那么，我們知道，可以把每個點投影到每個圖像中，然后再看投影后點周圍小塊與原始的 4x4 小塊有多大差異。那么，整體優化目標函數為：

\[\min\sum^7_{j=1}\sum^N_{i=1}\sum_W||I(p_i)-I_j(\pi(KT_jp_i))||^2_2 \]

即最小化任意點在任意圖像中投影與其本⾝顏⾊之差。其中K 為相機內參（在程序內以全局變量形式給定），\(\pi\)為投影函數，\(W\) 指代整個patch。下面，請回答：

1. 如何描述任意一點投影在任意⼀圖像中形成的error？

答：

\[error = I(p_i)-I_j(\pi(KT_jp_i)) \]

2. 每個error 關聯幾個優化變量？

答：每個error關聯兩個優化變量分別是相機的李代數姿態\(\xi\)（6自由度），三維空間點坐標\(P = X,Y,Z\)（3個自由度）。

3. error 關於各變量的雅可比是什么？

答：

error關於空間點坐標P(X,Y,Z)的導數為對應點的像素梯度，即

\[\begin{aligned} \frac{\partial e}{\partial \xi} & = \left[ \begin{array}{ccc} \frac{f_x}{Z} & 0 & -\frac{f_xX}{Z^2} & -\frac{f_xXY}{Z^2} & f_x+\frac{fxX^2}{Z^2} & -\frac{f_xY}{Z}\\ 0 & \frac{f_y}{Z} & -\frac{f_yY}{Z^2} & -f_y-\frac{fyY^2}{Z^2} & \frac{f_yXY}{Z^2} & \frac{f_yX}{Z}\\ \end{array} \right] \end{aligned} \]