面試題：檢測點是否在扇形之內

本文轉載自查看原文 2013-04-19 10:22 25620 游戲編程

前幾天，同事在報告中提及檢測角色是否在扇形攻擊范圍的方法。我覺得該方法的性能不是太好，提出另一個頗為直接的方法。此問題在游戲中十分常見，只涉及簡單的數學，卻又可以看出實現者是否細心，所以我覺得可當作一道簡單的面試題。問題在微博發表后得到不少回應，故撰文提供一些解答。

問題定義:

在二維中，檢測點 \mathbf{p}是否在扇形(circular sector)內，設扇形的頂點為 \mathbf{c}，半徑為 r，從 \mathbf{\hat{u}}方向兩邊展開 \theta角度。

當中 \mathbf{p},\mathbf{c},\mathbf{\hat{u}} 以直角坐標(cartesian coordinates)表示，r>0，0 < \theta < \pi。\mathbf{p}在扇形區邊界上當作不相交。實現時所有數值采用單精度浮點數類型。

問題分析

許多相交問題都可以分拆為較小的問題。在此問題中，扇形可表示為圓形和角度區間的交集。

換句話說，問題等同於檢測 \mathbf{p} 和 \mathbf{c} 的距離小於 r，及 \mathbf{p-c} 的方向在 \mathbf{\hat{u}} 兩邊 \theta 的角度范圍內。

距離

\mathbf{p} 和 \mathbf{c} 的距離小於 r，用數學方式表示:

|\mathbf{p} - \mathbf{c}| < r

極坐標

這是比較麻煩的部分，也是本題的核心。

有人想到，可以把 \mathbf{p-c} 和 \mathbf{\hat{u}} 從直角坐標轉換成極坐標(polar coordinates)。數學上，\mathbf{p-c} 和 \mathbf{\hat{u}} 分別與 \mathbf{x} 軸的夾角可用atan2()函數求得:

\begin{align*} \phi &= \mathrm{atan2}(p_y - c_y, p_x - c_x)\\ \alpha &= \mathrm{atan2}(u_y, u_x) \end{align*}

然后，檢查 \phi 是否在 (\alpha - \theta, \alpha + \theta) 區間內。但這要非常小心，因為 (\alpha - \theta, \alpha + \theta) 區間可能超越 (-\pi, \pi] 的范圍，所以要檢測:

\begin{align*} \alpha_1 &< \phi - 2\pi < \alpha_2 \text{ ; or}\\ \alpha_1 &< \phi < \alpha_2 \text{ ; or}\\ \alpha_1 &< \phi + 2\pi < \alpha_2 \end{align*}

這個方法是可行的，不過即使假設 \mathbf{\hat{u}} 和 \theta 是常數，可預計算 \alpha_1 和 \alpha_2 ，我們還是避免不了要計算一個atan2()。

點積

點積(dot product)可計算兩個矢量的夾角，這非常適合本題的扇形表示方式。我們要檢測 \mathbf{p-c} 和 \mathbf{\hat{u}} 的夾角是否小於 \theta :

\cos^{-1}\left (\mathbf{\frac{p-c}{|p-c|}} \cdot \mathbf{\hat{u}} \right\) < \theta

相比極坐標的方法，點積算出來的夾角必然在 [0, \pi] 區間里，無需作特別處理就可以和 \theta 比較。

這是比較直觀的角度比較方式。本文將以此方法為主軸。

編碼與優化

若直接實現以距離和點積的檢測，可以得到:

// Naive
bool IsPointInCircularSector(
    float cx, float cy, float ux, float uy, float r, float theta,
    float px, float py)
{
    assert(cosTheta > -1 && cosTheta < 1);
    assert(squaredR > 0.0f);

    // D = P - C
    float dx = px - cx;
    float dy = py - cy;

    // |D| = (dx^2 + dy^2)^0.5
    float length = sqrt(dx * dx + dy * dy);

    // |D| > r
    if (length > r)
        return false;

    // Normalize D
    dx /= length;
    dy /= length;

    // acos(D dot U) < theta
    return acos(dx * ux + dy * uy) < theta;
}

優化版本1: 基本優化

由於計算矢量長度需要sqrt()，而不等式(1.1)左右兩側都是非負數，所以可以把兩側平方:

|\mathbf{p - c}|^2 < r^2

如果 r 為常數，我們可以預計算 r^2 。

另外，如果\theta是常數，我們可以預計算 \cos \theta ，然后把不等式(1.5) 改為:

\frac{\mathbf{p-c}}{|\mathbf{p-c}|} \cdot \mathbf{\hat{u}} > \cos \theta

可以這么做，是因為 \cos x 在 0 \le x \le \pi 的區間內是單調下降的。

此外，由於除法一般較乘法慢得多，而 |p-c| \ge 0 ，我們可以把 |\mathbf{p-c}| 調至右側:

(\mathbf{p-c}) \cdot \mathbf{\hat{u}} > \mathbf{|p-c|} \cos \theta

基於這兩個可能預計算的參數，可把檢測函數的參數由 r 和 \theta 改成 r^2 和 \cos \theta 。結合這些改動:

// Basic: use squareR and cosTheta as parameters, defer sqrt(), eliminate division
bool IsPointInCircularSector1(
    float cx, float cy, float ux, float uy, float squaredR, float cosTheta,
    float px, float py)
{
    assert(cosTheta > -1 && cosTheta < 1);
    assert(squaredR > 0.0f);

    // D = P - C
    float dx = px - cx;
    float dy = py - cy;

    // |D|^2 = (dx^2 + dy^2)
    float squaredLength = dx * dx + dy * dy;

    // |D|^2 > r^2
    if (squaredLength > squaredR)
        return false;

    // |D|
    float length = sqrt(squaredLength);

    // D dot U > |D| cos(theta)
    return dx * ux + dy * uy > length * cosTheta;
}

注意，雖然比較長度時不用開平方，在夾角的檢測里還是要算一次開平方，但這也比必須算開平方好，因為第一個檢測失敗就不用算了。

優化版本2: 去除開平方

然而，我們有辦法去除開平方嗎?

這或許是這解答中最難的問題。我們不能簡單地把不等式(2.3)兩側平方:

(\mathbf{(p-c)} \cdot u)^2 > {|p-c|}^2 \cos^2 \theta \qquad \textrm{Wrong!}

因為兩側分別有可能是正數或負數。若把負數的一側平方后，就會把負號清除了，我們必須把比較符號反轉。

針對左右側分別可以是負數和非負數四種情況，可以歸納為:

若左側為非負數，右側為非負數，檢測 ((\mathbf{p-c}) \cdot \mathbf{\hat{u}})^2 > {|\mathbf{p-c}|}^2 \cos^2 \theta
若左側為為負數，右側為負數，檢測 ((\mathbf{p-c}) \cdot \mathbf{\hat{u}})^2 < {|\mathbf{p-c}|}^2 \cos^2 \theta
若左側為非負數，右側為負數，那么不等式(2.3)一定成立
若左側為負數，右側為非負數，那么不等式(2.3)一定不成立

解釋一下，在第2個情況中，先把兩側分別乘以-1，大於就變成小於，兩側就變成非負，可以平方。在實現中，若非第1個或第2個情況，只可能是第3個或第4個，所以只需分辨是3還是4，例如只檢測左側是否負數。

// Eliminate sqrt()
bool IsPointInCircularSector2(
    float cx, float cy, float ux, float uy, float squaredR, float cosTheta,
    float px, float py)
{
    assert(cosTheta > -1 && cosTheta < 1);
    assert(squaredR > 0.0f);

    // D = P - C
    float dx = px - cx;
    float dy = py - cy;

    // |D|^2 = (dx^2 + dy^2)
    float squaredLength = dx * dx + dy * dy;

    // |D|^2 > r^2
    if (squaredLength > squaredR)
        return false;

    // D dot U
    float DdotU = dx * ux + dy * uy;

    // D dot U > |D| cos(theta)
    // <=>
    // (D dot U)^2 > |D|^2 (cos(theta))^2 if D dot U >= 0 and cos(theta) >= 0
    // (D dot U)^2 < |D|^2 (cos(theta))^2 if D dot U <  0 and cos(theta) <  0
    // true                               if D dot U >= 0 and cos(theta) <  0
    // false                              if D dot U <  0 and cos(theta) >= 0
    if (DdotU >= 0 && cosTheta >= 0)
        return DdotU * DdotU > squaredLength * cosTheta * cosTheta;
    else if (DdotU < 0 && cosTheta < 0)
        return DdotU * DdotU < squaredLength * cosTheta * cosTheta;
    else
        return DdotU >= 0;
}

優化版本3: 減少比較

在一些架構上，浮點比較和分支是較慢的操作。由於我們只是要檢測數值是否負值，我們可以利用IEEE754浮點數的二進制特性：最高位代表符號，0為正數，1為負數。

我們可把符號用位掩碼(bit mask)分離出來，第1和第2個情況就變成比較左右側的符號是否相等。我們還可以用OR把符號加到兩側平方，意義上就是當負數時把比較符號調換，

而第3個情況也可以直接用符號作比較。

// Bit trick
bool IsPointInCircularSector3(
    float cx, float cy, float ux, float uy, float squaredR, float cosTheta,
    float px, float py)
{
    assert(cosTheta > -1 && cosTheta < 1);
    assert(squaredR > 0.0f);

    // D = P - C
    float dx = px - cx;
    float dy = py - cy;

    // |D|^2 = (dx^2 + dy^2)
    float squaredLength = dx * dx + dy * dy;

    // |D|^2 > r^2
    if (squaredLength > squaredR)
        return false;

    // D dot U
    float DdotU = dx * ux + dy * uy;

    // D dot U > |D| cos(theta)
    // <=>
    // (D dot U)^2 > |D|^2 (cos(theta))^2 if D dot U >= 0 and cos(theta) >= 0
    // (D dot U)^2 < |D|^2 (cos(theta))^2 if D dot U <  0 and cos(theta) <  0
    // true                               if D dot U >= 0 and cos(theta) <  0
    // false                              if D dot U <  0 and cos(theta) >= 0
    const unsigned cSignMask = 0x80000000;
    union {
        float f;
        unsigned u;
    }a, b, lhs, rhs;
    a.f = DdotU;
    b.f = cosTheta;
    unsigned asign = a.u & cSignMask;
    unsigned bsign = b.u & cSignMask;
    if (asign == bsign) {
        lhs.f = DdotU * DdotU;
        rhs.f = squaredLength * cosTheta * cosTheta;
        lhs.u |= asign;
        rhs.u |= asign;
        return lhs.f > rhs.f;
    }
    else
        return asign == 0;
}

上述所作的”優化”，其實只是從算法上著手，試圖消除一些可能開銷較高的操作。實際上不同的CPU架構的結果也會有所差異。所以我們必須實測才能得知，在某硬件、編譯器上，”優化”是否有效。

時間所限，我只做了一個簡單的性能測試(欠缺足夠邊界條件單元測試……):

const unsigned N = 1000;
const unsigned M = 100000;

template <bool naiveParam, typename TestFunc>
float Test(TestFunc f, float rmax = 2.0f) {
    unsigned count = 0;
    for (int i = 0; i < N; i++) {
        float cx = gCx[i];
        float cy = gCy[i];
        float ux = gUx[i];
        float uy = gUy[i];
        float r = naiveParam ? gR[i] : gSquaredR[i];
        float t = naiveParam ? gTheta[i] : gCosTheta[i];

        for (int j = 0; j < M; j++) {
            if (f(cx, cy, ux, uy, r, t, gPx[j], gPy[j]))
                count++;
        }
    }
    return (float)count / (N * M);
}

我先用偽隨機數生成一些測試數據，然后用兩個循環共執行1億次檢測。這比較合乎游戲應用的情況，每次通常是檢測M個角色是否在1個扇形之內，做N個迭代。

使用 VS2010，缺省設置加上/fp:fast，Win32/Release的結果:

NoOperation hit=0% time=0.376s
IsPointInCircularSector hit=30.531% time=3.964s

NoOperation hit=0% time=0.364s
IsPointInCircularSector1 hit=30.531% time=0.643s
IsPointInCircularSector2 hit=30.531% time=0.614s
IsPointInCircularSector3 hit=30.531% time=0.571s

NoOperation是指測試的函數只簡單傳回false，用來檢測函循環、讀取測試數據和函數調用的開銷。之后測的時間會減去這個開銷。

可以看到，在簡單優化后，預計算 \cos \theta 和 r^2，並延后sqrt()，性能大幅提升4倍以上。之后的優化(第2和第3個版本)，只能再優化少許。為了減少sqrt()，卻增加了乘數和比較。

有趣的是，若打開/arch:SSE2

NoOperation hit=0% time=0.453s
IsPointInCircularSector hit=30.531% time=2.645s

NoOperation hit=0% time=0.401s
IsPointInCircularSector1 hit=30.531% time=0.29s
IsPointInCircularSector2 hit=30.531% time=0.32s
IsPointInCircularSector3 hit=30.531% time=0.455s

所有性能都提升了，但優化版本反而一直倒退。主要原因應該是SSE含sqrtss指令，能快速完成開平方運算，第2個和第3個版本所做的優化反而增加了指令及分支，而第3個版本更需要把SSE寄存器的值儲存至普通寄在器做整數運算，造成更大損失。

優化版本4和5: SOA SIMD

由於編譯器自動化的SSE標量可以提高性，進一步的，可采用SOA(struct of array)布局進行以4組參數為單位的SIMD版本。使用了SSE/SSE2 intrinsic的實現，一個版本使用_mm_sqrt_ps()，一個版本去除了_mm_sqrt_ps():

// SSE2, SOA(struct of array) layout
// SSE2, SOA(struct of array) layout
__m128 IsPointInCircularSector4(
    __m128 cx, __m128 cy, __m128 ux, const __m128& uy, const __m128& squaredR, const __m128& cosTheta,
    const __m128& px, const __m128& py)
{
    // D = P - C
    __m128 dx = _mm_sub_ps(px, cx);
    __m128 dy = _mm_sub_ps(py, cy);

    // |D|^2 = (dx^2 + dy^2)
    __m128 squaredLength = _mm_add_ps(_mm_mul_ps(dx, dx), _mm_mul_ps(dy, dy));

    // |D|^2 < r^2
    __m128 lengthResult = _mm_cmpgt_ps(squaredR, squaredLength);

    // |D|
    __m128 length = _mm_sqrt_ps(squaredLength);

    // D dot U
    __m128 DdotU = _mm_add_ps(_mm_mul_ps(dx, ux), _mm_mul_ps(dy, uy));

    // D dot U > |D| cos(theta)
    __m128 angularResult = _mm_cmpgt_ps(DdotU, _mm_mul_ps(length, cosTheta));

    __m128 result = _mm_and_ps(lengthResult, angularResult);

    return result;
}

// SSE2, SOA(struct of array) layout without sqrt()
__m128 IsPointInCircularSector5(
    __m128 cx, __m128 cy, __m128 ux, const __m128& uy, const __m128& squaredR, const __m128& cosTheta,
    const __m128& px, const __m128& py)
{
    // D = P - C
    __m128 dx = _mm_sub_ps(px, cx);
    __m128 dy = _mm_sub_ps(py, cy);

    // |D|^2 = (dx^2 + dy^2)
    __m128 squaredLength = _mm_add_ps(_mm_mul_ps(dx, dx), _mm_mul_ps(dy, dy));

    // |D|^2 < r^2
    __m128 lengthResult = _mm_cmpgt_ps(squaredR, squaredLength);

    // D dot U
    __m128 DdotU = _mm_add_ps(_mm_mul_ps(dx, ux), _mm_mul_ps(dy, uy));

    // D dot U > |D| cos(theta)
    // <=>
    // (D dot U)^2 > |D|^2 (cos(theta))^2 if D dot U >= 0 and cos(theta) >= 0
    // (D dot U)^2 < |D|^2 (cos(theta))^2 if D dot U <  0 and cos(theta) <  0
    // true                               if D dot U >= 0 and cos(theta) <  0
    // false                              if D dot U <  0 and cos(theta) >= 0
    __m128 asign = _mm_and_ps(DdotU, cSignMask.v);
    __m128 bsign = _mm_and_ps(cosTheta, cSignMask.v);
    __m128 equalsign = _mm_castsi128_ps(_mm_cmpeq_epi32(_mm_castps_si128(asign), _mm_castps_si128(bsign)));
    
    __m128 lhs = _mm_or_ps(_mm_mul_ps(DdotU, DdotU), asign);
    __m128 rhs = _mm_or_ps(_mm_mul_ps(squaredLength, _mm_mul_ps(cosTheta, cosTheta)), asign);
    __m128 equalSignResult = _mm_cmpgt_ps(lhs, rhs);

    __m128 unequalSignResult = _mm_castsi128_ps(_mm_cmpeq_epi32(_mm_castps_si128(asign), _mm_setzero_si128()));

    __m128 result = _mm_and_ps(lengthResult, _mm_or_ps(
        _mm_and_ps(equalsign, equalSignResult),
        _mm_andnot_ps(equalsign, unequalSignResult)));

    return result;
}

因為要以SIMD並行執行，不能分支。所有分支都要計算，然后再混合(blend)在一起。測試結果:

IsPointInCircularSector4 hit=30.531% time=0.121s
IsPointInCircularSector5 hit=30.531% time=0.177s

SOA的好處是移植簡單，基本上只要把float改成__m128，然后用相應的intrinsic代替浮點四則運算便可。而且能盡用每指令4個數據的吞吐量(throughput)。如果用__m128表示二維矢量(或其他問題中的三維矢量)，許多時候會浪費了一些吞吐量(例如在點積運算上)。

第4版本大約比之前最快的版本快近兩倍，比沒有/arch:SSE2最慢的版本快32倍。第5版本雖去除了開平方，在這個架構下還是得不償失。

如果計上NoOperationSIMD的時間，SOA、對齊的內存訪問方式，以及減少迭代數目，性能提高的幅度就更大。在實際應用中，通常比較難直套用這種內存布局，可能需要做轉置(transpose)。對於開發高性能的子系統，在數據結構上可考慮直接儲存為SOA。

其他可行方法

除了本文所述的方法，還可以用其他方法:

把 \mathbf{p} 轉換至扇形的局部坐標 \mathbf{p}’ (如 \mathbf{\hat{u}} 為局部坐標的x軸)，然后再比較角度。這個是原來同事的做法。
利用二維的"叉積"(實際上叉積只定義於三維)去判斷 \mathbf{(p-v)} 是否在兩邊之內。但需要把u旋轉\pm \theta來計算兩邊。另外，邊與 \mathbf{(p-v)} 的夾角可能大於 \pi ，要小心處理。
查表。對於現在的機器來說，許多時候直接計算會比查表快，對於SOA來說，查表不能並行，更容易做成瓶頸。此外，查表要結合邏輯所需的精度，通用性會較弱。然言，對於某些應用場景，例如更復雜的運算、非常受限的處理單元(CPU/GPU/DSP等)，查表還是十分有用的思路。

或許你還會想到其他方法。也可嘗試用不同的扇形表示方式。

結語

這個看來非常簡單的問題，也可以作不同嘗試。在日常的工作中，可以改善的地方就更是數之不盡了。

源碼在放上了Github。

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 iOS面試題之內存管理 2022iOS面試題總結之內存管理判斷一點是否在扇形內 promise知識點以及面試題 02技能點面試題匯總面試題-----判斷兩個無環單鏈表是否交叉，如果交叉返回交叉點 java常見面試題1：輸出1-100之內的所有質數部門優化和邀請碼檢測面試題——Java解法 Kafka相關面試題知識點總結 vue面試題，知識點匯總(有答案)