【圖形學手記】Inverse Transform Sampling 逆轉換抽樣

本文轉載自查看原文 2019-05-22 19:27 479 圖形學

需求：

我們通過調查，得知大多數人在20歲左右初戀，以20歲為基准，以隨機變量X表示早於或晚於該時間的年數，為了簡單，假設X值域為[-5,5]，並且PDF(X)是一個正態分布函數（當然可以為任意分布，這里具體化一個熟悉的，方便后面驗證）

現在要求，做出一個計算機模擬程序，該程序能算出一系列X的值，並且這些值是符合觀測所得的PDF(X)的

困難：

1.我們不可能有現成的函數，能夠直接生成一系列值並且符合該分布。（雖然例子中指定了正態分布，但現實情況是任何分布都有可能，所以即使C++支持一系列分布函數，也不能解決本質問題）

2.難以求得一個函數的逆函數（至少我不知道怎么求。。至於為什么要求這個，下面會解釋）

思路

1.得到一個均勻分布的，值域為[0,1]之間的隨機函數並不困難，linux環境下有drand48()可用，windows下可借助於rand()實現；假設隨機數為 r

2.可由PDF(X)得到CDF(X)

3.CDF(X)的值域是[0,1]，CDF^-1(X)的定義域是[0,1]，如果我們可以得到CDF的逆函數，就可以直接通過該函數得到想要的結果

3.將CDF的定義域，也就是[-5,5]，分成32段，定義為一個數組，假設數組名為cdfs，得到cdfs長度為33 ，其中cdfs[0] = 0，cdfs[32]=1

4.找到cdfs(i)，使得cdfs(i）是升序排列后找到的，大於等於r的cdfs數組中的最小值，如果沒有找到的話，就返回cdfs(32)，也就是1

5.假設找到的i = 15，此時情況如下圖

顯然我們只需要知道AC的長度即可得到隨機變量X的取值，假設該值為x，則 x = A點的橫坐標 + AC的長度

由於A點是第15個點，所以 x = -5 + (10/32)*15 + AC

而由於 AC / AD = RC / BD，所以 AC = RC * AD / BD

其中，RC = r - cdfs[15] ， AD = B點橫坐標 - A點橫坐標，也就是一個dx的長度，即 (10/32)，BD = B點縱坐標 - A點縱坐標 = cdfs[16] - cdfs[15]

這樣就得出x了

大體思路就是如此，以下是簡陋的代碼實現，C++基本屬於不會用，后面會持續改進這個例子

#include <iostream>
#include <random>
#define M_PI 3.14159265358979323846

using namespace std;

// 產生0~1之間的隨機數
double randx() {
    random_device rd;
    default_random_engine gen{ rd() };
    uniform_real_distribution<> distr;
    return distr(gen);
}

// standard normal distribution function
float pdf(const float& x)
{
    return 1 / sqrtf(2 * M_PI) * exp(-x * x * 0.5);
}

// create CDF
// nbins 將cdf分成多少段
// minBound 下限
// maxBound 上限
float* cdfCreator(const int nbins, float minBound, float maxBound) {

    // cdf容量是要比nbin多1的;試想如果nbins=2,也就是說把[-5,5]分成2段，結果對CDF來說形成了3個節點，分別是-5,0,5
    //float cdf[nbins + 1], dx = (maxBound - minBound) / nbins, sum = 0;
    float* cdf = new float[nbins + 1];
    float dx = (maxBound - minBound) / nbins;
    cdf[0] = 0;
    cdf[nbins] = 1;
    for (int n = 1; n < nbins; n++) {
        float x = minBound + (maxBound - minBound) * (n / (float)(nbins));
        // 計算dx這一小塊對應的概率
        float pdf_x = pdf(x) * dx;
        cdf[n] = cdf[n - 1] + pdf_x;
    }

    // 調試信息
    cout << "[DEBUG] cdf: ";
    for (int i = 0; i < 33; i++) {
        cout << "cdf[" << i << "]: " << cdf[i] << endl;
    }

    return cdf;
}

float sample(float* cdf, const uint32_t & nbins, const float& minBound, const float& maxBound)
{
    float r = randx();
    float dx = (maxBound - minBound) / nbins;

    // 從cdf開始，到cdf+nbins+1為止，找到第一個大於或等於r的值，在這個例子中，cdf+nbins=32，也就是cdf[32]，即cdf數組的最后一個元素
    // 得到的ptr指向從cdf[0]到cdf[32]中隨機的一個元素的地址
    // 由於后買計算bd時用到了off+1，為了不越界，所以off最大只能為31，所以ptr最大能取第cdf[31]的地址，所以下面要-1
    float* ptr = std::lower_bound(cdf, cdf + nbins - 1, r);

    // 由於lower_bound的性質，cdf[ptr-cdf]永遠是大於r的，為了符合我們上文的構思，這里就進行減一處理，對應於上圖，off是15而不是16
    int off = ptr - cdf - 1;

    float rc = r - cdf[off];
    float ad = dx;
    float bd = cdf[off + 1] - cdf[off];
    float ac = rc * ad / bd;
    float x = minBound + dx * off + ac;
    return x;
}

int main(int argc, char** argv)
{
    // create CDF
    const int nbins = 32;
    float minBound = -5, maxBound = 5;
    float* cdf = cdfCreator(nbins, minBound, maxBound);

    // our simulation
    int numSims = 100000; //樣本容量10萬
    const int numBins = 100; // to collect data on our sim 
    int bins[numBins]; // to collect data on our sim 
    memset(bins, 0x0, sizeof(int) * numBins); // set all the bins to 0 ，將bins的前sizeof(int)*numBins個字節全部設置值為0x0

    for (int i = 0; i < numSims; i++) { // 抽樣十萬
        float x = sample(cdf, nbins, minBound, maxBound); // random var between -5 and 5 

        //計算x落在了我們自己定義的哪個分段里
        int whichBin = (int)((numBins - 1) * (x - minBound) / (maxBound - minBound));
        bins[whichBin]++;
    }

    // 輸出bins看看對不對
    for (int i = 0; i < numBins; i++) {
        /*float r = bins[i] / (float)numSims;
        printf("%f %f\n", 5 * (2 * (i / (float)(numBins)) - 1), r);
        cout << "[debug]i: " << i << endl;*/
        cout << "bins[" << i << "]: " << bins[i] << endl;
    }

    delete cdf;
    return 0;
}

運行結果：像是那么回事

bins[0]: 0
bins[1]: 0
bins[2]: 1
bins[3]: 0
bins[4]: 0
bins[5]: 0
bins[6]: 2
bins[7]: 0
bins[8]: 1
bins[9]: 2
bins[10]: 5
bins[11]: 7
bins[12]: 6
bins[13]: 7
bins[14]: 13
bins[15]: 28
bins[16]: 34
bins[17]: 34
bins[18]: 52
bins[19]: 95
bins[20]: 92
bins[21]: 92
bins[22]: 166
bins[23]: 163
bins[24]: 239
bins[25]: 391
bins[26]: 366
bins[27]: 426
bins[28]: 705
bins[29]: 724
bins[30]: 719
bins[31]: 1200
bins[32]: 1150
bins[33]: 1191
bins[34]: 1793
bins[35]: 1861
bins[36]: 1916
bins[37]: 2504
bins[38]: 2557
bins[39]: 2630
bins[40]: 3243
bins[41]: 3278
bins[42]: 3316
bins[43]: 3811
bins[44]: 3896
bins[45]: 3857
bins[46]: 3811
bins[47]: 4013
bins[48]: 4150
bins[49]: 3827
bins[50]: 3830
bins[51]: 3803
bins[52]: 3542
bins[53]: 3287
bins[54]: 3315
bins[55]: 3029
bins[56]: 2618
bins[57]: 2590
bins[58]: 2474
bins[59]: 1882
bins[60]: 1832
bins[61]: 1735
bins[62]: 1171
bins[63]: 1176
bins[64]: 1191
bins[65]: 722
bins[66]: 705
bins[67]: 665
bins[68]: 379
bins[69]: 321
bins[70]: 382
bins[71]: 213
bins[72]: 185
bins[73]: 164
bins[74]: 97
bins[75]: 88
bins[76]: 75
bins[77]: 39
bins[78]: 30
bins[79]: 33
bins[80]: 21
bins[81]: 7
bins[82]: 6
bins[83]: 7
bins[84]: 3
bins[85]: 3
bins[86]: 3
bins[87]: 1
bins[88]: 0
bins[89]: 2
bins[90]: 0
bins[91]: 0
bins[92]: 0
bins[93]: 0
bins[94]: 0
bins[95]: 0
bins[96]: 0
bins[97]: 0
bins[98]: 0
bins[99]: 0

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 【二】圖形學重要概念計算機圖形學——圓的掃描轉換（基本光柵圖形算法） CG-光柵圖形學多邊形掃描轉換算法-學習筆記圖形學之圖像信號處理 shader入門，圖形學心得什么是計算機圖形學？游戲-圖形學學習路線圖形學變換矩陣的分解圖形學_opengl紋理映射 GAMES101 作業0 圖形學