Houdini Vellum 之 Graph Color 的算法分析

本文轉載自查看原文 2020-07-12 10:39 648 Houdini/ graph color/ vellum/ OpenCl/ houdini

Houdini 18 的Vellum目前使用的是更高級的PBD: XPBD(eXtended Position Based Dynamics) 。XPBD使用了2^nd Order Integration ，傳統的PBD使用的1^ndOrder Intergration 。想要查看的話，如下圖（默認是2^nd Order Integration)。

Vellum Solver中因為要對約束重復的運算（Constraint Iterations 設置迭代的次數），需要比較高的效率。為了能夠讓並行運算，引入了Graph Corlor技術。Graph Color說白了，就是定義一個屬性，每個約束的屬性和與它相鄰的約束的屬性值不同。

比如用一個Grid，用Vellum Contraints 得到 distant contrainsts ，然后用Graph Color節點進行計算（結果默認寫入在一個整型color屬性上），最后用wrangle 顯示顏色（相同的color屬性顏色相同），如下圖，能夠看到，每一條邊（primitive）都與相鄰的邊的顏色不同。這樣讓顏色相同的contrainst在同一個Workset。不同顏色，不同的Workset。這樣在對constrait進行迭代計算時（是在opencl里面進行的），Opencl會根據Workset，同一個Workset才會一起平行計算。

當然，Vellum 中Graph Color的計算時在Vellum Solver內部，這里只是為了演示方便。

另個案例了解Vellum Solver 中的 Graph Color

新建個一個Grid，選中兩邊的點，給Pin Constraints，把兩邊定住。加 distant constraints ( Vellum Constraint 選 Distant along edges)，設置如下圖。注意Stiffness設置為最大(10*1e+10)，這種材質現實中是沒有彈性的，但是我們又把Rest Length Scale設置為0.5，Vellum Solver會迫使它壓縮（現實中這種材質的布料是不存在的）。然后把Vellum Solver的 Smooth Iteration 設置為0（這步很重要）。結算。

結算后發現，有的邊的長度縮短大致0.5倍，有的反而被拉長了。用Graph Color查看，發現拉伸邊長的是同一個顏色，即作為同一個WorkSet在opencl中並行計算的。如下圖：

這是因為黃色的邊做一個同一個Workset首先在opencl中運算，紅色和藍色在黃色后面分別進行運算，這樣導致錯誤都轉嫁到了第一次計算的邊上（這里的錯誤是指，設置了Rest Length Scale為0.5倍，紅色和藍色都正確壓縮了，黃色反而拉長了，這是錯誤的，不是我們想要的）。

Graph Color 的算法分析

下面分析下Graph Color節點里面的算法。新建一個Grid，Graph Color， Color（random from attributes）。

進入Graph Color內部，主要分析最左邊這路（Connectivity 為Primtive），Contraints運算的時候也是以Primtive為單位。

（1）create_color_attrib

創建 color (Integer)primitive屬性

（2）find_npts

把當前primitive總共有多少鄰居記錄在屬性 @__npts屬性上。

i@__npts = len(primpoints(0, @primnum));

（3）color_high_valence （Run Over Detail ）

這步主要是把鄰居多的primitive篩選出來（Graph Color 的參數Max Valence設置閾值，默認20），單獨計算。因為進入opencl進行並行計算時，需要把鄰居primitive寫入在一個array數組里，如果相鄰的鄰居太多，會造成存儲和計算時間的極大浪費。挑其中比較重要的代碼部分進行分析：

下面的代碼把篩選出來的符合條件的primitve（@__npts>maxvalence）存入在prims數組里面。　

int maxvalence = chi("maxvalence");
string grp = sprintf("\@__npts>=%d", maxvalence);
int prims[] = expandprimgroup(0, grp);

　　然后對數組進行排序。npts數組記錄prims數組中對應的prim的點的數量乘以-1 。argsort對npts數組排序，它返回的是一個indice的數組。

　　indice：比如一個整型數組 int a[] = {0,4,2} 。a[1] = 4 ，中括號里面的1就是indice, 也就是數組下標。

　　對於一個數組int b[] = {-4，-10，-2，-8} 進行 argsort排序會得到一個數組下標的數組 {1,3,0,2}，最后用reorder函數得到最終的排序數組 {b[1],b[3],b[0],b[2]} 即{ -10，-8，-4，-2}。（argsort 和reorder一般搭檔起來用，來對數組排序）。

foreach(int i; int prim; prims)
{
    npts[i] = -len(primpoints(0, prim));
}
prims = reorder(prims, argsort(npts));

　　最終得到一個數組，根據primitive的點的數量由大到小排列。　　

　　核心的部分在下面這個foreeach部分

foreach(int i; int prim; prims)
{
    int useNewColor = 1;     // 聲明新變量，是否使用新顏色，默認1，即使用新染色
    int pts[] = primpoints(0, prim); 
    for(int j = 0; j < ncolors; j++)    // ncolors記錄當前不同color的數量， 遍歷所有color
    {
        int useColor = 1;       // 聲明新變量，能否用當前顏色，默認1， 可以使用
        foreach(int pt; pts)    // 遍歷當前primitive的每個點
        {
            if (mapping[j * npt + pt] != 0)   //mapping是一個數組，npts是所有點的數量（不止當前primitive）
            {                                 //比如npts= 14 ，當前primitive2個點，對於j=0,即color值為0 ，mapping[0*14+0]~mappint[0*14+(14-1)] 
                useColor = 0;        //對應的所有的點使用的color 0的情況，mapping值若為1即不等於0， 說明當前color值被鄰居占用，不能使用
                break;               // 這是把useColor賦值0， 即當前color值不可用
            }
        }
        if (useColor)                 //如果當前顏色可用
        {
            colors[i] = j;           // 把當前primitive 的color值記錄到colors數組里
            foreach(int pt; pts)　　　// 對於當前primitive的所有點pt,color為j,對應mapping[j*npt+pt],設置為1，表示被當前primitive占用

            {
                mapping[j * npt + pt] = 1;  // 
            }
            useNewColor = 0;          // 這種情況就不使用新顏色了
            break;
        }
    }
    if (useNewColor)         //說明沒有可用的color,需要新增加一個color
    {
        colors[i] = ncolors;  // 新的color值存到colors數組里，colors[i]是prims[i]對應的顏色值
        ncolors++;            // 總顏色數加1
        resize(mapping, ncolors * npt);   // mapping數組resize,空間比之前多了npt個
        foreach(int pt; pts)
        {
            mapping[colors[i] * npt + pt] = 1;  //對於新增的color[i],當前primitive的點pt, 把對應的mappint[colors[i]*npt+pt]賦值1， 表示被占用
        }
    }
}

這個節點完整的代碼如下：

 1 // Handle high valence prims in a serial pre-pass,
 2 // else the find_neighbors wrangle below ends up creating
 3 // extremely large neighbors arrays, which are expensive
 4 // in memory and time.
 5 int maxvalence = chi("maxvalence");
 6 string grp = sprintf("\@__npts>=%d", maxvalence);
 7 int prims[] = expandprimgroup(0, grp);
 8 int nprim = len(prims);
 9 if (nprim == 0)
10     return;
11 
12 // Create mapping for each point to each color and
13 // prims to each color.
14 int npt = npoints(0);
15 int mapping[];
16 int colors[];
17 resize(colors, nprim);
18 int ncolors = 0;
19 
20 
21 // Reverse sort by num pts in constraint
22 // so constraints with more points are first.
23 int npts[];
24 resize(npts, nprim);
25 foreach(int i; int prim; prims)
26 {
27     npts[i] = -len(primpoints(0, prim));
28 }
29 prims = reorder(prims, argsort(npts));
30 
31 foreach(int i; int prim; prims)
32 {
33     int useNewColor = 1;
34     int pts[] = primpoints(0, prim);
35     for(int j = 0; j < ncolors; j++)
36     {
37         // See if we can use this color.
38         int useColor = 1;
39         foreach(int pt; pts)
40         {
41             if (mapping[j * npt + pt] != 0)
42             {
43                 useColor = 0;
44                 break;
45             }
46         }
47 
48         if (useColor)
49         {
50             colors[i] = j;
51             foreach(int pt; pts)
52             {
53                 mapping[j * npt + pt] = 1;
54             }
55             useNewColor = 0;
56             break;
57         }
58     }
59     if (useNewColor)
60     {
61         colors[i] = ncolors;
62         ncolors++;
63         resize(mapping, ncolors * npt);
64         foreach(int pt; pts)
65         {
66             mapping[colors[i] * npt + pt] = 1;
67         }
68     }
69 }
70 
71 // Set prim color attribute.  Colored prims will be skipped
72 // by the OpenCL pass.
73 foreach(int i; int prim; prims)
74     setprimattrib(geoself(), chs("colorname"), prim, colors[i]);

View Code

（4）copy_color_to_newcolor

把color屬性復制一份，新屬性名：__newcolor。　　這是給opencl用。

（5）find_neighbors

把所有與當前primitiv相鄰的primitive記錄在數組@__neighbours數組里。

int neighbors[];
int pts[] = primpoints(0, @primnum);
foreach(int pt; pts)
    foreach(int n; pointprims(0, pt))
        if (n != @primnum)
            append(neighbors, n);
i[]@__neighbors = neighbors;

(6) create_offsets_sizes

設置opencl 的 workset ，@__sizes是所有primitive的數量，@__offsets={0}。即只有一個Workset，這個workset包含所有的primitves。

i[]@__offsets = {0};
i[]@__sizes = array(nprimitives(0));

　　可能會有人產生疑問，就一個workset ？這不和不設置workset一樣的嗎？所有的primitive都並行執行。這是因為在Opencl中，比如設置了迭代次數（Iteration）為1000，當我們迭代到500時，發現已經達到目的了，所有primitive與和它相鄰的primitive 的color值都不同，那么此時可以吧@__sizes設置為0，表明workset的大小為0，不需要繼續迭代計算了，這樣避免了資源的浪費。

（7）assign_colors_sizes

opencl部分，為每個primitive尋找color值。使用的是Gebremedhin-Manne speculative coloring 算法。每一次迭代都分為兩部分 （1）尋找color (2) 解決沖突 ，即當某個primitive與它的相鄰primitive是相同的color時，需要重新尋找color

　　（1）尋找color

　　#define MAXFORBID 18446744073709551615ul 這里定義MAXFORBID為64位二進制能表示的最大值，二進制表示為 11111111 -（中間48個1）- 11111111

    if (color[idx] >= 0)    // 說明已經找到color了， 比如前面第三部，為鄰居多的primitive提前設置了color,這里就不在重新尋找color了
        return;

    int nidx = neighbors_index[idx];    //nidx是指針，指向與當前primitive的相鄰的primitives的第一個primitive
    int nlast = neighbors_index[idx+1]; // nlast也是指針，但實際上它指向與global id 為idx+1的 primitive相鄰的第一個primitive
    int c = -1, offset = 0;             // 因為指針是連續存儲的，nlast-1 也就指向當前primitive的最后一個primitive
    // Loop until colored.
    while(c < 0)                    // c 小於0說明沒找到color ,繼續
    {
        // Bit flag to store seen neighbor colors.
        ulong forbidden = 0;　　　// 64位（二進制）forbidden，每一位表示一個color，詳見下面注釋1
        for(int i = nidx; i < nlast; i++)   //遍歷所有鄰居primitive
        {
            // Flag any neighbor colors present in the current bitrange.
            int n = neighbors[i];
            int nc = color[n] - offset;   //color[n]是當前鄰居的color值
            if (nc >= 0 && nc < FORBIDBITS) // 詳見注釋2
                forbidden |= (1ul << nc);   //見注釋2
        }
        // Check if there's an open color in the current bitrange.
        if (forbidden ^ MAXFORBID)  //forbidden 和 64位全1做二進制與運算
        {
            ulong x = forbidden;     //forbidden值賦值給x
            c = offset;
            // Find position of first zero bit.
            x = (~x) & (x + 1);   // 見注釋3， 找最小的color
            // Color is log2(x)
            while(x > 1)     //說明找到顏色
            {
                x /= 2;      //為方便假設x為八位二進制0000 0100 ,表示成10進制數為2²，這里要循環2次，最終的得到c值為2
                c++;          
            }
        }
        else
        {
            // Otherwise we need to try again with the next range of colors.
            offset += FORBIDBITS;   // 沒找到，說明64位不夠用，再加64位
        }
    }
    // Record speculative coloring.
    newcolor[idx] = c;

注釋1：

為了表示方便，我們假設forbidden是八位的二進制。那么forbidden值為0000 0101 表示第0個color(最右邊1）和第二個color 已經被占用。

注釋2：

color[n]是當前鄰居的color值，假設為3 。假設我們64個color值夠用，那么此時對應offset = 0。為了方便，假設forbidden值為00000000 00000000 00000000 00000000 00000000 00000000 00000000 00010101。

nc = color[n] - offset = 3

1ul << nc ：這是二進制左進位， 1ul 值為 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001，1ul<<3，即相左進三位：

00000000 00000000 00000000 00000000 00000000 00000000 00000000 00001000

forbidden |= (1ul << nc)：二進制或運算

forbidden : 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00010101

(1ul << nc) : 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00001000

或運算結果： 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00011101

把或運算的結果復制給forbidden ，表示第0個，第二個，第三個，第四個 color已經被鄰居占用了，不能使用了。forbidden中存在0，表示還有可以使用的color

注釋3：

這步利用了二進制運算的一個性質。能很快的找到最小的color。

比如當前 x 值為 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00011101

~x 即二進制取反： 11111111 11111111 11111111 11111111 11111111 11111111 11111111 11100010

(x + 1)即二進制加1： 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00011110

(~x) & (x + 1) 二進制與運算： 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000010

二進制與運算的結果表示第1個color可以使用（從右往左數，最右邊是第0個）

　　（2）解決沖突

有部分是與（1）相同的，挑重點分析一下。

if (nc == c) // 如果當前primitive與鄰居color相同
{
    int nnpt = npts[n];
    if (nnpt > npt || (nnpt == npt && n < idx)) // 設定去改點數少的primitive,或者 idx大的primitive。 
    {
         conflict = 1;     // 那么如果鄰居的primitive包含的點數比當前的primitive多，或者點數一樣多，但是當前primitive
         break;            // 的idx大， 那設定當前primitive的conflicit為1 ，改當前primitive的color值。
     }
}

完整的代碼如下：

  1 #define FORBIDBITS 64
  2 #define MAXFORBID 18446744073709551615ul
  3 
  4 // Gebremedhin-Manne speculative coloring.
  5 kernel void assigncolors( 
  6                  int worksets_begin, 
  7                  int worksets_length, 
  8                  int neighbors_length, 
  9                  global int * neighbors_index, 
 10                   global int * neighbors ,
 11                  int npts_length, 
 12                  global int * npts ,
 13                  int color_length, 
 14                  global int * color ,
 15                  int newcolor_length, 
 16                  global int * newcolor,
 17                  int sizes_length, 
 18                  global int * sizes_index, 
 19                  global int * sizes                 
 20 )
 21 {
 22     int idx = get_global_id(0);
 23     if (idx >= worksets_length)
 24         return;
 25 
 26     // Set sizes to zero, resolveConflicts will reset
 27     // if there's more work to do.
 28     if (idx == 0)
 29         *sizes = 0;
 30 
 31     // Nothing to do if already colored.
 32     if (color[idx] >= 0)
 33         return;
 34 
 35     int nidx = neighbors_index[idx];
 36     int nlast = neighbors_index[idx+1];
 37     int c = -1, offset = 0;
 38     // Loop until colored.
 39     while(c < 0)
 40     {
 41         // Bit flag to store seen neighbor colors.
 42         ulong forbidden = 0;
 43         for(int i = nidx; i < nlast; i++)
 44         {
 45             // Flag any neighbor colors present in the current bitrange.
 46             int n = neighbors[i];
 47             int nc = color[n] - offset;
 48             if (nc >= 0 && nc < FORBIDBITS)
 49                 forbidden |= (1ul << nc);
 50         }
 51         // Check if there's an open color in the current bitrange.
 52         if (forbidden ^ MAXFORBID)
 53         {
 54             ulong x = forbidden;
 55             c = offset;
 56             // Find position of first zero bit.
 57             x = (~x) & (x + 1);
 58             // Color is log2(x)
 59             while(x > 1)
 60             {
 61                 x /= 2;
 62                 c++;
 63             }
 64         }
 65         else
 66         {
 67             // Otherwise we need to try again with the next range of colors.
 68             offset += FORBIDBITS;
 69         }
 70     }
 71     // Record speculative coloring.
 72     newcolor[idx] = c;
 73 }
 74 
 75 kernel void resolveconflicts( 
 76                  int worksets_begin, 
 77                  int worksets_length, 
 78                  int neighbors_length, 
 79                  global int * neighbors_index, 
 80                   global int * neighbors ,
 81                  int npts_length, 
 82                  global int * npts ,
 83                  int color_length, 
 84                  global int * color ,
 85                  int newcolor_length, 
 86                  global int * newcolor,
 87                  int sizes_length, 
 88                  global int * sizes_index, 
 89                   global int * sizes 
 90 )
 91 {
 92     int idx = get_global_id(0);
 93     if (idx >= worksets_length)
 94         return;
 95     // Nothing to do if already colored.
 96     if (color[idx] >= 0)
 97         return;
 98 
 99     int nidx = neighbors_index[idx];
100     int nlast = neighbors_index[idx+1];
101     // Load speculative color.
102     int c = newcolor[idx];
103     int conflict = 0;
104     int npt = npts[idx];
105     for(int i = nidx; i < nlast; i++)
106     {
107         int n = neighbors[i];
108         // Load neighbor's speculative color, which is also
109         // its final color if already accepted.
110         int nc = newcolor[n];
111 
112         // Check for conflict.
113         if (nc == c)
114         {
115             int nnpt = npts[n];
116             // Resolution gives preference to primitives with more points, 
117             // otherwise the one that comes first.
118             // (NOTE: We color in fewer iterations if we prioritize by number
119             // of graph neighbors, but then assuming we process prims by color,
120             // we're usually farther away from original point order leading to many
121             // cache misses and slower downstream processing.)
122             if (nnpt > npt || 
123                 (nnpt == npt && n < idx))
124             {
125                 conflict = 1;
126                 break;
127             }
128         }
129     }
130     
131     // If there's a conflict then reset sizes for more work,
132     // otherewise accept speculative color.
133     if (conflict)
134        *sizes = worksets_length;
135     else
136         color[idx] = c;
137 }

View Code

（7）sort_prims_by_color

根據顏色為primnum排序，為了下面的分workset，做准備（因此注意，vellum中Contraint primnum有可能會是要重新排序過的）

（8）create_workset_attrs

這步比較簡單，根據生成兩個array數組, @__size ,@__offset。 @__size記錄每個color數值的primitive數量，每個color一個workset,@__offset記錄每個workset的起始位置。

int @__offsets[];
int @__sizes[];
int offset = 0;
//int @coloridx[];
string colorname = chs("colorname");
int ncolors = nuniqueval(0, "primitive", colorname);
resize(@__offsets, ncolors);
resize(@__sizes, ncolors);
for(int i=0; i < ncolors; i++)
{
    int c = uniqueval(0, "primitive", colorname, i);
    int n = findattribvalcount(0, "primitive", colorname, c);
    @__offsets[i] = offset;
    @__sizes[i] = n;
//    append(@coloridx, expandprimgroup(0, sprintf("\@color=%d", c)));
    offset += n;
}

View Code

（9）rename_workset_attrs

重新命名@__size , @__offset屬性。名字可以在Graph Color上設置，如下圖：

@__size ---------> @workset_lengths

@__offset----------> @workset_begins

上面是Graph Color 的算法，比較意外的是里面利用二進制的運算，來加大運算的效率。碼字不易，轉載請標明出處。

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 算法與算法分析 offsetParent算法分析 TimSort算法分析眾數的算法分析共識算法分析 join算法分析算法分析 popcount 算法分析算法分析-漸進分析 AES算法分析與實現