Machine Learning: Probabilistic Graphical Models (Homework: Exact Inference)


  In the previous three weeks' assignments, I built probabilistic graphical models and called a third-party solver to perform inference on them, obtaining the distribution of each random variable (for directed graphs) and the maximum a posteriori assignment (for undirected graphs). The main task of this week's assignment is to write the solver ourselves. Strictly speaking, a solver is not fundamentally necessary: its only job is to compute marginal distributions or the MAP. Once the joint CPD is available, finding its maximum yields the MAP, and marginalizing it over each variable yields the marginals. But this brute-force approach is extremely inefficient: for MAP queries, the entire joint CPD must be searched again every time new evidence arrives, and for single-variable marginals, the full joint must be marginalized over and over, once per variable. Taking a six-letter word as an example, the joint CPD contains as many as 26^6 (about 3.1×10^8) entries, so repeating these operations wastes an enormous amount of computation.

1. Initializing the Clique Tree

  The idea behind the clique tree algorithm is divide and conquer. Given random variables A, B, C, D, E, F, G, if A is independent of the others, then A can be handled on its own, whether we want marginals or the MAP. If A, B, C are tightly coupled and C, D, E are tightly coupled, and the two cliques agree on the marginal distribution of C, then there is no need to multiply A, B, C, D, E all together before computing the individual marginals. Looking at it the other way around: the product only lines up the corresponding values of C, so if the two marginals of C coincide, the multiplication introduces no new information from one clique into the other, and it cannot change the marginals of A, B, D, E. Mathematically, the clique tree algorithm is the same process as Variable Elimination.

  In the computer, a PGM is represented as a factorList. Each factor has a var field listing the variables in its scope (and hence which nodes it connects), a card field giving their cardinalities, and a val field describing the relationship among the variables in var. A cliqueTree is really a special kind of factorList: each of its "variables" is a clique, i.e. a cluster of the original vars, and its val still encodes a relationship among vars; the original edges between individual variables simply no longer appear. A clique tree therefore consists of two fields: 1. cliqueList; 2. edges.
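  For concreteness, here is a toy illustration of this layout, following the struct conventions used throughout the course code (the variable numbers and values are made up):

phi = struct('var', [1 2], 'card', [2 2], 'val', [0.5 0.8 0.1 0.3]);  % a factor over variables 1 and 2, both binary

C.nodes = {[1 2], [2 3]};   % two cliques: {1,2} and {2,3}
C.edges = [0 1; 1 0];       % adjacency matrix of the tree: one edge between them
C.factorList = phi;         % factors used to build the tree (normally an array of factors)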

  Initializing the clique tree involves two steps: 1. grouping the variables into cliques; 2. computing each clique's initial potential.

  Grouping variables into cliques is something of a dark art: there are many different ways to do it, and all of them can be valid. Common heuristics include minimum-edge and minimum-cut style criteria, among others. A person can usually eyeball a good clustering, but doing it algorithmically takes real work. Since this is not essential to understanding the clique tree algorithm itself, Professor Koller's starter code takes care of it for us.

  A clique's initial potential encodes the relationships among the variables inside it. The algorithm is given below. One caveat: a factor must not be used more than once. Each factor states one piece of information about the model; if two cliques both contained the same factor, that information would be counted twice. Repeating someone else's story makes you partly responsible for it too, you know? (The code below blanks out each factor after it is assigned, for exactly this reason.)

  

%COMPUTEINITIALPOTENTIALS Sets up the cliques in the clique tree that is
%passed in as a parameter.
%
%   P = COMPUTEINITIALPOTENTIALS(C) Takes the clique tree skeleton C which is a
%   struct with three fields:
%   - nodes: cell array representing the cliques in the tree.
%   - edges: represents the adjacency matrix of the tree.
%   - factorList: represents the list of factors that were used to build
%   the tree.
%
%   It returns the standard form of a clique tree P that we will use through
%   the rest of the assignment. P is a struct with two fields:
%   - cliqueList: represents an array of cliques with appropriate factors
%   from factorList assigned to each clique, where the .val of each clique
%   is initialized to the initial potential of that clique.
%   - edges: represents the adjacency matrix of the tree.
%
% Copyright (C) Daphne Koller, Stanford University, 2012

function P = ComputeInitialPotentials(C)
Input = C;
% number of cliques
N = length(Input.nodes);

% initialize cluster potentials
P.cliqueList = repmat(struct('var', [], 'card', [], 'val', []), N, 1);
P.edges = zeros(N);

% Assign each factor to exactly one clique, then multiply the factors
% assigned to each clique to obtain its initial potential.
N_factors = length(Input.factorList);
for i = 1:N
    k = 1;
    clear clique_
    for j = 1:N_factors
        % A factor may be assigned to clique i only if its entire scope is
        % contained in the clique.
        if ~isempty(Input.factorList(j).var) && ...
                all(ismember(Input.factorList(j).var, Input.nodes{i}))
            clique_(k) = Input.factorList(j);
            k = k + 1;
            % Blank out the factor so it cannot be assigned to a second
            % clique; assigning it twice would double-count its information.
            Input.factorList(j) = struct('var', [], 'card', [], 'val', []);
        end
    end
    % The initial potential is the product of the assigned factors, with
    % variables sorted into standard order.
    Joint_Dis_cliq = ComputeJointDistribution(clique_);
    Joint_Dis_cliq_std = StandardizeFactors(Joint_Dis_cliq);
    P.cliqueList(i) = Joint_Dis_cliq_std;
end
P.edges = Input.edges;
end

2. Calibrating the Clique Tree

  Continuing the earlier example: A, B, C are tightly coupled and C, D, E are tightly coupled, so they form two cliques. If the two cliques agree on the marginal of C, we can compute the marginals of A, B, D, E directly within each clique, without multiplying the cliques together. Sadly, in practice the two marginals of C usually differ. The tree must then be calibrated, in the hope that after "calibration" both sides have reached a consensus about C. Clearly, a calibrated clique tree makes the marginal of any variable cheap to obtain: we only ever marginalize a small local joint distribution.

  The simplest way to make the two sides agree about C is to multiply C's marginal as computed in the "A clique" into the potential of the "E clique", and conversely to multiply C's marginal as computed in the "E clique" into the potential of the A clique. After that, C's marginal is identical in both cliques: on each side it becomes margin(C,A)*margin(C,E). This is the naive idea behind clique tree calibration. Mathematically, calibration again comes from the VE algorithm: once A and B have been eliminated, C moves on to the next round of VE, and what is left after summing out A and B is exactly C's marginal within the A clique.
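  In standard notation, the message that clique i sends to its neighbor j is the generic sum-product message (this matches the code shown later):

    \delta_{i \to j} = \sum_{C_i \setminus S_{i,j}} \psi_i \prod_{k \in N(i) \setminus \{j\}} \delta_{k \to i}

where \psi_i is the initial potential of clique i, S_{i,j} is the sepset shared by cliques i and j, and N(i) is the set of i's neighbors.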

  The key to calibration is knowing the order in which messages propagate. Messages are generally passed first from the leaves toward the root, and then from the root back to the leaves. Moreover, a clique may not send a message to a neighbor until it has received messages from all of its other neighbors. A sketch of a routine that picks the next message to send is given below.

  

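  The course provides the interface GetNextCliques(P, MESSAGES), which returns the next ready pair (i,j), or (0,0) once all messages have been sent. This is a minimal sketch of it, assuming the convention that a message MESSAGES(i,j) counts as sent once its .var field is non-empty; it illustrates the idea rather than reproducing the official implementation:

function [i, j] = GetNextCliques(P, messages)
% Find a pair (i,j) such that clique i is ready to transmit to its
% neighbor j: i has already received messages from all of its other
% neighbors, and the message i->j has not been computed yet.
N = length(P.cliqueList);
for i = 1:N
    for j = 1:N
        if P.edges(i,j) == 1 && isempty(messages(i,j).var)
            ready = true;
            for k = 1:N
                % Every neighbor of i other than j must have sent to i.
                if P.edges(k,i) == 1 && k ~= j && isempty(messages(k,i).var)
                    ready = false;
                    break;
                end
            end
            if ready
                return;
            end
        end
    end
end
% All messages have been passed.
i = 0;
j = 0;
end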

  

  Once the message-passing order is available, the messages themselves can be computed. The message a clique passes on should be its "complete knowledge" of the propagated variables: the clique's own belief about them, plus all the intelligence it has received from its other neighbors. Note that when reporting to the clique downstream, you must summarize everything you know, but you must not repeat back what that clique itself told you; echoing its own message double-counts that information. (Repeating the rumor makes you responsible too, remember?)

  

    % Upward/downward message-passing loop (N is the number of cliques;
    % MESSAGES is an N-by-N array of factors, initially empty).
    while (1)

      [i, j] = GetNextCliques(P, MESSAGES);

      % GetNextCliques returns i == 0 once every message has been sent.
      if i == 0
        break
      end

      % Sum out the variables of clique i that are not shared with clique
      % j; what remains (the sepset) is what gets propagated.
      to_be_summed = setdiff(P.cliqueList(i).var, P.cliqueList(j).var);
      to_be_propagated = setdiff(P.cliqueList(i).var, to_be_summed);

      % Collect the incoming messages from every neighbor of i except j,
      % so that j's own message is never echoed back to it.
      tmp_ = 1;
      clear factorList
      for k = 1:N
          if P.edges(i,k) == 1 && k ~= j && ~isempty(MESSAGES(k,i).var)
              factorList(tmp_) = MESSAGES(k,i);
              tmp_ = tmp_ + 1;
          end
      end
      factorList(tmp_) = P.cliqueList(i);

      % The message is the marginal of (initial potential x incoming
      % messages) over the sepset variables.
      MESSAGES(i,j) = ComputeMarginal(to_be_propagated, ComputeJointDistribution(factorList), []);
    end

  After the messages have finished propagating from the leaves up and from the root back down, each clique summarizes the messages arriving from its neighbors, i.e. multiplies them into its own potential (a message is itself a marginal distribution):

    % Belief computation: each clique multiplies its initial potential
    % with every incoming message.
    N = length(P.cliqueList);
    for i = 1:N
        tmp_ = 1;
        for k = 1:N
          if P.edges(i,k) == 1
              factorList(tmp_) = MESSAGES(k,i);
              tmp_ = tmp_ + 1;
          end
        end
        factorList(tmp_) = P.cliqueList(i);
        belief(i) = ComputeJointDistribution(factorList);
        clear factorList
    end

  At this point the clique tree is said to be calibrated. Marginalizing the resulting beliefs over individual variables yields every variable's marginal distribution.
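  For example, here is a minimal usage sketch, assuming belief(i) holds the beliefs computed above (the variable number v is made up):

v = 3;                             % hypothetical variable of interest
for i = 1:length(belief)
    if ismember(v, belief(i).var)
        % Sum out everything else in the clique, leaving P(v).
        Mv = ComputeMarginal(v, belief(i), []);
        break;
    end
end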

3. MAP Estimation with the Clique Tree

  Often we are not interested in the distribution of any single variable, but in which joint assignment of [A,B,C,D,E] has the highest probability. This idea applies to signal decoding, OCR, image processing, and similar domains: frequently we do not care what label an individual pixel gets, only what label an entire segmented region gets. Such problems are called maximum a posteriori (MAP) estimation.

            argmax_{a,b} P(a,b) = argmax_{a,b} P(a)P(b|a) = argmax_a { P(a) · max_b P(b|a) }

  This immediately recalls the marginalization discussed earlier, except that summation is replaced by argmax. P(a) · max_b P(b|a) is still a distribution over A, but one in which, for every value of A, the assignment to val already corresponds to the best choice of b. So if we then select the largest val in that factor, the assignment corresponds to argmax_{a,b}. The point of this operation is that the MAP over a set of variables can be computed by divide and conquer, and each variable's individual MAP value is simply its coordinate in the global MAP assignment. The MESSAGES are then computed as follows:

  

% Work in log-space: products of potentials become sums of log-potentials.
for i = 1:N
    P.cliqueList(i).val = log(P.cliqueList(i).val);
end

    while (1)

      [i, j] = GetNextCliques(P, MESSAGES);

      if i == 0
        break
      end

      % Max out the variables of clique i that are not shared with j.
      to_be_summed = setdiff(P.cliqueList(i).var, P.cliqueList(j).var);

      % Collect the incoming messages from every neighbor of i except j.
      tmp_ = 1;
      clear factorList
      for k = 1:N
          if P.edges(i,k) == 1 && k ~= j && ~isempty(MESSAGES(k,i).var)
              factorList(tmp_) = MESSAGES(k,i);
              tmp_ = tmp_ + 1;
          end
      end
      factorList(tmp_) = P.cliqueList(i);

      % In log-space the factor "product" is FactorSum.
      F = factorList;
      Joint = F(1);
      for l = 2:length(F)
          Joint = FactorSum(Joint, F(l));
      end
      % Max-marginalize (instead of sum-marginalize) over the eliminated
      % variables to obtain the max-product message.
      MESSAGES(i,j) = FactorMaxMarginalization(Joint, to_be_summed);
    end

    The val fields are converted to logs here because in MAP estimation the cardinalities tend to be large (the OCR alphabet has card 26!). The product of many correspondingly small vals quickly becomes too small to multiply reliably in floating point, whereas in log-space products become sums (hence FactorSum above).
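    A small numeric illustration of the underflow problem (the numbers are made up for demonstration):

p = 1/26;
prod_direct = p ^ 300;        % ~1e-424: underflows to 0 in double precision
sum_of_logs = 300 * log(p);   % about -977.4: perfectly representable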

   The messages are then combined into beliefs as follows:

    % Log-space belief computation: add each clique's log-potential to all
    % of its incoming log-messages via FactorSum.
    for i = 1:N
        tmp_ = 1;
        for k = 1:N
          if P.edges(i,k) == 1
              factorList(tmp_) = MESSAGES(k,i);
              tmp_ = tmp_ + 1;
          end
        end
        factorList(tmp_) = P.cliqueList(i);

        F = factorList;
        belief = F(1);
        for l = 2:length(F)
            belief = FactorSum(belief, F(l));
        end

        clear factorList
        Belief(i) = belief;
    end
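  Once Belief holds the calibrated max-marginals, each variable's MAP value can be read off locally. Here is a hypothetical decoding sketch (numVars, the total number of variables, is assumed known; this also assumes the maximizer is unique):

MAP = zeros(1, numVars);
for v = 1:numVars
    for i = 1:length(Belief)
        if ismember(v, Belief(i).var)
            % Max out every other variable in the clique, leaving only v.
            mm = FactorMaxMarginalization(Belief(i), setdiff(Belief(i).var, v));
            [~, MAP(v)] = max(mm.val);   % index of v's best assignment
            break;
        end
    end
end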

  

4. Summary

  As an exact inference method, the clique tree algorithm greatly reduces the amount of computation and the size of the search space relative to plain VE. Still, being an exact method, it has significant limitations. Next week's homework will implement the MCMC algorithm, the same Monte Carlo family that AlphaGo made famous. Stay tuned.

      For all the code, click here.

  

 

