1. Following the order in which the program runs, the first step is the preprocess_transition_probs() function in walker.py
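This function builds the two pieces of precomputed data that sampling relies on: alias_nodes and alias_edges.
Each of them is a dictionary whose values are pairs of lists. The two lists correspond to the acceptance probabilities of the alias table and the alternative ("alias") outcome for each entry. For details on alias sampling, see https://blog.csdn.net/haolexiao/article/details/65157026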
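The code below calls create_alias_table() and alias_sample() without showing them. What follows is a minimal sketch of what they might look like, using Vose's alias method. The function names and the two-argument alias_sample(accept, alias) call match the code in this post. The bodies, the optional rng parameter, and the tuple return type are assumptions, and the actual walker.py helpers may be implemented differently (for example with numpy).

```python
import random
from typing import List, Sequence, Tuple


def create_alias_table(probs: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Build (accept, alias) tables with Vose's alias method (sketch).

    probs must be non-negative and sum to 1. Sampling picks a column i
    uniformly, keeps i with probability accept[i], else returns alias[i].
    """
    n = len(probs)
    accept = [0.0] * n
    alias = [0] * n
    # Scale so that the average column height is exactly 1.
    scaled = [p * n for p in probs]
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]

    while small and large:
        s = small.pop()
        l = large.pop()
        accept[s] = scaled[s]
        alias[s] = l
        # Column l donates (1 - scaled[s]) of its mass to fill column s.
        scaled[l] -= 1.0 - scaled[s]
        if scaled[l] < 1.0:
            small.append(l)
        else:
            large.append(l)

    # Whatever is left is a full column (up to floating-point error).
    for i in large + small:
        accept[i] = 1.0
        alias[i] = i
    return accept, alias


def alias_sample(accept: Sequence[float], alias: Sequence[int], rng=None) -> int:
    """Draw one index in O(1) from tables built by create_alias_table."""
    r = rng if rng is not None else random
    i = r.randrange(len(accept))  # pick a column uniformly
    return i if r.random() < accept[i] else alias[i]
```

Building the tables costs O(n) per distribution, after which every draw is O(1). That is why the walker precomputes them once for every node and edge instead of renormalizing at each step of each walk.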
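alias_nodes: the sampling probabilities are determined by the weights of the edges between a node and its neighbors (an edge without a 'weight' attribute counts as 1.0). The higher the weight, the more likely that neighbor is to be sampled. This table is used only for the first step of a walk, when there is no previous node yet.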
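alias_edges: generated by calling get_alias_edge(). For each edge (t, v), it stores the alias table used to choose the next neighbor to visit when the previous vertex was t and the current vertex is v.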
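get_alias_edge() itself is not shown in this post. The sketch below computes the probabilities it most likely feeds into create_alias_table(), following the p/q search bias from the node2vec paper. The function name biased_transition_probs, the plain dict-of-dicts graph (adj[u][v] is the edge attribute dict, mirroring how a networkx graph is indexed as G[u][v]), and passing p and q explicitly are all assumptions for illustration.

```python
from typing import Dict, Hashable, List

# Hypothetical adjacency format: adj[u][v] is the attribute dict of edge (u, v).
Adjacency = Dict[Hashable, Dict[Hashable, dict]]


def biased_transition_probs(adj: Adjacency, t: Hashable, v: Hashable,
                            p: float, q: float) -> List[float]:
    """Normalized probabilities of stepping from v to each neighbor x,
    given that the walk arrived at v from t (node2vec p/q bias, sketch)."""
    unnormalized = []
    for x in adj[v]:  # same neighbor order as G.neighbors(v)
        w = adj[v][x].get('weight', 1.0)
        if x == t:            # distance(t, x) = 0: step back to t
            unnormalized.append(w / p)
        elif t in adj[x]:     # distance(t, x) = 1: x is also t's neighbor
            unnormalized.append(w)
        else:                 # distance(t, x) = 2: move away from t
            unnormalized.append(w / q)
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

A large p makes the walk less likely to return to t. A small q pushes it outward (DFS-like), and a large q keeps it near t (BFS-like). With p = q = 1 the result reduces to the plain weight normalization used for alias_nodes.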
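```python
def preprocess_transition_probs(self):
    """
    Preprocessing of transition probabilities for guiding the random walks.
    """
    G = self.G

    # First-order tables: neighbor probabilities proportional to edge weight
    # (edges without a 'weight' attribute count as 1.0).
    alias_nodes = {}
    for node in G.nodes():
        unnormalized_probs = [G[node][nbr].get('weight', 1.0)
                              for nbr in G.neighbors(node)]
        norm_const = sum(unnormalized_probs)
        normalized_probs = [
            float(u_prob) / norm_const for u_prob in unnormalized_probs]
        alias_nodes[node] = create_alias_table(normalized_probs)

    # Second-order tables: one per (previous node, current node) pair.
    alias_edges = {}
    for edge in G.edges():
        alias_edges[edge] = self.get_alias_edge(edge[0], edge[1])
        # G.edges() lists each undirected edge only once, but a walk can
        # traverse it in either direction, so store the reverse pair too.
        if not G.is_directed():
            alias_edges[(edge[1], edge[0])] = self.get_alias_edge(edge[1], edge[0])

    self.alias_nodes = alias_nodes
    self.alias_edges = alias_edges
```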
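Written as a formula, the alias_nodes loop above assigns each neighbor x of v a probability proportional to its edge weight w_vx, where N(v) is the neighbor set and a missing weight counts as 1. The weights (1, 2, 1) in the second line are a made-up example.

```latex
P(c_{i+1} = x \mid c_i = v) = \frac{w_{vx}}{\sum_{u \in N(v)} w_{vu}}, \qquad x \in N(v)

\text{e.g. } (w_{vx_1}, w_{vx_2}, w_{vx_3}) = (1, 2, 1) \;\Rightarrow\; P = \left(\tfrac{1}{4}, \tfrac{2}{4}, \tfrac{1}{4}\right)
```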
2. The second important function is node2vec_walk()
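Starting from start_node, this function generates a sequence of up to walk_length nodes. When choosing each next node, it considers not only the current node but also the node visited just before it. The walk stops early if it reaches a node with no neighbors.
Sampling uses the alias data generated earlier: alias_nodes for the first step, and alias_edges for every step after that.
One such walk is generated starting from every node.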
```python
def node2vec_walk(self, walk_length, start_node):
    G = self.G
    alias_nodes = self.alias_nodes
    alias_edges = self.alias_edges

    walk = [start_node]

    while len(walk) < walk_length:
        cur = walk[-1]
        # Same neighbor order as when the alias tables were built.
        cur_nbrs = list(G.neighbors(cur))
        if len(cur_nbrs) > 0:
            if len(walk) == 1:
                # First step: no previous node yet, use the first-order table.
                walk.append(
                    cur_nbrs[alias_sample(alias_nodes[cur][0], alias_nodes[cur][1])])
            else:
                # Later steps: condition on the previous node via alias_edges.
                # For undirected graphs alias_edges must hold both (u, v)
                # and (v, u), otherwise this lookup raises KeyError.
                prev = walk[-2]
                prob, alias = alias_edges[(prev, cur)]
                next_node = cur_nbrs[alias_sample(prob, alias)]
                walk.append(next_node)
        else:
            # Dead end: stop early, so the walk may be shorter than walk_length.
            break

    return walk
```
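The driver loop that "generates a walk from every node" is not shown here. Below is a minimal sketch of one, assuming a hypothetical wrapper named simulate_walks with a num_walks parameter (both are assumptions, not taken from walker.py). It takes the walk function as an argument, so in practice you would pass the bound method walker.node2vec_walk.

```python
import random
from typing import Callable, Hashable, Iterable, List, Optional


def simulate_walks(nodes: Iterable[Hashable], num_walks: int, walk_length: int,
                   walk_fn: Callable[..., List[Hashable]],
                   seed: Optional[int] = None) -> List[List[Hashable]]:
    """Run num_walks passes; each pass starts one walk from every node."""
    rng = random.Random(seed)
    nodes = list(nodes)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # vary the start order between passes
        for node in nodes:
            walks.append(walk_fn(walk_length=walk_length, start_node=node))
    return walks
```

For example, simulate_walks(G.nodes(), num_walks=10, walk_length=80, walk_fn=walker.node2vec_walk) would produce 10 × |V| walks (the numbers are illustrative). Shuffling between passes only changes the order in which the walks appear in the corpus; each node still serves as a start node exactly num_walks times.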
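3. After that, gensim's Word2Vec is trained on the walks, treating each walk as a sentence and each node as a word, which yields an embedding for every node.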
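A minimal sketch of this training step follows, assuming gensim 4.x. The helper names and hyperparameter values (dim=128, window=5, and so on) are illustrative assumptions, not values from this post. Word2Vec expects string tokens, so node IDs are converted with str() first. gensim is imported inside the function so the conversion helper works even where gensim is not installed.

```python
from typing import Dict, Hashable, List, Sequence


def walks_to_sentences(walks: Sequence[Sequence[Hashable]]) -> List[List[str]]:
    """Word2Vec tokens must be strings, so convert every node ID."""
    return [[str(node) for node in walk] for walk in walks]


def train_embeddings(walks, dim: int = 128, window: int = 5,
                     epochs: int = 5, workers: int = 1) -> Dict[str, object]:
    # Lazy import: only needed when actually training.
    from gensim.models import Word2Vec  # argument names below are gensim >= 4.0

    model = Word2Vec(
        sentences=walks_to_sentences(walks),
        vector_size=dim,   # called `size` in gensim 3.x
        window=window,
        min_count=0,       # keep every node, even rarely visited ones
        sg=1,              # skip-gram
        workers=workers,
        epochs=epochs,     # called `iter` in gensim 3.x
    )
    # Keys are the stringified node IDs.
    return {node: model.wv[node] for node in model.wv.index_to_key}
```

Because the keys of the returned dictionary are strings, look up the vector of node 5 as embeddings['5'], not embeddings[5].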