Paper Overview
This paper proposes two networks for computing graph similarity: Graph Embedding Models and Graph Matching Networks.
Graph Embedding Models
Model Idea
The graph embedding model uses a network to represent each graph as a single vector; the similarity of two graphs is then obtained by computing the distance between their vectors.
Network Architecture
The graph embedding model consists of three parts: (1) an encoder, (2) several propagation layers, and (3) an aggregator.
an encoder
The encoder uses two independent multilayer perceptrons (MLPs) to encode the node features and the edge features separately. My understanding is that the encoder simply re-encodes the node and edge information (feature dimensionality reduction, plus extra non-linear layers to increase the model's expressive power?). Note that each node and each edge is encoded independently; no information is exchanged between nodes or between edges at this stage.
Code:

class GraphEncoder(snt.AbstractModule):
  """Encoder module that projects node and edge features to some embeddings."""

  def __init__(self,
               node_hidden_sizes=None,
               edge_hidden_sizes=None,
               name='graph-encoder'):
    """Constructor.

    Args:
      node_hidden_sizes: if provided should be a list of ints, hidden sizes of
        node encoder network, the last element is the size of the node outputs.
        If not provided, node features will pass through as is.
      edge_hidden_sizes: if provided should be a list of ints, hidden sizes of
        edge encoder network, the last element is the size of the edge outputs.
        If not provided, edge features will pass through as is.
      name: name of this module.
    """
    super(GraphEncoder, self).__init__(name=name)

    # this also handles the case of an empty list
    self._node_hidden_sizes = node_hidden_sizes if node_hidden_sizes else None
    self._edge_hidden_sizes = edge_hidden_sizes

  def _build(self, node_features, edge_features=None):
    """Encode node and edge features.

    Args:
      node_features: [n_nodes, node_feat_dim] float tensor.
      edge_features: if provided, should be [n_edges, edge_feat_dim] float
        tensor.

    Returns:
      node_outputs: [n_nodes, node_embedding_dim] float tensor, node
        embeddings.
      edge_outputs: if edge_features is not None and edge_hidden_sizes is not
        None, this is [n_edges, edge_embedding_dim] float tensor, edge
        embeddings; otherwise just the input edge_features.
    """
    if self._node_hidden_sizes is None:
      node_outputs = node_features
    else:
      node_outputs = snt.nets.MLP(
          self._node_hidden_sizes,
          name='node-feature-mlp')(node_features)

    if edge_features is None or self._edge_hidden_sizes is None:
      edge_outputs = edge_features
    else:
      edge_outputs = snt.nets.MLP(
          self._edge_hidden_sizes,
          name='edge-feature-mlp')(edge_features)

    return node_outputs, edge_outputs
propagation layers
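For reference, the propagation update of the embedding model as given in the paper (this is the formula the next paragraph refers to) is:

$$m_{j \to i} = f_{message}\left(h_i^{(t)}, h_j^{(t)}, e_{ij}\right), \quad \forall (i, j) \in E$$

$$h_i^{(t+1)} = f_{node}\left(h_i^{(t)}, \sum_{j:(j,i) \in E} m_{j \to i}\right)$$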
Here ${f_{message}}$ is an MLP whose input is the concatenation of the features inside the parentheses; its role is to jointly encode the (already encoded) edge feature together with the features of the two nodes at the ends of the edge.
${f_{node}}$ can be an MLP or an RNN; this layer accumulates, for each node, the information of its 1-hop neighborhood. The paper uses a sum to accumulate the neighborhood messages, but a mean, max, or attention-based weighted sum could be used instead.
Overall, the propagation layers aggregate 1-hop neighborhood information around each node; stacking several of them lets information spread over larger neighborhoods.
Code:

# Jointly encode edge and endpoint-node features, then sum the messages into each receiving ("to") node.
def graph_prop_once(node_states,
                    from_idx,
                    to_idx,
                    message_net,
                    aggregation_module=tf.unsorted_segment_sum,
                    edge_features=None):
  """One round of propagation (message passing) in a graph.

  Args:
    node_states: [n_nodes, node_state_dim] float tensor, node state vectors,
      one row for each node.
    from_idx: [n_edges] int tensor, index of the from nodes.
    to_idx: [n_edges] int tensor, index of the to nodes.
    message_net: a network that maps concatenated edge inputs to message
      vectors.
    aggregation_module: a module that aggregates messages on edges to
      aggregated messages for each node.  Should be a callable and can be
      called like the following,
      `aggregated_messages = aggregation_module(messages, to_idx, n_nodes)`,
      where messages is [n_edges, edge_message_dim] tensor, to_idx is the
      index of the to nodes, i.e. where each message should go to, and
      n_nodes is an int which is the number of nodes to aggregate into.
    edge_features: if provided, should be a [n_edges, edge_feature_dim] float
      tensor, extra features for each edge.

  Returns:
    aggregated_messages: an [n_nodes, edge_message_dim] float tensor, the
      aggregated messages, one row for each node.
  """
  from_states = tf.gather(node_states, from_idx)
  to_states = tf.gather(node_states, to_idx)

  edge_inputs = [from_states, to_states]  # node features at the two ends of the edge
  if edge_features is not None:
    edge_inputs.append(edge_features)  # edge features

  edge_inputs = tf.concat(edge_inputs, axis=-1)
  messages = message_net(edge_inputs)  # joint encoding of edge and node features

  # sum the messages into the receiving ("to") nodes
  return aggregation_module(messages, to_idx, tf.shape(node_states)[0])


class GraphPropLayer(snt.AbstractModule):
  """Implementation of a graph propagation (message passing) layer."""

  def __init__(self,
               node_state_dim,
               edge_hidden_sizes,
               node_hidden_sizes,
               edge_net_init_scale=0.1,
               node_update_type='residual',
               use_reverse_direction=True,
               reverse_dir_param_different=True,
               layer_norm=False,
               name='graph-net'):
    """Constructor.

    Args:
      node_state_dim: int, dimensionality of node states.
      edge_hidden_sizes: list of ints, hidden sizes for the edge message net,
        the last element in the list is the size of the message vectors.
      node_hidden_sizes: list of ints, hidden sizes for the node update net.
      edge_net_init_scale: initialization scale for the edge networks.  This
        is typically set to a small value such that the gradient does not blow
        up.
      node_update_type: type of node updates, one of {mlp, gru, residual}.
      use_reverse_direction: set to True to also propagate messages in the
        reverse direction.
      reverse_dir_param_different: set to True to have the messages computed
        using a different set of parameters than for the forward direction.
      layer_norm: set to True to use layer normalization in a few places.
      name: name of this module.
    """
    super(GraphPropLayer, self).__init__(name=name)

    self._node_state_dim = node_state_dim
    self._edge_hidden_sizes = edge_hidden_sizes[:]
    # output size is node_state_dim
    self._node_hidden_sizes = node_hidden_sizes[:] + [node_state_dim]
    self._edge_net_init_scale = edge_net_init_scale
    self._node_update_type = node_update_type
    self._use_reverse_direction = use_reverse_direction
    self._reverse_dir_param_different = reverse_dir_param_different
    self._layer_norm = layer_norm

  def _compute_aggregated_messages(
      self, node_states, from_idx, to_idx, edge_features=None):
    """Compute aggregated messages for each node.

    Args:
      node_states: [n_nodes, input_node_state_dim] float tensor, node states.
      from_idx: [n_edges] int tensor, from node indices for each edge.
      to_idx: [n_edges] int tensor, to node indices for each edge.
      edge_features: if not None, should be [n_edges, edge_embedding_dim]
        tensor, edge features.

    Returns:
      aggregated_messages: [n_nodes, aggregated_message_dim] float tensor, the
        aggregated messages for each node.
    """
    self._message_net = snt.nets.MLP(
        self._edge_hidden_sizes,
        initializers={
            'w': tf.variance_scaling_initializer(
                scale=self._edge_net_init_scale),
            'b': tf.zeros_initializer()},
        name='message-mlp')

    aggregated_messages = graph_prop_once(
        node_states,
        from_idx,
        to_idx,
        self._message_net,
        aggregation_module=tf.unsorted_segment_sum,
        edge_features=edge_features)

    # optionally compute message vectors in the reverse direction
    if self._use_reverse_direction:
      if self._reverse_dir_param_different:
        self._reverse_message_net = snt.nets.MLP(
            self._edge_hidden_sizes,
            initializers={
                'w': tf.variance_scaling_initializer(
                    scale=self._edge_net_init_scale),
                'b': tf.zeros_initializer()},
            name='reverse-message-mlp')
      else:
        self._reverse_message_net = self._message_net

      reverse_aggregated_messages = graph_prop_once(
          node_states,
          to_idx,
          from_idx,
          self._reverse_message_net,
          aggregation_module=tf.unsorted_segment_sum,
          edge_features=edge_features)

      aggregated_messages += reverse_aggregated_messages

    if self._layer_norm:
      aggregated_messages = snt.LayerNorm()(aggregated_messages)

    return aggregated_messages

  def _compute_node_update(self,
                           node_states,
                           node_state_inputs,
                           node_features=None):
    """Compute node updates.

    Args:
      node_states: [n_nodes, node_state_dim] float tensor, the input node
        states.
      node_state_inputs: a list of tensors used to compute node updates.  Each
        element tensor should have shape [n_nodes, feat_dim], where feat_dim
        can be different.  These tensors will be concatenated along the
        feature dimension.
      node_features: extra node features if provided, should be of size
        [n_nodes, extra_node_feat_dim] float tensor, can be used to implement
        different types of skip connections.

    Returns:
      new_node_states: [n_nodes, node_state_dim] float tensor, the new node
        state tensor.

    Raises:
      ValueError: if node update type is not supported.
    """
    if self._node_update_type in ('mlp', 'residual'):
      node_state_inputs.append(node_states)
    if node_features is not None:
      node_state_inputs.append(node_features)

    if len(node_state_inputs) == 1:
      node_state_inputs = node_state_inputs[0]
    else:
      node_state_inputs = tf.concat(node_state_inputs, axis=-1)

    if self._node_update_type == 'gru':
      _, new_node_states = snt.GRU(self._node_state_dim)(
          node_state_inputs, node_states)
      return new_node_states
    else:
      mlp_output = snt.nets.MLP(
          self._node_hidden_sizes, name='node-mlp')(node_state_inputs)
      if self._layer_norm:
        mlp_output = snt.LayerNorm()(mlp_output)
      if self._node_update_type == 'mlp':
        return mlp_output
      elif self._node_update_type == 'residual':
        return node_states + mlp_output
      else:
        raise ValueError('Unknown node update type %s' %
                         self._node_update_type)

  def _build(self,
             node_states,
             from_idx,
             to_idx,
             edge_features=None,
             node_features=None):
    """Run one propagation step.

    Args:
      node_states: [n_nodes, input_node_state_dim] float tensor, node states.
      from_idx: [n_edges] int tensor, from node indices for each edge.
      to_idx: [n_edges] int tensor, to node indices for each edge.
      edge_features: if not None, should be [n_edges, edge_embedding_dim]
        tensor, edge features.
      node_features: extra node features if provided, should be of size
        [n_nodes, extra_node_feat_dim] float tensor, can be used to implement
        different types of skip connections.

    Returns:
      node_states: [n_nodes, node_state_dim] float tensor, new node states.
    """
    # essentially one call to graph_prop_once
    aggregated_messages = self._compute_aggregated_messages(
        node_states, from_idx, to_idx, edge_features=edge_features)

    # jointly encode each node's summed incoming messages with the node's own state
    return self._compute_node_update(node_states,
                                     [aggregated_messages],
                                     node_features=node_features)
an aggregator
The aggregator architecture comes from the paper "GATED GRAPH SEQUENCE NEURAL NETWORKS"; to understand it in depth it is worth reading that paper as well. After a graph has passed through the first two parts of the network, we have one node-centric aggregated representation per node, i.e. as many such vectors as there are nodes in the graph. Since the goal is to embed the whole graph into a single vector, these node-centric representations must themselves be aggregated. The simplest approach would be to sum them and pass the result through an MLP. Instead, the authors adopt the aggregation structure from the paper above, which can roughly be understood as multiplying each term of the sum by a gate (a weight), so that irrelevant information can be filtered out.
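For reference, the gated aggregation formula from the paper is:

$$h_G = MLP_G\left(\sum_{i \in V} \sigma\left(MLP_{gate}\left(h_i^{(T)}\right)\right) \odot MLP\left(h_i^{(T)}\right)\right)$$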
Note: in the code released by the authors, the MLPgate and MLP in the formula above are implemented as a single network layer (one MLP whose output is split into a gating part and a value part).
Code:

AGGREGATION_TYPE = {
    'sum': tf.unsorted_segment_sum,
    'mean': tf.unsorted_segment_mean,
    'sqrt_n': tf.unsorted_segment_sqrt_n,
    'max': tf.unsorted_segment_max,
}


class GraphAggregator(snt.AbstractModule):
  """This module computes graph representations by aggregating from parts."""

  def __init__(self,
               node_hidden_sizes,
               graph_transform_sizes=None,
               gated=True,
               aggregation_type='sum',
               name='graph-aggregator'):
    """Constructor.

    Args:
      node_hidden_sizes: the hidden layer sizes of the node transformation
        nets.  The last element is the size of the aggregated graph
        representation.
      graph_transform_sizes: sizes of the transformation layers on top of the
        graph representations.  The last element of this list is the final
        dimensionality of the output graph representations.
      gated: set to True to do gated aggregation, False not to.
      aggregation_type: one of {sum, max, mean, sqrt_n}.
      name: name of this module.
    """
    super(GraphAggregator, self).__init__(name=name)

    self._node_hidden_sizes = node_hidden_sizes
    self._graph_transform_sizes = graph_transform_sizes
    self._graph_state_dim = node_hidden_sizes[-1]
    self._gated = gated
    self._aggregation_type = aggregation_type
    self._aggregation_op = AGGREGATION_TYPE[aggregation_type]

  def _build(self, node_states, graph_idx, n_graphs):
    """Compute aggregated graph representations.

    Args:
      node_states: [n_nodes, node_state_dim] float tensor, node states of a
        batch of graphs concatenated together along the first dimension.
      graph_idx: [n_nodes] int tensor, graph ID for each node.
      n_graphs: integer, number of graphs in this batch.

    Returns:
      graph_states: [n_graphs, graph_state_dim] float tensor, graph
        representations, one row for each graph.
    """
    node_hidden_sizes = self._node_hidden_sizes
    if self._gated:
      node_hidden_sizes[-1] = self._graph_state_dim * 2

    node_states_g = snt.nets.MLP(
        node_hidden_sizes,
        name='node-state-g-mlp')(node_states)

    if self._gated:
      gates = tf.nn.sigmoid(node_states_g[:, :self._graph_state_dim])
      node_states_g = node_states_g[:, self._graph_state_dim:] * gates

    graph_states = self._aggregation_op(node_states_g, graph_idx, n_graphs)

    # unsorted_segment_max does not handle empty graphs in the way we want
    # it assigns the lowest possible float to empty segments, we want to reset
    # them to zero.
    if self._aggregation_type == 'max':
      # reset everything that's smaller than -1e5 to 0.
      graph_states *= tf.cast(graph_states > -1e5, tf.float32)

    # transform the reduced graph states further
    # pylint: disable=g-explicit-length-test
    if (self._graph_transform_sizes is not None and
        len(self._graph_transform_sizes) > 0):
      graph_states = snt.nets.MLP(
          self._graph_transform_sizes,
          name='graph-transform-mlp')(graph_states)

    return graph_states
Graph Matching Networks
Model Idea
The graph matching network takes a pair of graphs as input and directly computes a similarity score for the pair.
Network Architecture
an encoder
The encoder of the graph matching network is the same as that of the graph embedding model.
propagation layers
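For reference, the matching model's propagation step as given in the paper adds a cross-graph matching term to the embedding model's update:

$$m_{j \to i} = f_{message}\left(h_i^{(t)}, h_j^{(t)}, e_{ij}\right), \quad \forall (i, j) \in E_1 \cup E_2$$

$$\mu_{j \to i} = f_{match}\left(h_i^{(t)}, h_j^{(t)}\right), \quad \forall i \in V_1, j \in V_2 \text{ or } i \in V_2, j \in V_1$$

$$h_i^{(t+1)} = f_{node}\left(h_i^{(t)}, \sum_{j} m_{j \to i}, \sum_{j'} \mu_{j' \to i}\right)$$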
Here ${f_{match}}$ is the function used to accumulate cross-graph information; the paper implements it with an attention mechanism:
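$$a_{j \to i} = \frac{\exp\left(s_h\left(h_i^{(t)}, h_j^{(t)}\right)\right)}{\sum_{j'} \exp\left(s_h\left(h_i^{(t)}, h_{j'}^{(t)}\right)\right)}$$

$$\mu_{j \to i} = a_{j \to i}\left(h_i^{(t)} - h_j^{(t)}\right)$$

(This attention block is what the following paragraphs refer to as equation 10.)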
${s_h}$ in equation 10 is a vector similarity function, for example Euclidean or cosine similarity.
My understanding of ${\mu _{j \to i}}$: attention is used to compute a weighted difference between the encoded node features of the two graphs. In equation 10, the difference between node features from the two graphs is multiplied by an attention weight. I think the reason for this weight is that, when matching two graphs, the node correspondence we ultimately care about is one-to-one, and the attention weight in equation 10 grows the more similar node $j$ of graph 2 is to node $i$ of graph 1. Multiplying by this weight therefore emphasizes the difference between each node of graph 1 (graph 2) and its most similar node in graph 2 (graph 1), while suppressing the contribution of dissimilar node pairs, since only matched node pairs should matter when measuring the similarity of two graphs.
Compared with the embedding model, the propagation layer of the matching model aggregates, for each node, not only its 1-hop neighborhood information but also the matching information between that node and all nodes of the other graph.
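To make the cross-graph term concrete, below is a minimal NumPy sketch of the attention-weighted differences for one direction (nodes of graph 1 attending over nodes of graph 2). This is my own simplified illustration rather than the authors' implementation, and it uses a dot product for ${s_h}$, whereas the paper allows e.g. Euclidean or cosine similarity.

import numpy as np

def cross_graph_attention(h1, h2):
    """Cross-graph matching sketch: h1 is [n1, d], h2 is [n2, d].

    Returns mu1 with shape [n1, d], one attention-weighted difference per
    node of graph 1.
    """
    scores = h1 @ h2.T                           # s_h(h_i, h_j), shape [n1, n2]
    scores -= scores.max(axis=1, keepdims=True)  # for numerical stability
    a = np.exp(scores)
    a /= a.sum(axis=1, keepdims=True)            # a_{j->i}: softmax over graph 2's nodes
    # mu_i = sum_j a_{j->i} (h_i - h_j) = h_i - sum_j a_{j->i} h_j
    return h1 - a @ h2

# Example: node states of two small random graphs.
mu1 = cross_graph_attention(np.random.randn(5, 8), np.random.randn(7, 8))  # shape [5, 8]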
an aggregator
Data Labels
Pairs
t = 1 when the two graphs are similar, and t = -1 when they are dissimilar.
Triplets
For each triplet (G1, G2, G3), G1 and G2 are similar while G1 and G3 are dissimilar.
Loss Functions
To compute the loss, the paper uses both pairwise and triplet formulations and compares two different loss functions.
Margin-based (hinge) loss
Pairs
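Writing ${d(G_1, G_2)}$ for a distance (e.g. Euclidean) between the two graph vectors and ${\gamma > 0}$ for the margin, the pair loss in the paper is:

$$L_{pair} = \mathbb{E}_{(G_1, G_2, t)}\left[\max\left\{0, \gamma - t\left(1 - d(G_1, G_2)\right)\right\}\right]$$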
Triplets
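The corresponding triplet loss encourages ${d(G_1, G_2)}$ to be smaller than ${d(G_1, G_3)}$ by at least the margin ${\gamma}$:

$$L_{triplet} = \mathbb{E}_{(G_1, G_2, G_3)}\left[\max\left\{0, d(G_1, G_2) - d(G_1, G_3) + \gamma\right\}\right]$$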
Here graph 1 and graph 2 are required to be more similar (closer) than graph 1 and graph 3.
Approximate Hamming loss
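In this variant the graph vectors are mapped through ${\tanh}$ and an approximate Hamming similarity ${s(G_1, G_2) = \frac{1}{H}\sum_{i=1}^{H} \tanh(h_{G_1 i}) \cdot \tanh(h_{G_2 i})}$ is optimized; the losses take the form:

$$L_{pair} = \mathbb{E}_{(G_1, G_2, t)}\left[\left(t - s(G_1, G_2)\right)^2\right] / 4$$

$$L_{triplet} = \mathbb{E}_{(G_1, G_2, G_3)}\left[\left(s(G_1, G_2) - 1\right)^2 + \left(s(G_1, G_3) + 1\right)^2\right] / 8$$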
Experiments
Task 1: Learning Graph Edit Distance
Dataset generation:
Note: in this experiment, ${k_p}$ is set to 1 and ${k_n}$ is set to 2.
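As described in the paper, a training example starts from a random binomial graph ${G_1 \sim G(n, p)}$; a similar graph is obtained by substituting ${k_p}$ edges of ${G_1}$ with new edges, and a dissimilar one by substituting ${k_n}$ edges. Below is a rough sketch of that procedure (my own illustration using networkx, not the authors' data pipeline):

import random
import networkx as nx

def substitute_edges(g, k):
    """Return a copy of g with k existing edges replaced by k edges absent from g."""
    h = g.copy()
    h.remove_edges_from(random.sample(list(g.edges()), k))
    h.add_edges_from(random.sample(list(nx.non_edges(g)), k))
    return h

def sample_triplet(n=20, p=0.2, k_p=1, k_n=2):
    """Sample (G1, G2, G3): G2 is a small perturbation of G1, G3 a larger one."""
    g1 = nx.erdos_renyi_graph(n, p)
    g2 = substitute_edges(g1, k_p)  # similar graph: k_p edge substitutions
    g3 = substitute_edges(g1, k_n)  # dissimilar graph: k_n edge substitutions
    return g1, g2, g3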
Evaluation:
(1) Pair AUC: the area under the ROC curve for classifying 1,000 graph pairs as similar or dissimilar.
(2) Triplet accuracy: the fraction of 1,000 triplets in which the more similar pair is correctly ranked closer.
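Both metrics can be computed from the predicted graph distances; here is a small sketch with hypothetical helper functions (assuming scikit-learn is available, and that a smaller distance means a more similar pair):

import numpy as np
from sklearn.metrics import roc_auc_score

def pair_auc(labels, distances):
    """labels: +1 (similar) / -1 (dissimilar); score each pair by its negative distance."""
    y_true = (np.asarray(labels) + 1) // 2        # map {-1, +1} to {0, 1}
    return roc_auc_score(y_true, -np.asarray(distances))

def triplet_accuracy(d_pos, d_neg):
    """Fraction of triplets where the similar pair is closer than the dissimilar pair."""
    return float(np.mean(np.asarray(d_pos) < np.asarray(d_neg)))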
Results:
Summary
In this paper the graph matching network achieves better experimental results than the graph embedding model. When aggregating information around each node, the matching network additionally incorporates the matching information between that node and the other graph. As the paper puts it: "Compared to the graph embedding model, the matching model has the ability to change the representation of the graphs based on the other graph it is compared against. The model will adjust graph representations to make them become more different if they do not match."
In my opinion, the graph matching network has the following potential issue:
When computing how a node matches against all nodes of the other graph, the paper uses attention to suppress the distances to non-matching nodes and amplify the distances to matching ones. However, each node is handled independently, so several nodes in one graph may all be most similar to the same node in the other graph. In the extreme case where every node of graph 1 has the same type and graph 2 contains a single node of that same type, would the network then treat the nodes of the two graphs as exactly matched, with a distance of 0?