Paper Overview
The paper proposes two networks for computing graph similarity: Graph Embedding Models and Graph Matching Networks.
Graph Embedding Models
Model Idea
The graph embedding model uses a network to map each graph to a single vector, so the similarity between two graphs can be obtained simply by computing the distance between their vectors.
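For example, once the two graph vectors are available, the similarity is just a vector-space metric. A minimal sketch (my own illustration, not the authors' code; the paper uses Euclidean, cosine or Hamming similarity):

import tensorflow as tf

def euclidean_similarity(x, y):
    # Negative Euclidean distance between two graph vectors: larger = more similar.
    return -tf.sqrt(tf.reduce_sum(tf.square(x - y), axis=-1))

def cosine_similarity(x, y):
    # Cosine of the angle between the two graph vectors.
    return tf.reduce_sum(x * y, axis=-1) / (tf.norm(x, axis=-1) * tf.norm(y, axis=-1))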
Network Structure
The graph embedding model consists of three parts: (1) an encoder, (2) multiple propagation layers, and (3) an aggregator.
an encoder
The encoder uses two separate multi-layer perceptrons (MLPs) to encode the node features and the edge features. My understanding is that the encoder simply re-encodes the node and edge information (feature dimensionality reduction, plus extra non-linear layers to increase the model's expressive power?). Note that the encoder processes every node and every edge independently; there is no interaction between nodes or between edges at this stage.
Code:

import sonnet as snt
import tensorflow as tf


class GraphEncoder(snt.AbstractModule):
  """Encoder module that projects node and edge features to some embeddings."""

  def __init__(self,
               node_hidden_sizes=None,
               edge_hidden_sizes=None,
               name='graph-encoder'):
    """Constructor.

    Args:
      node_hidden_sizes: if provided should be a list of ints, hidden sizes of node encoder network, the last element is the size of the node outputs. If not provided, node features will pass through as is.
      edge_hidden_sizes: if provided should be a list of ints, hidden sizes of edge encoder network, the last element is the size of the edge outputs. If not provided, edge features will pass through as is.
      name: name of this module.
    """
    super(GraphEncoder, self).__init__(name=name)

    # this also handles the case of an empty list
    self._node_hidden_sizes = node_hidden_sizes if node_hidden_sizes else None
    self._edge_hidden_sizes = edge_hidden_sizes

  def _build(self, node_features, edge_features=None):
    """Encode node and edge features.

    Args:
      node_features: [n_nodes, node_feat_dim] float tensor.
      edge_features: if provided, should be [n_edges, edge_feat_dim] float tensor.

    Returns:
      node_outputs: [n_nodes, node_embedding_dim] float tensor, node embeddings.
      edge_outputs: if edge_features is not None and edge_hidden_sizes is not None, this is [n_edges, edge_embedding_dim] float tensor, edge embeddings; otherwise just the input edge_features.
    """
    if self._node_hidden_sizes is None:
      node_outputs = node_features
    else:
      node_outputs = snt.nets.MLP(
          self._node_hidden_sizes,
          name='node-feature-mlp')(node_features)

    if edge_features is None or self._edge_hidden_sizes is None:
      edge_outputs = edge_features
    else:
      edge_outputs = snt.nets.MLP(
          self._edge_hidden_sizes,
          name='edge-feature-mlp')(edge_features)

    return node_outputs, edge_outputs
propagation layers
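The propagation update in the paper can be written as follows (my transcription; ${h_i^{(t)}}$ is the state of node $i$ after $t$ rounds of propagation and ${e_{ij}}$ is the encoded feature of edge $(i,j)$):

$$m_{j \to i} = f_{message}\big(h_i^{(t)}, h_j^{(t)}, e_{ij}\big), \quad \forall (i,j) \in E$$
$$h_i^{(t+1)} = f_{node}\Big(h_i^{(t)}, \sum_{j:(j,i) \in E} m_{j \to i}\Big)$$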
Here ${f_{message}}$ is an MLP whose input is the concatenation of the features in the parentheses; its role is to jointly encode the encoded edge feature with the features of the two nodes at the ends of that edge.
${f_{node}}$ can be an MLP or an RNN; its role is to accumulate, for each node, the information from its 1-hop neighborhood. To aggregate the 1-hop neighborhood information a sum is used here, but mean, max or an attention-based weighted sum would also work.
In short, the propagation layers repeatedly aggregate, around each node, the information of its 1-hop neighborhood.
Code:

# Jointly encode each edge with the states of its endpoint nodes, then sum the
# resulting messages at the receiving (to) node.
def graph_prop_once(node_states,
                    from_idx,
                    to_idx,
                    message_net,
                    aggregation_module=tf.unsorted_segment_sum,
                    edge_features=None):
  """One round of propagation (message passing) in a graph.

  Args:
    node_states: [n_nodes, node_state_dim] float tensor, node state vectors, one row for each node.
    from_idx: [n_edges] int tensor, index of the from nodes.
    to_idx: [n_edges] int tensor, index of the to nodes.
    message_net: a network that maps concatenated edge inputs to message vectors.
    aggregation_module: a module that aggregates messages on edges to aggregated messages for each node. Should be a callable and can be called like the following, `aggregated_messages = aggregation_module(messages, to_idx, n_nodes)`, where messages is [n_edges, edge_message_dim] tensor, to_idx is the index of the to nodes, i.e. where each message should go to, and n_nodes is an int which is the number of nodes to aggregate into.
    edge_features: if provided, should be a [n_edges, edge_feature_dim] float tensor, extra features for each edge.

  Returns:
    aggregated_messages: an [n_nodes, edge_message_dim] float tensor, the aggregated messages, one row for each node.
  """
  from_states = tf.gather(node_states, from_idx)
  to_states = tf.gather(node_states, to_idx)

  edge_inputs = [from_states, to_states]  # node features at both ends of the edge
  if edge_features is not None:
    edge_inputs.append(edge_features)     # edge features

  edge_inputs = tf.concat(edge_inputs, axis=-1)
  messages = message_net(edge_inputs)     # joint encoding of edge and node features

  # sum the messages at the receiving (to) node
  return aggregation_module(messages, to_idx, tf.shape(node_states)[0])


class GraphPropLayer(snt.AbstractModule):
  """Implementation of a graph propagation (message passing) layer."""

  def __init__(self,
               node_state_dim,
               edge_hidden_sizes,
               node_hidden_sizes,
               edge_net_init_scale=0.1,
               node_update_type='residual',
               use_reverse_direction=True,
               reverse_dir_param_different=True,
               layer_norm=False,
               name='graph-net'):
    """Constructor.

    Args:
      node_state_dim: int, dimensionality of node states.
      edge_hidden_sizes: list of ints, hidden sizes for the edge message net, the last element in the list is the size of the message vectors.
      node_hidden_sizes: list of ints, hidden sizes for the node update net.
      edge_net_init_scale: initialization scale for the edge networks. This is typically set to a small value such that the gradient does not blow up.
      node_update_type: type of node updates, one of {mlp, gru, residual}.
      use_reverse_direction: set to True to also propagate messages in the reverse direction.
      reverse_dir_param_different: set to True to have the messages computed using a different set of parameters than for the forward direction.
      layer_norm: set to True to use layer normalization in a few places.
      name: name of this module.
    """
    super(GraphPropLayer, self).__init__(name=name)

    self._node_state_dim = node_state_dim
    self._edge_hidden_sizes = edge_hidden_sizes[:]
    # output size is node_state_dim
    self._node_hidden_sizes = node_hidden_sizes[:] + [node_state_dim]
    self._edge_net_init_scale = edge_net_init_scale
    self._node_update_type = node_update_type
    self._use_reverse_direction = use_reverse_direction
    self._reverse_dir_param_different = reverse_dir_param_different
    self._layer_norm = layer_norm

  def _compute_aggregated_messages(
      self, node_states, from_idx, to_idx, edge_features=None):
    """Compute aggregated messages for each node.

    Args:
      node_states: [n_nodes, input_node_state_dim] float tensor, node states.
      from_idx: [n_edges] int tensor, from node indices for each edge.
      to_idx: [n_edges] int tensor, to node indices for each edge.
      edge_features: if not None, should be [n_edges, edge_embedding_dim] tensor, edge features.

    Returns:
      aggregated_messages: [n_nodes, aggregated_message_dim] float tensor, the aggregated messages for each node.
    """
    self._message_net = snt.nets.MLP(
        self._edge_hidden_sizes,
        initializers={
            'w': tf.variance_scaling_initializer(
                scale=self._edge_net_init_scale),
            'b': tf.zeros_initializer()},
        name='message-mlp')

    aggregated_messages = graph_prop_once(
        node_states,
        from_idx,
        to_idx,
        self._message_net,
        aggregation_module=tf.unsorted_segment_sum,
        edge_features=edge_features)

    # optionally compute message vectors in the reverse direction
    if self._use_reverse_direction:
      if self._reverse_dir_param_different:
        self._reverse_message_net = snt.nets.MLP(
            self._edge_hidden_sizes,
            initializers={
                'w': tf.variance_scaling_initializer(
                    scale=self._edge_net_init_scale),
                'b': tf.zeros_initializer()},
            name='reverse-message-mlp')
      else:
        self._reverse_message_net = self._message_net

      reverse_aggregated_messages = graph_prop_once(
          node_states,
          to_idx,
          from_idx,
          self._reverse_message_net,
          aggregation_module=tf.unsorted_segment_sum,
          edge_features=edge_features)

      aggregated_messages += reverse_aggregated_messages

    if self._layer_norm:
      aggregated_messages = snt.LayerNorm()(aggregated_messages)

    return aggregated_messages

  def _compute_node_update(self,
                           node_states,
                           node_state_inputs,
                           node_features=None):
    """Compute node updates.

    Args:
      node_states: [n_nodes, node_state_dim] float tensor, the input node states.
      node_state_inputs: a list of tensors used to compute node updates. Each element tensor should have shape [n_nodes, feat_dim], where feat_dim can be different. These tensors will be concatenated along the feature dimension.
      node_features: extra node features if provided, should be of size [n_nodes, extra_node_feat_dim] float tensor, can be used to implement different types of skip connections.

    Returns:
      new_node_states: [n_nodes, node_state_dim] float tensor, the new node state tensor.

    Raises:
      ValueError: if node update type is not supported.
    """
    if self._node_update_type in ('mlp', 'residual'):
      node_state_inputs.append(node_states)
    if node_features is not None:
      node_state_inputs.append(node_features)

    if len(node_state_inputs) == 1:
      node_state_inputs = node_state_inputs[0]
    else:
      node_state_inputs = tf.concat(node_state_inputs, axis=-1)

    if self._node_update_type == 'gru':
      _, new_node_states = snt.GRU(self._node_state_dim)(
          node_state_inputs, node_states)
      return new_node_states
    else:
      mlp_output = snt.nets.MLP(
          self._node_hidden_sizes, name='node-mlp')(node_state_inputs)
      if self._layer_norm:
        mlp_output = snt.LayerNorm()(mlp_output)
      if self._node_update_type == 'mlp':
        return mlp_output
      elif self._node_update_type == 'residual':
        return node_states + mlp_output
      else:
        raise ValueError('Unknown node update type %s' % self._node_update_type)

  def _build(self,
             node_states,
             from_idx,
             to_idx,
             edge_features=None,
             node_features=None):
    """Run one propagation step.

    Args:
      node_states: [n_nodes, input_node_state_dim] float tensor, node states.
      from_idx: [n_edges] int tensor, from node indices for each edge.
      to_idx: [n_edges] int tensor, to node indices for each edge.
      edge_features: if not None, should be [n_edges, edge_embedding_dim] tensor, edge features.
      node_features: extra node features if provided, should be of size [n_nodes, extra_node_feat_dim] float tensor, can be used to implement different types of skip connections.

    Returns:
      node_states: [n_nodes, node_state_dim] float tensor, new node states.
    """
    # essentially one call to graph_prop_once
    aggregated_messages = self._compute_aggregated_messages(
        node_states, from_idx, to_idx, edge_features=edge_features)

    # jointly encode the messages summed at each receiving node with that node's own state
    return self._compute_node_update(node_states,
                                     [aggregated_messages],
                                     node_features=node_features)
an aggregator
The aggregator comes from the paper "GATED GRAPH SEQUENCE NEURAL NETWORKS"; to understand it in depth you would need to read that paper as well. After a graph has passed through the previous two parts, we have one aggregated 1-hop-neighborhood vector per node, i.e. as many vectors as there are nodes in the graph. Since what we ultimately want is to embed the whole graph into a single vector, these per-node vectors must be aggregated once more. The simplest way would be to sum them and pass the result through an MLP. The authors instead use the aggregation structure from the paper above, which can roughly be understood as multiplying each term in the sum by a gate (weight), so that irrelevant information can be filtered out.
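The gated aggregation can be written as (my transcription of the formula from the paper):

$$h_G = MLP_G\Big(\sum_{i \in V} \sigma\big(MLP_{gate}(h_i^{(T)})\big) \odot MLP(h_i^{(T)})\Big)$$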
Note: in the code released by the authors, the MLP_gate and MLP in the formula above are the same network layer (a single MLP whose output is split into a gate part and a value part).
Code:

AGGREGATION_TYPE = {
    'sum': tf.unsorted_segment_sum,
    'mean': tf.unsorted_segment_mean,
    'sqrt_n': tf.unsorted_segment_sqrt_n,
    'max': tf.unsorted_segment_max,
}


class GraphAggregator(snt.AbstractModule):
  """This module computes graph representations by aggregating from parts."""

  def __init__(self,
               node_hidden_sizes,
               graph_transform_sizes=None,
               gated=True,
               aggregation_type='sum',
               name='graph-aggregator'):
    """Constructor.

    Args:
      node_hidden_sizes: the hidden layer sizes of the node transformation nets. The last element is the size of the aggregated graph representation.
      graph_transform_sizes: sizes of the transformation layers on top of the graph representations. The last element of this list is the final dimensionality of the output graph representations.
      gated: set to True to do gated aggregation, False not to.
      aggregation_type: one of {sum, max, mean, sqrt_n}.
      name: name of this module.
    """
    super(GraphAggregator, self).__init__(name=name)

    self._node_hidden_sizes = node_hidden_sizes
    self._graph_transform_sizes = graph_transform_sizes
    self._graph_state_dim = node_hidden_sizes[-1]
    self._gated = gated
    self._aggregation_type = aggregation_type
    self._aggregation_op = AGGREGATION_TYPE[aggregation_type]

  def _build(self, node_states, graph_idx, n_graphs):
    """Compute aggregated graph representations.

    Args:
      node_states: [n_nodes, node_state_dim] float tensor, node states of a batch of graphs concatenated together along the first dimension.
      graph_idx: [n_nodes] int tensor, graph ID for each node.
      n_graphs: integer, number of graphs in this batch.

    Returns:
      graph_states: [n_graphs, graph_state_dim] float tensor, graph representations, one row for each graph.
    """
    node_hidden_sizes = self._node_hidden_sizes
    if self._gated:
      node_hidden_sizes[-1] = self._graph_state_dim * 2

    node_states_g = snt.nets.MLP(
        node_hidden_sizes, name='node-state-g-mlp')(node_states)

    if self._gated:
      gates = tf.nn.sigmoid(node_states_g[:, :self._graph_state_dim])
      node_states_g = node_states_g[:, self._graph_state_dim:] * gates

    graph_states = self._aggregation_op(node_states_g, graph_idx, n_graphs)

    # unsorted_segment_max does not handle empty graphs in the way we want
    # it assigns the lowest possible float to empty segments, we want to reset
    # them to zero.
    if self._aggregation_type == 'max':
      # reset everything that's smaller than -1e5 to 0.
      graph_states *= tf.cast(graph_states > -1e5, tf.float32)

    # transform the reduced graph states further
    # pylint: disable=g-explicit-length-test
    if (self._graph_transform_sizes is not None and
        len(self._graph_transform_sizes) > 0):
      graph_states = snt.nets.MLP(
          self._graph_transform_sizes,
          name='graph-transform-mlp')(graph_states)

    return graph_states
Graph Matching Networks
Model Idea
The graph matching network takes a pair of graphs as input and directly computes the similarity between them.
Network Structure
an encoder
The encoder of the graph matching network is the same as that of the graph embedding model.
propagation layers
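The propagation step of the matching network adds a cross-graph term to the update used in the embedding model (my transcription of the paper's formulas):

$$m_{j \to i} = f_{message}\big(h_i^{(t)}, h_j^{(t)}, e_{ij}\big), \quad \forall (i,j) \in E_1 \cup E_2$$
$$\mu_{j \to i} = f_{match}\big(h_i^{(t)}, h_j^{(t)}\big), \quad \forall i \in V_1, j \in V_2 \ \text{or} \ i \in V_2, j \in V_1$$
$$h_i^{(t+1)} = f_{node}\Big(h_i^{(t)}, \sum_{j} m_{j \to i}, \sum_{j'} \mu_{j' \to i}\Big)$$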
Here ${f_{match}}$ is the function that accumulates cross-graph information, and an attention-based model is used for it:
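(My transcription of the attention formulas from the paper:)

$$a_{j \to i} = \frac{\exp\big(s_h(h_i^{(t)}, h_j^{(t)})\big)}{\sum_{j'} \exp\big(s_h(h_i^{(t)}, h_{j'}^{(t)})\big)}$$
$$\mu_{j \to i} = a_{j \to i}\,\big(h_i^{(t)} - h_j^{(t)}\big)$$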
${s_h}$ in the formulas above denotes a distance/similarity between vectors, such as Euclidean or cosine.
My understanding of ${\mu _{j \to i}}$ is that attention is used here to compute a weighted difference between the encoded node features of the two graphs. In the second formula above, the difference between the two graphs' encoded node features is multiplied by an attention weight. My understanding of why this weight is there: when matching two graphs, the node correspondence we ultimately want is one-to-one, and the attention weight is larger the more similar node $j$ of graph 2 is to node $i$ of graph 1. Multiplying by this weight therefore amplifies the difference between each node of graph 1 (graph 2) and its most similar node in graph 2 (graph 1), while reducing the influence of dissimilar node pairs on the final result, because when measuring the similarity of two graphs we only care about the similarity of matched node pairs.
Compared with the embedding model, the propagation layers of the matching network aggregate, around each node, not only its 1-hop neighborhood information but also the matching information between that node and all nodes of the other graph.
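A minimal sketch of this cross-graph attention (my own illustration, not the authors' implementation; I use a dot-product score here, whereas the paper uses Euclidean or cosine similarity for ${s_h}$):

import tensorflow as tf

def cross_graph_matching(x, y):
    """x: [n_x, d] node states of graph 1, y: [n_y, d] node states of graph 2.

    Returns mu_x: [n_x, d], the cross-graph matching vector for each node of graph 1,
    i.e. sum_j a_{j->i} (h_i - h_j) = h_i - sum_j a_{j->i} h_j (the attention weights sum to 1).
    """
    scores = tf.matmul(x, y, transpose_b=True)   # [n_x, n_y], s_h(h_i, h_j) as a dot product
    a = tf.nn.softmax(scores, axis=1)            # attention of each node in x over the nodes of y
    attended_y = tf.matmul(a, y)                 # [n_x, d], sum_j a_{j->i} h_j
    return x - attended_y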
an aggregator
Data labels
Pairs
t = 1 when the two graphs are similar, t = -1 when they are dissimilar.
Triplets
For each triplet (G1, G2, G3), G1 and G2 are similar while G1 and G3 are dissimilar.
Loss Functions
The paper trains with both pair and triplet supervision, and compares two different loss functions.
Margin (hinge) loss
Pairs
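The pair loss is a margin loss on the Euclidean distance ${d(G_1, G_2)}$ between the two graph vectors (my transcription, with margin $\gamma > 0$):

$$L_{pair} = \mathbb{E}_{(G_1, G_2, t)}\big[\max\{0,\ \gamma - t\,(1 - d(G_1, G_2))\}\big]$$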
Triplets
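The triplet loss encourages ${d(G_1, G_2)}$ to be smaller than ${d(G_1, G_3)}$ by at least the margin $\gamma$ (my transcription):

$$L_{triplet} = \mathbb{E}_{(G_1, G_2, G_3)}\big[\max\{0,\ d(G_1, G_2) - d(G_1, G_3) + \gamma\}\big]$$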
Here graph 1 and graph 2 are more similar than graph 1 and graph 3.
Approximate Hamming loss
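For this variant the graph vectors are first passed through $\tanh$, and an approximate (average) Hamming similarity is used in place of a distance (my transcription; the pair and triplet losses then become squared-error terms that push $s$ towards the label $t$, respectively towards $\pm 1$):

$$s(G_1, G_2) = \frac{1}{H} \sum_{i=1}^{H} \tanh(h_{G_1, i}) \cdot \tanh(h_{G_2, i})$$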
Experiments
Task 1: learning graph edit distance
Dataset generation:
Note: in this experiment kp is set to 1 and kn to 2.
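A rough sketch of how such a training triplet could be generated (my own illustration with networkx, not the authors' code; the graph size and edge probability are made-up values): sample a random binomial graph G1, build a positive example G2 by substituting kp edges and a negative example G3 by substituting kn edges:

import random
import networkx as nx

def substitute_edges(g, k):
    # Return a copy of g in which k edges are replaced by k new random edges.
    h = g.copy()
    h.remove_edges_from(random.sample(list(h.edges()), k))
    while h.number_of_edges() < g.number_of_edges():
        u, v = random.sample(list(h.nodes()), 2)
        if not h.has_edge(u, v):
            h.add_edge(u, v)
    return h

def sample_triplet(n=20, p=0.2, kp=1, kn=2):
    g1 = nx.gnp_random_graph(n, p)
    return g1, substitute_edges(g1, kp), substitute_edges(g1, kn)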
Evaluation:
(1) AUC on the predictions for 1000 graph pairs
(2) accuracy on 1000 triplets
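These two metrics can be computed roughly as follows (my own sketch, not the authors' evaluation code; labels are the $t \in \{-1, 1\}$ pair labels and a higher score means more similar):

import numpy as np
from sklearn.metrics import roc_auc_score

def pair_auc(labels, similarities):
    # AUC of the similarity scores against the {-1, +1} pair labels.
    return roc_auc_score((np.asarray(labels) + 1) // 2, similarities)

def triplet_accuracy(sim_12, sim_13):
    # Fraction of triplets where the similar pair (G1, G2) scores higher than (G1, G3).
    return np.mean(np.asarray(sim_12) > np.asarray(sim_13))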
Results:
Summary
In this paper the graph matching network achieves better experimental results than the graph embedding model. When aggregating information around each node, the matching network also incorporates the matching information between that node and the other graph. As the paper puts it: "Compared to the graph embedding model, the matching model has the ability to change the representation of the graphs based on the other graph it is compared against. The model will adjust graph representations to make them become more different if they do not match."
In my opinion the graph matching network has the following issue:
When computing how the current node matches the nodes of the other graph, the paper uses attention to suppress the distances of unmatched node pairs and amplify those of matched pairs. However, since every node is considered independently, several nodes of one graph may all be most similar to the same node of the other graph. In the extreme case where all nodes of graph 1 are of the same type and graph 2 contains a single node of that same type, would the network then consider the nodes of the two graphs to be perfectly matched, with a distance of 0?