Paper Notes: Graph Attention Networks


Graph Attention Networks

2018-02-06 16:52:49

Abstract:

　　The paper proposes graph attention networks (GATs), a novel architecture that operates on graph-structured data and uses masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations.

　　Target data: graph-structured data.

　　Method: masked self-attentional layers.

　　Goal: to address the shortcomings of prior methods based on graph convolutions or their approximations.

　　Approach: by stacking layers in which nodes are able to attend over their neighborhoods' features, the model can assign different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation or depending on knowing the graph structure upfront.

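Introduction:

　　Background: CNNs have been applied widely to grid-structured data and perform well on tasks such as object detection, semantic segmentation, and machine translation. Some data, however, does not have a grid-like structure, for example 3D meshes, social networks, telecommunication networks, biological networks, and brain connectomes.

　　Several earlier works have tried to combine RNNs with graph-structured data to learn representations.

　　Two kinds of approach are currently common for bringing convolution to the graph domain: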

　　1. spectral approaches

　　2. non-spectral approaches (spatial based methods)

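　　The paper briefly introduces both families and reviews some recent related work.

　　It then turns to attention mechanisms, which are now used widely across many settings. One of their advantages is that they allow for dealing with variable-sized inputs, focusing on the most relevant parts of the input to make decisions. When attention is used to compute a representation of a single sequence, it is commonly called self-attention or intra-attention. Combining it with CNNs or RNNs gives strong results.

　　Inspired by this recent work, the authors propose an attention-based architecture for node classification on graph-structured data. The idea is to compute the hidden representation of each node in the graph by attending over its neighbors, following a self-attention strategy. The attention mechanism has several interesting properties:

　　1. The operation is efficient, since it is parallelizable across node-neighbor pairs.

　　2. It can be applied to graph nodes with different degrees by assigning different weights to their neighbors.

　　3. The model is directly applicable to inductive learning problems, including tasks where the model has to generalize to completely unseen graphs.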

　　Our approach of sharing a neural network computation across edges is reminiscent of the formulation of relational networks (Santoro et al., 2017), wherein relations between objects (regional features from an image extracted by a convolutional neural network) are aggregated across all object pairs, by employing a shared mechanism. 　　

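　　The authors run experiments on three datasets and reach state-of-the-art results, which shows the potential of attention-based models for graphs with arbitrary structure.

GAT Architecture: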

1. Graph Attentional Layer

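　　The input to the proposed attentional layer is a set of node features, $\mathbf{h} = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_N\}$ with $\vec{h}_i \in \mathbb{R}^{F}$, where $N$ is the number of nodes and $F$ is the number of features per node. The layer produces a new set of node features as its output, $\mathbf{h}' = \{\vec{h}'_1, \vec{h}'_2, \ldots, \vec{h}'_N\}$ with $\vec{h}'_i \in \mathbb{R}^{F'}$, where the output dimension $F'$ may differ from $F$.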

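　　To obtain enough expressive power to transform the input features into higher-level features, at least one learnable linear transformation is required. To that end, as an initial step, a shared linear transformation, parametrized by a weight matrix $\mathbf{W} \in \mathbb{R}^{F' \times F}$, is applied to every node. Self-attention is then performed on the nodes: a shared attentional mechanism $a : \mathbb{R}^{F'} \times \mathbb{R}^{F'} \to \mathbb{R}$ computes the attention coefficients

$$e_{ij} = a(\mathbf{W}\vec{h}_i, \mathbf{W}\vec{h}_j) \tag{1}$$

　　which indicate the importance of node $j$'s features to node $i$. In its most general form, the model allows every node to attend to every other node, dropping all structural information. The graph structure is injected into the mechanism by performing masked attention: $e_{ij}$ is computed only for nodes $j \in N_{i}$, where $N_{i}$ is some neighborhood of node $i$ in the graph. In the experiments, this is the set of first-order neighbors of $i$.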
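As a rough illustration of the two steps above (the shared linear transformation, then scoring only neighboring pairs), here is a minimal NumPy sketch. The sizes, the random features, the path-graph adjacency with self-loops, and the dot-product `a` are all assumptions made for the example; the notes only require `a` to be some shared scoring function.

```python
import numpy as np

rng = np.random.default_rng(0)

N, F, F_out = 4, 3, 2              # illustrative sizes (assumed)
h = rng.normal(size=(N, F))        # input node features, one row per node
W = rng.normal(size=(F_out, F))    # shared weight matrix, applied to every node

# adj[i, j] is True when node j is in the neighborhood of node i.
# Self-loops are included here as an assumption of this sketch.
adj = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=bool)


def a(x, y):
    """Hypothetical stand-in for the shared attention mechanism a."""
    return float(x @ y)


Wh = h @ W.T                       # (N, F_out): the same W applied to each node

# Masked attention: e[(i, j)] exists only for j in the neighborhood of i.
e = {(i, j): a(Wh[i], Wh[j])
     for i in range(N) for j in range(N) if adj[i, j]}
```

Pairs outside the mask, such as `(0, 2)` here, are never scored, which is how the graph structure enters the computation.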

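　　To make the coefficients easy to compare across different nodes, they are normalized across all choices of $j$ with the softmax function:

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in N_{i}} \exp(e_{ik})} \tag{2}$$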
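The neighborhood-restricted softmax can be sketched with a dense score matrix and a boolean mask. This is one possible implementation, not necessarily the authors'; the function name `masked_softmax` and the example values are made up for illustration, and the sketch assumes every node has at least one neighbor (for example itself).

```python
import numpy as np


def masked_softmax(e, adj):
    """Row-wise softmax over e, restricted to entries where adj is True.

    Assumes every row of adj has at least one True entry.
    """
    scores = np.where(adj, e, -np.inf)                    # non-neighbors get weight 0
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


e = np.array([
    [0.0, 1.0, 5.0],
    [2.0, 2.0, 2.0],
    [9.0, 0.5, 0.5],
])
adj = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
], dtype=bool)

alpha = masked_softmax(e, adj)
```

Each row of `alpha` sums to 1 over that node's neighbors only, so the large scores at positions `(0, 2)` and `(2, 0)` have no effect because those pairs are masked out.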

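　　In the experiments, the attention mechanism $a$ is a single-layer feedforward neural network, parametrized by a weight vector $\vec{a} \in \mathbb{R}^{2F'}$ and followed by a LeakyReLU nonlinearity. Fully expanded, the coefficients computed by the attention mechanism can be written as:

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{T}[\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_j]\right)\right)}{\sum_{k \in N_{i}} \exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{T}[\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_k]\right)\right)} \tag{3}$$

　　where $\cdot^{T}$ denotes transposition and $\|$ denotes the concatenation operation.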
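Because the score is a dot product with a concatenation, it splits into a term that depends only on node $i$ and a term that depends only on node $j$, which lets all pairwise scores be computed without materializing the concatenated vectors. The sketch below checks that identity numerically. The sizes and random values are assumptions, the LeakyReLU negative slope of 0.2 is an assumed setting, and the result is the unnormalized, unmasked score matrix (masking and softmax would follow).

```python
import numpy as np


def leaky_relu(x, negative_slope=0.2):
    return np.where(x > 0, x, negative_slope * x)


rng = np.random.default_rng(0)
N, F_out = 5, 4
Wh = rng.normal(size=(N, F_out))     # already-transformed node features W h_i
a_vec = rng.normal(size=2 * F_out)   # attention weight vector

# Direct form: concatenate each pair and take the dot product with a_vec.
e_direct = np.empty((N, N))
for i in range(N):
    for j in range(N):
        e_direct[i, j] = leaky_relu(a_vec @ np.concatenate([Wh[i], Wh[j]]))

# Equivalent vectorized form: a^T [x || y] = a_left^T x + a_right^T y.
a_left, a_right = a_vec[:F_out], a_vec[F_out:]
e_fast = leaky_relu((Wh @ a_left)[:, None] + (Wh @ a_right)[None, :])
```

The vectorized form needs only two matrix-vector products and one broadcasted addition, which is the usual way dense implementations avoid the explicit double loop.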

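　　Once obtained, the normalized attention coefficients are used to compute a linear combination of the corresponding features, which, after applying a nonlinearity $\sigma$, gives the final output vector of each node:

$$\vec{h}'_i = \sigma\left(\sum_{j \in N_{i}} \alpha_{ij} \mathbf{W}\vec{h}_j\right) \tag{4}$$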
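Putting the pieces together, a single-head layer can be sketched as one dense NumPy function. This is an illustrative sketch rather than the authors' implementation: the name `gat_layer`, the choice of ELU for $\sigma$, the LeakyReLU slope of 0.2, the random inputs, and the path-graph adjacency with self-loops are all assumptions.

```python
import numpy as np


def leaky_relu(x, negative_slope=0.2):
    return np.where(x > 0, x, negative_slope * x)


def elu(x):
    # ELU, used here as the output nonlinearity sigma (an assumption of this sketch).
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)


def gat_layer(h, adj, W, a_vec):
    """Single-head graph attentional layer (dense sketch).

    h: (N, F) inputs; adj: (N, N) boolean mask with at least one True per row;
    W: (F_out, F) shared weights; a_vec: (2 * F_out,) attention vector.
    Returns the (N, F_out) outputs and the (N, N) attention coefficients.
    """
    F_out = W.shape[0]
    Wh = h @ W.T                                           # shared linear map
    e = leaky_relu((Wh @ a_vec[:F_out])[:, None] + (Wh @ a_vec[F_out:])[None, :])
    e = np.where(adj, e, -np.inf)                          # masked attention
    alpha = np.exp(e - e.max(axis=1, keepdims=True))       # softmax over neighbors
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return elu(alpha @ Wh), alpha                          # weighted sum, then sigma


rng = np.random.default_rng(0)
N, F, F_out = 4, 3, 2
h = rng.normal(size=(N, F))
W = rng.normal(size=(F_out, F))
a_vec = rng.normal(size=2 * F_out)
adj = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=bool)

out, alpha = gat_layer(h, adj, W, a_vec)
```

Note that `W` and `a_vec` have shapes that do not depend on `N`, so the same parameters can be applied to a graph with a different number of nodes.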

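　　To stabilize the learning process of self-attention, the authors found it beneficial to extend the mechanism to multi-head attention, similar to "Attention Is All You Need". Specifically, $K$ independent attention mechanisms each perform the transformation of equation (4), and their features are then concatenated, which gives the following output representation:

$$\vec{h}'_i = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j \in N_{i}} \alpha_{ij}^{k} \mathbf{W}^{k}\vec{h}_j\right) \tag{5}$$

　　where $\alpha_{ij}^{k}$ are the normalized attention coefficients computed by the $k$-th attention mechanism and $\mathbf{W}^{k}$ is the corresponding weight matrix, so each node's output has $KF'$ features.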

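　　In particular, when multi-head attention is performed on the final (prediction) layer of the network, concatenation is no longer sensible. Instead, averaging is employed, with $k$ indexing the $K$ heads, and the final nonlinearity is delayed until after the average:

$$\vec{h}'_i = \sigma\left(\frac{1}{K} \sum_{k=1}^{K} \sum_{j \in N_{i}} \alpha_{ij}^{k} \mathbf{W}^{k}\vec{h}_j\right) \tag{6}$$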
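The difference between concatenating heads in a hidden layer and averaging them in the output layer can be sketched as follows. Everything concrete here is an assumption for illustration: the helper `attention_head`, ELU as the hidden nonlinearity, softmax as the final nonlinearity, the LeakyReLU slope of 0.2, the sizes, and the randomly initialized (untrained) parameters.

```python
import numpy as np


def attention_head(h, adj, W, a_vec):
    """Pre-activation output of one head: sum_j alpha_ij * W h_j."""
    F_out = W.shape[0]
    Wh = h @ W.T
    e = (Wh @ a_vec[:F_out])[:, None] + (Wh @ a_vec[F_out:])[None, :]
    e = np.where(e > 0, e, 0.2 * e)                   # LeakyReLU
    e = np.where(adj, e, -np.inf)                     # masked attention
    alpha = np.exp(e - e.max(axis=1, keepdims=True))  # softmax over neighbors
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ Wh


def elu(x):
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)


def softmax(x):
    z = np.exp(x - x.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
N, F, F_hidden, K, n_classes = 4, 3, 2, 3, 5
h = rng.normal(size=(N, F))
# Path graph with self-loops (assumed example topology).
adj = np.eye(N, dtype=bool) | np.eye(N, k=1, dtype=bool) | np.eye(N, k=-1, dtype=bool)

# Hidden layer: K heads, each with its own W^k and a^k, concatenated.
hidden_params = [(rng.normal(size=(F_hidden, F)), rng.normal(size=2 * F_hidden))
                 for _ in range(K)]
hidden = np.concatenate(
    [elu(attention_head(h, adj, W, a)) for W, a in hidden_params], axis=1)

# Output layer: K heads averaged, nonlinearity applied after the average.
out_params = [(rng.normal(size=(n_classes, K * F_hidden)), rng.normal(size=2 * n_classes))
              for _ in range(K)]
logits = np.mean([attention_head(hidden, adj, W, a) for W, a in out_params], axis=0)
probs = softmax(logits)
```

Concatenation grows the feature width to `K * F_hidden`, which suits hidden layers, while averaging keeps the output width equal to the number of classes so that the final nonlinearity yields one distribution per node.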

　　[Figure: schematic of the proposed attention weighting mechanism]
