Van Thuy Hoang
Dept. of Artificial Intelligence,
The Catholic University of Korea
hoangvanthuy90@gmail.com
Park et al., ICLR 2022
2
 Previous approaches either encode the absolute position of a node in a linearized sequence of nodes,
 or encode the position relative to another node using bias terms.
 This work proposes a relative positional encoding for graphs to overcome the weaknesses of the previous approaches.
https://github.com/Namkyeong/AFGRL
3
Problems
 Explicit representations of position available in graph convolutional networks are lost, so incorporating graph structure into the hidden representations of self-attention is a key challenge.
 Linearizing the graph with the graph Laplacian to encode the absolute position of each node
  loses the preciseness of position due to linearization.
 Encoding the position relative to another node with bias terms
  loses a tight integration of node-edge and node-spatial information.
4
Proposed Method
 Introduces two sets of learnable positional encoding vectors to represent the spatial relation and the edge between two nodes.
 Considers the interaction between:
  node features
  the two encoding vectors
 to integrate both node-spatial relation and node-edge information.
5
BACKGROUND
 The self-attention module computes the query q, key k, and value v with independent linear transformations.
 The attention map is computed by applying a scaled dot product between the queries and the keys.
 The self-attention module outputs the next hidden feature by applying a weighted summation over the values.
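A minimal sketch of the standard scaled dot-product self-attention described above; the notation ($x_i$ for the input feature of node $i$, $W^Q, W^K, W^V$ for the projection matrices, $d$ for the hidden dimension) is assumed here rather than taken from the slides:

$q_i = x_i W^Q, \quad k_i = x_i W^K, \quad v_i = x_i W^V$
$A_{ij} = \mathrm{softmax}_j\!\left( \frac{q_i \cdot k_j}{\sqrt{d}} \right)$
$x'_i = \sum_j A_{ij}\, v_j$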
6
GRAPH WITH TRANSFORMER
 Graphormer adds two additional terms to the self-attention module to encode graph information in the attention map.
 GT: the graph Laplacian represents the structure of a graph with respect to its nodes.
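For reference, hedged sketches of the two prior approaches, written with assumed notation rather than the exact equations of the papers:

 Graphormer adds a spatial bias and an edge bias to the attention logits, e.g. $\hat{A}_{ij} = \frac{q_i \cdot k_j}{\sqrt{d}} + b_{\phi(i,j)} + c_{ij}$, where $\phi(i,j)$ is the shortest-path distance between nodes $i$ and $j$, $b$ is a learnable scalar per distance, and $c_{ij}$ is a bias derived from edge features; both biases are independent of the node features.
 GT attaches an absolute positional encoding to each node from the eigenvectors of the graph Laplacian $L = I - D^{-1/2} A D^{-1/2}$, e.g. $x_i \leftarrow x_i + \lambda_i W^P$ with $\lambda_i$ the $i$-th row of the eigenvector matrix; this is the linearization step that loses positional preciseness.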
7
The proposed Graph Relative Positional Encoding (GRPE)
 Left: an example of how GRPE processes the relative relation between nodes; in the example, L is set to 2.
 Right: the proposed self-attention mechanism.
 Two relative positional encodings, the spatial encoding and the edge encoding, are used to encode the graph in both the attention map and the value.
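One way to read the example (an assumption about the exact definition; $\psi$ and $\mathrm{SPD}$ are hypothetical names): L acts as a clipping threshold on the shortest-path distance, so the spatial relation between two nodes would be

$\psi(i,j) = \min(\mathrm{SPD}(i,j),\, L)$

and the learnable spatial encodings are indexed by this clipped distance, possibly with special indices reserved for unreachable pairs and the virtual node.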
8
NODE-AWARE ATTENTION
 two terms to encode graph on the attention map with two newly
proposed encodings
 The 1st term:
 It encodes graph by considering interaction between node feature
and spatial relation in graph.
 The 2nd term:
 It encodes graph by considering interaction between node feature
and edge in graph.
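A hedged sketch of the form the two terms could take; $E^{\mathrm{spatial}}$ and $E^{\mathrm{edge}}$ are hypothetical names for the two sets of learnable encoding vectors, indexed by the spatial relation $\psi(i,j)$ and the edge type $e_{ij}$, with Q/K superscripts marking (possibly separate) query-side and key-side copies:

$a^{\mathrm{spatial}}_{ij} = q_i \cdot E^{\mathrm{spatial,Q}}_{\psi(i,j)} + k_j \cdot E^{\mathrm{spatial,K}}_{\psi(i,j)}$
$a^{\mathrm{edge}}_{ij} = q_i \cdot E^{\mathrm{edge,Q}}_{e_{ij}} + k_j \cdot E^{\mathrm{edge,K}}_{e_{ij}}$

Because each term is a dot product between a node's query or key and an encoding vector, the graph information interacts with the node features instead of entering as a feature-independent bias.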
9
NODE-AWARE ATTENTION
 The two terms consider the node-spatial relation and the node-edge relation; in contrast, Graphormer's bias terms do not consider the interaction with the node features.
 Finally, the two terms are added to the scaled dot-product attention map to encode graph information.
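Putting the pieces together, the node-aware attention map could then be written as (a sketch; whether the graph terms share the $\sqrt{d}$ scaling with the dot product is an assumption):

$\hat{A}_{ij} = \frac{q_i \cdot k_j + a^{\mathrm{spatial}}_{ij} + a^{\mathrm{edge}}_{ij}}{\sqrt{d}}, \qquad A_{ij} = \mathrm{softmax}_j(\hat{A}_{ij})$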
10
GRAPH-ENCODED VALUE
 Encodes the graph into the hidden features of self-attention when the values are weighted-summed with the attention map.
 Both the spatial encoding and the edge encoding are added into the value via summation.
 This directly encodes graph information into the hidden features of the value.
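A sketch of the graph-encoded value, assuming value-side counterparts $E^{\mathrm{value,spatial}}$ and $E^{\mathrm{value,edge}}$ of the two encodings (the names are hypothetical):

$x'_i = \sum_j A_{ij}\left( v_j + E^{\mathrm{value,spatial}}_{\psi(i,j)} + E^{\mathrm{value,edge}}_{e_{ij}} \right)$

In this form the spatial and edge information reaches the output hidden feature directly, not only through the attention weights.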
11
EXPERIMENT
 VIRTUAL NODE
 The role of a virtual node is similar to that of special tokens such as a classification token.
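A brief note on how this is typically realized (an assumption based on the common virtual-node setup, not an exact description of the slides): the virtual node is connected to every node of the graph through a dedicated special relation, and its final-layer representation is used as the graph-level readout for prediction, e.g. $h_G = h^{(\mathrm{last})}_{\mathrm{[VNODE]}}$, much like a [CLS] token in sequence models.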
12
EXPERIMENT
 Results on ZINC
13
EXPERIMENT
 Results on MolHIV
 Results on MolPCBA
14
ABLATION STUDY
 Effects of the components of GRPE on the ZINC dataset (the lower, the better).
 Empirically, sharing the encodings does not significantly change the performance of the model, especially on large datasets.