Improving Graph Neural Networks With Simple Architecture Design
Sunil Kumar Maurya, Tokyo Institute of Technology, Tokyo, Japan ([email protected])
Xin Liu, AIRC, AIST, Tokyo, Japan ([email protected])
Tsuyoshi Murata, Tokyo Institute of Technology, Tokyo, Japan ([email protected])
ABSTRACT

Graph Neural Networks have emerged as a useful tool to learn on data by applying additional constraints based on the graph structure. These graphs are often created with assumed intrinsic relations between the entities. In recent years, there have been tremendous improvements in architecture design, pushing performance up in various prediction tasks. In general, these neural architectures combine layer depth and node feature aggregation steps. This makes it challenging to analyze the importance of features at various hops and the expressiveness of the neural network layers. As different graph datasets show varying levels of homophily and heterophily in features and class label distribution, it becomes essential to understand which features are important for the prediction tasks without any prior information. In this work, we decouple the node feature aggregation step and the depth of the graph neural network, and introduce several key design strategies for graph neural networks. More specifically, we propose to use softmax as a regularizer and "Soft-Selector" of features aggregated from neighbors at different hop distances, and "Hop-Normalization" over GNN layers. Combining these techniques, we present a simple and shallow model, Feature Selection Graph Neural Network (FSGNN), and show empirically that the proposed model outperforms other state-of-the-art GNN models and achieves up to 64% improvement in accuracy on node classification tasks. Moreover, analyzing the learned soft-selection parameters of the model provides a simple way to study the importance of features in the prediction tasks. Finally, we demonstrate with experiments that the model is scalable for large graphs with millions of nodes and billions of edges. Source code is available at https://github.com/sunilkmaurya/FSGNN

KEYWORDS

Graph Neural Networks, Node Classification, Model Design, Feature Selection

ACM Reference Format:
Sunil Kumar Maurya, Xin Liu, and Tsuyoshi Murata. 2021. Improving Graph Neural Networks with Simple Architecture Design. In Proceedings of ACM Conference (Conference'17). ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION

Graph Neural Networks (GNNs) have opened a unique path to learning on data by leveraging the intrinsic relations between entities that can be structured as a graph. By imposing these structural constraints, additional information can be learned and used for many types of prediction tasks. With the rapid development of the field and easy accessibility of computation and data, GNNs have been used to solve a variety of problems like node classification [15, 27, 1, 7], link prediction [32, 3, 4], graph classification [33, 34], prediction of molecular properties [10, 18], natural language processing [19], node ranking [20] and so on.

In this work, we focus on the node classification task using graph neural networks. Since the success of early GNN models like GCN [15], researchers have successively proposed numerous variants [30] to address various shortcomings in model training and to improve the prediction capabilities. Some of the techniques used in these variants include neighbor sampling [12, 6], attention mechanisms to assign different weights to neighbors [27], use of a Personalized PageRank matrix instead of the adjacency matrix [16], and simplified model design [29]. Also, there has been a growing interest in making the models deeper by stacking more layers and using residual connections to improve the expressiveness of the model [23, 7]. However, most of these models by design are more suitable for homophily datasets, where nodes linked to each other are more likely to belong to the same class. They may not perform well on heterophily datasets, which are more likely to have nodes with different labels connected together. Zhu et al. [35] highlight this problem and propose separating a node's ego-embedding from its neighbor-embedding to improve performance on heterophily datasets.

In general, GNN models combine feature aggregation and transformation using a learnable weight matrix in the same layer, often referred to as a graph convolutional layer. These layers are stacked together with non-linear transformations (e.g., ReLU) and regularization (e.g., Dropout) as a learning framework on the graph data. Stacking the layers also has the effect of introducing powers of the adjacency matrix (or laplacian matrix), which helps to generate a new set of features for a node by aggregating neighbors' features at multiple hops, thus encoding the neighborhood information. The number of these unique features depends on the propagation steps, or the depth of the model. The final node embeddings are the output of just the stacked layers or, for some models, also have a skip connection or residual connection combined at the final layer.

However, such a combination muddles the distinction between the importance of features and the expressiveness of the MLP. It becomes challenging to analyze which features are essential and how much expressiveness the MLP requires for a specific task. To overcome this challenge, we provide a framework to treat feature propagation and learning separately.
With this freedom, we propose a simple GNN model with three unique design considerations: soft-selection of features using the softmax function, hop-normalization, and unique mapping of features. With experimental results, we show that our simple 2-layer GNN outperforms other state-of-the-art GNN models (both shallow and deep) and achieves up to 64% higher node classification accuracy. In addition, analyzing the model parameters gives us an insight into identifying which features are most responsible for classification accuracy. One interesting observation we find is regarding the Chameleon and Squirrel datasets. These are dense graph datasets and are generally regarded as being low-quality heterophily datasets. However, in our experiments with our proposed model, we find them to show strong heterophily properties with improved classification results.

Furthermore, we demonstrate that due to the simple design of our model, it can scale up to very large graph datasets. We run experiments on the ogbn-papers100M dataset, which is the largest publicly available node classification dataset, and achieve higher accuracy than the state-of-the-art models.

The rest of the paper is organized as follows: Section 2 outlines the formulation of graph neural networks and details the node classification task. In Section 3, we discuss design strategies for GNNs and propose the GNN model FSGNN. In Section 4, we briefly introduce relevant GNN literature. Section 5 contains the experimental details and comparison with other GNN models. In Section 6, we empirically analyze our proposed design strategies and their effect on the model's performance. Section 7 summarizes the paper.

2 PRELIMINARIES

Let G = (V, E) be an undirected graph with n nodes and m edges. For numerical calculations, the graph is represented as an adjacency matrix denoted by A ∈ {0, 1}^{n×n}, with each element A_ij = 1 if there exists an edge between nodes v_i and v_j, and A_ij = 0 otherwise. If self-loops are added to the graph, the resultant adjacency matrix is denoted as Ã = A + I. The diagonal degree matrices of A and Ã are denoted as D and D̃, respectively. Each node is associated with a d-dimensional feature vector, and the feature matrix for all nodes is represented as X ∈ R^{n×d}.

2.1 Graph Neural Networks

Graph Neural Networks (GNNs) leverage a feature propagation mechanism [10] to aggregate neighborhood information of a node and use a non-linear transformation with a trainable weight matrix to get the final embeddings for the nodes. Conventionally, a simple GNN layer is defined as

    H^{(i+1)} = σ(Ã_sym H^{(i)} W^{(i)})    (1)

where Ã_sym = D̃^{-1/2} Ã D̃^{-1/2} is the symmetrically normalized adjacency matrix with added self-loops, H^{(i)} represents features from the previous layer, W^{(i)} denotes the learnable weight matrix, and σ is a non-linear activation function, usually ReLU in most implementations of GNNs. However, this formulation is suitable for homophily datasets, as features are cumulatively aggregated, i.e., a node's own features are added together with its neighbors' features. For heterophily datasets, we require a propagation scheme that separates the features of neighbors from the node's own features. So we use the following formulation for the GNN layer,

    H^{(i+1)} = σ(A_sym H^{(i)} W^{(i)})    (2)

where A_sym = D^{-1/2} A D^{-1/2} is the symmetrically normalized adjacency matrix without added self-loops. To combine features from multiple hops, a concatenation operator can be used before the final layer. Following the conventional GNN formulation using Ã, a simple 2-layered GNN can be represented as [15],

    Z = Ã_sym σ(Ã_sym X W^{(0)}) W^{(1)}    (3)

2.2 Node Classification

Node classification is an extensively studied graph-based semi-supervised learning problem. It encompasses training the GNN to predict labels of nodes based on the features and neighborhood structure of the nodes. The GNN model is considered as a function f(X, A) conditioned on the node features X and the adjacency matrix A. Taking the example of Eq. 3, the GNN aggregates the features of two hops of neighbors and outputs Z. The softmax function is applied row-wise, and the cross-entropy error is calculated over all labeled training examples. The gradients of the loss are back-propagated through the GNN layers. Once trained, the model can be used for the prediction of labels of nodes in the test set.

2.3 Homophily vs Heterophily

The node classification problem relies on the graph structure and the features of the nodes to identify the labels of the nodes. Under homophily, nodes are assumed to have neighbors with similar features and labels. Thus, the cumulative aggregation of a node's self-features with those of its neighbors reinforces the signal corresponding to the label and helps to improve the accuracy of the predictions. In the case of heterophily, nodes are assumed to have dissimilar features and labels. Here, the cumulative aggregation will reduce the signal and add more noise, causing the neural network to learn poorly and causing a drop in performance. Thus it is essential to keep a node's self-features separate from the neighbors' features. In real-world datasets, homophily and heterophily levels may vary, hence it is optimal to have both aggregation schemes (Eq. 1 & 2).

3 PROPOSED ARCHITECTURE

For the design of a GNN with good generalization capability and performance, there are many aspects of the data that need to be considered. The feature propagation and aggregation scheme is governed by whether the class label distribution has strong homophily or heterophily or some combination of both. The number of hops for feature aggregation (and the depth of the model for many GNN models) depends on the graph structure and size as well as the label distribution among the neighbors of the nodes. Also, the type and amount of regularization during training need to be decided, for example, using dropout on input features or on graph edges.

Keeping these aspects under consideration, we propose three design strategies that help to create a versatile and simple GNN model.
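Before moving on, the two propagation matrices used in Eq. 1 and Eq. 2 can be made concrete with a minimal sketch. This is our illustration, not the authors' released code; the function and variable names are ours, and a sparse representation would be used for large graphs instead of the dense tensors shown here:

    import torch

    def sym_norm(adj: torch.Tensor, add_self_loops: bool) -> torch.Tensor:
        # Return D^{-1/2} A D^{-1/2}, optionally adding self-loops first.
        if add_self_loops:
            adj = adj + torch.eye(adj.size(0))
        deg = adj.sum(dim=1)                       # node degrees
        d_inv_sqrt = deg.pow(-0.5)
        d_inv_sqrt[torch.isinf(d_inv_sqrt)] = 0.0  # guard isolated nodes
        return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)

    # toy 3-node path graph
    A = torch.tensor([[0., 1., 0.],
                      [1., 0., 1.],
                      [0., 1., 0.]])
    A_sym = sym_norm(A, add_self_loops=False)        # propagation matrix of Eq. 2
    A_tilde_sym = sym_norm(A, add_self_loops=True)   # propagation matrix of Eq. 1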
Figure 1: Model diagram of FSGNN. Input features are generated based on powers of A and Ã.
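Following the caption above, the per-hop input matrices can be pre-computed once before training. A small sketch under that reading (precompute_features and its arguments are illustrative names; it reuses matrices such as those produced by the sym_norm sketch above):

    import torch

    def precompute_features(X, A_sym, A_tilde_sym, K):
        # Build the 2K+1 input matrices {X, A_sym X, Ã_sym X, ..., A_sym^K X, Ã_sym^K X}.
        feats = [X]                  # hop-0: raw node features
        xa, xat = X, X
        for _ in range(K):
            xa = A_sym @ xa          # propagation without self-loops
            xat = A_tilde_sym @ xat  # propagation with self-loops
            feats.extend([xa, xat])
        return feats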
3.1 Design Strategies for GNNs

3.1.1 Decouple feature generation and representation learning.
As discussed in Sec. 2.1, these features can be aggregated cumulatively (homophily-based) or non-cumulatively (heterophily-based). Moreover, the features can also be combined based on some arbitrary criteria. We assume a function

    g(X, A, K) ↦ {X_1, X_2, ..., X_p}

The function takes X as the node feature matrix, A as an adjacency matrix, and K as the power of the adjacency matrix or the number of hops to propagate features, and outputs a set of aggregated features. These features can then be combined using a sum or concatenation operation to get the final representation of the node. However, in the node classification task, for a given label distribution, only a subset of these features is useful to predict the label of the node. For example, features of a node's neighbors that lie at a greater distance in the graph may not be sufficiently informative or useful for the node's label prediction.

Conventionally, GNN models have feature propagation and transformation combined into a single layer, and the layers are stacked together. This makes it difficult to distinguish the importance of the features from the role of the MLP. To overcome this limitation, we propose to separate the feature generation step from representation learning over the features. This provides us with three main benefits.

(i) Features generated for nodes are not constrained by the design of the GNN model. We get the freedom to choose the feature set as required by the problem and a neural network design that is sufficiently expressive.
(ii) We can precompute and fix the node feature set and experiment with the neural network architectures for the best performance. Precomputing features also helps to scale the training of the model for large graphs with batchwise training.
(iii) In the conventional GNN setting, stacking many layers also causes oversmoothing of node features [5] and adversely affects the performance of the model. Recently proposed models use skip connections or residual connections to overcome this issue. However, they fail to demonstrate which features are useful. We provide an alternate scheme where the model can learn weights that identify which features are useful for the prediction task.

For the model design, instead of a single input channel, we propose to have all these features as input in parallel. Please refer to Fig. 1 for the illustration. Each feature is mapped to a separate linear layer. Hence the linear transformations are uniquely learned for all input features.

3.1.2 Feature Selection.
As features are aggregated over many hops, some features are useful and correlate with the label distribution, while others are not very useful for learning and act more like noise for the model. As we propose to input the feature set in parallel channels, we can design the model to learn which features are more relevant for a lower loss value, giving higher weights to those features while simultaneously reducing the weights on the other features. We propose to weight these features with a single scalar value that is multiplied with each input feature matrix, and to impose a constraint on these values via the softmax function. Let α_i be the scalar value for the i-th feature matrix; then α_i scales the magnitude of the features as α_i X_i W_i^{(0)}.
The softmax function is used in deep learning as a non-linear normalizer, and its output is often practically interpreted as probabilities. Before training, the scalar values corresponding to each feature matrix are initialized with equal values, and softmax is applied on these values. The resultant normalized values α_i are then multiplied with the input features, and the concatenation operator is applied. Considering L input feature matrices X_l, l ∈ {1 .. L}, the formulation can be described as

    H^{(1)} = ∥_{l=1}^{L} α_l X_l W_l^{(0)},   where ∥ denotes concatenation and Σ_{l=1}^{L} α_l = 1    (4)

While training, the scalar values of the features relevant to the labels increase towards 1 while the others decrease towards 0. The features that are not useful and represent more noise than signal have their magnitudes reduced with the corresponding decrease in their scalar values. Since we are not using a binary selection of features, we term this selection procedure "soft-selection" of features.

This formulation can be understood in two ways. GNNs have been represented with a polynomial filter,

    g_θ(P) = Σ_{k=0}^{K-1} θ_k P^k    (5)

where θ ∈ R^K is a vector of polynomial coefficients and P can be the adjacency matrix [15][7], the laplacian matrix [21] or a PageRank-based matrix [2]. Since the polynomial coefficients are scalar parameters, our scheme can be considered as applying regularization on these parameters using the softmax function. The other way is to simply consider it as a weighting scheme: the input features can be arbitrarily chosen, and instead of a scalar weighting scheme, a more sophisticated scheme can be used.

For practical implementation, since all weights are initialized as equal, they can be set equal to 1. After normalizing with the softmax function, the individual scalar values become equal to 1/L. During training, these values change, denoting the importance of the features. In some cases, the initial value α_l = 1/L may be too small and may adversely affect training. In that case, a constant γ may be multiplied in after softmax normalization to increase the initial magnitude, as γ α_l X_l W_l^{(0)}. Since γ remains constant during training, it does not affect the softmax regularization of the scalar parameters.

As the scalar values affect the magnitude of the features, they also affect the gradients propagated back to the linear layer which transforms the input features. Hence it is important to have a unique weight matrix for each input feature matrix.

3.1.3 Hop-Normalization.
The third strategy we propose is hop-normalization. It is a common practice in the deep learning field to use different types of normalization schemes, for example, batch normalization [14], layer normalization, weight normalization, and so on. However, in graph neural network frameworks, normalization of activations after hidden layers is not commonly used. It may be in part due to the common practice of normalizing node/edge features and the symmetric/non-symmetric normalization of the adjacency matrix. We propose to normalize all aggregated features from different hops after the linear transformation, hence the term "Hop-Normalization". We propose to row-wise L2-normalize the hidden layer activations as

    h_ij = h_ij / ∥h_i∥_2    (6)

where h_i represents the i-th row vector of activations and h_ij represents its individual values. L2-normalization scales the node embedding vectors to lie on the "unit sphere". In a later section, we empirically show significant improvements in the performance of the model with the use of this scheme.

3.2 Feature Selection Graph Neural Network

Combining the design strategies proposed earlier, we propose a simple and shallow (2-layered) GNN model called Feature Selection Graph Neural Network (FSGNN). Figure 1 shows the diagrammatic representation of our model. Input features are precomputed using A_sym and Ã_sym and transformed using a linear layer unique to each feature matrix. Hop-normalization is applied on the output activations of the first layer, and the activations are weighted with scalar weights regularized by the softmax function. The output features are then concatenated, non-linearly transformed using ReLU, and mapped to the second linear layer. The cross-entropy loss is calculated with the output logits of the second layer.

Algorithm 1: Pseudocode for FSGNN (forward propagation)
Input: A_sym; Ã_sym; number of hops K; weight matrices W^{(k)}; α vector of dimension 2K+1
Output: Logits

    α_i ← 1.0, i = 1 ... 2K+1
    α ← SOFTMAX(α)
    list_mat ← [X]
    X_A ← X
    X_Ã ← X
    for k = 1 ... K do
        X_A ← A_sym X_A
        X_Ã ← Ã_sym X_Ã
        list_mat.APPEND(X_A)
        list_mat.APPEND(X_Ã)
    end
    list_cat ← LIST()
    for j = 1 ... 2K+1 do
        X_f ← list_mat[j]
        Out ← HOPNORM(X_f W_j^{(0)})
        list_cat.APPEND(α_j ⊙ Out)
    end
    H^{(1)} ← CONCAT(list_cat)
    Z ← ReLU(H^{(1)}) W^{(2)}
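For readers who prefer runnable code, a condensed PyTorch sketch of this forward pass is given below. It is our reading of Algorithm 1 and Eqs. 4-6, not the authors' reference implementation; the class name, layer sizes, and the gamma argument are placeholders:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FSGNNSketch(nn.Module):
        def __init__(self, in_dim, hidden, num_classes, num_feats, gamma=1.0):
            super().__init__()
            # one first-layer weight matrix W_l^(0) per pre-computed feature matrix
            self.fc1 = nn.ModuleList([nn.Linear(in_dim, hidden) for _ in range(num_feats)])
            # scalar soft-selection parameters, softmax-normalized in forward()
            self.alpha = nn.Parameter(torch.ones(num_feats))
            self.fc2 = nn.Linear(hidden * num_feats, num_classes)  # W^(2)
            self.gamma = gamma

        def forward(self, feat_list):
            alpha = torch.softmax(self.alpha, dim=0)     # soft-selection weights (Eq. 4)
            parts = []
            for a, lin, x in zip(alpha, self.fc1, feat_list):
                h = F.normalize(lin(x), p=2, dim=1)      # hop-normalization (Eq. 6)
                parts.append(self.gamma * a * h)
            h1 = torch.cat(parts, dim=1)                 # concatenation over all hops
            return self.fc2(F.relu(h1))                  # logits Z

The feat_list argument is the list of 2K+1 pre-computed feature matrices described in Section 3.1.1.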
Table 2: Mean classification accuracy on the fully-supervised node classification task. Results for GCN, GAT, GraphSAGE, Cheby+JK, MixHop and H2GCN-1 are taken from [35]. For GEOM-GCN and GCNII, results are taken from the respective articles. The best performance for each dataset is marked in bold and the second best is underlined for comparison.
Figure 3: t-SNE plots of trained embeddings (3-hop) of the Squirrel and Chameleon datasets without (left) and with (right) hop-normalization. Points represent nodes and colors represent their respective labels. Mean classification accuracy without and with hop-normalization is 39.92% and 73.48% for Squirrel, and 61.38% and 78.14% for Chameleon, respectively.
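Plots in the style of Figure 3 can be produced from the trained embeddings with a short t-SNE sketch such as the one below (assumes scikit-learn and matplotlib; embeddings and labels are placeholder names for arrays extracted from the trained model):

    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    def plot_embeddings(embeddings, labels):
        # embeddings: (num_nodes, hidden) array of trained activations for one hop
        xy = TSNE(n_components=2, random_state=0).fit_transform(embeddings)
        plt.scatter(xy[:, 0], xy[:, 1], c=labels, s=5, cmap="tab10")
        plt.show()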
that self-looped features are given more importance. Among heterophily datasets, Wisconsin, Cornell, Texas, and Actor have the most weight on the node's ego features. In these datasets, the graph structure plays a limited role in the performance accuracy of the model. For the Chameleon and Squirrel datasets, we observed that the node's own features and first-hop features (without self-loops) were more useful for classification than any other features.

6.3 Hop-Normalization

In our experimental results, we find that the Chameleon and Squirrel datasets show significant improvements. To understand the results better, we create 2-dimensional plots of the trained embeddings of both datasets using t-SNE [17]. Figure 4 shows the comparison of embeddings with and without hop-normalization. Without hop-normalization, the embeddings of the nodes are not separated clearly, thus resulting in lower classification performance. We observe similar behavior with other GNN models. With hop-normalization, the node embeddings are well separated into clusters corresponding to their labels, leading to the higher observed performance of our model.

for 3-hop aggregation. The dimension of the hidden layer is set to 256, and γ is set to L = 7 (equal to the number of input features) to provide stable training. The model is trained batchwise with input features for 10 random initializations, and we report mean accuracy.

We compare the accuracy of our model with SGC [29], Node2Vec [11] and SIGN [9]. Similar to our method, input features can be precomputed in SGC and SIGN, thus making them scalable for larger datasets. Once features are computed, the model can be trained with small input batches of node features on the GPU. Many other GNN models cannot be trained on larger graphs as the feature generation and model training are combined.

Table 4 shows the mean node classification accuracy along with the published results of other methods taken from [9][13]. Our model outperforms all other methods, with SIGN having the closest performance to ours. However, SIGN uses the adjacency matrices of both the directed and undirected versions of the graph for feature transformations, while our model only utilizes the adjacency matrix of the undirected graph.
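Because the hop features are pre-computed, the batchwise training described above reduces to slicing rows of each feature matrix; no neighborhood sampling is needed. A hedged sketch (model, feat_list, labels, and train_idx are placeholders for the trained module and pre-computed data):

    import torch
    import torch.nn.functional as F

    def train_epoch(model, optimizer, feat_list, labels, train_idx, batch_size=10000):
        model.train()
        perm = train_idx[torch.randperm(len(train_idx))]
        for idx in torch.split(perm, batch_size):
            batch = [x[idx] for x in feat_list]   # slice every pre-computed hop matrix
            optimizer.zero_grad()
            loss = F.cross_entropy(model(batch), labels[idx])
            loss.backward()
            optimizer.step()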
Table 4: Mean classification accuracy on the ogbn-papers100M dataset. The SGC result is taken from [13], and the Node2Vec and SIGN results are taken from [9]. The best performance is marked in bold and the second best is underlined.

    Method      Accuracy
    SGC         63.29±0.19
    Node2Vec    58.07±0.28
    SIGN        65.11±0.14
    FSGNN       67.17±0.14

is intuitive as aggregated features from higher hops are not very useful, and the model can learn to place low weights on them.

[Figure: classification accuracy (%) of the model on Cora and Chameleon for 2, 4, 8, 16, and 32 hops.]

IMPLEMENTATION DETAILS

For reproducibility of experimental results, we provide the details of our experiment setup and the hyperparameters of the model. We use PyTorch 1.6.0 as the deep learning framework on Python 3.8. Model training is done on an Nvidia V100 GPU with 16 GB graphics memory and CUDA version 10.2.89.

For node classification results (Table 2), we do a grid search over the learning rate and weight decay of the layers and the dropout between the layers. Hyperparameters are set for the first layer fc1, the second layer fc2, and the scalar weight parameters sca. ReLU is used as the non-linear activation and Adam is used as the optimizer. Table 5 shows the details of the hyperparameter search space. Tables 6 and 7 show the best hyperparameters for the model in the 3-hop and 8-hop configurations, respectively.

Table 5: Hyperparameter search space

    Hyperparameter    Values
    WD_sca            0.0, 0.0001, 0.001, 0.01, 0.1
    LR_sca            0.04, 0.02, 0.01, 0.005
    WD_fc1            0.0, 0.0001, 0.001
    WD_fc2            0.0, 0.0001, 0.001
    LR_fc             0.01, 0.005
    Dropout           0.5, 0.6, 0.7

Table 6: Hyperparameters of the 3-hop model

For experiments on the ogbn-papers100M dataset, we did not do a grid search. Based on the data from earlier experiments, we manually tuned the hyperparameters to get the accuracy result. A batch size of 10000 was used for the training data. Table 8 shows the relevant hyperparameters for the model.

Table 8: Hyperparameters for the ogbn-papers100M dataset

    Dataset            WD_sca   LR_sca   WD_fc1   WD_fc2     LR_fc1    LR_fc2   Dropout
    ogbn-papers100M    0.1      0.0001   0.001    0.000001   0.00005   0.0002   0.5
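One way to realize the per-part hyperparameters above (separate learning rate and weight decay for fc1, fc2 and the scalar weights sca) is with Adam parameter groups. The sketch below plugs in the ogbn-papers100M values from Table 8 and assumes an FSGNN-style module, as in the earlier sketch, exposing fc1, fc2 and the scalar parameter alpha:

    import torch

    # `model`: an FSGNN-style module from the earlier sketch (placeholder)
    optimizer = torch.optim.Adam([
        {"params": model.fc1.parameters(), "lr": 0.00005, "weight_decay": 0.001},     # fc1
        {"params": model.fc2.parameters(), "lr": 0.0002,  "weight_decay": 0.000001},  # fc2
        {"params": [model.alpha],          "lr": 0.0001,  "weight_decay": 0.1},       # sca
    ])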