Large Language Models On Graphs: A Comprehensive Survey
or scenarios where graph data are paired with rich textual information (e.g., molecules with descriptions). Besides, although LLMs have shown their pure text-based reasoning ability, it is underexplored whether such ability can be generalized to graph scenarios (i.e., graph-based reasoning). In this paper, we provide a systematic review of scenarios and techniques related to large language models on graphs. We first summarize potential scenarios of adopting LLMs on graphs into three categories, namely pure graphs, text-rich graphs, and text-paired graphs. We then discuss detailed techniques for utilizing LLMs on graphs, including LLM as Predictor, LLM as Encoder, and LLM as Aligner, and compare the advantages and disadvantages of different schools of models. Furthermore, we mention the real-world applications of such methods and summarize open-source codes and benchmark datasets. Finally, we conclude with potential future research directions in this fast-growing field. The related source can be found at https://github.com/PeterGriffinJin/Awesome-Language-Model-on-Graphs.

Index Terms—Large Language Models, Graph Neural Networks, Natural Language Processing, Graph Representation Learning

Fig. 1. According to the relationship between graph and text, we categorize three LLM-on-graph scenarios (pure graphs, text-rich graphs, and text-paired graphs). Depending on the role of the LLM, we summarize three LLM-on-graph techniques. “LLM as Predictor” is where LLMs are responsible for predicting the final answer. “LLM as Aligner” aligns the input-output pairs with those of GNNs. “LLM as Encoder” refers to using LLMs to encode node/edge text and obtain feature vectors.

1 INTRODUCTION
LARGE language models (LLMs) (e.g., BERT [22], T5 [30], LLaMA [119]), which are pretrained on very large text corpora, have been demonstrated to be very powerful in solving natural language processing (NLP) tasks, including question answering [1], text generation [2] and document understanding [3]. Early LLMs (e.g., BERT [22], RoBERTa [23]) adopt an encoder-only architecture and are mainly applied for text representation learning [4] and natural language understanding [3]. In recent years, more focus has been given to decoder-only architectures [119] or encoder-decoder architectures [30]. As the model size scales up, such LLMs have also shown reasoning ability and even more advanced emergent abilities [5], exposing a strong potential for Artificial General Intelligence (AGI).

While LLMs are extensively applied to process pure texts, there is an increasing number of applications where the text data are associated with structure information that is represented in the form of graphs. As presented in Fig. 1, in academic networks, papers (with titles and descriptions) and authors (with profile text) are interconnected through authorship relationships. Understanding both the author/paper text information and the author-paper structure information on such graphs can contribute to advanced author/paper modeling and accurate recommendations for collaboration. In the scientific domain, molecules are represented as graphs and are often paired with text that describes their basic information (e.g., toxicity). Joint modeling of both the molecule structure (graph) and the associated rich knowledge (text) is important for deeper molecule understanding. Since LLMs are mainly proposed for modeling texts that lie in a sequential fashion, those scenarios mentioned above pose

• * The first three authors contributed equally to this work.
• Bowen Jin, Chi Han, Heng Ji, Jiawei Han: University of Illinois at Urbana-Champaign. {bowenj4, chihan3, hengji, hanj}@illinois.edu
• Gang Liu, Meng Jiang: University of Notre Dame. {gliu7, mjiang2}@nd.edu
textual information d_{v_i} ∈ D. For instance, in an academic citation network, one can interpret v ∈ V as the scholarly articles, e ∈ E as the citation links between them, and d ∈ D as the textual content of these articles. A graph with node-level textual information is also called a text-rich graph [32], a text-attributed graph [62], or a textual graph [73].

Definition 3 (Graph with edge-level textual information): A graph with edge-level textual information can be denoted as G = (V, E, D), where V, E and D are the node set, edge set, and text set, respectively. Each e_{ij} ∈ E is associated with some textual information d_{e_{ij}} ∈ D. For example, in a social network, one can interpret v ∈ V as the users, e ∈ E as the interactions between the users, and d ∈ D as the textual content of the messages sent between the users.

Definition 4 (Graph with graph-level textual information): A graph data object with graph-level textual information can be denoted as the pair (G, d_G), where G = (V, E). V and E are the node set and edge set. d_G is the text paired with the graph G. For instance, in a molecular graph G, v ∈ V denotes an atom, e ∈ E represents the strong attractive forces or chemical bonds that hold molecules together, and d_G represents the textual description of the molecule.

where Q, K, V ∈ R^{N_S × d_k} are the query, key, and value vectors for each word in the sentence, respectively. The attention mechanism is designed to capture the dependencies between words in a sentence in a flexible way, an advantage also potentially useful for combining with other input formats like graphs. BERT is useful as a text representation model, where the last layer outputs the representation of the input text h_S ∈ R^d. Following BERT, many other masked language models have been proposed, such as RoBERTa [23], ALBERT [116], and ELECTRA [117], with similar architectures and objectives of text representation. This type of model is also called a pretrained language model (PLM).

Although the original Transformer paper [94] experimented on machine translation, it was not until the release of GPT-2 [113] that causal language modeling (i.e., text generation) became impactful on downstream tasks. Causal language modeling is the task of predicting the next word given the previous words in a sentence. The objective of causal language modeling is defined as:

\mathbb{E}_{S \sim \mathcal{D}} \Big[ \sum_{s_i \in S} \log p(s_i \mid s_1, \dots, s_{i-1}) \Big].   (3)
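To make the objective in Eq. (3) concrete, the following is a minimal PyTorch sketch (our own illustration, not code from any surveyed system) that computes the causal language modeling loss by shifting the token sequence so that each position predicts the next token; the model producing the logits is left abstract.

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
    """Monte-Carlo estimate of Eq. (3): minimize -sum_i log p(s_i | s_1..s_{i-1}).

    logits:    [batch, seq_len, vocab] unnormalized scores from a decoder-only LM
    token_ids: [batch, seq_len] the observed sentence S
    """
    pred = logits[:, :-1, :]     # position i is trained to predict token i+1
    target = token_ids[:, 1:]
    return F.cross_entropy(pred.reshape(-1, pred.size(-1)), target.reshape(-1))

# Toy usage with random scores standing in for an LM's output.
loss = causal_lm_loss(torch.randn(2, 8, 100), torch.randint(0, 100, (2, 8)))
```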
The READOUT functions include mean pooling, max pooling, and so on. Subsequent work on GNNs tackles the issues of over-smoothing [137], over-squashing [138], interpretability [147], and bias [143]. While message-passing-based GNNs have demonstrated advanced structure encoding capability, researchers are exploring further enhancing their expressiveness with Transformers (i.e., graph Transformers). Graph Transformers utilize a global multi-head attention mechanism to expand the receptive field of each graph encoding layer [141]. They integrate the inductive biases of graphs into the model by positional encoding, structural encoding, the combination of message-passing layers with attention layers [142], or improving the efficiency of attention on large graphs [144]. Graph Transformers have been proven to be the state-of-the-art solution for many pure graph problems. We refer readers to [140] for the most recent advances in GTs. GTs can be treated as a special type of GNN.

Language Models vs. Graph Transformers. Modern language models and graph Transformers both use Transformers [94] as the base model architecture. This makes the two concepts hard to distinguish, especially when language models are adopted for graph applications. In this paper, “Transformers” typically refers to Transformer language models for simplicity. Here, we provide three points to help distinguish them: 1) Tokens (word token vs. node token): Transformers take a token sequence as input. For language models, the tokens are word tokens, while for graph Transformers, the tokens are node tokens. In those cases where tokens include both word tokens and node tokens, if the backbone Transformer is pretrained on a text corpus (e.g., BERT [22] and LLaMA [119]), we will call it a “language model”. 2) Positional Encoding (sequence vs. graph): Language models typically adopt absolute or relative positional encoding based on the position of the word token in the sequence, while graph Transformers adopt the shortest path distance [141], random walk distance, or the eigenvalues of the graph Laplacian [142] to encode the distance between nodes in the graph. 3) Goal (text vs. graph): Language models are originally proposed for text encoding and generation, while graph Transformers are proposed for node encoding or graph encoding. In those cases where texts serve as nodes/edges on the graph, if the backbone Transformer is pretrained on a text corpus, we will call it a “language model”.

3 CATEGORIZATION AND FRAMEWORK

In this section, we first introduce our categorization of graph scenarios where language models can be adopted. Then we discuss the categorization of LLM-on-graph techniques. Finally, we summarize the training & inference framework for language models on graphs.

Fig. 2. A taxonomy of LLM on graph scenarios and techniques with representative examples.

3.1 Categorization of Graph Scenarios with Language Models

Pure Graphs without Textual Information are graphs with no text information or no semantically rich text information. Examples of such graphs include traffic graphs and power transmission graphs. These graphs often serve as context to test the graph reasoning ability of large language models (solve graph theory problems) or serve as knowledge sources to enhance large language models (alleviate hallucination).

Text-Rich Graphs refers to graphs where nodes or edges are associated with semantically rich text information. Such graphs are also called text-rich networks [32], text-attributed graphs [62], textual graphs [73] or textual-edge networks [75]. Real-world examples include academic networks, e-commerce networks, social networks, and legal case networks. On these graphs, people are interested in learning representations for nodes or edges with both textual information and structure information [73], [75].

Text-Paired Graphs are graphs where the textual description is defined on the whole graph structure. Such graphs include molecules or proteins, where nodes represent atoms and edges represent chemical bonds. The text description can be molecule captions or protein textual features. Although the graph structure is the most significant factor influencing molecular properties, text descriptions of molecules can serve as a complementary knowledge source to help understand molecules [148]. The graph scenarios can be found in Fig. 1.

3.2 Categorization of LLM on Graph Techniques

According to the roles of LLMs and what the final components for solving graph-related problems are, we classify LLM-on-graph techniques into three main categories:

LLM as Predictor. This category of methods uses the LLM as the final component to output representations or predictions. It can be enhanced with GNNs and can be classified depending on how the graph information is injected into the LLM: 1) Graph as Sequence: This type of method makes no changes to the LLM architecture, but makes it aware of graph structure by taking a “graph token sequence” as input. The “graph token sequence” can be natural language descriptions for a graph or hidden representations output by graph encoders. 2) Graph-Empowered LLM: This type of method modifies the architecture of the LLM base model (i.e., Transformers) and enables it to conduct joint text and graph encoding inside its architecture. 3) Graph-Aware LLM Finetuning: This type of method makes no changes to the input of the LLMs or the LLM architectures, but only fine-tunes the LLMs with supervision from the graph.

LLM as Encoder. This method is mostly utilized for graphs where nodes or edges are associated with text information (solving node-level or edge-level tasks). GNNs are the final components, and the LLM serves as the initial text encoder. To be specific, LLMs are first utilized to encode the text associated with the nodes/edges. The feature vectors output by the LLMs then serve as input embeddings for GNNs for graph structure encoding. The output embeddings from the GNNs are adopted as final node/edge representations for downstream tasks. However, these methods suffer from convergence, data sparsity, and efficiency issues, for which we summarize solutions from optimization, data augmentation, and knowledge distillation perspectives.

LLM as Aligner. This category of methods adopts LLMs as text-encoding components and aligns them with GNNs which serve as graph structure encoding components. LLMs and GNNs are adopted together as the final components for task solving. To be specific, the alignment between LLMs and
GNNs can be categorized into 1) Prediction Alignment, where the generated pseudo labels from one modality are utilized for training on the other modality in an iterative learning fashion, and 2) Latent Space Alignment, where contrastive learning is adopted to align text embeddings generated by LLMs and graph embeddings generated by GNNs.

3.3 Training & Inference Framework with LLMs

There are two typical training and inference paradigms to apply language models on graphs: 1) Pretraining-then-finetuning, typically adopted for medium-scale large language models; and 2) Pretraining-then-prompting, typically adopted for large-scale large language models.

Pretraining denotes training the language model with unsupervised objectives to initialize it with language understanding and inference ability for downstream tasks. Typical pretraining objectives for pure text include masked language modeling [22], auto-regressive causal language modeling [25], corruption-reconstruction language modeling [29] and text-to-text transfer modeling [30]. When extended to the graph domain, language model pretraining strategies include document relation prediction [31], network-contextualized masked language modeling [32], contrastive social prediction [33] and context graph prediction [34].

Finetuning refers to the process of training the language model with labeled data for the downstream tasks. Language model fine-tuning methodology can be further categorized into full fine-tuning, efficient fine-tuning, and instruction tuning.

• Full Finetuning means updating all the parameters inside the language model. It is the most commonly used fine-tuning method that fully stimulates the language model’s potential for downstream tasks, but can suffer from heavy computational overhead [37] and result in overfitting issues [36].

• Efficient Finetuning refers to only fine-tuning a subset of parameters inside the language model. Efficient tuning methods for pure text include prompt tuning [38], prefix tuning [39], adapters [40] and LoRA [41]. Efficient language model fine-tuning methods particularly designed for graph data include graph neural prompt [42] and graph-enhanced prefix [43].

• Instruction Tuning denotes fine-tuning the language model with downstream task instructions [44], [45] to encourage model generalization to unseen tasks at inference time. It is an orthogonal concept to full fine-tuning and efficient fine-tuning; in other words, one can combine instruction tuning with either full fine-tuning or efficient fine-tuning. Instruction tuning is adopted in the graph domain for node classification [46], link prediction [47], and graph-level tasks [48].

Prompting is a technique to apply a language model to
downstream task solving without updating the model parameters. One needs to formulate the test samples into natural language sequences and ask the language model to directly conduct inference based on the in-context demonstrations. This technique is particularly popular for large-scale autoregressive language models. Apart from direct prompting, follow-up works propose chain-of-thought prompting [49], tree-of-thought prompting [50], and graph-of-thought prompting [51].

In the following sections, we will follow our categorization in Section 3 and discuss detailed methodologies for each graph scenario.

4 PURE GRAPHS

Problems on pure graphs provide a fundamental motivation for why and how LLMs are introduced into graph-related reasoning problems. Investigated thoroughly in graph theory, pure graphs serve as a universal representation format for a wide range of classical algorithmic problems in all perspectives of computer science. Many graph-based concepts, such as shortest paths, particular sub-graphs, and flow networks, have strong connections with real-world applications [133]–[135]. Therefore, pure graph-based reasoning is vital in providing theoretical solutions and insights for reasoning problems grounded in real-world applications.

Nevertheless, many reasoning tasks require a computation capacity beyond traditional GNNs. GNNs are typically designed to carry out a bounded number of operations given a graph size. In contrast, graph reasoning problems can require up to indefinite complexity depending on the task’s nature. Training conventional GNNs on general reasoning-intensive problems is challenging without prior assumptions and specialized model design. This fundamental gap motivates researchers to seek to incorporate LLMs in graph problems. On the other hand, LLMs have recently demonstrated excellent emergent reasoning ability [49], [110], [111]. This is partially due to their autoregressive mechanism, which enables computing indefinite sequences of intermediate steps with careful prompting or training [49], [50].

The following subsections discuss the attempts to incorporate LLMs into pure graph reasoning problems. We will also discuss these works’ challenges, limitations, and findings. Table 2 lists a rough categorization of these efforts. Usually, input graphs are serialized as part of the input sequence, either by verbalizing the graph structure [124]–[126], [128]–[132] or by encoding the graph structure into implicit feature sequences [43]. The studied reasoning problems range from simpler ones like connectivity, shortest paths, and cycle detection to harder ones like maximum flow and Hamiltonian pathfinding (an NP-complete problem). A comprehensive list of the studied problems is given in Table 3. Note that we only list representative problems here. This table does not include more domain-specific problems, such as the spatial-temporal reasoning problems in [128].

TABLE 2
A collection of LLM reasoning methods on pure graphs discussed in Section 4. We do not include the backbone models used in these methods as studied in the original papers, since these methods generally apply to any LLMs. The “Papers” column lists the papers that study the specific methods.

4.1 Direct Answering

Although graph-based reasoning problems usually involve complex computation, researchers still attempt to let language models directly generate answers from the serialized input graphs as a starting point or a baseline, partially because of the simplicity of the approach and partially in awe of other emergent abilities of LLMs. This approach can be viewed as a probe of the graph understanding of LLMs (in contrast to graph “reasoning”), which tests whether LLMs acquire a good enough internal representation to directly “guess” the answers. Although various attempts have been made to optimize how graphs are presented in the input sequence, which we will discuss in the following sections, bounded by the finite sequence length and computational operations, there is a fundamental limitation of this approach to solving complex reasoning problems such as NP-complete ones. Unsurprisingly, most studies find that LLMs possess preliminary graph understanding ability, but the performance is less satisfactory on more complex problems or larger graphs [43], [124]–[126], [128], [131]. In the following, we discuss the details of these studies based on their input representation methods and their main differences.

Plainly Verbalizing Graphs. Verbalizing the graph structure in natural language is the most straightforward way of representing graphs. Representative approaches include describing the edge and adjacency lists, widely studied in [124], [125], [128], [131]. For example, for a triangle graph with three nodes, the edge list can be written as “[(0, 1), (1, 2), (2, 0)]”, which means node 0 is connected to node 1, node 1 is connected to node 2, and node 2 is connected to node 0. It can also be written in natural language such as “There is an edge between node 0 and node 1, an edge between node 1 and node 2, and an edge between node 2 and node 0.” On the other hand, we can describe the adjacency list from the nodes’ perspective. For example, for the same triangle graph, the adjacency list can be written as “Node 0 is connected to node 1 and node 2. Node 1 is connected to node 0 and node 2. Node 2 is connected to node 0 and node 1.” On these inputs, one can prompt LLMs to answer questions either in zero-shot or few-shot (in-context learning) settings: the former directly asks questions given the graph structure, while the latter asks questions about the graph structure after providing a few examples of questions and answers. [124]–[126] do confirm that LLMs can answer easier questions such as connectivity, neighbor identification, and graph size counting, but fail to answer more complex questions such as cycle detection and Hamiltonian pathfinding. Their results also reveal that providing more examples in the few-shot setting increases the performance, especially on easier problems, although it is still not satisfactory.

Paraphrasing Graphs. The verbalized graphs can be lengthy, unstructured, and complicated to read, even for humans, so they might not be the best input format for LLMs to infer the answers. To this end, researchers also attempt to paraphrase the graph structure into more natural or concise sentences. [126] find that by prompting LLMs to generate a format explanation of the raw graph inputs for itself (Format-Explanation) or to pretend to play a role in a natural task (Role Prompting), the performance on some problems can be improved, but not systematically. [131] explores the effect of grounding the pure graph in a real-world scenario, such as social networks, friendship graphs, or co-authorship graphs. In such graphs, nodes are described as people, and edges are relationships between people. Results indicate that encoding in real-world scenarios can improve the performance on
some problems, but still not consistently.

Encoding Graphs Into Implicit Feature Sequences. Finally, researchers also attempt to encode the graph structure into implicit feature sequences as part of the input sequence [43]. Unlike the previous verbalizing approaches, this usually involves training a graph encoder to encode the graph structure into a sequence of features and fine-tuning the LLMs to adapt to the new input format. [43] demonstrates drastic performance improvement on problems including substructure counting, maximum triplet sum, shortest path, and bipartite matching, evidence that fine-tuning LLMs has great fitting power on a specific task distribution.

TABLE 3
A collection of pure graph reasoning problems studied in Section 4. G = (V, E) denotes a graph with vertices V and edges E. v and e denote individual vertices and edges, respectively. The “Papers” column lists the papers that study the problem using LLMs. The “Complexity” column lists the time complexity of standard algorithms for the problem, ignoring more advanced but complex algorithms that are not comparable to LLMs’ reasoning processes.

4.2 Heuristic Reasoning

Direct mapping to the output leverages the LLMs’ powerful representation power to “guess” the answers. Still, it does not fully utilize the LLMs’ impressive emergent reasoning ability, which is essential for solving complex reasoning problems. To this end, attempts have been made to let LLMs perform heuristic reasoning on graphs. This approach encourages LLMs to perform a series of intermediate reasoning steps that might heuristically lead to the correct answer.

Reasoning Step by Step. Encouraged by the success of chain-of-thought (CoT) reasoning [49], [111], researchers also attempt to let LLMs perform reasoning step by step on graphs. Chain-of-thought encourages LLMs to roll out a sequence of reasoning steps to solve a problem, similar to how humans solve problems. It usually incorporates a few demonstrative examples to guide the reasoning process. Zero-shot CoT is a similar approach that does not require any examples. These techniques are studied in [43], [124]–[126], [128], [131], [132]. Results indicate that CoT-style reasoning can improve the performance on simpler problems, such as cycle detection and shortest path. Still, the improvement is inconsistent or diminishes on more complex problems, such as Hamiltonian path finding and topological sorting.

Retrieving Subgraphs as Evidence. Many graph reasoning problems, such as node degree counting and neighborhood detection, only involve reasoning on a subgraph of the whole graph. Such properties allow researchers to let LLMs retrieve the subgraphs as evidence and perform reasoning on the subgraphs. Build-a-Graph prompting [124] encourages LLMs to reconstruct the graph structures relevant to the questions and then perform reasoning on them. This method demonstrates promising results on problems except for Hamiltonian pathfinding, a notoriously tricky problem requiring reasoning on the whole graph. Another approach, Context-Summarization [126], encourages LLMs to summarize the key nodes, edges, or sub-graphs and perform reasoning on them. They evaluate only on node classification, and results show improvement when combined with CoT-style reasoning, an intuitive outcome considering the local nature of the node classification problem.

Searching on Graphs. This kind of reasoning is related to search algorithms on graphs, such as breadth-first search (BFS) and depth-first search (DFS). Although not universally applicable, BFS and DFS are the most intuitive and effective ways to solve some graph reasoning problems. Numerous explorations have been made to simulate searching-based reasoning, especially on knowledge-graph question
answering. This approach enjoys the advantage of providing interpretable evidence besides the answer. Reasoning-on-Graphs (RoG) [129] is a representative approach that prompts LLMs to generate several relation paths as plans, which are then retrieved from the knowledge graph (KG) and used as evidence to answer the questions. Another approach is to iteratively retrieve and reason on the subgraphs from the KG [130], [132], simulating a dynamic searching process. At each step, the LLMs retrieve neighbors of the current nodes and then decide whether to answer the question or continue to the next search step.

4.3 Algorithmic Reasoning

The previous two approaches are heuristic, which means that the reasoning process accords with human intuition but is not guaranteed to lead to the correct answer. In contrast, these problems are usually solved by algorithms in computer science. Therefore, researchers also attempt to let LLMs perform algorithmic reasoning on graphs. [124] proposed “Algorithmic Prompting”, which prompts the LLMs to recall the algorithms that are relevant to the questions and then perform reasoning step by step according to the algorithms. Their results, however, do not show consistent improvement over the heuristic reasoning approach, such
as the BaG prompting proposed in the same paper. This might be because the algorithmic reasoning approach is still complex to simulate without more careful techniques. A more direct approach, Graph-ToolFormer [127], lets LLMs generate API calls as explicit reasoning steps. These API calls are then executed externally to acquire answers on an external graph. This approach is suitable for converting tasks grounded in real tasks into pure graph reasoning problems, demonstrating efficacy on various applications such as knowledge graphs, social networks, and recommender systems.

4.4 Discussion

The above approaches are not mutually exclusive, and they can be combined to achieve better performance. Moreover, strictly speaking, heuristic reasoning can also conduct direct answering, while algorithmic reasoning contains the capacity of heuristic reasoning as a special case. Researchers are advised to select the most suitable approach for a specific problem. For example, direct answering is suitable for problems that are easy to solve and where the pre-training dataset provides sufficient bias for a good guess, such as common entity classification and relationship detection. Heuristic reasoning is suitable for problems that are hard to deal with explicitly, but where intuition can provide some guidance, such as graph-based question answering and knowledge graph reasoning. Algorithmic reasoning is suitable for problems that are hard to solve but where the algorithmic solution is well-defined, such as route planning and pattern matching.

5 TEXT-RICH GRAPHS

Graphs with node/edge-level textual information (text-rich graphs) exist ubiquitously in the real world, e.g., academic networks, social networks, and legal case networks. Learning on such networks requires the model to encode both the textual information associated with the nodes/edges and the structure information lying inside the input graph. Depending on the role of the LLM, existing works can be categorized into three types: LLM as Predictor, LLM as Encoder, and LLM as Aligner. We summarize all surveyed methods in Table 5.

5.1.1 Graph as Sequence

In these methods, the graph information is mainly encoded into the LLM from the “input” side. The ego-graphs associated with nodes/edges are serialized into a sequence H_{G_v} which can be fed into the LLM together with the text d_v:

H_{G_v} = Graph2Seq(G_v),   (7)
h_v = LLM([H_{G_v}, d_v]).   (8)

Depending on the choice of the Graph2Seq(·) function, the methods can be further categorized into rule-based methods and GNN-based methods. An illustration of the categories can be found in Fig. 3.

Rule-based: Linearizing Graphs into Text Sequences with Rules. These methods design rules to describe the structure with natural language and adopt a text prompt template as Graph2Seq(·). For example, given an ego-graph G_{v_i} of the paper node v_i connecting to author nodes v_j and v_k and venue nodes v_t and v_s, H_{G_{v_i}} = Graph2Seq(G_{v_i}) = “The center paper node is v_i. Its author neighbor nodes are v_j and v_k and its venue neighbor nodes are v_t and v_s”. This is the most straightforward and easiest way (without introducing extra model parameters) to encode graph structures into language models. Along this line, InstructGLM [47] designs templates to describe the local ego-graph structure (maximum 3-hop connection) for each node and conducts instruction tuning for node classification and link prediction. GraphText [66] further proposes a syntax-tree-based method to convert graph structure into a text sequence. Researchers [84] also study when and why the linearized structure information on graphs can improve the performance of LLMs on node classification and find that the structure information is beneficial when the textual information associated with the node is scarce.

GNN-based: Encoding Graphs into Special Tokens with GNNs. Different from rule-based methods which use natural language prompts to linearize graphs into sequences, GNN-based methods adopt graph encoder models (i.e., GNNs) to encode the ego-graph associated with nodes into special token representations which are concatenated with the pure text information before being fed into the language model:

H_{G_v} = Graph2Seq(G_v) = GraphEnc(G_v).   (9)
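To make the rule-based Graph2Seq(·) template described above concrete, here is a minimal sketch (our own illustration; the template string and the paper/author/venue node names follow the worked example in the text and are not code from InstructGLM or GraphText):

```python
def graph2seq_rule_based(center: str, neighbors: dict) -> str:
    """Linearize a 1-hop ego-graph with a fixed natural-language template."""
    parts = [f"The center paper node is {center}."]
    for ntype, nodes in neighbors.items():
        parts.append(f"Its {ntype} neighbor nodes are {', '.join(nodes)}.")
    return " ".join(parts)

# Ego-graph of paper v_i with author and venue neighbors (cf. the example above).
ego = {"author": ["v_j", "v_k"], "venue": ["v_t", "v_s"]}
graph_tokens = graph2seq_rule_based("v_i", ego)
# graph_tokens is concatenated with the node's own text d_v and given to the LLM,
# i.e., h_v = LLM([graph_tokens, d_v]) as in Eqs. (7)-(8).
```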
Fig. 3. The illustration of various LLM as Predictor methods, including (a) Rule-based Graph As Sequence, (b) GNN-based Graph As Sequence, and (c) Graph-Empowered LLMs.
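Complementing the rule-based sketch, the following minimal illustration corresponds to the GNN-based variant of Eq. (9) and Fig. 3(b); the toy mean-aggregation encoder, the dimensions, and the projection layer are our own stand-ins for a real GNN encoder and an LLM embedding space.

```python
import torch
import torch.nn as nn

class ToyGraphEncoder(nn.Module):
    """One round of mean-aggregation message passing; a stand-in for a real GNN."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: [num_nodes, in_dim]; adj: [num_nodes, num_nodes] binary adjacency
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        neigh = adj @ x / deg                       # mean of neighbor features
        return torch.relu(self.lin(torch.cat([x, neigh], dim=-1)))

d_gnn, d_llm = 64, 768
encoder = ToyGraphEncoder(d_gnn, d_gnn)
project = nn.Linear(d_gnn, d_llm)                   # map graph tokens into LLM space

x = torch.randn(5, d_gnn)                           # ego-graph of v with 4 neighbors
adj = torch.tensor([[0, 1, 1, 1, 1],
                    [1, 0, 0, 0, 0],
                    [1, 0, 0, 0, 0],
                    [1, 0, 0, 0, 0],
                    [1, 0, 0, 0, 0]], dtype=torch.float)
graph_tokens = project(encoder(x, adj))             # H_Gv of Eq. (9): [5, d_llm]

text_token_emb = torch.randn(1, 32, d_llm)          # embeddings of the node text d_v
llm_input = torch.cat([graph_tokens.unsqueeze(0), text_token_emb], dim=1)
# llm_input ([1, 37, d_llm]) is then passed through the LLM, as in Eq. (8).
```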
5.1.2 Graph-Empowered LLMs

In these methods, researchers design advanced LLM architectures (i.e., Graph-Empowered LLMs) which can conduct joint text and graph encoding inside their model architecture. Transformers [94] serve as the base model for today's pretrained LMs [22] and LLMs [37]. However, they are designed for natural language (sequence) encoding and do not take non-sequential structure information into consideration. To this end, Graph-Empowered LLMs are proposed. They share the philosophy of introducing virtual structure tokens H_{G_v}^{(l)} inside each Transformer layer:

\widetilde{H}_{d_v}^{(l)} = [H_{G_v}^{(l)}, H_{d_v}^{(l)}],   (10)

where H_{G_v}^{(l)} can be learnable embeddings or output from graph encoders. Then the original multi-head attention (MHA) in Transformers is modified into an asymmetric MHA to take the structure tokens into consideration:

MHA_{asy}(H_{d_v}^{(l)}, \widetilde{H}_{d_v}^{(l)}) = \Vert_{u=1}^{U} head_u(H_{d_v}^{(l)}, \widetilde{H}_{d_v}^{(l)}),
where head_u(H_{d_v}^{(l)}, \widetilde{H}_{d_v}^{(l)}) = softmax\Big( \frac{Q_u^{(l)} \widetilde{K}_u^{(l)\top}}{\sqrt{d/U}} \Big) \cdot \widetilde{V}_u^{(l)},
Q_u^{(l)} = H_{d_v}^{(l)} W_{Q,u}^{(l)}, \quad \widetilde{K}_u^{(l)} = \widetilde{H}_{d_v}^{(l)} W_{K,u}^{(l)}, \quad \widetilde{V}_u^{(l)} = \widetilde{H}_{d_v}^{(l)} W_{V,u}^{(l)}.   (11)

With the asymmetric MHA mechanism, the node encoding process of the (l+1)-th layer is:

\widetilde{H}_{d_v}^{(l)\prime} = Normalize(H_{d_v}^{(l)} + MHA_{asy}(H_{d_v}^{(l)}, \widetilde{H}_{d_v}^{(l)})),
H_{d_v}^{(l+1)} = Normalize(\widetilde{H}_{d_v}^{(l)\prime} + MLP(\widetilde{H}_{d_v}^{(l)\prime})).   (12)

Along this line of work, GreaseLM [68] proposes to have a language encoding component and a graph encoding component in each layer. The two components interact through an MInt layer, where a special structure token is added to the text Transformer input, and a special node is added to the graph encoding layer. DRAGON [83] further proposes strategies to pretrain GreaseLM with unsupervised signals. GraphFormers [73] are designed for node representation learning on homogeneous text-attributed networks where the current-layer [CLS] token hidden states of neighboring documents are aggregated and added as a new token to the current-layer center node text encoding. Patton [32] further proposes to pretrain GraphFormers with two novel strategies: network-contextualized masked language modeling and masked node prediction. Heterformer [74] is introduced for learning representations on heterogeneous text-attributed networks where some nodes are associated with text (text-rich) and others are not (textless). Virtual neighbor tokens for text-rich neighbors and textless neighbors are concatenated with the original text tokens and input into each Transformer layer. Edgeformers [75] are proposed for representation learning on textual-edge networks where edges are associated with rich textual information. When conducting edge encoding, virtual node tokens are concatenated onto the original edge text tokens for joint encoding.

5.1.3 Graph-Aware LLM Finetuning

In these methods, the graph information is mainly injected into the LLM by “fine-tuning on graphs”. Researchers assume that the structure of graphs can provide hints on which documents are “semantically similar” to which other documents. For example, papers citing each other in an academic graph can be of similar topics; items co-purchased by many users in an e-commerce graph can have related functions. These methods adopt vanilla language models that take text as input (e.g., BERT [22] and SciBERT [24]) as the base model and fine-tune them with structure signals on the graph. After that, the LLMs will learn node/edge representations that capture the graph homophily from the text perspective.

Most methods adopt the two-tower encoding and training pipeline, where the representation of each node is obtained separately:

h_{v_i} = LLM_{\theta}(d_{v_i}),   (13)

and the model is optimized by

\min_{\theta} f(h_{v_i}, \{h_{v_i^+}\}, \{h_{v_i^-}\}).   (14)

Here v_i^+ represents the positive nodes for v_i, v_i^- represents the negative nodes for v_i, and f(·) denotes the pairwise training objective. Different methods have different strategies for v_i^+ and v_i^- with different training objectives f(·). SPECTER [52] constructs the positive text/node pairs with the citation relation, explores random negatives and structure hard negatives, and fine-tunes SciBERT [24] with the triplet loss. SciNCL [53] extends SPECTER by introducing more advanced positive and negative sampling methods based on embeddings trained on graphs. Touchup-G [55] proposes a measurement of feature homophily on graphs and brings up a binary cross-entropy fine-tuning objective. TwHIN-BERT [57] mines positive node pairs with off-the-shelf heterogeneous information network embeddings and trains the model with a contrastive social loss. MICoL [60] discovers
semantically positive node pairs with meta-paths [91] and adopts the InfoNCE objective. E2EG [61] utilizes a similar philosophy to GIANT [59] and adds a neighbor prediction objective apart from the downstream task objective. A summarization of the two-tower graph-centric LLM fine-tuning objectives can be found in Table 4.

TABLE 4
A summarization of Graph-Aware LLM finetuning objectives on text-rich graphs. v_i^+ and v_i^- denote a positive training node and a negative training node for v_i, respectively.
SciNCL [53]: positives satisfy \|h_{v_i} - h_{v_i^+}\|_2 \in (k^+ - c^+, k^+]; hard negatives satisfy \|h_{v_i} - h_{v_i^-}\|_2 \in (k^-_{hard} - c^-_{hard}, k^-_{hard}]; objective \max\{\|h_{v_i} - h_{v_i^+}\|_2 - \|h_{v_i} - h_{v_i^-}\|_2 + m, 0\}.

There are other methods using the one-tower pipeline, where node pairs are concatenated and encoded together:

h_{v_i, v_j} = LLM_{\theta}(d_{v_i}, d_{v_j}),   (15)
\min_{\theta} f(h_{v_i, v_j}).   (16)

LLMs) which are hard to adopt for generation tasks. A potential solution is to design Graph-Empowered LLMs with decoder-only or encoder-decoder LLMs as the base architecture. 2) Pretraining. Pretraining is important to endow LLMs with contextualized data understanding capability, which can be generalized to other tasks. However, existing works mainly focus on pretraining LLMs on homogeneous text-rich networks. Future studies are needed to explore LLM pretraining in more diverse real-world scenarios, including heterogeneous text-rich networks [74], dynamic text-rich networks [128], and textual-edge networks [75].
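To make the two-tower objective of Eqs. (13)-(14) and the triplet loss in Table 4 concrete, here is a minimal sketch (our own illustration; random vectors stand in for LLM-encoded documents, and the margin value is illustrative):

```python
import torch

def triplet_objective(h_anchor, h_pos, h_neg, margin: float = 1.0):
    """f(.) from Table 4 (SPECTER-style): max{||h_v - h_v+|| - ||h_v - h_v-|| + m, 0}."""
    d_pos = torch.norm(h_anchor - h_pos, dim=-1)
    d_neg = torch.norm(h_anchor - h_neg, dim=-1)
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()

# Eq. (13): each document is encoded separately by the same LLM (the "two towers"
# share weights). Random vectors stand in for h_v = LLM_theta(d_v); in practice these
# would be the pooled outputs of a model such as SciBERT.
h_anchor = torch.randn(8, 768)   # anchor papers v_i
h_pos    = torch.randn(8, 768)   # positives v_i+ (e.g., cited papers)
h_neg    = torch.randn(8, 768)   # negatives v_i- (random or structure-hard)
loss = triplet_objective(h_anchor, h_pos, h_neg)   # Eq. (14), minimized w.r.t. theta
```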
Fig. 4. The illustration of various techniques related to LLM as Encoder, including (a) One-step Training, (b) Two-step Training, (c) Data Augmentation, and (d) Knowledge Distillation.
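As a reference point for the training strategies in Fig. 4, the following sketch shows one plausible forward pass of the LLM-GNN cascade they operate on; the specific checkpoint (bert-base-uncased), the mean pooling, and the two-layer GCN are our own choices for illustration, not the exact setup of any surveyed method.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from torch_geometric.nn import GCNConv

# Step 1: the LLM encodes the text attached to each node (mean-pooled last layer).
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
lm = AutoModel.from_pretrained("bert-base-uncased")
texts = ["paper on graph neural networks",
         "survey of language models",
         "molecule property prediction"]
batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():                   # "two-step" style: LLM features precomputed/frozen
    x = lm(**batch).last_hidden_state.mean(dim=1)   # [num_nodes, hidden]

# Step 2: a GNN propagates these text features over the graph structure.
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])   # small undirected toy graph
gnn1 = GCNConv(x.size(-1), 128)
gnn2 = GCNConv(128, 64)
h = gnn2(torch.relu(gnn1(x, edge_index)), edge_index)     # final node representations
```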
few one-hop neighbors regarding memory complexity) and local minimum [36] (the LLM underfits the data) issues.

Two-step training means first adapting LLMs to the graph, and then fine-tuning the whole LLM-GNN cascaded pipeline. This strategy can effectively alleviate the insufficient training of the LLM, which contributes to higher text representation quality. GIANT [59] proposes to conduct neighborhood prediction with the use of XR-Transformers [81], resulting in an LLM that can output better feature vectors than bag-of-words and vanilla BERT [22] embeddings for node classification. LM-GNN [69] introduces graph-aware pre-fine-tuning to warm up the LLM on the given graph before fine-tuning the whole LLM-GNN pipeline, demonstrating significant performance gains. SimTeG [36] finds that the simple framework of first training the LLMs on the downstream task and then fixing the LLMs and training the GNNs can result in outstanding performance. They further find that using an efficient fine-tuning method, e.g., LoRA [41], to tune the LLM can alleviate overfitting issues. GaLM [82] explores ways to pretrain the LLM-GNN cascaded architecture.

5.2.2 Data Augmentation

With their demonstrated zero-shot capability [44], LLMs can be used for data augmentation to generate additional text data for the LLM-GNN cascaded architecture. The philosophy of using LLMs to generate pseudo data is widely explored in NLP [90]. LLM-GNN [64] proposes to conduct zero-shot node classification on text-attributed networks by labeling a few nodes and then using the pseudo labels to fine-tune GNNs. TAPE [71] presents a method that uses an LLM to generate prediction text and explanation text, which serve as augmented text data compared with the original text data. A subsequent medium-scale language model is adopted to encode the augmented texts and the original texts respectively before feeding them into GNNs. ENG [72] brings forward the idea of generating labeled nodes for each category, adding edges between labeled nodes and other nodes, and conducting semi-supervised GNN learning for node classification.

5.2.3 Knowledge Distillation

The LLM-GNN cascaded pipeline is capable of capturing both text information and structure information. However, the pipeline suffers from time complexity issues during inference, since GNNs need to conduct neighbor sampling and LLMs need to encode the text associated with both the center node and its neighbors. A straightforward solution is to serve the LLM-GNN cascade pipeline as the teacher model and distill it into an LLM as the student model. In this case, during inference, the model (which is a pure LLM) only needs to encode the text on the center node and avoids time-consuming neighbor sampling. AdsGNN [80] proposes an L2 loss to force the outputs of the student model to preserve topology after the teacher model is trained. GraD [70] introduces three strategies, including a distillation objective and a task objective, to optimize the teacher model and distill its capability to the student model.

5.2.4 Discussion

Given that GNNs have been demonstrated to be powerful models for encoding graphs, “LLMs as encoders” seems to be the most straightforward way to utilize LLMs on graphs. Although we have discussed much research on “LLMs as encoders”, there are still open questions to be solved.

Limited Task: Go Beyond Representation Learning. Current “LLMs as encoders” methods or LLM-GNN cascaded architectures mainly focus on representation learning, given the single embedding propagation-aggregation mechanism of GNNs, which prevents them from being adopted for generation tasks (e.g., node/text generation). A potential solution to this challenge is to conduct GNN encoding on the LLM-output token-level representations and to design proper decoders that can perform generation based on the LLM-GNN cascaded model outputs.

Low Efficiency: Advanced Knowledge Distillation. The LLM-GNN cascaded pipeline suffers from time complexity issues since the model needs to conduct neighbor sampling and then embedding encoding for each neighboring node. Although there are methods that explore distilling the learned LLM-GNN model into an LLM for fast inference, they are far from enough given that the inference of the LLM itself is time-consuming. A potential solution is to distill the model into a much smaller LM or even an MLP. Similar methods [88] have been proven effective in GNN-to-MLP distillation and are worth exploring for the LLM-GNN cascaded pipeline as well.

5.3 LLM as Aligner

These methods contain an LLM component for text encoding and a GNN component for structure encoding. The two components are served equally and trained iteratively or
in parallel. LLMs and GNNs can mutually enhance each other since the LLMs can provide textual signals to GNNs, while the GNNs can deliver structure information to LLMs. According to how the LLM and the GNN interact, these methods can be further categorized into: LLM-GNN Prediction Alignment and LLM-GNN Latent Space Alignment. The illustration of the two categories of methods can be found in Fig. 5.

Fig. 5. The illustration of LLM as Aligner methods, including (a) LLM-GNN Prediction Alignment and (b) LLM-GNN Latent Space Alignment.

5.3.1 LLM-GNN Prediction Alignment

This refers to training the LLM with the text data on a graph and training the GNN with the structure data on the graph iteratively. The LLM will generate labels for nodes from the text perspective and serve them as pseudo-labels for GNN training, while the GNN will generate labels for nodes from the structure perspective and serve them as pseudo-labels for LLM training.

By this design, the two modality encoders can learn from each other and contribute to a final joint text and graph encoding. In this direction, LTRN [58] proposes a novel GNN architecture with personalized PageRank [95] and an attention mechanism for structure encoding while adopting BERT [22] as the language model. The pseudo labels generated by the LLM and the GNN are merged for the next iteration of training. GLEM [62] formulates the iterative training process into a pseudo-likelihood variational framework, where the E-step is to optimize the LLM and the M-step is to train the GNN.

5.3.3 Discussion

In “LLMs as Aligners” methods, most research adopts shallow GNNs (e.g., GCN, GAT, with thousands of parameters) as the graph encoders that are aligned with LLMs through iterative training (i.e., prediction alignment) or contrastive training (i.e., latent space alignment). Although LLMs (with millions or billions of parameters) have strong expressive capability, the shallow GNNs (with limited representative capability) can constrain the mutual learning effectiveness between LLMs and GNNs. A potential solution is to adopt GNNs which can be scaled up [89]. Furthermore, deeper research to explore the best model size combination for LLMs and GNNs in such an “LLMs as Aligners” LLM-GNN mutual enhancement framework is very important.
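As a concrete reference for the latent space alignment discussed above (Fig. 5(b)), here is a minimal sketch of a symmetric InfoNCE objective between LLM text embeddings and GNN graph embeddings; the fixed temperature and in-batch negative construction are illustrative assumptions rather than the exact loss of any single surveyed method.

```python
import torch
import torch.nn.functional as F

def latent_space_alignment_loss(text_emb: torch.Tensor, graph_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: the i-th text (from the LLM) and the i-th (sub)graph
    (from the GNN) form a positive pair; all other in-batch pairs are negatives."""
    t = F.normalize(text_emb, dim=-1)
    g = F.normalize(graph_emb, dim=-1)
    logits = t @ g.t() / temperature          # [batch, batch] similarity matrix
    labels = torch.arange(t.size(0))
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

# Random vectors stand in for LLM text embeddings and GNN embeddings of the same nodes.
loss = latent_space_alignment_loss(torch.randn(16, 256), torch.randn(16, 256))
```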
TABLE 5
Summary of large language models on text-rich graphs. Role of LM: “TE”, “SE”, “ANN” and “AUG” denote text encoder, structure encoder, annotator (labeling the nodes/edges), and augmentator (conducting data augmentation). Task: “NC”, “UAP”, “LP”, “Rec”, “QA”, “NLU”, “EC”, “LM”, “RG” denote node classification, user activity prediction, link prediction, recommendation, question answering, natural language understanding, edge classification, language modeling, and regression task.
Step 1: Rule-based Graph Linearization. Rule-based linearization converts molecular graphs into text sequences that can be processed by LLMs. To achieve this, researchers develop specifications based on human expertise in the form of line notations [152]. For example, the Simplified Molecular-Input Line-Entry System (SMILES) [152] records the symbols of nodes encountered during a depth-first traversal of a molecular graph. The International Chemical Identifier (InChI) [153], created by the International Union of Pure and Applied Chemistry (IUPAC), encodes molecular structures into unique string texts with more hierarchical information. Canonicalization algorithms produce unique SMILES for each molecule, often referred to as canonical SMILES. However, there is more than one SMILES string corresponding to a single molecule, and SMILES sometimes represents invalid molecules; LLMs learned from these linearized sequences can easily generate invalid molecules (e.g., incorrect ring closure symbols and unmatched parentheses) due to syntactical errors. To this end, DeepSMILES [154] is proposed. It can alleviate this issue in most cases but does not guarantee 100% robustness: the linearized string could still violate basic physical constraints. To fully address this problem, SELFIES [155] is introduced, which consistently yields valid molecular graphs.

TABLE 6
Model collection in Section 6 for text-captioned graphs. “Lin.” and “Vec.” represent Linearized Graph Encoding and Vectorized Graph Encoding. “Classif.”, “Regr.”, “NER”, “RE”, “Retr.”, “Gen.”, “Cap.” represent classification, regression, named entity recognition, relation extraction, (molecule) graph retrieval, (molecule) graph generation, and (molecule) graph captioning.

Step 2: Tokenization. The tokenization approaches for these linearized sequences are typically language-independent. They operate at both the character level [176], [187] and the substring level [171], [178], [182]–[185], based on Sentence-
Piece [163] or BPE [162]. Additionally, RT [173] proposes a tokenization approach that facilitates handling regression tasks within LM Transformers.

Step 3: Encoding the Linearized Graph with LLMs.

Encoder-only LLMs. Earlier LLMs like SciBERT [24] and BioBERT [189] are trained on scientific literature to understand natural language descriptions related to molecules but are not capable of comprehending molecular graph structures. To this end, SMILES-BERT [188] and MFBERT [185] are proposed for molecular graph classification with linearized SMILES strings. Since scientific natural language descriptions contain human expertise which can serve as a supplement for molecular graph structures, recent advances emphasize the joint understanding of them [168], [184]: the linearized graph sequence is concatenated with the raw natural language data and then input into the LLMs. Specifically, KV-PLM [184] is built on BERT [22] to understand molecular structure in a biomedical context. CatBERTa [168], developed from RoBERTa [23], specializes in the prediction of catalyst properties for molecular graphs.

Encoder-Decoder LLMs. Encoder-only LLMs may lack the capability for generation tasks. In this section, we discuss LLMs with encoder-decoder architectures. For example, Chemformer [164] uses a similar architecture to BART [29]. The representation from the encoder can be used for property prediction tasks, and the whole encoder-decoder architecture
can be optimized for molecule generation. Other works focus on molecule captioning (which involves generating textual descriptions from a molecule) and text-based molecular generation (where a molecular graph structure is generated from a natural description). Specifically, MolT5 [123] is developed based on T5 [30] and is suitable for these two tasks. It formulates molecule-text translation as a multilingual problem and initializes the model using the T5 checkpoint. The model was pre-trained on two monolingual corpora: the Colossal Clean Crawled Corpus (C4) [30] for the natural language modality and one million SMILES [164] for the molecule modality. Text+Chem T5 [180] extends the input and output domains to include both SMILES and texts, unlocking LLMs for more generation functions such as text or reaction generation. ChatMol [175] exploits the interactive capabilities of LLMs and proposes designing molecule structures through multi-turn dialogues with T5.

Decoder-only LLMs. Decoder-only architectures have been adopted for recent LLMs due to their advanced generation ability. MolGPT [186] and MolXPT [178] are GPT-style models used for molecule classification and generation. Specifically, MolGPT [186] focuses on conditional molecule generation tasks using scaffolds, while MolXPT [178] formulates the classification task as a question-answering problem with yes or no responses. RT [173] adopts XLNet [26] and focuses on molecular regression tasks. It frames regression as a conditional sequence modeling problem. Galactica [187] is a set of LLMs with a maximum of 120 billion parameters, which is pretrained on two million compounds from PubChem [194]. Therefore, Galactica can understand molecular graph structures through SMILES. With instruction tuning data and domain knowledge, researchers also adapt general-domain LLMs such as LLaMA to recognize molecular graph structures and solve molecule tasks [169]. Recent studies also explore the in-context learning capabilities of LLMs on graphs. LLM-ICL [177] assesses the performance of LLMs across eight tasks in the molecular domain, ranging from property classification to molecule-text translation. MolReGPT [174] proposes a method to retrieve molecules with similar structures and descriptions to improve in-context learning. LLM4Mol [172] utilizes the summarization capability of LLMs as a feature extractor and combines it with a smaller, tunable LLM for specific prediction tasks.

6.1.2 Graph-Empowered LLMs

Different from the methods that adopt the original LLM architecture (i.e., Transformers) and input the graphs as sequences to LLMs, graph-empowered LLMs attempt to design LLM architectures that can conduct joint encoding of text and graph structures. Some works modify the positional encoding of Transformers. For instance, GIMLET [48] treats nodes in a graph as tokens. It uses a single Transformer to manage both the graph structure and the text sequence [v_1, v_2, \dots, v_{|V|}, s_{|V|+1}, \dots, s_{|V|+|d_G|}], where v ∈ V is a node and s ∈ d_G is a token in the text associated with G. It has three sub-encoding approaches for positional encodings to cater to different data modalities and their interactions. Specifically, it adopts the structural position encoding (PE) from the Graph Transformer and defines the relative distance between tokens i and j as follows:

PE(i, j) = \begin{cases} i - j & \text{if } i, j \in d_G, \\ GSD(i, j) + \mathrm{Mean}_{e_k \in SP(i,j)}\, x_{e_k} & \text{if } i, j \in V, \\ -\infty & \text{if } i \in V, j \in d_G, \\ 0 & \text{if } i \in d_G, j \in V. \end{cases}   (22)

Here, GSD denotes the graph shortest distance between two nodes, and \mathrm{Mean}_{e_k \in SP(i,j)} represents the mean pooling of the edge features x_{e_k} along the shortest path SP(i, j) between nodes i and j. GIMLET [48] adapts bi-directional attention for node tokens and enables texts to selectively attend to nodes. These designs render the Transformer's submodule, which handles the graph part, equivalent to a Graph Transformer [141].

There are other works that modify cross-attention modules to facilitate interaction between graph and text representations. Given the graph hidden state h_G, its node-level hidden state H_v and text hidden state H_{d_G}, Text2Mol [122] implements this interaction between representations in the hidden layers of encoders, while Prot2Text [170] implements it within the layers between the encoder and decoder:

H_{d_G} = softmax\Big( \frac{W_Q H_{d_G} \cdot (W_K H_v)^{T}}{\sqrt{d_k}} \Big) \cdot W_V H_v,   (23)

where W_Q, W_K, W_V are trainable parameters that transform the query modality (e.g., sequences) and the key/value modality (e.g., graphs) into the attention space. Furthermore, Prot2Text [170] utilizes two trainable parameter matrices W_1 and W_2 to integrate the graph representation into the sequence representation:

H_{d_G} = H_{d_G} + \mathbf{1}_{|d_G|}\, h_G W_1 W_2.   (24)

6.1.3 Discussion

LLM Inputs with Sequence Prior. The first challenge is that advanced linearization methods have not progressed in tandem with the development of LLMs. Emerging around 2020, linearization methods for molecular graphs like SELFIES offer significant grammatical advantages, yet advanced LMs and LLMs from the graph machine learning and language model communities might not fully utilize them, as these encoded results were not part of pretraining corpora prior to their proposal. Consequently, recent studies [177] indicate that LLMs, such as GPT-3.5/4, may be less adept at using SELFIES compared to SMILES. Therefore, the performance of LM-only and LLM-only methods may be limited by the expressiveness of older linearization methods, as there is no way to optimize these hard-coded rules during the learning pipeline of LLMs. However, the second challenge remains: the inductive bias of graphs may be broken by linearization. Rule-based linearization methods introduce inductive biases for sequence modeling, thereby breaking the permutation invariance assumption inherent in molecular graphs. This may reduce task difficulty by introducing sequence order to reduce the search space. However, it does not imply better model generalization. Specifically, there could be multiple string-based representations for a single graph from single or different approaches. Numerous studies [156]–[159] have shown that training on different string-based views of the same molecule can improve the
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 17
TABLE 7
Data collection in Section 5 for text-rich graphs. Task: "NC", "UAP", "LP", "Rec", "EC", and "RG" denote node classification, user activity prediction, link prediction, recommendation, edge classification, and regression.

Text | Data | Year | Task | # Nodes | # Edges | Domain | Source & Notes
Node | ogb-arxiv | 2020.5 | NC | 169,343 | 1,166,243 | Academic | OGB [204]
Node | ogb-products | 2020.5 | NC | 2,449,029 | 61,859,140 | E-commerce | OGB [204]
Node | ogb-papers110M | 2020.5 | NC | 111,059,956 | 1,615,685,872 | Academic | OGB [204]
Node | ogb-citation2 | 2020.5 | LP | 2,927,963 | 30,561,187 | Academic | OGB [204]
Node | Cora | 2000 | NC | 2,708 | 5,429 | Academic | [9]
Node | Citeseer | 1998 | NC | 3,312 | 4,732 | Academic | [10]
Node | DBLP | 2023.1 | NC, LP | 5,259,858 | 36,630,661 | Academic | www.aminer.org/citation
Node | MAG | 2020 | NC, LP, Rec, RG | ~10M | ~50M | Academic | multiple domains [11], [12]
Node | Goodreads-books | 2018 | NC, LP | ~2M | ~20M | Books | multiple domains [13]
Node | Amazon-items | 2018 | NC, LP, Rec | ~15.5M | ~100M | E-commerce | multiple domains [14]
Node | SciDocs | 2020 | NC, UAP, LP, Rec | - | - | Academic | [52]
Node | PubMed | 2020 | NC | 19,717 | 44,338 | Academic | [15]
Node | Wikidata5M | 2021 | LP | ~4M | ~20M | Wikipedia | [16]
Node | Twitter | 2023 | NC, LP | 176,279 | 2,373,956 | Social | [54]
Edge | Goodreads-reviews | 2018 | EC, LP | ~3M | ~100M | Books | multiple domains [13]
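Several of the node-classification benchmarks in Table 7 (ogb-arxiv, ogb-products, ogb-papers110M, ogb-citation2) are distributed through the Open Graph Benchmark [204]. A hedged sketch of loading one of them with the ogb Python package is given below; the package and loader names reflect the OGB/PyG ecosystem and are assumptions about the installed environment (pip install ogb torch torch_geometric).

```python
from ogb.nodeproppred import PygNodePropPredDataset

# ogb-arxiv is registered as "ogbn-arxiv" in the OGB package.
dataset = PygNodePropPredDataset(name="ogbn-arxiv", root="data/ogb")
graph = dataset[0]                       # PyG Data object with x, edge_index, y
split_idx = dataset.get_idx_split()      # official train/valid/test node indices

print(graph.num_nodes, graph.num_edges)  # roughly the counts reported in Table 7
print(split_idx["train"].numel(), "training nodes")
```

Note that the packaged node features of ogbn-arxiv are precomputed averaged word embeddings of paper titles and abstracts; to pair the graph with raw text for LM/LLM-based methods, the separately distributed title/abstract mapping has to be loaded as well.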
PyTorch Geometric. PyG is an open-source Python library for graph machine learning. It packages more than 60 types of GNN layers, combined with various aggregation and pooling layers.
Deep Graph Library. DGL is another open-source Python library for graph machine learning.
RDKit. RDKit is one of the most popular open-source cheminformatics software packages. It facilitates various operations on and visualizations of molecular graphs, and it offers many useful APIs, such as the linearization implementation that converts molecular graphs into easily stored SMILES strings and converts these SMILES back into graphs.

7.3 Practical applications

7.3.1 Scientific Discovery
Virtual Screening. While we may have numerous unlabeled molecule candidates for drug and material design, chemists are often interested in only the small portion of them located in a specific area of chemical space [226]. Machine learning models can help researchers automatically screen out trivial candidates. However, training accurate models is not easy because labeled molecule datasets are often small and have imbalanced data distributions [143]. There are many efforts to improve GNNs against data sparsity [143], [147], [229]. However, it is difficult, if not impossible, for a model to generalize to and understand in-depth domain knowledge that it has never been trained on. Texts, therefore, can be a complementary source of knowledge. Discovering task-related content from massive scientific papers and using it as instructions has great potential to improve GNNs for accurate virtual screening [48].
Optimizing Scientific Hypotheses. Molecular generation and optimization is one of the fundamental goals of chemical science for drug and material discovery [227]. Scientific hypotheses, such as complex molecules [228], can be represented in the joint space of GNNs and LLMs. One may then search the latent space for a better hypothesis that aligns with a text description (human requirements) and adheres to structural constraints such as chemical validity. Chemical space has been found to contain more than 10^60 molecules [225], which is beyond the capacity of exploration in wet-lab experiments. One of the biggest challenges lies in generating high-quality candidates, rather than randomly producing candidates in irrelevant subspaces. Molecular generation with multiple conditions (textual, numerical, categorical) shows promise for solving this problem.
Synthesis Planning. Synthesis design starts from available molecules and involves planning a sequence of steps that can finally produce a desired chemical compound through a series of reactions [228]. This procedure includes a sequence of reactant molecules and reaction conditions. Both graphs and texts play important roles in this process. For example, graphs may represent the fundamental structure of molecules, while texts may describe the reaction conditions, additives, and solvents. LLMs can assist in the planning by suggesting possible synthesis paths directly or by serving as agents that operate existing planning tools [148].

7.3.2 Computational Social Science
In computational social science, researchers are interested in modeling the behavior of people/users and discovering new knowledge that can be utilized to forecast the future. The behaviors of users and the interactions between users can be modeled as graphs, where the nodes are associated with rich text information (e.g., user profiles, messages, emails). We describe two example scenarios below.
E-commerce. On e-commerce platforms, there are many interactions (e.g., purchase, view) between users and products. For example, users can view, cart, or purchase products. In addition, the users, products, and their interactions are associated with rich text information. For instance, products have titles/descriptions and users can leave reviews of products. In this case, we can construct a graph where nodes are users and products, while edges are their interactions; both nodes and edges are associated with text. It is important to utilize both the text information and the graph structure information (user behavior) to model users and items and solve complex downstream tasks (e.g., item recommendation [104], bundle recommendation [105], and product understanding [106]).
Social Media. On social media platforms, there are many users who interact with each other through messages, emails, and so on. In this case, we can build a graph where nodes are users and edges are the interactions between users. There will be text associated with nodes (e.g., user profiles) and edges (e.g., messages). Interesting research questions include how to jointly model text and graph structure to deeply understand the users for friend recommendation [107], user analysis [108], and community detection [109].

7.3.3 Specific Domains
In many specific domains, text data are interconnected and organized in the form of graphs. The structure information on the graphs can be utilized to better understand each text unit and contribute to advanced problem-solving.
Academic Domain. In the academic domain, networks [11] are constructed with papers as nodes and their relations (e.g., citation, authorship) as edges. The representations learned for papers on such networks can be utilized for paper recommendation [101], paper classification [102], and author identification [103].
Legal Domain. In the legal domain, opinions written by judges routinely cite opinions from previous cases. In such a scenario, one can construct an opinion network [98] based on the citation relations between opinions. The representations learned on such a network, with both text and structure information, can be utilized for clause classification [99] and opinion recommendation [100].

TABLE 8
Data collection in Section 6 for text-captioned graphs. The availability of text refers to the text descriptions of graphs, not the format of the linearized graphs such as SMILES-represented molecules. "PT", "FT", "Cap.", "GC", "Retr.", "Gen.", and "GR" refer to pretraining, finetuning, caption, graph classification, retrieval, graph generation, and graph regression, respectively. The superscripts on the sizes denote # graph-text pairs (1), # graphs (2), and # assays (3).
[9] McCallum, A.K., Nigam, K., Rennie, J. and Seymore, K., "Automating the construction of internet portals with machine learning," in Information Retrieval, 3, pp.127-163, 2000.
[10] Giles, C.L., Bollacker, K.D. and Lawrence, S., "CiteSeer: An automatic citation indexing system," in Proceedings of the third ACM conference on Digital libraries, pp.89-98, 1998.
[11] Wang, K., Shen, Z., Huang, C., Wu, C.H., Dong, Y. and Kanakia, A., "Microsoft academic graph: When experts are not enough," in Quantitative Science Studies, 1(1), pp.396-413, 2020.
[12] Zhang, Y., Jin, B., Zhu, Q., Meng, Y. and Han, J., "The Effect of Metadata on Scientific Literature Tagging: A Cross-Field Cross-Model Study," in WWW, 2023.
[13] Wan, M. and McAuley, J., "Item recommendation on monotonic behavior chains," in Proceedings of the 12th ACM conference on recommender systems, 2018.
[14] Ni, J., Li, J. and McAuley, J., "Justifying recommendations using distantly-labeled reviews and fine-grained aspects," in EMNLP-IJCNLP, 2019.
[15] Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B. and Eliassi-Rad, T., "Collective classification in network data," in AI Magazine, 29(3), pp.93-93, 2008.
[16] Wang, X., Gao, T., Zhu, Z., Zhang, Z., Liu, Z., Li, J. and Tang, J., "KEPLER: A unified model for knowledge embedding and pre-trained language representation," in TACL, 2021.
[17] Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C. and Philip, S.Y., "A comprehensive survey on graph neural networks," in IEEE Transactions on Neural Networks and Learning Systems, 32(1), pp.4-24, 2020.
[18] Liu, J., Yang, C., Lu, Z., Chen, J., Li, Y., Zhang, M., Bai, T., Fang, Y., Sun, L., Yu, P.S. and Shi, C., "Towards Graph Foundation Models: A Survey and Beyond," in arXiv preprint arXiv:2310.11829, 2023.
[19] Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J. and Wu, X., "Unifying Large Language Models and Knowledge Graphs: A Roadmap," in arXiv preprint arXiv:2306.08302, 2023.
[20] Sanh, V., Debut, L., Chaumond, J. and Wolf, T., "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter," in arXiv preprint arXiv:1910.01108, 2019.
[21] Wang, Y., Le, H., Gotmare, A.D., Bui, N.D., Li, J. and Hoi, S.C., "CodeT5+: Open code large language models for code understanding and generation," in arXiv preprint arXiv:2305.07922, 2023.
[22] Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., "BERT: Pre-training of deep bidirectional transformers for language understanding," in NAACL, 2019.
[23] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L. and Stoyanov, V., "RoBERTa: A robustly optimized BERT pretraining approach," in arXiv preprint arXiv:1907.11692, 2019.
[24] Beltagy, I., Lo, K. and Cohan, A., "SciBERT: A pretrained language model for scientific text," in arXiv preprint arXiv:1903.10676, 2019.
[25] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A. and Agarwal, "Language models are few-shot learners," in NeurIPS, 2020.
[26] Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.R. and Le, Q.V., "XLNet: Generalized autoregressive pretraining for language understanding," in NeurIPS, 2019.
[27] Clark, K., Luong, M.T., Le, Q.V. and Manning, C.D., "ELECTRA: Pre-training text encoders as discriminators rather than generators," in ICLR, 2020.
[28] Meng, Y., Xiong, C., Bajaj, P., Bennett, P., Han, J. and Song, X., "COCO-LM: Correcting and contrasting text sequences for language model pretraining," in NeurIPS, 2021.
[29] Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V. and Zettlemoyer, L., "BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension," in ACL, 2020.
[30] Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W. and Liu, P.J., "Exploring the limits of transfer learning with a unified text-to-text transformer," in JMLR, 2020.
[31] Yasunaga, M., Leskovec, J. and Liang, P., "LinkBERT: Pretraining Language Models with Document Links," in ACL, 2022.
[32] Jin, B., Zhang, W., Zhang, Y., Meng, Y., Zhang, X., Zhu, Q. and Han, J., "Patton: Language Model Pretraining on Text-Rich Networks," in ACL, 2023.
[33] Zhang, X., Malkov, Y., Florez, O., Park, S., McWilliams, B., Han, J. and El-Kishky, A., "TwHIN-BERT: A socially-enriched pre-trained language model for multilingual Tweet representations," in KDD, 2023.
[34] Zou, T., Yu, L., Huang, Y., Sun, L. and Du, B., "Pretraining Language Models with Text-Attributed Heterogeneous Graphs," in arXiv preprint arXiv:2310.12580, 2023.
[35] Song, K., Tan, X., Qin, T., Lu, J. and Liu, T.Y., "MPNet: Masked and permuted pre-training for language understanding," in NeurIPS, 2020.
[36] Duan, K., Liu, Q., Chua, T.S., Yan, S., Ooi, W.T., Xie, Q. and He, J., "SimTeG: A frustratingly simple approach improves textual graph learning," in arXiv preprint arXiv:2308.02565, 2023.
[37] Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G., Günnemann, S., Hüllermeier, E. and Krusche, S., "ChatGPT for good? On opportunities and challenges of large language models for education," in Learning and Individual Differences, 103, 2023.
[38] Lester, B., Al-Rfou, R. and Constant, N., "The power of scale for parameter-efficient prompt tuning," in EMNLP, 2021.
[39] Li, X.L. and Liang, P., "Prefix-tuning: Optimizing continuous prompts for generation," in ACL, 2021.
[40] Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M. and Gelly, S., "Parameter-efficient transfer learning for NLP," in ICML, 2019.
[41] Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L. and Chen, W., "LoRA: Low-rank adaptation of large language models," in ICLR, 2022.
[42] Tian, Y., Song, H., Wang, Z., Wang, H., Hu, Z., Wang, F., Chawla, N.V. and Xu, P., "Graph Neural Prompting with Large Language Models," in arXiv preprint arXiv:2309.15427, 2023.
[43] Chai, Z., Zhang, T., Wu, L., Han, K., Hu, X., Huang, X. and Yang, Y., "GraphLLM: Boosting Graph Reasoning Ability of Large Language Model," in arXiv preprint arXiv:2310.05845, 2023.
[44] Wei, J., Bosma, M., Zhao, V.Y., Guu, K., Yu, A.W., Lester, B., Du, N., Dai, A.M. and Le, Q.V., "Finetuned language models are zero-shot learners," in ICLR, 2022.
[45] Sanh, V., Webson, A., Raffel, C., Bach, S.H., Sutawika, L., Alyafeai, Z., Chaffin, A., Stiegler, A., Scao, T.L., Raja, A. and Dey, M., "Multitask prompted training enables zero-shot task generalization," in ICLR, 2022.
[46] Tang, J., Yang, Y., Wei, W., Shi, L., Su, L., Cheng, S., Yin, D. and Huang, C., "GraphGPT: Graph Instruction Tuning for Large Language Models," in arXiv preprint arXiv:2310.13023, 2023.
[47] Ye, R., Zhang, C., Wang, R., Xu, S. and Zhang, Y., "Natural language is all a graph needs," in arXiv preprint arXiv:2308.07134, 2023.
[48] Zhao, H., Liu, S., Ma, C., Xu, H., Fu, J., Deng, Z.H., Kong, L. and Liu, Q., "GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning," in bioRxiv, 2023.
[49] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q.V. and Zhou, D., "Chain-of-thought prompting elicits reasoning in large language models," in NeurIPS, 2022.
[50] Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T.L., Cao, Y. and Narasimhan, K., "Tree of thoughts: Deliberate problem solving with large language models," in arXiv preprint arXiv:2305.10601, 2023.
[51] Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Gianinazzi, L., Gajda, J., Lehmann, T., Podstawski, M., Niewiadomski, H., Nyczyk, P. and Hoefler, T., "Graph of thoughts: Solving elaborate problems with large language models," in arXiv preprint arXiv:2308.09687, 2023.
[52] Cohan, A., Feldman, S., Beltagy, I., Downey, D. and Weld, D.S., "SPECTER: Document-level representation learning using citation-informed transformers," in ACL, 2020.
[53] Ostendorff, M., Rethmeier, N., Augenstein, I., Gipp, B. and Rehm, G., "Neighborhood contrastive learning for scientific document representations with citation embeddings," in EMNLP, 2022.
[54] Brannon, W., Fulay, S., Jiang, H., Kang, W., Roy, B., Kabbara, J. and Roy, D., "ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings," in arXiv preprint arXiv:2305.14321, 2023.
[55] Zhu, J., Song, X., Ioannidis, V.N., Koutra, D. and Faloutsos, C., "TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning," in arXiv preprint arXiv:2309.13885, 2023.
[56] Li, Y., Ding, K. and Lee, K., "GRENADE: Graph-Centric Language Model for Self-Supervised Representation Learning on Text-Attributed Graphs," in EMNLP, 2023.
[57] Zhang, X., Malkov, Y., Florez, O., Park, S., McWilliams, B., Han, J. and El-Kishky, A., "TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations at Twitter," in KDD, 2023.
[58] Zhang, X., Zhang, C., Dong, X.L., Shang, J. and Han, J., "Minimally-supervised structure-rich text categorization via learning on text-rich networks," in WWW, 2021.
[59] Chien, E., Chang, W.C., Hsieh, C.J., Yu, H.F., Zhang, J., Milenkovic, O. and Dhillon, I.S., "Node feature extraction by self-supervised multi-scale neighborhood prediction," in ICLR, 2022.
[60] Zhang, Y., Shen, Z., Wu, C.H., Xie, B., Hao, J., Wang, Y.Y., Wang, K. and Han, J., "Metadata-induced contrastive learning for zero-shot multi-label text classification," in WWW, 2022.
[61] Dinh, T.A., Boef, J.D., Cornelisse, J. and Groth, P., "E2EG: End-to-End Node Classification Using Graph Topology and Text-based Node Attributes," in arXiv preprint arXiv:2208.04609, 2022.
[62] Zhao, J., Qu, M., Li, C., Yan, H., Liu, Q., Li, R., Xie, X. and Tang, J., "Learning on large-scale text-attributed graphs via variational inference," in ICLR, 2023.
[63] Wen, Z. and Fang, Y., "Augmenting Low-Resource Text Classification with Graph-Grounded Pre-training and Prompting," in SIGIR, 2023.
[64] Chen, Z., Mao, H., Wen, H., Han, H., Jin, W., Zhang, H., Liu, H. and Tang, J., "Label-free Node Classification on Graphs with Large Language Models (LLMs)," in arXiv preprint arXiv:2310.04668, 2023.
[65] Huang, X., Han, K., Bao, D., Tao, Q., Zhang, Z., Yang, Y. and Zhu, Q., "Prompt-based Node Feature Extractor for Few-shot Learning on Text-Attributed Graphs," in arXiv preprint arXiv:2309.02848, 2023.
[66] Zhao, J., Zhuo, L., Shen, Y., Qu, M., Liu, K., Bronstein, M., Zhu, Z. and Tang, J., "GraphText: Graph reasoning in text space," in arXiv preprint arXiv:2310.01089, 2023.
[67] Meng, Y., Zong, S., Li, X., Sun, X., Zhang, T., Wu, F. and Li, J., "GNN-LM: Language modeling based on global contexts via GNN," in ICLR, 2022.
[68] Zhang, X., Bosselut, A., Yasunaga, M., Ren, H., Liang, P., Manning, C.D. and Leskovec, J., "GreaseLM: Graph reasoning enhanced language models for question answering," in ICLR, 2022.
[69] Ioannidis, V.N., Song, X., Zheng, D., Zhang, H., Ma, J., Xu, Y., Zeng, B., Chilimbi, T. and Karypis, G., "Efficient and effective training of language and graph neural network models," in AAAI, 2023.
[70] Mavromatis, C., Ioannidis, V.N., Wang, S., Zheng, D., Adeshina, S., Ma, J., Zhao, H., Faloutsos, C. and Karypis, G., "Train Your Own GNN Teacher: Graph-Aware Distillation on Textual Graphs," in PKDD, 2023.
[71] He, X., Bresson, X., Laurent, T. and Hooi, B., "Explanations as Features: LLM-Based Features for Text-Attributed Graphs," in arXiv preprint arXiv:2305.19523, 2023.
[72] Yu, J., Ren, Y., Gong, C., Tan, J., Li, X. and Zhang, X., "Empower Text-Attributed Graphs Learning with Large Language Models (LLMs)," in arXiv preprint arXiv:2310.09872, 2023.
[73] Yang, J., Liu, Z., Xiao, S., Li, C., Lian, D., Agrawal, S., Singh, A., Sun, G. and Xie, X., "GraphFormers: GNN-nested transformers for representation learning on textual graph," in NeurIPS, 2021.
[74] Jin, B., Zhang, Y., Zhu, Q. and Han, J., "Heterformer: Transformer-based deep node representation learning on heterogeneous text-rich networks," in KDD, 2023.
[75] Jin, B., Zhang, Y., Meng, Y. and Han, J., "Edgeformers: Graph-Empowered Transformers for Representation Learning on Textual-Edge Networks," in ICLR, 2023.
[76] Jin, B., Zhang, W., Zhang, Y., Meng, Y., Zhao, H. and Han, J., "Learning Multiplex Embeddings on Text-rich Networks with One Text Encoder," in arXiv preprint arXiv:2310.06684, 2023.
[77] Qin, Y., Wang, X., Zhang, Z. and Zhu, W., "Disentangled Representation Learning with Large Language Models for Text-Attributed Graphs," in arXiv preprint arXiv:2310.18152, 2023.
[78] Zhang, Y., Shen, Z., Dong, Y., Wang, K. and Han, J., "MATCH: Metadata-aware text classification in a large hierarchy," in WWW, 2021.
[79] Zhu, J., Cui, Y., Liu, Y., Sun, H., Li, X., Pelger, M., Yang, T., Zhang, L., Zhang, R. and Zhao, H., "TextGNN: Improving text encoder via graph neural network in sponsored search," in WWW, 2021.
[80] Li, C., Pang, B., Liu, Y., Sun, H., Liu, Z., Xie, X., Yang, T., Cui, Y., Zhang, L. and Zhang, Q., "AdsGNN: Behavior-graph augmented relevance modeling in sponsored search," in SIGIR, 2021.
[81] Zhang, J., Chang, W.C., Yu, H.F. and Dhillon, I., "Fast multi-resolution transformer fine-tuning for extreme multi-label text classification," in NeurIPS, 2021.
[82] Xie, H., Zheng, D., Ma, J., Zhang, H., Ioannidis, V.N., Song, X., Ping, Q., Wang, S., Yang, C., Xu, Y. and Zeng, B., "Graph-Aware Language Model Pre-Training on a Large Graph Corpus Can Help Multiple Graph Applications," in KDD, 2023.
[83] Yasunaga, M., Bosselut, A., Ren, H., Zhang, X., Manning, C.D., Liang, P.S. and Leskovec, J., "Deep bidirectional language-knowledge graph pretraining," in NeurIPS, 2022.
[84] Huang, J., Zhang, X., Mei, Q. and Ma, J., "Can LLMs Effectively Leverage Graph Structural Information: When and Why," in arXiv preprint arXiv:2309.16595, 2023.
[85] Kipf, T.N. and Welling, M., "Semi-supervised classification with graph convolutional networks," in ICLR, 2017.
[86] Hamilton, W., Ying, Z. and Leskovec, J., "Inductive representation learning on large graphs," in NeurIPS, 2017.
[87] Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P. and Bengio, Y., "Graph attention networks," in ICLR, 2018.
[88] Zhang, S., Liu, Y., Sun, Y. and Shah, N., "Graph-less Neural Networks: Teaching Old MLPs New Tricks Via Distillation," in ICLR, 2022.
[89] Liu, M., Gao, H. and Ji, S., "Towards deeper graph neural networks," in KDD, 2020.
[90] Meng, Y., Huang, J., Zhang, Y. and Han, J., "Generating training data with language models: Towards zero-shot language understanding," in NeurIPS, 2022.
[91] Sun, Y., Han, J., Yan, X., Yu, P.S. and Wu, T., "PathSim: Meta path-based top-k similarity search in heterogeneous information networks," in VLDB, 2011.
[92] Liu, H., Li, C., Wu, Q. and Lee, Y.J., "Visual instruction tuning," in NeurIPS, 2023.
[93] Park, C., Kim, D., Han, J. and Yu, H., "Unsupervised attributed multiplex network embedding," in AAAI, 2020.
[94] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I., "Attention is all you need," in NeurIPS, 2017.
[95] Haveliwala, T.H., "Topic-sensitive PageRank," in WWW, 2002.
[96] Oord, A.V.D., Li, Y. and Vinyals, O., "Representation learning with contrastive predictive coding," in arXiv preprint arXiv:1807.03748, 2018.
[97] Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J. and Krueger, G., "Learning transferable visual models from natural language supervision," in ICML, 2021.
[98] Whalen, R., "Legal networks: The promises and challenges of legal network analysis," in Mich. St. L. Rev., 2016.
[99] Friedrich, A., Palmer, A. and Pinkal, M., "Situation entity types: automatic classification of clause-level aspect," in ACL, 2016.
[100] Guha, N., Nyarko, J., Ho, D.E., Ré, C., Chilton, A., Narayana, A., Chohlas-Wood, A., Peters, A., Waldon, B., Rockmore, D.N. and Zambrano, D., "LegalBench: A collaboratively built benchmark for measuring legal reasoning in large language models," in arXiv preprint arXiv:2308.11462, 2023.
[101] Bai, X., Wang, M., Lee, I., Yang, Z., Kong, X. and Xia, F., "Scientific paper recommendation: A survey," in IEEE Access, 7, pp.9324-9339, 2019.
[102] Chowdhury, S. and Schoen, M.P., "Research paper classification using supervised machine learning techniques," in Intermountain Engineering, Technology and Computing, 2020.
[103] Madigan, D., Genkin, A., Lewis, D.D., Argamon, S., Fradkin, D. and Ye, L., "Author identification on the large scale," in Proceedings of the 2005 Meeting of the Classification Society of North America (CSNA), 2005.
[104] He, X., Deng, K., Wang, X., Li, Y., Zhang, Y. and Wang, M., "LightGCN: Simplifying and powering graph convolution network for recommendation," in SIGIR, 2020.
[105] Chang, J., Gao, C., He, X., Jin, D. and Li, Y., "Bundle recommendation with graph convolutional networks," in SIGIR, 2020.
[106] Xu, H., Liu, B., Shu, L. and Yu, P., "Open-world learning and application to product classification," in WWW, 2019.
[107] Chen, L., Xie, Y., Zheng, Z., Zheng, H. and Xie, J., "Friend recommendation based on multi-social graph convolutional network," in IEEE Access, 8, pp.43618-43629, 2020.
[108] Wang, G., Zhang, X., Tang, S., Zheng, H. and Zhao, B.Y., "Unsupervised clickstream clustering for user behavior analysis," in CHI, 2016.
[109] Shchur, O. and Günnemann, S., "Overlapping community detection with graph neural networks," in arXiv preprint arXiv:1909.12201, 2019.
[110] Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou, D., Metzler, D. and Chi, E.H., "Emergent Abilities of Large Language Models," in Transactions on Machine Learning Research, 2022.
[111] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y. and Iwasawa, Y., "Large language models are zero-shot reasoners," in NeurIPS, 35, pp.22199-22213, 2022.
[112] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q.V. and Zhou, D., "Chain-of-thought prompting elicits reasoning in large language models," in NeurIPS, 35, pp.24824-24837, 2022.
[113] Radford, A., "Language Models are Unsupervised Multitask Learners," in OpenAI Blog, 2019.
[114] Mikolov, T., Chen, K., Corrado, G. and Dean, J., "Efficient estimation of word representations in vector space," in arXiv preprint arXiv:1301.3781, 2013.
[115] Pennington, J., Socher, R. and Manning, C.D., "GloVe: Global vectors for word representation," in EMNLP, pp.1532-1543, 2014.
[116] Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P. and Soricut, R., "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations," in ICLR, 2020.
[117] Clark, K., Luong, M.T., Le, Q.V. and Manning, C.D., "ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators," in ICLR, 2020.
[118] Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S. and Nori, H., "Sparks of artificial general intelligence: Early experiments with GPT-4," in arXiv preprint arXiv:2303.12712, 2023.
[119] Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S. and Bikel, D., "Llama 2: Open foundation and fine-tuned chat models," in arXiv preprint arXiv:2307.09288, 2023.
[120] Jiang, A.Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D.S., Casas, D.D.L., Bressand, F., Lengyel, G., Lample, G., Saulnier, L. and Lavaud, L.R., "Mistral 7B," in arXiv preprint arXiv:2310.06825, 2023.
[121] Alayrac, J.B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millican, K., Reynolds, M. and Ring, R., "Flamingo: a visual language model for few-shot learning," in NeurIPS, pp.23716-23736, 2022.
[122] Edwards, C., Zhai, C. and Ji, H., "Text2Mol: Cross-modal molecule retrieval with natural language queries," in EMNLP, pp.595-607, 2021.
[123] Edwards, C., Lai, T., Ros, K., Honke, G., Cho, K. and Ji, H., "Translation between Molecules and Natural Language," in EMNLP, pp.375-413, 2022.
[124] Wang, H., Feng, S., He, T., Tan, Z., Han, X. and Tsvetkov, Y., "Can Language Models Solve Graph Problems in Natural Language?" in arXiv preprint arXiv:2305.10037, 2023.
[125] Liu, C. and Wu, B., "Evaluating large language models on graphs: Performance insights and comparative analysis," in arXiv preprint arXiv:2308.11224, 2023.
[126] Guo, J., Du, L. and Liu, H., "GPT4Graph: Can Large Language Models Understand Graph Structured Data? An Empirical Evaluation and Benchmarking," in arXiv preprint arXiv:2305.15066, 2023.
[127] Zhang, J., "Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT," in arXiv preprint arXiv:2304.11116, 2023.
[128] Zhang, Z., Wang, X., Zhang, Z., Li, H., Qin, Y., Wu, S. and Zhu, W., "LLM4DyG: Can Large Language Models Solve Problems on Dynamic Graphs?" in arXiv preprint arXiv:2310.17110, 2023.
[129] Luo, L., Li, Y.F., Haffari, G. and Pan, S., "Reasoning on graphs: Faithful and interpretable large language model reasoning," in arXiv preprint arXiv:2310.01061, 2023.
[130] Jiang, J., Zhou, K., Dong, Z., Ye, K., Zhao, W.X. and Wen, J.R., "StructGPT: A general framework for large language model to reason over structured data," in arXiv preprint arXiv:2305.09645, 2023.
[131] Fatemi, B., Halcrow, J. and Perozzi, B., "Talk like a graph: Encoding graphs for large language models," in arXiv preprint arXiv:2310.04560, 2023.
[132] Sun, J., Xu, C., Tang, L., Wang, S., Lin, C., Gong, Y., Shum, H.Y. and Guo, J., "Think-on-graph: Deep and responsible reasoning of large language model with knowledge graph," in arXiv preprint arXiv:2307.07697, 2023.
[133] Chen, D.Z., "Developing algorithms and software for geometric path planning problems," in ACM Computing Surveys, 28(4es), 1996. https://doi.org/10.1145/242224.242246
[134] Iqbal, A., Hossain, Md. and Ebna, A., "Airline Scheduling with Max Flow algorithm," in International Journal of Computer Applications, 2018.
[135] Jiang, L., Zang, X., Alghoul, I.I.Y., Fang, X., Dong, J. and Liang, C., "Scheduling the covering delivery problem in last mile delivery," in Expert Systems with Applications, 2022.
[136] Zhang, X., Wang, L., Helwig, J., Luo, Y., Fu, C., Xie, Y., ... and Ji, S., "Artificial intelligence for science in quantum, atomistic, and continuum systems," in arXiv preprint arXiv:2307.08423, 2023.
[137] Rusch, T.K., Bronstein, M.M. and Mishra, S., "A survey on oversmoothing in graph neural networks," in arXiv preprint arXiv:2303.10993, 2023.
[138] Topping, J., Di Giovanni, F., Chamberlain, B.P., Dong, X. and Bronstein, M.M., "Understanding over-squashing and bottlenecks on graphs via curvature," in arXiv preprint arXiv:2111.14522, 2021.
[139] Zhang, B., Luo, S., Wang, L. and He, D., "Rethinking the expressive power of GNNs via graph biconnectivity," in arXiv preprint arXiv:2301.09505, 2023.
[140] Müller, L., Galkin, M., Morris, C. and Rampášek, L., "Attending to graph transformers," in arXiv preprint arXiv:2302.04181, 2023.
[141] Ying, C., Cai, T., Luo, S., Zheng, S., Ke, G., He, D., ... and Liu, T.Y., "Do transformers really perform badly for graph representation?" in NeurIPS, 34, pp.28877-28888, 2021.
[142] Rampášek, L., Galkin, M., Dwivedi, V.P., Luu, A.T., Wolf, G. and Beaini, D., "Recipe for a general, powerful, scalable graph transformer," in NeurIPS, 35, pp.14501-14515, 2022.
[143] Liu, G., Zhao, T., Inae, E., Luo, T. and Jiang, M., "Semi-Supervised Graph Imbalanced Regression," in arXiv preprint arXiv:2305.12087, 2023.
[144] Wu, Q., Zhao, W., Li, Z., Wipf, D.P. and Yan, J., "NodeFormer: A scalable graph structure learning transformer for node classification," in NeurIPS, 35, pp.27387-27401, 2022.
[145] Su, J., Lu, Y., Pan, S., Murtadha, A., Wen, B. and Liu, Y., "RoFormer: Enhanced transformer with rotary position embedding," in arXiv preprint arXiv:2104.09864, 2021.
[146] Balaban, A.T., "Applications of graph theory in chemistry," in Journal of Chemical Information and Computer Sciences, 25(3), pp.334-343, 1985.
[147] Liu, G., Zhao, T., Xu, J., Luo, T. and Jiang, M., "Graph rationalization with environment-based augmentations," in ACM SIGKDD, 2022.
[148] Bran, A.M., Cox, S., White, A.D. and Schwaller, P., "ChemCrow: Augmenting large-language models with chemistry tools," in arXiv preprint arXiv:2304.05376, 2023.
[149] Borgwardt, K.M., Ong, C.S., Schönauer, S., Vishwanathan, S.V.N., Smola, A.J. and Kriegel, H.P., "Protein function prediction via graph kernels," in Bioinformatics, 21, pp.i47-i56, 2005.
[150] Riesen, K. and Bunke, H., "IAM graph database repository for graph based pattern recognition and machine learning," in SSPR & SPR 2008, pp.287-297, Springer, 2008.
[151] Jain, N., Coyle, B., Kashefi, E. and Kumar, N., "Graph neural network initialisation of quantum approximate optimisation," in Quantum, 6, 861, 2022.
[152] Weininger, D., "SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules," in Journal of Chemical Information and Computer Sciences, 28(1), pp.31-36, 1988.
[153] Heller, S., McNaught, A., Stein, S., Tchekhovskoi, D. and Pletnev, I., "InChI - the worldwide chemical structure identifier standard," in Journal of Cheminformatics, 5(1), pp.1-9, 2013.
[154] O'Boyle, N. and Dalke, A., "DeepSMILES: an adaptation of SMILES for use in machine-learning of chemical structures," 2018.
[155] Krenn, M., Häse, F., Nigam, A., Friederich, P. and Aspuru-Guzik, A., "Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation," in Machine Learning: Science and Technology, 1(4), 045024, 2020.
[156] Bjerrum, E.J., "SMILES enumeration as data augmentation for neural network modeling of molecules," in arXiv preprint arXiv:1703.07076, 2017.
[157] Arús-Pous, J., Johansson, S.V., Prykhodko, O., Bjerrum, E.J., Tyrchan, C., Reymond, J.L., ... and Engkvist, O., "Randomized SMILES strings improve the quality of molecular generative models," in Journal of Cheminformatics, 11(1), pp.1-13, 2019.
[158] Tetko, I.V., Karpov, P., Bruno, E., Kimber, T.B. and Godin, G., "Augmentation is what you need!" in International Conference on Artificial Neural Networks, pp.831-835, Springer, 2019.
[159] van Deursen, R., Ertl, P., Tetko, I.V. and Godin, G., "GEN: highly efficient SMILES explorer using autodidactic generative examination networks," in Journal of Cheminformatics, 12(1), pp.1-4, 2020.
[160] Schwaller, P., Gaudin, T., Lanyi, D., Bekas, C. and Laino, T., "'Found in Translation': predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models," in Chemical Science, 9(28), pp.6091-6098, 2018.
[161] Morgan, H.L., "The generation of a unique machine description for chemical structures - a technique developed at Chemical Abstracts Service," in Journal of Chemical Documentation, 5(2), pp.107-113, 1965.
[162] Sennrich, R., Haddow, B. and Birch, A., "Neural machine translation of rare words with subword units," in ACL, 2016.
[163] Kudo, T. and Richardson, J., "SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing," in EMNLP, 2018.
[164] Irwin, R., Dimitriadis, S., He, J. and Bjerrum, E.J., "Chemformer: a pre-trained transformer for computational chemistry," in Machine Learning: Science and Technology, 3(1), 015022, 2022.
[165] Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z. and Du, Y., "A survey of large language models," in arXiv preprint arXiv:2303.18223, 2023.
[166] Shi, Y., Zhang, A., Zhang, E., Liu, Z. and Wang, X., "ReLM: Leveraging Language Models for Enhanced Chemical Reaction Prediction," in EMNLP, 2023.
[167] Liu, P., Ren, Y. and Ren, Z., "Git-Mol: A multi-modal large language model for molecular science with graph, image, and text," in arXiv preprint arXiv:2308.06911, 2023.
[168] Ock, J., Guntuboina, C. and Farimani, A.B., "Catalyst Property Prediction with CatBERTa: Unveiling Feature Exploration Strategies through Large Language Models," in arXiv preprint arXiv:2309.00563, 2023.
[169] Fang, Y., Liang, X., Zhang, N., Liu, K., Huang, R., Chen, Z., Fan, X. and Chen, H., "Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models," in arXiv preprint arXiv:2306.08018, 2023.
[170] Abdine, H., Chatzianastasis, M., Bouyioukos, C. and Vazirgiannis, M., "Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers," in arXiv preprint arXiv:2307.14367, 2023.
[171] Luo, Y., Yang, K., Hong, M., Liu, X. and Nie, Z., "MolFM: A Multimodal Molecular Foundation Model," in arXiv preprint arXiv:2307.09484, 2023.
[172] Qian, C., Tang, H., Yang, Z., Liang, H. and Liu, Y., "Can large language models empower molecular property prediction?" in arXiv preprint arXiv:2307.07443, 2023.
[173] Born, J. and Manica, M., "Regression Transformer enables concurrent sequence regression and generation for molecular language modelling," in Nature Machine Intelligence, 5(4), pp.432-444, 2023.
[174] Li, J., Liu, Y., Fan, W., Wei, X.Y., Liu, H., Tang, J. and Li, Q., "Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective," in arXiv preprint arXiv:2306.06615, 2023.
[175] Zeng, Z., Yin, B., Wang, S., Liu, J., Yang, C., Yao, H., ... and Liu, Z., "Interactive Molecular Discovery with Natural Language," in arXiv preprint arXiv:2306.11976, 2023.
[176] Liu, Z., Li, S., Luo, Y., Fei, H., Cao, Y., Kawaguchi, K., Wang, X. and Chua, T.S., "MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter," in EMNLP, 2023.
[177] Guo, T., Guo, K., Liang, Z., Guo, Z., Chawla, N.V., Wiest, O. and Zhang, X., "What indeed can GPT models do in chemistry? A comprehensive benchmark on eight tasks," in NeurIPS, 2023.
[178] Liu, Z., Zhang, W., Xia, Y., Wu, L., Xie, S., Qin, T., Zhang, M. and Liu, T.Y., "MolXPT: Wrapping Molecules with Text for Generative Pre-training," in ACL, 2023.
[179] Seidl, P., Vall, A., Hochreiter, S. and Klambauer, G., "Enhancing activity prediction models in drug discovery with the ability to understand human language," in ICML, 2023.
[180] Christofidellis, D., Giannone, G., Born, J., Winther, O., Laino, T. and Manica, M., "Unifying molecular and textual representations via multi-task language modelling," in ICML, 2023.
[181] Liu, S., Nie, W., Wang, C., Lu, J., Qiao, Z., Liu, L., ... and Anandkumar, A., "Multi-modal molecule structure-text model for text-based retrieval and editing," in Nature Machine Intelligence, 2023.
[182] Lacombe, R., Gaut, A., He, J., Lüdeke, D. and Pistunova, K., "Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning," in ICML Workshop on Computational Biology, 2023.
[183] Su, B., Du, D., Yang, Z., Zhou, Y., Li, J., Rao, A., ... and Wen, J.R., "A molecular multimodal foundation model associating molecule graphs with natural language," in arXiv preprint arXiv:2209.05481, 2022.
[184] Zeng, Z., Yao, Y., Liu, Z. and Sun, M., "A deep-learning system bridging molecule structure and biomedical text with comprehension comparable to human professionals," in Nature Communications, 13(1), 862, 2022.
[185] Iwayama, M., Wu, S., Liu, C. and Yoshida, R., "Functional Output Regression for Machine Learning in Materials Science," in Journal of Chemical Information and Modeling, 62(20), pp.4837-4851, 2022.
[186] Bagal, V., Aggarwal, R., Vinod, P.K. and Priyakumar, U.D., "MolGPT: molecular generation using a transformer-decoder model," in Journal of Chemical Information and Modeling, 62(9), pp.2064-2076, 2021.
[187] Taylor, R., Kardas, M., Cucurull, G., Scialom, T., Hartshorn, A., Saravia, E., ... and Stojnic, R., "Galactica: A large language model for science," in arXiv preprint arXiv:2211.09085, 2022.
[188] Wang, S., Guo, Y., Wang, Y., Sun, H. and Huang, J., "SMILES-BERT: large scale unsupervised pre-training for molecular property prediction," in BCB, 2019.
[189] Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C.H. and Kang, J., "BioBERT: a pre-trained biomedical language representation model for biomedical text mining," in Bioinformatics, 36(4), pp.1234-1240, 2020.
[190] Ma, R. and Luo, T., "PI1M: a benchmark database for polymer informatics," in Journal of Chemical Information and Modeling, 60(10), pp.4684-4690, 2020.
[191] Li, X., Xu, Y., Lai, L. and Pei, J., "Prediction of human cytochrome P450 inhibition using a multitask deep autoencoder neural network," in Molecular Pharmaceutics, 15(10), pp.4336-4345, 2020.
[192] Stanley, M., Bronskill, J.F., Maziarz, K., Misztela, H., Lanini, J., Segler, M., Schneider, N. and Brockschmidt, M., "FS-Mol: A few-shot learning dataset of molecules," in NeurIPS, 2021.
[193] Hastings, J., Owen, G., Dekker, A., Ennis, M., Kale, N., Muthukrishnan, V., ... and Steinbeck, C., "ChEBI in 2016: Improved services and an expanding collection of metabolites," in Nucleic Acids Research, 44(D1), D1214-D1219, 2016.
[194] Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., ... and Bolton, E.E., "PubChem 2019 update: improved access to chemical data," in Nucleic Acids Research, 47(D1), D1102-D1109, 2019.
[195] Gaulton, A., Bellis, L.J., Bento, A.P., Chambers, J., Davies, M., Hersey, A., ... and Overington, J.P., "ChEMBL: a large-scale bioactivity database for drug discovery," in Nucleic Acids Research, 40(D1), D1100-D1107, 2012.
[196] Zdrazil, B., Felix, E., Hunter, F., Manners, E.J., Blackshaw, J., Corbett, S., de Veij, M., Ioannidis, H., Lopez, D.M., Mosquera, J.F. and Magarinos, M.P., "The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods," in Nucleic Acids Research, gkad1004, 2023.
[197] Sterling, T. and Irwin, J.J., "ZINC 15 - ligand discovery for everyone," in Journal of Chemical Information and Modeling, 55(11), pp.2324-2337, 2015.
[198] Blum, L.C. and Reymond, J.L., "970 million druglike small molecules for virtual screening in the chemical universe database GDB-13," in Journal of the American Chemical Society, 131(25), pp.8732-8733, 2009.
[199] Mellor, C.L., Robinson, R.M., Benigni, R., Ebbrell, D., Enoch, S.J., Firman, J.W., ... and Cronin, M.T.D., "Molecular fingerprint-derived similarity measures for toxicological read-across: Recommendations for optimal use," in Regulatory Toxicology and Pharmacology, 101, pp.121-134, 2019.
[200] Maggiora, G., Vogt, M., Stumpfe, D. and Bajorath, J., "Molecular similarity in medicinal chemistry: miniperspective," in Journal of Medicinal Chemistry, 57(8), pp.3186-3204, 2014.
[201] Rogers, D. and Hahn, M., "Extended-connectivity fingerprints," in Journal of Chemical Information and Modeling, 50(5), pp.742-754, 2010.
[202] Beisken, S., Meinl, T., Wiswedel, B., de Figueiredo, L.F., Berthold, M. and Steinbeck, C., "KNIME-CDK: Workflow-driven cheminformatics," in BMC Bioinformatics, 14, pp.1-4, 2013.
[203] Krenn, M., Ai, Q., Barthel, S., Carson, N., Frei, A., Frey, N.C., ... and Aspuru-Guzik, A., "SELFIES and the future of molecular string representations," in Patterns, 3(10), 2022.
[204] Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., ... and Leskovec, J., "Open graph benchmark: Datasets for machine learning on graphs," in NeurIPS, 2020.
[205] Xu, K., Hu, W., Leskovec, J. and Jegelka, S., "How powerful are graph neural networks?" in ICLR, 2019.
[206] Leman, A.A. and Weisfeiler, B., "A reduction of a graph to a canonical form and an algebra arising during this reduction," in Nauchno-Technicheskaya Informatsiya, 2(9), pp.12-16, 1968.
[207] Li, J., Li, D., Savarese, S. and Hoi, S., "BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models," in arXiv preprint arXiv:2301.12597, 2023.
[208] Wu, Z., Ramsundar, B., Feinberg, E. N., Gomes, J., Geniesse, C.,
Pappu, A. S., ... & Pande, V. (2018). MoleculeNet: a benchmark for
molecular machine learning. Chemical science, 9(2), 513-530.
[209] AIDS Antiviral Screen Data. https://wiki.nci.nih.gov/display/NCIDTPdata/
AIDS+Antiviral+Screen+Data, Accessed: 2017-09-27
[210] Subramanian, G., Ramsundar, B., Pande, V., & Denny, R. A. (2016).
Computational modeling of β -secretase 1 (BACE-1) inhibitors using
ligand based approaches. Journal of chemical information and
modeling, 56(10), 1936-1949.
[211] Martins, I. F., Teixeira, A. L., Pinheiro, L., & Falcao, A. O. (2012).
A Bayesian approach to in silico blood-brain barrier penetration
modeling. Journal of chemical information and modeling, 52(6),
1686-1697.
[212] Tox21 Challenge. https://tripod.nih.gov/tox21/challenge/, Ac-
cessed: 2017-09- 27
[213] Altae-Tran H, Ramsundar B, Pappu AS, Pande V. Low data drug
discovery with one-shot learning. ACS central science. 2017 Apr
26;3(4):283-93.
[214] Novick PA, Ortiz OF, Poelman J, Abdulhay AY, Pande VS.
SWEETLEAD: an in silico database of approved drugs, regulated
chemicals, and herbal isolates for computer-aided drug discovery.
PloS one. 2013 Nov 1;8(11):e79568.
[215] Aggregate Analysis of ClincalTrials.gov (AACT) Database.
https://www.ctti-clinicaltrials.org/aact-database, Accessed: 2017-
09-27.
[216] Mobley DL, Guthrie JP. FreeSolv: a database of experimental
and calculated hydration free energies, with input files. Journal of
computer-aided molecular design. 2014 Jul;28:711-20.
[217] Delaney, J. S. (2004). ESOL: estimating aqueous solubility directly
from molecular structure. Journal of chemical information and
computer sciences, 44(3), 1000-1005.
[218] Zhao, T., Liu, G., Wang, D., Yu, W., & Jiang, M. (2022, June). Learn-
ing from counterfactual links for link prediction. In International
Conference on Machine Learning (pp. 26911-26926). PMLR.
[219] Vignac, C., Krawczuk, I., Siraudin, A., Wang, B., Cevher, V., &
Frossard, P. (2022). Digress: Discrete denoising diffusion for graph
generation. arXiv preprint arXiv:2209.14734.
[220] Zang, C., & Wang, F. Moflow: an invertible flow model for
generating molecular graphs. In ACM SIGKDD, 2020.
[221] Deng, J., Yang, Z., Wang, H., Ojima, I., Samaras, D., & Wang, F.
(2023). A systematic study of key elements underlying molecular
property prediction. Nature Communications, 14(1), 6395.
[222] Böhm, H. J., Flohr, A., & Stahl, M. (2004). Scaffold hopping. Drug
discovery today: Technologies, 1(3), 217-224.
[223] Renz, P., Van Rompaey, D., Wegner, J. K., Hochreiter, S., &
Klambauer, G. (2019). On failure modes in molecule generation
and optimization. Drug Discovery Today: Technologies, 32, 55-63.
[224] Polykovskiy, D., Zhebrak, A., Sanchez-Lengeling, B., Golovanov,
S., Tatanov, O., Belyaev, S., ... & Zhavoronkov, A. (2020). Molecular
sets (MOSES): a benchmarking platform for molecular generation
models. Frontiers in pharmacology, 11, 565644.
[225] Reymond, J. L. (2015). The chemical space project. Accounts of
Chemical Research, 48(3), 722-730.
[226] Lin, A., Horvath, D., Afonina, V., Marcou, G., Reymond, J. L., &
Varnek, A. (2018). Mapping of the Available Chemical Space versus
the Chemical Universe of Lead-Like Compounds. ChemMedChem,
13(6), 540-554.
[227] Gao, W., Fu, T., Sun, J., & Coley, C. (2022). Sample efficiency mat-
ters: a benchmark for practical molecular optimization. Advances in
Neural Information Processing Systems, 35, 21342-21357.
[228] Wang, H., Fu, T., Du, Y., Gao, W., Huang, K., Liu, Z., ... & Zitnik,
M. (2023). Scientific discovery in the age of artificial intelligence.
Nature, 620(7972), 47-60.
[229] Liu, G., Inae, E., Zhao, T., Xu, J., Luo, T., & Jiang, M. (2023). Data-
Centric Learning from Unlabeled Graphs with Diffusion Model.
arXiv preprint arXiv:2303.10108.
[230] https://practicalcheminformatics.blogspot.com/2023/08/we-
need-better-benchmarks-for-machine.html?m=1
[231] Merchant, A., Batzner, S., Schoenholz, S.S., Aykol, M., Cheon, G. and Cubuk, E.D., "Scaling deep learning for materials discovery," in Nature, pp.1-6, 2023.
[232] Jo, J., Lee, S. and Hwang, S.J., "Score-based generative modeling of graphs via the system of stochastic differential equations," in ICML, pp.10362-10383, 2022.