A Systematic Survey On Large Language Models For Algorithm Design
FEI LIU, City University of Hong Kong, China
YIMING YAO, City University of Hong Kong, China
PING GUO, City University of Hong Kong, China
ZHIYUAN YANG, City University of Hong Kong and Huawei Noah’s Ark Lab, China
XI LIN, City University of Hong Kong, China
ZHE ZHAO, City University of Hong Kong and University of Science and Technology of China, China
XIALIANG TONG, Huawei Noah’s Ark Lab, China
MINGXUAN YUAN, Huawei Noah’s Ark Lab, China
ZHICHAO LU, City University of Hong Kong, China
ZHENKUN WANG, Southern University of Science and Technology, China
QINGFU ZHANG∗, City University of Hong Kong, China
Algorithm Design (AD) is crucial for effective problem-solving across various domains. The advent of Large Language
Models (LLMs) has notably enhanced the automation and innovation within this field, offering new perspectives and
promising solutions. Over the past three years, the integration of LLMs into AD (LLM4AD) has seen substantial
progress, with applications spanning optimization, machine learning, mathematical reasoning, and scientific discovery.
Given the rapid advancements and expanding scope of this field, a systematic review is both timely and necessary.
This paper provides a systematic review of LLM4AD. First, we offer an overview and summary of existing studies.
Then, we introduce a taxonomy and review the literature across four dimensions: the roles of LLMs, search methods,
prompt methods, and application domains, together with a discussion of the potential and achievements of LLMs in AD. Finally,
we identify current challenges and highlight several promising directions for future research.
Additional Key Words and Phrases: Large language model, Automated algorithm design, Optimization, Heuristic,
Hyperheuristic, Evolutionary algorithm.
1 Introduction
Algorithms play a crucial role in addressing problems across various domains such as industry,
economics, healthcare, and technology [26, 75]. Traditionally, designing algorithms has been a labor-intensive
process that demands deep expertise. Recently, there has been a surge of interest in employing learning
∗ Corresponding author.
Authors’ Contact Information: Fei Liu, [email protected], City University of Hong Kong, Hong Kong, China; Yiming
Yao, [email protected], City University of Hong Kong, Hong Kong, China; Ping Guo, [email protected].
hk, City University of Hong Kong, Hong Kong, China; Zhiyuan Yang, [email protected], City University of Hong
Kong, Hong Kong, China and Huawei Noah’s Ark Lab, Hong Kong, China; Xi Lin, [email protected], City University of
Hong Kong, Hong Kong, China; Zhe Zhao, [email protected], City University of Hong Kong, Hong Kong, China and
University of Science and Technology of China, Hefei, China; Xialiang Tong, [email protected], Huawei Noah’s Ark
Lab, Shenzhen, China; Mingxuan Yuan, [email protected], Huawei Noah’s Ark Lab, Hong Kong, China; Zhichao
Lu, [email protected], City University of Hong Kong, Hong Kong, China; Zhenkun Wang, [email protected],
Southern University of Science and Technology, Shenzhen, China; Qingfu Zhang, [email protected], City University
of Hong Kong, Hong Kong, China.
and computational intelligence techniques to enhance and automate the algorithm development
process [10, 142].
In the realm of artificial intelligence, Large Language Models (LLMs) have marked a significant ad-
vancement. Characterized by their vast scale, extensive training, and superior performance, LLMs have
made notable impacts in fields such as mathematical reasoning [4], code generation [72], and scientific
discovery [152].
Over the past three years, the application of Large Language Models for Algorithm Design (LLM4AD)
has emerged as a promising research area with the potential to fundamentally transform the ways in which
algorithms are designed, optimized, and implemented. The remarkable capability and flexibility of LLMs have
demonstrated potential in enhancing the algorithm design process, including performance prediction [56],
heuristic generation [88], code optimization [59], and even the invention of new algorithmic ideas [46]
specifically tailored to target tasks. This approach not only reduces the human effort required in the design
phase but also enhances the creativity and efficiency of the produced solutions [88, 128].
While LLM4AD is gaining traction, there is a notable absence of a systematic review in this emerging
field. The existing literature primarily focuses on the applications of LLMs within specific algorithmic
contexts. For instance, several studies have been conducted to survey the use of LLMs for optimization
topics [58, 64, 163], while others review general LLM applications [53] or their use in particular domains
such as electronic design automation [189], planning [118], recommendation systems [162], and agents [154].
This paper aims to address this gap by providing a systematic review with a multi-dimensional taxonomy
of the current state of LLMs in algorithm design. We will also explore various applications, discuss key
challenges, and propose directions for future research. By synthesizing these insights, this paper contributes
to a deeper understanding of the potential of LLMs to enhance and automate algorithm design and lays
the groundwork for further innovations in this exciting field. We expect this paper to be a helpful resource
for both newcomers to the field and experienced experts seeking a consolidated and systematic update on
current developments. The contributions of this paper are outlined as follows:
∙ Systematic Review of LLM4AD: We present the first systematic review of the developments in using
LLMs for algorithm design, covering a corpus of more than 180 closely related research papers
published in the last three years.
∙ Development of a Multi-dimensional Taxonomy: We introduce a multi-dimensional taxonomy that
categorizes the works and functionalities of LLM4AD into four distinct dimensions: 1) Roles of LLMs
in algorithm design, which delineates how these models contribute to or enhance algorithm design; 2)
Search methods, which explores the various approaches used by LLMs to navigate and optimize search
spaces in algorithm design; 3) Prompt methods, which examines how diverse prompting strategies are
used; and 4) Application domains, which identifies the key fields and industries where LLMs are being
applied to solve complex algorithmic challenges. This taxonomy not only clarifies the landscape but
also aids in identifying gaps and opportunities for future research.
∙ Discussion on Challenges and Future Directions: We go beyond mere summarization of existing
literature to critically analyze the limitations present in current research on LLMs for algorithm design.
Furthermore, we highlight potential future research directions, including developing domain-specific
LLMs, exploring multi-modal LLMs, facilitating human-LLM interaction, using LLMs for algorithm
assessment and understanding LLM behavior, advancing fully automated algorithm design, and
benchmarking for systematic evaluation of LLMs in algorithm design. This discussion is intended to
spur novel approaches and foster further advancements in the field.
Before reviewing the literature, we clarify the scope of this survey as follows:
∙ The term Large Language Models refers to language models of sufficient scale. These models typically
utilize a transformer architecture and operate in an autoregressive manner [188]. Studies employing
smaller models for algorithm design, such as conventional model-based and machine learning-assisted
algorithms [10], are excluded. Research utilizing other large models that lack language processing
capabilities, such as purely vision-based models, is not considered. However, multi-modal LLMs that
include language processing are within our scope.
∙ The term Algorithm in this context refers to a set of mathematical instructions or rules designed to
solve a problem, particularly when executed by a computer [26]. This broad definition encompasses
traditional mathematical algorithms [4], most heuristic approaches [108], and certain agents or policies
that can be interpreted as algorithms [165].
We introduce the detailed pipeline for paper collection and scanning, which consists of four stages:
∙ Stage I Data Extraction and Collection: We collect related papers through Google Scholar, Web of
Science, and Scopus. The search requires the title to contain at least one word from each of the
following two groups: “LLM”, “LLMs”, “Large Language Model”, “Large Language Models” and
“Algorithm”, “Heuristic”, “Search”, “Optimization”, “Optimizer”, “Design”, “Function”
(e.g., LLM and optimization, LLMs and algorithm). After removing duplicates, we obtained
850 papers as of July 1, 2024 (a minimal sketch of this query and deduplication logic is given after the list below).
∙ Stage II Abstract Scanning: We check the title and abstract of each paper to efficiently exclude
irrelevant works. Papers are excluded if they are not written in English, are not about
algorithm design, or do not use large language models. After this scan, 260 papers remain.
∙ Stage III Full Scanning: We thoroughly review each remaining manuscript to exclude papers lacking
relevant content. After this scan, 160 papers remain.
∙ Stage IV Supplementation: We manually add related works based on our knowledge of the field to
avoid missing important contributions. After integrating these additional papers, we end up with
more than 180 papers.
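To make the Stage I keyword logic concrete, the sketch below shows one way such a title filter and deduplication step could be implemented in Python. The record format, field names, and example entries are illustrative assumptions rather than the actual scripts used for this survey.

```python
# Illustrative sketch of the Stage I title query and deduplication (not the actual survey scripts).

LLM_TERMS = ["llm", "llms", "large language model", "large language models"]
TOPIC_TERMS = ["algorithm", "heuristic", "search", "optimization",
               "optimizer", "design", "function"]


def matches_query(title: str) -> bool:
    """The title must contain at least one LLM term AND at least one topic term."""
    t = title.lower()
    return any(term in t for term in LLM_TERMS) and any(term in t for term in TOPIC_TERMS)


def deduplicate(records: list[dict]) -> list[dict]:
    """Drop duplicates retrieved from multiple databases, keyed by normalized title."""
    seen, unique = set(), []
    for rec in records:
        key = " ".join(rec["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical records merged from Google Scholar, Web of Science, and Scopus.
records = [
    {"title": "Large Language Models as Optimizers", "source": "google_scholar"},
    {"title": "Large Language Models as Optimizers", "source": "scopus"},        # duplicate
    {"title": "A Survey of Graph Neural Networks", "source": "web_of_science"},  # no query match
]

stage1 = [r for r in deduplicate(records) if matches_query(r["title"])]
print(f"{len(stage1)} paper(s) proceed to abstract scanning")
```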
We first present an overview of the collected LLM4AD papers and then introduce a taxonomy to systematically
review the progress. In addition to the organized list of papers, we also incorporate several important
publications released after July 1, 2024.
[Figure: paper collection pipeline. Stage I (Data Extraction and Collection): title search of Google Scholar, Web of Science, and Scopus for (LLM OR Large Language Model) AND (Algorithm OR Heuristic OR Search OR Optimization OR Optimizer OR Design OR Function), covering 2020.1.1 to 2024.7.1 and yielding 850 papers after deduplication.]
2.2 Overview
Fig. 2a illustrates the trend in the number of papers published over time, with the timeline expressed in
months. The graph shows a marked rise in research activity related to LLM4AD, with most of the studies
conducted within the last year. This suggests that LLM4AD is an emerging field,
and we expect a significant increase in research output in the near future as scholars from diverse fields
become aware of its considerable potential.
Fig. 2c and Fig. 2b display the leading institutions and their respective countries contributing to
publications on LLM4AD. The United States leads, closely followed by China, with these two countries
alone accounting for 50% of the publications. The next eight countries, including Singapore, Canada, and
Japan, collectively contribute one-third of the total publications. Prominent institutions involved in this
research include esteemed universities such as Tsinghua University, Nanyang Technological University, and
the University of Toronto, alongside major corporations like Huawei, Microsoft, and Google. This distribution
underscores the widespread interest in this research topic and its substantial relevance to real-world applications.
The word cloud in Fig. 3 is generated from the titles and abstracts of all reviewed papers, keeping only
words that appear at least five times. It showcases the top 80 keywords, organized into four color-coded clusters
around “language”, “GPT”, “search and optimization”, and “scientific discovery”. Several keywords, such as
“evolution”, “strategy”, “optimizer”, and “agent”, are also highlighted.
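As an illustration of how such a keyword cloud can be produced, the sketch below counts words in titles and abstracts, keeps those appearing at least five times, and renders the 80 most frequent ones. The toy corpus, stop-word list, and use of the third-party wordcloud package are assumptions for illustration only, not the authors’ actual workflow.

```python
# Illustrative sketch of keyword-cloud generation (assumed workflow, not the authors' actual script).
import re
from collections import Counter

from wordcloud import WordCloud  # third-party package: pip install wordcloud

# Hypothetical corpus: titles and abstracts of the reviewed papers.
documents = [
    "Large language models for heuristic design via evolutionary search",
    "An LLM-based optimizer for combinatorial optimization problems",
]

STOPWORDS = {"the", "a", "an", "and", "for", "of", "via", "based", "with"}

# Count lowercase word frequencies across all documents.
counts = Counter(
    word
    for doc in documents
    for word in re.findall(r"[a-z]+", doc.lower())
    if word not in STOPWORDS
)

# Keep words appearing at least five times, then render the top 80 keywords.
frequent = {word: freq for word, freq in counts.items() if freq >= 5}

cloud = WordCloud(width=800, height=400, max_words=80, background_color="white")
cloud.generate_from_frequencies(frequent or counts)  # fall back for this tiny toy corpus
cloud.to_file("keyword_cloud.png")
```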
2.3 Taxonomy
This paper presents a taxonomy organized into four dimensions, as shown in Fig. 4: 1) LLM Roles, 2) Search
Methods, 3) Prompt Methods, and 4) Application Domains.