0% found this document useful (0 votes)
68 views35 pages

A Systematic Survey On Large Language Models For Algorithm Design

A comprehensive overview of DP: basic principles (optimal substructure, overlapping subproblems), complexity analyses, classic problems (e.g. knapsack), plus applications like bioinformatics, NLP etc.

Uploaded by

Anjali Pandey
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
68 views35 pages

A Systematic Survey On Large Language Models For Algorithm Design

A comprehensive overview of DP: basic principles (optimal substructure, overlapping subproblems), complexity analyses, classic problems (e.g. knapsack), plus applications like bioinformatics, NLP etc.

Uploaded by

Anjali Pandey
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

A Systematic Survey on Large Language Models for Algorithm Design

FEI LIU, City University of Hong Kong, China


YIMING YAO, City University of Hong Kong, China
PING GUO, City University of Hong Kong, China
ZHIYUAN YANG, City University of Hong Kong, China and Huawei Noah’s Ark Lab, China
XI LIN, City University of Hong Kong, China
ZHE ZHAO, City University of Hong Kong, China and University of Science and Technology of China,
arXiv:2410.14716v3 [cs.LG] 1 Nov 2024

China
XIALIANG TONG, Huawei Noah’s Ark Lab, China
MINGXUAN YUAN, Huawei Noah’s Ark Lab, China
ZHICHAO LU, City University of Hong Kong, China
ZHENKUN WANG, Southern University of Science and Technology, China
QINGFU ZHANG∗ , City University of Hong Kong, China
Algorithm Design (AD) is crucial for effective problem-solving across various domains. The advent of Large Language
Models (LLMs) has notably enhanced the automation and innovation within this field, offering new perspectives and
promising solutions. Over the past three years, the integration of LLMs into AD (LLM4AD) has seen substantial
progress, with applications spanning optimization, machine learning, mathematical reasoning, and scientific discovery.
Given the rapid advancements and expanding scope of this field, a systematic review is both timely and necessary.
This paper provides a systematic review of LLM4AD. First, we offer an overview and summary of existing studies.
Then, we introduce a taxonomy and review the literature across four dimensions: the roles of LLMs, search methods,
prompt methods, and application domains with a discussion of potential and achievements of LLMs in AD. Finally,
we identify current challenges and highlight several promising directions for future research.

Additional Key Words and Phrases: Large language model, Automated algorithm design, Optimization, Heuristic,
Hyperheuristic, Evolutionary algorithm.

1 Introduction
Algorithms play a crucial role in addressing various problems across various domains such as industry,
economics, healthcare, and technology [26, 75]. Traditionally, designing algorithm has been a labor-intensive
process that demands deep expertise. Recently, there has been a surge in interest towards employing learning

the corresponding author

Authors’ Contact Information: Fei Liu, [email protected], City University of Hong Kong, Hong Kong, China; Yiming
Yao, [email protected], City University of Hong Kong, Hong Kong, China; Ping Guo, [email protected].
hk, City University of Hong Kong, Hong Kong, China; Zhiyuan Yang, [email protected], City University of Hong
Kong, Hong Kong, China and Huawei Noah’s Ark Lab, Hong Kong, China; Xi Lin, [email protected], City University of
Hong Kong, Hong Kong, China; Zhe Zhao, [email protected], City University of Hong Kong, Hong Kong, China and
University of Science and Technology of China, Hefei, China; Xialiang Tong, [email protected], Huawei Noah’s Ark
Lab, Shenzhen, China; Mingxuan Yuan, [email protected], Huawei Noah’s Ark Lab, Hong Kong, China; Zhichao
Lu, [email protected], City University of Hong Kong, Hong Kong, China; Zhenkun Wang, [email protected],
Southern University of Science and Technology, Shenzhen, China; Qingfu Zhang, [email protected], City University
of Hong Kong, Hong Kong, China.
Manuscript under review 1
and computational intelligence methods techniques to enhance and automate the algorithm development
process [10, 142].
In the realm of artificial intelligence, Large Language Models (LLMs) have marked a significant ad-
vancement. Characterized by their vast scale, extensive training, and superior performance, LLMs have
made notable impacts in the fields such as mathematical reasoning [4], code generation [72], and scientific
discovery [152].
Over the past three years, the application of Large Language Models for Algorithm Design (LLM4AD)
has merged as a promising research area with the potential to fundamentally transform the ways in which
algorithms are designed, optimized and implemented. The remarkable capability and flexibility of LLMs have
demonstrated potential in enhancing the algorithm design process, including performance prediction [56],
heuristic generation [88], code optimization [59], and even the invention of new algorithmic ideas [46]
specifically tailored to target tasks. This approach not only reduces the human effort required in the design
phase but also enhances the creativity and efficiency of the produced solutions [88, 128].
While LLM4AD is gaining traction, there is a notable absence of a systematic review in this emerging
field. The existing literature primarily focuses on the applications of LLMs within specific algorithmic
contexts. For instance, several studies have been conducted to survey the use of LLMs for optimization
topics [58, 64, 163], while others review general LLM applications [53] or their use in particular domains
such as electronic design automation [189], planning [118], recommendation systems [162], and agents [154].
This paper aims to address this gap by providing a systematic review with a multi-dimensional taxonomy
of the current state of LLMs in algorithm design. We will also explore various applications, discuss key
challenges, and propose directions for future research. By synthesizing these insights, this paper contributes
to a deeper understanding of the potential of LLMs to enhance and automate algorithm design and lays
the groundwork for further innovations in this exciting field. We expect this paper to be a helpful resource
for both newcomers to the field and experienced experts seeking a consolidated and systematic update on
current developments. The contributions of this paper are outlined as follows:

∙ Systematic Review of LLM4AD: We present the first systematic review of the developments in using
LLMs for algorithm design, covering a significant corpus of 180+ highly related research papers
published in the last three years.
∙ Development of a Multi-dimensional Taxonomy: We introduce a multi-dimensional taxonomy that
categorizes the works and functionalities of LLM4AD into four distinct dimensions: 1) Roles of LLMs
in algorithm design, which delineates how these models contribute to or enhance algorithm design; 2)
Search methods, which explores the various approaches used by LLMs to navigate and optimize search
spaces in algorithm design; 3) Prompt methods, which examines how diverse prompting strategies are
used; and 4) Application domains, which identifies the key fields and industries where LLMs are being
applied to solve complex algorithmic challenges. This taxonomy not only clarifies the landscape but
also aids in identifying gaps and opportunities for future research.
∙ Discussion on Challenges and Future Directions: We go beyond mere summarization of existing
literature to critically analyze the limitations present in current research on LLMs for algorithm design.
Furthermore, we highlight potential future research directions, including developing domain-specific
LLMs, exploring multi-modal LLMs, facilitating human-LLM interaction, using LLMs for algorithm
2
assessment and understanding LLM behavior, advancing fully automated algorithm design, and
benchmarking for systematic evaluation of LLMs in algorithm design. This discussion is intended to
spur novel approaches and foster further advancements in the field.

2 Methodology and Taxonomy


2.1 Scope of Survey
This paper aims to conduct a systematic survey and classification of existing research works in the emerging
field of Large Language Model for Algorithm Design (LLM4AD). We do not intend to cover all the literature
on both LLMs and algorithms. We delineate the scope of our survey as follows:

∙ The term Large Language Models refers to language models of sufficient scale. These models typically
utilize a transformer architecture and operate in an autoregressive manner [188]. Studies employing
smaller models for algorithm design, such as conventional model-based and machine learning-assisted
algorithms [10], are excluded. Research utilizing other large models that lack language processing
capabilities, such as purely vision-based models, are not considered. However, multi-modal LLMs that
include language processing are within our scope.
∙ The term Algorithm in this context refers to a set of mathematical instructions or rules designed to
solve a problem, particularly when executed by a computer [26]. This broad definition encompasses
traditional mathematical algorithms [4], most heuristic approaches [108], and certain agents or policies
that can be interpreted as algorithms [165].

We introduce the detailed pipeline for paper collection and scanning, which consists of four stages:

∙ Stage I Data Extraction and Collection: We collect the related papers through Google Scholar, Web of
Science, and Scopus. The logic of our search is the title must include any combinations of at least one
of the following two groups of words “LLM”, “LLMs”, “Large Language Model”, “Large Language
Models” and “Algorithm”, “Heuristic”, “Search”, “Optimization”, “Optimizer”, “Design”, “Function”
(e.g., LLM and optimization, LLMs and algorithm). After removing duplicated papers, we ended up
with 850 papers as of July 1, 2024.
∙ Stage II Abstract Scanning: We check the title and abstract of each paper to efficiently exclude
irrelevant papers. The criteria used for exclusion include these papers that are not in English, not for
algorithm design and not using large language models. After scanning, 260 papers are remaining.
∙ Stage III Full Scanning: We thoroughly review each manuscript to exclude papers lacking relevant
content. After scanning, there are 160 papers left.
∙ Stage IV Supplementation: We append some related works manually according to our past knowledge
in this field to avoid missing any important contributions. After integrating the additional papers, we
got 180+ papers in the end.

We will first present an overview of the LLM4AD paper list and then present a taxonomy to systematically
review the progress. In addition to the organized list of papers, we also incorporate some important
publications released after July 1, 2024.
3
Stage I: Data Extraction and Collection
Date: 2020.1.1~2024 7.1
Key Words: Title = (LLM OR Large Language Model) AND (Algorithm OR Heuristic OR Search OR
Optimization OR Optimizer OR Design OR Function)
Database: Google scholar, Web of Science, Scopus
Results (remove duplication): 850 papers

Stage II: Abstract Scanning


Content: Title and Abstract
Exclusion Criteria: not English, not algorithm design, not using large language model
Remaining Results: 260 papers

Stage III: Full Scanning


Content: Full paper
Exclusion Criteria: Research relevant to the topic
Remaining Results: 160 papers

Stage IV: Supplementation


Additional pertinent papers gathered from experience
Final Results: 180 papers

Fig. 1. Four stages for paper collection.

2.2 Overview
Fig. 2a illustrates the trend in the number of papers published over time, with the timeline expressed in
months. The graph shows a marked rise in research activity related to LLM4AD, particularly noting that
most of the studies have been conducted in the last year. This suggests that LLM4AD is an emerging field,
and we expect a significant increase in research output in the near future as scholars from diverse fields
become aware of its considerable potential.
Fig. 2c and Fig. 2b display the leading institutions and their respective countries contributing to
publications on LLM4AD. The United States leads, closely followed by China, with these two countries
alone accounting for 50% of the publications. The next eight countries, including Singapore, Canada, and
Japan, collectively contribute one-third of the total publications. Prominent institutions involved in this
research include esteemed universities such as Tsinghua University, Nanyang Technological University, and
the University of Toronto, alongside major corporations like Huawei, Microsoft, and Google. This distribution
underscores the widespread interest in the research topics and their substantial relevance to practical
applications in the real world.
In Fig. 3, the word cloud is generated from the titles and abstracts of all reviewed papers, with each word
appearing at least five times. It showcases the top 80 keywords, organized into four color-coded clusters
on “language”, “GPT”, “search and optimization”, and “scientific discovery”. Several keywords such as
“evolution”, “strategy”, “optimizer”, and “agent” are also highlighted.

2.3 Taxonomy
This paper presents a taxonomy organized into four dimensions as shown in Fig. 4: 1) LLM Roles, 2) Search
Methods, 3) Prompt Techniques, and 4) Applications.
4
 'DWD3RLQWV
3RO\QRPLDO)LW
 0RQWKO\&RXQWV



1XPEHURI3DSHUV













 
 






 
 
 
 






























0RQWKV6LQFH

(a) Number of publications


2WKHUV $XVWUDOLD
,WDO\6ZLW]HUODQG
8. 
7VLQJKXD8QLYHUVLW\
*HUPDQ\ 1DQ\DQJ7HFKQRORJLFDO8QLYHUVLW\ 
8QLYHUVLW\RI7RURQWR 

 -DSDQ 8QLYHUVLW\RI:DVKLQJWRQ 
 +XDZHL 
 0LFURVRIW 
 
 3HNLQJ8QLYHUVLW\
6LQJDSRUH 6RXWKHUQ8QLYHUVLW\RI6FLHQFHDQG7HFKQRORJ\ 
*RRJOH'HHS0LQG 
 8QLYHUVLW\RI%ULWLVK&ROXPELD 
&LW\8QLYHUVLW\RI+RQJ.RQJ 
7KH&KLQHVH8QLYHUVLW\RI+RQJ.RQJ 
 6KDQJKDL-LDR7RQJ8QLYHUVLW\ 
 &DQDGD &KLQHVH$FDGHP\RI6FLHQFHV 
86$ &DUQHJLH0HOORQ8QLYHUVLW\ 
0LFURVRIW5HVHDUFK 
7KH+RQJ.RQJ3RO\WHFKQLF8QLYHUVLW\ 
 6WDQIRUG8QLYHUVLW\ 
9HFWRU,QVWLWXWH 
+RQJ.RQJ8QLYHUVLW\RI6FLHQFHDQG7HFKQRORJ\ 
       
&KLQD 1XPEHURI3XEOLFDWLRQV

(b) Country distribution (c) Institutions

Fig. 2. Overview on LLM4AD papers.

∙ LLM Roles: This dimension examines how LLMs are integrated into algorithm development, dividing
the workds into four categories: LLMs as Optimizers (LLMaO), LLMs as Predictors (LLMaP), LLMs
as Extractors (LLMaE), and LLMs as Designers (LLMaD).
∙ Search Methods: This category involves the search methods used including the basic sampling methods
and more complex approaches like evolutionary algorithms, and uncertainty-guided methods.
∙ Prompt Methods: The effectiveness of pre-trained LLMs is heavily influenced by prompt engineering.
We investigate the typical prompt strategies used in LLM4AD including zero-shot, few-shot, chain-of-
thought, self-consistency, and reflection prompts.
∙ Applications: The broad application of LLM4AD covers diverse fields. We identify the main domains
including optimization, machine learning, industry, and scientific discovery.

3 LLM Roles
According to the roles of LLM in algorithm design, existing works can be categorized into four classes:
LLMs as Optimizers (LLMaO), LLMs as Predictors (LLMaP), LLMs as Extractors (LLMaE), and LLMs as
5
Fig. 3. The word cloud is generated from the titles and abstracts of all reviewed papers, with each word appearing at least
five times. It features the top 80 keywords, organized into four color-coded clusters.

LLM Algorithm Application


LLM4AD

LLM Roles Search Methods Prompt Methods Applications

LLM as Optimizers Random Search Zero-shot Optimization

LLM as Predictors Single-point-based Few-shot Machine Learning


Search
LLM as Extractors Chain-of-thought Industry
Population-based
LLM as Designers Search Self-consistency Science Discovery
Uncertainty-
guided Search Reflection

Fig. 4. A four-dimensional taxonomy.

Designers (LLMaD). This section presents the progress and explore the advantages and limitations associated
with each category.

3.1 LLMs as Optimizers (LLMaO)


In LLMaO, LLMs are utilized as a black-box optimizer within an algorithm framework to generate and refine
solutions (Fig. 5). The integration of LLMs into optimization tasks leverages their ability to understand
and generate complex patterns and solutions and good flexibility [86, 92, 167]. These capabilities are often
6
challenging for traditional optimizers and models to match. However, the black-box nature of LLMs typically
results in a lack of interpretability and presents difficulties in large-scale generation [86].

Problem Algorithm Solution

Solution Solution

LLM

Fig. 5. Large Language Models as Optimizers (LLMaO). LLMs serve as optimizers within the algorithm to generate new
solutions. This typically involves using the LLM in an iterative search process to enhance solution quality. Here, the algorithm
and its parameters are usually crafted by humans.

One of the initial efforts to utilize LLMs as optimizers in algorithm design is by Yang et al. [167]. They
leverage the in-context learning capabilities of LLMs to generate novel solutions for specific problems
based on previously evaluated solutions. This method is applied iteratively to refine solutions further. Yang
et al. [167] have successfully demonstrated this technique across various domains, including continuous and
combinatorial optimization, as well as machine learning tasks.
From an evolutionary algorithm (EA) perspective, using LLMs to generate solutions from existing data
can be seen as analogous to search operators in EA. For instance, Liu et al. [86] introduce the use of
LLMs as evolutionary operators to tackle multi-objective problems. This method involves breaking down a
multi-objective problem into simpler single-objective tasks, with LLMs acting as black-box search operators
for each sub-problem to suggest new solutions. In a related study, Liu et al. [92] explore the integration of
LLMs within EAs, not just for generating solutions but also for guiding selection, crossover, and mutation
processes. Meanwhile, Brahmachary et al. [14] propose a new population-based evolutionary framework that
includes both exploration and exploitation pools, with solutions being exchanged during the optimization
process and LLMs generating solutions for both pools.
Differing from direct solution generation, Lange et al. [77] investigate the use of LLMs in designing
evolution strategies, introducing a new prompting strategy to enhance the mean statistic in their EvoLLM
method, which shows superior performance over baseline algorithms in synthetic black-box optimization
functions and neuroevolution tasks. They also demonstrate that fine-tuning LLMs with data from teacher
algorithms can further improve the performance of EvoLLM. Moreover, Custode et al. [28] present a
preliminary study that uses LLMs to automate hyperparameter selection by analyzing optimization logs
and providing real-time recommendations.
Beyond traditional optimization tasks, LLMaO has been widely adopted in prompt engineering for LLMs,
a process often referred to as “automatic prompt optimization” [192]. These methods primarily involve
iterative refinement of prompts by LLMs to improve their effectiveness for specific models (typically LLMs).
Techniques include resampling-based strategies, where LLMs generate variations of original prompts while
maintaining semantic similarity [156], and reflection-based strategies, where LLMs optimize by analyzing
and learning from previous prompt iterations or errors [50], have been explored. Ma et al. [99] note that
LLM optimizers often struggle to accurately identify the root causes of errors during the reflection process,
7
influenced by their pre-existing knowledge rather than an objective analysis of mistakes. To address these
issues, they propose a new approach termed “automatic behavior optimization”, aimed at directly and more
effectively controlling the behavior of target models.

3.2 LLMs as Predictors (LLMaP)


LLMaP employs LLMs as surrogate models (Fig. 6) to predict the outcomes or responses of solutions,
functioning either in a classification or regression context [56]. Compared to other model-based predictors,
such as the Gaussian process and conventional neural networks, 1) LLMs are capable of processing and
generating human-like responses. This capability allows them to understand and interpret complex patterns
in data, making them suitable for tasks where traditional modeling techniques might struggle due to the
intricacies in the data and complicated representations [36, 161]. 2) Pre-trained LLMs can significantly
reduce the computational load and time required compared to training high-fidelity models [56, 70].

Problem Algorithm Solution

Fitness /
Solution
Category

LLM

Fig. 6. Large Language Models as Predictors (LLMaP). LLMs are utilized iteratively in algorithms to predict a solution’s
outcomes or responses, typically functioning in classification or regression tasks.

The majority of LLMaP works use LLMs as pre-trained models to predict solution scores. For instance,
LLMs have been used as performance predictors for deep neural network architectures by Jawahar et al.
[70]. It offers a cost-effective alternative for performance estimation in neural architecture search. Zhang
et al. [185] introduce LINVIT, an algorithm that incorporates guidance from LLMs as a regularization
factor in value-based RL to improve sample efficiency. Science discovery is another domain that LLMaP
has commonly investigated. For example, Li et al. [82] introduce CodonBERT for sequence optimization of
mRNA-based vaccines and therapeutics. CodonBERT uses codons as inputs and is trained on over 10 million
mRNA sequences from various organisms. Soares et al. [138] demonstrate the use of LLMs in predicting
the performance of battery electrolytes. Other applications include employing LLMs to determine the fame
score of celebrities to predict the box office performance of projects in the motion pictures industry [6]
and adopting LLMs to score the video question answering by using detailed video captions as content
proxies [182].
For classification, Hao et al. [56] introduce LAEA, which employs LLMs as surrogate models within
evolutionary algorithms for both regression and classification, eliminating the need for costly model training.
In another study, Chen et al. [23] develope a label-free node classification method that leverages LLMs
to annotate nodes. These annotations are subsequently used to train graph neural networks, resulting in
enhanced performance. Moving beyond binary classification, Bhambri et al. [12] utilize LLMs to predict
discrete actions for constructing reward shaping functions in Reinforcement Learning (RL). Their method
demonstrate effectiveness within the BabyAI environment, showcasing the versatility of LLMs in various
8
settings. Wang et al. [155] explore the use of LLMs in federated search, applying them in a zero-shot
setting to effectively select resources. This approach highlights the potential of LLMs in improving resource
selection without prior explicit training on specific tasks. Lastly, Mehrdad et al. [106] focus on automating
the relevance judgment of query-item pairs in product searches. By finetuning LLMs, they have achieved
significant improvements, underscoring the adaptability and effectiveness of LLMs in complex search tasks.
Despite these advantages, using LLMs as predictors for algorithm design also suffers from a lack of
interpretability, and the results can be influenced by the quality of prompt engineering [23]. Moreover, when
the target landscape is easily understood, conventional machine learning techniques and surrogate models
are often preferred. In such scenarios, LLMs can enhance the utilization of existing modeling methods, for
example, by aiding in the selection of the most effective model [127].

3.3 LLMs as Extractors (LLMaE)


LLMaE employs LLMs to mine and extract embedded features or specific knowledge from target problems
and/or algorithms, which are then utilized in the enhancement of algorithm-based problem solving (Fig 7).
For example, Kristiadi et al. [76] use LLMs as pre-trained feature extractors and the embeddings are used
to enhance standard Bayesian optimization surrogate models in the molecular space.

Problem Algorithm Solution

Embedding

LLM

Fig. 7. Large Language Models as Extractors (LLMaE). LLMs are employed to extract features or specific knowledge from
target problem and/or algorithms to enhance problem-solving.

Beyond embedding-based feature extraction, LLMs excel in text comprehension and knowledge extraction,
allowing them to discern subtle patterns and relationships within the data that might not be evident
through conventional feature extraction methods. For example, Wu et al. [164] utilize LLMs to extract
high-dimensional algorithm representations by comprehending code text. These representations are combined
with problem representations to determine the most suitable algorithm for a specific problem. Du et al.
[32] propose a mixture-of-experts framework augmented with LLMs to optimize various wireless user
tasks. The LLM is used to analyze user objectives and constraints, thus selecting specialized experts, and
weighing decisions from the experts, reducing the need for training new models for each unique optimization
problem. Additionally, Becker et al. [9] discuss the use of LLMs in the automotive and supplier industries,
specifically focusing on retrieval-augmented generation systems to improve information retrieval from
technical documentation. Memduhoğlu et al. [107] use LLM to enhance the classification of urban building
functions by interpreting OpenStreetMap tags and integrating them with physical and spatial metrics.
Traditional techniques, which have previously struggled with semantic ambiguities, are outperformed by
LLMs due to their superior ability to capture broader language contexts.
9
The multi-modal capabilities of LLMs also distinguish them from traditional methods. Park et al. [120]
demonstrate that incorporating lexical information extracted from LLMs into an acoustic-based speaker
diarization system through a probabilistic multi-modal decoding process and beam searches can significantly
improve speech-processing performance.

3.4 LLMs as Designers (LLMaD)


LLMaD directly creates algorithms or specific components (Fig. 8). This utilization of LLMs extends their
application beyond traditional boundaries, enabling them to actively participate in algorithm development
by generating heuristics [88], writing code snippets [59], or formulating functions [128] that can be integrated
into algorithmic systems. By doing so, LLMs can significantly accelerate the algorithm design process, reduce
human effort, and bring creativity and optimization to algorithm development [88], which is difficult to
achieve through traditional methods.

Problem Algorithm Solution

Algorithm

LLM

Fig. 8. Large Language Models as Designers (LLMaD). LLMs are used to directly create algorithms or specific components,
which are commonly incorporated iteratively to continuously search for better designs.

Function design is among the early applications of LLMaD. Eureka [100] leverages the capabilities of
LLMs in code-writing, and in-context learning to evolve and optimize reward functions for RL. It can
generate reward functions without specific prompts or predefined templates, achieving better performance
than rewards designed by human experts. Similarly, Auto MC-Reward [80] utilizes LLMs to automatically
design dense reward functions for RL agents in environments with sparse rewards. The three key components
of Auto MC-Reward work together to iteratively refine the reward function based on feedback from the
agent’s interactions with the environment. Through this iterative process, the agent is able to learn complex
tasks more efficiently, as demonstrated in experiments in Minecraft. Moreover, FunSearch [128] adopts
LLMs for function generation in an evolutionary framework with a multi-island population management. It
demonstrates promising results on both mathematical problems and combinatorial optimization problems.
EoH [88], originates from AEL [87], presents an early attempt to adopt the LLM as a designer for automated
heuristic design. It uses both heuristic ideas and code implementations to represent heuristics and adopts LLM
in an evolutionary framework to create, combine, and revise the heuristics. EoH is applied on well-studied
combinatorial optimization problems, where it surpasses both traditional human-designed metaheuristics
and deep-learning-based neural solvers. The application of EoH has expanded to Bayesian optimization [171],
image adversary attack [49], and edge server task scheduling [172], among others. Moreover, LLaMEA [150]
develops an iterative framework to generate, mutate, and select algorithms based on performance metrics
and runtime evaluations. The automatically designed algorithms outperform state-of-the-art optimization
10
algorithms on some benchmark instances. ReEvo [176] introduces an evolutionary framework with both
short and long-term reflections, which provides a search direction to explore the heuristic space. With the
reflection, better results on black-box cases (no problem-specific information is provided in the prompts) have
been observed. Unlike previous studies that focus on optimizing a single performance criterion, MEoH [170]
considers multiple performance metrics, including optimality and efficiency, and seeks a set of trade-off
algorithms in a single run in a multi-objective evolutionary framework. A dominance-dissimilarity score is
designed for effectively searching the complex algorithm space.
LLM-based agent design has also gained much attention. For example, ADAS [63] proposes an automated
design of agentic systems, which aims to automatically generate powerful agentic system designs by using
meta agents that program new agents. They present a novel algorithm, meta agent search, which iteratively
creates new agents from an archive of previous designs, demonstrating through experiments that these
agents can outperform state-of-the-art hand-designed agents across various domains. Further studies on
LLM-based agent systems are discussed in [139] and [94].
There has been a considerable effort on using LLMs for optimization modeling. Ahmed and Choudhury
[3] assess various LLMs for transforming linguistic descriptions into mathematical optimization problems,
introducing a fine-tuning framework, LM4OPT, to enhance the performance of smaller models. Tang et al.
[145] outline four key training dataset requirements for operational research LLMs and propose OR-Instruct,
a semi-automated method for generating tailored synthetic data. Additionally, Mostajabdaveh et al. [111]
present a novel multi-agent framework that translates natural language into optimization models, utilizing
relational identifier agents and a verification mechanism for improved assessment.
We identify two primary challenges in existing works on LLMaD: 1) LLMs struggle with complex algorithm
design tasks. Most existing studies focus on developing specific algorithmic components, such as key heuristics
and code snippets, rather than complete algorithms [88]. 2) Algorithm design is domain-specific, and LLMs
typically possess limited knowledge about specific algorithm design tasks. Consequently, standalone LLMs
often fall short in terms of performance [183]. The effectiveness of LLMaD depends on search frameworks
that can iteratively interact with both LLMs and their environments. Details on the search methods are
provided in the next section.

4 Search Methods
Incorporating LLMs within a search framework significantly enhances LLMs’ utility in algorithm design,
making it the preferred option [88, 100, 128]. This section categorizes existing research according to the
search methods employed, presents recent advancements, and discusses some of the limitations.

4.1 Sampling
The most straightforward search manner is by repeatedly instructing LLM to sample new designs [192]. The
best sample is selected as the final design. However, recent studies have shown that simple sampling from
LLMs can be costly [183].

4.1.1 Beam Search. Beam search explores multiple promising paths within a search space. For example, in
the context of LLM-based prompt engineering [99, 122, 192], increasing the beam size has been shown to
significantly improve performance. Beyond prompt engineering. Beam search is also employed in a variety of
11
other applications. For instance, Text2Motion [84] uses beam search for robotic task planning. Similarly,
SayCanPay [57] applies it to plan action sequences in robotics.

4.1.2 MCTS. Monte Carlo Tree Search (MCTS) is a tree search algorithm that uses random sampling to
build a search tree and make decisions based on the results of these samples. It is particularly effective in
problems with large and complex search spaces. For example, Dainese et al. [29] propose GIF-MCTS, a
code generation strategy using MCTS, in which nodes in the tree are programs and edges are actions. Each
action taken from a parent node produces a new complete program. The upper confidence bound for trees
is used to select which action to take. Similarly, VerMCTS [15] constructs a search tree with progressive
widening to effectively manage large action spaces defined by lines of code. In this search tree, expansion
and evaluation are guided by the verifier, serving as a computationally inexpensive upper bound on the
value function. Moreover, Wang et al. [156] regard prompt optimization as a strategic planning challenge,
utilizing a principled planning algorithm based on MCTS to effectively explore the expert-level prompt
space. The proposed PromptAgent generates precise insights and detailed instructions by analyzing model
errors and providing constructive feedback.

4.2 Single-point-based Search


Single-point-based search methods iteratively refine a solution by leveraging neighborhood structures [90] or
specific search directions [122]. While these methods can produce satisfactory results within a reasonable
number of iterations, they may struggle with maintaining diversity and robustness in their search processes.

4.2.1 Hillclimb. Hillclimb search iteratively searches for better results, where the new one is generated based
on the last one and the better one is survived. A basic version of Hillclimb is investigated by Zhang et al.
[183] for algorithm design, where a heuristic is iteratively refined. The reasoning over the existing status is
usually adopted to enhance each search step. For example, Li et al. [80] leverage LLMs to automatically
design dense reward functions for RL agents in environments with sparse rewards, which iteratively refines
the reward function based on feedback from the agent’s interactions with the environment. Yang et al.
[169] develop a multi-module framework with innovative feedback mechanisms, demonstrating through both
LLM-based and expert evaluations that LLMs can effectively produce scientific hypotheses that are both
novel and reflective of reality.

4.2.2 Neighborhood Search. Neighborhood search methods explore the solution space by iteratively mod-
ifying a current solution to find an improved one in a structured neighborhood. Notably, the LLM-GS
framework [90] has introduced a scheduled neighborhood search method that leverages two distinct strategies:
1) Programmatic Neighborhood Generation, which involves generating a neighborhood of programs by
selecting a node from the abstract syntax tree of a given program and substituting it with a subtree that is
randomly generated following the production rules and sampling strategies of a domain-specific language;
and 2) Latent Space Representation, where the neighborhood is defined in a latent space by training a
variational autoencoder on a corpus of randomly generated DSL programs, thereby creating a more abstract
and potentially more informative neighborhood structure. Additionally, Jin et al. [73] contribute to this line
of research by defining neighborhood algorithms based on a minimum cosine similarity threshold of 60%
12
between the embeddings of a query vector and code snippets, enhancing the search process by ensuring that
algorithmic variations are semantically coherent and aligned with the underlying problem structure.

4.2.3 Gradient-based Search. Gradient-based search utilizes a (pseudo) gradient direction to generate new
designs in each iteration. Unlike searches in a continuous space with a differentiable objective, calculating
the gradient in the algorithm design space poses significant challenges. Consequently, a pseudo descent
direction is often employed. Pryzant et al. [122] introduce automatic prompt optimization, which improves
prompts automatically by adjusting them based on natural language “gradients” derived from training data.
Similarly, Tang et al. [143] introduce GPO, which draws parallels with gradient-based model optimizers for
prompt optimization. Moreover, Nie et al. [115] explore the use of LLMs as interactive optimizers for solving
maximization problems in a text space through natural language and numerical feedback. For an accurate
gradient, Guo et al. [51] perform the gradient-based search in a mapped continuous space and propose a
collaborative optimization strategy that combines a gradient-based optimizer with an LLM for tackling
complex non-convex optimization challenges. These works employ either pseudo gradients or gradients over
mapped text space, however, a systematic study of gradients over general algorithm space is anticipated.

4.2.4 Reinforcement Learning. Reinforcement learning learns to make decisions by interacting with an
environment to maximize rewards. Zhuge et al. [194] and Duan et al. [35] leverage LLMs and RL techniques
to optimize code, reducing complexity and enhancing efficiency. Similarly, Liu et al. [90] present a novel
LLM-guided search framework called LLM-GS, which aims at addressing the sample inefficiency in state-
of-the-art programmatic RL methods by utilizing the programming expertise of LLMs to boost search
efficiency. Additionally, LLMs have been utilized in the design of RL systems, particularly in the creation of
reward-shaping functions. Moreover, Bhambri et al. [12] use LLMs to generate a guide policy for constructing
reward-shaping functions to tackle the issue of sample inefficiency.

4.3 Population-based Search


Population-based evolutionary search is the main tool investigated in LLM4AD papers due to its effectiveness
and robustness in complex search space [117, 163]. The majority of these works use a simple genetic algorithm
with greedy population management [88]. Some of them investigate advanced population management
including multi-island [128], quality-diversity [63], and dominance-dissimilarity measurement [170].

4.3.1 Single-objective Evolutionary Search. Liu et al. [92] explore the use of LLMs as evolutionary operators
in conventional EA, guiding them in solution selection, crossover, and mutation processes. Moreover,
Brahmachary et al. [14] propose a population-based evolutionary framework comprising exploration and
exploitation pools, with LLMs facilitating solution generation for both pools and enabling communication
between them during optimization. Lange et al. [77] investigate the application of LLMs in designing
evolution strategies, introducing a prompting strategy to enhance mean statistic performance.
On algorithm design tasks involving text and code search, the complicated search space poses challenges
for diversity control and population management. The majority of works adopt a greedy way [20, 49, 59, 80,
88, 112, 150, 171]. In these works, a population of individuals is maintained and only the ones with better
fitness will survive. Romera-Paredes et al. [128] adopt a multi-island evolutionary algorithm to explore a
diverse population with a dynamically adjusted population size. Moreover, Wong et al. [160] use LLMs
13
to create digital artifacts for creative and engineering applications, utilizing a multi-factorial evolutionary
algorithm to drive an LLM. Quality diversity methods [123] have also been investigated for this purpose,
Lehman et al. [78] and Hu et al. [63] utilize the MAP-Elites algorithm with niches to generate diverse and
high-quality solutions.

4.3.2 Multi-objective Evolutionary Search. Liu et al. [86] propose MOEA/D-LMO to use LLM to solve
continuous multi-objective optimization problems. Benefiting from the decomposition-based framework, the
in-context learning of LLM is used to generate new solutions for each subproblem. For multi-objective LLM
instruction generation, Yang and Li [168] develop InstOptima, which utilizes an LLM to simulate instruction
operators and improve instruction quality to simultaneously optimize performance, length, and perplexity of
the instruction. Moreover, Morris et al. [110] introduce guided evolution for neural architecture design. It
adopts NSGA-II and optimizes both accuracy and model size. In order to enhance the diversity control in
multiobjective heuristic design, MEoH [170] introduce a dominance-dissimilarity score for multi-objective
heuristic design, achieving a good balance of exploration and exploitation in the heuristic search space.

4.4 Uncertainty-guided Search


Uncertainty-guided search combines Bayesian Optimal Experiment Design (BOED) or Bayesian Optimization
(BO) with Large Language Models (LLMs). It starts with a prior based on initial parameter beliefs, then
uses uncertainty-driven strategies to iteratively refine the beliefs, enhancing decision-making efficiency in
a sample-efficient way. This method has shown effectiveness in various applications. For example, Handa
et al. [54] develop OPEN, which uses LLMs to extract environmental features and translate feature queries
into conversational natural language questions, employing BOED to select the most informative queries for
learning user preferences efficiently. Austin et al. [8] introduce the PEBOL algorithm within a BO framework
to improve multi-turn and decision-theoretic reasoning, using Beta distributions to model user preferences
probabilistically. In LLM decoding, Grosse et al. [47] propose ULTS, which treats decoding as a tree-search
problem and optimizes path selection through sequential decision-making under uncertainty.

5 Prompt Strategies
Prompt strategies are vital for effectively utilizing LLMs, particularly in algorithm design tasks that require
instructions on reasoning and reflection on the target tasks. We will begin by providing an overview of LLMs
and the prompt strategies employed in existing works, followed by a review to works using each strategy.
Fig. 9a depicts the percentage of domain or pre-trained LLMs used in the literature. Among them, over
80% choose to use the pre-trained model without any specific finetuning, and about 10% fine-tuned the
pre-trained model on domain datasets [27] in which 4.4% are trained from scratch. LLM4AD papers show a
strong preference for GPT models. GPT-4 and GPT-3.5 are the most commonly utilized LLMs, collectively
accounting for approximately 50%. Llama-2 is the most used open-source LLM. Once we have the pre-trained
LLMs, prompt engineering is significant for effectively integrating LLMs into algorithm design. LLM4AD
papers involve many prompt engineering methods (as illustrated in Fig. 9b) including Zero-Shot (ZS),
Few-Shot (FS), Chain-of-Thought (CoT), Self-Consistency (SC), and Reflection (RF) [130].
14
 




1XPEHURI3XEOLFDWLRQV





 



7K  =6

7

RQV  5)

&
)6

&R

 6
RW

W

Q

QF\
R

KW

WLR
6K

6K

RXJ

LVWH
F
)HZ

IOH
UR
=H

5H
RI

IF
DLQ

6HO
&K
(a) LLMs (b) Prompt engineering techniques

Fig. 9. LLM types and prompt engineering methods

5.1 Zero-Shot
Zero-Shot (ZS) prompting enables a model to comprehend and perform tasks without prior specific training
on those tasks. In algorithm design, zero-shot prompting allows direct requests to the LLM for solutions
or responses, examples including conversational replies [132], assertion design [109], RL guidance [185],
and molecule generation [173]. The model leverages its pre-trained knowledge to generate these responses.
Although this method is advantageous for swiftly producing solutions based on general knowledge, it may
not consistently deliver tailored responses to the nuanced demands of complex algorithmic challenges [183].

5.2 Few-Shot
Few-Shot (FS) prompting involves providing the model with a handful of examples to illustrate the task before
it attempts to solve a similar problem. This approach enhances the model’s understanding of the context
and its ability to tailor responses to specific tasks. In algorithm design, this method proves particularly
effective by presenting the model with examples of algorithms [87], solutions [167], prompts [143, 156], and
codes [101, 110]. The examples used in the prompt may be manually-designed seeds [63, 128] or derived
from LLM-generated designs [28, 88, 98, 110, 143, 156, 160]. These samples are typically sorted to suggest a
preference to enhance performance [128, 167].

5.3 Chain-of-Thought
Chain-of-Thought (CoT) prompting encourages the model to articulate intermediate steps or reasoning
paths that lead to the final answer. This technique is particularly valuable in algorithm design, where
understanding the step-by-step process or engaging in instructed reasoning over existing designs helps
prevent outlier results and fosters effective algorithm development. For instance, Liu et al. [88] introduce
several prompting strategies, one of which directs the LLM to reason over existing heuristics, summarize
the general pipeline, and design a new one based on this reasoning. Custode et al. [28] prompt the LLM to
evaluate whether the current step size is sufficient for convergence and to suggest adjustments based on
15
its reasoning. In addition, multiple LLM-based agents are employed within a CoT framework by Sun et al.
[140] to enhance heuristic design for a boolean satisfiability problem solver.

5.4 Self-consistency
Self-Consistency (SC) involves generating multiple answers or solutions to the same prompt and then
synthesizing them to improve accuracy and reliability. In algorithm design, this could mean prompting
the model multiple times for different approaches to solve a problem and then comparing these solutions
to identify the most efficient or robust algorithm. This approach leverages the model’s ability to generate
diverse solutions and builds a more comprehensive understanding of possible strategies. For example, Guan
et al. [48] set the number of responses as 10 and uses the majority vote to get the predicted label. As each
response may provide different keywords or regular expressions, it takes the union of the keywords or regular
expressions to create a candidate set. Moreover, Lehman et al. [79] sample multiple revised codes from the
same selected code and updates the population accordingly.

5.5 Reflection
Reflection (RF) in prompt engineering involves asking the model to critique or evaluate its own responses or
solutions. After generating an algorithm/solution, the model can be prompted to reflect on the efficiency,
potential flaws, or improvements. Ma et al. [99] investigate both implicit and explicit reflection in LLM-driven
prompt optimization, which lets LLM analyze the errors and generate a reflection or feedback regarding the
current prompt. An additional study by Ye et al. [175] incorporates short-term and long-term reflections in
heuristic design, while Zhang et al. [179] focus on self-evaluation and reasoning over results from network
detection. Furthermore, both Ma et al. [100] and Narin [112] adopt LLM for RL reward function design with
a reward reflection that monitors the scalar values of all reward components and the task fitness function at
various policy checkpoints during training.

6 Applications
6.1 Optimization
In this subsection, we delve into the practical applications of LLMs in algorithm design for optimization.
We categorize the existing literature into combinatorial optimization, continuous optimization, Bayesian
optimization, prompt optimization, and optimization modeling based on the specific domains of optimization
applications. Then we proceed to compare the various roles played by LLMs, the prompt strategies employed,
and the specific problems or tasks to which they are applied. The comparative analysis is summarized in
Table 1, where we list the names of the frameworks or methods proposed by the authors. For studies that
do not explicitly name their methods, we assign appropriate designations in our article and denote them
with asterisks for easy reference (e.g., MH-LLM* for [131]).

6.1.1 Combinatorial Optimization. In the domain of Combinatorial Optimization (CO), automated algorithm
heuristics design has been a significant area of interest for a long time. The Traveling Salesman Problem
(TSP) stands out as one of the most renowned CO problems, involving the quest for the shortest route to
visit all specified locations exactly once and return to the starting point. Some recent work leverages LLMs
to evolve algorithms within Evolutionary Computation (EC) framework, such as AEL [87], ReEvo [176] and
16
EoH [88]. Differently, OPRO [167] employs LLMs as optimizers with a proposed meta-prompt, in which
the solution-score pairs with task descriptions are added in each optimization step. Additionally, LMEA
[91] investigates the utilization of LLMs as evolutionary combinatorial optimizers for generating offspring
solutions, wherein a self-adaptation mechanism is introduced to balance exploration and exploitation. The
Capacitated Vehicle Routing Problem (CVRP) extends the TSP by introducing constraints related to vehicle
capacity. To address this challenge, MLLM [68] devises a multi-modal LLM-based framework with textual
and visual inputs to enhance optimization performance. In addition to routing problems, other combinatorial
optimization problems that have also been investigated include cap set [128], bin packing [88], flow job shop
scheduling [88] and social networks problems [131].

6.1.2 Continuous Optimization. In the realm of single-objective continuous optimization, LLaMEA [150]
utilizes LLMs to automate the evolution of algorithm design. It demonstrates the effectiveness in generating
new metaphor-based optimization algorithms on BBOB benchmark [55] within IOHexperimenter bench-
marking tool [30], which supports evaluating the quality of the generated algorithms and also provides
feedback to the LLM during evolution. Instead of creating new algorithms, EvolLLM [77] introduces a
prompt strategy that enables LLM-based optimization to act as an Evolution Strategy (ES) and showcases
robust performance on synthetic BBOB functions and neuroevolution tasks. OPRO [167] illustrates that
LLMs can effectively capture optimization directions for linear regression problems by leveraging the past
optimization trajectory from the meta-prompt. Additionally, LEO [14] devises an explore-exploit policy
using LLMs for solution generation, the method has been tested in both benchmark functions as well as
industrial engineering problems. Different with directly employing LLMs for generating solutions, LAEA [56]
introduces LLM-based surrogate models for both regression and classification tasks and has been validated
on 2D test functions using nine mainstream LLMs.
Multi-objective optimization problems (MOPs) seek to identify a set of optimal solutions, referred to
as Pareto optimal solution set. An initial exploration of utilizing LLMs to tackle MOPs is introduced in
MOEA/D-LMO [86]. Benefiting from the decomposition-based framework, the in-context learning process
of LLMs is easily incorporated to generate candidate solutions for each subproblem derived from the
original MOP. In the realm of large-scale MOPs, LLM-MOEA* [136] showcases the inferential capabilities
of LLMs in multi-objective sustainable infrastructure planning problem. The study highlights the LLM’s
proficiency in filtering crucial decision variables, automatically analyzing the Pareto front, and providing
customized inferences based on varying levels of expertise. Additionally, CMOEA-LLM [158] leverages LLMs
with evolutionary search operators to address the challenges of constrained MOPs and exhibits robust
competitiveness in DAS-CMOP test suite [39].

6.1.3 Bayesian Optimization. Bayesian optimization (BO) is a model-based optimization paradigm for
solving expensive optimization problems and has found wide applications in various real-world scenarios
[44]. It typically employs a surrogate model to approximate the expensive function and well-designed
Acquisition Functions (AFs) to select potential solutions in a sample-efficient manner. To facilitate the direct
generation of solutions using LLMs, HPO-LLM* [181] provides LLMs with an initial set of instructions that
outlines the specific dataset, model, and hyperparameters to propose recommended hyperparameters for
evaluation in Hyperparameter Optimization (HPO) tasks. Furthermore, LLAMBO [93] incorporates LLM
capabilities to enhance BO efficiency, in which three specific enhancements throughout the BO pipeline
17
have been systematically investigated on tasks selected from Bayesmark [148] and HPOBench [37]. Instead
of utilizing LLMs for direct solution generation, BO-LIFT* [125] utilizes predictions with uncertainties
provided by a Language-Interfaced Fine-Tuning (LIFT) framework [31] with LLMs to perform BO for
catalyst optimization using natural language. EvolCAF [171] introduces a novel paradigm integrating LLMs
within EC framework to design AFs automatically for cost-aware BO, the approach showcases remarkable
efficiency and generalization for synthetic functions and HPO tasks with heterogeneous costs. Similarly,
FunBO [1] discoveres novel and well-performing AFs for BO by extending FunSearch [128], the discovered
AFs are evaluated on various global optimization benchmarks in and out of the training distribution and
HPO tasks for RBF-based SVM and AdaBoost algorithms.

6.1.4 Prompt Optimization. Prompt optimization aims to identify the most effective task prompt that
maximizes the performance of the LLM on a specific task dataset. Despite of requiring specialized training
for each specific task, traditional discrete or continuous approaches [83, 121] typically necessitate access to
the logits or internal states of LLMs, which may not be applicable when the LLM can only be accessed
through an API. To address these issues, recent works propose to model the optimization problem in
natural language with LLMs as prompts. APE [192] utilizes the LLM as an inference model to generate
instruction candidates directly based on a small set of demonstrations in the form of input-output pairs. This
approach has demonstrated human-level performance on various tasks, including Instruction Induction [60]
and Big-Bench Hard (BBH) [141]. OPRO [167] enables the LLM as an optimizer to gradually generate new
prompts based on the full optimization trajectory, the optimizer prompt showcases significant improvement
compared with human-designed prompts on BBH and GSM8K [25]. Inspired by the numerical gradient
descent method, APO [122] conducts textual “gradient descent” by identifying the current prompts’ flaws
and adjusting the prompt in the opposite semantic direction of the gradient. Similar practices are also
found in the gradient-inspired LLM-based prompt optimizer named GPO [143], as well as the collaborative
optimization approach [52] integrating a gradient-based optimizer and an LLM-based optimizer. Differently,
Guo et al. [50] introduce a discrete prompt tuning framework named EvoPrompt that prompts LLM to act
like evolutionary operators in generating new candidate prompts, harnessing the benefits of evolutionary
algorithms that strike a good balance between exploration and exploitation. StrategyLLM [45] integrates
four LLM-based agents—strategy generator, executor, optimizer, and evaluator—to collaboratively induce
and deduce reasoning. This method generates more generalizable and consistent few-shot prompts than CoT
prompting techniques.

6.1.5 Optimization Modeling. Optimization modeling [11] aims to construct a mathematical model of a
real-world optimization problem in a standard form, represented with an objective function and a set
of constraints [2, 145]. Recently, LLMs have opened up promising avenues for automating optimization
modeling, facilitating the formulation of problems and the implementation of solutions, thereby democratizing
expertise in optimization skills and domain knowledge in various applications. For example, OptiMUS
[2] is developed as an LLM-based agent to formulate and solve optimization problems interpreted from
natural language. Despite its outstanding performance on the proposed NLP4LP dataset [2], the heavy
reliance on sensitive data submission to proprietary LLMs can pose data privacy concerns in industrial
applications. To solve this issue, Tang et al. [145] propose training open-source LLMs with OR-Instruct,
a semi-automated process for generating synthetic data customized to meet specific requirements. The
18
best-performing resulting model named ORLM demonstrates state-of-the-art performance on NL4Opt [124],
MAMO [67], and the proposed IndustryOR benchmark. A related study is found in LM4OPT [3], which
is a progressive fine-tuning framework to enhance LLMs’ specificity for optimization problem formulation
task. To make it accessible to decision makers lacking problem-related modeling skills, DOCP [159] aids
in decision-making by interacting with the user in natural language, and utilizes human’s knowledge and
feedback to create and solve a problem-specific optimization model.

Table 1. An Overview of Optimization Applications Utilizing Language Models Across Various Domains and Tasks.

Application Method Role of LLM Prompt Strategy Specific Problems or Tasks


AEL [87] LLMaD FS, CoT TSP
ReEvo [176] LLMaD CoT, RF TSP
OPRO [167] LLMaO FS TSP
Combinatorial Optimization
LMEA [91] Mixed FS TSP
MLLM [68] Mixed ZS, FS CVRP
FunSearch [128] LLMaD FS Cap Set Problem, Online BPP
EoH [88] LLMaD FS, OS, CoT TSP, Online BPP, FSSP
MH-LLM* [131] LLMaD ZS, FS Social Networks Problem
LLaMEA [150] LLMaD CoT BBOB
EvoLLM [77] LLMaO FS BBOB, Neuroevolution
OPRO [167] LLMaO FS Linear Regression
Continuous Optimization
LEO [14] LLMaO FS Numerical Benchmarks, Industrial Engineering Problems
LAEA [56] LLMaP FS, CoT Ellipsoid, Rosenbrock, Ackley, Griewank
MOEA/D-LMO [86] LLMaO FS ZDT, UF
LLM-MOEA* [136] LLMaE ZS Multi-objective Sustainable Infrastructure Planning Problem
CMOEA-LLM [158] LLMaO FS DAS-CMOP
HPO-LLM* [181] LLMaO FS HPOBench
LLAMBO [93] Mixed FS, ZS, CoT Bayesmark, HPOBench
Bayesian Optimization
BO-LIFT* [125] LLMaD FS Catalyst Optimization
EvolCAF [171] LLMaD FS, OS, CoT Synthetic Functions, HPO
FunBO [1] LLMaD FS Synthetic Functions, HPO
APE [192] LLMaO FS Instruction Induction, BBH
OPRO [167] LLMaO FS GSM8K, BBH
Prompt Optimization APO [122] LLMaO ZS, FS NLP Benchmark Classification Tasks
GPO [143] LLMaO FS Reasoning, Knowledge-intensive, NLP Tasks
MaaO [52] LLMaO FS NLU Tasks, Image Classification Tasks
EvoPrompt [50] LLMaO CoT, FS Language Understanding and Generation Tasks, BBH
StrategyLLM [45] Mixed ZS, FS, CoT Reasoning Tasks
DOCP [159] LLMaO FS, CoT Facility Location Problems
Optimization Modeling OptiMUS [2] Mixed ZS, FS NL4LP
ORLMs [145] LLMaD ZS NL4Opt, MAMO, IndustryOR
LM4OPT [3] LLMaD FS, ZS NL4Opt
AS-LLM [164] LLMaE ZS Algorithm Selection
Other Applications LLaMoCo [101] LLMaD FS Optimization Code Generation

6.1.6 Other Applications. In addition to the previously mentioned optimization applications, we have
identified further uses of LLMs in related fields. For example, AS-LLM [164] utilizes LLMs to extract and
select key algorithm features for algorithm selection, enhancing the match between algorithms and problems.
LLaMoCo [101] introduces the first instruction-tuning framework for LLMs, optimizing code generation with
a structured instruction set and a two-phase training approach, thereby minimizing the need for extensive
domain expertise and prompt design effort.
19
6.2 Machine Learning
In this subsection, we investigate the applications of LLMs in the machine learning domain, focusing on
their contribution to algorithmic design. These applications are summarized in Table 2.

6.2.1 Task Planning with LLMs. The advent of LLMs has garnered significant interest in their application
to planning tasks, owing to their remarkable generative capabilities of instructions in the form of natural
language. Pioneering work by Huang et al. [65] demonstrates the effectiveness of LLMs in leveraging world
knowledge to produce actionable plans for VirtualHome environments. This method was further enhanced
by Ahn et al. [5], who employed a value function to ground the world knowledge, enabling the derivation of
an optimal policy for robotic task execution in lifelike kitchen settings. In subsequent research, Huang et al.
[66] broaden the scope of grounded information in a more adaptive way. Moreover, Lin et al. [84] utilize
greedy search in combination with heuristic algorithms to refine the planning system. Distinctly, Singh
et al. [137] tackle planning challenges by integrating programming languages, thereby generating code to
streamline planning processes. A more comprehensive approach, named SayCanPay [57], is developed for
planning by considering both the affordance function and reward within the task framework.

6.2.2 Reinforcement Learning. Reinforcement Learning (RL) has been the de facto standard for sequential
decision-making tasks, and recently, the synergy between RL and LLMs has emerged as a novel trend in the
domain. This convergence mirrors the dynamics of task planning, yet places RL at the core of its methodology.
Many of the LLM4AD papers on RL is for automatically designing the reward functions [12, 100, 112]. In
addition, Shah et al. [133] investigate the employment of LLMs for heuristic planning to steer the search
process within RL frameworks. Zhang et al. [185] integrate LLMs into RL by introducing a Kullback-Leibler
divergence regularization term that aligns LLM-driven policies with RL-derived policies. LLMs have also
extended their reach to multi-agent RL scenarios, as shown by Du et al. [32], who illustrates their application
within a Mixture-of-Experts system to direct RL models in the realm of intelligent network solutions.

6.2.3 Neural Architecture Search. Neural Architecture Search (NAS), which is a significant focus within the
AutoML community, has been investigated in many LLM4AD papers. For example, Chen et al. [20] have
integrated LLMs with evolutionary search to successfully generate NAS code for diverse tasks. Nasir et al.
[113] introduce a quality-diversity algorithm tailored to NAS, producing architectures for CIFAR-10 and
NAS-bench-201 benchmarks. Moreover, Morris et al. [110] introduce guided evolution for the development
of neural architectures and suggest the concept of the evolution of thought in algorithm design. Except
for using LLM for design, Jawahar et al. [70] employ LLMs in predicting NAS performance, combining
this approach with evolutionary search to effectively create novel network architectures. In contrast to the
LLM-based architecture design and performance prediction, Zhou et al. [191] explore the adoption of LLMs
for transferring design principles to narrow and guide the search space.

6.2.4 Graph Learning. Graph learning is another application with the advancing capabilities of LLMs in
symbolic reasoning and graph processing. For example, Chen et al. [23] apply LLMs to the task of labeling
in Text-Attributed Graphs (TAGs), capitalizing on the language task proficiency of LLMs. Both Mao et al.
[104] and Chen et al. [21] adopt LLMs in an evolutionary framework for designing functions. The former
evolves heuristic code functions to identify critical nodes in a graph while the latter identifies meta-structures
20
within heterogeneous information networks to enhance their interpretability. Moreover, knowledge graphs
have also seen substantial benefits from the application of LLMs. Zhang et al. [184] introduce AutoAlign, a
method that employs LLMs to semantically align entities across different knowledge graphs, and Feng et al.
[41] develop the knowledge search language to effectively conduct searches within knowledge graphs.

6.2.5 Dataset Labeling. LLMs have been used for mining semantic and multi-modal information from
datasets. LLMs are employed to train interpretable classifiers to extract attributes from images [24] and to
generate label functions for weakly supervised learning [48].

Table 2. An Overview of Machine Learning Applications Utilizing Language Models Across Various Domains and Tasks.

Application Method Role of LLM Prompt Strategy Specific Problems or Tasks


Zero-Planner [65] LLMaP ZS, FS VirtualHome Tasks
SayCan [5] LLMaP FS Real-world Kitchen Tasks with Robots
Task Planning LLM-GM [66] LLMaP FS Real-world Kitchen Tasks with Robots
Text2Motion [84] LLMaP FS Long Sequence Tasks for Robots
ProgPrompt [137] LLMaD FS, COT VirtualHome Tasks
SayCanPay [57] LLMaP FS VirtualHome Tasks, Ravens Tasks, BabyAI Tasks
LFG [133] LLMaP FS, CoT ObjectNav Tasks, Real-world Tasks
SLINVIT [185] LLMaP ZS, FS ALFWorld Tasks, InterCode Tasks, BlocksWorld Tasks
Reinforcement Learning MEDIC [12] LLMaP ZS BabyAI Tasks
Eureka [100] LLMaD FS, CoT IsaacGym Tasks, Bidexterous Manipulation Tasks
EROM [112] LLMaD ZS, CoT IsaacGym Tasks
LLM-MOE [32] LLMaP FS, CoT Intelligent Networks
EvoPrompting [19] LLMaD FS, ZS, CoT MNIST Dataset, CLRS Algorithmic Reasoning
Neural Architecture Search HS-NAS [70] LLMaP FS, ZS, CoT Machine Translation Tasks
LLMatic [113] LLMaD FS, ZS, CoT CIFAR-10 Dataset, NAS-bench-201 Benchmarks
LAPT [191] LLMaD ZS, FS NAS201, Trans101, DARTs
LLM-GE [110] LLMaD FS, ZS, CoT CIFAR-10 Dataset
LLM-Critical [104] LLMaD FS, ZS, CoT Critical Node Identification
LLM-GNN [23] LLMaP FS, CoT Label-free Node Classification
Graph Learning
ReStruct [21] LLMaE FS, ZS Meta-structure Discovery
AutoAlign [184] LLMaE FS, ZS Entity Type Inference
KSL [41] LLMaP FS Knowledge Search
Inter-Classier [24] LLMaO FS, ZS iNaturalist Datasets, KikiBouba datasets
Dataset Labeling
DataSculpt [48] LLMaP FS Label Function Design
LLM2FEA [160] LLMaP FS, CoT Objective-oriented Generation
Other Applications tnGPS [178] LLMaD FS, OS, CoT Tensor Network Structure Search
L-AutoDA [49] LLMaD FS, OS, CoT Adversarial Attack

6.2.6 Other Applications. Other applications of LLMs extend to a myriad of machine learning tasks. Notably,
Wong et al. [160] capitalize on multi-task learning and the iterative refinement of prompts to foster innovative
design approaches. Zeng et al. [178] integrate LLMs with evolutionary search to develop a heuristic function
aimed at efficiently sifting through candidate tensor sets representing the primal tensor network. Furthermore,
Guo et al. [49] have blazed a trail in employing LLMs to generate novel decision-based adversarial attack
algorithms, thus opening up a new diagram for the automatic assessment of model robustness.

6.3 Science Discovery


This subsection is dedicated to exploring LLM-based scientific discoveries related to algorithm designs.
Table. 3 lists the related works in this domain.
21
6.3.1 General Science Discovery. In the realm of scientific discovery, LLMs are usually adopted for equation
or functioin search. Notably, Du et al. [34] introduce LLM4ED, a framework that employs iterative strategies,
including a black-box optimizer and evolutionary operators, to generate and optimize equations. This
approach has shown significant advancements in stability and usability for uncovering physical laws from
nonlinear dynamic systems. Similarly, Shojaee et al. [135] present LLM-SR, which combines LLMs’ extensive
scientific knowledge and code generation capabilities with evolutionary search. This framework excels in
proposing and refining initial equation structures based on physical understanding, outperforming traditional
symbolic regression methods in discovering physically accurate equations across multiple scientific domains. A
bilevel optimization framework, named SGA, is introduce by Ma et al. [98]. It merges the abstract reasoning
capabilities of LLMs with the computational power of simulations. This integration facilitates hypothesis
generation and discrete reasoning with simulations for experimental feedback and optimization of continuous
variables, leading to improved performance.

6.3.2 Chemistry, Biology & Physics. In the field of chemistry, LLMs can be applied not only to conventional
molecular generation and design [13, 69], but also to specialized areas such as drug molecule design [173],
chemical reaction prediction and optimization [126], and catalyst design [153], providing customized solutions.
Furthermore, LLMs have also shown promising application prospects in materials discovery [71, 180], synthesis
route planning [89], green chemistry [129], and other areas. These studies demonstrate the advantages of
LLMs in molecular representation and optimization.
In biology, LLMs are increasingly being used for tasks such as protein engineering [97, 134], enzyme
design [146], and biological sequence analysis [42]. By combining LLMs with vast amounts of biological data,
they can more accurately predict interactions between biological molecules and significantly improve the
efficiency of bioinformatics workflows. This has important implications for drug discovery and therapeutic
protein design. The unique sequence generation and optimization capabilities of LLMs offer new possibilities
for tackling combinatorial optimization challenges in biomacromolecular design.
Although applications in physics are relatively fewer, some emerging work has demonstrated the broad
prospects of LLMs. Pan et al. [119] use multi-step prompt templates to prove that LLMs can perform complex
analytical calculations in theoretical physics. Quantum computing algorithm design, physics simulation
optimization, and computational methods in condensed matter.

6.3.3 Mechanics. MechAgents [114] introduces a class of physics-inspired generative machine learning
platforms that utilize multiple LLMs to solve mechanics problems, such as elasticity, by autonomously
collaborating to write, execute, and self-correct code using finite element methods. Moreover, Du et al. [33]
adopt LLMs in automatically discovering governing equations from data, utilizing the models’ generation and
reasoning capabilities to iteratively refine candidate equations. This approach, tested on various nonlinear
systems including the Burgers, Chafee–Infante, and Navier–Stokes equations, demonstrates a superior ability
to uncover correct physical laws and shows better generalization on unseen data compared to other models.
Another example in fluid dynamics has been investigated by Zhu et al. [193], which introduces FLUID-LLM,
a novel framework that integrates pre-trained LLMs to enhance the prediction of unsteady fluid dynamics.
AutoTurb [186], on the other hand, adopts LLMs in an evolutionary framework to automatically design and
search for turbulence model in computational fluid dynamics. Furthermore, Buehler [17] study LLM-based
22
methods for forward and inverse mechanics problems including bio-inspired hierarchical honeycomb design,
carbon nanotube mechanics, and protein unfolding.

Table 3. An Overview of Science Discovery Applications Utilizing Language Models Across Various Domains and Tasks.

Application Method Role of LLM Specific Problems or Tasks


Bilevel [98] LLMaD Physical Scientific Discovery
General Scientific Equation Discovery LLM-SR [135] LLMaD Scientific Equation Discovery
LLM4ED [34] LLMaD Equation Discovery
ChatChemTS [69] LLMaD Molecule Design
Debjyoti Bhattacharya, et al. [13] LLMaO Molecule Design
Agustinus Kristiadi, et al. [76] LLMaE Molecule Design
Chemical
BoChemian [126] LLMaE Chemical Reaction
Multi-modal MoLFormer [138] LLMaP Battery Electrolytes Formulation Design
CataLM [153] LLMaP Catalyst Design
Gavin Ye [173] LLMaD Drug Design
DrugAssist [174] LLMaO Drug Design
MLDE [147] LLMaP Protein Design
Prollama [97] Mixed Protein Design
Biology
X-LoRA [16] LLMaP Protein Design
CodonBERT [82] LLMaP mRNA Design and Optimization
Revisiting-PLMs [62] LLMaP Protein Function Prediction
FLUID-LLM [193] LLMaP Computational Fluid Dynamics
MechAgents [114] LLMaD Mechanics Design
Mechanics
MeLM [17] LLMaP, LLMaD Carbon Nanotubes and Proteins Design
Mengge Du, et al. [33] LLMaD Nonlinear Dynamics Equation Discovery
AutoTurb [186] LLMaD Computational Fliud Dynamics

6.4 Industry
This subsection explores the transformative impact of LLMs on algorithm design across various industries.
As we envision the future of manufacturing, LLMs are poised to play a pivotal role. Existing developments
focus on enhancing research and investment to refine these models while assessing their performance in
terms of safety, bias, and utility. LLMs can be seen as a gateway between humans and machines, facilitating
informed decision-making.
For instance, LLMs are instrumental in building a comprehensive intelligent 6G network system [95],
addressing challenges such as low latency, high-frequency bands, high bandwidth, high transmission rates,
and overall intelligence [190]. Moreover, LLMs have the potential to revolutionize the telecom industry by
streamlining tasks and reducing the need for extensive manpower and engineering expertise. However, it is
crucial to address existing challenges to fully realize their potential [102].
Recently, the application of LLMs in Electronic Design Automation (EDA) has emerged as a promising
area of exploration. LLM-based solutions have been developed to enhance interactions with EDA tools [40].
In particular, within the VLSI design flow, it is essential to verify that the implementation conforms to its
specifications. To achieve this, system verilog assertions need to be generated automatically. Generative AI
techniques, such as LLMs, can be effectively utilized to facilitate this process [109].
To enhance the reliability and availability of cloud services, conducting root cause analysis (RCA) for
cloud incidents is essential. RCACopilot [22] effectively matches incoming incidents with the appropriate
23
incident handlers based on their alert types. It aggregates critical runtime diagnostic information, predicts the
incident’s root cause category, and provides explanatory results without the need for manual investigations.
More applicable scenarios include various industrial design challenges, such as chip design [18], car shape
design [161], aircraft concept design [96] UI design [43], and robot design [177]. LLMs primarily leverage
billions of web resources. However, practical applications require fine-tuning with thousands of specific
representations of design intent to recognize the distinct patterns associated with each design task. Other
traditional industrial tasks, such as itinerary planning, can also be enhanced by leveraging the power of
LLMs [166]. LLM-based open-domain urban itinerary planning [144] combines spatial optimization with
LLMs to create detailed urban itineraries tailored to users’ requests.

7 Challenges of LLMs in Algorithm Design


7.1 Scalability
One of the primary limitations of LLMs in algorithm design is their scalability. LLMs are limited by a fixed
context size, restricting the amount of information they can process at once, which is a challenge for complex
algorithmic tasks requiring detailed specifications and extensive content. Additionally, even when input and
output lengths are sufficient, LLMs struggle with comprehension and reasoning over long inputs, impacting
their effectiveness in designing complex algorithms. For instance, LLMs tend to solve only low-dimensional
problems when used as optimizers [167], and they are employed to design a function or heuristic component
of an algorithm rather than the entire algorithm when used as designers [88].

7.2 Interpretability
Interpretability is another challenge when using pre-trained LLMs for algorithm design. It is difficult to
trace how specific inputs lead to particular outputs in LLMs due to their black-box nature. This opacity can
be problematic, especially in critical applications, e.g., some industry applications, where understanding the
decision-making process is essential for trust and reliability. Efforts to enhance the interpretability involve
techniques such as feature visualization, model simplification [86], and the study of the contribution of different
parts of the data to the model’s behavior [7]. Despite these efforts, achieving principled interpretability
without compromising the performance of LLMs remains a key area of ongoing research.

7.3 Security
Integrating LLMs into algorithm design introduces significant security risks, including the mishandling of
personal data and the potential creation of harmful code. LLMs are trained on extensive datasets that
often contain sensitive information, posing a risk of privacy breaches if these models inadvertently generate
outputs influenced by this data. Additionally, the autonomous nature of LLMs can lead to the unintentional
development of flawed or malicious algorithms, which is a critical concern in sectors like financial services and
personal data management [187]. To address these issues, it is vital to establish stringent data governance
and model training protocols that prevent the inclusion of sensitive data, rigorously test algorithms under
various conditions to prevent unintended behaviors, and continuously update security measures to counteract
emerging vulnerabilities.
24
7.4 Cost
The utilization of LLMs in algorithm design also faces challenges related to high cost in model training and
usage. Training domain LLMs requires significant computational resources, often involving extensive GPU
or TPU usage over several weeks or even months, which incurs high financial and time costs. Once trained,
the inference cost—the computational expense involved in applying the model to algorithm design—also
becomes a critical factor, especially when the model is used frequently, such as the evolutionary search used
in many LLM4AD works [128]. For practical applications in algorithm design, where multiple iterations and
tests might be necessary, these costs can accumulate, making the use of LLMs potentially less viable for
researchers without substantial computational budgets.

7.5 Idea Innovation


Despite extensive research on LLM4AD, their capacity for generating novel ideas remains questionable.
LLMs are adept at interpolating within their training data but often falter in extrapolating to new scenarios.
In algorithm design, this means LLMs can suggest or slightly enhance existing algorithms but struggle
to develop radically new methods. Although there is evidence of LLMs producing new ideas in certain
tasks [171], consistently generating practical, innovative concepts continues to be a challenge [63].

8 Future Directions
8.1 Domain LLM for Algorithm Design
Instead of using a general pre-trained general-purpose LLM, it is worthwhile studying how to train an
LLM specifically for automatic algorithm design tasks. The following aspects can be explored in developing
domain LLMs: 1) Training domain LLMs are costly and resource-consuming. On the one hand, advanced
light fine-tuning techniques (e.g., LoRA [61]) could be used for efficient domain adaptation. On the other
hand, the size of algorithm LLMs for a specific application can be reduced with the help of domain data and
knowledge. 2) Generating and collecting domain data for algorithm design poses challenges. Unlike general
code generation or linguistic process tasks, there is no large and formatted data specific for algorithm design.
Some Github repositories on algorithms might be helpful, but the resources should be scanned and cleaned
for algorithm design. Lehman et al. [79] provide an example of generation domain data for LLMs, where the
samples generated during searching are used as the training dataset to fine-tune a pre-trained model resulting
in better performance on the target problem. 3) Instead of learning a text and code generation model, how
to learn the algorithm development ideas and the algorithmic reasoning ability remain unexplored [116, 157].

8.2 Multi-modal LLM for Algorithm Design


The primary focus of existing LLM4AD works is on utilizing the text comprehension and generation
capabilities of LLMs, whether in language, code, or statistics. One advantage of LLMs compared to
traditional model-based optimization is their ability to process multi-modal information like humans, which
is rarely studied. Some attempts have been conducted to showcase the advantages of incorporating multi-
modal information in algorithm design, as evidenced by Huang et al. [68] and Narin [112]. It is anticipated
that more methods and applications utilizing multi-modal LLMs will be developed in the future.
25
8.3 Interaction with Human Experts
Further research is needed to explore the interaction between LLMs and human experts in algorithm
design [81]. For example, in LLMaD works, LLMs can be seen as intelligent agents, making it feasible for
human experts to step in and take over tasks such as generating, modifying, and evaluating algorithms.
Investigating how to facilitate efficient and productive collaboration between LLMs and human experts
would be valuable. Ideas and techniques in collective intelligence [103] can be used for this purpose.

8.4 Multi-objective Algorithm Design


There are usually multiple criteria (e.g., optimal solution generated by algorithm, optimization efficiency, and
generalization performance) in real-world algorithm design. The majority of existing LLM4AD papers consider
one objective [88, 128] focusing only the optimal solutions. A preliminary attempt has been conducted
by Yao et al. [170], where a dominance-dissimilarity measurement, considering both the dominance relation
in objective space and the distance in code space, is designed for efficient multi-objective search in the
heuristics space. More future works are anticipated to develop new methods for effective multi-objective
algorithm design.

8.5 LLM-based Algorithm Assessment


LLMs can be benefitial in algorithm assessment. Some attempts have been carried out on test instance
generation and algorithm evaluation. For example, Jorgensen et al. [74] use LLMs to generate test cases of
increasing difficulties to enhance genetic programming and OMNI-EPIC [38] utilizes LLMs to automatically
produce code that defines the next engaging and learnable tasks for algorithm evaluation. Further research
in this area is anticipated.

8.6 Understand the Behavior of LLM


In the majority of works, LLM works as a black-box model. The interpretation of LLM’s behavior not
only enriches our understanding of LLM’s behavior but also benefits cases where it is challenging or costly
to directly request LLMs. Some attempts have been made to approximate and understand the in-context
learning behavior of LLM in solution generation. For example, Liu et al. [86] has designed a white-box linear
operator to approximate the results of LLM in multi-objective evolutionary optimization. In spite of these
preliminary attempts, how to interpret LLM’s behavior is still an open question in many algorithm design
cases including heuristic generation and idea exploration.

8.7 Fully Automatic Algorithm Design


Fully automatic algorithm design faces two primary challenges: 1) the generation of novel algorithmic ideas
and 2) the creation of complex, lengthy code. While the generation of new ideas has been investigated in
some works [88], the complete design of algorithms (instead of a heuristic component), remains a challenge.
Existing applications typically focus on automating components within predefined algorithm frameworks
rather than creating new algorithms from scratch. Future research needs to tackle these complexities to
advance the field of fully automatic algorithm design.

8.8 Benchmarking LLM4AD


Benchmarking allows for a fair, standardized, and convenient comparison, which can identify best practices
and enable generating new insights for innovation. While we are glad to witness the emergence of diverse
26
research works and applications, LLM4AD still lacks a benchmarking on either algorithm design test sets for
systematic and scientific assessment or a strict pipeline and evaluation criteria on large language models for
algorithm design. Several attempts have been made on related topics such as mathematical reasoning [85] and
planning [149]. [151] presents an algorithm test set built on some basic algorithms and [105] demonstrates the
superior performance of LLM when compared to neural algorithm reasoners. In the future, more benchmarks
are expected and they are going to play a crucial role in advancing LLM4AD.

9 Conclusion
This paper has provided a systematic and up-to-date survey on large language models for algorithm design
(LLM4AD). By systematically reviewing a significant corpus of main contributions in this emerging research
field, this paper not only highlights the current state and evolution of LLM applications in algorithm design
but also introduces a multi-dimensional taxonomy with comprehensive lists of papers that categorize the
roles and functionalities of LLMs in this domain. This taxonomy serves as a framework for both academic
and industrial researchers to understand and navigate the complex landscape of algorithm design using
LLMs. We have also identified the limitations and challenges currently faced by the field, such as issues
of scalability, interpretability, and security. Moreover, we have highlighted and discussed some promising
research directions to inspire and guide future studies.
As we look forward, the intersection of LLMs and algorithm design holds promising potential for
revolutionizing how algorithms are developed and implemented. We hope this paper can contribute to a
deeper understanding of this potential and set the stage for future innovations and collaborations in this
emerging avenue of research.

References
[1] Virginia Aglietti, Ira Ktena, Jessica Schrouff, Eleni Sgouritsa, Francisco JR Ruiz, Alexis Bellot, and Silvia Chiappa. 2024.
FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch. arXiv preprint arXiv:2406.04824
(2024).
[2] Ali AhmadiTeshnizi, Wenzhi Gao, and Madeleine Udell. 2023. Optimus: Optimization modeling using mip solvers and
large language models. arXiv preprint arXiv:2310.06116 (2023).
[3] Tasnim Ahmed and Salimur Choudhury. 2024. LM4OPT: Unveiling the potential of Large Language Models in
formulating mathematical optimization problems. INFOR: Information Systems and Operational Research (2024),
1–14.
[4] Janice Ahn, Rishu Verma, Renze Lou, Di Liu, Rui Zhang, and Wenpeng Yin. 2024. Large language models for
mathematical reasoning: Progresses and challenges. arXiv preprint arXiv:2402.00157 (2024).
[5] Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana
Gopalakrishnan, Karol Hausman, Alexander Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan,
Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J. Joshi, Ryan Julian, Dmitry Kalashnikov,
Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao,
Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev,
Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, and Mengyuan Yan. 2022. Do As I Can, Not As I Say:
Grounding Language in Robotic Affordances. CoRR (2022).
[6] Mohammad Alipour-Vaezi and Kwok-Leung Tsui. 2024. Data-Driven Portfolio Management for Motion Pictures
Industry: A New Data-Driven Optimization Methodology Using a Large Language Model as the Expert. arXiv preprint
arXiv:2404.07434 (2024).
[7] Zeyuan Allen-Zhu and Yuanzhi Li. 2023. Physics of language models: Part 3.1, knowledge storage and extraction.
arXiv preprint arXiv:2309.14316 (2023).
[8] David Eric Austin, Anton Korikov, Armin Toroghi, and Scott Sanner. 2024. Bayesian Optimization with LLM-Based
Acquisition Functions for Natural Language Preference Elicitation. arXiv preprint arXiv:2405.00981 (2024).
27
[9] Robert Becker, Laura Steffny, Thomas Bleistein, and Dirk Werth. 2024. From Data to Design: LLM-enabled information
extraction across industries. atp magazin 66, 6-7 (2024), 80–87.
[10] Yoshua Bengio, Andrea Lodi, and Antoine Prouvost. 2021. Machine learning for combinatorial optimization: a
methodological tour d’horizon. European Journal of Operational Research 290, 2 (2021), 405–421.
[11] John Berry and Ken Houston. 1995. Mathematical modelling. Gulf Professional Publishing.
[12] Siddhant Bhambri, Amrita Bhattacharjee, Huan Liu, and Subbarao Kambhampati. 2024. Efficient Reinforcement
Learning via Large Language Model-based Search. arXiv preprint arXiv:2405.15194 (2024).
[13] Debjyoti Bhattacharya, Harrison J Cassady, Michael A Hickner, and Wesley F Reinhart. 2024. Large Language Models
as Molecular Design Engines. Journal of Chemical Information and Modeling (2024).
[14] Shuvayan Brahmachary, Subodh M Joshi, Aniruddha Panda, Kaushik Koneripalli, Arun Kumar Sagotra, Harshil Patel,
Ankush Sharma, Ameya D Jagtap, and Kaushic Kalyanaraman. 2024. Large Language Model-Based Evolutionary
Optimizer: Reasoning with elitism. arXiv preprint arXiv:2403.02054 (2024).
[15] David Brandfonbrener, Simon Henniger, Sibi Raja, Tarun Prasad, Chloe Loughridge, Federico Cassano, Sabrina Ruixin
Hu, Jianang Yang, William E. Byrd, Robert Zinkov, and Nada Amin. 2024. VerMCTS: Synthesizing Multi-Step
Programs using a Verifier, a Large Language Model, and Tree Search. arXiv:2402.08147 [cs.SE]
[16] Eric L Buehler and Markus J Buehler. 2024. X-LoRA: Mixture of low-rank adapter experts, a flexible framework for
large language models with applications in protein mechanics and molecular design. APL Machine Learning 2, 2
(2024).
[17] Markus J Buehler. 2023. MeLM, a generative pretrained language modeling framework that solves forward and inverse
mechanics problems. Journal of the Mechanics and Physics of Solids 181 (2023), 105454.
[18] Kaiyan Chang, Kun Wang, Nan Yang, Ying Wang, Dantong Jin, Wenlong Zhu, Zhirong Chen, Cangyuan Li, Hao Yan,
Yunhao Zhou, et al. 2024. Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data
augmentation framework. arXiv preprint arXiv:2403.11202 (2024).
[19] Angelica Chen, David Dohan, and David So. 2023. EvoPrompting: Language Models for Code-Level Neural Architecture
Search. In Advances in Neural Information Processing Systems.
[20] Angelica Chen, David Dohan, and David So. 2024. EvoPrompting: language models for code-level neural architecture
search. Advances in Neural Information Processing Systems 36 (2024).
[21] Lin Chen, Fengli Xu, Nian Li, Zhenyu Han, Meng Wang, Yong Li, and Pan Hui. 2024. Large Language Model-driven
Meta-structure Discovery in Heterogeneous Information Network. In Proceedings of the 30th ACM SIGKDD Conference
on Knowledge Discovery and Data Mining, KDD. ACM.
[22] Yinfang Chen, Huaibing Xie, Minghua Ma, Yu Kang, Xin Gao, Liu Shi, Yunjie Cao, Xuedong Gao, Hao Fan, Ming
Wen, et al. 2024. Automatic root cause analysis via large language models for cloud incidents. In Proceedings of the
Nineteenth European Conference on Computer Systems. 674–688.
[23] Zhikai Chen, Haitao Mao, Hongzhi Wen, Haoyu Han, Wei Jin, Haiyang Zhang, Hui Liu, and Jiliang Tang. 2024. Label-
free Node Classification on Graphs with Large Language Models (LLMs). In The Twelfth International Conference on
Learning Representations. https://openreview.net/forum?id=hESD2NJFg8
[24] Mia Chiquier, Utkarsh Mall, and Carl Vondrick. 2024. Evolving Interpretable Visual Classifiers with Large Language
Models. CoRR (2024).
[25] Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry
Tworek, Jacob Hilton, Reiichiro Nakano, et al. 2021. Training verifiers to solve math word problems. arXiv preprint
arXiv:2110.14168 (2021).
[26] Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. 2022. Introduction to algorithms. MIT
press.
[27] Chris Cummins, Volker Seeker, Dejan Grubisic, Baptiste Roziere, Jonas Gehring, Gabriel Synnaeve, and Hugh
Leather. 2024. Meta Large Language Model Compiler: Foundation Models of Compiler Optimization. arXiv preprint
arXiv:2407.02524 (2024).
[28] Leonardo Lucio Custode, Fabio Caraffini, Anil Yaman, and Giovanni Iacca. 2024. An investigation on the use of
Large Language Models for hyperparameter tuning in Evolutionary Algorithms. In Proceedings of the Genetic and
Evolutionary Computation Conference Companion. 1838–1845.
[29] Nicola Dainese, Matteo Merler, Minttu Alakuijala, and Pekka Marttinen. 2024. Generating Code World Models with
Large Language Models Guided by Monte Carlo Tree Search. arXiv preprint arXiv:2405.15383 (2024).
[30] Jacob de Nobel, Furong Ye, Diederick Vermetten, Hao Wang, Carola Doerr, and Thomas Bäck. 2024. Iohexperimenter:
Benchmarking platform for iterative optimization heuristics. Evolutionary Computation (2024), 1–6.
[31] Tuan Dinh, Yuchen Zeng, Ruisu Zhang, Ziqian Lin, Michael Gira, Shashank Rajput, Jy-yong Sohn, Dimitris Papail-
iopoulos, and Kangwook Lee. 2022. Lift: Language-interfaced fine-tuning for non-language machine learning tasks.
Advances in Neural Information Processing Systems 35 (2022), 11763–11784.
28
[32] Hongyang Du, Guangyuan Liu, Yijing Lin, Dusit Niyato, Jiawen Kang, Zehui Xiong, and Dong In Kim. 2024. Mixture
of Experts for Network Optimization: A Large Language Model-enabled Approach. arXiv preprint arXiv:2402.09756
(2024).
[33] Mengge Du, Yuntian Chen, Zhongzheng Wang, Longfeng Nie, and Dongxiao Zhang. 2024. Large language models for
automatic equation discovery of nonlinear dynamics. Physics of Fluids 36, 9 (2024).
[34] Mengge Du, Yuntian Chen, Zhongzheng Wang, Longfeng Nie, and Dongxiao Zhang. 2024. LLM4ED: Large Language
Models for Automatic Equation Discovery. arXiv preprint arXiv:2405.07761 (2024).
[35] Shukai Duan, Nikos Kanakaris, Xiongye Xiao, Heng Ping, Chenyu Zhou, Nesreen K Ahmed, Guixiang Ma, Mihai
Capota, Theodore L Willke, Shahin Nazarian, et al. 2023. Leveraging Reinforcement Learning and Large Language
Models for Code Optimization. arXiv preprint arXiv:2312.05657 (2023).
[36] Naoki Egami, Musashi Hinck, Brandon Stewart, and Hanying Wei. 2024. Using imperfect surrogates for downstream
inference: Design-based supervised learning for social science applications of large language models. Advances in
Neural Information Processing Systems 36 (2024).
[37] Katharina Eggensperger, Philipp Müller, Neeratyoy Mallik, Matthias Feurer, René Sass, Aaron Klein, Noor Awad,
Marius Lindauer, and Frank Hutter. 2021. HPOBench: A collection of reproducible multi-fidelity benchmark problems
for HPO. arXiv preprint arXiv:2109.06716 (2021).
[38] Maxence Faldor, Jenny Zhang, Antoine Cully, and Jeff Clune. 2024. OMNI-EPIC: Open-endedness via Models of
human Notions of Interestingness with Environments Programmed in Code. arXiv preprint arXiv:2405.15568 (2024).
[39] Zhun Fan, Wenji Li, Xinye Cai, Hui Li, Caimin Wei, Qingfu Zhang, Kalyanmoy Deb, and Erik Goodman. 2020.
Difficulty adjustable and scalable constrained multiobjective test problem toolkit. Evolutionary computation 28, 3
(2020), 339–378.
[40] Wenji Fang, Mengming Li, Min Li, Zhiyuan Yan, Shang Liu, Hongce Zhang, and Zhiyao Xie. 2024. Assertllm:
Generating and evaluating hardware verification assertions from design specifications via multi-llms. arXiv preprint
arXiv:2402.00386 (2024).
[41] Chao Feng, Xinyu Zhang, and Zichu Fei. 2023. Knowledge Solver: Teaching LLMs to Search for Domain Knowledge
from Knowledge Graphs. (2023).
[42] Ruijun Feng, Chi Zhang, and Yang Zhang. 2024. Large language models for biomolecular analysis: From methods to
applications. TrAC Trends in Analytical Chemistry (2024), 117540.
[43] Sidong Feng, Mingyue Yuan, Jieshan Chen, Zhenchang Xing, and Chunyang Chen. 2023. Designing with Language:
Wireframing UI Design Intent with Generative Large Language Models. arXiv preprint arXiv:2312.07755 (2023).
[44] Peter I Frazier and Jialei Wang. 2016. Bayesian optimization for materials design. Information science for materials
discovery and design (2016), 45–75.
[45] Chang Gao, Haiyun Jiang, Deng Cai, Shuming Shi, and Wai Lam. 2023. Strategyllm: Large language models as strategy
generators, executors, optimizers, and evaluators for problem solving. arXiv preprint arXiv:2311.08803 (2023).
[46] Karan Girotra, Lennart Meincke, Christian Terwiesch, and Karl T Ulrich. 2023. Ideas are dimes a dozen: Large language
models for idea generation in innovation. Available at SSRN 4526071 (2023).
[47] Julia Grosse, Ruotian Wu, Ahmad Rashid, Philipp Hennig, Pascal Poupart, and Agustinus Kristiadi. 2024. Uncertainty-
Guided Optimization on Large Language Model Search Trees. arXiv preprint arXiv:2407.03951 (2024).
[48] Naiqing Guan, Kaiwen Chen, and Nick Koudas. 2023. Can Large Language Models Design Accurate Label Functions?
CoRR (2023). arXiv:2311.00739
[49] Ping Guo, Fei Liu, Xi Lin, Qingchuan Zhao, and Qingfu Zhang. 2024. L-autoda: Leveraging large language models for
automated decision-based adversarial attacks. arXiv preprint arXiv:2401.15335 (2024).
[50] Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, and Yujiu Yang. 2023.
Connecting large language models with evolutionary algorithms yields powerful prompt optimizers. arXiv preprint
arXiv:2309.08532 (2023).
[51] Zixian Guo, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, and Wangmeng Zuo. [n. d.]. Two Optimizers Are Better
Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning. arXiv:2405.19732 [cs.CV]
[52] Zixian Guo, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, and Wangmeng Zuo. 2024. Two Optimizers Are Better
Than One: LLM Catalyst for Enhancing Gradient-Based Optimization. arXiv preprint arXiv:2405.19732 (2024).
[53] Muhammad Usman Hadi, Qasem Al Tashi, Abbas Shah, Rizwan Qureshi, Amgad Muneer, Muhammad Irfan, Anas
Zafar, Muhammad Bilal Shaikh, Naveed Akhtar, Jia Wu, et al. 2024. Large language models: a comprehensive survey
of its applications, challenges, limitations, and future prospects. Authorea Preprints (2024).
[54] Kunal Handa, Yarin Gal, Ellie Pavlick, Noah Goodman, Jacob Andreas, Alex Tamkin, and Belinda Z Li. 2024. Bayesian
preference elicitation with language models. arXiv preprint arXiv:2403.05534 (2024).
[55] Nikolaus Hansen and Raymond Ros. 2010. Black-box optimization benchmarking of NEWUOA compared to BIPOP-
CMA-ES: on the BBOB noiseless testbed. In Proceedings of the 12th annual conference companion on Genetic and
29
evolutionary computation. 1519–1526.
[56] Hao Hao, Xiaoqun Zhang, and Aimin Zhou. 2024. Large Language Models as Surrogate Models in Evolutionary
Algorithms: A Preliminary Study. arXiv preprint arXiv:2406.10675 (2024).
[57] Rishi Hazra, Pedro Zuidberg Dos Martires, and Luc De Raedt. 2024. Saycanpay: Heuristic planning with large language
models using learnable domain knowledge. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38.
20123–20133.
[58] Cheng He, Ye Tian, and Zhichao Lu. [n. d.]. Artificial Evolutionary Intelligence (AEI): Evolutionary Computation
Evolves with Large Language Models. ([n. d.]).
[59] Erik Hemberg, Stephen Moskal, and Una-May O’Reilly. 2024. Evolving code with a large language model. Genetic
Programming and Evolvable Machines 25, 2 (2024), 21.
[60] Or Honovich, Uri Shaham, Samuel R Bowman, and Omer Levy. 2022. Instruction induction: From few examples to
natural language task descriptions. arXiv preprint arXiv:2205.10782 (2022).
[61] Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. 2022. LoRA:
Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations.
[62] Mingyang Hu, Fajie Yuan, Kevin Yang, Fusong Ju, Jin Su, Hui Wang, Fei Yang, and Qiuyang Ding. 2022. Exploring
evolution-aware &-free protein language models as protein function predictors. Advances in Neural Information
Processing Systems 35 (2022), 38873–38884.
[63] Shengran Hu, Cong Lu, and Jeff Clune. 2024. Automated design of agentic systems. arXiv preprint arXiv:2408.08435
(2024).
[64] Sen Huang, Kaixiang Yang, Sheng Qi, and Rui Wang. 2024. When Large Language Model Meets Optimization. arXiv
preprint arXiv:2405.10098 (2024).
[65] Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. 2022. Language Models as Zero-Shot Planners:
Extracting Actionable Knowledge for Embodied Agents. In International Conference on Machine Learning, ICML
(Proceedings of Machine Learning Research). PMLR.
[66] Wenlong Huang, Fei Xia, Dhruv Shah, Danny Driess, Andy Zeng, Yao Lu, Pete Florence, Igor Mordatch, Sergey
Levine, Karol Hausman, and Brian Ichter. 2023. Grounded Decoding: Guiding Text Generation with Grounded Models
for Embodied Agents. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural
Information Processing Systems 2023, NeurIPS.
[67] Xuhan Huang, Qingning Shen, Yan Hu, Anningzhe Gao, and Benyou Wang. 2024. Mamo: a Mathematical Modeling
Benchmark with Solvers. arXiv preprint arXiv:2405.13144 (2024).
[68] Yuxiao Huang, Wenjie Zhang, Liang Feng, Xingyu Wu, and Kay Chen Tan. 2024. How multimodal integration
boost the performance of llm for optimization: Case study on capacitated vehicle routing problems. arXiv preprint
arXiv:2403.01757 (2024).
[69] Shoichi Ishida, Tomohiro Sato, Teruki Honma, and Kei Terayama. 2024. Large Language Models Open New Way of
AI-Assisted Molecule Design for Chemists. (2024).
[70] Ganesh Jawahar, Muhammad Abdul-Mageed, Laks VS Lakshmanan, and Dujian Ding. 2023. LLM Performance
Predictors are good initializers for Architecture Search. arXiv preprint arXiv:2310.16712 (2023).
[71] Shuyi Jia, Chao Zhang, and Victor Fung. 2024. LLMatDesign: Autonomous Materials Discovery with Large Language
Models. arXiv preprint arXiv:2406.13163 (2024).
[72] Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. 2024. A Survey on Large Language Models for
Code Generation. arXiv preprint arXiv:2406.00515 (2024).
[73] Matthew Jin, Syed Shahriar, Michele Tufano, Xin Shi, Shuai Lu, Neel Sundaresan, and Alexey Svyatkovskiy. 2023.
Inferfix: End-to-end program repair with llms. In Proceedings of the 31st ACM Joint European Software Engineering
Conference and Symposium on the Foundations of Software Engineering. 1646–1656.
[74] Steven Jorgensen, Giorgia Nadizar, Gloria Pietropolli, Luca Manzoni, Eric Medvet, Una-May O’Reilly, and Erik
Hemberg. 2024. Large Language Model-based Test Case Generation for GP Agents. In Proceedings of the Genetic and
Evolutionary Computation Conference. 914–923.
[75] J Kleinberg. 2006. Algorithm Design. Vol. 92. Pearson Education.
[76] Agustinus Kristiadi, Felix Strieth-Kalthoff, Marta Skreta, Pascal Poupart, Alan Aspuru-Guzik, and Geoff Pleiss. 2024.
A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?.
In Forty-first International Conference on Machine Learning. https://openreview.net/forum?id=Pa3GyTe3kf
[77] Robert Lange, Yingtao Tian, and Yujin Tang. 2024. Large language models as evolution strategies. In Proceedings of
the Genetic and Evolutionary Computation Conference Companion. 579–582.
[78] Joel Lehman, Jonathan Gordon, Shawn Jain, Kamal Ndousse, Cathy Yeh, and Kenneth O Stanley. 2023. Evolution
through large models. In Handbook of Evolutionary Machine Learning. Springer, 331–366.

30
[79] Joel Lehman, Jonathan Gordon, Shawn Jain, Kamal Ndousse, Cathy Yeh, and Kenneth O. Stanley. 2024. Evolution
Through Large Models. Springer Nature Singapore, Singapore, 331–366.
[80] Hao Li, Xue Yang, Zhaokai Wang, Xizhou Zhu, Jie Zhou, Yu Qiao, Xiaogang Wang, Hongsheng Li, Lewei Lu, and Jifeng
Dai. 2024. Auto mc-reward: Automated dense reward design with large language models for minecraft. In Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16426–16435.
[81] Jiayang Li, Jiale Li, and Yunsheng Su. 2024. A Map of Exploring Human Interaction Patterns with LLM: Insights into
Collaboration and Creativity. In International Conference on Human-Computer Interaction. Springer, 60–85.
[82] Sizhen Li, Saeed Moayedpour, Ruijiang Li, Michael Bailey, Saleh Riahi, Lorenzo Kogler-Anele, Milad Miladi, Jacob
Miner, Dinghai Zheng, Jun Wang, et al. 2023. Codonbert: Large language models for mrna design and optimization.
bioRxiv (2023), 2023–09.
[83] Xiang Lisa Li and Percy Liang. 2021. Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint
arXiv:2101.00190 (2021).
[84] Kevin Lin, Christopher Agia, Toki Migimatsu, Marco Pavone, and Jeannette Bohg. 2023. Text2motion: From natural
language instructions to feasible plans. Autonomous Robots 47, 8 (2023), 1345–1365.
[85] Xiaohan Lin, Qingxing Cao, Yinya Huang, Zhicheng Yang, Zhengying Liu, Zhenguo Li, and Xiaodan Liang. 2024. ATG:
Benchmarking Automated Theorem Generation for Generative Language Models. arXiv preprint arXiv:2405.06677
(2024).
[86] Fei Liu, Xi Lin, Zhenkun Wang, Shunyu Yao, Xialiang Tong, Mingxuan Yuan, and Qingfu Zhang. 2023. Large Language
Model for Multi-objective Evolutionary Optimization. arXiv preprint arXiv:2310.12541 (2023).
[87] Fei Liu, Xialiang Tong, Mingxuan Yuan, and Qingfu Zhang. 2023. Algorithm evolution using large language model.
arXiv preprint arXiv:2311.15249 (2023).
[88] Fei Liu, Tong Xialiang, Mingxuan Yuan, Xi Lin, Fu Luo, Zhenkun Wang, Zhichao Lu, and Qingfu Zhang. 2024.
Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model. In Forty-first
International Conference on Machine Learning.
[89] Gang Liu, Michael Sun, Wojciech Matusik, Meng Jiang, and Jie Chen. 2024. Multimodal Large Language Models for
Inverse Molecular Design with Retrosynthetic Planning. arXiv preprint arXiv:2410.04223 (2024).
[90] Max Liu, Chan-Hung Yu, Wei-Hsu Lee, Cheng-Wei Hung, Yen-Chun Chen, and Shao-Hua Sun. 2024. Synthesiz-
ing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search. arXiv preprint
arXiv:2405.16450 (2024).
[91] Shengcai Liu, Caishun Chen, Xinghua Qu, Ke Tang, and Yew-Soon Ong. 2024. Large language models as evolutionary
optimizers. In 2024 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1–8.
[92] Siyi Liu, Chen Gao, and Yong Li. 2024. Large Language Model Agent for Hyper-Parameter Optimization. arXiv
preprint arXiv:2402.01881 (2024).
[93] Tennison Liu, Nicolás Astorga, Nabeel Seedat, and Mihaela van der Schaar. 2024. Large language models to enhance
bayesian optimization. arXiv preprint arXiv:2402.03921 (2024).
[94] Zhiwei Liu, Weiran Yao, Jianguo Zhang, Liangwei Yang, Zuxin Liu, Juntao Tan, Prafulla K Choubey, Tian Lan, Jason
Wu, Huan Wang, et al. 2024. AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent
System. arXiv preprint arXiv:2402.15538 (2024).
[95] Sifan Long, Fengxiao Tang, Yangfan Li, Tiao Tan, Zhengjie Jin, Ming Zhao, and Nei Kato. 2024. 6G comprehensive
intelligence: network operations and optimization based on Large Language Models. arXiv preprint arXiv:2404.18373
(2024).
[96] Jorge Lovaco, Raghu Chaitanya Munjulury, Ingo Staack, and Petter Krus. 2024. LARGE LANGUAGE MODEL-
DRIVEN SIMULATIONS FOR SYSTEM OF SYSTEMS ANALYSIS IN FIREFIGHTING AIRCRAFT CONCEPTUAL
DESIGN. In 34th Congress of the International Council of the Aeronautical Sciences, 2024.
[97] Liuzhenghao Lv, Zongying Lin, Hao Li, Yuyang Liu, Jiaxi Cui, Calvin Yu-Chian Chen, Li Yuan, and Yonghong
Tian. 2024. Prollama: A protein large language model for multi-task protein language processing. arXiv preprint
arXiv:2402.16445 (2024).
[98] Pingchuan Ma, Tsun-Hsuan Wang, Minghao Guo, Zhiqing Sun, Joshua B Tenenbaum, Daniela Rus, Chuang Gan, and
Wojciech Matusik. 2024. LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific
Discovery. arXiv preprint arXiv:2405.09783 (2024).
[99] Ruotian Ma, Xiaolei Wang, Xin Zhou, Jian Li, Nan Du, Tao Gui, Qi Zhang, and Xuanjing Huang. 2024. Are Large
Language Models Good Prompt Optimizers? arXiv preprint arXiv:2402.02101 (2024).
[100] Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi
Fan, and Anima Anandkumar. 2024. Eureka: Human-Level Reward Design via Coding Large Language Models. In The
Twelfth International Conference on Learning Representations.

31
[101] Zeyuan Ma, Hongshu Guo, Jiacheng Chen, Guojun Peng, Zhiguang Cao, Yining Ma, and Yue-Jiao Gong. 2024. LLaMoCo:
Instruction Tuning of Large Language Models for Optimization Code Generation. arXiv preprint arXiv:2403.01131
(2024).
[102] Ali Maatouk, Nicola Piovesan, Fadhel Ayed, Antonio De Domenico, and Merouane Debbah. 2024. Large language
models for telecom: Forthcoming impact on the industry. IEEE Communications Magazine (2024).
[103] Thomas W Malone and Michael S Bernstein. 2022. Handbook of collective intelligence. MIT press.
[104] Jinzhu Mao, Dongyun Zou, Li Sheng, Siyi Liu, Chen Gao, Yue Wang, and Yong Li. 2024. Identify Critical Nodes in
Complex Network with Large Language Models. (2024).
[105] Sean McLeish, Avi Schwarzschild, and Tom Goldstein. 2024. Benchmarking ChatGPT on Algorithmic Reasoning. arXiv
preprint arXiv:2404.03441 (2024).
[106] Navid Mehrdad, Hrushikesh Mohapatra, Mossaab Bagdouri, Prijith Chandran, Alessandro Magnani, Xunfan Cai, Ajit
Puthenputhussery, Sachin Yadav, Tony Lee, ChengXiang Zhai, et al. 2024. Large Language Models for Relevance
Judgment in Product Search. arXiv preprint arXiv:2406.00247 (2024).
[107] Abdulkadir Memduhoğlu, Nir Fulman, and Alexander Zipf. 2024. Enriching building function classification using Large
Language Model embeddings of OpenStreetMap Tags. Earth Science Informatics (2024), 1–16.
[108] Jean-Yves Potvin Michel Gendreau. 2019. Handbook of metaheuristics. Springer.
[109] Samit Shahnawaz Miftah, Amisha Srivastava, Hyunmin Kim, and Kanad Basu. 2024. Assert-O: Context-based Assertion
Optimization using LLMs. In Proceedings of the Great Lakes Symposium on VLSI 2024. 233–239.
[110] Clint Morris, Michael Jurado, and Jason Zutty. 2024. LLM Guided Evolution-The Automation of Models Advancing
Models. arXiv preprint arXiv:2403.11446 (2024).
[111] Mahdi Mostajabdaveh, Timothy T Yu, Rindranirina Ramamonjison, Giuseppe Carenini, Zirui Zhou, and Yong Zhang.
2024. Optimization modeling and verification from problem specifications using a multi-agent multi-stage LLM
framework. INFOR: Information Systems and Operational Research (2024), 1–19.
[112] Ali Narin. 2024. Evolutionary Reward Design and Optimization with Multimodal Large Language Models. In Proceedings
of the 3rd Workshop on Advances in Language and Vision Research (ALVR). 202–208.
[113] Muhammad U Nasir, Sam Earle, Julian Togelius, Steven James, and Christopher Cleghorn. 2023. LLMatic: Neural
Architecture Search via Large Language Models and Quality-Diversity Optimization. arXiv preprint arXiv:2306.01102
(2023).
[114] Bo Ni and Markus J Buehler. 2024. MechAgents: Large language model multi-agent collaborations can solve mechanics
problems, generate new data, and integrate knowledge. Extreme Mechanics Letters 67 (2024), 102131.
[115] Allen Nie, Ching-An Cheng, Andrey Kolobov, and Adith Swaminathan. 2024. The Importance of Directional Feedback
for LLM-based Optimizers. arXiv preprint arXiv:2405.16434 (2024).
[116] OpenAI. 2024. Introducing OpenAI o1-preview.
[117] Una-May O’Reilly and Erik Hemberg. 2024. Using Large Language Models for Evolutionary Search. In Proceedings of
the Genetic and Evolutionary Computation Conference Companion. 973–983.
[118] Vishal Pallagani, Bharath Chandra Muppasani, Kaushik Roy, Francesco Fabiano, Andrea Loreggia, Keerthiram
Murugesan, Biplav Srivastava, Francesca Rossi, Lior Horesh, and Amit Sheth. 2024. On the prospects of incorporating
large language models (llms) in automated planning and scheduling (aps). In Proceedings of the International
Conference on Automated Planning and Scheduling, Vol. 34. 432–444.
[119] Haining Pan, Nayantara Mudur, Will Taranto, Maria Tikhanovskaya, Subhashini Venugopalan, Yasaman Bahri,
Michael P Brenner, and Eun-Ah Kim. 2024. Quantum Many-Body Physics Calculations with Large Language Models.
arXiv preprint arXiv:2403.03154 (2024).
[120] Tae Jin Park, Kunal Dhawan, Nithin Koluguri, and Jagadeesh Balam. 2024. Enhancing speaker diarization with large
language models: A contextual beam search approach. In ICASSP 2024-2024 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP). IEEE, 10861–10865.
[121] Archiki Prasad, Peter Hase, Xiang Zhou, and Mohit Bansal. 2022. Grips: Gradient-free, edit-based instruction search
for prompting large language models. arXiv preprint arXiv:2203.07281 (2022).
[122] Reid Pryzant, Dan Iter, Jerry Li, Yin Tat Lee, Chenguang Zhu, and Michael Zeng. 2023. Automatic prompt optimization
with" gradient descent" and beam search. arXiv preprint arXiv:2305.03495 (2023).
[123] Justin K Pugh, Lisa B Soros, and Kenneth O Stanley. 2016. Quality diversity: A new frontier for evolutionary
computation. Frontiers in Robotics and AI 3 (2016), 202845.
[124] Rindranirina Ramamonjison, Timothy Yu, Raymond Li, Haley Li, Giuseppe Carenini, Bissan Ghaddar, Shiqi He, Mahdi
Mostajabdaveh, Amin Banitalebi-Dehkordi, Zirui Zhou, et al. 2023. Nl4opt competition: Formulating optimization
problems based on their natural language descriptions. In NeurIPS 2022 Competition Track. PMLR, 189–203.
[125] Mayk Caldas Ramos, Shane S Michtavy, Marc D Porosoff, and Andrew D White. 2023. Bayesian optimization of
catalysts with in-context learning. arXiv preprint arXiv:2304.05341 (2023).
32
[126] Bojana Ranković and Philippe Schwaller. 2023. BoChemian: Large language model embeddings for Bayesian optimization
of chemical reactions. In NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the
Real World.
[127] Thiago Rios, Felix Lanfermann, and Stefan Menzel. 2024. Large language model-assisted surrogate modelling for
engineering optimization. In 2024 IEEE Conference on Artificial Intelligence (CAI). IEEE, 796–803.
[128] Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M Pawan Kumar, Emilien
Dupont, Francisco JR Ruiz, Jordan S Ellenberg, Pengming Wang, Omar Fawzi, et al. 2024. Mathematical discoveries
from program search with large language models. Nature 625, 7995 (2024), 468–475.
[129] Emily F Ruff, Jeanne L Franz, and Joseph K West. 2024. Using ChatGPT for Method Development and Green
Chemistry Education in Upper-Level Laboratory Courses. Journal of Chemical Education 101, 8 (2024), 3224–3232.
[130] Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, and Aman Chadha. 2024. A systematic
survey of prompt engineering in large language models: Techniques and applications. arXiv preprint arXiv:2402.07927
(2024).
[131] Camilo Chacón Sartori, Christian Blum, Filippo Bistaffa, and Guillem Rodríguez Corominas. 2024. Metaheuristics and
Large Language Models Join Forces: Towards an Integrated Optimization Approach. arXiv preprint arXiv:2405.18272
(2024).
[132] Ivan Sekulić, Mohammad Alinannejadi, and Fabio Crestani. 2024. Analysing utterances in llm-based user simulation
for conversational search. ACM Transactions on Intelligent Systems and Technology 15, 3 (2024), 1–22.
[133] Dhruv Shah, Michael Robert Equi, Błażej Osiński, Fei Xia, Brian Ichter, and Sergey Levine. 2023. Navigation with
large language models: Semantic guesswork as a heuristic for planning. In Conference on Robot Learning. PMLR,
2683–2699.
[134] Yiqing Shen, Zan Chen, Michail Mamalakis, Yungeng Liu, Tianbin Li, Yanzhou Su, Junjun He, Pietro Liò, and
Yu Guang Wang. 2024. TourSynbio: A Multi-Modal Large Model and Agent Framework to Bridge Text and Protein
Sequences for Protein Engineering. arXiv preprint arXiv:2408.15299 (2024).
[135] Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, and Chandan K Reddy. 2024. LLM-SR:
Scientific Equation Discovery via Programming with Large Language Models. arXiv preprint arXiv:2404.18400
(2024).
[136] Gaurav Singh and Kavitesh Kumar Bali. 2024. Enhancing Decision-Making in Optimization through LLM-Assisted
Inference: A Neural Networks Perspective. arXiv preprint arXiv:2405.07212 (2024).
[137] Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason,
and Animesh Garg. 2023. ProgPrompt: Generating Situated Robot Task Plans using Large Language Models. In IEEE
International Conference on Robotics and Automation, ICRA. IEEE.
[138] Eduardo Soares, Vidushi Sharma, Emilio Vital Brazil, Young-Hye Na, and Renato Cerqueira. 2024. Capturing
Formulation Design of Battery Electrolytes with Chemical Large Language Model. (2024).
[139] Chuanneng Sun, Songjun Huang, and Dario Pompili. 2024. LLM-based Multi-Agent Reinforcement Learning: Current
and Future Directions. arXiv preprint arXiv:2405.11106 (2024).
[140] Yiwen Sun, Xianyin Zhang, Shiyu Huang, Shaowei Cai, Bing-Zhen Zhang, and Ke Wei. 2024. AutoSAT: Automatically
Optimize SAT Solvers via Large Language Models. arXiv preprint arXiv:2402.10705 (2024).
[141] Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha
Chowdhery, Quoc V Le, Ed H Chi, Denny Zhou, et al. 2022. Challenging big-bench tasks and whether chain-of-thought
can solve them. arXiv preprint arXiv:2210.09261 (2022).
[142] Ke Tang and Xin Yao. 2024. Learn to Optimize-A Brief Overview. National Science Review (2024), nwae132.
[143] Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao, Siyuan Lu, Yaliang Li, and Ji-Rong Wen. 2024. Unleashing the Potential
of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers.
arXiv preprint arXiv:2402.17564 (2024).
[144] Yihong Tang, Zhaokai Wang, Ao Qu, Yihao Yan, Kebing Hou, Dingyi Zhuang, Xiaotong Guo, Jinhua Zhao, Zhan Zhao,
and Wei Ma. 2024. Synergizing Spatial Optimization with Large Language Models for Open-Domain Urban Itinerary
Planning. arXiv preprint arXiv:2402.07204 (2024).
[145] Zhengyang Tang, Chenyu Huang, Xin Zheng, Shixi Hu, Zizhuo Wang, Dongdong Ge, and Benyou Wang. 2024. ORLM:
Training Large Language Models for Optimization Modeling. arXiv preprint arXiv:2405.17743 (2024).
[146] Yves Gaetan Nana Teukam, Federico Zipoli, Teodoro Laino, Emanuele Criscuolo, Francesca Grisoni, and Matteo Manica.
2024. Integrating Genetic Algorithms and Language Models for Enhanced Enzyme Design. (2024).
[147] Thanh VT Tran and Truong Son Hy. 2024. Protein design by directed evolution guided by large language models.
IEEE Transactions on Evolutionary Computation (2024).
[148] Uber. 2020. Uber/bayesmark: Benchmark framework to easily compare bayesian optimization methods on real machine
learning tasks.
33
[149] Karthik Valmeekam, Matthew Marquez, Alberto Olmo, Sarath Sreedharan, and Subbarao Kambhampati. 2024.
Planbench: An extensible benchmark for evaluating large language models on planning and reasoning about change.
Advances in Neural Information Processing Systems 36 (2024).
[150] Niki van Stein and Thomas Bäck. 2024. LLaMEA: A Large Language Model Evolutionary Algorithm for Automatically
Generating Metaheuristics. arXiv preprint arXiv:2405.20132 (2024).
[151] Petar Veličković, Adrià Puigdomènech Badia, David Budden, Razvan Pascanu, Andrea Banino, Misha Dashevskiy,
Raia Hadsell, and Charles Blundell. 2022. The CLRS algorithmic reasoning benchmark. In International Conference
on Machine Learning. PMLR, 22084–22102.
[152] Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Ziming Liu, Payal Chandak, Shengchao Liu,
Peter Van Katwyk, Andreea Deac, et al. 2023. Scientific discovery in the age of artificial intelligence. Nature 620, 7972
(2023), 47–60.
[153] Ludi Wang, Xueqing Chen, Yi Du, Yuanchun Zhou, Yang Gao, and Wenjuan Cui. 2024. CataLM: Empowering Catalyst
Design Through Large Language Models. arXiv preprint arXiv:2405.17440 (2024).
[154] Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen,
Yankai Lin, et al. 2024. A survey on large language model based autonomous agents. Frontiers of Computer Science
18, 6 (2024), 186345.
[155] Shuai Wang, Shengyao Zhuang, Bevan Koopman, and Guido Zuccon. 2024. ReSLLM: Large Language Models are
Strong Resource Selectors for Federated Search. arXiv preprint arXiv:2401.17645 (2024).
[156] Xinyuan Wang, Chenxi Li, Zhen Wang, Fan Bai, Haotian Luo, Jiayou Zhang, Nebojsa Jojic, Eric P. Xing, and
Zhiting Hu. 2023. PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization.
arXiv:2310.16427 [cs.CL]
[157] Xuezhi Wang and Denny Zhou. 2024. Chain-of-thought reasoning without prompting. arXiv preprint arXiv:2402.10200
(2024).
[158] Zeyi Wang, Songbai Liu, Jianyong Chen, and Kay Chen Tan. 2024. Large Language Model-Aided Evolutionary
Search for Constrained Multiobjective Optimization. In International Conference on Intelligent Computing. Springer,
218–230.
[159] Segev Wasserkrug, Leonard Boussioux, Dick den Hertog, Farzaneh Mirzazadeh, Ilker Birbil, Jannis Kurtz, and Donato
Maragno. 2024. From Large Language Models and Optimization to Decision Optimization CoPilot: A Research
Manifesto. arXiv preprint arXiv:2402.16269 (2024).
[160] Melvin Wong, Jiao Liu, Thiago Rios, Stefan Menzel, and Yew Soon Ong. 2024. LLM2FEA: Discover Novel Designs
with Generative Evolutionary Multitasking. arXiv preprint arXiv:2406.14917 (2024).
[161] Melvin Wong, Thiago Rios, Stefan Menzel, and Yew Soon Ong. 2024. Generative AI-based Prompt Evolution Engineering
Design Optimization With Vision-Language Model. arXiv preprint arXiv:2406.09143 (2024).
[162] Likang Wu, Zhi Zheng, Zhaopeng Qiu, Hao Wang, Hongchao Gu, Tingjia Shen, Chuan Qin, Chen Zhu, Hengshu Zhu,
Qi Liu, et al. 2024. A survey on large language models for recommendation. World Wide Web 27, 5 (2024), 60.
[163] Xingyu Wu, Sheng-hao Wu, Jibin Wu, Liang Feng, and Kay Chen Tan. 2024. Evolutionary Computation in the Era of
Large Language Model: Survey and Roadmap. arXiv preprint arXiv:2401.10034 (2024).
[164] Xingyu Wu, Yan Zhong, Jibin Wu, Bingbing Jiang, Kay Chen Tan, et al. 2024. Large language model-enhanced
algorithm selection: towards comprehensive algorithm representation. International Joint Conference on Artificial
Intelligence.
[165] Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin,
Enyu Zhou, et al. 2023. The rise and potential of large language model based agents: A survey. arXiv preprint
arXiv:2309.07864 (2023).
[166] Jian Xie, Kai Zhang, Jiangjie Chen, Tinghui Zhu, Renze Lou, Yuandong Tian, Yanghua Xiao, and Yu Su. 2024.
Travelplanner: A benchmark for real-world planning with language agents. arXiv preprint arXiv:2402.01622 (2024).
[167] Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V Le, Denny Zhou, and Xinyun Chen. 2024. Large
Language Models as Optimizers. In The Twelfth International Conference on Learning Representations.
[168] Heng Yang and Ke Li. 2023. Instoptima: Evolutionary multi-objective instruction optimization via large language
model-based instruction operators. arXiv preprint arXiv:2310.17630 (2023).
[169] Zonglin Yang, Xinya Du, Junxian Li, Jie Zheng, Soujanya Poria, and Erik Cambria. 2023. Large language models for
automated open-domain scientific hypotheses discovery. arXiv preprint arXiv:2309.02726 (2023).
[170] Shunyu Yao, Fei Liu, Xi Lin, Zhichao Lu, Zhenkun Wang, and Qingfu Zhang. 2024. Multi-objective Evolution of
Heuristic Using Large Language Model. arXiv preprint arXiv:2409.16867 (2024).
[171] Yiming Yao, Fei Liu, Ji Cheng, and Qingfu Zhang. 2024. Evolve Cost-aware Acquisition Functions Using Large
Language Models. arXiv preprint arXiv:2404.16906 (2024).

34
[172] Wang Yatong, Pei Yuchen, and Zhao Yuqi. 2024. TS-EoH: An Edge Server Task Scheduling Algorithm Based on
Evolution of Heuristic. arXiv preprint arXiv:2409.09063 (2024).
[173] Gavin Ye. 2024. De novo drug design as GPT language modeling: large chemistry models with supervised and
reinforcement learning. Journal of Computer-Aided Molecular Design 38, 1 (2024), 20.
[174] Geyan Ye, Xibao Cai, Houtim Lai, Xing Wang, Junhong Huang, Longyue Wang, Wei Liu, and Xiangxiang Zeng. 2023.
Drugassist: A large language model for molecule optimization. arXiv preprint arXiv:2401.10334 (2023).
[175] Haoran Ye, Jiarui Wang, Zhiguang Cao, Federico Berto, Chuanbo Hua, Haeyeon Kim, Jinkyoo Park, and Guojie Song.
2024. Large language models as hyper-heuristics for combinatorial optimization. arXiv preprint arXiv:2402.01145
(2024).
[176] Haoran Ye, Jiarui Wang, Zhiguang Cao, and Guojie Song. 2024. ReEvo: Large Language Models as Hyper-Heuristics
with Reflective Evolution. arXiv preprint arXiv:2402.01145 (2024).
[177] Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis
Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, et al. 2023. Language to rewards for robotic skill synthesis.
arXiv preprint arXiv:2306.08647 (2023).
[178] Junhua Zeng, Chao Li, Zhun Sun, Qibin Zhao, and Guoxu Zhou. 2024. tnGPS: Discovering Unknown Tensor Network
Structure Search Algorithms via Large Language Models (LLMs). In Forty-first International Conference on Machine
Learning, ICML.
[179] Han Zhang, Akram Bin Sediq, Ali Afana, and Melike Erol-Kantarci. 2024. Large Language Models in Wireless Application
Design: In-Context Learning-enhanced Automatic Network Intrusion Detection. arXiv preprint arXiv:2405.11002
(2024).
[180] Huan Zhang, Yu Song, Ziyu Hou, Santiago Miret, and Bang Liu. 2024. HoneyComb: A Flexible LLM-Based Agent
System for Materials Science. arXiv preprint arXiv:2409.00135 (2024).
[181] Michael R Zhang, Nishkrit Desai, Juhan Bae, Jonathan Lorraine, and Jimmy Ba. 2023. Using large language models
for hyperparameter optimization. In NeurIPS 2023 Foundation Models for Decision Making Workshop.
[182] Ruohong Zhang, Liangke Gui, Zhiqing Sun, Yihao Feng, Keyang Xu, Yuanhan Zhang, Di Fu, Chunyuan Li, Alexander
Hauptmann, Yonatan Bisk, et al. 2024. Direct Preference Optimization of Video Large Multimodal Models from
Language Model Reward. arXiv preprint arXiv:2404.01258 (2024).
[183] Rui Zhang, Fei Liu, Xi Lin, Zhenkun Wang, Zhichao Lu, and Qingfu Zhang. 2024. Understanding the Importance of
Evolutionary Search in Automated Heuristic Design with Large Language Models. In International Conference on
Parallel Problem Solving from Nature. Springer, 185–202.
[184] Rui Zhang, Yixin Su, Bayu Distiawan Trisedya, Xiaoyan Zhao, Min Yang, Hong Cheng, and Jianzhong Qi. 2024.
AutoAlign: Fully Automatic and Effective Knowledge Graph Alignment Enabled by Large Language Models. IEEE
Trans. Knowl. Data Eng. (2024).
[185] Shenao Zhang, Sirui Zheng, Shuqi Ke, Zhihan Liu, Wanxin Jin, Jianbo Yuan, Yingxiang Yang, Hongxia Yang, and
Zhaoran Wang. 2024. How Can LLM Guide RL? A Value-Based Approach. arXiv preprint arXiv:2402.16181 (2024).
[186] Yu Zhang, Kefeng Zheng, Fei Liu, Qingfu Zhang, and Zhenkun Wang. 2024. AutoTurb: Using Large Language Models
for Automatic Algebraic Model Discovery of Turbulence Closure. arXiv preprint arXiv:2410.10657 (2024).
[187] Huaqin Zhao, Zhengliang Liu, Zihao Wu, Yiwei Li, Tianze Yang, Peng Shu, Shaochen Xu, Haixing Dai, Lin Zhao,
Gengchen Mai, et al. 2024. Revolutionizing finance with llms: An overview of applications and insights. arXiv preprint
arXiv:2401.11641 (2024).
[188] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie
Zhang, Zican Dong, et al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223 (2023).
[189] Ruizhe Zhong, Xingbo Du, Shixiong Kai, Zhentao Tang, Siyuan Xu, Hui-Ling Zhen, Jianye Hao, Qiang Xu, Mingxuan
Yuan, and Junchi Yan. 2023. Llm4eda: Emerging progress in large language models for electronic design automation.
arXiv preprint arXiv:2401.12224 (2023).
[190] Hao Zhou, Chengming Hu, and Xue Liu. 2024. An Overview of Machine Learning-Enabled Optimization for Recon-
figurable Intelligent Surfaces-Aided 6G Networks: From Reinforcement Learning to Large Language Models. arXiv
preprint arXiv:2405.17439 (2024).
[191] Xun Zhou, Liang Feng, Xingyu Wu, Zhichao Lu, and Kay Chen Tan. 2024. Design Principle Transfer in Neural
Architecture Search via Large Language Models. arXiv preprint arXiv:2408.11330 (2024).
[192] Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, and Jimmy Ba. 2022.
Large language models are human-level prompt engineers. arXiv preprint arXiv:2211.01910 (2022).
[193] Max Zhu, Adrián Bazaga, and Pietro Liò. 2024. FLUID-LLM: Learning Computational Fluid Dynamics with
Spatiotemporal-aware Large Language Models. arXiv preprint arXiv:2406.04501 (2024).
[194] Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, and Jürgen Schmidhuber. 2024.
GPTSwarm: Language Agents as Optimizable Graphs. In Forty-first International Conference on Machine Learning.
35

You might also like