Autoagent: A Fully-Automated and Zero-Code Framework For LLM Agents
Figure 1: AutoAgent is a new LLM agent framework that enables fully automated, zero-code devel-
opment for complex task automation. It features self-evolving agents via self-play customization and
a generalist agent stack for Search·Code·Files, unifying tool use and vector-native retrieval for RAG-
ready workflows. This language-driven approach democratizes AI development, allowing anyone to
create, adapt, and orchestrate agents, tools, and workflows from natural-language instructions.
Abstract
1 Introduction
The emergence of Large Language Models (LLMs) has revolutionized AI agent development, enabling
unprecedented breakthroughs in autonomous task execution and intelligent problem-solving. LLM-
powered agents excel at understanding context, making informed decisions, and seamlessly integrating
with various tools and APIs. Leading frameworks like LangChain [1], AutoGPT [2], AutoGen [3],
CAMEL [4], and MetaGPT [5] have demonstrated remarkable success in automating increasingly
complex workflows - from sophisticated web navigation to advanced data analysis and innovative
creative content production. By leveraging advanced mechanisms such as role-playing, structured
operating procedures, and dynamic agent coordination, these frameworks deliver exceptional problem-
solving capabilities while significantly reducing human intervention.
Despite remarkable advancements in AI agent development, a significant barrier persists: the creation
and optimization of LLM agent systems remains dependent on traditional programming expertise.
Current frameworks primarily cater to technically proficient developers who can navigate complex
codebases, understand API integrations, and implement sophisticated prompt engineering patterns.
This reliance on coding skills creates a substantial accessibility gap, as only 0.03% of the global popu-
lation possesses the necessary programming expertise to effectively build and customize these agents.
Even with well-documented frameworks and development tools, the entry barrier remains dauntingly
high for non-technical users. This limitation becomes particularly problematic given the universal
need for personalized AI assistants in the digital age. Everyone, from business professionals seeking
workflow automation to educators designing interactive learning tools, requires customized LLM
agents tailored to their specific needs. For instance, a researcher might need an agent specialized in
literature review and data analysis, while a content creator might require an agent focused on creative
writing and media management. The current paradigm of coding-dependent agent development not
only severely restricts the user base but also creates a bottleneck in meeting the diverse and evolving
demands for personalized AI assistance. This misalignment between universal needs and limited
accessibility calls for a fundamental rethinking of how LLM agents are created and customized.
This stark contrast between universal needs and limited accessibility leads us to a fundamental research
question: Is it possible to democratize LLM agent development by enabling Natural Language-based
Creation and Customization? In this work, we answer this question by introducing AutoAgent, a
novel framework that reimagines agent development as a fully automated, language-driven process
requiring zero programming expertise. To realize this vision, AutoAgent operates as
an autonomous Agent Operating System with three key capabilities: 1) Natural Language-Driven
Multi-Agent Building - automatically constructing and orchestrating collaborative agent systems
purely through natural dialogue, eliminating the need for manual coding or technical configuration;
2) Self-Managing Workflow Generation - dynamically creating, optimizing and adapting agent
workflows based on high-level task descriptions, even when users cannot fully specify implementation
details; and 3) Intelligent Resource Orchestration - providing unified access to tools, APIs, and
computational resources via natural language while automatically managing resource allocation and
optimization. Through this architecture, AutoAgent democratizes LLM agent development
while maintaining enterprise-grade sophistication, transforming a traditionally complex engineering
task into an intuitive conversation accessible to all users.
To enable fully-automated and zero-code LLM agent development, AutoAgent introduces several
synergistic technical innovations that form a complete framework: First, the Agentic System Utilities
provides a foundational multi-agent architecture, where specialized web, code, and file agents
collaborate seamlessly to handle diverse real-world tasks. At its core, the LLM-powered Actionable
Engine serves as the system’s brain, supporting flexible integration of any LLM provider through
both direct and transformed tool-use paradigms for robust action generation. To address the critical
challenge of information management, the Self-Managing File System enhances overall system
capability by automatically converting diverse data formats into queryable vector databases, enabling
efficient information access across all operations. Additionally, the Self-Play Agent Customization
not only transforms natural language requirements into executable agents through structured XML
schemas, but also automatically generates optimized workflows through iterative self-improvement,
eliminating the need for manual agent programming or workflow design. Together, these innovations
enable AutoAgent to democratize agent development while maintaining production-level robustness.
AutoAgent’s exceptional capabilities have been rigorously validated through comprehensive empirical
evaluation. In standardized benchmarks, it secured a strong second place on the Generalist Agent
Benchmark (GAIA), while significantly outperforming state-of-the-art RAG approaches on the
Retrieval-Augmented Generation benchmark. Beyond these quantitative achievements, extensive
case studies demonstrated AutoAgent’s robust self-development capabilities across diverse real-world
scenarios, highlighting its practical value in automated agent development.
Figure 2: AutoAgent is a fully automated, language-driven generalist agent system. The core
components that enable this include the Agentic System Utilities, the LLM-powered Actionable
Engine, the Self-Managing File System, and the Self-Play Agent Customization module.
For tasks with deterministic steps, workflow-driven mechanisms have emerged as a particularly effec-
tive approach. Notable examples include GPTSwarm [11], which models workflows as computational
graphs for complex data operations, and specialized systems for mathematical reasoning [12] and
code generation [13]. These advances have enabled strong capabilities across various applications,
including GUI interaction [14, 15], software development [16, 17], web browsing [18, 19], and
embodied tasks [20], demonstrating the versatility and potential of LLM-powered agent systems.
Workflow Design in Generalist MAS. While Generalist Multi-Agent Systems offer high flexibility,
particularly for open-ended and complex tasks, they also come with higher costs, increased complexity,
and potential for compounding errors [10]. For tasks with deterministic processes and expert domain
knowledge, fixed workflows can be a more stable and effective approach.
A workflow in a MAS is defined as W = {w_{i,j} : π_i →(c_k) π_j}, where π_i and π_j represent agents, and c_k
is the transfer condition. Effective workflow design involves defining conditional transfer equations
between agents based on downstream tasks, ensuring smooth and coordinated transmission - the
“Workflow Design Pattern”. Common patterns include Routing (directing tasks through a sequence),
Parallelization (distributing subtasks concurrently), and Evaluator-Optimizer (using agents to
assess and refine) [10]. These can create stable and efficient task-solving in Generalist MAS.
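The edge-based workflow definition above can be sketched in plain Python. The `Workflow` class, agent callables, and transfer conditions here are illustrative stand-ins for the formulation W = {w_{i,j} : π_i →(c_k) π_j}, not AutoAgent's actual API:

```python
from typing import Callable, Dict, Tuple

# Agents are modeled as callables; edges map (source agent, condition c_k)
# to a target agent, mirroring W = {w_ij : pi_i --c_k--> pi_j}.
Agent = Callable[[str], str]

def solver(task: str) -> str:
    return f"draft answer for: {task}"

def reviewer(task: str) -> str:
    return f"reviewed: {task}"

def needs_review(output: str) -> bool:
    # Transfer condition c_k: fire the edge when the output is still a draft.
    return output.startswith("draft")

class Workflow:
    def __init__(self):
        # edges[(source, condition)] = target
        self.edges: Dict[Tuple[Agent, Callable[[str], bool]], Agent] = {}

    def add_edge(self, src: Agent, cond: Callable[[str], bool], dst: Agent):
        self.edges[(src, cond)] = dst

    def run(self, start: Agent, task: str) -> str:
        agent, payload = start, task
        while agent is not None:
            payload = agent(payload)
            nxt = None
            for (src, cond), dst in self.edges.items():
                if src is agent and cond(payload):
                    nxt = dst  # follow the first satisfied transfer condition
                    break
            agent = nxt
        return payload

# A two-agent Evaluator-Optimizer-style chain: solver hands off to reviewer.
wf = Workflow()
wf.add_edge(solver, needs_review, reviewer)
result = wf.run(solver, "2 + 2")
```

Routing and Parallelization fit the same structure: routing adds several conditional edges from one source, while parallelization would fan a task out to several targets at once.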
Fully-Automated Generalist MAS. While the successful design and execution of both multi-agent
systems and complex workflows typically require substantial expert knowledge and engineering
expertise, the goal of AutoAgent is to fully automate this process. The key challenge lies in seamlessly
bridging the gap from high-level user requirements to the practical implementation of effective MAS
and workflow solutions - all through natural language interactions.
The AutoAgent framework employs a modular, multi-agent architecture to address the key challenge
of developing intelligent personal assistant agents capable of seamlessly integrating and coordinating
diverse capabilities, from web browsing and information retrieval to data analysis and code execution.
This design choice, which comprises specialized agents for web, coding, and file management tasks,
as well as an orchestrator agent to decompose and delegate user requests, enables the agentic system
utilities to serve as a versatile and extensible foundation that can adapt to a wide range of user
requirements, facilitating the rapid development of tailored, agent-driven solutions. Detailed system
prompts and tool definitions for Agentic System Utilities can be found in Appendix Sec 6.
An agent for managing and analyzing local files is crucial. This Local File Agent can use a unified set of
tools to convert files into Markdown and perform effective analysis to assist with everyday tasks.
Environment. The interaction environment for the Local File Agent is similar to the Coding Agent,
utilizing an interactive Markdown Browser. Files converted into Markdown format can be displayed
in paginated form within the Markdown Browser, enabling the agent to analyze long texts and
complex files that exceed the context length limitations.
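The pagination idea can be illustrated with a minimal sketch; the `paginate` helper and page size are assumptions for illustration, not the Markdown Browser's real interface:

```python
# Hypothetical sketch of the paginated Markdown Browser idea: a long
# converted document is split into fixed-size pages so an agent can read
# files that exceed the LLM context window one page at a time.
def paginate(markdown_text: str, chars_per_page: int = 2000) -> list:
    return [markdown_text[i:i + chars_per_page]
            for i in range(0, len(markdown_text), chars_per_page)]

doc = "# Report\n" + ("lorem ipsum " * 1000)
pages = paginate(doc, chars_per_page=2000)
first_page = pages[0]  # the agent would then request next/previous pages
```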
As the CPU executes instructions, manages resources, and coordinates processes in an OS, the
LLM-powered actionable engine can understand natural language, generate plans, and coordinate
tasks across agents. This enables seamless human-agent collaboration and task completion.
We utilize LiteLLM [23] to standardize LLM requests through an OpenAI-like interface, supporting
100+ models from various providers. For agent collaboration, the LLM receives all action-observation
pairs up to time t as state st to determine the next action. These pairs serve as system RAM, facilitating
efficient retrieval and enabling language-driven system coordination.
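The engine's loop can be sketched as follows. The real system would call LiteLLM's `litellm.completion(model=..., messages=...)`; here a stub stands in so the control flow is self-contained, and only the state-update pattern is the point:

```python
# Sketch of the actionable-engine loop: the message history of
# action-observation pairs is the state s_t fed back to the LLM at every
# step, playing the role of system RAM.
def llm_stub(messages):
    # placeholder for litellm.completion(...); returns the next "action"
    return {"role": "assistant", "content": f"action_{len(messages)}"}

def run_engine(task: str, max_steps: int = 3) -> list:
    history = [{"role": "user", "content": task}]      # s_0
    for _ in range(max_steps):
        action = llm_stub(history)                     # choose a_t from s_t
        history.append(action)
        observation = {"role": "tool",
                       "content": f"result of {action['content']}"}
        history.append(observation)                    # s_{t+1} = s_t + (a_t, o_t)
    return history

trace = run_engine("summarize report.pdf")
```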
The file system in AutoAgent is a vector database that LLM agents can retrieve and understand. In our
design framework, users can upload text files in any format (e.g., .pdf, .doc, .txt) or compressed
archives and folders containing any text files. The system tools in the file system automatically
convert these files into a consistent text format and store them in a user-defined collection within
the vector database (using the save_raw_docs_to_vector_db tool). This enables agents to self-
manage their database memory and perform efficient and accurate retrieval and generation using tools
like query_db and answer_query. The detailed definitions of the tools are presented in Tab 4.
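The self-managing pattern can be sketched as below. The tool names `save_raw_docs_to_vector_db` and `query_db` come from the paper, but this in-memory keyword index is only a stand-in for a real vector database with embeddings:

```python
# Minimal sketch of the self-managing file system: documents are stored in
# user-defined collections and retrieved by similarity. Keyword overlap
# substitutes here for embedding-based nearest-neighbour search.
class FileSystemDB:
    def __init__(self):
        self.collections = {}

    def save_raw_docs_to_vector_db(self, collection: str, docs: list):
        # real system: convert .pdf/.doc/... to text, embed, then store
        self.collections.setdefault(collection, []).extend(docs)

    def query_db(self, collection: str, query: str, top_k: int = 2) -> list:
        # real system: vector similarity search; here, word-overlap scoring
        scored = sorted(
            self.collections.get(collection, []),
            key=lambda d: -sum(w in d.lower() for w in query.lower().split()),
        )
        return scored[:top_k]

db = FileSystemDB()
db.save_raw_docs_to_vector_db("reports",
                              ["Q1 revenue grew 10%", "Office dog policy"])
hits = db.query_db("reports", "revenue growth Q1")
```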
To allow users to customize tools and agents for specific scenarios or build their own multi-agent systems
and workflows, AutoAgent is designed as a code-driven, controllable self-programming agent framework.
By implementing constraints, error-handling mechanisms, and customized workflows, it enables
controlled code generation, facilitating the creation of tools, agents, and workflows. AutoAgent
supports two distinct modes: agent creation without workflow and agent creation with workflow.
To address this challenge, our AutoAgent provides a powerful workflow-based mode allowing users to
generate sophisticated agent systems with minimal domain expertise. In this mode, the user provides
high-level descriptions of the desired agent(s), such as the agent’s name and a brief sentence-level
description for the expected agent functionalities. AutoAgent then uses this input to automatically
generate the appropriate agent(s) and the necessary workflow(s) to orchestrate their collaborative
efforts. This is all done based on the current state of the framework, including the available tools,
agents, and workflows. The key steps in this workflow-based agent generation approach are:
• Analyze Requirements and Existing Components. The process begins by carefully analyzing the
user’s requirements in the context of the existing tools and agents already available in the system.
This in-depth analysis is performed using the specialized profiling agent, which helps thoroughly
assess the current capabilities and resources that can be leveraged to fulfill the user’s needs.
• Analyze Tools and Structure Agents. Based on the comprehensive analysis performed, the system
then carefully determines the need for creating new tools, meticulously evaluates whether existing
tools can be effectively utilized, and subsequently structures the seamless collaboration between
multiple agents as appropriate. This ensures the optimal and comprehensive use of available
resources, ultimately leading to the efficient agent system design.
• Generate Detailed XML Agent Specifications. This step is designed to generate structured XML
code that accurately represents the detailed agent creation requirements. This comprehensive XML
representation captures the necessary information, including the agent’s functionality, dependencies,
and interactions, to enable the seamless and efficient subsequent process of agent generation.
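A hypothetical example of such a structured form, parsed with the standard library, is shown below; the tag names are illustrative, not AutoAgent's exact XML schema:

```python
import xml.etree.ElementTree as ET

# Illustrative agent-creation form: name, functionality, and tool
# dependencies, flagging which tools must be newly created.
spec = """
<agent name="DaVinci Agent">
  <description>Generate and refine images from natural language.</description>
  <tools>
    <tool name="generate_image" new="true"/>
    <tool name="visual_question_answering" new="false"/>
  </tools>
</agent>
"""

root = ET.fromstring(spec)
agent_name = root.get("name")
# Downstream, the Tool Editor Agent would receive only the tools marked new.
new_tools = [t.get("name") for t in root.iter("tool") if t.get("new") == "true"]
```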
Optimized Tool Creation with Third-Party APIs. The Tool Editor Agent can seamlessly integrate
various third-party APIs, such as LangChain, RapidAPI, and Hugging Face, to create powerful tools.
It expertly utilizes advanced retrieval techniques to search for and surface relevant API documentation,
including comprehensive details like names, descriptions, and invocation methods. The robust system
currently supports an extensive collection of 145 APIs from 8 diverse categories in RapidAPI,
LangChain [1], and a wide range of models from 9 categories in Hugging Face. Future plans include
seamlessly integrating more cutting-edge platforms like Composio [24].
The agent also generates tool code based on its knowledge, automatically checking for syntax errors.
It designs test cases, runs the tool, and verifies functionality. If the tool fails, the agent automatically
debugs the code until successful. This targeted approach allows a more customized and adaptable
tool set, rather than a bloated, integrated system.
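The generate-test-debug loop can be sketched as below; the candidate-source queue simulates successive LLM revisions, and the helper names are assumptions for illustration:

```python
# Sketch of the self-debugging tool-creation loop: the agent emits candidate
# tool code, runs it against its own test case, and regenerates on failure.
candidates = [
    "def add(a, b): return a - b",   # buggy first attempt
    "def add(a, b): return a + b",   # "debugged" revision
]

def create_tool_with_self_debug(sources, max_attempts=3):
    for attempt, src in enumerate(sources[:max_attempts], 1):
        namespace = {}
        try:
            exec(src, namespace)                  # syntax check + load
            assert namespace["add"](2, 3) == 5    # agent-designed test case
            return namespace["add"], attempt
        except (SyntaxError, AssertionError):
            continue  # feed the error back as context and regenerate
    raise RuntimeError("tool creation failed")

tool, attempts_used = create_tool_with_self_debug(candidates)
```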
Agent Creation and Execution. When the user’s requirements involve multiple agents focused on
different tasks, the Agent Editor Agent automatically identifies this need and performs the necessary
multi-step agent creation operations. After all agents are successfully created, the system invokes
the create_orchestrator_agent tool to generate an orchestrator agent that connects the required
agents. This orchestrator adheres to the Orchestrator-Workers MAS design pattern, with a system
prompt that includes task descriptions, sub-task decomposition rules, and other scenario-specific
details. Detailed algorithms and system prompts are provided in Appendix Sec 6.6.1.
During the parsing phase, a robust error detection mechanism verifies whether the generated workflow
form complies with system constraints (e.g., constraints on the on_start event). If the constraints
are not satisfied, detailed error messages are sent back to the Workflow Form Agent as feedback for
regeneration. If the constraints are satisfied, the workflow form is passed to the Workflow Editor
Agent, which creates new agents (if needed), constructs the new workflow, and executes it on the
task. Detailed algorithms and system prompts are shown in Appendix Sec 6.6.2.
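The parsing-phase check can be sketched as a validator that returns error feedback for regeneration; the form layout and the "exactly one `on_start` event" rule shown here are illustrative assumptions, not AutoAgent's exact constraint set:

```python
# Sketch of workflow-form validation: an empty error list means the form
# passes to the Workflow Editor Agent; otherwise the messages are sent back
# to the Workflow Form Agent as regeneration feedback.
def validate_workflow_form(form: dict) -> list:
    errors = []
    starts = [e for e in form.get("events", []) if e.get("type") == "on_start"]
    if len(starts) != 1:
        errors.append(f"expected exactly one on_start event, found {len(starts)}")
    for edge in form.get("edges", []):
        if edge.get("target") not in form.get("agents", []):
            errors.append(f"edge targets unknown agent: {edge.get('target')}")
    return errors

form = {
    "agents": ["Math Solver Agent"],
    "events": [{"type": "on_start"}],
    "edges": [{"target": "Vote Aggregator Agent"}],
}
feedback = validate_workflow_form(form)  # flags the missing aggregator agent
```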
4 Evaluation
Benchmark Dataset and Evaluation Protocols. The GAIA benchmark [29] is a comprehensive
evaluation framework to assess General AI Assistants. It tests fundamental abilities like Reasoning,
Multi-Modality Handling, Web Browsing, and Tool-Use Proficiency through a rigorous set of 466 test
and 165 validation questions, divided into 3 distinct difficulty levels.
GAIA emphasizes the importance of developing AI systems that can match human perfor-
mance on everyday tasks, serving as a milestone for assessing progress toward AGI. Specifi-
cally, we evaluated our AutoAgent on the GAIA validation set, using success rate as the pri-
mary evaluation metric. This metric measures the percentage of tasks successfully completed,
providing a clear indicator of its performance in handling real-world, human-like challenges.
Baseline Methods. The baselines we selected are divided into two categories: Open-Source:
FRIDAY [30], Magentic-1 [9], Multi-Agent Experiment v0.1 (powered by AutoGen) [31], HuggingFace
Agents [32], Langfun Agent [33]; Closed-Source: TapeAgent, AgentIM, Trase Agent [34], Omne,
Barcelona², and the h2oGPTe Agent [35]. These diverse baselines represent the current state-of-the-art
in open-source and proprietary multi-agent systems, providing a comprehensive landscape for
evaluating the performance and capabilities of our proposed AutoAgent framework.

Table 1: Performance comparison between the baseline models and our AutoAgent on the GAIA
benchmark. The results we report are those published on the GAIA leaderboard.

Agent Name              Avg.    L1      L2      L3
TapeAgent v0.1          33.94   47.17   34.88    3.85
FRIDAY                  34.55   45.28   34.88   11.54
Magentic-1              36.97   54.72   33.72   11.54
AgentIM                 37.58   50.94   36.05   15.38
Multi-Agent Exp v0.1    39.39   54.72   38.37   11.54
AgentIM v1.1            40.00   50.94   40.70   15.38
Trase Agent             40.00   47.17   40.70   23.08
HuggingFace Agents      44.24   58.49   43.02   19.23
Magentic-1 (o1)         46.06   56.60   46.51   23.08
omne                    46.06   60.38   44.19   23.08
Trase Agent v0.2        47.27   58.49   46.51   26.92
Barcelona v0.1          50.30   62.26   50.00   26.92
Langfun Agent v2.0      54.55   60.38   59.30   26.92
h2oGPTe Agent v1.6.8    63.64   67.92   67.44   42.31
AutoAgent               55.15   71.70   53.49   26.92

Implementation Details. To address tasks in the GAIA benchmark, we utilize a combination of the
System Utilities and the Tool Editor Agent from the Agentic-SDK. The basic agents first attempt
to complete the task while collecting relevant information and reflections. If successful, the result
is directly returned. If not, the Tool Editor Agent creates new tools to continue the task. During
validation, Claude-Sonnet-3.5 is used by default.
Evaluation Results and Analysis. The results in Table 1 reveal the following key observations:
• Obs.1. Overall Superiority of AutoAgent: Our method significantly outperforms all open-
source agent systems and achieves performance close to the latest agent system, h2oGPTe Agent
v1.6.8 (submitted on December 16, 2024), securing a stable position in the top 2 rankings. Notably,
our approach demonstrates superior performance on Level 1 tasks compared to all state-of-the-art
baselines, becoming the first method to achieve over 70% accuracy rate. This success is attributed to
the well-designed System Utilities and the stable interaction of basic agents with the environment,
enabling efficient solutions to everyday simple tasks.
² TapeAgent, AgentIM, Omne, and Barcelona are anonymous.
• Obs.2. Effectiveness of Key Components: Specifically, our framework demonstrates significantly
superior performance compared to Magentic-1 [9], a recent representative open-source MAS, and
FRIDAY, a classic self-improved framework. While Magentic-1 leverages the powerful reasoning
capabilities of o1-preview to design complex Orchestrator Agent (also the Coder Agent), our frame-
work emphasizes the stability of interactions between sub-agents and their respective environments,
as well as the precision of tool definitions. Under these conditions, the Orchestrator Agent achieves
better results with simple prompts and handoff tools.
• Obs.3. Error Analysis: The analysis revealed two key limitations in the current evaluation protocol
of the GAIA benchmark system: strict string matching that ignores semantic equivalence (e.g., "840
citations" vs. "Citations") and challenges posed by dynamic, anti-automation mechanisms during
web searches. These issues underscore the need for a more semantically-aware evaluation approach
to improve the system’s effectiveness.
Benchmark Dataset and Evaluation Protocols. To test the basic functionalities of AutoAgent,
we use the RAG task as the testing benchmark. MultiHop-RAG [36] is a dataset designed to evaluate
RAG capabilities, requiring the RAG methods to gather information from multiple sources and
generate responses, which aligns with the file functionality logic of AutoAgent. We evaluate using
two metrics: Accuracy (Acc) measures response consistency with expected answers (e.g., “ChatGPT”
or “OpenAI’s ChatGPT” are both correct for “Which AI tool reached 100M daily users in March?”).
Error (Err) counts confident but incorrect responses (e.g., answering “Bard” to the above query).
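The two metrics can be sketched as below. The substring-based answer matching and the "I don't know" abstention check are simplifying assumptions for illustration, not MultiHop-RAG's official scorer:

```python
# Sketch of the Acc/Err metrics described above: Acc counts responses
# consistent with the expected answer; Err counts confident but incorrect
# responses, excluding abstentions.
def score(predictions, gold_answers):
    acc = err = 0
    for pred, gold in zip(predictions, gold_answers):
        if gold.lower() in pred.lower():
            acc += 1                                  # consistent answer
        elif "don't know" not in pred.lower():
            err += 1                                  # confident but wrong
    return acc / len(gold_answers), err / len(gold_answers)

preds = ["OpenAI's ChatGPT", "Bard", "I don't know"]
golds = ["ChatGPT", "ChatGPT", "ChatGPT"]
accuracy, error_rate = score(preds, golds)
```

Note that Acc and Err need not sum to one: abstentions count against neither metric.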
Baseline Methods. The baselines represent a diverse range of LLM-based RAG techniques. The
chunk methods, such as NaiveRAG [37] and HyDE [38], utilize the original text segmentation. The
graph methods, including MiniRAG [39] and LightRAG [40], manage files as sophisticated graphs.
In contrast, Langchain’s Agentic RAG [1] innovatively accesses files through intelligent software
agents. These baselines cover a wide array of strategies for leveraging large language models to
retrieve and generate robust responses.
Implementation Details. We used gpt-4o-mini [41] as the LLM and text-embedding-3-small for
embeddings. We followed MultiHopRAG [36] for text chunking, with 256-token chunks and top-6
retrieval. This leverages the gpt-4o-mini’s language abilities while text-embedding-3-small provides
retrieval, with MultiHopRAG’s chunking managing information effectively.
Evaluation Results and Analysis. We summarize the key observations from the results (Table 2).

Table 2: Evaluation of AutoAgent and baselines for RAG.

Category      Method       Acc      Err      Method      Acc      Err
Chunk-Based   NaiveRAG     53.36%   12.28%   HyDE        56.59%   16.55%
Graph-Based   MiniRAG      57.81%   34.78%   LightRAG    58.18%   35.40%
Agent-Based   Langchain    62.83%   20.50%   AutoAgent   73.51%   14.20%

• Superior Performance of AutoAgent. The results clearly demonstrate the superior performance
of our proposed AutoAgent model compared to other baselines on the Multihop-RAG task. By
leveraging a more flexible and adaptive agent-based framework, AutoAgent is able to dynamically
orchestrate the retrieval and reasoning process, outperforming even other agentic approaches.
• AutoAgent vs. LangChain. Our method significantly outperforms LangChain, which is also an
agentic RAG. This is due to AutoAgent’s more flexible framework, where agents do not need to rely
on predefined workflows and tools to execute file search tasks. The proposed model can orchestrate
workflows on the fly during the search process, leading to more efficient and accurate results.
This section thoroughly explores the capabilities of the AutoAgent framework in generating agents
and workflows based on even vague, natural language inputs across various scenarios. To illustrate
the breadth of AutoAgent’s abilities, we will examine its performance on tasks of varying difficulty -
from the creation of a single agent to the orchestration of multiple, coordinated agents.
Task with Single Agent. AutoAgent can create tools for third-party APIs (RapidAPI, Hugging Face).
We demonstrated this by generating a DaVinci Agent for image creation and refinement. This shows
AutoAgent's capability to build task-specific agents from natural language.
I want to create a ‘DaVinci Agent’ that can help me to generate the image with
natural language. it can:
1. generate the image with natural language and save it to the specified path on
the local machine using the HF model ’Sana_600M_1024px_diffusers’
2. evaluate the image using ‘visual_question_answering‘ tool according to the given
image.
3. iteratively refine the generated image based on the evaluation result.
Automated Agent Creation and Execution. AutoAgent begins generating an XML table from the
natural language requirements, using existing tools and agents. This structured form is then passed
to the Tool Editor Agent, which creates the necessary generate_image and refine_image tools.
The Agent Editor Agent composes the DaVinci Agent by integrating the new tools with an existing
visual_question_answering tool. This agent is executed, generating and storing several logo
designs, as shown in Fig 3. Due to limited local resources, a smaller model was used, yet the agent
successfully completed the task. This demonstrates AutoAgent’s seamless creation of the tailored
agent for complex, open-ended design challenges.
Task with Multi-Agents. To further validate AutoAgent’s capability to generate agents and integrate
third-party tools, we tasked it with creating a Financial Agent based on the following requirements:
I want to create ‘Financial Agent‘ that can help me to do two kinds of tasks:
1. Manage the private financial docs. I have a folder that contains the financial
docs on my local machine, and I want you to help me manage them.
2. Search the financial information online. You may help me to:
- get balance sheets for a given ticker over a given period.
- get cash flow statements for a given ticker over a given period.
- get income statements for a given ticker over a given period.
Building a Comprehensive Financial Agent. The Agent Form Agent automatically recognized
the need to create two distinct agents to handle the given tasks: the Document Manager Agent
and the Market Research Agent. The specific XML representation of this multi-agent structure
is shown in List 18 in the Appendix. After successfully validating the structured requirements, the
Tool Editor Agent proceeded to create the necessary tools: get_balance_sheet, get_cash_flow,
get_income_statement, and analyze_financial_data. With the tools in place, the Agent Edi-
tor Agent composed the Document Manager Agent and the Market Research Agent, and established
a Financial Analysis Orchestrator to coordinate them.
The Financial Analysis Orchestrator leveraged the newly created tools alongside the system's existing
capabilities to conduct research and analysis on both the local documents and external data sources,
culminating in a comprehensive research report, showcased in List 19 within the Appendix. The full
agent trajectory for this complex requirement is presented in Tab 6 within the Appendix. It reveals
that while the Orchestrator encountered a SyntaxError during the initial creation phase, the Agent
Editor component was able to self-debug and successfully complete the task, demonstrating the
robustness of the AutoAgent system.
Workflow Generation. Scaling Test-Time Compute has been validated as a superior approach for
solving reasoning problems. However, manually constructing workflows poses a high barrier to entry.
We aim to explore whether AutoAgent’s automatic creation of agents and workflows can bridge
the gap between the idea of Test-Time Compute and the implementation of workflows. Taking the
majority voting method with multiple models as an example:
I want to create a workflow that can help me solve math problems.
The workflow should:
1. Parallelize solving the math problem with the same ‘Math Solver Agent‘ using
different language models (‘gpt-4o‘, ‘claude-3-5-sonnet‘, ‘deepseek-chat‘)
2. Aggregate the results from the ‘Math Solver Agent‘ and return the final result
using majority voting.
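The requested workflow can be sketched as below. The stub answers stand in for real gpt-4o, claude-3-5-sonnet, and deepseek-chat calls; only the parallelize-then-vote structure is the point:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Sketch of the majority-voting workflow: run the same solver under several
# models in parallel, then aggregate by majority vote.
def solve(model: str, problem: str) -> str:
    # placeholder for a real model call; two of three stubs agree on "42"
    stub_answers = {"gpt-4o": "42", "claude-3-5-sonnet": "42",
                    "deepseek-chat": "41"}
    return stub_answers[model]

def majority_vote_workflow(problem: str, models: list) -> str:
    with ThreadPoolExecutor() as pool:                 # Parallelization step
        answers = list(pool.map(lambda m: solve(m, problem), models))
    return Counter(answers).most_common(1)[0][0]       # Aggregation step

final = majority_vote_workflow(
    "2*21?", ["gpt-4o", "claude-3-5-sonnet", "deepseek-chat"])
```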
Table 3: Comparison between single LLMs and the AI-generated Majority Voting workflow on
MATH-500. Models compared: gpt-4o (0806), claude-3.5-sonnet (1022), deepseek-v3, and the
Majority Voting Workflow (3 models).
Potential Test-Time Scaling Law. Upon receiving the requirements, the Workflow Form Agent
generated an XML-formatted workflow table (List 20). This table includes two new agents: Math
Solver Agent and Vote Aggregator Agent. After validation, the Agent Editor Agent created the
agents. The Workflow Editor Agent then constructed a new workflow based on the form and
conducted tests. To validate the workflow’s practicality, we performed comparative experiments on
the MATH-500 dataset [42] using 3 LLMs (gpt-4o-20240806, claude-3.5-sonnet-20241022,
deepseek-v3) and a Majority Voting workflow. As shown in Tab 3, the generated workflow
performs significantly better than each single-model baseline. We selected cases from deepseek-v3
(Tab 5) where AutoAgent’s workflow effectively corrected errors through multi-model collaboration,
demonstrating its potential to establish scaling laws in LLM agents.
5 Conclusion
The AutoAgent framework marks a significant advancement in democratizing LLM-powered agent
technology, making it accessible to the non-programming majority. By bridging high-level natural
language requirements with the practical implementation of multi-agent systems and workflows,
AutoAgent empowers users to create, customize, and deploy agents, tools, and workflows without
requiring substantial technical expertise. Its modular architecture, versatile Agentic System Utilities,
and LLM-powered Actionable Engine work together to enable seamless automation of agent devel-
opment and task execution. Unique features such as the Self-Managing File System and Self-Play
Agent Customization further enhance AutoAgent’s capabilities, allowing for dynamic agent evolution
and task-specific optimization. Extensive evaluations demonstrate AutoAgent’s superior performance,
highlighting its transformative potential in making LLM capabilities accessible to a broad user base.
References
[1] LangChain. Langchain: Build context-aware reasoning applications. https://github.com/
langchain-ai/langchain, 2023.
[2] Significant-Gravitas. Autogpt. https://github.com/Significant-Gravitas/AutoGPT,
2023.
[3] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li,
Li Jiang, Xiaoyun Zhang, and Chi Wang. Autogen: Enabling next-gen LLM applications via
multi-agent conversation framework. CoRR, abs/2308.08155, 2023.
[4] Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL:
communicative agents for "mind" exploration of large language model society. In NeurIPS,
2023.
[5] Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao
Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng
Xiao, Chenglin Wu, and Jürgen Schmidhuber. MetaGPT: Meta programming for a multi-agent
collaborative framework. In The Twelfth International Conference on Learning Representations,
2024. URL https://openreview.net/forum?id=VtmBAGCN7o.
[6] Tianbao Xie, Fan Zhou, Zhoujun Cheng, Peng Shi, Luoxuan Weng, Yitao Liu, Toh Jing Hua,
Junning Zhao, Qian Liu, Che Liu, Leo Z. Liu, Yiheng Xu, Hongjin Su, Dongchan Shin, Caiming
Xiong, and Tao Yu. Openagents: An open platform for language agents in the wild, 2023.
[7] Shunyu Yao, Noah Shinn, Pedram Razavi, and Karthik Narasimhan. τ -bench: A benchmark for
tool-agent-user interaction in real-world domains. CoRR, abs/2406.12045, 2024.
[8] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan
Cao. React: Synergizing reasoning and acting in language models. In ICLR. OpenReview.net,
2023.
[9] Adam Fourney, Gagan Bansal, Hussein Mozannar, Cheng Tan, Eduardo Salinas, Erkang Zhu,
Friederike Niedtner, Grace Proebsting, Griffin Bassman, Jack Gerrits, Jacob Alber, Peter Chang,
Ricky Loynd, Robert West, Victor Dibia, Ahmed Awadallah, Ece Kamar, Rafah Hosn, and
Saleema Amershi. Magentic-one: A generalist multi-agent system for solving complex tasks.
CoRR, abs/2411.04468, 2024.
[10] Anthropic. Building effective agents. https://www.anthropic.com/research/building-effective-agents, 2024.
[11] Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, and Jürgen
Schmidhuber. Gptswarm: Language agents as optimizable graphs. In Forty-first International
Conference on Machine Learning, 2024.
[12] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha
Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language
models. In ICLR. OpenReview.net, 2023.
[13] Evan Wang, Federico Cassano, Catherine Wu, Yunfeng Bai, Will Song, Vaskar Nath, Ziwen
Han, Sean Hendryx, Summer Yue, and Hugh Zhang. Planning in natural language improves
LLM search for code generation. CoRR, abs/2409.03733, 2024.
[14] Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang,
Zihan Wang, Yuxiao Dong, Ming Ding, and Jie Tang. Cogagent: A visual language model for
GUI agents. In CVPR, pages 14281–14290. IEEE, 2024.
[15] Yiheng Xu, Zekun Wang, Junli Wang, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao
Yu, and Caiming Xiong. Aguvis: Unified pure vision agents for autonomous gui interaction.
2024. URL https://arxiv.org/abs/2412.04454.
[16] Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi
Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang
Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin,
Robert Brennan, Hao Peng, Heng Ji, and Graham Neubig. OpenHands: An Open Platform for
AI Software Developers as Generalist Agents, 2024. URL https://arxiv.org/abs/2407.
16741.
[17] John Yang, Carlos E Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik R
Narasimhan, and Ofir Press. SWE-agent: Agent-computer interfaces enable automated soft-
ware engineering. In The Thirty-eighth Annual Conference on Neural Information Processing
Systems, 2024. URL https://arxiv.org/abs/2405.15793.
[18] Alexandre Drouin, Maxime Gasse, Massimo Caccia, Issam H. Laradji, Manuel Del Verme, Tom
Marty, David Vazquez, Nicolas Chapados, and Alexandre Lacoste. WorkArena: How capable
are web agents at solving common knowledge work tasks? In Ruslan Salakhutdinov, Zico
Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp,
editors, Proceedings of the 41st International Conference on Machine Learning, volume 235
of Proceedings of Machine Learning Research, pages 11642–11662. PMLR, 21–27 Jul 2024.
URL https://proceedings.mlr.press/v235/drouin24a.html.
[19] Yueqi Song, Frank Xu, Shuyan Zhou, and Graham Neubig. Beyond browsing: Api-based
web agents. arXiv preprint arXiv:2410.16464, 2024. URL https://arxiv.org/abs/2410.
16464.
[20] Manling Li, Shiyu Zhao, Qineng Wang, Kangrui Wang, Yu Zhou, Sanjana Srivastava, Cem
Gokmen, Tony Lee, Li Erran Li, Ruohan Zhang, et al. Embodied agent interface: Benchmarking
llms for embodied decision making. In NeurIPS 2024, 2024.
[21] OpenAI. Swarm: Educational framework exploring ergonomic, lightweight multi-agent orchestration. https://github.com/openai/swarm, 2024.
[22] E2B. Secure open source cloud runtime for AI apps & AI agents. https://github.com/e2b-dev/e2b, 2024.
[23] BerriAI. LiteLLM: Proxy server (LLM gateway) to call 100+ LLM APIs in OpenAI format. https://github.com/BerriAI/litellm, 2024.
[24] Composio. Composio: Production-ready toolset for AI agents. https://github.com/ComposioHQ/composio, 2024.
[25] Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, and Jürgen
Schmidhuber. Gptswarm: Language agents as optimizable graphs. In ICML. OpenReview.net,
2024.
[26] LangChain. LangGraph: Build resilient language agents as graphs. https://github.com/langchain-ai/langgraph, 2024.
[27] Shengran Hu, Cong Lu, and Jeff Clune. Automated design of agentic systems. CoRR,
abs/2408.08435, 2024.
[28] Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu, Fengwei Teng, Xionghui Chen, Jiaqi Chen, Mingchen
Zhuge, Xin Cheng, Sirui Hong, Jinlin Wang, Bingnan Zheng, Bang Liu, Yuyu Luo, and Chenglin
Wu. Aflow: Automating agentic workflow generation. CoRR, abs/2410.10762, 2024.
[29] Grégoire Mialon, Clémentine Fourrier, Thomas Wolf, Yann LeCun, and Thomas Scialom.
GAIA: a benchmark for general AI assistants. In ICLR. OpenReview.net, 2024.
[30] Zhiyong Wu, Chengcheng Han, Zichen Ding, Zhenmin Weng, Zhoumianze Liu, Shunyu
Yao, Tao Yu, and Lingpeng Kong. Os-copilot: Towards generalist computer agents with
self-improvement. arXiv preprint arXiv:2402.07456, 2024.
[31] Microsoft. Multi-agent experiment v0.1, MSR AI Frontiers (AutoGen team members). https://aka.ms/gaia_multiagent_v01_march_1st, 2024.
[32] HuggingFace. Transformers documentation: Agents and tools. https://huggingface.co/docs/transformers/agents, 2024.
[33] Google. Langfun: OO for LLMs. https://github.com/google/langfun, 2024.
[34] Trase. Meet Trase Systems, the AI agent platform. https://www.trasesystems.com/, 2024. Accessed: 2025-01-15.
[35] H2O.ai. Autonomous agentic AI: Execute multi-step workflows autonomously. https://h2o.ai/platform/enterprise-h2ogpte/#AgenticAI, 2024.
[36] Yixuan Tang and Yi Yang. Multihop-rag: Benchmarking retrieval-augmented generation for
multi-hop queries. CoRR, abs/2401.15391, 2024.
[37] Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, and
Weizhu Chen. Generation-augmented retrieval for open-domain question answering. arXiv
preprint arXiv:2009.08553, 2020.
[38] Luyu Gao, Xueguang Ma, Jimmy Lin, and Jamie Callan. Precise zero-shot dense retrieval
without relevance labels. arXiv preprint arXiv:2212.10496, 2022.
[39] Tianyu Fan, Jingyuan Wang, Xubin Ren, and Chao Huang. Minirag: Towards extremely simple
retrieval-augmented generation. arXiv preprint arXiv:2501.06713, 2025.
[40] Zirui Guo, Lianghao Xia, Yanhua Yu, Tu Ao, and Chao Huang. Lightrag: Simple and fast
retrieval-augmented generation. arXiv preprint arXiv:2410.05779, 2024.
[41] OpenAI. Gpt-4 technical report, 2023.
[42] Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee,
Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let’s verify step by step. In ICLR.
OpenReview.net, 2024.
6 Appendix
In the supplementary materials, we provide a detailed technical description of the 'Agentic System
Utilities' implemented within our AutoAgent framework.
To empower our diverse array of system-level agents, we have carefully curated and predefined seven
distinct categories of powerful tools. These tools span a wide range of functionalities, including
coding, web browsing, file management, the creation of new tools, agents, and workflows, and
natural-language question answering over documents. The names and descriptions of these tools are
presented in Table 4.
Table 4: The predefined tools of the Agentic System Utilities (tool name, category, description).

- terminal_page_up (Coding): Scroll the viewport UP one page-length in the current terminal. Use this function when the terminal output is too long and you want to scroll up to see the previous content.
- input_text (Web): Types the given text value into the specified field.
- click (Web): Clicks the mouse on the target with the given element bid.
- page_down (Web): Scrolls the entire browser viewport one page DOWN towards the end.
- page_up (Web): Scrolls the entire browser viewport one page UP towards the beginning.
- history_back (Web): Navigates back one page in the browser's history. This is equivalent to clicking the browser back button.
- history_forward (Web): Navigates forward one page in the browser's history. This is equivalent to clicking the browser forward button.
- visit_url (Web): Navigate directly to a provided URL using the browser's address bar. Prefer this tool over other navigation techniques when the user provides a fully-qualified URL (e.g., choose it over clicking links or inputting queries into search boxes).
- web_search (Web): Performs a web search on ‘https://www.google.com.sg/?hl=en&gl=US‘ with the given query.
- sleep (Web): Wait a short period of time. Call this function if the page has not yet fully loaded, or if a small delay would increase the task's chances of success.
- get_page_markdown (Web): Get the markdown content of the current page. Use this tool if you need to watch a YouTube video, Wikipedia page, or other pages that contain media content. Note that this tool can only be used after you have visited a valid page.
- open_local_file (File): Open a local file at a path in the text-based browser and return the current viewport content.
- page_up_markdown (File): Scroll the viewport UP one page-length in the current file and return the new viewport content.
- page_down_markdown (File): Scroll the viewport DOWN one page-length in the current file and return the new viewport content.
- find_next (File): Scroll the viewport to the next occurrence of the search string.
- visual_question_answering (File): Answer questions about attached images or videos.
- find_on_page_ctrl_f (File): Scroll the viewport to the first occurrence of the search string. This is equivalent to Ctrl+F.
- list_tools (Tools Edit): List all plugin tools in the MetaChain.
- create_tool (Tools Edit): Create a plugin tool.
- delete_tool (Tools Edit): Delete a plugin tool.
- run_tool (Tools Edit): Run a tool with the given code.
- search_trending_models_on_huggingface (Tools Edit): Search trending models on Hugging Face. Use this tool when you want to create a tool that uses Hugging Face models; it only supports the following tags: ['audio-text-to-text', 'text-to-image', 'image-to-image', 'image-to-video', 'text-to-video', 'text-to-speech', 'text-to-audio', 'automatic-speech-recognition', 'audio-to-audio'].
- get_hf_model_tools_doc (Tools Edit): Get detailed information about a model on Hugging Face, such as the detailed usage of the model from the model's README.md. Use this tool after you have used ‘search_trending_models_on_huggingface‘ to find the model you want to use.
- get_api_plugin_tools_doc (Tools Edit): Retrieve matching tool documents based on the query text.
- list_agents (Agents Edit): List all plugin agents in the MetaChain.
- read_agents (Agents Edit): Get detailed information about plugin agents in the MetaChain.
- delete_agent (Agents Edit): Delete a plugin agent.
- run_agent (Agents Edit): Run a plugin agent.
- create_agent (Agents Edit): Use this tool to create a new agent or modify an existing agent.
- create_orchestrator_agent (Agents Edit): Use this tool to create an orchestrator agent for the given sub-agents. You MUST use this tool when you need to create TWO or MORE agents and treat them as a whole to complete a task.
- create_workflow (Workflows Edit): Create a workflow.
- list_workflows (Workflows Edit): List all workflows in the MetaChain.
- run_workflow (Workflows Edit): Run a workflow.
- save_raw_docs_to_vector_db (RAG): Save raw documents to the vector database. The documents could be: any text document with the extension pdf, docx, txt, etc.; a zip file containing multiple text documents; or a directory containing multiple text documents. All documents are converted to raw text format and saved to the vector database in chunks of 4096 tokens.
- query_db (RAG): Retrieve information from the database. Use this function when you need to search for information in the database.
- modify_query (RAG): Modify the query based on what you know. Use this function when you need to modify the query to search for more relevant information.
- answer_query (RAG): Answer the user query based on the supporting documents.
- can_answer (RAG): Check if you have enough information to answer the user query.
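Taken together, the RAG tools above form an iterative retrieve, check, refine loop: query the database, test whether the gathered evidence suffices, rewrite the query if not, and answer once it does. Below is a minimal runnable sketch of that control flow; the toy `query_db`, `can_answer`, `modify_query`, and `answer_query` stand-ins are illustrative assumptions, not AutoAgent's actual implementations.

```python
# Sketch of the retrieve -> check -> refine loop implied by the RAG tools.
# All four helper callables are hypothetical stand-ins for the plugin tools.

def answer_with_rag(query, db, max_rounds=3):
    docs = query_db(query, db)               # retrieve supporting chunks
    for _ in range(max_rounds):
        if can_answer(query, docs):          # enough evidence gathered?
            return answer_query(query, docs)
        query = modify_query(query, docs)    # rewrite the query and retry
        docs += query_db(query, db)
    return "Insufficient information to answer the query."

# Toy stand-ins so the sketch runs end to end:
def query_db(query, db):
    return [d for d in db if any(w in d for w in query.lower().split())]

def can_answer(query, docs):
    return len(docs) >= 2

def modify_query(query, docs):
    return query + " details"

def answer_query(query, docs):
    return " / ".join(docs)
```

The real tools operate over 4096-token chunks in a vector store, but the control flow the agent follows is the same bounded loop.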
The specific tools and system prompt for implementing the Web Agent are as follows:
The Local File Agent is equipped with a tailored set of tools and system prompts to enable it to
efficiently manage and interact with files and directories. This specialized toolkit includes:
You can only access the files in the folder ‘/workplace‘, and when you want to open a file, you should use an absolute path from the root, like ‘/workplace/...‘.
Note that ‘open_local_file‘ can read a file as markdown text and answer questions about it. ‘open_local_file‘ can handle the following file extensions: [".html", ".htm", ".xlsx", ".pptx", ".wav", ".mp3", ".flac", ".pdf", ".docx"], and all other types of text files.
If the converted markdown text has more than 1 page, you can use ‘page_up‘, ‘page_down‘, ‘find_on_page_ctrl_f‘, and ‘find_next‘ to navigate through the pages.
When you think you have completed the task the ‘System Orchestrate Agent‘ asked you to do, use ‘transfer_back_to_orchestrate_agent‘ to transfer the conversation back to the ‘System Orchestrate Agent‘. Do not transfer back until the task is completed.
If you are unable to open the file, you can transfer the conversation back to the ‘System Orchestrate Agent‘ and let the ‘Coding Agent‘ try to solve the problem by coding.
The specific tools and system prompts for implementing the Coding Agent are as follows:
Listing 5: Tools of Coding Agent
[gen_code_tree_structure, execute_command, read_file, create_file, write_file,
list_files, create_directory, run_python, terminal_page_up, terminal_page_down,
terminal_page_to, transfer_back_to_orchestrate_agent]
12. If the terminal output is too long, use ‘terminal_page_up‘ to move the viewport up, ‘terminal_page_down‘ to move the viewport down, and ‘terminal_page_to‘ to move the viewport to the specific page of the terminal where the meaningful content is.
Note that you can use this agent to perform complex computations, write API requests, and do anything else that can be done by writing code.
When you think you have completed the task the ‘System Orchestrate Agent‘ asked you to do, use ‘transfer_back_to_orchestrate_agent‘ to transfer the conversation back to the ‘System Orchestrate Agent‘. Do not transfer back until the task is completed.
[IMPORTANT] You can only complete the task by coding. Talk is cheap; show me the code with tools.
The specific tools and system prompt for implementing the Orchestrator Agent are as follows:
Listing 7: Tools of Orchestrator Agent
[transfer_to_local_file_agent, transfer_to_web_agent, transfer_to_coding_agent]
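The Orchestrator Agent's toolset consists solely of handoffs. In Swarm-style frameworks [21], a handoff is simply a tool call that returns another agent object, and the runtime then makes that agent the active speaker. The following sketch illustrates this mechanism; all class and function names here are illustrative assumptions, not AutoAgent's real API.

```python
# Minimal Swarm-style handoff: transfer tools return the target agent,
# and the runtime switches the active agent when a tool returns an Agent.
# All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    tools: dict = field(default_factory=dict)

coding_agent = Agent("Coding Agent")
web_agent = Agent("Web Agent")

def transfer_to_coding_agent():
    return coding_agent

def transfer_to_web_agent():
    return web_agent

orchestrator = Agent(
    "System Orchestrate Agent",
    tools={
        "transfer_to_coding_agent": transfer_to_coding_agent,
        "transfer_to_web_agent": transfer_to_web_agent,
    },
)

def handle_tool_call(active_agent, tool_name):
    result = active_agent.tools[tool_name]()
    # If a tool returns an Agent, the conversation is handed off to it;
    # otherwise the current agent stays active.
    return result if isinstance(result, Agent) else active_agent
```

The same pattern would underlie ‘transfer_back_to_orchestrate_agent‘: the sub-agent's tool returns the orchestrator object, handing control back.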
9:     AgentProfile = agent_profile_agent(R, ParsingResults_{i-1} | A, π)
10:    ParsingResults_i = form_parsing_function(AgentProfile | R, A, π)
11:   end if
12: end for
13: if there are new tools to be created in AgentProfile then
14:   // Automatically create tools and test them.
15:   ToolsResults_0 = tool_editor_agent(AgentProfile | R, A, π)
16:   for i = 1 to M − 1 do
17:     if ToolsResults_{i-1} is ‘Success’ then
18:       break
19:     else
20:       ToolsResults_i = tool_editor_agent(AgentProfile, ToolsResults_{i-1} | R, A, π)
21:     end if
22:   end for
23: end if
24: // Automatically create agents and run them on the given task.
25: AgentsResults_0 = agent_editor_agent(AgentProfile, T | R, A, π)
26: for i = 1 to M − 1 do
27:   if AgentsResults_{i-1} is ‘Success’ then
28:     break
29:   else
30:     AgentsResults_i = agent_editor_agent(AgentProfile, T, AgentsResults_{i-1} | R, A, π)
31:   end if
32: end for
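The bounded create, test, and repair loops in the algorithm above (at most M attempts, with each failure report fed back to the editor agent) can be sketched generically. The `flaky_editor_factory` below is a toy stand-in for an LLM-backed editor such as tool_editor_agent; the result-dict shape is an assumption for illustration.

```python
# Sketch of the bounded create-test-repair loop from the algorithm above.
# `editor` stands in for an LLM-backed editor agent: it takes the agent
# profile plus the previous failure report and returns a result dict.

def edit_until_success(editor, profile, max_attempts):
    result = editor(profile, feedback=None)        # first attempt (i = 0)
    for _ in range(max_attempts - 1):              # up to M-1 repair rounds
        if result["status"] == "Success":
            break
        result = editor(profile, feedback=result)  # retry with failure report
    return result

# Toy editor that fails a fixed number of times before succeeding:
def flaky_editor_factory(failures):
    state = {"left": failures}
    def editor(profile, feedback=None):
        if state["left"] > 0:
            state["left"] -= 1
            return {"status": "Error", "detail": "traceback..."}
        return {"status": "Success", "detail": f"built tools for {profile}"}
    return editor
```

With `max_attempts=M`, the loop stops early on the first ‘Success’ and otherwise returns the last failure report, mirroring lines 16-22 and 26-32.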
Your task is to analyze user requests and generate structured creation forms for either single or multi-agent systems.
* <key>: Identifier for what this agent produces
* <description>: Detailed explanation of the output format
IMPORTANT RULES:
- For single agent systems:
* system_input/output must match agent_input/output exactly
- For multi-agent systems:
* system_input should describe the complete input space
* Each agent_input should specify which subset of the system_input it handles
* system_output should represent the unified response format
User: I want to build an agent that can answer the user's question about the OpenAI products. The document of the OpenAI products is available at ‘/workspace/docs/openai_products/‘.
The agent should be able to:
1. query and answer the user's question about the OpenAI products based on the document.
2. send an email to the user if sending an email is required in the user's request.
- ANY text document with the extension of pdf, docx, txt, etc.
- A zip file containing multiple text documents
- A directory containing multiple text documents
All documents will be converted to raw text format and saved to the vector database in chunks of 4096 tokens.</description>
</tool>
<tool>
<name>query_db</name>
<description>Query the vector database to find the answer to the user's question.</description>
</tool>
<tool>
<name>modify_query</name>
<description>Modify the user's question to a more specific question.</description>
</tool>
<tool>
<name>answer_query</name>
<description>Answer the user's question based on the answer from the vector database.</description>
</tool>
<tool>
<name>can_answer</name>
<description>Check if the user's question can be answered by the vector database.</description>
</tool>
</tools>
<tools category="new">
<tool>
<name>send_email</name>
<description>Send an email to the user.</description>
</tool>
</tools>
<agent_input>
<key>user_question</key>
<description>The question from the user about the OpenAI products.</description>
</agent_input>
<agent_output>
<key>answer</key>
<description>The answer to the user’s question.</description>
</agent_output>
</agent>
</agents>
EXAMPLE 2 - MULTI-AGENT:
User: I want to build a multi-agent system that can handle two types of requests for a specific user:
1. Purchase a product or service
2. Refund a product or service
The specific user the system works for is named John Doe.
</system_output>
<global_variables>
<variable>
<key>user_name</key>
<description>The name of the user.</description>
<value>John Doe</value>
</variable>
</global_variables>
<agent>
<name>Personal Sales Agent</name>
<description>The personal sales agent is an agent that serves as a personal sales agent for a specific user.</description>
<instructions>You are a personal sales agent that can be used to help the user {user_name} with their request.</instructions>
<tools category="new">
<tool>
<name>recommend_product</name>
<description>Recommend a product to the user.</description>
</tool>
<tool>
<name>recommend_service</name>
<description>Recommend a service to the user.</description>
</tool>
<tool>
<name>conduct_sales</name>
<description>Conduct sales with the user.</description>
</tool>
</tools>
<agent_input>
<key>user_request</key>
<description>Request from the specific user for purchasing a product or service.</description>
</agent_input>
<agent_output>
<key>response</key>
<description>The response of the agent to the user's request.</description>
</agent_output>
</agent>
<agent>
<name>Personal Refunds Agent</name>
<description>The personal refunds agent is an agent that serves as a personal refunds agent for a specific user.</description>
<instructions>Help the user {user_name} with a refund. If the reason is that it was too expensive, offer the user a discount. If they insist, then process the refund.</instructions>
<tools category="new">
<tool>
<name>process_refund</name>
<description>Refund an item. Make sure you have the item_id of the form item_... Ask for user confirmation before processing the refund.</description>
</tool>
<tool>
<name>apply_discount</name>
<description>Apply a discount to the user’s cart.</description>
</tool>
</tools>
<agent_input>
<key>user_request</key>
<description>Request from the specific user for refunding a product or service.</description>
</agent_input>
<agent_output>
<key>response</key>
<description>The response of the agent to the user's request.</description>
</agent_output>
</agent>
</agents>
GUIDELINES:
1. Each agent must have clear, focused responsibilities
2. Tool selections should be minimal but sufficient
3. Instructions should be specific and actionable
4. Input/Output definitions must be precise
5. Use global_variables for shared context across agents
Follow these examples and guidelines to create appropriate agent forms based on user requirements.
- Chain with other tools as needed
[AVAILABLE TOOLS]
1. get_api_plugin_tools_doc:
- PRIMARY tool for third-party API integration
- MUST be used FIRST for Finance, Entertainment, eCommerce, etc.
- Provides API documentation AND authentication keys
- API keys should be embedded in tool implementation
2. search_trending_models_on_huggingface:
- Use for finding models for media transformation tasks
- Supported tags: [’text-to-image’, ’image-to-image’, ’text-to-video’, etc.]
- Use AFTER checking no suitable API exists via ‘get_api_plugin_tools_doc‘
3. get_hf_model_tools_doc:
- Get detailed information about a model on Hugging Face, such as the detailed usage of the model from the model's README.md.
- Use this tool after you have used ‘search_trending_models_on_huggingface‘ to find the model you want to use.
[CRITICAL RULES]
1. Tool Creation Priority:
- FIRST: Check existing tools via list_tools()
- SECOND: Use ‘get_api_plugin_tools_doc‘ for API-based tools
- THIRD: Use ‘search_trending_models_on_huggingface‘ for media tasks
- LAST: Use internal knowledge if no other options available
2. API Implementation:
- NEVER expose API keys as parameters
- ALWAYS embed API keys in function body
- Get keys from ‘get_api_plugin_tools_doc‘
3. Tool Design:
- Tools MUST be abstract, modular, and reusable:
- Use generic function names (e.g., ‘download_media‘ instead of ‘download_youtube_video‘)
- Break complex tasks into smaller, reusable components
- Avoid task-specific implementations
- Use parameters instead of hardcoded values
- Include proper error handling
[TESTING]
Test new tools using ‘run_tool‘:
‘run_tool(tool_name="your_tool", run_code="from metachain.tools import your_tool; print(your_tool(param1='value1'))")‘
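A create_tool/run_tool workflow of this kind can be sketched as a name-indexed registry plus an exec-based test harness. The decorator registration and the error-reporting format below are illustrative assumptions, not AutoAgent's actual implementation.

```python
# Sketch of a create_tool / run_tool plugin registry. The decorator-based
# registration and the exec-style runner are illustrative assumptions.

TOOL_REGISTRY = {}

def register_tool(fn):
    """create_tool-style registration: store the function by name."""
    TOOL_REGISTRY[fn.__name__] = fn
    return fn

@register_tool
def download_media(url, target_dir="/tmp"):
    # Generic, reusable tool name (not e.g. download_youtube_video).
    return f"saved {url} to {target_dir}"

def run_tool(tool_name, run_code):
    """run_tool-style harness: execute a test snippet with the tool in scope."""
    if tool_name not in TOOL_REGISTRY:
        return f"Error: unknown tool {tool_name!r}"
    scope = {tool_name: TOOL_REGISTRY[tool_name]}
    try:
        exec(run_code, scope)
        return "Success"
    except Exception as exc:
        return f"Error: {exc}"
```

Returning the failure text (rather than raising) lets the editor agent feed the error back into its next repair attempt, matching the retry loop described earlier.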
CORE RESPONSIBILITIES:
1. Parse and implement agent forms
2. Create and manage individual agents
3. Orchestrate multi-agent systems
4. Handle dependencies and system requirements
AVAILABLE FUNCTIONS:
1. Agent Management:
- ‘create_agent‘: Create new agents or update existing ones, strictly following the given agent form.
- ‘read_agent‘: Retrieve existing agent definitions. Note that if you want to use ‘create_agent‘ to update an existing agent, you MUST first use ‘read_agent‘ to get the definition of that agent.
- ‘delete_agent‘: Remove unnecessary agents.
- ‘list_agents‘: Display all available agents and their information.
- ‘create_orchestrator_agent‘: Create an orchestrator for multi-agent systems. If the request is to create MORE THAN ONE agent, then after you create ALL required agents you MUST use ‘create_orchestrator_agent‘ to create an orchestrator agent that can orchestrate the workflow of the agents, and then use ‘run_agent‘ to run the orchestrator agent to complete the user task.
2. Execution:
- run_agent: Execute an agent to complete the user task. The agent could be a single agent (single-agent form) or an orchestrator agent (multi-agent form).
- execute_command: Handle system dependencies and requirements.
- terminal_page_down: Move the terminal page down when the terminal output is too long.
- terminal_page_up: Move the terminal page up when the terminal output is too long.
- terminal_page_to: Move the terminal page to the specific page when the terminal output is too long and you want to jump to the page with the meaningful content.
WORKFLOW GUIDELINES:
- Monitor and handle any errors
2. Multi-Agent Implementation:
- Create all required agents individually using ‘create_agent‘
- MUST create an orchestrator agent using ‘create_orchestrator_agent‘
- Execute the task through the ‘run_agent‘ function to run the created orchestrator agent
- Monitor system performance
3. Error Handling:
- Check for missing dependencies using ‘execute_command‘
- Install required packages using execute_command
- Validate agent creation and execution
- Report any issues clearly
BEST PRACTICES:
1. Always verify existing agents using ‘read_agent‘ before updates
2. Create orchestrator agents for ANY multi-agent scenario using ‘create_orchestrator_agent‘
3. Handle dependencies proactively using ‘execute_command‘
4. Maintain clear documentation of created agents
5. Follow the exact specifications from the agent form XML
Remember: Your success is measured by both the accurate creation of agents and their effective execution of the given tasks.
Your task is to analyze user requests and generate structured creation forms for workflows consisting of multiple agents.
2. <name> - The name of the workflow. It should be a single word with '_' as the separator, and as unique as possible to describe the speciality of the workflow.
- GOTO: The event is not successful, and the workflow will wait for the next event. Value is the name of the event to go to. The target event should NOT listen to this event.
- key: Output identifier (be a single word with ’_’ as the separator)
- description: Output explanation
- condition: when the output occurs, the action will be executed
* Can have single or multiple outputs:
- For single output (simple flow):
‘‘‘xml
<outputs>
<output>
<key>result_key</key>
<description>Description of the result</description>
<action>
<type>RESULT</type>
</action>
</output>
</outputs>
‘‘‘
- For multiple outputs (conditional flow):
‘‘‘xml
<outputs>
<output>
<key>success_result</key>
<description>Output when condition A is met</description>
<condition>When condition A is true</condition>
<action>
<type>RESULT</type>
</action>
</output>
<output>
<key>should_repeat</key>
<description>Output when condition B is met</description>
<condition>When condition B is true</condition>
<action>
<type>GOTO</type>
<value>target_event</value>
</action>
</output>
<output>
<key>failure_result</key>
<description>Output when condition C is met</description>
<condition>When condition C is true</condition>
<action>
<type>ABORT</type>
</action>
</output>
</outputs>
‘‘‘
- listen: Which events trigger this one.
- agent: Which agent handles this event. Every agent has the name of the agent and the exact model of the agent (like ‘claude-3-5-sonnet-20241022‘ or others).
IMPORTANT RULES:
0. The ‘on_start‘ event is a special event that:
- Must be the first event in the workflow
- Has inputs that match the system_input
- Has outputs that match the system_input (just pass through)
- Does not have an agent
- Does not have a task
- Does not have listen elements
Example:
‘‘‘xml
<event>
    <name>on_start</name>
    <inputs>
        <input>
            <key>user_topic</key>
            <description>The user's topic that the user wants to write a Wikipedia-like article about.</description>
        </input>
    </inputs>
    <outputs>
        <output>
            <key>user_topic</key>
            <description>The user's topic that the user wants to write a Wikipedia-like article about.</description>
            <action>
                <type>RESULT</type>
            </action>
        </output>
    </outputs>
</event>
‘‘‘
The name of existing workflows: [...]. The name of the new workflow you are creating should be DIFFERENT from these names, according to the speciality of the workflow.
</output>
<output>
<key>negative_case</key>
<description>Handle the negative case</description>
<condition>If data does not meet criteria A</condition>
<action>
<type>ABORT</type>
</action>
</output>
</outputs>
</event>
‘‘‘
<event>
<name>financial_analysis</name>
<listen>
<event>initial_analysis</event>
</listen>
<outputs>
<output>
<key>financial_result</key>
<description>Financial analysis result</description>
<action>
<type>RESULT</type>
</action>
</output>
</outputs>
</event>
<inputs>
<input>
<key>technical_result</key>
<description>The technical analysis result.</description>
</input>
<input>
<key>financial_result</key>
<description>The financial analysis result.</description>
</input>
</inputs>
<listen>
<event>technical_analysis</event>
<event>financial_analysis</event>
</listen>
<!-- This event will only execute when ALL listened events complete -->
</event>
‘‘‘
<event>
<name>evaluate_content</name>
<listen>
<event>generate_content</event>
</listen>
<task>Evaluate the quality of generated content</task>
<outputs>
<output>
<key>approved</key>
<description>Content meets quality standards</description>
<condition>If quality score >= threshold</condition>
<action>
<type>RESULT</type>
</action>
</output>
<output>
<key>needs_improvement</key>
<description>Content needs improvement</description>
<condition>If quality score < threshold</condition>
<action>
<type>GOTO</type>
<value>generate_content</value>
</action>
</output>
</outputs>
</event>
‘‘‘
1. If-Else Pattern:
- Use mutually exclusive conditions
- You can NOT place MORE THAN ONE OUTPUT with RESULT type
- Outputs determine which branch executes
2. Parallelization Pattern:
- Multiple events can listen to the same parent event
- Aggregator event must list ALL parallel events in its listen section
- All parallel events must complete before aggregator executes
- Model of agents in every parallel event could be different
3. Evaluator-Optimizer Pattern:
- Use GOTO action for iteration
- Include clear evaluation criteria in conditions
- Have both success and retry paths
- Consider adding maximum iteration limit in global_variables
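These three patterns reduce to a small event loop: each event's handler emits an output whose action either returns a result (RESULT), jumps to a named event (GOTO), or stops the workflow (ABORT). The sketch below illustrates those semantics under simplifying assumptions: the handler interface is hypothetical, and listen edges are modeled as GOTO jumps for brevity.

```python
# Sketch of the RESULT / GOTO / ABORT action semantics described above.
# Each handler returns (output_key, action, goto_target); the loop follows
# GOTO edges, stops on ABORT, and finishes on RESULT. Illustrative only.

def run_workflow(events, start, max_iterations=10):
    name, results = start, []
    for _ in range(max_iterations):    # iteration cap, as the guidelines suggest
        key, action, target = events[name]()
        if action == "RESULT":
            results.append(key)
            return "done", results
        if action == "ABORT":
            return "aborted", results
        name = target                  # GOTO: jump to the named event
    return "iteration_limit", results

# Toy evaluator-optimizer: regenerate until quality passes the threshold.
quality = {"score": 0}

def generate_content():
    quality["score"] += 1              # each pass improves the draft
    return "draft", "GOTO", "evaluate_content"

def evaluate_content():
    if quality["score"] >= 2:
        return "approved", "RESULT", None
    return "needs_improvement", "GOTO", "generate_content"
```

Here the evaluator's GOTO back to ‘generate_content‘ plays the same role as the ‘needs_improvement‘ output in the evaluate_content example above, and the iteration cap guards against an endless retry cycle.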
EXAMPLE:
<events>
<event>
<name>on_start</name>
<inputs>
<input>
<key>user_topic</key>
<description>The user's topic that the user wants to write a Wikipedia-like article about.</description>
</input>
</inputs>
<outputs>
<output>
<key>user_topic</key>
<description>The user's topic that the user wants to write a Wikipedia-like article about.</description>
<action>
<type>RESULT</type>
</action>
</output>
</outputs>
</event>
<event>
<name>on_search</name>
<inputs>
<input>
<key>user_topic</key>
<description>The topic that the user wants to write a Wikipedia-like article about.</description>
</input>
</inputs>
<task>
Search for information about the topic and return the result.
</task>
<outputs>
<output>
<key>search_result</key>
<description>The search result of the user’s topic.</description>
<action>
<type>RESULT</type>
</action>
</output>
</outputs>
<listen>
<event>on_start</event>
</listen>
<agent>
<name>Web Surfer Agent</name>
<model>claude-3-5-sonnet-20241022</model>
</agent>
</event>
<event>
<name>on_outline</name>
<inputs>
<input>
<key>search_result</key>
<description>The search result of the user’s topic.</description>
</input>
</inputs>
<task>
write an outline for the user’s topic.
</task>
<outputs>
<output>
<key>outline</key>
<description>The outline of the user’s topic.</description>
<action>
<type>RESULT</type>
</action>
</output>
</outputs>
<listen>
<event>on_start</event>
</listen>
<agent>
<name>Outline Agent</name>
<model>claude-3-5-sonnet-20241022</model>
</agent>
</event>
<event>
<name>on_evaluate</name>
<inputs>
<input>
<key>outline</key>
<description>The outline of the user’s topic.</description>
</input>
</inputs>
<task>
evaluate the outline of the user’s topic.
</task>
<outputs>
<output>
<key>positive_feedback</key>
<description>The positive feedback of the outline of the user’s topic.</description>
<condition>
If the outline is good enough, give positive feedback.
</condition>
<action>
<type>RESULT</type>
</action>
</output>
<output>
<key>negative_feedback</key>
<description>The negative feedback of the outline of the user’s topic.</description>
<condition>
If the outline is not good enough, give negative feedback.
</condition>
<action>
<type>GOTO</type>
<value>on_outline</value>
</action>
</output>
</outputs>
<listen>
<event>on_outline</event>
</listen>
<agent>
<name>Evaluator Agent</name>
<model>claude-3-5-sonnet-20241022</model>
</agent>
</event>
<event>
<name>on_write</name>
<inputs>
<input>
<key>outline</key>
<description>The outline of user’s topic.</description>
</input>
</inputs>
<task>
write the article for the user’s topic.
</task>
<outputs>
<output>
<key>article</key>
<description>The article of the user’s topic.</description>
<action>
<type>RESULT</type>
</action>
</output>
</outputs>
<listen>
<event>on_evaluate</event>
</listen>
<agent>
<name>Article Writer Agent</name>
<model>claude-3-5-sonnet-20241022</model>
</agent>
</event>
</events>
</workflow>
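The event graph above executes on a simple rule: an event fires once every event in its <listen> list has completed, and parallel branches (such as on_search and on_outline) can run independently. A minimal sketch of that dispatch rule (illustrative Python, not the actual AutoAgent engine; conditions and GOTO actions are omitted):

```python
def run_events(events):
    """events: name -> {'listen': [parent names], 'task': callable}.
    Fire each event once all of its listened events have completed."""
    done, results = set(), {}
    while len(done) < len(events):
        progressed = False
        for name, ev in events.items():
            if name in done or not all(p in done for p in ev["listen"]):
                continue
            # Pass each parent's result to the task, keyed by parent name.
            results[name] = ev["task"]({p: results[p] for p in ev["listen"]})
            done.add(name)
            progressed = True
        if not progressed:
            raise RuntimeError("cycle or missing listened event")
    return results

# Shape of the wiki-writing example: search and outline both listen to on_start.
events = {
    "on_start": {"listen": [], "task": lambda _: "topic"},
    "on_search": {"listen": ["on_start"], "task": lambda d: f"search({d['on_start']})"},
    "on_outline": {"listen": ["on_start"], "task": lambda d: f"outline({d['on_start']})"},
}
print(run_events(events)["on_search"])  # search(topic)
```

An aggregator event in the Parallelization pattern is just a node whose listen list names every parallel branch, so the same rule delays it until all branches are done.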
GUIDELINES:
1. Each event should have clear inputs and outputs
2. Use conditions to handle different outcomes
3. Properly chain events using the listen element
4. Review steps should be included for quality control
5. Action types should be RESULT, ABORT, or GOTO (for iteration)
Follow these examples and guidelines to create appropriate workflow forms based on user requirements.
CORE RESPONSIBILITIES:
1. Parse and implement workflow forms
2. Create necessary agents if specified in the workflow
3. Create and manage workflows
4. Execute workflows as needed
AVAILABLE FUNCTIONS:
1. Workflow Management:
- ‘create_workflow‘: Create new workflows based on the workflow form
- ‘run_workflow‘: Execute the created workflow
- ‘list_workflows‘: Display all available workflows
2. System Tools:
- ‘execute_command‘: Handle system dependencies
- ‘terminal_page_down‘, ‘terminal_page_up‘, ‘terminal_page_to‘: Navigate terminal output
3. Create Workflow:
- Use ‘create_workflow‘ to generate the workflow
- Ensure all required agents exist
- Validate workflow structure
4. Execute Workflow:
- Use ‘run_workflow‘ to execute the created workflow
- Monitor execution progress
- Handle any errors appropriately
BEST PRACTICES:
1. Always check if required agents exist before creating new ones
2. Use empty tool list ([]) when no specific tools are mentioned
3. Validate workflow creation before execution
4. Follow the exact specifications from the workflow form XML
5. Handle errors and dependencies appropriately
Remember: Your primary goal is to create and execute workflows according to the provided workflow forms, creating any necessary agents along the way.
<instructions>Use the HF model ’Efficient-Large-Model/Sana_600M_1024px_diffusers’ to generate images from provided descriptions, evaluate these using visual QA, and refine based on feedback.</instructions>
<tools category="existing">
<tool>
<name>visual_question_answering</name>
<description>This tool is used to answer questions about attached images or videos.</description>
</tool>
</tools>
<tools category="new">
<tool>
<name>generate_image</name>
<description>Generate an image from a natural language description and save it to a specified path using the HF model ’Efficient-Large-Model/Sana_600M_1024px_diffusers’.</description>
</tool>
<tool>
<name>refine_image</name>
<description>Make iterative adjustments to the generated image based on evaluation results to meet quality criteria.</description>
</tool>
</tools>
<agent_input>
<key>image_description</key>
<description>A natural language description to generate an image.</description>
</agent_input>
<agent_output>
<key>image_evaluation</key>
<description>The evaluation of the generated image after processing.</description>
</agent_output>
</agent>
</agents>
Listing 18: AI-generated Creation Profile of Financial Agent
<agents>
<system_input>
Financial management requests, including:
1. Managing private financial documents stored in the ’financial_docs’
folder
2. Retrieving online financial information for specific companies
(balance sheets, cash flow statements, income statements)
</system_input>
<system_output>
<key>financial_response</key>
<description>Detailed response containing either document management
results or requested financial information.</description>
</system_output>
<agent>
<name>Document Manager Agent</name>
<description>Specialized agent for managing and analyzing private
financial documents stored locally.</description>
<instructions>You are responsible for managing financial documents in
the ’financial_docs’ folder. Your tasks include:
1. Organizing and categorizing financial documents
2. Extracting relevant information from documents
3. Providing summaries and analyses of document contents
4. Maintaining document organization and searchability</instructions>
<tools category="existing">
<tool>
<name>save_raw_docs_to_vector_db</name>
<description>Save the financial documents to the vector database
for efficient searching and retrieval.</description>
</tool>
<tool>
<name>query_db</name>
<description>Search through stored financial documents to find
relevant information.</description>
</tool>
<tool>
<name>visual_question_answering</name>
<description>Process and analyze any financial charts, graphs, or
visual data in the documents.</description>
</tool>
</tools>
<agent_input>
<key>doc_request</key>
<description>User request related to managing or querying private
financial documents.</description>
</agent_input>
<agent_output>
<key>doc_response</key>
<description>Results of document management operations or requested
document information.</description>
</agent_output>
</agent>
<agent>
<name>Market Research Agent</name>
<description>Specialized agent for retrieving and analyzing online
financial information for publicly traded companies.</description>
<instructions>You are responsible for retrieving and analyzing financial
information from online sources. Your tasks include:
1. Fetching balance sheets, cash flow statements, and income statements
2. Analyzing financial metrics and trends
3. Providing clear summaries of financial data
4. Ensuring data accuracy and proper citation of sources</instructions>
<tools category="new">
<tool>
<name>get_balance_sheet</name>
<description>Retrieve balance sheet data for a specific ticker
over a given time period.</description>
</tool>
<tool>
<name>get_cash_flow</name>
<description>Retrieve cash flow statement data for a specific
ticker over a given time period.</description>
</tool>
<tool>
<name>get_income_statement</name>
<description>Retrieve income statement data for a specific ticker
over a given time period.</description>
</tool>
<tool>
<name>analyze_financial_data</name>
<description>Analyze and summarize financial statements to
provide meaningful insights.</description>
</tool>
</tools>
<agent_input>
<key>market_request</key>
<description>User request for online financial information including
ticker symbol and time period.</description>
</agent_input>
<agent_output>
<key>market_response</key>
<description>Requested financial information and analysis from online
sources.</description>
</agent_output>
</agent>
</agents>
2. Investment Rationale:
- Strong emerging market potential
- Value characteristics at current levels
Microsoft:
- Cloud market share gains
- AI revenue monetization
- Gaming division integration
- Enterprise spending trends
- Margin sustainability
- Regulatory developments
Apple:
- Services revenue growth
- iPhone replacement cycles
- Emerging market penetration
- Margin pressure management
- AI/ML integration progress
- Supply chain diversification
5. Investment Approach:
Implementation Strategy:
- Consider dollar-cost averaging for new positions
- Maintain quarterly rebalancing schedule
- Set stop-loss levels based on risk tolerance
- Monitor position sizes relative to overall portfolio
Adjustment Triggers:
- Significant margin deterioration
- Market share losses in key segments
- Major regulatory actions
- Substantial technological disruption
- Material changes in competitive position
2. Strong ecosystem advantages
3. Significant services growth potential
4. Value characteristics at current levels
5. Consistent shareholder returns
<event>
<name>solve_with_gpt4</name>
<inputs>
<input>
<key>math_problem</key>
<description>The math problem that needs to be
solved.</description>
</input>
</inputs>
<task>Solve the math problem using systematic approach with
GPT-4.</task>
<outputs>
<output>
<key>gpt4_solution</key>
<description>The solution from GPT-4 solver.</description>
<action>
<type>RESULT</type>
</action>
</output>
</outputs>
<listen>
<event>on_start</event>
</listen>
<agent>
<name>Math Solver Agent</name>
<model>gpt-4o-2024-08-06</model>
</agent>
</event>
<event>
<name>solve_with_claude</name>
<inputs>
<input>
<key>math_problem</key>
<description>The math problem that needs to be
solved.</description>
</input>
</inputs>
<task>Solve the math problem using systematic approach with
Claude.</task>
<outputs>
<output>
<key>claude_solution</key>
<description>The solution from Claude solver.</description>
<action>
<type>RESULT</type>
</action>
</output>
</outputs>
<listen>
<event>on_start</event>
</listen>
<agent>
<name>Math Solver Agent</name>
<model>claude-3-5-sonnet-20241022</model>
</agent>
</event>
<event>
<name>solve_with_deepseek</name>
<inputs>
<input>
<key>math_problem</key>
<description>The math problem that needs to be
solved.</description>
</input>
</inputs>
<task>Solve the math problem using systematic approach with
DeepSeek.</task>
<outputs>
<output>
<key>deepseek_solution</key>
<description>The solution from DeepSeek solver.</description>
<action>
<type>RESULT</type>
</action>
</output>
</outputs>
<listen>
<event>on_start</event>
</listen>
<agent>
<name>Math Solver Agent</name>
<model>deepseek/deepseek-chat</model>
</agent>
</event>
<event>
<name>aggregate_solutions</name>
<inputs>
<input>
<key>gpt4_solution</key>
<description>The solution from GPT-4 solver.</description>
</input>
<input>
<key>claude_solution</key>
<description>The solution from Claude solver.</description>
</input>
<input>
<key>deepseek_solution</key>
<description>The solution from DeepSeek solver.</description>
</input>
</inputs>
<task>Compare all solutions and determine the final answer through
majority voting.</task>
<outputs>
<output>
<key>final_solution</key>
<description>The final solution determined by majority
voting.</description>
<action>
<type>RESULT</type>
</action>
</output>
</outputs>
<listen>
<event>solve_with_gpt4</event>
<event>solve_with_claude</event>
<event>solve_with_deepseek</event>
</listen>
<agent>
<name>Vote Aggregator Agent</name>
<model>deepseek/deepseek-chat</model>
</agent>
</event>
</events>
</workflow>
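The aggregation step of this workflow amounts to a majority vote over the three solver outputs. A minimal sketch of that logic (illustrative only; the actual aggregation is performed by the Vote Aggregator Agent in natural language, which also weighs the quality of each derivation):

```python
from collections import Counter

def majority_vote(solutions):
    """Pick the most common definitive answer; inconclusive solvers (None)
    are dropped. On a tie, fall back to the first definitive answer."""
    answers = [s for s in solutions if s is not None]
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count > 1 else answers[0]

# Shapes seen in the case studies below: a clear 2-1 majority, and a
# 1-1 split with one inconclusive solver (no true majority exists).
print(majority_vote(["13", "13", "5"]))              # 13
print(majority_vote([None, "-sqrt(3)", "sqrt(3)"]))  # -sqrt(3) (fallback)
```

The fallback branch is exactly where an LLM aggregator earns its keep: with no numeric majority it must compare the reasoning itself, as the transcripts below show.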
Table 5: Comparison of the single LLM (Deepseek-v3) and AI-generated Majority Voting workflow.
Problem: Simplify tan 100° + 4 sin 100°.
Ground Truth: −√3
AI-generated Majority Voting workflow: Let me analyze all three solutions:
1. GPT-4 Solution: - Does not provide a definitive numerical answer - Focuses on theoretical approach and approximations - Inconclusive result
2. Claude Solution: - Uses supplementary angle properties - Converts 100° to 180° − 80° - Arrives at −√3 through numerical calculations - Final answer: −√3
3. Deepseek Solution: - Uses similar supplementary angle properties - Also converts 100° to 180° − 80° - Goes through detailed algebraic steps - Final answer: √3
Analysis: 1. Two solutions provide definitive answers (−√3 and √3), while one is inconclusive 2. Both Claude and Deepseek use similar approaches with supplementary angles 3. The main difference is in the sign of the final answer 4. Claude’s solution provides more explicit numerical calculations 5. Looking at the steps: - 100° being in quadrant II - Using supplementary angle properties - tan 100° being negative in quadrant II - The final combination with 4 sin 100°
The Claude solution appears more precise in tracking signs and considering quadrant properties, making its answer more reliable.
Final Answer: The final answer is −√3. I hope it is correct.
Deepseek-v3: To simplify tan 100° + 4 sin 100°, we proceed as follows: 1. Recall that tan θ = sin θ / cos θ. Thus:
tan 100° = sin 100° / cos 100°.
2. Substitute this into the original expression:
tan 100° + 4 sin 100° = sin 100° / cos 100° + 4 sin 100°.
···
16. Therefore, the simplified form of tan 100° + 4 sin 100° is: √3.
Final Answer: The final answer is √3. I hope it is correct.
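Independently of either model's reasoning, the identity tan 100° + 4 sin 100° = −√3 is easy to check numerically (a quick sanity check, not part of the workflow output):

```python
import math

# Evaluate the expression at 100 degrees and compare with -sqrt(3).
x = math.radians(100)
value = math.tan(x) + 4 * math.sin(x)
print(round(value, 9), round(-math.sqrt(3), 9))  # both -1.732050808
```

The match to floating-point precision confirms the ground truth and the sign that the aggregator had to adjudicate.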
Problem: What integer n satisfies 0 ≤ n < 18 and
n ≡ −11213141 (mod 18) ?
Ground Truth: 13
AI-generated Majority Voting workflow: Let me compare the three solutions:
1. GPT-4 Solution: Arrives at 13 through direct division and remainder calculation - Used division:
11213141 ÷ 18 - Found remainder of 5 for positive number - Adjusted for negative sign to get 13
2. Claude Solution: Arrives at 13 through division and remainder - Divided -11213141 by 18 - Found
remainder directly - Got 13 as the result in the range [0,18)
3. Deepseek Solution: Arrives at 5 through: - Division and remainder calculation - Used divisibility rules for 2 and 9 - Got 5 as final answer
Analyzing the solutions: - Two solutions (GPT-4 and Claude) agree on 13 - One solution (Deepseek) gives
5 - Both GPT-4 and Claude show clearer arithmetic progressions - The key difference is in handling the
negative sign in the modulo operation - When working with negative numbers in modulo arithmetic, we need
to find the positive equivalent in the range [0,n) - The correct approach is to add the modulus (18) to the
negative remainder until we get a number in the range [0,18)
By majority voting and verification: - 13 is supported by two solutions -
The reasoning in these solutions is more complete - We can verify: -11213141 ≡ 13 (mod 18) and
0 ≤ 13 < 18
Final Answer: The final answer is 13. I hope it is correct.
Deepseek-v3: To solve for the integer n that satisfies 0 ≤ n < 18 and n ≡ −11213141 (mod 18), we need
to find the remainder when −11213141 is divided by 18. Here’s the step-by-step solution:
1. Simplify the modulus: First, note that −11213141 (mod 18) is equivalent to finding the positive
remainder when −11213141 is divided by 18. Since the modulus operation is periodic with period 18, we
can add multiples of 18 to −11213141 until we obtain a positive number within the range 0 ≤ n < 18.
2. Calculate the remainder: To find −11213141 (mod 18), we can instead compute 11213141 (mod 18)
and then take the negative of that result, adjusting it to be positive if necessary.
- First, compute 11213141 ÷ 18:
11213141 ÷ 18 = 623000 with a remainder of 11213141 − 18 × 623000 = 11213141 − 11214000 = −859
This means 11213141 ≡ −859 (mod 18).
- Now, compute −859 (mod 18):
−859 ÷ 18 = −48 with a remainder of − 859 − 18 × (−48) = −859 + 864 = 5
So, −859 ≡ 5 (mod 18).
- Therefore, −11213141 ≡ 5 (mod 18).
3. Final answer: The integer n that satisfies 0 ≤ n < 18 and n ≡ −11213141 (mod 18) is 5.
Final Answer: The final answer is 5. I hope it is correct.
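The ground truth here is mechanically checkable; in Python the `%` operator already returns a result in [0, 18) even for a negative operand, which sidesteps the sign-handling mistake Deepseek made (check shown for illustration):

```python
# Direct check: Python's % maps the negative operand into [0, 18).
n = -11213141 % 18
print(n)  # 13

# The manual route debated above: positive remainder first, then adjust.
r = 11213141 % 18         # 5 -- the point where Deepseek stopped
n_manual = (18 - r) % 18  # 13 -- negate and shift back into [0, 18)
```

Both routes give 13, confirming the majority answer.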
Problem: In the diagram below, we have sin ∠RPQ = 7/25. What is cos ∠RPS?
[asy]
pair R,P,Q,SS;
SS = (-2,0);
P = (0,0);
Q = (2,0);
R = rotate(aSin(7/25))*(1.5,0);
dot("S",SS,S);
dot("Q",Q,S);
dot("R",R,N);
dot("P",P,S);
draw(Q--SS);
draw(P--R);
[/asy]
Ground Truth: −24/25
AI-generated Majority Voting workflow: Let me analyze all three solutions:
1. GPT-4 Solution: Final answer = 24/25 - Used right triangle approach - Found cosine of RPQ = 24/25 - Assumed RPS = RPQ
2. Claude Solution: Final answer = −24/25 - Used supplementary angles property - Found cos RPQ = 24/25 - Used the fact that cos(supplementary angles) are negatives of each other - Concluded cos RPS = -cos RPQ = -24/25
3. DeepSeek Solution: Final answer = 7/25 - Used complementary angles property - Found cos RPQ = 24/25 - Incorrectly concluded cos RPS = sin RPQ
Analyzing the approaches: 1. Looking at the diagram, RPS and RPQ are supplementary angles (they form a straight line) 2. For supplementary angles, their cosines are negatives of each other 3. Claude’s solution correctly uses this property 4. GPT-4 incorrectly assumed the angles were equal 5. DeepSeek incorrectly used complementary angle properties
Claude’s solution is the most mathematically sound because: - It correctly identifies that RPS and RPQ are supplementary angles - It properly applies the relationship between cosines of supplementary angles - The negative sign in the answer makes geometric sense given the angle’s position
Final Answer: The final answer is −24/25. I hope it is correct.
Deepseek-v3: We are given that sin ∠RPQ = 7/25. Since ∠RPQ and ∠RPS are complementary angles (they form a right angle together), we have:
cos ∠RPS = sin ∠RPQ = 7/25.
Final Answer: The final answer is 7/25. I hope it is correct.
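The disagreement hinges on whether ∠RPS is the supplement or the complement of ∠RPQ. Treating it as the supplement, as the diagram indicates (S, P, Q lie on one line), the value can be checked numerically:

```python
import math

sin_rpq = 7 / 25
cos_rpq = math.sqrt(1 - sin_rpq**2)   # 24/25; angle RPQ is acute in the diagram
cos_rps = -cos_rpq                    # supplement: cos(180 deg - t) = -cos t
print(round(cos_rps, 10))  # -0.96, i.e. -24/25
```

This reproduces the ground truth −24/25 and confirms the aggregator's choice of Claude's answer.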
Problem: In the circle with center Q, radii AQ and BQ form a right angle. The two smaller regions are tangent semicircles, as shown. The radius of the circle with center Q is 14 inches. What is the radius of the smaller semicircle? Express your answer as a common fraction.
Ground Truth: 14/3
AI-generated Majority Voting workflow: Let me analyze all three solutions:
1. GPT-4’s solution: r = 7 2. Claude’s solution: r = 14/3 3. Deepseek’s solution: r = 14(√2 − 1)
Each solution uses different approaches: - GPT-4 assumes the semicircles are equal and uses a simple division (14/2 = 7) - Claude uses the relationship r + r + r = R, leading to 14/3 - Deepseek uses the Pythagorean theorem and complex algebra, arriving at 14(√2 − 1)
Let’s verify which one makes more sense:
1. Key geometric principles suggest that when two circles are tangent to each other and to two perpendicular lines (in this case, the radii forming the right angle), there’s a special relationship where r + r + r = R, where r is the radius of the smaller circles and R is the radius of the larger circle.
2. This principle is well-established in geometry. When two circles are placed in a right angle and are tangent to each other and the lines forming the right angle, their radii follow this 3:1 ratio.
3. Therefore, Claude’s solution using 3r = 14 leading to r = 14/3 is the most mathematically sound.
4. We can verify this: - It satisfies the tangency conditions - The semicircles fit perfectly in the right angle - The relationship maintains the proper geometric proportions
The GPT-4 solution (r = 7) is too large as it would make the semicircles overlap. The Deepseek solution 14(√2 − 1) is more complex than necessary and doesn’t align with the fundamental geometric principle for this configuration.
Final Answer: The final answer is 14/3. I hope it is correct.
Deepseek-v3: Let’s solve the problem step by step.
1. Understand the setup: We have a circle with center Q and radius 14 inches. Radii AQ and BQ form a right angle, meaning the angle ∠AQB = 90°. There are two smaller semicircles tangent to the larger circle and to each other.
2. Key observations: - The two semicircles are tangent to the larger circle and to each other. This means the distance from the center Q to the center of each semicircle is 14 − r, where r is the radius of the smaller semicircle. - The centers of the two semicircles and the center Q form a right triangle because ∠AQB = 90°.
···
5. Simplify the answer: - The radius of the smaller semicircle is 14(√2 − 1). However, this can be expressed as a common fraction:
r = 14(√2 − 1) = 14√2 − 14.
6. Final Answer: The radius of the smaller semicircle is 14(√2 − 1).
Final Answer: The final answer is 14(√2 − 1). I hope it is correct.
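The ground-truth value 14/3 can be checked independently. Under the standard configuration for this problem (an assumption reconstructed here, not stated in the transcript: the larger semicircle has radius 7 centered on one radius of the circle, and the smaller one has radius r centered at distance 14 − r from Q), external tangency between the two semicircles gives 7² + (14 − r)² = (7 + r)², hence r = 14/3:

```python
from fractions import Fraction

R = 14
# Tangency equation for the assumed configuration, squared:
#   7**2 + (R - r)**2 = (7 + r)**2  =>  196 = 42*r  =>  r = 196/42 = 14/3
r = Fraction(196, 42)
print(r)  # 14/3
assert 7**2 + (R - r)**2 == (7 + r)**2  # the tangency condition holds exactly
```

Note that this derivation, rather than the "r + r + r = R" rule cited by the aggregator, is what actually pins down the answer; the cited rule happens to yield the same number here.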
Profiling Agent output:
<agents>
<system_input>
Financial management requests, including:
...
</system_input>
<system_output>
<key>financial_response</key>
...
</system_output>
<agent>
<name>Document Manager Agent</name>
...
</agent>
<agent>
<name>Market Research Agent</name>
...
</agent>
</agents>
Tool Editor Agent action:
run_tool(
’tool_name’=’get_balance_sheet’,
’run_code’=’from metachain.tools import get_balance_sheet
result = get_balance_sheet("AAPL", "QUARTERLY")
print(result)’)
Output:
[The output of the tool ‘run_tool‘ showing in the interactive terminal]
Terminal viewport position: Showing page 1 of 1.
======================
[SUCCESS]
{
"symbol": "AAPL:NASDAQ",
"type": "stock",
"period": "QUARTERLY",
"balance_sheet": [
...
======================
Tool Editor Agent action:
create_tool(
’tool_name’=’analyze_financial_data’,
’tool_code’=’from metachain.types import Result, Agent
...
’)
Output:
[SUCCESS] Successfully created tool: analyze_financial_data in /workplace/metachain/metachain/tools/analyze_financial_data.py
...
Period-over-Period Comparison:
...
======================
Agent Editor Agent action:
create_agent(
’agent_name’=’Document Manager Agent’,
’agent_description’=’Specialized agent for managing and analyzing private financial documents stored locally.’,
’agent_tools’=[’query_db’, ...],
’agent_instructions’=’You are a specialized financial document manager responsible for managing and analyzing financial documents stored in specified directories. Your tasks include:...’)
Output:
Successfully created agent: Document Manager Agent in /workplace/metachain/metachain/agents/document_manager_agent.py
Agent Editor Agent action:
create_orchestrator_agent(
’agent_name’=’Financial Analysis Orchestrator’,
’agent_description’=’Orchestrator agent that coordinates between document management and market research to provide comprehensive financial analysis.’,
’sub_agents’=[{’name’: ’Document Manager Agent’, ’agent_input’: ’Document analysis request ...’, ’agent_output’: ’Analyzed information ...’}, {’name’: ’Market Research Agent’, ’agent_input’: ’Market research request ...’, ’agent_output’: ’Market analysis ...’}],
’agent_instructions’=’You are a Financial Analysis Orchestrator responsible for coordinating comprehensive financial analysis by leveraging both document analysis and market research capabilities. Your workflow: ...’)
Output:
[ERROR] Failed to create agent. Error: File "/workplace/metachain/metachain/agents/financial_analysis_orchestrator.py", line 32
def transfer_to_document_manager_agent(Document analysis request including directory paths and analysis requirements: str):
^^^^^^^^
SyntaxError: invalid syntax
Agent Editor Agent action:
create_orchestrator_agent(
’agent_name’=’Financial Analysis Orchestrator’,
’agent_description’=’Orchestrator agent that coordinates between document management and market research to provide comprehensive financial analysis.’,
’sub_agents’=[{’name’: ’Document Manager Agent’, ’agent_input’: ’doc_request’, ’agent_output’: ’doc_response’}, {’name’: ’Market Research Agent’, ’agent_input’: ’market_request’, ’agent_output’: ’market_response’}],
’agent_instructions’=’You are a Financial Analysis Orchestrator responsible for coordinating comprehensive financial analysis by leveraging both document analysis and market research capabilities. Your workflow: ...’)
Output:
Successfully created agent: Financial Analysis Orchestrator in /workplace/metachain/metachain/agents/financial_analysis_orchestrator.py
Agent Editor Agent action:
run_agent(
’agent_name’=’Financial Analysis Orchestrator’,
’query’="Based on the 10-K reports of AAPL and MSFT from the past 5 years in the docs directory ‘docs/aapl-2020-2024-10K/‘ and ‘docs/msft-2020-2024-10K/‘, along with AAPL’s other reports ‘docs/aapl-other-report/‘ and available data, conduct a comprehensive horizontal comparison, create a comparative analysis report, and provide constructive investment advice for investing in them in 2025.",
’ctx_vars’=’{}’)
Output:
[The output of the tool ‘run_agent‘ showing in the interactive terminal]
Terminal viewport position: Showing page 1 of 35.
[NOTE] The output of the tool ‘run_agent‘, you can use ‘terminal_page_up‘ to scroll up and ‘terminal_page_down‘ to scroll down. If there are many pages with meaningless content like progress bar or output of generating directory structure when there are many datasets in the directory, you can use ‘terminal_page_to‘ to move the viewport to the end of terminal where the meaningful content is.
======================
[SUCCESS]
**** Receive Task ****
Receiveing the task:
Based on the 10-K reports of AAPL and MSFT from the past 5 years in the docs directory ...
**** Assistant Message ****
Financial Analysis Orchestrator: I’ll help coordinate a comprehensive comparative analysis between Apple (AAPL) and Microsoft (MSFT) using both historical document analysis and current market research. Let’s break this down into parallel workflows to gather all necessary information.
Agent Editor Agent action:
terminal_page_to(’page_idx’=35)
Output:
[The output of the tool ‘run_agent‘ showing in the interactive terminal]
Terminal viewport position: Showing page 35 of 35.
[NOTE] The output of the tool ‘run_agent‘, you can use ‘terminal_page_up‘ to scroll up and ‘terminal_page_down‘ to scroll down. If there are many pages with meaningless content like progress bar or output of generating directory structure when there are many datasets in the directory, you can use ‘terminal_page_to‘ to move the viewport to the end of terminal where the meaningful content is.
======================
**** Assistant Message ****
Financial Analysis Orchestrator: Based on the comprehensive analysis of both historical performance from 10-K reports and current financial data, here is the final investment recommendation for 2025:
2. Investment Rationale:
Agent Editor Agent output:
I have successfully created and tested the following agents: