深入OpenManus架构:模块化设计与智能体实现

摘要

OpenManus作为一个功能强大的AI智能体框架,其核心在于优秀的模块化架构设计。本文将深入分析OpenManus的系统架构,探讨其核心组件的设计理念和实现方式,包括智能体基类、工具系统、配置管理、多智能体协作等关键模块。通过本文的学习,开发者可以更好地理解OpenManus的内部机制,为定制开发和优化提供指导。

正文

1. 架构概览

OpenManus采用模块化设计,各个组件之间松耦合,便于扩展和维护。整体架构如下图所示:

基础设施
核心架构
执行环境层
文件系统
大语言模型
智能体层
工具层
配置管理层
LLM接口层
用户接口层

2. 智能体系统设计

2.1 基础智能体类

OpenManus的智能体系统以[BaseAgent](file:///D:/project/OpenManus/app/agent/base.py#L17-L196)为核心,定义了智能体的基本行为和状态管理:

# BaseAgent核心实现
class BaseAgent(BaseModel, ABC):
    # 核心属性
    name: str = Field(..., description="Unique name of the agent")
    description: Optional[str] = Field(None, description="Optional agent description")
    
    # 提示词配置
    system_prompt: Optional[str] = Field(
        None, description="System-level instruction prompt"
    )
    next_step_prompt: Optional[str] = Field(
        None, description="Prompt for determining next action"
    )
    
    # 依赖组件
    llm: LLM = Field(default_factory=LLM, description="Language model instance")
    memory: Memory = Field(default_factory=Memory, description="Agent's memory store")
    state: AgentState = Field(
        default=AgentState.IDLE, description="Current agent state"
    )
    
    # 执行控制
    max_steps: int = Field(default=10, description="Maximum steps before termination")
    current_step: int = Field(default=0, description="Current step in execution")
2.2 状态管理模式

智能体采用状态机模式管理执行状态:

# AgentState枚举定义
class AgentState(str, Enum):
    IDLE = "idle"
    RUNNING = "running"
    FINISHED = "finished"
    ERROR = "error"

状态转换通过[state_context](file:///D:/project/OpenManus/app/agent/base.py#L70-L93)上下文管理器安全处理:

@asynccontextmanager
async def state_context(self, new_state: AgentState):
    """Context manager for safe agent state transitions."""
    if not isinstance(new_state, AgentState):
        raise ValueError(f"Invalid state: {new_state}")

    previous_state = self.state
    self.state = new_state
    try:
        yield
    except Exception as e:
        self.state = AgentState.ERROR
        raise e
    finally:
        self.state = previous_state
2.3 记忆管理机制

智能体通过[Memory](file:///D:/project/OpenManus/app/schema.py#L39-L57)类管理对话历史和上下文:

def update_memory(
    self,
    role: ROLE_TYPE,
    content: str,
    base64_image: Optional[str] = None,
    **kwargs,
) -> None:
    """Add a message to the agent's memory."""
    message_map = {
        "user": Message.user_message,
        "system": Message.system_message,
        "assistant": Message.assistant_message,
        "tool": lambda content, **kw: Message.tool_message(content, **kw),
    }

    if role not in message_map:
        raise ValueError(f"Unsupported message role: {role}")

    kwargs = {"base64_image": base64_image, **(kwargs if role == "tool" else {})}
    self.memory.add_message(message_map[role](content, **kwargs))

3. 工具系统架构

3.1 工具基类设计

所有工具都继承自[BaseTool](file:///D:/project/OpenManus/app/tool/base.py#L8-L32)基类:

class BaseTool(ABC, BaseModel):
    name: str
    description: str
    parameters: Optional[dict] = None

    async def __call__(self, **kwargs) -> Any:
        """Execute the tool with given parameters."""
        return await self.execute(**kwargs)

    @abstractmethod
    async def execute(self, **kwargs) -> Any:
        """Execute the tool with given parameters."""

    def to_param(self) -> Dict:
        """Convert tool to function call format."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }
3.2 工具集合管理

[ToolCollection](file:///D:/project/OpenManus/app/tool/tool_collection.py#L12-L71)类负责管理多个工具:

class ToolCollection:
    def __init__(self, *tools: BaseTool):
        self.tools = tools
        self.tool_map = {tool.name: tool for tool in tools}

    def to_params(self) -> List[Dict[str, Any]]:
        return [tool.to_param() for tool in self.tools]

    async def execute(
        self, *, name: str, tool_input: Dict[str, Any] = None
    ) -> ToolResult:
        tool = self.tool_map.get(name)
        if not tool:
            return ToolFailure(error=f"Tool {name} is invalid")
        try:
            result = await tool(**tool_input)
            return result
        except ToolError as e:
            return ToolFailure(error=e.message)
3.3 工具调用智能体

[ToolCallAgent](file:///D:/project/OpenManus/app/agent/toolcall.py#L19-L250)是支持工具调用的智能体基类:

class ToolCallAgent(ReActAgent):
    available_tools: ToolCollection = ToolCollection(
        CreateChatCompletion(), Terminate()
    )
    tool_choices: TOOL_CHOICE_TYPE = ToolChoice.AUTO
    special_tool_names: List[str] = Field(default_factory=lambda: [Terminate().name])

    async def think(self) -> bool:
        """Process current state and decide next actions using tools"""
        # 获取带有工具选项的响应
        response = await self.llm.ask_tool(
            messages=self.messages,
            system_msgs=(
                [Message.system_message(self.system_prompt)]
                if self.system_prompt
                else None
            ),
            tools=self.available_tools.to_params(),
            tool_choice=self.tool_choices,
        )
        # 处理响应和工具调用
        # ...

    async def act(self) -> str:
        """Execute tool calls and handle their results"""
        if not self.tool_calls:
            if self.tool_choices == ToolChoice.REQUIRED:
                raise ValueError(TOOL_CALL_REQUIRED)
            return self.messages[-1].content or "No content or commands to execute"

        results = []
        for command in self.tool_calls:
            result = await self.execute_tool(command)
            # 处理工具执行结果
            # ...

4. LLM接口层设计

4.1 多模型支持

[LLM](file:///D:/project/OpenManus/app/llm.py#L235-L766)类支持多种大语言模型:

class LLM:
    def __init__(
        self, config_name: str = "default", llm_config: Optional[LLMSettings] = None
    ):
        llm_config = llm_config or config.llm
        llm_config = llm_config.get(config_name, llm_config["default"])
        self.model = llm_config.model
        self.max_tokens = llm_config.max_tokens
        self.temperature = llm_config.temperature
        self.api_type = llm_config.api_type
        self.api_key = llm_config.api_key
        self.api_version = llm_config.api_version
        self.base_url = llm_config.base_url
        
        # 根据API类型初始化客户端
        if self.api_type == "azure":
            self.client = AsyncAzureOpenAI(
                base_url=self.base_url,
                api_key=self.api_key,
                api_version=self.api_version,
            )
        elif self.api_type == "aws":
            self.client = BedrockClient()
        else:
            self.client = AsyncOpenAI(api_key=self.api_key, base_url=self.base_url)
4.2 工具调用接口

[ask_tool](file:///D:/project/OpenManus/app/llm.py#L81-L135)方法支持工具调用:

async def ask_tool(
    self,
    messages: List[Union[dict, Message]],
    system_msgs: Optional[List[Union[dict, Message]]] = None,
    timeout: int = 300,
    tools: Optional[List[dict]] = None,
    tool_choice: TOOL_CHOICE_TYPE = ToolChoice.AUTO,
    temperature: Optional[float] = None,
    **kwargs,
) -> ChatCompletionMessage | None:
    # 格式化消息
    # 计算令牌数
    # 验证工具和工具选择
    # 发送请求并处理响应

5. 配置管理系统

5.1 配置结构

[Config](file:///D:/project/OpenManus/app/config.py#L273-L340)类采用单例模式管理配置:

class Config:
    _instance = None
    _lock = threading.Lock()
    _initialized = False

    def __new__(cls):
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance

    @property
    def llm(self) -> Dict[str, LLMSettings]:
        return self._config.llm

    @property
    def sandbox(self) -> SandboxSettings:
        return self._config.sandbox

    @property
    def browser_config(self) -> Optional[BrowserSettings]:
        return self._config.browser_config
5.2 TOML配置文件

配置文件采用TOML格式,支持多种模型配置:

# 全局LLM配置
[llm]
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
api_key = "sk-..."
max_tokens = 4096
temperature = 0.0

# 可选特定LLM模型配置
[llm.vision]
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
api_key = "sk-..."

# 浏览器配置
[browser]
headless = false
disable_security = true

# 搜索配置
[search]
engine = "Google"
fallback_engines = ["DuckDuckGo", "Baidu", "Bing"]

6. 多智能体协作系统

6.1 Flow工厂模式

[FlowFactory](file:///D:/project/OpenManus/app/flow/flow_factory.py#L15-L30)使用工厂模式创建不同类型的流程:

class FlowFactory:
    @staticmethod
    def create_flow(
        flow_type: FlowType,
        agents: Union[BaseAgent, List[BaseAgent], Dict[str, BaseAgent]],
        **kwargs,
    ) -> BaseFlow:
        flows = {
            FlowType.PLANNING: PlanningFlow,
        }

        flow_class = flows.get(flow_type)
        if not flow_class:
            raise ValueError(f"Unknown flow type: {flow_type}")

        return flow_class(agents, **kwargs)
6.2 规划流程实现

[PlanningFlow](file:///D:/project/OpenManus/app/flow/planning.py#L49-L442)实现多智能体协作的规划流程:

class PlanningFlow(BaseFlow):
    def __init__(
        self, agents: Union[BaseAgent, List[BaseAgent], Dict[str, BaseAgent]], **data
    ):
        # 初始化智能体
        super().__init__(agents, **data)
        # 设置执行器键
        if not self.executor_keys:
            self.executor_keys = list(self.agents.keys())

    async def execute(self, input_text: str) -> str:
        """Execute the planning flow with agents."""
        try:
            # 创建初始计划
            if input_text:
                await self._create_initial_plan(input_text)

            result = ""
            while True:
                # 获取当前步骤信息
                self.current_step_index, step_info = await self._get_current_step_info()

                # 退出条件
                if self.current_step_index is None:
                    result += await self._finalize_plan()
                    break

                # 执行当前步骤
                step_type = step_info.get("type") if step_info else None
                executor = self.get_executor(step_type)
                step_result = await self._execute_step(executor, step_info)
                result += step_result + "\n"

                # 检查智能体是否要终止
                if hasattr(executor, "state") and executor.state == AgentState.FINISHED:
                    break

            return result
        except Exception as e:
            logger.error(f"Error in PlanningFlow: {str(e)}")
            return f"Execution failed: {str(e)}"

7. 执行环境与沙箱

7.1 沙箱配置

[SandboxSettings](file:///D:/project/OpenManus/app/config.py#L131-L143)定义沙箱配置:

class SandboxSettings(BaseModel):
    """Configuration for the execution sandbox"""

    use_sandbox: bool = Field(False, description="Whether to use the sandbox")
    image: str = Field("python:3.12-slim", description="Base image")
    work_dir: str = Field("/workspace", description="Container working directory")
    memory_limit: str = Field("512m", description="Memory limit")
    cpu_limit: float = Field(1.0, description="CPU limit")
    timeout: int = Field(300, description="Default command timeout (seconds)")
    network_enabled: bool = Field(
        False, description="Whether network access is allowed"
    )
7.2 沙箱客户端

[SANDBOX_CLIENT](file:///D:/project/OpenManus/app/sandbox/client.py#L115-L115)提供沙箱操作接口:

class SandboxClient:
    async def execute_command(
        self, 
        command: str, 
        timeout: Optional[int] = None,
        work_dir: Optional[str] = None
    ) -> SandboxResult:
        """Execute a command in the sandbox environment."""
        # 实现命令执行逻辑
        # 处理超时和错误
        # 返回执行结果

8. 实践示例

8.1 自定义智能体实现
from app.agent.toolcall import ToolCallAgent
from app.tool import ToolCollection
from app.tool.python_execute import PythonExecute
from app.tool.web_search import WebSearch

class CustomAgent(ToolCallAgent):
    name: str = "custom_agent"
    description: str = "自定义智能体示例"

    available_tools: ToolCollection = Field(
        default_factory=lambda: ToolCollection(
            PythonExecute(),
            WebSearch(),
        )
    )

    system_prompt: str = """你是一个专门处理数据分析任务的智能体。
    你可以使用Python执行代码和网络搜索工具来完成任务。"""

    async def step(self) -> str:
        """执行单步操作"""
        thought = await self.think()
        if thought:
            action = await self.act()
            return action
        return "任务完成"
8.2 工具扩展实现
from app.tool.base import BaseTool

class CustomTool(BaseTool):
    name: str = "custom_tool"
    description: str = "自定义工具示例"
    parameters: dict = {
        "type": "object",
        "properties": {
            "param1": {
                "type": "string",
                "description": "示例参数1"
            },
            "param2": {
                "type": "integer",
                "description": "示例参数2"
            }
        },
        "required": ["param1"]
    }

    async def execute(self, param1: str, param2: int = 10) -> str:
        """执行工具逻辑"""
        # 实现具体功能
        result = f"处理参数: {param1}, {param2}"
        return result

9. 性能优化策略

9.1 令牌计数优化
class TokenCounter:
    def count_message_tokens(self, messages: List[dict]) -> int:
        """计算消息列表中的总令牌数"""
        total_tokens = self.FORMAT_TOKENS  # 基础格式令牌

        for message in messages:
            tokens = self.BASE_MESSAGE_TOKENS  # 每条消息的基础令牌

            # 添加角色令牌
            tokens += self.count_text(message.get("role", ""))

            # 添加内容令牌
            if "content" in message:
                tokens += self.count_content(message["content"])

            # 添加工具调用令牌
            if "tool_calls" in message:
                tokens += self.count_tool_calls(message["tool_calls"])

            total_tokens += tokens

        return total_tokens
9.2 缓存机制
from functools import lru_cache

class LLM:
    @lru_cache(maxsize=128)
    def count_tokens(self, text: str) -> int:
        """计算文本中的令牌数"""
        if not text:
            return 0
        return len(self.tokenizer.encode(text))

10. 错误处理与日志

10.1 异常处理机制
class ToolCallAgent(ReActAgent):
    async def execute_tool(self, command: ToolCall) -> str:
        """执行单个工具调用并处理错误"""
        try:
            # 解析参数
            args = json.loads(command.function.arguments or "{}")

            # 执行工具
            result = await self.available_tools.execute(name=name, tool_input=args)
            
            # 格式化结果
            observation = (
                f"Observed output of cmd `{name}` executed:\n{str(result)}"
                if result
                else f"Cmd `{name}` completed with no output"
            )

            return observation
        except json.JSONDecodeError:
            error_msg = f"Error parsing arguments for {name}: Invalid JSON format"
            logger.error(
                f"📝 参数解析错误 '{name}' - 无效的JSON格式, 参数:{command.function.arguments}"
            )
            return f"Error: {error_msg}"
        except Exception as e:
            error_msg = f"⚠️ 工具 '{name}' 遇到问题: {str(e)}"
            logger.exception(error_msg)
            return f"Error: {error_msg}"

总结

OpenManus的架构设计体现了现代软件工程的最佳实践:

  1. 模块化设计:各组件职责清晰,便于维护和扩展
  2. 面向接口编程:通过抽象基类定义标准接口
  3. 工厂模式:通过工厂类创建复杂对象
  4. 策略模式:支持多种LLM和工具选择策略
  5. 状态管理:通过状态机模式管理智能体状态
  6. 错误处理:完善的异常处理和日志记录机制

通过深入理解这些设计模式和实现细节,开发者可以更好地利用OpenManus构建强大的AI应用,同时也能为框架的进一步发展做出贡献。

实践建议

  1. 理解核心概念:深入理解智能体、工具、流程等核心概念
  2. 遵循设计模式:在扩展开发中遵循已有的设计模式
  3. 合理使用缓存:对频繁调用且结果稳定的功能使用缓存
  4. 完善错误处理:为自定义组件实现完善的错误处理机制
  5. 关注性能优化:注意令牌计数和执行效率的优化

参考资料

  1. OpenManus GitHub仓库
  2. OpenManus官方文档
  3. 设计模式:可复用面向对象软件的基础
  4. Python异步编程指南
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

CarlowZJ

我的文章对你有用的话,可以支持

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值