ADK 第三篇 Agents (LlmAgent)

最新推荐文章于 2025-06-27 17:33:17 发布

王学政2

最新推荐文章于 2025-06-27 17:33:17 发布

阅读量1.3k

点赞数 19

CC 4.0 BY-SA版权

分类专栏： ADK 文章标签：人工智能 python

本文链接：https://blog.csdn.net/2501_90353350/article/details/147681703

ADK 专栏收录该内容

4 篇文章

订阅专栏

Agents

在智能体开发套件（ADK）中，智能体（Agent）是一个独立的执行单元，旨在自主行动以实现特定目标。智能体能够执行任务、与用户交互、使用外部工具，并与其他智能体协同工作。

在ADK中，所有智能体的基础都是BaseAgent类，它充当着核心蓝图的作用。开发者通常通过以下三种主要方式扩展BaseAgent，以满足不同需求——从智能推理到结构化流程控制，从而创建出功能完备的智能体。

核心智能体类型

ADK 提供多种核心智能体类型，用于构建复杂应用场景：

大语言模型智能体（LlmAgent/Agent）：以大型语言模型（LLM）为核心引擎，具备自然语言理解、逻辑推理、任务规划、内容生成等能力，并能动态决策执行路径与工具调用，特别适合需要灵活语言处理的任务。

工作流智能体（SequentialAgent/ParallelAgent/LoopAgent）：通过预定义模式（顺序/并行/循环）精确控制其他智能体的执行流程，其流程调度机制不依赖LLM，适用于需要确定性执行的结构化流程。

自定义智能体 (Custom Agents)：通过直接扩展BaseAgent实现，可开发具有独特业务逻辑、定制化控制流或特殊集成的智能体，满足高度定制化需求。

选择适合的智能体类型

下表提供了高层次对比，帮助区分不同智能体类型。随着您在后续章节深入了解每种类型，这些差异将更加清晰。

功能对比	LLM Agent (`LlmAgent`)	Workflow Agent	Custom Agent (`BaseAgent` subclass)
核心功能	推理/生成/工具调用	控制智能体执行流程	实现独特逻辑与集成
驱动引擎	大型语言模型(LLM)	预定义逻辑(顺序/并行/循环)	自定义Python代码
确定性	非确定性(灵活响应)	确定性(可预测执行)	可自定义(取决于实现)
典型场景	语言任务/动态决策	结构化流程/任务编排	定制化需求/特定工作流

LlmAgent

LlmAgent（通常简称为Agent）是ADK中的核心组件，充当应用程序的"大脑"。它利用大型语言模型（LLM）的强大能力，实现推理、自然语言理解、决策制定、内容生成以及工具调用等功能。

与遵循预定义执行路径的确定性工作流智能体不同，LlmAgent具有非确定性特征。它依托LLM解析指令和上下文，动态决策后续操作（包括工具调用选择、是否移交控制权等），实现灵活的任务处理。

构建高效的LlmAgent需要明确定义其身份标识，通过指令精准引导行为，并配置必要的工具与能力集。

创建智能体

from google.adk.agents import LlmAgent

agent = LlmAgent(
    name="",
    model="",
    description="",
    # instruction and tools will be added next
)

参数说明：

name（必填）：每个智能体需具备唯一字符串标识符。该名称在内部运维中至关重要，尤其涉及多智能体系统中的任务互调时。应选择体现功能特征的描述性名称（如customer_support_router、billing_inquiry_agent），避免使用user等保留名称。

description（可选，多智能体场景推荐）：提供智能体能力的简明概述。该描述主要用于其他LLM智能体判断是否路由任务至本智能体。需具备足够特异性以区分同类（例如"处理当前账单查询"，而非笼统的"账单智能体"）。

model（必填）：指定驱动智能体推理的底层LLM模型。采用字符串标识符如"gemini-2.0-flash"。模型选择直接影响智能体能力、成本及性能表现。可选模型及考量因素详见模型列表。

instruction 参数说明

引导智能体行为：instruction参数是塑造LlmAgent行为最关键的核心配置。该参数接受字符串或字符串生成函数，用于向智能体明确以下行为准则：

其核心任务与目标：明确智能体需要完成的主要工作及成功标准。
角色设定与人格特征：例如："你是一个乐于助人的助手"、"你扮演幽默的海盗角色"，通过人格模板塑造交互风格。
行为约束规范：限定操作范围（如"仅回答X相关问题"），设置禁忌条款（如"严禁透露Y信息"）。
工具调用策略：说明每个工具的设计用途及调用条件，需补充工具自身的描述不足处，包含触发阈值、参数规范等工程细节。
输出格式要求：结构化输出（如"以JSON格式响应"），呈现形式规范（如"使用项目符号列表"），包含数据类型、字段说明等约束。

设计要诀：

清晰明确性原则：规避歧义，精确声明预期行为与输出标准。
采用Markdown结构化：运用标题/列表等元素提升复杂指令可读性
少样本示例集成：针对复杂任务或特定输出格式，应在指令中直接内嵌范例
工具调用引导规范：超越工具枚举，明确调用时机与决策逻辑

可在字符串模板中使用动态变量

指令作为字符串模板，支持通过{var}语法插入动态变量值。
{var} 用于插入名为 var 的状态变量值
{artifact.var} 用于插入名为 var 的工件文本内容
若状态变量或工件不存在，智能体将抛出错误。如需忽略错误，可在变量名后添加 ?，如 {var?}。

# Example: Adding instructions
capital_agent = LlmAgent(
    model="gemini-2.0-flash",
    name="capital_agent",
    description="解答用户关于各国首都的查询",
    instruction="""您正在使用【首都查询智能体】
    当用户查询某国首都时，请按以下标准流程响应：
        1. 从用户查询中识别国家名称
        2. 调用 get_capital_city 工具获取首都数据
        3. 向用户清晰反馈首都信息
    示例查询："法国的首都是哪里？"
    示例回复：“法国的首都是巴黎。”
    """,
    # tools will be added next
)

（注：对于适用于系统中所有智能体的指令，可考虑在根智能体上配置 global_instruction 参数，具体用法详见《多智能体系统》章节。）

配置智能体工具Tools

工具集赋予LlmAgent超越LLM内置知识与推理的能力，使其能够：

与外部系统交互
执行精准计算
获取实时数据流
触发特定操作

tools（可选）：配置智能体可使用的工具列表。列表中的每个工具可以是以下任意一种形式：

Python函数（将自动封装为FunctionTool）
继承自 BaseTool 的类实例
其他智能体的实例（通过AgentTool实现智能体间任务委托 - 详见《多智能体系统》）

大语言模型（LLM）会根据函数/工具的名称、描述（来自文档字符串或描述字段）以及参数模式，结合当前对话内容和自身指令，来决定调用哪个工具。

# Define a tool function
def get_capital_city(country: str) -> str:
  """Retrieves the capital city for a given country."""
  # Replace with actual logic (e.g., API call, database lookup)
  capitals = {"france": "Paris", "japan": "Tokyo", "canada": "Ottawa"}
  return capitals.get(country.lower(), f"Sorry, I don't know the capital of {country}.")

# Add the tool to the agent
capital_agent = LlmAgent(
    model="gemini-2.0-flash",
    name="capital_agent",
    description="Answers user questions about the capital city of a given country.",
    instruction="""You are an agent that provides the capital city of a country... (previous instruction text)""",
    tools=[get_capital_city] # Provide the function directly
)

高级配置与控制

精细化调控 LLM 生成（generate_content_config）

您可以通过 generate_content_config 参数深度调整底层大语言模型（LLM）的响应生成方式，具体支持以下维度的精细化控制：

temperature（随机性控制）：

取值范围 0.0 ~ 2.0，默认值通常为 0.9
低值（如 0.2）：输出更确定、保守，适用于事实性回答
高值（如 1.0）：增强创造性，适合创意生成或开放式对话

max_output_tokens（响应长度限制）：

设定生成内容的最大 token 数量（如 300），避免冗长响应

top_p & top_k（候选词筛选）：

top_p（0.0 ~ 1.0）：动态截断概率分布（如 0.8 保留前 80% 可能词）
top_k（整数）：限制每步采样候选词数量（如 40 仅考虑前 40 个最佳词）

safety_settings（内容安全过滤）：

配置敏感内容拦截等级（如 BLOCK_LOW/BLOCK_MEDIUM/BLOCK_HIGH）
支持按类别过滤（如 HARM_CATEGORY_HATE_SPEECH 仇恨言论检测）

from google.genai import types

agent = LlmAgent(
    # ... other params
    generate_content_config=types.GenerateContentConfig(
        temperature=0.2, # More deterministic output
        max_output_tokens=250
    )
)

结构化数据控制

（input_schema / output_schema / output_key）

在需要结构化数据交互的场景中，您可以通过 Pydantic 模型 实现严格的输入/输出控制，确保数据格式的规范性和类型安全。

input_schema（可选参数）

通过定义 Pydantic 的 BaseModel 类，严格规范输入数据的结构。启用后，所有传入该 Agent 的用户消息内容必须是符合此模型的 JSON 字符串，系统会自动执行校验与转换。

output_schema（可选）

定义一个表示预期输出结构的 Pydantic BaseModel 类。如果设置此项，智能体的最终响应必须是符合此模式的 JSON 字符串。使用 output_schema 会启用大语言模型（LLM）的受控生成功能，但同时会禁用智能体调用工具或将控制权转移给其他智能体的能力。您需要通过指令明确引导 LLM 直接生成符合该模式的 JSON。

output_key（可选参数）

当设置此参数时，Agent 的最终文本响应会自动保存到会话状态字典中（session.state[output_key]），实现跨 Agent 或工作流步骤的数据传递。

from pydantic import BaseModel, Field

class CapitalOutput(BaseModel):
    capital: str = Field(description="The capital of the country.")

structured_capital_agent = LlmAgent(
    # ... name, model, description
    instruction="""若输入国家为"中国"，则严格按 {"capital": "北京"} JSON格式返回，不包含任何额外文本或解		释。""",
    output_schema=CapitalOutput, # Enforce JSON output
    output_key="found_capital"  # Store result in state['found_capital']
    # Cannot use tools=[get_capital_city] effectively here
)

上下文管理（include_contents）

控制智能体是否接收历史对话记录

include_contents（可选，默认值：'default'）：控制是否将对话历史内容传递给大语言模型（LLM）。

'default'（默认模式）：智能体会接收到相关的对话历史，使其能够基于上下文进行连贯的多轮交互（例如，理解指代或延续之前的任务）。
'none'（无历史模式）：智能体不会接收任何先前的对话内容，仅根据当前指令和本轮输入生成响应（适用于无状态任务或强制限定上下文场景）。

stateless_agent = LlmAgent(
    # ... other params
    include_contents='none'
)

案例：完整代码

# 获取国家首都的示例代码
# --- 以下是演示 LlmAgent 使用工具（Tools）与输出模式（Output Schema）对比 的完整示例代码及说明 ---
import asyncio
import json # Needed for pretty printing dicts

from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
from pydantic import BaseModel, Field

# --- 1. 定义常量 ---
APP_NAME = "agent_comparison_app"
USER_ID = "test_user_456"
SESSION_ID_TOOL_AGENT = "session_tool_agent_xyz"
SESSION_ID_SCHEMA_AGENT = "session_schema_agent_xyz"
MODEL_NAME = "gemini-2.0-flash"

# --- 2. 使用LiteLLM调用在线模型 ---
model_client = LiteLlm(
    model="deepseek/deepseek-chat",
    api_base="https://api.deepseek.com",
    api_key="sk-xxxxxx",
)

# --- 3. 定义数据模型 ---
# Input schema used by both agents
class CountryInput(BaseModel):
    country: str = Field(description="要获取相关信息的国家。")

# Output schema ONLY for the second agent
class CapitalInfoOutput(BaseModel):
    capital: str = Field(description="该国家的首都城市。")
    # Note: 人口数据为示意值；由于设定了输出格式（output_schema），
    # 大语言模型（LLM）将自行推断或估算该数值（此时无法调用外部工具获取真实数据）。
    population_estimate: str = Field(description="该首都城市的估计人口数量。")

# --- 4. 定义工具 ---
def get_capital_city(country: str) -> str:
    """获取指定国家的首都城市名称。"""
    print(f"\n-- Tool Call: get_capital_city(country='{country}') --")
    country_capitals = {
            "美国": "华盛顿哥伦比亚特区",
            "加拿大": "渥太华",
            "法国": "巴黎",
            "日本": "东京",
    }
    result = country_capitals.get(country.lower(), f"Sorry, I couldn't find the capital for {country}.")
    print(f"-- Tool Result: '{result}' --")
    return result

# --- 5. 配置 Agents ---

# Agent 1: Uses a tool and output_key
capital_agent_with_tool = LlmAgent(
    model=model_client,
    name="capital_agent_tool",
    description="调用指定工具获取国家首都城市信息",
    instruction="""您是一个智能助手，专门通过工具查询国家首都信息。
    工作流程：
        1、接收用户输入的JSON格式数据：{"country": "国家名称"}
        2、自动提取country字段值
        3、调用get_capital_city工具查询首都
        4、以清晰语句向用户返回查询结果
    示例：
        用户输入：{"country": "法国"}  
        助手响应：根据查询结果，法国的首都是巴黎。
    """,
    tools=[get_capital_city],
    input_schema=CountryInput,
    output_key="capital_tool_result", # Store final text response
)

# Agent 2: Uses output_schema (NO tools possible)
structured_info_agent_schema = LlmAgent(
    model=model_client,
    name="structured_info_agent_schema",
    description="提供以特定JSON格式标注的首都及预估人口数据。",
    instruction=f"""你是一个提供国家信息的智能体
    用户将以JSON格式提供国家名称，如{{“country”：“country_name”}}。
    仅使用与此确切模式匹配的JSON对象进行响应：
    EXAMPLE JSON OUTPUT:
    {{
        "capital": "日本",
        "population_estimate": "1万"
        
    }}
    用你已有知识判断其首都并估算人口。不要使用任何工具。
    """,
    # *** NO tools parameter here - using output_schema prevents tool use ***
    input_schema=CountryInput,
    output_schema=CapitalInfoOutput, # Enforce JSON output structure
    output_key="structured_info_result", # Store final JSON response
)

# --- 6. 设置会话管理器Session与运行器 ---
session_service = InMemorySessionService()

# 为清晰起见创建独立会话（若上下文管理得当则非必需）
session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID_TOOL_AGENT)
session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID_SCHEMA_AGENT)

# 为每个智能体创建独立的运行器
capital_runner = Runner(
    agent=capital_agent_with_tool,
    app_name=APP_NAME,
    session_service=session_service
)
structured_runner = Runner(
    agent=structured_info_agent_schema,
    app_name=APP_NAME,
    session_service=session_service
)

# --- 7. 定义智能体交互逻辑 ---
async def call_agent_and_print(
    runner_instance: Runner,
    agent_instance: LlmAgent,
    session_id: str,
    query_json: str
):
    """向指定的 Agent/Runner 发送查询并打印结果。"""
    print(f"\n>>> Calling Agent: '{agent_instance.name}' | Query: {query_json}")

    user_content = types.Content(role='user', parts=[types.Part(text=query_json)])

    final_response_content = "No final response received."
    async for event in runner_instance.run_async(user_id=USER_ID, session_id=session_id, new_message=user_content):
        # print(f"Event: {event.type}, Author: {event.author}") # Uncomment for detailed logging
        if event.is_final_response() and event.content and event.content.parts:
            # For output_schema, the content is the JSON string itself
            final_response_content = event.content.parts[0].text

    print(f"<<< Agent '{agent_instance.name}' Response: {final_response_content}")

    current_session = session_service.get_session(app_name=APP_NAME,
                                                  user_id=USER_ID,
                                                  session_id=session_id)
    stored_output = current_session.state.get(agent_instance.output_key)

    # 如果存储的输出类似 JSON（可能来自 output_schema），则进行格式化美化打印。
    print(f"--- Session State ['{agent_instance.output_key}']: ", end="")
    try:
        # 若内容为 JSON 格式，则尝试解析并美化输出
        parsed_output = json.loads(stored_output)
        print(json.dumps(parsed_output, indent=2))
    except (json.JSONDecodeError, TypeError):
         # Otherwise, print as string
        print(stored_output)
    print("-" * 30)


# --- 7. Run Interactions ---
async def main():
    print("--- Testing Agent with Tool ---")
    await call_agent_and_print(capital_runner, capital_agent_with_tool, SESSION_ID_TOOL_AGENT, '{"country": "日本"}')
    #await call_agent_and_print(capital_runner, capital_agent_with_tool, SESSION_ID_TOOL_AGENT, '{"country": "加拿大"}')

    print("\n\n--- Testing Agent with Output Schema (No Tool Use) ---")
    await call_agent_and_print(structured_runner, structured_info_agent_schema, SESSION_ID_SCHEMA_AGENT, '{"country": "日本"}')
    #await call_agent_and_print(structured_runner, structured_info_agent_schema, SESSION_ID_SCHEMA_AGENT, '{"country": "日本"}')

if __name__ == "__main__":
    asyncio.run(main())

运行结果：

--- Testing Agent with Tool ---

>>> Calling Agent: 'capital_agent_tool' | Query: {"country": "日本"}

-- Tool Call: get_capital_city(country='日本') --
-- Tool Result: '东京' --
<<< Agent 'capital_agent_tool' Response: 根据查询结果，日本的首都是东京。
--- Session State ['capital_tool_result']: 根据查询结果，日本的首都是东京。
------------------------------


--- Testing Agent with Output Schema (No Tool Use) ---

>>> Calling Agent: 'structured_info_agent_schema' | Query: {"country": "日本"}
<<< Agent 'structured_info_agent_schema' Response: {
    "capital": "东京",
    "population_estimate": "1.26亿"
}
--- Session State ['structured_info_result']: {'capital': '东京', 'population_estimate': '1.26亿'}
------------------------------

Process finished with exit code 0