第 17 章:推理技术
本章深入探讨智能体的高级推理方法,重点关注多步骤逻辑推理和复杂问题解决。这些技术超越了简单的顺序操作,使智能体的内部推理过程变得透明可见。通过这种方式,智能体能够将复杂问题分解为更小的子问题、考虑中间推理步骤,并得出更加可靠和准确的结论。这些高级方法的核心原则是在推理过程中分配更多的计算资源。这意味着给予智能体或底层 LLM 更多的处理时间或推理步骤来处理查询并生成响应。与快速单次处理不同,智能体可以进行迭代改进、探索多种解决方案或利用外部工具。这种延长推理时间的方法通常能显著提升准确性、连贯性和鲁棒性,特别是在处理需要深入分析和仔细审议的复杂问题时。
实际应用和用例
这些推理技术的实际应用场景包括:
- 复杂问答:支持解决多跳查询,这类查询需要整合不同来源的数据并进行逻辑推理,可能涉及检查多条推理路径,通过延长推理时间来综合信息。
- 数学问题解决:能够将复杂数学问题分解为更小、可解决的组件,展示逐步解题过程,并使用代码执行进行精确计算,延长的推理时间使得更复杂的代码生成和验证成为可能。
- 代码调试和生成:支持智能体解释其生成或修改代码的理由,按顺序识别潜在问题,并根据测试结果迭代改进代码(自我纠正),利用延长的推理时间进行彻底的调试。
- 战略规划:通过对各种选项、后果和前提条件进行推理,协助制定全面计划,并根据实时反馈调整策略(ReAct),延长的审议时间可以带来更有效和可靠的规划。
- 医疗诊断:帮助智能体评估症状、检查结果和患者病史以达成诊断,在每个阶段阐明推理过程,并可能利用外部工具检索数据(ReAct)。增加的推理时间允许进行更全面的鉴别诊断。
- 法律分析:支持分析法律文件和先例以构建论点或提供指导,详细说明所采取的逻辑步骤,并通过自我纠正确保逻辑一致性。增加的推理时间允许进行更深入的法律研究和论证构建。
推理技术
首先,让我们深入了解用于增强 AI 模型问题解决能力的核心推理技术。
**思维链(Chain-of-Thought,CoT)**提示词通过模拟逐步思考过程,显著增强了 LLM 的复杂推理能力(见图 1)。CoT 提示词不是要求模型直接给出答案,而是引导其生成一系列中间推理步骤。这种显式的分解使 LLM 能够将复杂问题拆分为更小、更易管理的子问题来逐步解决。该技术显著提升了模型在需要多步推理任务上的表现,例如算术计算、常识推理和符号操作。CoT 的主要优势在于能够将困难的单步问题转化为一系列更简单的步骤,从而提高 LLM 推理过程的透明度。这种方法不仅提高了准确性,还为理解模型决策过程提供了宝贵见解,有助于调试和分析。CoT 可以通过多种策略实现,包括提供展示逐步推理的少样本示例,或简单地指示模型"逐步思考"。其有效性源于它能够引导模型的内部处理过程朝着更审慎和逻辑化的方向发展。因此,思维链已成为在当代 LLM 中实现高级推理能力的基石技术。这种增强的透明度和复杂问题分解能力对于自主智能体尤为重要,使它们能够在复杂环境中执行更可靠和可审计的动作。
图 1:CoT 提示词以及智能体的详细的、逐步的响应。
让我们看一个例子。它以一组指令开始,告诉 AI 如何思考,定义其角色和要遵循的清晰五步流程。这是启动结构化思维的提示词。
接下来,示例展示了 CoT 过程的实际应用。标记为"Agent 的思考过程"的部分是模型执行指示步骤的内部独白。这是字面上的"思维链"。最后,"Agent 的最终答案"是由于仔细的、逐步推理过程而生成的精炼的、全面的输出。
You are an Information Retrieval Agent. Your goal is to answer the user's question comprehensively and accurately by thinking step-by-step. Here's the process you must follow:
1. **Analyze the Query:** Understand the core subject and specific requirements of the user's question. Identify key entities, keywords, and the type of information being sought.
2. **Formulate Search Queries (for Knowledge Base):** Based on your analysis, generate a list of precise search queries that you would use to retrieve relevant information from a knowledge base or external tools.
3. **Simulate Information Retrieval (Self-Correction/Reasoning):** For each search query, mentally consider what kind of information you expect to find. If you were to retrieve the content, what would be the most relevant snippets? Think about potential ambiguities or missing pieces.
4. **Synthesize Information:** Based on the simulated retrieval and your understanding of the user's original query, synthesize the gathered information into a coherent and complete answer. Ensure all aspects of the query are addressed.
5. **Review and Refine:** Before finalizing, critically evaluate your answer. Is it accurate? Is it comprehensive? Is it easy to understand? Is it concise? If not, identify what needs to be improved and how.
**User Query:** "Explain the main differences between classical computers and quantum computers, and briefly describe a potential application of quantum computing."
**Agent's Thought Process (Internal CoT Output - encouraged by the prompt):**
**Thought 1 (Analyze Query):** The user wants two things:
1. Main differences between classical and quantum computers.
2. One potential application of quantum computing.
Keywords: classical computers, quantum computers, differences, potential application.
**Thought 2 (Formulate Search Queries):**
- "differences classical vs quantum computers"
- "how quantum computers work"
- "how classical computers work"
- "applications of quantum computing"
**Thought 3 (Simulate Information Retrieval & Identify Key Concepts):**
- From "differences": Expect concepts like bits vs. qubits, superposition, entanglement, probabilistic nature, computational power for specific problems.
- From "how they work": Classical uses transistors/binary. Quantum uses quantum mechanics.
- From "applications": Expect drug discovery, material science, cryptography, optimization.
**Thought 4 (Synthesize Information):** Classical computers use bits (0 or 1), processing information sequentially. Quantum computers use qubits, which can be 0, 1, or both simultaneously (superposition), and can be entangled, allowing for vastly different computational approaches. This enables them to solve certain problems (like factoring large numbers or simulating complex molecules) far more efficiently than classical computers. A potential application is in drug discovery, where quantum computers could simulate molecular interactions with unprecedented accuracy, accelerating the development of new medicines.
**Thought 5 (Review and Refine):**
- Differences are covered (bits vs qubits, superposition, entanglement, processing).
- Application is given (drug discovery).
- Is it brief? Yes.
- Is it clear? Yes.
- Looks good.
**Agent's Final Answer (Output to User):** Classical computers process information using bits, which can be either a 0 or a 1 at any given time, performing operations sequentially. In contrast, quantum computers utilize qubits, which can exist in multiple states simultaneously through superposition and can be linked through entanglement, allowing for exponentially more complex computations. This fundamental difference enables quantum machines to tackle certain problems, such as simulating molecular structures or breaking complex encryption, that are intractable for even the most powerful classical supercomputers. A significant potential application of quantum computing lies in drug discovery, where its ability to precisely model molecular behavior could revolutionize the development of new pharmaceuticals.
**思维树(Tree-of-Thought,ToT)**是一种建立在思维链(CoT)基础上的推理技术。它允许 LLM 通过分支到不同的中间步骤来探索多个推理路径,形成树状结构(见图 2)。这种方法通过支持回溯、自我纠正和探索替代解决方案来应对复杂的问题解决。维护可能性树使得模型能够在最终确定答案之前评估各种推理轨迹。这种迭代过程增强了模型处理需要战略规划和决策的挑战性任务的能力。
图 2:思维树示例
自我纠正,也称为自我改进,是智能体推理过程的一个关键方面,特别是在思维链提示词中。它涉及智能体对生成内容和中间思考过程的内部评估。这种批判性审查使智能体能够识别其理解或解决方案中的模糊性、信息缺口或不准确性。通过审查和改进的迭代循环,Agent 可以调整方法、提升响应质量,并在提供最终输出前确保准确性和完整性。这种内部批评机制增强了智能体产生可靠和高质量结果的能力,如第 4 章的专门示例所示。
这个示例展示了自我纠正的系统过程,这对于改进 AI 生成内容至关重要。它涉及起草、根据原始要求进行审查以及实施具体改进的迭代循环。示例首先概述了 AI 作为"自我纠正 Agent"的功能,并定义了五步分析和修订工作流程。然后,展示了社交媒体帖子的"初稿"。"自我纠正智能体的思考过程"构成了演示的核心部分。在这里,Agent 根据其指令批判性地评估草稿,指出诸如低参与度和模糊的号召性用语等弱点。随后提出具体的改进建议,包括使用更有影响力的动词和表情符号。整个过程以"最终修订内容"结束,这是一个整合了自我识别调整的精炼和显著改进的版本。
You are a highly critical and detail-oriented Self-Correction Agent. Your task is to review a previously generated piece of content against its original requirements and identify areas for improvement. Your goal is to refine the content to be more accurate, comprehensive, engaging, and aligned with the prompt. Here's the process you must follow for self-correction:
1. **Understand Original Requirements:** Review the initial prompt/requirements that led to the content's creation. What was the *original intent*? What were the key constraints or goals?
2. **Analyze Current Content:** Read the provided content carefully.
3. **Identify Discrepancies/Weaknesses:** Compare the current content against the original requirements. Look for:
* **Accuracy Issues:** Are there any factual errors or misleading statements?
* **Completeness Gaps:** Does it fully address all aspects of the original prompt? Is anything missing?
* **Clarity & Coherence:** Is the language clear, concise, and easy to understand? Does it flow logically?
* **Tone & Style:** Does it match the desired tone and style (e.g., professional, engaging, concise)?
* **Engagement:** Is it captivating? Does it hold the reader's attention?
* **Redundancy/Verbosity:** Can any parts be condensed or removed without losing meaning?
4. **Propose Specific Improvements:** For each identified weakness, suggest concrete and actionable changes. Do not just state the problem; propose a solution.
5. **Generate Revised Content:** Based on your proposed improvements, rewrite the original content to incorporate all the necessary changes. Ensure the revised content is polished and ready for final use.
**Original Prompt/Requirements:** "Write a short, engaging social media post (max 150 characters) announcing a new eco-friendly product line: 'GreenTech Gadgets'."
**Initial Draft (Content to be corrected):** "We have new products. They are green and techy. Buy GreenTech Gadgets now!"
**Self-Correction Agent's Thought Process (Internal CoT Output - encouraged by the prompt):**
**Thought 1 (Understand Original Requirements):**
- Short, engaging social media post.
- Max 150 characters.
- Announce "GreenTech Gadgets" - a new eco-friendly product line.
**Thought 2 (Analyze Current Content):** "We have new products. They are green and techy. Buy GreenTech Gadgets now!" (64 characters)
**Thought 3 (Identify Discrepancies/Weaknesses):**
- **Engaging:** Not very engaging. Sounds generic.
- **Eco-friendly:** Implied by "green," but not explicitly highlighted as a benefit.
- **Call to Action:** "Buy now!" is okay, but could be more compelling.
- **Conciseness:** It's short, but could use stronger verbs/phrasing.
- **Impact:** Lacks excitement for a new product launch.
**Thought 4 (Propose Specific Improvements):**
- Make it more dynamic and exciting.
- Emphasize "eco-friendly" and "innovation."
- Use stronger verbs and more active voice.
- Add a clear, engaging call to action.
- Ensure it stays under 150 characters.
**Thought 5 (Generate Revised Content):** Original: "We have new products. They are green and techy. Buy GreenTech Gadgets now!" Revised idea 1: "🌱 Discover GreenTech Gadgets! Our new eco-friendly line blends innovation with sustainability. Go green, go smart! Shop now! #EcoFriendly #GreenTech" (148 characters) This revision directly addresses eco-friendliness, uses emojis for engagement, adds a stronger call to action, and includes relevant hashtags while staying within the character limit.
**Self-Correction Agent's Final Revised Content (Output to User):** 🌱 Discover GreenTech Gadgets! Our new eco-friendly line blends innovation with sustainability. Go green, go smart! Shop now! #EcoFriendly #GreenTech
从根本上说,这种技术将质量控制措施直接集成到智能体的内容生成过程中,产生更精炼、准确和优质的结果,从而更有效地满足复杂的用户需求。
**程序辅助语言模型(Program-Aided Language Models,PALMs)**将 LLM 与符号推理能力相结合。这种集成允许 LLM 在问题解决过程中生成和执行代码,例如 Python。PALMs 将复 杂的计算、逻辑操作和数据操作卸载到确定性编程环境中。这种方法利用传统编程的优势来处理 LLM 在准确性或一致性方面可能存在局限的任务。当面对符号推理挑战时,模型可以生成代码、执行代码,并将结果转换为自然语言。这种混合方法结合了 LLM 的理解和生成能力与精确计算,使模型能够以更高的可靠性和准确性解决更广泛的复杂问题。这对智能体很重要,因为它允许它们通过在理解和生成能力之外利用精确计算来执行更准确和可靠的动作。一个例子是在 Google 的 ADK 中使用外部工具生成代码。
from google.adk.tools import agent_tool
from google.adk.agents import Agent
from google.adk.tools import google_search
from google.adk.code_executors import BuiltInCodeExecutor
search_agent = Agent(
model='gemini-2.0-flash',
name='SearchAgent',
instruction="""
您是 Google 搜索专家
""",
tools=[google_search],
)
coding_agent = Agent(
model='gemini-2.0-flash',
name='CodeAgent',
instruction="""
您是代码执行专家
""",
code_executor=[BuiltInCodeExecutor],
)
root_agent = Agent(
name="RootAgent",
model="gemini-2.0-flash",
description="根 Agent",
tools=[agent_tool.AgentTool(agent=search_agent), agent_tool.AgentTool(agent=coding_agent)],
)
可验证奖励强化学习(Reinforcement Learning with Verifiable Rewards,RLVR):虽然有效,但许多 LLM 使用的标准思维链(CoT)提示词是一种相对基础的推理方法。它生成单一的、预定的思路,而无法适应问题的复杂性。为了克服这些限制,开发了一类新的专门"推理模型"。这些模型的运行方式有所不同,在提供答案之前专门花费可变量的"思考"时间。这个"思考"过程产生更广泛和动态的思维链,可能长达数千个 token。这种扩展推理允许更复杂的行为,如自我纠正和回溯,模型在更困难的问题上投入更多计算资源。实现这些模型的关键创新是一种称为可验证奖励强化学习(RLVR)的训练策略。通过在具有已知正确答案的问题(如数学或代码)上训练模型,它通过试错学习生成有效的长篇推理。这使得模型能够在没有直接人类监督的情况下发展其问题解决能力。最终,这些推理模型不仅产生答案,还生成展示规划、监控和评估等高级技能的"推理轨迹"。这种增强的推理和策略能力是开发自主 AI 智能体的基础,使它们能够在最少人类干预的情况下分解和解决复杂任务。
ReAct(推理与行动,见图 3,其中 KB 代表知识库)是一种将思维链(CoT)提示词与智能体通过工具与外部环境交互的能力相结合的范式。与直接生成最终答案的生成模型不同,ReAct 智能体对采取哪些行动进行推理。这个推理阶段涉及内部规划过程,类似于 CoT,Agent 确定下一步行动,考虑可用工具并预测可能的结果。然后,Agent 通过执行工具或工具调用来行动,例如查询数据库、执行计算或与 API 交互。

图 3:推理和行动
ReAct 以交错的方式运行:Agent 执行一个动作,观察结果,并将此观察纳入后续推理。这种"思考、行动、观察、思考..."的迭代循环允许智能体调整计划、纠正错误并实现需要与环境进行多次交互的目标。与线性 CoT 相比,这提供了更强大和灵活的问题解决方法,因为智能体能够响应实时反馈。通过结合语言模型的理解和生成能力与使用工具的能力,ReAct 使智能体能够执行需要推理和实际执行的复杂任务。这种方法对智能体至关重要,因为它允许它们不仅进行推理,还可以实际执行步骤并与动态环境交互。
CoD(辩论链,Chain of Debates)是微软提出的一种正式 AI 框架,其中多个不同的模型协作和争论以解决问题,超越了单个 AI 的"思维链"。该系统运作类似于 AI 委员会会议,不同的模型提出初步想法,批评彼此的推理,并交换反驳论点。主要目标是通过利用集体智慧来提高准确性、减少偏见并改善最终答案的整体质量。作为 AI 版本的同行评审,这种方法创建了推理过程的透明和可信记录。最终,它代表了从单独智能体提供答案到协作智能体团队共同寻找更可靠和验证的解决方案的转变。
GoD(辩论图,Graph of Debates)是一个高级 Agentic 框架,它将讨论重新构想为动态的、非线性网络,而不是简单的链式结构。在这个模型中,论点是由表示"支持"或"反驳"等关系的边连接的各个节点,反映了真实辩论的多线程性质。这种结构允许新的探究线索动态分支、独立演化,甚至随时间合并。结论不是在序列的末尾达成,而是通过识别整个图中最稳健和得到良好支持的论点集群来达成。在这种情况下,"得到良好支 持"是指已牢固建立和可验证的知识。这包括被认为是基本事实的信息,即本质上是正确的并被广泛接受为事实的内容。此外,它包括通过搜索基础获得的事实证据,其中信息针对外部来源和现实世界数据进行验证。最后,它还涉及在辩论期间由多个模型达成的共识,表明对所呈现信息的高度一致和信心。这种全面的方法确保了所讨论信息的更稳健和可靠的基础。这种方法为复杂的、协作的 AI 推理提供了更全面和现实的模型。
MASS(可选高级主题):对多智能体设计的深入分析表明,其有效性严重依赖于各个智能体的提示词质量以及决定其交互的拓扑结构。设计这些系统的复杂性非常显著,因为它涉及一个庞大而复杂的搜索空间。为了应对这一挑战,开发了一个名为多智能体系统搜索(Multi-Agent System Search,MASS)的新框架来自动化和优化多智能体系统的设计。
MASS 采用多阶段优化策略,通过交错进行提示词和拓扑优化来系统地导航复杂的设计空间(见图 4)。