An In-Depth Guide to Agent Reasoning Patterns

ReAct · Plan-then-Execute · Reflexion: a complete reference, from paper to production

Capability model: where the three patterns sit

Complexity ladder
──────────────────────────────────────────────────────────────────
Level 4             ┌────────────────────────────────────────────┐
Composition         │ LATS = ReAct + Reflexion + Tree Search     │
                    │ Plan-Execute + ReAct sub-loop + Reflexion  │
                    └────────────────────────────────────────────┘
                                          ▲
Level 3             ┌────────────────────────────────────────────┐
Self-improvement    │ Reflexion (reflection loop)                │
                    │ Multi-round retry + verbal RL              │
                    └────────────────────────────────────────────┘
                                          ▲
Level 2             ┌────────────────────────────────────────────┐
Planning split      │ Plan-then-Execute                          │
                    │ Plan phase + execute phase + approval gate │
                    └────────────────────────────────────────────┘
                                          ▲
Level 1             ┌────────────────────────────────────────────┐
Explicit reasoning  │ ReAct (Thought-Action-Obs)                 │
                    │ Auditable reasoning-action loop            │
                    └────────────────────────────────────────────┘
                                          ▲
Level 0             ┌────────────────────────────────────────────┐
Basic loop          │ Agent Loop (s01 style)                     │
                    │ while tool_use: execute()                  │
                    └────────────────────────────────────────────┘

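Quick-reference cards for the three patterns

ReAct

Core idea: think before acting, with the reasoning visible

Paper: Yao et al., ICLR 2023

Loop: Thought → Action → Observation → Loop

Best for: exploratory tasks, debugging, research

Lines of code: 244 (s13)

Production adoption: highest

Plan-then-Execute

Core idea: the plan is the contract, execution is the delivery

Paper: Wang et al., ACL 2023 + BabyAGI

Loop: Plan → Approve → Execute → Re-plan

Best for: structured tasks, pipelines, reports

Lines of code: 290 (s14)

Production adoption: medium

Reflexion

Core idea: failure is the input to the next round

Paper: Shinn et al., NeurIPS 2023

Loop: Attempt → Evaluate → Reflect → Retry

Best for: coding and other tasks with a clear pass/fail criterion

Lines of code: 292 (s15)

Production adoption: limited to specific scenarios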

Knowledge map

                ┌───────────────────┐
                │ User task (query) │
                └─────────┬─────────┘
                          │
      ┌───────────────────┼───────────────────┐
      ▼                   ▼                   ▼
┌───────────┐       ┌───────────┐       ┌───────────┐
│ ReAct     │       │ Plan-Exec │       │ Reflexion │
│           │       │           │       │           │
│ Thought   │       │ Planner   │       │ Executor  │
│ Action    │◄──────│ Executor  │──────►│ Evaluator │
│ Observe   │       │ Approver  │       │ Reflector │
└───────────┘       └───────────┘       └───────────┘
      │                   │                   │
      └───────────────────┼───────────────────┘
                          ▼
         ┌─────────────────────────────────┐
         │ Composition: LATS / layered     │
         │ Strategy layer + tactical layer │
         │ + quality layer                 │
         └─────────────────────────────────┘

Paper lineage:

CoT (2022) ──► ReAct (2022) ──► Plan-and-Solve (2023)
                    │                 │
                    ▼                 ▼
            Reflexion (2023) ──► LATS (2023) ──► Adaptive Agent (2025+)

Code file index

File	Pattern	LOC	Core class	Tools	LLM roles
`agents/s13_react.py`	ReAct	244	ThoughtLog	6	1
`agents/s14_plan_execute.py`	Plan-Exec	290	PlanManager	8	2
`agents/s15_reflexion.py`	Reflexion	292	ReflexionMemory	4	3

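ReAct: Reasoning + Acting

Think before acting: an explicit reasoning chain makes the agent's decisions auditable and open to intervention.

Principle

Core insight: in the basic Agent Loop, the model's "thinking" and "acting" are mixed together. You can see which tool it chose and with what arguments, but not why. ReAct pulls the reasoning out of the model's black box and turns it into an explicit, recordable tool call.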

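Loop flow

Thought → Action → Observation → Loop / Stop

    def react_loop(messages: list):
        while True:
            response = client.messages.create(
                model=MODEL,        # MODEL and max_tokens are placeholders; the
                max_tokens=8000,    # excerpt omitted these required arguments
                messages=messages,
                tools=TOOLS,        # think + action tools + final_answer
            )
            messages.append({"role": "assistant", "content": response.content})
            tool_calls = [b for b in response.content if b.type == "tool_use"]
            if not tool_calls:
                return None         # model stopped without calling final_answer

            # ---- Key step: think-before-act check ----
            has_think = any(tc.name == "think" for tc in tool_calls)
            has_action = any(tc.name in ACTION_TOOLS for tc in tool_calls)

            results = []
            for tc in tool_calls:
                result = execute(tc)
                if tc.name == "final_answer":
                    return result   # explicit termination
                results.append({"type": "tool_result",
                                "tool_use_id": tc.id, "content": result})
            messages.append({"role": "user", "content": results})

            # Nag after the tool results so the reminder never precedes them
            if has_action and not has_think:
                inject_reminder("You acted without thinking first!")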

Implementation walkthrough (s13_react.py)

The ThoughtLog class

    class ThoughtLog:
        """Accumulates the reasoning chain and provides an auditable record."""

        def __init__(self):
            self.entries: list[dict] = []

        def add(self, thought: str) -> str:
            step = len(self.entries) + 1
            self.entries.append({"step": step, "thought": thought})
            return f"[Thought #{step} recorded]"

        def render(self) -> str:
            return "\n".join(
                f"Step {e['step']}: {e['thought']}" for e in self.entries
            )

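The 6 tools

Tool	Type	Parameters	Responsibility
`think`	Metacognition	thought: str	Record reasoning in the ThoughtLog
`bash`	Action	command: str	Run a shell command
`read_file`	Action	path, limit?	Read a file
`write_file`	Action	path, content	Write a file
`edit_file`	Action	path, old, new	Replace text
`final_answer`	Termination	answer: str	End the loop and return the result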

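Design decisions

Think-before-act enforcement

The ACTION_TOOLS set defines which tools require a preceding think call. If the model skips thinking and acts directly, the system injects a <reminder>. This is a "nagging" constraint: it does not block execution, but it keeps reminding.

Explicit vs. natural termination

The final_answer tool lets the model end the loop deliberately. Compared with s01's stop_reason != "tool_use" check, this makes termination a conscious decision rather than a side effect.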

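Limitations of ReAct

Infinite retry loops

The agent calls the same tool over and over without converging: high compute cost, zero progress.

Mitigation: repeated-action detection plus early stopping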
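One way to sketch repeated-action detection is to count identical (tool name, arguments) pairs and stop once a limit is reached. The `RepeatDetector` class and the default limit of 3 are illustrative assumptions, not part of s13_react.py.

```python
import json
from collections import Counter


class RepeatDetector:
    """Counts identical (tool, arguments) calls and flags when a limit is hit."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats  # hypothetical default, tune per task
        self.seen: Counter = Counter()

    def should_stop(self, tool_name: str, tool_input: dict) -> bool:
        # Serialize with sorted keys so argument order does not matter.
        key = (tool_name, json.dumps(tool_input, sort_keys=True))
        self.seen[key] += 1
        return self.seen[key] >= self.max_repeats


detector = RepeatDetector(max_repeats=3)
calls = [("bash", {"command": "pytest"})] * 3
flags = [detector.should_stop(name, args) for name, args in calls]
print(flags)
```

In a loop like the one above, a `True` result could either end the run or inject a message asking the model to change approach; which is better depends on the task.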

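Hallucinated observations

When a tool returns an empty result, the model fabricates an answer instead of trying a different approach. Later reasoning then piles up on a false foundation.

Mitigation: mark empty results explicitly and force a think step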
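Explicit marking can be as small as a wrapper around tool output. The following is a minimal sketch; the `mark_empty` helper and the marker wording are assumptions chosen for illustration.

```python
from typing import Optional

# Hypothetical marker text; the wording is an assumption, not from s13_react.py.
EMPTY_MARKER = (
    "[EMPTY RESULT] The tool returned no output. "
    "Do not guess; call think and try a different approach."
)


def mark_empty(result: Optional[str]) -> str:
    """Return tool output unchanged, or an explicit marker when it is blank."""
    if result is None or not result.strip():
        return EMPTY_MARKER
    return result


print(mark_empty(""))
print(mark_empty("3 files changed"))
```

Because the marker names the think tool, it also nudges the model toward the forced think step rather than relying on the model to notice the empty string.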

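Context drift

The longer the conversation runs, the more the original task definition is diluted in the prompt, and the agent wanders off target.

Mitigation: restate the task at every step (Focused ReAct)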
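Restating the task can be sketched as appending a short reminder block to each batch of tool results. This is only an illustration of the idea; the `with_task_reminder` helper and the `<task_reminder>` tag are assumptions, and it does not reproduce the exact mechanism of the Focused ReAct paper.

```python
def with_task_reminder(task: str, tool_results: list) -> list:
    """Return a new content list: the tool results followed by a task restatement."""
    reminder = {"type": "text", "text": f"<task_reminder>{task}</task_reminder>"}
    # Keep tool results first so they stay adjacent to the calls they answer.
    return [*tool_results, reminder]


results = [{"type": "tool_result", "tool_use_id": "t1", "content": "ok"}]
content = with_task_reminder("Fix the failing test in utils.py", results)
print(content[-1]["text"])
```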

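Token inefficiency

Every tool call has to carry the full history. The ReWOO paper reports roughly 64% lower token usage than ReAct on average.

Mitigation: context compression (s06) / ReWOO-style batched planning

Production adoption

ReAct is currently the most widely deployed pattern in production. Claude Code, GitHub Copilot Agent, Cursor, LangChain's create_react_agent(), and LlamaIndex's ReActAgent are all ReAct or variants of it. The native tool_use protocols from Anthropic and OpenAI are, in essence, ReAct.

Paper: Yao et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023. arXiv:2210.03629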

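Plan-then-Execute

The plan is the contract, execution is the delivery: separating the two phases keeps complex tasks manageable.

Principle

Core insight: ReAct thinks while it acts, which suits exploration. For structured tasks (data pipelines, report generation), drawing up a complete strategy first and then executing it step by step is more efficient. The key innovation is that the planner and the executor use entirely different tool sets, which gives hard isolation between the two.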

Two-phase flow

┌─────────────────────────────────────────────────────────────┐
│ PHASE 1: PLAN                                               │
│                                                             │
│ User Query ──► Planner LLM ──► create_plan / revise_plan    │
│                     ▲                                       │
│                     │ feedback                              │
│                     │                                       │
│             ┌───────┴───────┐                               │
│             │ Approval Gate │ ◄── y / n / feedback          │
│             └───────────────┘                               │
└──────────────────────┬──────────────────────────────────────┘
                       │ approved plan
                       ▼
┌─────────────────────────────────────────────────────────────┐
│ PHASE 2: EXECUTE                                            │
│                                                             │
│ Plan (injected) ──► Executor LLM ──► bash / read / write    │
│                                  ──► mark_step / report     │
│                                                             │
│ Step 1 [x] ──► Step 2 [>] ──► Step 3 [ ] ──► ...            │
└─────────────────────────────────────────────────────────────┘

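Implementation walkthrough (s14_plan_execute.py)

The PlanManager class

    class PlanManager:
        """A plan = an ordered list of steps + lifecycle status."""

        def __init__(self):
            self.steps: list[dict] = []

        def create(self, steps: list[str]) -> str:
            self.steps = [
                {"id": f"step_{i+1}", "description": s, "status": "pending"}
                for i, s in enumerate(steps[:20])   # cap the plan at 20 steps
            ]
            return self.render()

        def mark(self, step_id: str, status: str) -> str:
            # pending -> in_progress -> completed | failed
            # The excerpt omitted the lookup; a linear search by id is assumed.
            step = next((s for s in self.steps if s["id"] == step_id), None)
            if step is None:
                return f"Error: unknown step {step_id}"
            step["status"] = status
            return self.render()

        def render(self) -> str:
            markers = {"pending": "[ ]", "in_progress": "[>]",
                       "completed": "[x]", "failed": "[!]"}
            ...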
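The lifecycle noted in the `mark` comment (pending -> in_progress -> completed | failed) can be enforced with a small transition table. This is a sketch of one possible check; `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical names, and the excerpt does not show whether s14_plan_execute.py validates transitions at all.

```python
# Hypothetical transition table derived from the comment in PlanManager.mark.
ALLOWED_TRANSITIONS = {
    "pending": {"in_progress"},
    "in_progress": {"completed", "failed"},
    "completed": set(),   # terminal
    "failed": set(),      # terminal
}


def can_transition(current: str, new: str) -> bool:
    """True if a step may move from `current` to `new` status."""
    return new in ALLOWED_TRANSITIONS.get(current, set())


print(can_transition("pending", "in_progress"))
print(can_transition("pending", "completed"))
```

Rejecting a jump straight from pending to completed is a design choice: it makes a skipped step visible instead of silently ticking it off.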

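Tool set separation

Phase 1: planning tools (2)

create_plan: create the list of steps
revise_plan: replace the current plan

The planner cannot execute any operation.

Phase 2: execution tools (6)

bash / read_file / write_file / edit_file
mark_step: mark step progress
report_failure: report a failure

The executor cannot change the structure of the plan.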
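The separation above can be sketched as two disjoint sets plus a guard at dispatch time. The tool names come from the lists above; the `tools_for` and `check_call` helpers are assumptions for illustration, not the script's actual dispatch code.

```python
PLANNER_TOOLS = {"create_plan", "revise_plan"}
EXECUTOR_TOOLS = {"bash", "read_file", "write_file", "edit_file",
                  "mark_step", "report_failure"}


def tools_for(phase: str) -> set:
    """Return the tool names exposed to the LLM in the given phase."""
    if phase == "plan":
        return PLANNER_TOOLS
    if phase == "execute":
        return EXECUTOR_TOOLS
    raise ValueError(f"unknown phase: {phase}")


def check_call(phase: str, tool_name: str) -> str:
    """Reject any tool call that does not belong to the current phase."""
    if tool_name not in tools_for(phase):
        return f"Error: {tool_name} is not available in the {phase} phase"
    return "ok"


print(check_call("plan", "bash"))
print(check_call("execute", "bash"))
```

Checking at dispatch as well as when building the tools list is defensive: the model should never see the other phase's tools, but the guard catches a misconfigured call either way.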

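Approval Gate

    def approval_gate(messages: list) -> bool:
        """Plan-approve-revise loop. Returns True once the plan is approved."""
        while True:
            print(PLAN.render())
            choice = input("Approve? (y/n/feedback): ").strip()
            if choice.lower() == "y":
                return True     # proceed to the execution phase
            if choice.lower() == "n":
                return False    # abandon
            # Anything else is treated as feedback for the planner
            messages.append({"role": "user", "content": choice})
            plan_loop(messages)  # let the LLM revise the plan

Comparison with s03: in s03 (TodoWrite), the model plans and executes in the same loop and can revise the plan at any time. s14 locks the plan after approval, turning it into a "contract": the executor can only mark progress, not change direction.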

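ReAct vs Plan-Execute: selection guide

Dimension	ReAct	Plan-Exec
Planning style	Implicit, discovered step by step	Explicit, fully planned up front
API call volume	High (every step is an LLM call)	Lower (plan once, execute in batch)
Adaptability	High: can adjust at every step	Low, unless a Re-planner is added
Best for	Exploration, debugging, research	Pipelines, reports, structured workflows
Typical failures	Loops, drift, over-acting	Stale plans, lost context
Token efficiency	Low (context bloat)	High (compact plan)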

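Anti-patterns

Stale plans

The plan is not updated after the environment changes, and the executor blindly follows steps that are no longer valid.

Mitigation: add a Re-planner node (LangGraph template)

Over-planning

The plan is so detailed that it wastes tokens and over-constrains the executor.

Mitigation: cap the number of steps (max 20) and keep the granularity moderate

Handoff drift

Information is lost when context is passed from the Planner to the Executor.

Mitigation: inject the complete plan into the executor's system prompt

Plan hallucination

The planner generates steps that refer to tools or capabilities that do not exist.

Mitigation: give the planner the list of available tools

Paper: Wang et al. (2023). Plan-and-Solve Prompting. ACL 2023. arXiv:2305.04091
Project: BabyAGI (Yohei Nakajima, 2023). babyagi.org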

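Reflexion: Verbal Reinforcement Learning

Failure is the input to the next round: a self-improvement loop that replaces gradient updates with verbal feedback.

Principle

Core insight: traditional RL learns from failure by updating weights with gradients. Reflexion uses natural-language text as the reinforcement signal. The agent writes its own failure analysis as text and injects it into the context of the next attempt. The model weights stay unchanged; only the prompt changes.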

Three-role architecture

┌────────────────────────────────────────────────────────────┐
│ REFLEXION LOOP                                             │
│                                                            │
│  ┌────────────┐     ┌────────────┐     ┌────────────┐      │
│  │  Executor  │────►│ Evaluator  │────►│ Reflector  │      │
│  │(tools+acts)│     │  (scores)  │     │ (diagnose) │      │
│  └────────────┘     └────────────┘     └────────────┘      │
│        ▲                                     │             │
│        │                                     ▼             │
│        │                 ┌──────────────────────┐          │
│        └─────────────────│   ReflexionMemory    │          │
│                          └──────────────────────┘          │
│                                                            │
│  Attempt 1 ──► score < 80  ──► reflect ──┐                 │
│  Attempt 2 ──► score < 80  ──► reflect ──┤                 │
│  Attempt 3 ──► score >= 80 ──► SUCCESS                     │
│                                OR return best              │
└────────────────────────────────────────────────────────────┘

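Implementation walkthrough (s15_reflexion.py)

Three system prompts

Executor

The main agent that carries out the task, with access to all tools (bash/read/write/edit). It is told to "learn from past reflections and avoid repeating mistakes".

Evaluator

A strict judge. Its output format is fixed JSON: {"pass": bool, "reason": str, "score": 0-100}. A score ≥ 80 counts as a pass.

Reflector

The failure analyst. It identifies exactly which step went wrong and produces actionable advice of at most 150 words. Vague advice such as "be more careful" is not allowed.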
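Since the Evaluator's reply is expected to be JSON with a 0-100 score, the calling code has to cope with malformed output. The sketch below is one defensive way to parse it; `parse_evaluation` and `PASS_THRESHOLD` are hypothetical names, and deriving `pass` from the score (rather than trusting the model's own `pass` field) is a design assumption, not necessarily what s15_reflexion.py does.

```python
import json

PASS_THRESHOLD = 80  # matches the "score >= 80" rule described above


def parse_evaluation(raw: str) -> dict:
    """Parse the Evaluator's JSON reply; treat malformed output as a failure."""
    try:
        data = json.loads(raw)
        score = int(data["score"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return {"pass": False, "reason": "unparseable evaluator output", "score": 0}
    score = max(0, min(100, score))  # clamp to the documented 0-100 range
    return {
        "pass": score >= PASS_THRESHOLD,
        "reason": str(data.get("reason", "")),
        "score": score,
    }


good = parse_evaluation('{"pass": true, "reason": "all tests pass", "score": 85}')
bad = parse_evaluation("The answer looks fine to me.")
print(good["pass"], bad["pass"])
```

Treating unparseable output as a failed attempt keeps the outer loop moving: the worst case is one extra retry, not a crash.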

The ReflexionMemory class

    class ReflexionMemory:
        """Accumulates reflections across attempts."""

        def __init__(self):
            self.reflections: list[dict] = []

        def add(self, reflection: str, attempt: int):
            self.reflections.append({
                "attempt": attempt,
                "reflection": reflection,
            })

        def render(self) -> str:
            if not self.reflections:
                return ""
            lines = [f"Attempt {r['attempt']}: {r['reflection']}"
                     for r in self.reflections]
            return "<past_reflections>\n" + "\n".join(lines) + "\n</past_reflections>"

Outer loop

    def reflexion_loop(task: str) -> str:
        memory = ReflexionMemory()
        best_result, best_score = "", -1
        for attempt in range(MAX_ATTEMPTS):   # MAX_ATTEMPTS = 3
            prompt = task
            if memory.reflections:
                prompt += "\n\n" + memory.render()   # inject past reflections
            result = attempt_loop([{"role": "user", "content": prompt}])
            evaluation = evaluate(task, result)      # Evaluator LLM
            if evaluation["score"] > best_score:
                best_result, best_score = result, evaluation["score"]
            if evaluation["pass"]:
                return result                        # passed
            reflection = reflect(task, result, evaluation)   # Reflector LLM
            memory.add(reflection, attempt + 1)
        return best_result   # fall back to the best attempt (graceful degradation)

Benchmark results

Benchmark	Base model	+ Reflexion	Gain
HumanEval (coding)	GPT-4: 80%	91% pass@1	+11 points
AlfWorld (decision-making)	~100/134	130/134	+22 points (absolute)
HotpotQA (reasoning)	ReAct baseline	Substantial improvement	++

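Anti-patterns

Generic reflections

"I should be more careful" is a reflection with nothing actionable in it.

Mitigation: have the Reflector prompt demand specific steps and exact errors

Local minima

The agent keeps making the same kind of mistake because the reflections never reach the root cause.

Mitigation: after injecting past reflections, require a comparison with the previous strategy

Degradation on easy tasks

When initial accuracy is already high, self-reflection can lead the agent to overturn a correct answer.

Mitigation: apply a clear pass threshold (score ≥ 80) and stop once it is met, to avoid over-correction

Evaluator dependence

Without reliable external feedback (a test suite, a reference answer), self-evaluation is unreliable.

Mitigation: restrict use to scenarios with a clear judging criterion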

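Reflexion vs Constitutional AI

	Reflexion	Constitutional AI
Timing	Inference-time	Training-time
Feedback source	Self-evaluation + external evaluator	Predefined principles
How it learns	Text memory (context window)	Gradient updates (weights)
Persistence	Valid within the session	Permanently baked into the model

The two are complementary: Constitutional AI trains a model that is better at self-critique, and Reflexion gives that model a structured framework for applying self-critique at inference time.

Paper: Shinn et al. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. NeurIPS 2023. arXiv:2303.11366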

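Combining and comparing the patterns

The three patterns are not mutually exclusive; they are composable building blocks.

Composition matrix

	ReAct	Plan-Execute	Reflexion
ReAct	--	Plan-Execute uses ReAct as its executor	Reflexion wraps ReAct as the Actor
Plan-Exec	ReAct executes each plan step	--	Reflexion critiques plan quality
Reflexion	ReAct is the inner loop	Plan failure triggers reflection + re-planning	--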

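Common composition patterns

Plan-Execute + ReAct sub-loop

The most common hybrid. The Planner sets the strategy, and each step is executed adaptively by a ReAct sub-agent. LangGraph's plan-and-execute template follows this design.

Plan → ReAct_step1 → ReAct_step2 → ...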
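The Plan → ReAct_step1 → ReAct_step2 chain can be sketched as a loop that hands each plan step to a sub-agent together with the results so far. `run_plan` and the stub `fake_react_step` are hypothetical; a real sub-agent would run a full Thought-Action-Observation loop per step.

```python
from typing import Callable, List


def run_plan(steps: List[str],
             react_step: Callable[[str, List[str]], str]) -> List[str]:
    """Run each plan step through a ReAct-style sub-agent, in order."""
    results: List[str] = []
    for step in steps:
        # Pass a copy of earlier results so the sub-agent has context
        # but cannot mutate the shared list.
        results.append(react_step(step, list(results)))
    return results


def fake_react_step(step: str, prior: List[str]) -> str:
    # Stub standing in for a real ReAct loop (assumption for illustration).
    return f"{step}: done after seeing {len(prior)} prior result(s)"


outputs = run_plan(["fetch data", "clean data", "write report"], fake_react_step)
print(outputs[-1])
```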

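ReAct + Reflexion

When a ReAct loop fails, Reflexion generates a critique and the loop is retried. This is the pattern behind most coding agents (SWE-bench-style solvers and Devin-like systems).

ReAct → Eval → Reflect → ReAct v2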

LATS = ReAct + Reflexion + tree search

The most complete combination: ReAct handles acting, Reflexion learns from failure, and MCTS explores multiple paths. With GPT-4 it reaches 92.7% pass@1 on HumanEval (ICML 2024).

MCTS Root → ReAct Path A / ReAct Path B → Reflect → Expand Best

Layered architecture: production-grade agents

┌────────────────────────────────────────────────────────────────────┐
│ QUALITY LAYER                                                      │
│ Reflexion: evaluate results → analyze failures → inject fixes      │
├────────────────────────────────────────────────────────────────────┤
│ TACTICAL LAYER                                                     │
│ ReAct: Thought → Action → Observation → adaptive execution         │
├────────────────────────────────────────────────────────────────────┤
│ STRATEGIC LAYER                                                    │
│ Plan-then-Execute: task decomposition → approval → step scheduling │
└────────────────────────────────────────────────────────────────────┘

In practice, Claude Code, Devin, and other production coding agents combine these three layers to varying degrees.

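Full comparison table

Dimension	ReAct	Plan-Exec	Reflexion
Core loop	Think → Act → Observe	Plan → Approve → Execute	Attempt → Eval → Reflect → Retry
LLM roles	1	2 (Planner + Executor)	3 (Executor + Evaluator + Reflector)
State management	ThoughtLog	PlanManager (step state machine)	ReflexionMemory
Human involvement	None (autonomous)	Approval gate	None (autonomous)
Termination	final_answer / natural stop	Plan completed	Score passes / MAX_ATTEMPTS
Token efficiency	Low	High	Low (multiple retries)
Latency	Medium	Medium	High (up to 3 attempts)
Best for	Exploration, debugging	Pipelines, reports	Coding, tasks with a judging criterion
Production adoption	Widest	Medium	Limited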

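Failure modes, side by side

Failure mode	ReAct	Plan-Exec	Reflexion
Infinite loops	High risk	Low (plan is bounded)	Medium (bounded retries)
Hallucination	Fabricated observations	Hallucinated steps	Generic reflections
Token waste	Context bloat	Over-planning	Failed attempts
Fragility	Cascading tool failures	Stale plans	Dependence on the evaluator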

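Industry evolution and reference index

From paper to production: a timeline of agent reasoning patterns

Timeline

2022.01 - Chain-of-Thought Prompting (Wei et al.)

Established the paradigm of having the model "show its work", laying the foundation for every later reasoning pattern.

2022.10 - ReAct (Yao et al.)

Unified reasoning and acting in a single interleaved loop. Defined the Thought-Action-Observation paradigm.

2023.03 - Reflexion (Shinn et al.)

Verbal reinforcement learning: textual reflection replaces gradient updates. HumanEval 80% → 91%.

2023.04 - BabyAGI + Auto-GPT

The open-source explosion of the Plan-then-Execute pattern. Task creation → prioritization → execution loop.

2023.05 - Plan-and-Solve (Wang et al.) + ReWOO (Xu et al.)

Plan-and-Solve shows that separating planning from solving outperforms Zero-shot CoT. ReWOO's batched planning saves about 64% of tokens.

2023.10 - LATS (Zhou et al.)

A combination of ReAct + Reflexion + MCTS. HumanEval 92.7% pass@1. ICML 2024.

2024.11 - MCP (Anthropic)

The Model Context Protocol standardizes how tools are connected, with 10,000+ servers. All patterns share the same Action layer.

2025+ - Convergence on the Adaptive Agent Loop

Plan + ReAct + Reflexion merge into one loop: plan → adaptive execution → evaluate → reflect → re-plan.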

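Convergence trend

The three patterns are merging into a unified architecture. LangGraph, AutoGen, and CrewAI all treat ReAct, Plan-Execute, and Reflexion as composable modules. The pattern names are shifting from "competing architectures" to "design vocabulary".

Adaptive Agent Loop (the 2025+ unified paradigm)
──────────────────────────────────────
1. PLAN: decompose the goal into subtasks (Plan-then-Execute)
2. EXECUTE: run a ReAct loop for each subtask (ReAct)
3. EVALUATE: check whether the result meets the criteria (Reflexion - Evaluator)
4. REFLECT: on failure, generate a verbal critique (Reflexion - Reflector)
5. REPLAN: fold the reflection into an updated plan (Plan-then-Execute v2)
6. REPEAT: until the goal is met or the budget runs out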
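The six steps can be sketched as one function that takes the four roles as callables. Everything here is illustrative: `adaptive_loop`, its parameter names, and the stub lambdas are assumptions, and the `budget` is counted in plan rounds rather than tokens for simplicity.

```python
from typing import Callable, List


def adaptive_loop(goal: str,
                  plan: Callable[[str, List[str]], List[str]],
                  execute: Callable[[str], str],
                  evaluate: Callable[[str, List[str]], bool],
                  reflect: Callable[[str, List[str]], str],
                  budget: int = 3) -> dict:
    """PLAN -> EXECUTE -> EVALUATE -> REFLECT -> REPLAN, until success or budget."""
    reflections: List[str] = []
    results: List[str] = []
    for round_no in range(1, budget + 1):
        steps = plan(goal, reflections)            # PLAN (REPLAN from round 2 on)
        results = [execute(s) for s in steps]      # EXECUTE each subtask
        if evaluate(goal, results):                # EVALUATE
            return {"done": True, "rounds": round_no, "results": results}
        reflections.append(reflect(goal, results)) # REFLECT
    return {"done": False, "rounds": budget, "results": results}


# Stub roles for illustration only; real ones would call an LLM.
out = adaptive_loop(
    goal="ship feature",
    plan=lambda goal, refl: ["draft"] + [f"apply: {r}" for r in refl],
    execute=lambda step: step.upper(),
    evaluate=lambda goal, results: len(results) >= 2,
    reflect=lambda goal, results: "add tests",
)
print(out)
```

Passing the accumulated reflections into `plan` is what turns step 5 into a re-plan: the second round's plan differs from the first only because of what the first round's failure produced.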

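Areas that have not yet converged

Governance and safety

Highly fragmented. Observability is not the same as controllability, and safety models differ from framework to framework.

Evaluation standards

SWE-bench covers coding, but general-purpose agents still lack a standard benchmark.

Cost/latency trade-offs

The full unified pattern is expensive. Most production systems pick a subset that fits their budget.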

Complete reference index

Foundational papers

Paper	Authors	Venue	Link
ReAct: Synergizing Reasoning and Acting	Yao et al.	ICLR 2023	arXiv:2210.03629
Reflexion: Language Agents with Verbal RL	Shinn et al.	NeurIPS 2023	arXiv:2303.11366
Plan-and-Solve Prompting	Wang et al.	ACL 2023	arXiv:2305.04091
Language Agent Tree Search (LATS)	Zhou et al.	ICML 2024	arXiv:2310.04406
ReWOO: Decoupling Reasoning from Observations	Xu et al.	arXiv 2023	arXiv:2305.18323
Self-Refine: Iterative Refinement with Self-Feedback	Madaan et al.	NeurIPS 2023	arXiv:2303.17651
Chain-of-Thought Prompting	Wei et al.	NeurIPS 2022	arXiv:2201.11903

Surveys

Paper	Link
The Landscape of Emerging AI Agent Architectures (2024)	arXiv:2404.11584
Concentrix: 12 Failure Patterns of Agentic AI Systems	Link
Microsoft: Taxonomy of Failure Modes in AI Agents (2025)	Link

Practical resources

Resource	Link
LangChain - Planning Agents	Link
IBM - What is a ReAct Agent?	Link
Anthropic - Building Effective Agents (2024)	Link
Prompt Engineering Guide - ReAct	Link
Prompt Engineering Guide - Reflexion	Link
Learn Claude Code (this project)	learn-claude-code.pages.dev

Local code files

File	Description
`~/lccode-web/agents/s13_react.py`	Full ReAct implementation (244 LOC, 6 tools)
`~/lccode-web/agents/s14_plan_execute.py`	Full Plan-Execute implementation (290 LOC, 8 tools)
`~/lccode-web/agents/s15_reflexion.py`	Full Reflexion implementation (292 LOC, 4 tools)
`~/lccode-web/docs/zh/s13-react.md`	ReAct documentation (Chinese)
`~/lccode-web/docs/zh/s14-plan-execute.md`	Plan-Execute documentation (Chinese)
`~/lccode-web/docs/zh/s15-reflexion.md`	Reflexion documentation (Chinese)

Web preview: run cd ~/lccode-web && npm run dev -- -p 3099, then open http://localhost:3099/zh/s13