Skill: Build with Claude API
v2.1.63
Main routing guide for building LLM-powered applications with Claude, including language detection, surface selection, and architecture overview
Building LLM-Powered Applications with Claude
This skill helps you build LLM-powered applications with Claude. Choose the right surface based on your needs, detect the project language, then read the relevant language-specific documentation.
Defaults
Unless the user requests otherwise:
For the Claude model version, use the exact model string {\{OPUS_ID}\}. Default to adaptive thinking (thinking: {type: "adaptive"}) for anything remotely complicated. Finally, default to streaming for any request that may involve long input, long output, or high max_tokens — it prevents hitting request timeouts. If you don't need to handle individual stream events, use the SDK's .get_final_message() / .finalMessage() helper to get the complete response.
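A minimal Python sketch of these defaults, using `claude-opus-4-6` from the models table below as the stand-in for {\{OPUS_ID}\}; the commented SDK calls show where the parameters would go:

```python
# Default request parameters recommended above: the Opus model ID from the
# Current Models table, adaptive thinking, and a generous max_tokens that
# relies on streaming to avoid request timeouts.
OPUS_ID = "claude-opus-4-6"

default_params = {
    "model": OPUS_ID,
    "max_tokens": 16_000,              # long outputs are fine when streaming
    "thinking": {"type": "adaptive"},  # Claude decides when/how much to think
    "messages": [{"role": "user", "content": "Explain this codebase's architecture."}],
}

# With the official Python SDK, this would be sent as (sketch, not run here):
#   client = anthropic.Anthropic()
#   with client.messages.stream(**default_params) as stream:
#       message = stream.get_final_message()  # full response, no per-event handling
```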
Language Detection
Before reading code examples, determine which language the user is working in:
Look at project files to infer the language:
- `*.py`, `requirements.txt`, `pyproject.toml`, `setup.py`, `Pipfile` → Python — read from `python/`
- `*.ts`, `*.tsx`, `package.json`, `tsconfig.json` → TypeScript — read from `typescript/`
- `*.js`, `*.jsx` (no `.ts` files present) → TypeScript — JS uses the same SDK, read from `typescript/`
- `*.java`, `pom.xml`, `build.gradle` → Java — read from `java/`
- `*.kt`, `*.kts`, `build.gradle.kts` → Java — Kotlin uses the Java SDK, read from `java/`
- `*.scala`, `build.sbt` → Java — Scala uses the Java SDK, read from `java/`
- `*.go`, `go.mod` → Go — read from `go/`
- `*.rb`, `Gemfile` → Ruby — read from `ruby/`
- `*.cs`, `*.csproj` → C# — read from `csharp/`
- `*.php`, `composer.json` → PHP — read from `php/`
If multiple languages detected (e.g., both Python and TypeScript files):
- Check which language the user's current file or question relates to
- If still ambiguous, ask: "I detected both Python and TypeScript files. Which language are you using for the Claude API integration?"
If language can't be inferred (empty project, no source files, or unsupported language):
- Use AskUserQuestion with options: Python, TypeScript, Java, Go, Ruby, cURL/raw HTTP, C#, PHP
- If AskUserQuestion is unavailable, default to Python examples and note: "Showing Python examples. Let me know if you need a different language."
If unsupported language detected (Rust, Swift, C++, Elixir, etc.):
- Suggest cURL/raw HTTP examples from `curl/` and note that community SDKs may exist
- Offer to show Python or TypeScript examples as reference implementations

If the user needs cURL/raw HTTP examples, read from `curl/`.
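The detection table above can be sketched as a small helper. The function name and structure are illustrative, not part of any SDK:

```python
from pathlib import Path

# Marker files and extensions from the detection table, mapped to docs folders.
MARKER_MAP = {
    "requirements.txt": "python/", "pyproject.toml": "python/",
    "setup.py": "python/", "Pipfile": "python/",
    "package.json": "typescript/", "tsconfig.json": "typescript/",
    "pom.xml": "java/", "build.gradle": "java/", "build.sbt": "java/",
    "go.mod": "go/", "Gemfile": "ruby/", "composer.json": "php/",
}
EXTENSION_MAP = {
    ".py": "python/", ".ts": "typescript/", ".tsx": "typescript/",
    ".java": "java/", ".kt": "java/", ".kts": "java/", ".scala": "java/",
    ".go": "go/", ".rb": "ruby/", ".cs": "csharp/", ".php": "php/",
}

def detect_docs_folder(filenames):
    """Return the docs folder for the first recognized project file, or None."""
    js_seen = False
    for name in filenames:
        if name in MARKER_MAP:
            return MARKER_MAP[name]
        suffix = Path(name).suffix
        if suffix in EXTENSION_MAP:
            return EXTENSION_MAP[suffix]
        if suffix in (".js", ".jsx"):
            js_seen = True  # JS maps to typescript/ only when no .ts files exist
    return "typescript/" if js_seen else None
```

When this returns `None`, fall back to the AskUserQuestion flow described above.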
Language-Specific Feature Support
| Language | Tool Runner | Agent SDK | Notes |
|---|---|---|---|
| Python | Yes (beta) | Yes | Full support — @beta_tool decorator |
| TypeScript | Yes (beta) | Yes | Full support — betaZodTool + Zod |
| Java | Yes (beta) | No | Beta tool use with annotated classes |
| Go | Yes (beta) | No | BetaToolRunner in toolrunner pkg |
| Ruby | Yes (beta) | No | BaseTool + tool_runner in beta |
| cURL | N/A | N/A | Raw HTTP, no SDK features |
| C# | No | No | Official SDK |
| PHP | No | No | Official SDK |
Which Surface Should I Use?
Start simple. Default to the simplest tier that meets your needs. Single API calls and workflows handle most use cases — only reach for agents when the task genuinely requires open-ended, model-driven exploration.
| Use Case | Tier | Recommended Surface | Why |
|---|---|---|---|
| Classification, summarization, extraction, Q&A | Single LLM call | Claude API | One request, one response |
| Batch processing or embeddings | Single LLM call | Claude API | Specialized endpoints |
| Multi-step pipelines with code-controlled logic | Workflow | Claude API + tool use | You orchestrate the loop |
| Custom agent with your own tools | Agent | Claude API + tool use | Maximum flexibility |
| AI agent with file/web/terminal access | Agent | Agent SDK | Built-in tools, safety, and MCP support |
| Agentic coding assistant | Agent | Agent SDK | Designed for this use case |
| Want built-in permissions and guardrails | Agent | Agent SDK | Safety features included |
Note: The Agent SDK is for when you want built-in file/web/terminal tools, permissions, and MCP out of the box. If you want to build an agent with your own tools, Claude API is the right choice — use the tool runner for automatic loop handling, or the manual loop for fine-grained control (approval gates, custom logging, conditional execution).
Decision Tree
What does your application need?
1. Single LLM call (classification, summarization, extraction, Q&A)
└── Claude API — one request, one response
2. Does Claude need to read/write files, browse the web, or run shell commands
as part of its work? (Not: does your app read a file and hand it to Claude —
does Claude itself need to discover and access files/web/shell?)
└── Yes → Agent SDK — built-in tools, don't reimplement them
Examples: "scan a codebase for bugs", "summarize every file in a directory",
"find bugs using subagents", "research a topic via web search"
3. Workflow (multi-step, code-orchestrated, with your own tools)
└── Claude API with tool use — you control the loop
4. Open-ended agent (model decides its own trajectory, your own tools)
   └── Claude API agentic loop (maximum flexibility)

Should I Build an Agent?
Before choosing the agent tier, check all four criteria:
- Complexity — Is the task multi-step and hard to fully specify in advance? (e.g., "turn this design doc into a PR" vs. "extract the title from this PDF")
- Value — Does the outcome justify higher cost and latency?
- Viability — Is Claude capable at this task type?
- Cost of error — Can errors be caught and recovered from? (tests, review, rollback)
If the answer is "no" to any of these, stay at a simpler tier (single call or workflow).
Architecture
Everything goes through POST /v1/messages. Tools and output constraints are features of this single endpoint — not separate APIs.
User-defined tools — You define tools (via decorators, Zod schemas, or raw JSON), and the SDK's tool runner handles calling the API, executing your functions, and looping until Claude is done. For full control, you can write the loop manually.
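A sketch of the manual loop variant, with the API call stubbed out (`call_claude` stands in for `client.messages.create`, and tool schemas are omitted for brevity):

```python
import json

def run_tool_loop(call_claude, tool_fns, messages):
    """Minimal manual agentic loop: call the model, execute any requested
    tools, feed results back, and repeat until Claude stops calling tools."""
    while True:
        response = call_claude(messages=messages)
        if response["stop_reason"] != "tool_use":
            return response
        # Append the full assistant turn, then one tool_result per tool call.
        messages.append({"role": "assistant", "content": response["content"]})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": json.dumps(tool_fns[block["name"]](**block["input"])),
            }
            for block in response["content"]
            if block["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
```

This is where approval gates, custom logging, or conditional execution would slot in; the SDK tool runner handles the same loop automatically.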
Server-side tools — Anthropic-hosted tools that run on Anthropic's infrastructure. Code execution is fully server-side (declare it in tools, Claude runs code automatically). Computer use can be server-hosted or self-hosted.
Structured outputs — Constrain the Messages API response format (output_config.format) and/or tool parameter validation (strict: true). The recommended approach is client.messages.parse(), which validates responses against your schema automatically. Note: the old output_format parameter is deprecated; use output_config: {format: {...}} on messages.create().
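A request-shape sketch for structured outputs. Note that the inner shape of `format` (the `"json_schema"` discriminator and `schema` key) is an assumption here; only the `output_config.format` location comes from this guide, so check the live docs for the authoritative schema:

```python
# Structured-output request sketch. output_config.format replaces the
# deprecated top-level output_format parameter; the payload inside "format"
# is illustrative, not a confirmed wire format.
structured_request = {
    "model": "claude-opus-4-6",
    "max_tokens": 1024,
    "output_config": {
        "format": {
            "type": "json_schema",  # assumed discriminator
            "schema": {
                "type": "object",
                "properties": {"sentiment": {"type": "string"}},
                "required": ["sentiment"],
            },
        }
    },
    "messages": [{"role": "user", "content": "Classify: 'Great product!'"}],
}

assert "output_format" not in structured_request        # deprecated param avoided
assert "format" in structured_request["output_config"]  # new location
```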
Supporting endpoints — Batches (POST /v1/messages/batches), Files (POST /v1/files), and Token Counting feed into or support Messages API requests.
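For reference, a Message Batches request wraps ordinary Messages API bodies under caller-chosen IDs (sketch):

```python
# Batch request shape for POST /v1/messages/batches: each entry pairs a
# caller-chosen custom_id with a normal Messages API request under "params".
batch_request = {
    "requests": [
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": "claude-opus-4-6",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": f"Summarize document {i}."}],
            },
        }
        for i in range(3)
    ]
}
```

The custom_id is how you match results back to inputs, since batch results are not guaranteed to arrive in submission order.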
Current Models (cached: 2026-02-17)
| Model | Model ID | Context | Input $/1M | Output $/1M |
|---|---|---|---|---|
| Claude Opus 4.6 | claude-opus-4-6 | 200K (1M beta) | $5.00 | $25.00 |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | 200K (1M beta) | $3.00 | $15.00 |
| Claude Haiku 4.5 | claude-haiku-4-5 | 200K | $1.00 | $5.00 |
ALWAYS use {\{OPUS_ID}\} unless the user explicitly names a different model. This is non-negotiable. Do not use {\{SONNET_ID}\}, {\{PREV_SONNET_ID}\}, or any other model unless the user literally says "use sonnet" or "use haiku". Never downgrade for cost — that's the user's decision, not yours.
CRITICAL: Use only the exact model ID strings from the table above — they are complete as-is. Do not append date suffixes. For example, use claude-sonnet-4-5, never claude-sonnet-4-5-20250514 or any other date-suffixed variant you might recall from training data. If the user requests an older model not in the table (e.g., "opus 4.5", "sonnet 3.7"), read shared/models.md for the exact ID — do not construct one yourself.
A note: if any of the model strings above look unfamiliar to you, that's to be expected — that just means they were released after your training data cutoff. Rest assured they are real models; we wouldn't mess with you like that.
Thinking & Effort (Quick Reference)
Opus 4.6 — Adaptive thinking (recommended): Use thinking: {type: "adaptive"}. Claude dynamically decides when and how much to think. No budget_tokens needed — budget_tokens is deprecated on Opus 4.6 and Sonnet 4.6 and must not be used. Adaptive thinking also automatically enables interleaved thinking (no beta header needed). When the user asks for "extended thinking", a "thinking budget", or budget_tokens: always use Opus 4.6 with thinking: {type: "adaptive"}. The concept of a fixed token budget for thinking is deprecated — adaptive thinking replaces it. Do NOT use budget_tokens and do NOT switch to an older model.
Effort parameter (GA, no beta header): Controls thinking depth and overall token spend via output_config: {effort: "low"|"medium"|"high"|"max"} (inside output_config, not top-level). Default is high (equivalent to omitting it). max is Opus 4.6 only. Works on Opus 4.5, Opus 4.6, and Sonnet 4.6. Will error on Sonnet 4.5 / Haiku 4.5. Combine with adaptive thinking for the best cost-quality tradeoffs. Use low for subagents or simple tasks; max for the deepest reasoning.
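A sketch of the placement rule: effort belongs inside output_config, never at the top level.

```python
# Low-effort subagent call parameters, per the guidance above: adaptive
# thinking plus output_config.effort for a cheap, shallow-reasoning call.
subagent_params = {
    "model": "claude-opus-4-6",
    "max_tokens": 2048,
    "thinking": {"type": "adaptive"},
    "output_config": {"effort": "low"},  # "low" | "medium" | "high" | "max"
}

assert "effort" not in subagent_params                  # not a top-level key
assert subagent_params["output_config"]["effort"] == "low"
```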
Sonnet 4.6: Supports adaptive thinking (thinking: {type: "adaptive"}). budget_tokens is deprecated on Sonnet 4.6 — use adaptive thinking instead.
Older models (only if explicitly requested): If the user specifically asks for Sonnet 4.5 or another older model, use thinking: {type: "enabled", budget_tokens: N}. budget_tokens must be less than max_tokens (minimum 1024). Never choose an older model just because the user mentions budget_tokens — use Opus 4.6 with adaptive thinking instead.
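The two thinking configurations above, side by side. Model IDs follow this guide's tables; the budget value is illustrative:

```python
# Default: adaptive thinking on Opus 4.6 — no budget_tokens anywhere.
adaptive_params = {
    "model": "claude-opus-4-6",
    "max_tokens": 8192,
    "thinking": {"type": "adaptive"},
}

# Only when an older model is explicitly requested: fixed thinking budget.
legacy_params = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 8192,
    "thinking": {"type": "enabled", "budget_tokens": 4096},
}

# budget_tokens must be at least 1024 and strictly below max_tokens.
assert 1024 <= legacy_params["thinking"]["budget_tokens"] < legacy_params["max_tokens"]
```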
Compaction (Quick Reference)
Beta, Opus 4.6 only. For long-running conversations that may exceed the 200K context window, enable server-side compaction. The API automatically summarizes earlier context when it approaches the trigger threshold (default: 150K tokens). Requires beta header compact-2026-01-12.
Critical: Append response.content (not just the text) back to your messages on every turn. Compaction blocks in the response must be preserved — the API uses them to replace the compacted history on the next request. Extracting only the text string and appending that will silently lose the compaction state.
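The append rule can be sketched on plain dicts. The `"compaction"` block type name below is illustrative; what matters is preserving every block, whatever its type:

```python
# Preserve the full response.content list on every turn, including any
# compaction blocks; appending only the text would silently drop them.
history = [{"role": "user", "content": "...long conversation so far..."}]

response_content = [
    {"type": "compaction", "summary": "..."},      # illustrative block type
    {"type": "text", "text": "Here is the answer."},
]

# Right: append the whole content list as the assistant turn.
history.append({"role": "assistant", "content": response_content})

# Wrong (loses compaction state): appending only the extracted text, e.g.
#   history.append({"role": "assistant", "content": response_content[1]["text"]})

assert any(b["type"] == "compaction" for b in history[-1]["content"])
```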
See {lang}/claude-api/README.md (Compaction section) for code examples. Full docs via WebFetch in shared/live-sources.md.
Reading Guide
After detecting the language, read the relevant files based on what the user needs:
Quick Task Reference
Single text classification/summarization/extraction/Q&A: → Read only {lang}/claude-api/README.md
Chat UI or real-time response display: → Read {lang}/claude-api/README.md + {lang}/claude-api/streaming.md
Long-running conversations (may exceed context window): → Read {lang}/claude-api/README.md — see Compaction section
Function calling / tool use / agents: → Read {lang}/claude-api/README.md + shared/tool-use-concepts.md + {lang}/claude-api/tool-use.md
Batch processing (non-latency-sensitive): → Read {lang}/claude-api/README.md + {lang}/claude-api/batches.md
File uploads across multiple requests: → Read {lang}/claude-api/README.md + {lang}/claude-api/files-api.md
Agent with built-in tools (file/web/terminal): → Read {lang}/agent-sdk/README.md + {lang}/agent-sdk/patterns.md
Claude API (Full File Reference)
Read the language-specific Claude API folder ({language}/claude-api/):
- `{language}/claude-api/README.md` — Read this first. Installation, quick start, common patterns, error handling.
- `shared/tool-use-concepts.md` — Read when the user needs function calling, code execution, memory, or structured outputs. Covers conceptual foundations.
- `{language}/claude-api/tool-use.md` — Read for language-specific tool use code examples (tool runner, manual loop, code execution, memory, structured outputs).
- `{language}/claude-api/streaming.md` — Read when building chat UIs or interfaces that display responses incrementally.
- `{language}/claude-api/batches.md` — Read when processing many requests offline (not latency-sensitive). Runs asynchronously at 50% cost.
- `{language}/claude-api/files-api.md` — Read when sending the same file across multiple requests without re-uploading.
- `shared/error-codes.md` — Read when debugging HTTP errors or implementing error handling.
- `shared/live-sources.md` — WebFetch URLs for fetching the latest official documentation.
Note: Java, Go, Ruby, C#, PHP, and cURL each have a single file covering all basics. Read that file plus `shared/tool-use-concepts.md` and `shared/error-codes.md` as needed.
Agent SDK
Read the language-specific Agent SDK folder ({language}/agent-sdk/). Agent SDK is available for Python and TypeScript only.
- `{language}/agent-sdk/README.md` — Installation, quick start, built-in tools, permissions, MCP, hooks.
- `{language}/agent-sdk/patterns.md` — Custom tools, hooks, subagents, MCP integration, session resumption.
- `shared/live-sources.md` — WebFetch URLs for current Agent SDK docs.
When to Use WebFetch
Use WebFetch to get the latest documentation when:
- User asks for "latest" or "current" information
- Cached data seems incorrect
- User asks about features not covered here
Live documentation URLs are in shared/live-sources.md.
Common Pitfalls
- Don't truncate inputs when passing files or content to the API. If the content is too long to fit in the context window, notify the user and discuss options (chunking, summarization, etc.) rather than silently truncating.
- Opus 4.6 / Sonnet 4.6 thinking: Use thinking: {type: "adaptive"} — do NOT use budget_tokens (deprecated on both Opus 4.6 and Sonnet 4.6). For older models, budget_tokens must be less than max_tokens (minimum 1024); getting this wrong throws an error.
- Opus 4.6 prefill removed: Assistant message prefills (last-assistant-turn prefills) return a 400 error on Opus 4.6. Use structured outputs (output_config.format) or system prompt instructions to control response format instead.
- 128K output tokens: Opus 4.6 supports up to 128K max_tokens, but the SDKs require streaming for large max_tokens to avoid HTTP timeouts. Use .stream() with .get_final_message() / .finalMessage().
- Tool call JSON parsing (Opus 4.6): Opus 4.6 may produce different JSON string escaping in tool call input fields (e.g., Unicode or forward-slash escaping). Always parse tool inputs with json.loads() / JSON.parse() — never do raw string matching on the serialized input.
- Structured outputs (all models): Use output_config: {format: {...}} instead of the deprecated output_format parameter on messages.create(). This is a general API change, not 4.6-specific.
- Don't reimplement SDK functionality: The SDKs provide high-level helpers — use them instead of building from scratch. Use stream.finalMessage() instead of wrapping .on() events in new Promise(); typed exception classes (Anthropic.RateLimitError, etc.) instead of string-matching error messages; and SDK types (Anthropic.MessageParam, Anthropic.Tool, Anthropic.Message, etc.) instead of redefining equivalent interfaces.
- Don't define custom types for SDK data structures: The SDK exports types for all API objects. Use Anthropic.MessageParam for messages, Anthropic.Tool for tool definitions, Anthropic.ToolUseBlock / Anthropic.ToolResultBlockParam for tool results, and Anthropic.Message for responses. Defining your own interface ChatMessage { role: string; content: unknown } duplicates what the SDK already provides and loses type safety.
- Report and document output: For tasks that produce reports, documents, or visualizations, the code execution sandbox has python-docx, python-pptx, matplotlib, pillow, and pypdf pre-installed. Claude can generate formatted files (DOCX, PDF, charts) and return them via the Files API — consider this for "report" or "document" requests instead of plain stdout text.
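The JSON-parsing pitfall in action: two escapings of the same tool input compare unequal as strings but identical once parsed.

```python
import json

# Opus 4.6 may emit either serialization for the same tool input; only
# parsed comparison is reliable.
plain = '{"path": "/tmp/report.pdf"}'
escaped = '{"path": "\\/tmp\\/report.pdf"}'  # forward-slash-escaped variant

assert plain != escaped                          # raw string matching breaks
assert json.loads(plain) == json.loads(escaped)  # parsed values are identical
```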