# 变更日志

URL: https://developers.cloudflare.com/workers-ai/changelog/

import { ProductReleaseNotes } from "~/components";

{/* <!-- Actual content lives in /src/content/release-notes/workers-ai.yaml. Update the file there for new entries to appear here. For more details, refer to https://developers.cloudflare.com/style-guide/documentation-content-strategy/content-types/changelog/#yaml-file --> */}

<ProductReleaseNotes />

---

# 在 Cloudflare 上构建 Agents

URL: https://developers.cloudflare.com/agents/

import {
	CardGrid,
	Description,
	Feature,
	LinkButton,
	LinkTitleCard,
	PackageManagers,
	Plan,
	RelatedProduct,
	Render,
	TabItem,
	Tabs,
	TypeScriptExample,
} from "~/components";

Agents SDK 让您能够构建和部署AI驱动的智能体，这些智能体可以自主执行任务、与客户端实时通信、调用AI模型、持久化状态、调度任务、运行异步工作流、浏览网络、从数据库查询数据、支持人机协作交互，以及[更多功能](/agents/api-reference/)。

### 发布您的第一个 Agent

要使用agents-starter模板并通过Agents SDK创建您的第一个 Agent：

```sh
# 安装
npm create cloudflare@latest agents-starter -- --template=cloudflare/agents-starter
# 部署
npx wrangler@latest deploy
```

前往[构建聊天 Agent](/agents/getting-started/build-a-chat-agent) 指南，了解agents-starter项目是如何构建的，以及如何将其作为您自己 Agents 的基础。

如果您已经在 [Workers](/workers/) 上进行开发，您可以直接将 `agents` 包安装到现有项目中：

```sh
npm i agents
```

然后通过创建一个继承 `Agent` 类的类来定义您的第一个 Agent：

<TypeScriptExample>

```ts
import { Agent, AgentNamespace } from "agents";

export class MyAgent extends Agent {
	// 在 Agent 上定义方法:
	// https://developers.cloudflare.com/agents/api-reference/agents-api/
	//
	// 每个 Agent 都有通过 this.setState 和 this.sql 的内置状态
	// 通过 this.schedule 的内置调度功能
	// Agents 支持 WebSockets、HTTP 请求、状态同步，并且
	// 可以运行几秒钟、几分钟或几小时：只要任务需要多长时间。
}
```

</TypeScriptExample>

深入了解 [Agent SDK 参考文档](/agents/api-reference/agents-api/)，了解更多关于如何使用 Agents SDK 包和定义 `Agent` 的信息。

### 为什么在 Cloudflare 上构建 Agents？

我们在构建 Agents SDK 时考虑了以下几个方面：

- **内置状态管理**: Agents 配备了[内置状态管理](/agents/api-reference/store-and-sync-state/)，能够自动在 Agent 和客户端之间同步状态、在状态变化时触发事件，以及读写每个 Agent 的 SQL 数据库。
- **通信能力**: 您可以通过 [WebSockets](/agents/api-reference/websockets/) 连接到 Agent，并将更新实时流式传输回客户端。处理推理模型的长时间运行响应、[异步工作流](/agents/api-reference/run-workflows/)的结果，或构建基于 Agents SDK 中包含的 `useAgent` hook 的聊天应用。
- **可扩展性**: Agents 就是代码。使用您想要的 [AI 模型](/agents/api-reference/using-ai-models/)，带上您自己的无头浏览器服务，从托管在其他云中的数据库拉取数据，为您的 Agent 添加自己的方法并调用它们。

使用 Agents SDK 构建的 Agents 可以直接部署到 Cloudflare，并在 [Durable Objects](/durable-objects/) 上运行——您可以将其视为能够扩展到数千万个的有状态微服务器——并且能够在需要的任何地方运行。让您的 Agents 通过CDN技术以实现低延迟交互，利用CDN技术将数据存储在靠近用户的地方以提高吞吐量，或在两者之间的任何地方运行。

---

### 在 Cloudflare 平台上构建

<RelatedProduct header="Workers" href="/workers/" product="workers">

构建无服务器应用程序，在全球范围内即时部署，实现卓越的性能、可靠性和规模。

</RelatedProduct>

<RelatedProduct header="AI Gateway" href="/ai-gateway/" product="ai-gateway">

通过缓存、速率限制、请求重试、模型回退等功能观察和控制您的 AI 应用程序。

</RelatedProduct>

<RelatedProduct header="Vectorize" href="/vectorize/" product="vectorize">

使用 Cloudflare 的向量数据库 Vectorize 构建全栈 AI 应用程序。添加 Vectorize 让您能够执行语义搜索、推荐、异常检测等任务，或用于为 LLM 提供上下文和记忆。

</RelatedProduct>

<RelatedProduct header="Workers AI" href="/workers-ai/" product="workers-ai">

在 Cloudflare 的全球网络上运行由无服务器 GPU 驱动的机器学习模型。

</RelatedProduct>

<RelatedProduct header="Workflows" href="/workflows/" product="workflows">

构建有状态的 Agents，保证执行，包括自动重试、可运行几分钟、几小时、几天或几周的持久状态。

</RelatedProduct>

---

# 代理

URL: https://developers.cloudflare.com/workers-ai/agents/

import { LinkButton } from "~/components";

<div style={{ textAlign: "center", marginBottom: "2rem" }}>
	<p>
		使用 Cloudflare Workers AI 和代理构建能够代表您的用户执行复杂任务的 AI
		助手。
	</p>
	<LinkButton href="/agents/">转到代理文档</LinkButton>
</div>

---

# Cloudflare Workers AI

URL: https://developers.cloudflare.com/workers-ai/

import {
	CardGrid,
	Description,
	Feature,
	LinkTitleCard,
	Plan,
	RelatedProduct,
	Render,
	LinkButton,
	Flex,
} from "~/components";

<Description>

在 Cloudflare 的全球网络上，由无服务器 GPU 提供支持，运行机器学习模型。

</Description>

<Plan type="workers-all" />

Workers AI 允许您以无服务器的方式运行 AI 模型，无需担心扩展、维护或为未使用的基础设施付费。您可以从您自己的代码中——从 [Workers](/workers/)、[Pages](/pages/) 或通过 [Cloudflare API](/api/resources/ai/methods/run/) 的任何地方——调用在 Cloudflare 网络上的 GPU 上运行的模型。

Workers AI 让您可以访问：

- **50多种[开源模型](/workers-ai/models/)**，作为我们模型目录的一部分提供
- 无服务器、**按使用付费**的[定价模型](/workers-ai/platform/pricing/)
- 所有这些都作为**功能齐全的开发者平台**的一部分，包括 [AI 网关](/ai-gateway/)、[Vectorize](/vectorize/)、[Workers](/workers/) 等等...

<div>
	<LinkButton href="/workers-ai/get-started">开始使用</LinkButton>
	<LinkButton
		target="_blank"
		variant="secondary"
		icon="external"
		href="https://youtu.be/cK_leoJsBWY?si=4u6BIy_uBOZf9Ve8"
	>
		观看 Workers AI 演示
	</LinkButton>
</div>

<Render file="custom_requirements" />

<Render file="file_issues" />

---

## 功能

<Feature header="模型" href="/workers-ai/models/" cta="浏览模型">

Workers AI 配备了一系列精选的流行开源模型，使您能够执行图像分类、文本生成、对象检测等任务。

</Feature>

---

## 相关产品

<RelatedProduct header="AI 网关" href="/ai-gateway/" product="ai-gateway">

通过缓存、速率限制、请求重试、模型回退等功能，观察和控制您的 AI 应用程序。

</RelatedProduct>

<RelatedProduct header="Vectorize" href="/vectorize/" product="vectorize">

使用 Cloudflare 的矢量数据库 Vectorize 构建全栈 AI 应用程序。添加 Vectorize 使您能够执行语义搜索、推荐、异常检测等任务，或用于为 LLM 提供上下文和记忆。

</RelatedProduct>

<RelatedProduct header="Workers" href="/workers/" product="workers">

构建无服务器应用程序并立即在全球范围内部署，以获得卓越的性能、可靠性和规模。

</RelatedProduct>

<RelatedProduct header="Pages" href="/pages/" product="pages">

创建立即部署到 Cloudflare 全球网络的全栈应用程序。

</RelatedProduct>

<RelatedProduct header="R2" href="/r2/" product="r2">

存储大量非结构化数据，而无需支付与典型云存储服务相关的昂贵出口带宽费用。

</RelatedProduct>

<RelatedProduct header="D1" href="/d1/" product="d1">

创建新的无服务器 SQL 数据库，以便从您的 Workers 和 Pages 项目中查询。

</RelatedProduct>

<RelatedProduct header="Durable Objects" href="/durable-objects/" product="durable-objects">

具有强一致性存储的全球分布式协调 API。

</RelatedProduct>

<RelatedProduct header="KV" href="/kv/" product="kv">

创建全球性、低延迟的键值数据存储。

</RelatedProduct>

---

## 更多资源

<CardGrid>

<LinkTitleCard
	title="开始使用"
	href="/workers-ai/get-started/workers-wrangler/"
	icon="open-book"
>
	构建和部署您的第一个 Workers AI 应用程序。
</LinkTitleCard>

<LinkTitleCard
	title="计划"
	href="/workers-ai/platform/pricing/"
	icon="seti:shell"
>
	了解免费和付费计划。
</LinkTitleCard>

<LinkTitleCard title="限制" href="/workers-ai/platform/limits/" icon="document">
	了解 Workers AI 的限制。
</LinkTitleCard>

<LinkTitleCard title="用例" href="/use-cases/ai/" icon="document">
	了解如何构建和部署雄心勃勃的 AI 应用程序到 Cloudflare 的全球网络。
</LinkTitleCard>

<LinkTitleCard
	title="存储选项"
	href="/workers/platform/storage-options/"
	icon="open-book"
>
	了解哪种存储选项最适合您的项目。
</LinkTitleCard>

<LinkTitleCard
	title="开发者 Discord"
	href="https://discord.cloudflare.com"
	icon="discord"
>
	在 Discord 上与 Workers
	社区联系，提出问题，分享您正在构建的内容，并与其他开发者讨论平台。
</LinkTitleCard>

<LinkTitleCard
	title="@CloudflareDev"
	href="https://x.com/cloudflaredev"
	icon="x.com"
>
	在 Twitter 上关注 @CloudflareDev，了解产品公告和 Cloudflare Workers 的新功能。
</LinkTitleCard>

</CardGrid>

---

# 更新日志

URL: https://developers.cloudflare.com/ai-gateway/changelog/

import { ProductReleaseNotes } from "~/components";

{/* <!-- 实际内容位于 /src/content/release-notes/ai-gateway.yaml。在该文件中更新新条目以便在此处显示。有关更多详细信息，请参阅 https://developers.cloudflare.com/style-guide/documentation-content-strategy/content-types/changelog/#yaml-file --> */}

<ProductReleaseNotes />

---

# OpenAI 兼容性

URL: https://developers.cloudflare.com/ai-gateway/chat-completion/

Cloudflare 的 AI 网关提供了一个与 OpenAI 兼容的 `/chat/completions` 端点，可以使用单一 URL 集成多个 AI 提供商。此功能简化了集成过程，允许在不同模型之间无缝切换，而无需进行重大代码修改。

## 端点 URL

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/compat/chat/completions
```

将 `{account_id}` 和 `{gateway_id}` 替换为您的 Cloudflare 账户和网关 ID。

## 参数

通过更改 `model` 和 `apiKey` 参数来切换提供商。

使用 `{provider}/{model}` 格式指定模型。例如：

- `openai/gpt-4o-mini`
- `google-ai-studio/gemini-2.0-flash`
- `anthropic/claude-3-haiku`

## 示例

### OpenAI SDK

```js
import OpenAI from "openai";
const client = new OpenAI({
	apiKey: "YOUR_PROVIDER_API_KEY", // 提供商 API 密钥
	// 注意：OpenAI 客户端会自动在 URL 末尾添加 /chat/completions，您不应该自己添加。
	baseURL:
		"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/compat",
});

const response = await client.chat.completions.create({
	model: "google-ai-studio/gemini-2.0-flash",
	messages: [{ role: "user", content: "What is Cloudflare?" }],
});

console.log(response.choices[0].message.content);
```

### cURL

```bash
curl -X POST https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/compat/chat/completions \
  --header 'Authorization: Bearer {openai_token}' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "google-ai-studio/gemini-2.0-flash",
    "messages": [
      {
        "role": "user",
        "content": "What is Cloudflare?"
      }
    ]
  }'
```

### 通用提供商

您还可以将此模式与[通用端点](/ai-gateway/universal/)结合使用，以在多个提供商之间添加[回退](/ai-gateway/configuration/fallbacks/)。当与通用端点结合使用时，无论来自主模型还是回退模型，每个请求都将返回相同的标准化格式。这种行为意味着您不必在应用中添加额外的解析逻辑。

```ts title="index.ts"
export interface Env {
	AI: Ai;
}

export default {
	async fetch(request: Request, env: Env) {
		return env.AI.gateway("default").run({
			provider: "compat",
			endpoint: "chat/completions",
			headers: {
				authorization: "Bearer ",
			},
			query: {
				model: "google-ai-studio/gemini-2.0-flash",
				messages: [
					{
						role: "user",
						content: "What is Cloudflare?",
					},
				],
			},
		});
	},
};
```

## 支持的提供商

与 OpenAI 兼容的端点支持以下提供商的模型：

- [Anthropic](/ai-gateway/providers/anthropic/)
- [OpenAI](/ai-gateway/providers/openai/)
- [Groq](/ai-gateway/providers/groq/)
- [Mistral](/ai-gateway/providers/mistral/)
- [Cohere](/ai-gateway/providers/cohere/)
- [Perplexity](/ai-gateway/providers/perplexity/)
- [Workers AI](/ai-gateway/providers/workersai/)
- [Google-AI-Studio](/ai-gateway/providers/google-ai-studio/)
- [Grok](/ai-gateway/providers/grok/)
- [DeepSeek](/ai-gateway/providers/deepseek/)
- [Cerebras](/ai-gateway/providers/cerebras/)

---

# 架构

URL: https://developers.cloudflare.com/ai-gateway/demos/

import { GlossaryTooltip, ResourcesBySelector } from "~/components";

了解如何在现有架构中使用 AI 网关。

## 参考架构

探索以下使用 AI 网关的<GlossaryTooltip term="reference architecture">参考架构</GlossaryTooltip>：

<ResourcesBySelector
	types={[
		"reference-architecture",
		"design-guide",
		"reference-architecture-diagram",
	]}
	products={["AI Gateway"]}
/>

---

# 快速开始

URL: https://developers.cloudflare.com/ai-gateway/get-started/

import { Details, DirectoryListing, LinkButton, Render } from "~/components";

在本指南中，您将学习如何创建您的第一个 AI 网关。您可以创建多个网关来控制不同的应用。

## 前提条件

在开始之前，您需要一个 Cloudflare 账户。

<LinkButton variant="primary" href="https://dash.cloudflare.com/sign-up">
	注册
</LinkButton>

## 创建网关

然后，创建一个新的 AI 网关。

<Render file="create-gateway" />

## 选择网关身份验证

设置新网关时，您可以在已验证和未验证的网关之间进行选择。启用已验证的网关需要每个请求都包含有效的授权令牌，这增加了一层额外的安全性。我们建议在存储日志时使用已验证的网关，以防止未经授权的访问，并防范可能增加日志存储使用量并使您难以找到所需数据的无效请求。了解更多关于设置[已验证网关](/ai-gateway/configuration/authentication/)的信息。

## 连接应用

接下来，将您的 AI 提供商连接到您的网关。

AI 网关为您创建的每个网关提供多个端点 - 每个提供商一个端点，以及一个通用端点。要使用 AI 网关，您需要为每个提供商创建自己的账户并提供您的 API 密钥。AI 网关充当这些请求的代理，实现可观察性、缓存等功能。

此外，AI 网关还有一个 [WebSockets API](/ai-gateway/websockets-api/)，它提供单一持久连接，实现持续通信。此 API 支持连接到 AI 网关的所有 AI 提供商，包括那些本身不支持 WebSockets 的提供商。

以下是我们支持的模型提供商列表：

<DirectoryListing folder="ai-gateway/providers" />

如果您没有提供商偏好，请从我们的专门教程之一开始：

- [OpenAI](/ai-gateway/tutorials/deploy-aig-worker/)
- [Workers AI](/ai-gateway/tutorials/create-first-aig-workers/)

## 查看分析

现在您的提供商已连接到 AI 网关，您可以查看通过网关的请求分析。

<Render file="analytics-overview" /> <br />

<Render file="analytics-dashboard" />

:::note[注意]

成本指标是基于请求中发送和接收的令牌数量的估算。虽然此指标可以帮助您监控和预测成本趋势，但请参考您提供商的仪表板获取最准确的成本详情。

:::

## 下一步

- 了解更多关于[缓存](/ai-gateway/configuration/caching/)以实现更快的请求和成本节省，以及[速率限制](/ai-gateway/configuration/rate-limiting/)来控制应用的扩展方式。
- 探索如何为弹性指定模型或提供商[回退](/ai-gateway/configuration/fallbacks/)。
- 了解如何在 [Workers AI](/ai-gateway/providers/workersai/) 上使用低成本的开源模型 - 我们的 AI 推理服务。

---

# 标头术语表

URL: https://developers.cloudflare.com/ai-gateway/glossary/

import { Glossary } from "~/components";

AI 网关支持各种标头来帮助您配置、自定义和管理您的 API 请求。本页面提供了所有支持的标头的完整列表，以及简短描述。

<Glossary product="ai-gateway" />

## 配置层次结构

AI 网关中的设置可以在三个级别配置：**提供商**、**请求**和**网关**。由于相同的设置可以在多个位置配置，以下层次结构确定应用哪个值：

1. **提供商级别标头**：
   仅在使用[通用端点](/ai-gateway/universal/)时相关，这些标头优先于所有其他配置。
2. **请求级别标头**：
   如果未设置提供商级别标头，则应用此级别。
3. **网关级别设置**：
   仅在提供商或请求级别未设置标头时作为默认值。

此层次结构确保一致的行为，优先考虑最具体的配置。使用提供商级别和请求级别标头进行更精细的控制，使用网关设置作为通用默认值。

---

# Cloudflare AI 网关

URL: https://developers.cloudflare.com/ai-gateway/

import {
	CardGrid,
	Description,
	Feature,
	LinkTitleCard,
	Plan,
	RelatedProduct,
} from "~/components";

<Description>

观察和控制您的 AI 应用。

</Description>

<Plan type="all" />

Cloudflare 的 AI 网关让您能够观察和控制您的 AI 应用。通过将应用连接到 AI 网关，您可以通过分析和日志记录深入了解用户如何使用您的应用，然后通过缓存、速率限制以及请求重试、模型回退等功能来控制应用的扩展方式。更好的是 - 只需一行代码即可开始使用。

查看[快速开始指南](/ai-gateway/get-started/)了解如何为您的应用配置 AI 网关。

## 功能特性

<Feature header="分析" href="/ai-gateway/observability/analytics/" cta="查看分析">

查看请求数量、令牌数量以及运行应用所需成本等指标。

</Feature>

<Feature header="日志记录" href="/ai-gateway/observability/logging/" cta="查看日志记录">

深入了解请求和错误信息。

</Feature>

<Feature header="缓存" href="/ai-gateway/configuration/caching/">

直接从 Cloudflare 的缓存提供请求服务，而不是从原始模型提供商，以实现更快的请求和成本节省。

</Feature>

<Feature header="速率限制" href="/ai-gateway/configuration/rate-limiting">

通过限制应用接收的请求数量来控制应用的扩展方式。

</Feature>

<Feature header="请求重试和回退" href="/ai-gateway/configuration/fallbacks/">

通过定义请求重试和模型回退来提高弹性，以防出现错误。

</Feature>

<Feature header="您喜欢的提供商" href="/ai-gateway/providers/">

Workers AI、OpenAI、Azure OpenAI、HuggingFace、Replicate 等都支持 AI 网关。

</Feature>

---

## 相关产品

<RelatedProduct header="Workers AI" href="/workers-ai/" product="workers-ai">

在 Cloudflare 的全球网络上运行由无服务器 GPU 驱动的机器学习模型。

</RelatedProduct>

<RelatedProduct header="Vectorize" href="/vectorize/" product="vectorize">

使用 Vectorize（Cloudflare 的向量数据库）构建全栈 AI 应用。添加 Vectorize 使您能够执行语义搜索、推荐、异常检测等任务，或者可用于为 LLM 提供上下文和记忆。

</RelatedProduct>

## 更多资源

<CardGrid>

<LinkTitleCard
	title="开发者社区 Discord"
	href="https://discord.cloudflare.com"
	icon="discord"
>
	在 Discord 上与 Workers
	社区联系，提出问题，展示您正在构建的内容，并与其他开发者讨论平台。
</LinkTitleCard>

<LinkTitleCard title="使用案例" href="/use-cases/ai/" icon="document">
	了解如何在 Cloudflare 的全球网络上构建和部署雄心勃勃的 AI 应用。
</LinkTitleCard>

<LinkTitleCard
	title="@CloudflareDev"
	href="https://x.com/cloudflaredev"
	icon="x.com"
>
	在 Twitter 上关注 @CloudflareDev，了解产品公告以及 Cloudflare Workers
	的新动态。
</LinkTitleCard>

</CardGrid>

---

# 通用端点

URL: https://developers.cloudflare.com/ai-gateway/universal/

import { Render, Badge } from "~/components";

您可以使用通用端点与每个提供商进行交互。

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}
```

AI 网关为您创建的每个网关提供多个端点 - 每个提供商一个端点，以及一个通用端点。通用端点需要对您的架构进行一些调整，但支持其他功能。这些功能包括，例如，如果请求首次失败时重试请求，或配置[回退模型/提供商](/ai-gateway/configuration/fallbacks/)。

您可以使用通用端点与每个提供商进行交互。载荷期望一个消息数组，每个消息都是具有以下参数的对象：

- `provider`：您想要将此消息定向到的提供商名称。可以是 OpenAI、workers-ai 或我们支持的任何提供商。
- `endpoint`：您尝试访问的提供商 API 的路径名。例如，在 OpenAI 上可以是 `chat/completions`，对于 Workers AI 可能是 [`@cf/meta/llama-3.1-8b-instruct`](/workers-ai/models/llama-3.1-8b-instruct/)。在[每个提供商](/ai-gateway/providers/)的特定部分中查看更多信息。
- `authorization`：联系此提供商时应使用的授权 HTTP 标头内容。这通常以 'Token' 或 'Bearer' 开头。
- `query`：提供商在其官方 API 中期望的载荷。

## cURL 示例

<Render file="universal-gateway-example" />

上述请求将发送到 Workers AI 推理 API，如果失败，它将继续发送到 OpenAI。您可以通过在数组中添加另一个 JSON 来添加任意数量的回退。

## WebSockets API <Badge text="beta" variant="tip" size="small" />

通用端点还可以通过 [WebSockets API](/ai-gateway/websockets-api/) 访问，该 API 提供单一持久连接，实现持续通信。此 API 支持连接到 AI 网关的所有 AI 提供商，包括那些本身不支持 WebSockets 的提供商。

## WebSockets 示例

```javascript
import WebSocket from "ws";
const ws = new WebSocket(
	"wss://gateway.ai.cloudflare.com/v1/my-account-id/my-gateway/",
	{
		headers: {
			"cf-aig-authorization": "Bearer AI_GATEWAY_TOKEN",
		},
	},
);

ws.send(
	JSON.stringify({
		type: "universal.create",
		request: {
			eventId: "my-request",
			provider: "workers-ai",
			endpoint: "@cf/meta/llama-3.1-8b-instruct",
			headers: {
				Authorization: "Bearer WORKERS_AI_TOKEN",
				"Content-Type": "application/json",
			},
			query: {
				prompt: "tell me a joke",
			},
		},
	}),
);

ws.on("message", function incoming(message) {
	console.log(message.toString());
});
```

## Workers 绑定示例

import { WranglerConfig } from "~/components";

<WranglerConfig>

```toml title="wrangler.toml"
[ai]
binding = "AI"
```

</WranglerConfig>

```typescript title="src/index.ts"
type Env = {
	AI: Ai;
};

export default {
	async fetch(request: Request, env: Env) {
		return env.AI.gateway("my-gateway").run({
			provider: "workers-ai",
			endpoint: "@cf/meta/llama-3.1-8b-instruct",
			headers: {
				authorization: "Bearer my-api-token",
			},
			query: {
				prompt: "tell me a joke",
			},
		});
	},
};
```

## 标头配置层次结构

通用端点允许您设置回退模型或提供商，并为每个提供商或请求自定义标头。您可以在三个级别配置标头：

1. **提供商级别**：特定于特定提供商的标头。
2. **请求级别**：包含在各个请求中的标头。
3. **网关设置**：在网关仪表板中配置的默认标头。

由于相同的设置可以在多个位置配置，AI 网关应用层次结构来确定哪个配置优先：

- **提供商级别标头**覆盖所有其他配置。
- **请求级别标头**在未设置提供商级别标头时使用。
- **网关级别设置**仅在提供商或请求级别未配置标头时使用。

此层次结构确保一致的行为，优先考虑最具体的配置。使用提供商级别和请求级别标头进行精细控制，使用网关设置作为通用默认值。

## 层次结构示例

此示例演示了在不同级别设置的标头如何影响缓存行为：

- **请求级别标头**：`cf-aig-cache-ttl` 设置为 `3600` 秒，默认情况下将此缓存持续时间应用于请求。
- **提供商级别标头**：对于回退提供商（OpenAI），`cf-aig-cache-ttl` 明确设置为 `0` 秒，覆盖请求级别标头，并在使用 OpenAI 作为提供商时禁用响应缓存。

这显示了提供商级别标头如何优先于请求级别标头，允许对缓存行为进行精细控制。

```bash
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id} \
  --header 'Content-Type: application/json' \
  --header 'cf-aig-cache-ttl: 3600' \
  --data '[
    {
      "provider": "workers-ai",
      "endpoint": "@cf/meta/llama-3.1-8b-instruct",
      "headers": {
        "Authorization": "Bearer {cloudflare_token}",
        "Content-Type": "application/json"
      },
      "query": {
        "messages": [
          {
            "role": "system",
            "content": "You are a friendly assistant"
          },
          {
            "role": "user",
            "content": "What is Cloudflare?"
          }
        ]
      }
    },
    {
      "provider": "openai",
      "endpoint": "chat/completions",
      "headers": {
        "Authorization": "Bearer {open_ai_token}",
        "Content-Type": "application/json",
        "cf-aig-cache-ttl": "0"
      },
      "query": {
        "model": "gpt-4o-mini",
        "stream": true,
        "messages": [
          {
            "role": "user",
            "content": "What is Cloudflare?"
          }
        ]
      }
    }
  ]'
```

---

# 浏览网页

URL: https://developers.cloudflare.com/agents/api-reference/browse-the-web/

import {
	MetaInfo,
	Render,
	Type,
	TypeScriptExample,
	WranglerConfig,
	PackageManagers,
} from "~/components";

Agents 可以使用[浏览器渲染](/browser-rendering/) API 或您首选的无头浏览器服务来浏览网页。

### 浏览器渲染 API

[浏览器渲染](/browser-rendering/)允许您启动无头浏览器实例、渲染网页，并通过您的 Agent 与网站交互。

您可以定义一个使用 Puppeteer 提取网页内容、解析 DOM 并通过调用 OpenAI 模型提取相关信息的方法：

<TypeScriptExample>

```ts
interface Env {
	BROWSER: Fetcher;
}

export class MyAgent extends Agent<Env> {
	async browse(browserInstance: Fetcher, urls: string[]) {
		let responses = [];
		for (const url of urls) {
			const browser = await puppeteer.launch(browserInstance);
			const page = await browser.newPage();
			await page.goto(url);

			await page.waitForSelector("body");
			const bodyContent = await page.$eval(
				"body",
				(element) => element.innerHTML,
			);
			const client = new OpenAI({
				apiKey: this.env.OPENAI_API_KEY,
			});

			let resp = await client.chat.completions.create({
				model: this.env.MODEL,
				messages: [
					{
						role: "user",
						content: `从下面的网站内容中返回一个包含产品名称、价格和 URL 的 JSON 对象，格式如下：{ "name": "产品名称", "price": "价格", "url": "URL" }。<content>${bodyContent}</content>`,
					},
				],
				response_format: {
					type: "json_object",
				},
			});

			responses.push(resp);
			await browser.close();
		}

		return responses;
	}
}
```

</TypeScriptExample>

您还需要安装 `@cloudflare/puppeteer` 包并将以下内容添加到您的 Agent 的 wrangler 配置中：

<PackageManagers pkg="@cloudflare/puppeteer" dev />

<WranglerConfig>

```jsonc
{
	// ...
	"browser": {
		"binding": "MYBROWSER",
	},
	// ...
}
```

</WranglerConfig>

### Browserbase

您还可以通过直接从 Agent 内使用 Browserbase API 来使用 [Browserbase](https://docs.browserbase.com/integrations/cloudflare/typescript)。

获得您的 [Browserbase API 密钥](https://docs.browserbase.com/integrations/cloudflare/typescript)后，您可以通过创建[密钥](/workers/configuration/secrets/)将其添加到您的 Agent：

```sh
cd your-agent-project-folder
npx wrangler@latest secret put BROWSERBASE_API_KEY
```

```sh output
Enter a secret value: ******
Creating the secret for the Worker "agents-example"
Success! Uploaded secret BROWSERBASE_API_KEY
```

安装 `@cloudflare/puppeteer` 包并在您的 Agent 内使用它来调用 Browserbase API：

<PackageManagers pkg="@cloudflare/puppeteer" />

<TypeScriptExample>

```ts
interface Env {
	BROWSERBASE_API_KEY: string;
}

export class MyAgent extends Agent<Env> {
	constructor(env: Env) {
		super(env);
	}
}
```

</TypeScriptExample>

---

# Agents API

URL: https://developers.cloudflare.com/agents/api-reference/agents-api/

import {
	MetaInfo,
	Render,
	Type,
	TypeScriptExample,
	WranglerConfig,
} from "~/components";

本页面提供了 Agent SDK API 的概述，包括 `Agent` 类、内置于 Agents SDK 的方法和属性。

Agents SDK 公开了两个主要 API：

- 服务器端 `Agent` 类。Agent 封装了 Agent 的所有逻辑，包括客户端如何连接到它、如何存储状态、它公开的方法、如何调用 AI 模型以及任何错误处理。
- 客户端 `AgentClient` 类，允许您从客户端应用程序连接到 Agent 实例。客户端 API 还包括 React hooks，包括 `useAgent` 和 `useAgentChat`，并允许您在每个唯一的 Agent（运行在服务器端）和您的客户端应用程序之间自动同步状态。

:::note

Agents 需要 [Cloudflare Durable Objects](/durable-objects/)，参见[配置](/agents/getting-started/testing-your-agent/#add-the-agent-configuration)了解如何向您的项目添加所需的绑定。

:::

您还可以在 [Agents API 参考](/agents/api-reference/)中找到每个 API 的更具体使用示例。

<TypeScriptExample>

```ts
import { Agent } from "agents";

class MyAgent extends Agent {
	// 在 Agent 上定义方法
}

export default MyAgent;
```

</TypeScriptExample>

一个 Agent 可以有许多（数百万个）实例：每个实例都是一个独立运行的独立微服务器。这允许 Agents 水平扩展：一个 Agent 可以与单个用户关联，或与数千个用户关联，这取决于您正在构建的 Agent。

Agent 的实例通过唯一标识符寻址：该标识符（ID）可以是用户 ID、电子邮件地址、GitHub 用户名、航班机票号码、发票 ID，或任何其他有助于唯一标识实例并代表谁行动的标识符。

<Render file="unique-agents" />

### Agent 类 API

编写 Agent 需要您定义一个扩展 Agents SDK 包中 `Agent` 类的类。Agent 封装了 Agent 的所有逻辑，包括客户端如何连接到它、如何存储状态、它公开的方法以及任何错误处理。

您还可以在 Agent 上定义自己的方法：从技术上讲，发布一个只公开您自己方法的 Agent 是有效的，并直接从 Worker 创建/获取 Agents。

您自己的方法可以通过 `this.env` 访问 Agent 的环境变量和绑定，通过 `this.setState` 访问状态，并通过 `this.yourMethodName` 调用 Agent 上的其他方法。

```ts
import { Agent } from "agents";

interface Env {
	// 在这里定义环境变量和绑定
}

// 将 Env 作为 TypeScript 类型参数传递
// 作为绑定连接到您的 Agent 或 Worker 的任何服务
// 然后在 this.env.<BINDING_NAME> 上可用

// 用于创建可以维护状态、编排的 Agents 的核心类
// 复杂的 AI 工作流、调度任务，并与用户和其他 Agents 交互。
class MyAgent extends Agent<Env, State> {
	// 可选的初始状态定义
	initialState = {
		counter: 0,
		messages: [],
		lastUpdated: null,
	};

	// 当新的 Agent 实例启动或从休眠中唤醒时调用
	async onStart() {
		console.log("Agent 已启动，状态:", this.state);
	}

	// 处理来到此 Agent 实例的 HTTP 请求
	// 返回 Response 对象
	async onRequest(request: Request): Promise<Response> {
		return new Response("来自 Agent 的问候！");
	}

	// 当建立 WebSocket 连接时调用
	// 通过 ctx.request 访问原始请求进行身份验证等
	async onConnect(connection: Connection, ctx: ConnectionContext) {
		// 连接被 SDK 自动接受。
		// 您也可以在这里用 connection.close() 显式关闭连接
		// 在 ctx.request 上访问 Request 以检查头部、cookies 和 URL
	}

	// 为在 WebSocket 连接上接收的每条消息调用
	// 消息可以是 string、ArrayBuffer 或 ArrayBufferView
	async onMessage(connection: Connection, message: WSMessage) {
		// 处理传入消息
		connection.send("已收到您的消息");
	}

	// 处理 WebSocket 连接错误
	async onError(connection: Connection, error: unknown): Promise<void> {
		console.error(`连接错误:`, error);
	}

	// 处理 WebSocket 连接关闭事件
	async onClose(
		connection: Connection,
		code: number,
		reason: string,
		wasClean: boolean,
	): Promise<void> {
		console.log(`连接已关闭: ${code} - ${reason}`);
	}

	// 当 Agent 的状态从任何源更新时调用
	// source 可以是 "server" 或客户端 Connection
	onStateUpdate(state: State, source: "server" | Connection) {
		console.log("状态已更新:", state, "来源:", source);
	}

	// 您可以定义自己的自定义方法，由请求、
	// WebSocket 消息或调度任务调用
	async customProcessingMethod(data: any) {
		// 处理数据、更新状态、调度任务等
		this.setState({ ...this.state, lastUpdated: new Date() });
	}
}
```

<TypeScriptExample>

```ts
// 带有自定义方法的基本 Agent 实现
import { Agent } from "agents";

interface MyState {
	counter: number;
	lastUpdated: Date | null;
}

class MyAgent extends Agent<Env, MyState> {
	initialState = {
		counter: 0,
		lastUpdated: null,
	};

	async onRequest(request: Request) {
		if (request.method === "POST") {
			await this.incrementCounter();
			return new Response(JSON.stringify(this.state), {
				headers: { "Content-Type": "application/json" },
			});
		}
		return new Response(JSON.stringify(this.state), {
			headers: { "Content-Type": "application/json" },
		});
	}

	async incrementCounter() {
		this.setState({
			counter: this.state.counter + 1,
			lastUpdated: new Date(),
		});
	}
}
```

</TypeScriptExample>

### WebSocket API

WebSocket API 允许您接受和管理与 Agent 的 WebSocket 连接。

#### Connection

表示与 Agent 的 WebSocket 连接。

```ts
// WebSocket 连接接口
interface Connection<State = unknown> {
	// 此连接的唯一 ID
	id: string;

	// 附加到此连接的客户端特定状态
	state: State;

	// 更新连接的状态
	setState(state: State): void;

	// 接受传入的 WebSocket 连接
	accept(): void;

	// 使用可选代码和原因关闭 WebSocket 连接
	close(code?: number, reason?: string): void;

	// 向客户端发送消息
	// 可以是 string、ArrayBuffer 或 ArrayBufferView
	send(message: string | ArrayBuffer | ArrayBufferView): void;
}
```

<TypeScriptExample>

```ts
// Example of handling WebSocket messages
export class YourAgent extends Agent {
	async onMessage(connection: Connection, message: WSMessage) {
		if (typeof message === "string") {
			try {
				// Parse JSON message
				const data = JSON.parse(message);

				if (data.type === "update") {
					// Update connection-specific state
					connection.setState({ ...connection.state, lastActive: Date.now() });

					// Update global Agent state
					this.setState({
						...this.state,
						connections: this.state.connections + 1,
					});

					// Send response back to this client only
					connection.send(
						JSON.stringify({
							type: "updated",
							status: "success",
						}),
					);
				}
			} catch (e) {
				connection.send(JSON.stringify({ error: "Invalid message format" }));
			}
		}
	}
}
```

</TypeScriptExample>

#### WSMessage

Types of messages that can be received from a WebSocket.

```ts
// Types of messages that can be received from WebSockets
type WSMessage = string | ArrayBuffer | ArrayBufferView;
```

#### ConnectionContext

Context information for a WebSocket connection.

```ts
// Context available during WebSocket connection
interface ConnectionContext {
	// The original HTTP request that initiated the WebSocket connection
	request: Request;
}
```

### State synchronization API

:::note

To learn more about how to manage state within an Agent, refer to the documentation on [managing and syncing state](/agents/api-reference/store-and-sync-state/).

:::

#### State

Methods and types for managing Agent state.

```ts
// State management in the Agent class
class Agent<Env, State = unknown> {
	// Initial state that will be set if no state exists yet
	initialState: State = {} as unknown as State;

	// Current state of the Agent, persisted across restarts
	get state(): State;

	// Update the Agent's state
	// Persists to storage and notifies all connected clients
	setState(state: State): void;

	// Called when state is updated from any source
	// Override to react to state changes
	onStateUpdate(state: State, source: "server" | Connection): void;
}
```

<TypeScriptExample>

```ts
// Example of state management in an Agent
interface ChatState {
	messages: Array<{ sender: string; text: string; timestamp: number }>;
	participants: string[];
	settings: {
		allowAnonymous: boolean;
		maxHistoryLength: number;
	};
}

interface Env {
	// Your bindings and environment variables
}

// Inside your Agent class
export class YourAgent extends Agent<Env, ChatState> {
	async addMessage(sender: string, text: string) {
		// Update state with new message
		this.setState({
			...this.state,
			messages: [
				...this.state.messages,
				{ sender, text, timestamp: Date.now() },
			].slice(-this.state.settings.maxHistoryLength), // Maintain max history
		});

		// The onStateUpdate method will automatically be called
		// and all connected clients will receive the update
	}

	// Override onStateUpdate to add custom behavior when state changes
	onStateUpdate(state: ChatState, source: "server" | Connection) {
		console.log(
			`State updated by ${source === "server" ? "server" : "client"}`,
		);

		// You could trigger additional actions based on state changes
		if (state.messages.length > 0) {
			const lastMessage = state.messages[state.messages.length - 1];
			if (lastMessage.text.includes("@everyone")) {
				this.notifyAllParticipants(lastMessage);
			}
		}
	}
}
```

</TypeScriptExample>

### Scheduling API

#### Scheduling tasks

Schedule tasks to run at a specified time in the future.

```ts
// Scheduling API for running tasks in the future
class Agent<Env, State = unknown> {
	// Schedule a task to run in the future
	// when: seconds from now, specific Date, or cron expression
	// callback: method name on the Agent to call
	// payload: data to pass to the callback
	// Returns a Schedule object with the task ID
	async schedule<T = any>(
		when: Date | string | number,
		callback: keyof this,
		payload?: T,
	): Promise<Schedule<T>>;

	// Get a scheduled task by ID
	// Returns undefined if the task doesn't exist
	async getSchedule<T = any>(id: string): Promise<Schedule<T> | undefined>;

	// Get all scheduled tasks matching the criteria
	// Returns an array of Schedule objects
	getSchedules<T = any>(criteria?: {
		description?: string;
		id?: string;
		type?: "scheduled" | "delayed" | "cron";
		timeRange?: { start?: Date; end?: Date };
	}): Schedule<T>[];

	// Cancel a scheduled task by ID
	// Returns true if the task was cancelled, false otherwise
	async cancelSchedule(id: string): Promise<boolean>;
}
```

<TypeScriptExample>

```ts
// Example of scheduling in an Agent
interface ReminderData {
	userId: string;
	message: string;
	channel: string;
}

export class YourAgent extends Agent {
	// Schedule a one-time reminder in 2 hours
	async scheduleReminder(userId: string, message: string) {
		const twoHoursFromNow = new Date(Date.now() + 2 * 60 * 60 * 1000);

		const schedule = await this.schedule<ReminderData>(
			twoHoursFromNow,
			"sendReminder",
			{ userId, message, channel: "email" },
		);

		console.log(`Scheduled reminder with ID: ${schedule.id}`);
		return schedule.id;
	}

	// Schedule a recurring daily task using cron
	async scheduleDailyReport() {
		// Run at 08:00 AM every day
		const schedule = await this.schedule(
			"0 8 * * *", // Cron expression: minute hour day month weekday
			"generateDailyReport",
			{ reportType: "daily-summary" },
		);

		console.log(`Scheduled daily report with ID: ${schedule.id}`);
		return schedule.id;
	}

	// Method that will be called when the scheduled task runs
	async sendReminder(data: ReminderData) {
		console.log(`Sending reminder to ${data.userId}: ${data.message}`);
		// Add code to send the actual notification
	}
}
```

</TypeScriptExample>

#### Schedule object

Represents a scheduled task.

```ts
// Represents a scheduled task
type Schedule<T = any> = {
	// Unique identifier for the schedule
	id: string;
	// Name of the method to be called
	callback: string;
	// Data to be passed to the callback
	payload: T;
} & (
	| {
			// One-time execution at a specific time
			type: "scheduled";
			// Timestamp when the task should execute
			time: number;
	  }
	| {
			// Delayed execution after a certain time
			type: "delayed";
			// Timestamp when the task should execute
			time: number;
			// Number of seconds to delay execution
			delayInSeconds: number;
	  }
	| {
			// Recurring execution based on cron expression
			type: "cron";
			// Timestamp for the next execution
			time: number;
			// Cron expression defining the schedule
			cron: string;
	  }
);
```

<TypeScriptExample>

```ts
export class YourAgent extends Agent {
	// Example of managing scheduled tasks
	async viewAndManageSchedules() {
		// Get all scheduled tasks
		const allSchedules = this.getSchedules();
		console.log(`Total scheduled tasks: ${allSchedules.length}`);

		// Get tasks scheduled for a specific time range
		const upcomingSchedules = this.getSchedules({
			timeRange: {
				start: new Date(),
				end: new Date(Date.now() + 24 * 60 * 60 * 1000), // Next 24 hours
			},
		});

		// Get a specific task by ID
		const taskId = "task-123";
		const specificTask = await this.getSchedule(taskId);

		if (specificTask) {
			console.log(
				`Found task: ${specificTask.callback} at ${new Date(specificTask.time)}`,
			);

			// Cancel a scheduled task
			const cancelled = await this.cancelSchedule(taskId);
			console.log(`Task cancelled: ${cancelled}`);
		}
	}
}
```

</TypeScriptExample>

### SQL API

Each Agent instance has an embedded SQLite database that can be accessed using the `this.sql` method within any method on your `Agent` class.

#### SQL queries

Execute SQL queries against the Agent's built-in SQLite database using the `this.sql` method within any method on your `Agent` class.

```ts
// SQL query API for the Agent's embedded database
class Agent<Env, State = unknown> {
	// Execute a SQL query with tagged template literals
	// Returns an array of rows matching the query
	sql<T = Record<string, string | number | boolean | null>>(
		strings: TemplateStringsArray,
		...values: (string | number | boolean | null)[]
	): T[];
}
```

<TypeScriptExample>

```ts
// Example of using SQL in an Agent
interface User {
	id: string;
	name: string;
	email: string;
	created_at: number;
}

export class YourAgent extends Agent {
	async setupDatabase() {
		// Create a table if it doesn't exist
		this.sql`
      CREATE TABLE IF NOT EXISTS users (
        id TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT UNIQUE,
        created_at INTEGER
      )
    `;
	}

	async createUser(id: string, name: string, email: string) {
		// Insert a new user
		this.sql`
      INSERT INTO users (id, name, email, created_at)
      VALUES (${id}, ${name}, ${email}, ${Date.now()})
    `;
	}

	async getUserById(id: string): Promise<User | null> {
		// Query a user by ID
		const users = this.sql<User>`
      SELECT * FROM users WHERE id = ${id}
    `;

		return users.length ? users[0] : null;
	}

	async searchUsers(term: string): Promise<User[]> {
		// Search users with a wildcard
		return this.sql<User>`
      SELECT * FROM users
      WHERE name LIKE ${"%" + term + "%"} OR email LIKE ${"%" + term + "%"}
      ORDER BY created_at DESC
    `;
	}
}
```

</TypeScriptExample>

:::note

Visit the [state management API documentation](/agents/api-reference/store-and-sync-state/) within the Agents SDK, including the native `state` APIs and the built-in `this.sql` API for storing and querying data within your Agents.

:::

### Client API

The Agents SDK provides a set of client APIs for interacting with Agents from client-side JavaScript code, including:

- React hooks, including `useAgent` and `useAgentChat`, for connecting to Agents from client applications.
- Client-side [state syncing](/agents/api-reference/store-and-sync-state/) that allows you to subscribe to state updates between the Agent and any connected client(s) when calling `this.setState` within your Agent's code.
- The ability to call remote methods (Remote Procedure Calls; RPC) on the Agent from client-side JavaScript code using the `@callable` method decorator.

#### AgentClient

Client for connecting to an Agent from the browser.

```ts
import { AgentClient } from "agents/client";

// Options for creating an AgentClient
type AgentClientOptions = Omit<PartySocketOptions, "party" | "room"> & {
	// Name of the agent to connect to (class name in kebab-case)
	agent: string;
	// Name of the specific Agent instance (optional, defaults to "default")
	name?: string;
	// Other WebSocket options like host, protocol, etc.
};

// WebSocket client for connecting to an Agent
class AgentClient extends PartySocket {
	static fetch(opts: PartyFetchOptions): Promise<Response>;
	constructor(opts: AgentClientOptions);
}
```

<TypeScriptExample>

```ts
// Example of using AgentClient in the browser
import { AgentClient } from "agents/client";

// Connect to an Agent instance
const client = new AgentClient({
	agent: "chat-agent", // Name of your Agent class in kebab-case
	name: "support-room-123", // Specific instance name
	host: window.location.host, // Using same host
});

client.onopen = () => {
	console.log("Connected to agent");
	// Send an initial message
	client.send(JSON.stringify({ type: "join", user: "user123" }));
};

client.onmessage = (event) => {
	// Handle incoming messages
	const data = JSON.parse(event.data);
	console.log("Received:", data);

	if (data.type === "state_update") {
		// Update local UI with new state
		updateUI(data.state);
	}
};

client.onclose = () => console.log("Disconnected from agent");

// Send messages to the Agent
function sendMessage(text) {
	client.send(
		JSON.stringify({
			type: "message",
			text,
			timestamp: Date.now(),
		}),
	);
}
```

</TypeScriptExample>

#### agentFetch

Make an HTTP request to an Agent.

```ts
import { agentFetch } from "agents/client";

// Options for the agentFetch function
type AgentClientFetchOptions = Omit<PartyFetchOptions, "party" | "room"> & {
	// Name of the agent to connect to
	agent: string;
	// Name of the specific Agent instance (optional)
	name?: string;
};

// Make an HTTP request to an Agent
function agentFetch(
	opts: AgentClientFetchOptions,
	init?: RequestInit,
): Promise<Response>;
```

<TypeScriptExample>

```ts
// Example of using agentFetch in the browser
import { agentFetch } from "agents/client";

// Function to get data from an Agent
async function fetchAgentData() {
	try {
		const response = await agentFetch(
			{
				agent: "task-manager",
				name: "user-123-tasks",
			},
			{
				method: "GET",
				headers: {
					Authorization: `Bearer ${userToken}`,
				},
			},
		);

		if (!response.ok) {
			throw new Error(`Error: ${response.status}`);
		}

		const data = await response.json();
		return data;
	} catch (error) {
		console.error("Failed to fetch from agent:", error);
	}
}
```

</TypeScriptExample>

### React API

The Agents SDK provides a React API for simplifying connection and routing to Agents from front-end frameworks, including React Router (Remix), Next.js, and Astro.

#### useAgent

React hook for connecting to an Agent.

```ts
import { useAgent } from "agents/react";

// Options for the useAgent hook
type UseAgentOptions<State = unknown> = Omit<
	Parameters<typeof usePartySocket>[0],
	"party" | "room"
> & {
	// Name of the agent to connect to
	agent: string;
	// Name of the specific Agent instance (optional)
	name?: string;
	// Called when the Agent's state is updated
	onStateUpdate?: (state: State, source: "server" | "client") => void;
};

// React hook for connecting to an Agent
// Returns a WebSocket connection with setState method
function useAgent<State = unknown>(
	options: UseAgentOptions<State>,
): PartySocket & {
	// Update the Agent's state
	setState: (state: State) => void;
};
```

### Chat Agent

The Agents SDK exposes an `AIChatAgent` class that extends the `Agent` class and exposes an `onChatMessage` method that simplifies building interactive chat agents.

You can combine this with the `useAgentChat` React hook from the `agents/ai-react` package to manage chat state and messages between a user and your Agent(s).

#### AIChatAgent

Extension of the `Agent` class with built-in chat capabilities.

```ts
import { AIChatAgent } from "agents/ai-chat-agent";
import { Message, StreamTextOnFinishCallback, ToolSet } from "ai";

// Base class for chat-specific agents
class AIChatAgent<Env = unknown, State = unknown> extends Agent<Env, State> {
	// Array of chat messages for the current conversation
	messages: Message[];

	// Handle incoming chat messages and generate a response
	// onFinish is called when the response is complete
	async onChatMessage(
		onFinish: StreamTextOnFinishCallback<ToolSet>,
	): Promise<Response | undefined>;

	// Persist messages within the Agent's local storage.
	async saveMessages(messages: Message[]): Promise<void>;
}
```

<TypeScriptExample>

```ts
// Example of extending AIChatAgent
import { AIChatAgent } from "agents/ai-chat-agent";
import { Message } from "ai";

interface Env {
	AI: any; // Your AI binding
}

class CustomerSupportAgent extends AIChatAgent<Env> {
	// Override the onChatMessage method to customize behavior
	async onChatMessage(onFinish) {
		// Access the AI models using environment bindings
		const { openai } = this.env.AI;

		// Get the current conversation history
		const chatHistory = this.messages;

		// Generate a system prompt based on knowledge base
		const systemPrompt = await this.generateSystemPrompt();

		// Generate a response stream
		const stream = await openai.chat({
			model: "gpt-4o",
			messages: [{ role: "system", content: systemPrompt }, ...chatHistory],
			stream: true,
		});

		// Return the streaming response
		return new Response(stream, {
			headers: { "Content-Type": "text/event-stream" },
		});
	}

	// Helper method to generate a system prompt
	async generateSystemPrompt() {
		// Query knowledge base or use static prompt
		return `You are a helpful customer support agent.
            Respond to customer inquiries based on the following guidelines:
            - Be friendly and professional
            - If you don't know an answer, say so
            - Current company policies: ...`;
	}
}
```

</TypeScriptExample>

### Chat Agent React API

#### useAgentChat

React hook for building AI chat interfaces using an Agent.

```ts
import { useAgentChat } from "agents/ai-react";
import { useAgent } from "agents/react";
import type { Message } from "ai";

// Options for the useAgentChat hook
type UseAgentChatOptions = Omit<
	Parameters<typeof useChat>[0] & {
		// Agent connection from useAgent
		agent: ReturnType<typeof useAgent>;
	},
	"fetch"
>;

// React hook for building AI chat interfaces using an Agent
function useAgentChat(options: UseAgentChatOptions): {
	// Current chat messages
	messages: Message[];
	// Set messages and synchronize with the Agent
	setMessages: (messages: Message[]) => void;
	// Clear chat history on both client and Agent
	clearHistory: () => void;
	// Append a new message to the conversation
	append: (
		message: Message,
		chatRequestOptions?: any,
	) => Promise<string | null | undefined>;
	// Reload the last user message
	reload: (chatRequestOptions?: any) => Promise<string | null | undefined>;
	// Stop the AI response generation
	stop: () => void;
	// Current input text
	input: string;
	// Set the input text
	setInput: React.Dispatch<React.SetStateAction<string>>;
	// Handle input changes
	handleInputChange: (
		e: React.ChangeEvent<HTMLInputElement | HTMLTextAreaElement>,
	) => void;
	// Submit the current input
	handleSubmit: (
		event?: { preventDefault?: () => void },
		chatRequestOptions?: any,
	) => void;
	// Additional metadata
	metadata?: Object;
	// Whether a response is currently being generated
	isLoading: boolean;
	// Current status of the chat
	status: "submitted" | "streaming" | "ready" | "error";
	// Tool data from the AI response
	data?: any[];
	// Set tool data
	setData: (
		data: any[] | undefined | ((data: any[] | undefined) => any[] | undefined),
	) => void;
	// Unique ID for the chat
	id: string;
	// Add a tool result for a specific tool call
	addToolResult: ({
		toolCallId,
		result,
	}: {
		toolCallId: string;
		result: any;
	}) => void;
	// Current error if any
	error: Error | undefined;
};
```

<TypeScriptExample>

```tsx
// Example of using useAgentChat in a React component
import { useAgentChat } from "agents/ai-react";
import { useAgent } from "agents/react";
import { useState } from "react";

function ChatInterface() {
	// Connect to the chat agent
	const agentConnection = useAgent({
		agent: "customer-support",
		name: "session-12345",
	});

	// Use the useAgentChat hook with the agent connection
	const {
		messages,
		input,
		handleInputChange,
		handleSubmit,
		isLoading,
		error,
		clearHistory,
	} = useAgentChat({
		agent: agentConnection,
		initialMessages: [
			{ role: "system", content: "You're chatting with our AI assistant." },
			{ role: "assistant", content: "Hello! How can I help you today?" },
		],
	});

	return (
		<div className="chat-container">
			<div className="message-history">
				{messages.map((message, i) => (
					<div key={i} className={`message ${message.role}`}>
						{message.role === "user" ? "👤" : "🤖"} {message.content}
					</div>
				))}

				{isLoading && <div className="loading">AI is typing...</div>}
				{error && <div className="error">Error: {error.message}</div>}
			</div>

			<form onSubmit={handleSubmit} className="message-input">
				<input
					value={input}
					onChange={handleInputChange}
					placeholder="Type your message..."
					disabled={isLoading}
				/>
				<button type="submit" disabled={isLoading || !input.trim()}>
					Send
				</button>
				<button type="button" onClick={clearHistory}>
					Clear Chat
				</button>
			</form>
		</div>
	);
}
```

</TypeScriptExample>

### Next steps

- [Build a chat Agent](/agents/getting-started/build-a-chat-agent/) using the Agents SDK and deploy it to Workers.
- Learn more [using WebSockets](/agents/api-reference/websockets/) to build interactive Agents and stream data back from your Agent.
- [Orchestrate asynchronous workflows](/agents/api-reference/run-workflows) from your Agent by combining the Agents SDK and [Workflows](/workflows).

---

# 调用 Agents

URL: https://developers.cloudflare.com/agents/api-reference/calling-agents/

import {
	MetaInfo,
	Render,
	Type,
	TypeScriptExample,
	WranglerConfig,
} from "~/components";

了解如何从 Workers 调用您的 Agents，包括如何即时创建 Agents、寻址它们以及将请求路由到 Agent 的特定实例。

### 调用您的 Agent

Agents 是即时创建的，可以并发处理多个请求。每个 Agent 实例与其他实例隔离，可以维护自己的状态，并有唯一的地址。

<Render file="unique-agents" />

您可以使用以下方式之一直接从 Worker 创建和运行 Agent 实例：

- `routeAgentRequest` 助手：这将基于 `/agents/:agent/:name` URL 模式自动将请求映射到单个 Agent。`:agent` 的值将是您的 Agent 类名转换为 `kebab-case`，`:name` 的值将是您想要创建或检索的 Agent 实例的名称。
- `getAgentByName`，如果该名称不存在 Agent 实例，它将创建一个新的 Agent 实例，或检索现有实例的句柄。

查看以下示例中的使用模式：

<TypeScriptExample>

```ts
import {
	Agent,
	AgentNamespace,
	getAgentByName,
	routeAgentRequest,
} from "agents";

interface Env {
	// 在环境中定义您的 Agent
	// 将您的 Agent 类作为 TypeScript 类型参数传递，允许您调用
	// 在您的 Agent 上定义的方法。
	MyAgent: AgentNamespace<MyAgent>;
}

export default {
	async fetch(request, env, ctx): Promise<Response> {
		// 路由寻址
		// 自动将 HTTP 请求和/或 WebSocket 连接路由到 /agents/:agent/:name
		// 最适合：使用 agents/react 的 useAgent 将 React 应用直接连接到 Agents
		return (
			(await routeAgentRequest(request, env)) ||
			Response.json({ msg: "no agent here" }, { status: 404 })
		);

		// 命名寻址
		// 最适合：通过名称/ID 创建或检索 Agent 的便利方法。
		// 带上您自己的路由、中间件和/或插入现有的
		// 应用程序或框架。
		let namedAgent = getAgentByName<Env, MyAgent>(
			env.MyAgent,
			"my-unique-agent-id",
		);
		// 将传入请求直接传递给您的 Agent
		let namedResp = (await namedAgent).fetch(request);
		return namedResp;
	},
} satisfies ExportedHandler<Env>;

export class MyAgent extends Agent<Env> {
	// 您的 Agent 实现在这里
}
```

</TypeScriptExample>

:::note[调用其他 Agents]

您也可以从一个 Agent 内调用其他 Agents 并构建多 Agent 系统。

调用其他 Agents 使用与直接调用 Agent 相同的 API。

:::

### 在 Agents 上调用方法

使用 `getAgentByName` 时，您可以传递请求（包括 WebSocket）连接，并使用原生 [JavaScript RPC](/workers/runtime-apis/rpc/)（JSRPC）API 调用直接在 Agent 本身上定义的方法。

例如，一旦您有了 Agent 唯一实例的句柄（或"存根"），您可以在其上调用方法：

<TypeScriptExample>

```ts
import { Agent, AgentNamespace, getAgentByName } from "agents";

interface Env {
	// 在环境中定义您的 Agent
	// 将您的 Agent 类作为 TypeScript 类型参数传递，允许您调用
	// 在您的 Agent 上定义的方法。
	MyAgent: AgentNamespace<MyAgent>;
}

interface UserHistory {
	history: string[];
	lastUpdated: Date;
}

export default {
	async fetch(request, env, ctx): Promise<Response> {
		let namedAgent = getAgentByName<Env, MyAgent>(
			env.MyAgent,
			"my-unique-agent-id",
		);
		// 直接在 Agent 上调用方法，并传递原生 JavaScript 对象
		let chatResponse = namedAgent.chat("你好！");
		// 无需从 HTTP 请求或 WebSocket 序列化/反序列化
		// 消息然后再返回
		let agentState = getState(); // agentState 的类型是 UserHistory
		return namedResp;
	},
} satisfies ExportedHandler<Env>;

export class MyAgent extends Agent<Env, UserHistory> {
	// 您的 Agent 实现在这里
	async chat(prompt: string) {
		// 调用您喜欢的 LLM
		return "result";
	}

	async getState() {
		// 直接返回 Agent 的状态
		return this.state;
	}

	// 根据需要的其他方法！
}
```

</TypeScriptExample>

使用 TypeScript 时，确保将您的 Agent 类作为 TypeScript 类型参数传递给 AgentNamespace 类型，以便正确推断类型：

```ts
interface Env {
	// 将您的 Agent 类作为 TypeScript 类型参数传递，允许您调用
	// 在您的 Agent 上定义的方法。
	MyAgent: AgentNamespace<CodeReviewAgent>;
}

export class CodeReviewAgent extends Agent<Env, AgentState> {
	// Agent 方法在这里
}
```

### 为您的 Agents 命名

为您的 Agents 创建名称时，考虑 Agent 代表什么。一个唯一的用户？一个团队或公司？用于协作的房间或频道？

一致的命名方法允许您：

- 将传入请求直接定向到正确的 Agent
- 确定性地将新请求路由回该 Agent，无论客户端在世界的哪个地方。
- 避免依赖集中式会话存储或外部服务进行状态管理，因为每个 Agent 实例可以维护自己的状态。

对于给定的 Agent 定义（或下面代码中的"命名空间"），可以有数百万（或数千万）个该 Agent 的实例，每个都处理自己的请求、调用 LLM 并维护自己的状态。

例如，您可能为每个使用新的基于 AI 的代码编辑器的用户拥有一个 Agent。在这种情况下，您希望基于系统中的用户 ID 创建 Agents，这将允许该 Agent 处理该用户的所有请求。

它还确保 [Agent 内的状态](/agents/api-reference/store-and-sync-state/)，包括聊天历史、语言偏好、模型配置和其他上下文可以与该用户特别关联，使状态管理变得更容易。

以下示例显示如何为请求中的每个 `userId` 创建唯一的 Agent：

<TypeScriptExample>

```ts
import {
	Agent,
	AgentNamespace,
	getAgentByName,
	routeAgentRequest,
} from "agents";

interface Env {
	MyAgent: AgentNamespace<MyAgent>;
}

export default {
	async fetch(request, env, ctx): Promise<Response> {
		let userId = new URL(request.url).searchParams.get("userId") || "anonymous";
		// 使用允许您路由到请求、WebSockets 或在 Agent 上调用方法的标识符
		// 您也可以在这里放置身份验证逻辑 - 例如，仅为已知用户创建或检索 Agents。
		let namedAgent = getAgentByName<Env, MyAgent>(
			env.MyAgent,
			"my-unique-agent-id",
		);
		return (await namedAgent).fetch(request);
	},
} satisfies ExportedHandler<Env>;

export class MyAgent extends Agent<Env> {
	// 您可以在 Agent 内的任何方法中通过 this.name 访问 Agent 的名称
	async onStartup() {
		console.log(`agent ${this.name} ready!`);
	}
}
```

</TypeScriptExample>

根据您的 Agents 目标，将 `userId` 替换为 `teamName`、`channel`、`companyName` - 和/或配置身份验证以确保仅为已知的、经过身份验证的用户创建 Agents。

### 验证 Agents

使用 Agents SDK 构建和部署 Agents 时，您通常希望在将请求传递给 Agent 之前验证客户端，以便限制 Agent 将调用谁、为特定 Agents 授权特定用户，和/或限制谁可以访问 Agent 公开的管理或调试 API。

作为最佳实践：

- 在调用 Agent 之前，在您的 Workers 代码中处理身份验证。
- 使用 `routeAgentRequest` 助手时使用内置钩子 - `onBeforeConnect` 和 `onBeforeRequest`
- 在使用其他方法调用 Agent 之前，使用您首选的路由器（如 Hono）和身份验证中间件或提供商来应用自定义身份验证方案。

本指南前面记录的 `routeAgentRequest` 助手公开了两个有用的钩子（`onBeforeConnect`、`onBeforeRequest`），允许您在创建或检索 Agent 之前应用自定义逻辑：

<TypeScriptExample>

```ts
import { Agent, AgentNamespace, routeAgentRequest } from "agents";

interface Env {
	MyAgent: AgentNamespace<MyAgent>;
}

export default {
	async fetch(request, env, ctx): Promise<Response> {
		// Use the onBeforeConnect and onBeforeRequest hooks to authenticate clients
		// or run logic before handling a HTTP request or WebSocket.
		return (
			(await routeAgentRequest(request, env, {
				// Run logic before a WebSocket client connects
				onBeforeConnect: (request) => {
					// Your code/auth code here
					// You can return a Response here - e.g. a HTTP 403 Not Authorized -
					// which will stop further request processing and will NOT invoke the
					// Agent.
					// return Response.json({"error": "not authorized"}, { status: 403 })
				},
				// Run logic before a HTTP client clients
				onBeforeRequest: (request) => {
					// Your code/auth code here
					// Returning nothing will result in the call to the Agent continuing
				},
				// Prepend a prefix for how your Agents are named here
				prefix: "name-prefix-here",
			})) || Response.json({ msg: "no agent here" }, { status: 404 })
		);
	},
} satisfies ExportedHandler<Env>;
```

</TypeScriptExample>

如果使用 `getAgentByName` 或底层 Durable Objects 路由 API，则应在调用 `getAgentByName` 之前验证传入请求或 WebSocket 连接。

例如，如果您使用 [Hono](https://hono.dev/)，您可以在调用 Agent 之前在中间件中进行身份验证：

<TypeScriptExample>

```ts
import { Agent, AgentNamespace, getAgentByName } from "agents";
import { Hono } from "hono";

const app = new Hono<{ Bindings: Env }>();

app.use("/code-review/*", async (c, next) => {
	// Perform auth here
	// e.g. validate a Bearer token, a JWT, use your preferred auth library
	// return Response.json({ msg: 'unauthorized' }, { status: 401 });
	await next(); // continue on if valid
});

app.get("/code-review/:id", async (c) => {
	const id = c.req.param("teamId");
	if (!id) return Response.json({ msg: "missing id" }, { status: 400 });

	// Call the Agent, creating it with the name/identifier from the ":id" segment
	// of our URL
	const agent = await getAgentByName<Env, MyAgent>(c.env.MyAgent, id);

	// Pass the request to our Agent instance
	return await agent.fetch(c.req.raw);
});
```

</TypeScriptExample>

这确保我们仅创建已验证用户的 Agents，并允许您验证 Agent 名称是否符合您首选的命名方案，然后再创建实例。

### 下一步

- 查看 [API 文档](/agents/api-reference/agents-api/) 以了解如何定义
- [构建聊天 Agent](/agents/getting-started/build-a-chat-agent/) 使用 Agents SDK 并将其部署到 Workers。
- 了解如何使用 [WebSockets](/agents/api-reference/websockets/) 构建交互式 Agents 并从您的 Agent 流式传输数据。
- [或组织异步工作流](/agents/api-reference/run-workflows) 从您的 Agent 通过组合 Agents SDK 和 [工作流](/workflows)。

---

# 配置

URL: https://developers.cloudflare.com/agents/api-reference/configuration/

import { MetaInfo, Render, Type, WranglerConfig } from "~/components";

Agent 的配置与任何其他 Cloudflare Workers 项目相同，使用[wrangler 配置](/workers/wrangler/configuration/)文件来定义您的代码位置以及它将使用的服务（绑定）。

### 项目结构

从 `npm create cloudflare@latest agents-starter -- --template cloudflare/agents-starter` 创建的 Agent 项目的典型文件结构如下：

```sh
.
|-- package-lock.json
|-- package.json
|-- public
|   `-- index.html
|-- src
|   `-- index.ts // 您的 Agent 定义
|-- test
|   |-- index.spec.ts // 您的测试
|   `-- tsconfig.json
|-- tsconfig.json
|-- vitest.config.mts
|-- worker-configuration.d.ts
`-- wrangler.jsonc // 您的 Workers 和 Agent 配置
```

### 示例配置

下面是一个最小的 `wrangler.jsonc` 文件，定义了 Agent 的配置，包括入口点、`durable_object` 命名空间和代码 `migrations`：

<WranglerConfig>

```jsonc
{
	"$schema": "node_modules/wrangler/config-schema.json",
	"name": "agents-example",
	"main": "src/index.ts",
	"compatibility_date": "2025-02-23",
	"compatibility_flags": ["nodejs_compat"],
	"durable_objects": {
		"bindings": [
			{
				// 必需的：
				"name": "MyAgent", // 您的 Agent 从 Worker 中的调用方式
				"class_name": "MyAgent", // 必须与代码中 Agent 的类名匹配
				// 可选：如果 Agent 在另一个 Worker 脚本中定义，请设置此项
				"script_name": "the-other-worker",
			},
		],
	},
	"migrations": [
		{
			"tag": "v1",
			// Agent 存储状态的必要配置
			"new_sqlite_classes": ["MyAgent"],
		},
	],
	"observability": {
		"enabled": true,
	},
}
```

</WranglerConfig>

配置包括：

- 一个 `main` 字段，指向您的 Agent 的入口点，通常是 TypeScript（或 JavaScript）文件。
- 一个 `durable_objects` 字段，定义您的 Agents 将在其中运行的 [Durable Object 命名空间](/durable-objects/reference/glossary/)。
- 一个 `migrations` 字段，定义您的 Agent 将使用的代码迁移。此字段是必需的，必须至少包含一次迁移。`new_sqlite_classes` 字段是 Agent 存储状态的必要配置。

Agents 必须在其 `wrangler.jsonc`（或 `wrangler.toml`）配置文件中定义这些字段。

---

# HTTP 和服务器发送事件

URL: https://developers.cloudflare.com/agents/api-reference/http-sse/

import {
	MetaInfo,
	Render,
	Type,
	TypeScriptExample,
	WranglerConfig,
} from "~/components";

Agents SDK 允许您处理 HTTP 请求，并原生支持[服务器发送事件](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events)（SSE）。这允许您构建可以将数据推送到客户端并避免缓冲的应用程序。

### 处理 HTTP 请求

Agents 可以使用 `onRequest` 方法处理 HTTP 请求，该方法在 Agent 实例接收到 HTTP 请求时被调用。该方法将 `Request` 对象作为参数并返回 `Response` 对象。

<TypeScriptExample>

```ts
class MyAgent extends Agent<Env, State> {
	// 处理来到此 Agent 实例的 HTTP 请求
	// 返回 Response 对象
	async onRequest(request: Request) {
		return new Response("来自 Agent 的问候！");
	}

	async callAIModel(prompt: string) {
		// 在这里实现 AI 模型调用
	}
}
```

</TypeScriptExample>

查看 [Agents API 参考](/agents/api-reference/agents-api/)了解更多关于 `Agent` 类及其方法的信息。

### 实现服务器发送事件

Agents SDK 直接支持服务器发送事件：您可以使用 SSE 通过长时间运行的连接将数据流式传输回客户端。这避免了缓冲大型响应，这既可能使您的 Agent 感觉缓慢，又迫使您在内存中缓冲整个响应。

当 Agent 部署到 Cloudflare Workers 时，对于将响应流式传输回去的总时间没有有效限制：需要几分钟推理然后响应的大型 AI 模型响应不会被过早终止。

请注意，这并不意味着客户端在流式传输过程中不能潜在地断开连接：您可以通过[写入 Agent 的有状态存储](/agents/api-reference/store-and-sync-state/)和/或[使用 WebSockets](/agents/api-reference/websockets/)来解决这个问题。因为您总是可以[路由到同一个 Agent](/agents/api-reference/calling-agents/)，所以您不需要使用集中式会话存储来在客户端断开连接时从中断的地方继续。

以下示例使用 AI SDK 生成文本并将其流式传输回客户端。它将在模型生成时自动将响应流式传输回客户端：

<TypeScriptExample>

```ts
import {
	Agent,
	AgentNamespace,
	getAgentByName,
	routeAgentRequest,
} from "agents";
import { streamText } from "ai";
import { createOpenAI, openai } from "@ai-sdk/openai";

interface Env {
	MyAgent: AgentNamespace<MyAgent>;
	OPENAI_API_KEY: string;
}

export class MyAgent extends Agent<Env> {
	async onRequest(request: Request) {
		// 通过以下方式测试：
		// curl -d '{"prompt": "为我写一个 Cloudflare Worker"}' <url>
		let data = await request.json<{ prompt: string }>();
		let stream = await this.callAIModel(data.prompt);
		// 这使用服务器发送事件（SSE）
		return stream.toTextStreamResponse({
			headers: {
				"Content-Type": "text/x-unknown",
				"content-encoding": "identity",
				"transfer-encoding": "chunked",
			},
		});
	}

	async callAIModel(prompt: string) {
		const openai = createOpenAI({
			apiKey: this.env.OPENAI_API_KEY,
		});

		return streamText({
			model: openai("gpt-4o"),
			prompt: prompt,
		});
	}
}

export default {
	async fetch(request: Request, env: Env) {
		let agentId = new URL(request.url).searchParams.get("agent-id") || "";
		const agent = await getAgentByName<Env, MyAgent>(env.MyAgent, agentId);
		return agent.fetch(request);
	},
};
```

</TypeScriptExample>

### WebSockets 与服务器发送事件

WebSockets 和服务器发送事件（SSE）都能够在客户端和 Agents 之间进行实时通信。基于 Agents SDK 构建的 Agents 可以直接公开 WebSocket 和 SSE 端点。

- WebSockets 提供全双工通信，允许数据同时在两个方向流动。SSE 仅支持服务器到客户端的通信，如果客户端需要向服务器发送数据，需要额外的 HTTP 请求。
- WebSockets 建立一个在会话期间保持开放的单一持久连接。基于 HTTP 构建的 SSE 可能由于重新连接尝试和每次重新连接时的头部传输而经历更多开销，特别是当有大量客户端-服务器通信时。
- 虽然 SSE 对于简单的流式传输场景效果很好，但 WebSockets 更适合需要几分钟或几小时连接时间的应用程序，因为它们通过内置的 ping/pong 机制维护更稳定的连接以保持连接活跃。
- WebSockets 使用自己的协议（ws:// 或 wss://），在初始握手后与 HTTP 分离。这种分离允许 WebSockets 更好地处理二进制数据传输并为专门用例实现自定义子协议。

如果您不确定哪种更适合您的用例，我们推荐 WebSockets。[WebSockets API 文档](/agents/api-reference/websockets/)提供了有关如何在 Agents SDK 中使用 WebSockets 的详细信息。

### 下一步

- 查看 [API 文档](/agents/api-reference/agents-api/)了解如何定义 Agents 类。
- [构建聊天 Agent](/agents/getting-started/build-a-chat-agent/)使用 Agents SDK 并将其部署到 Workers。
- 了解更多[使用 WebSockets](/agents/api-reference/websockets/)构建交互式 Agents 并从您的 Agent 流式传输数据。
- [编排异步工作流](/agents/api-reference/run-workflows)通过结合 Agents SDK 和[工作流](/workflows)从您的 Agent。

---

# 检索增强生成

URL: https://developers.cloudflare.com/agents/api-reference/rag/

import {
	MetaInfo,
	Render,
	Type,
	TypeScriptExample,
	WranglerConfig,
} from "~/components";

Agents 可以使用检索增强生成（RAG）来检索相关信息并使用它来增强[对 AI 模型的调用](/agents/api-reference/using-ai-models/)。存储用户的聊天历史记录以用作未来对话的上下文，总结文档以引导 Agent 的知识库，和/或使用来自 Agent 的[网页浏览](/agents/api-reference/browse-the-web/)任务的数据来增强您的 Agent 的能力。

您可以使用 Agent 自己的 [SQL 数据库](/agents/api-reference/store-and-sync-state)作为数据的事实来源，并将嵌入存储在 [Vectorize](/vectorize/)（或任何其他支持向量的数据库）中，以允许您的 Agent 检索相关信息。

### 向量搜索

:::note

如果您对向量数据库和 Vectorize 完全陌生，请访问 [Vectorize 教程](/vectorize/get-started/intro/)学习基础知识，包括如何创建索引、插入数据和生成嵌入。

:::

您可以从 Agent 上的任何方法查询向量索引（或索引）：您附加的任何 Vectorize 索引都可以在 Agent 内的 `this.env` 上使用。如果您已经[将元数据关联](/vectorize/best-practices/insert-vectors/#metadata)到映射回存储在 Agent 中的数据的向量，您可以使用 `this.sql` 直接在 Agent 内查找数据。

以下是如何为 Agent 提供检索功能的示例：

<TypeScriptExample>

```ts
import { Agent } from "agents";

interface Env {
	AI: Ai;
	VECTOR_DB: Vectorize;
}

export class RAGAgent extends Agent<Env> {
	// 我们 Agent 上的其他方法
	// ...
	//
	async queryKnowledge(userQuery: string) {
		// 将查询转换为嵌入
		const queryVector = await this.env.AI.run("@cf/baai/bge-base-en-v1.5", {
			text: [userQuery],
		});

		// 从我们的向量索引中检索结果
		let searchResults = await this.env.VECTOR_DB.query(queryVector.data[0], {
			topK: 10,
			returnMetadata: "all",
		});

		let knowledge = [];
		for (const match of searchResults.matches) {
			console.log(match.metadata);
			knowledge.push(match.metadata);
		}

		// 使用元数据将向量搜索结果重新关联
		// 到我们 Agent 的 SQL 数据库中的数据
		let results = this
			.sql`SELECT * FROM knowledge WHERE id IN (${knowledge.map((k) => k.id)})`;

		// 返回它们
		return results;
	}
}
```

</TypeScriptExample>

您还需要将您的 Agent 连接到您的向量索引：

<WranglerConfig>

```jsonc
{
	// ...
	"vectorize": [
		{
			"binding": "VECTOR_DB",
			"index_name": "your-vectorize-index-name",
		},
	],
	// ...
}
```

</WranglerConfig>

如果您有多个要提供的索引，您可以提供一个 `vectorize` 绑定数组。

#### 下一步

- 了解更多关于如何[结合 Vectorize 和 Workers AI](/vectorize/get-started/embeddings/)
- 查看 [Vectorize 查询 API](/vectorize/reference/client-api/)
- 使用[元数据过滤](/vectorize/reference/metadata-filtering/)为您的结果添加上下文

---

# API 参考

URL: https://developers.cloudflare.com/agents/api-reference/

import { DirectoryListing } from "~/components";

了解更多关于 Agents 能做什么、`Agent` 类以及 Agents 公开的 API：

<DirectoryListing />

---

# 运行工作流

URL: https://developers.cloudflare.com/agents/api-reference/run-workflows/

import {
	MetaInfo,
	Render,
	Type,
	TypeScriptExample,
	WranglerConfig,
} from "~/components";

Agents 可以触发异步[工作流](/workflows/)，允许您的 Agent 在后台运行复杂的多步骤任务。这可以包括后处理用户上传的文件、更新[向量数据库](/vectorize/)中的嵌入，和/或管理长时间运行的用户生命周期电子邮件或短信通知工作流。

因为 Agent 就像 Worker 脚本一样，它可以创建在与 Agent 相同项目（脚本）中定义的工作流*或*在不同项目中定义的工作流。

:::note[Agents 与工作流]

Agents 和工作流有一些相似性：它们都可以异步运行任务。对于直接的线性任务或需要运行到完成的任务，工作流可能是理想的：步骤可以重试，它们可以被取消，并且可以对事件做出反应。

Agents 不必运行到完成：它们可以循环、分支并永远运行，它们还可以直接与用户交互（通过 HTTP 或 WebSockets）。Agent 可以在运行时触发多个工作流，因此可以用来协调和管理工作流以实现其目标。

:::

## 触发工作流

Agent 可以从任何方法内触发一个或多个工作流，无论是来自传入的 HTTP 请求、WebSocket 连接、延迟或调度，和/或来自 Agent 采取的任何其他操作。

从 Agent 触发工作流与[从 Worker 脚本触发工作流](/workflows/build/trigger-workflows/)没有区别：

<TypeScriptExample>

```ts
interface Env {
	MY_WORKFLOW: Workflow;
	MyAgent: AgentNamespace<MyAgent>;
}

export class MyAgent extends Agent<Env> {
	async onRequest(request: Request) {
		let userId = request.headers.get("user-id");
		// 触发运行工作流的调度
		// 传递负载给它
		let { taskId } = await this.schedule(300, "runWorkflow", {
			id: userId,
			flight: "DL264",
			date: "2025-02-23",
		});
	}

	async runWorkflow(data) {
		let instance = await env.MY_WORKFLOW.create({
			id: data.id,
			params: data,
		});

		// 调度另一个每5分钟检查工作流状态的任务...
		await this.schedule("*/5 * * * *", "checkWorkflowStatus", {
			id: instance.id,
		});
	}
}

export class MyWorkflow extends WorkflowEntrypoint<Env> {
	async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
		// 您的工作流代码在这里
	}
}
```

</TypeScriptExample>

您还需要确保您的 Agent [与您的工作流有绑定](/workflows/build/trigger-workflows/#workers-api-bindings)，以便它可以调用它：

<WranglerConfig>

```jsonc
{
	// ...
	// 在您的 Agent 和您的工作流之间创建绑定
	"workflows": [
		{
			// 必需的：
			"name": "EMAIL_WORKFLOW",
			"class_name": "MyWorkflow",
			// 可选：如果您的工作流在与您的 Agent 不同的项目中定义，请设置 script_name 字段
			"script_name": "email-workflows",
		},
	],
	// ...
}
```

</WranglerConfig>

## 从另一个项目触发工作流

您还可以通过在 Agent 的 `workflows` 绑定中设置 `script_name` 属性，调用在与 Agent 不同的 Workers 脚本中定义的工作流：

<WranglerConfig>

```jsonc
{
	// 必需的：
	"name": "EMAIL_WORKFLOW",
	"class_name": "MyWorkflow",
	// 可选：如果您的工作流在与您的 Agent 不同的项目中定义，请设置 script_name 字段
	"script_name": "email-workflows",
}
```

</WranglerConfig>

有关更多示例，请参阅工作流文档的[跨脚本调用](/workflows/build/workers-api/#cross-script-calls)部分。

---

# 调度任务

URL: https://developers.cloudflare.com/agents/api-reference/schedule-tasks/

import {
	MetaInfo,
	Render,
	Type,
	TypeScriptExample,
	WranglerConfig,
} from "~/components";

Agent 可以通过调用 `this.schedule(when, callback, data)` 来调度未来运行的任务，其中 `when` 可以是延迟、`Date` 或 cron 字符串；`callback` 是要调用的函数名，`data` 是要传递给函数的数据对象。

调度任务可以做用户请求或消息能做的任何事情：发出请求、查询数据库、发送邮件、读写状态：调度任务可以调用您的 Agent 上的任何常规方法。

### 调度任务

您可以在 Agent 的任何方法中调用 `this.schedule`，并为每个单独的 Agent 调度数万个任务：

<TypeScriptExample>

```ts
import { Agent } from "agents";

export class SchedulingAgent extends Agent {
	async onRequest(request) {
		// 处理传入请求
		// 调度一个5分钟后的任务
		// 调用"checkFlights"方法
		let { taskId } = await this.schedule(600, "checkFlights", {
			flight: "DL264",
			date: "2025-02-23",
		});
		return Response.json({ taskId });
	}

	async checkFlights(data) {
		// 当我们的调度任务运行时被调用
		// 我们也可以在这里调用 this.schedule 来调度另一个任务
	}
}
```

</TypeScriptExample>

:::caution

为不存在的方法设置回调的任务将抛出异常：确保 `this.schedule` 的 `callback` 参数中命名的方法存在于您的 `Agent` 类上。

:::

您可以通过多种方式调度任务：

<TypeScriptExample>

```ts
// 调度一个在10秒后运行的任务
let task = await this.schedule(10, "someTask", { message: "hello" });

// 调度一个在特定日期运行的任务
let task = await this.schedule(new Date("2025-01-01"), "someTask", {});

// 调度一个每10秒运行的任务
let { id } = await this.schedule("*/10 * * * *", "someTask", {
	message: "hello",
});

// 调度一个每10秒运行的任务，但只在周一
let task = await this.schedule("0 0 * * 1", "someTask", { message: "hello" });

// 取消调度任务
this.cancelSchedule(task.id);
```

</TypeScriptExample>

调用 `await this.schedule` 返回一个 `Schedule`，其中包含任务随机生成的 `id`。您可以使用此 `id` 在将来检索或取消任务。它还提供一个 `type` 属性，指示调度类型，例如，`"scheduled" | "delayed" | "cron"` 之一。

:::note[最大调度任务数]

每个任务都映射到 Agent 底层 [SQLite 数据库](/durable-objects/api/storage-api/) 中的一行，这意味着每个任务最大可达 2 MB。最大任务数必须满足 `(任务大小 * 任务数) + 所有其他状态 < 最大数据库大小`（目前每个 Agent 为 1GB）。

:::

### 管理调度任务

您可以使用调度 API 在 Agent 内获取、取消和过滤调度任务：

<TypeScriptExample>

```ts
// 通过 ID 获取特定调度
// 如果任务不存在则返回 undefined
let task = await this.getSchedule(task.id);

// 获取所有调度任务
// 返回 Schedule 对象数组
let tasks = this.getSchedules();

// 通过 ID 取消任务
// 如果任务被取消则返回 true，如果不存在则返回 false
await this.cancelSchedule(task.id);

// 过滤特定任务
// 例如，未来一小时内开始的所有任务
let tasks = this.getSchedules({
	timeRange: {
		start: new Date(Date.now()),
		end: new Date(Date.now() + 60 * 60 * 1000),
	},
});
```

</TypeScriptExample>

---

# 存储和同步状态

URL: https://developers.cloudflare.com/agents/api-reference/store-and-sync-state/

import {
	MetaInfo,
	Render,
	Type,
	TypeScriptExample,
	WranglerConfig,
} from "~/components";

每个 Agent 都有内置的状态管理功能，包括内置存储和 Agent 与前端应用程序之间的同步。

Agent 内的状态具有以下特性：

- 在 Agent 重启时持久化：数据永久存储在 Agent 内。
- 自动序列化/反序列化：您可以存储任何 JSON 可序列化的数据。
- 在 Agent 内立即一致：读取您自己的写入。
- 并发更新的线程安全
- 快速：状态与 Agent 运行的位置位于同一位置。读取和写入不需要穿越网络。

Agent 状态存储在嵌入每个单独 Agent 实例内的 SQL 数据库中：您可以使用更高级别的 `this.setState` API（推荐）与其交互，它允许您同步状态并在状态更改时触发事件，或直接使用 `this.sql` 查询数据库。

#### 状态 API

每个 Agent 都有内置的状态管理功能。您可以直接使用 `this.setState` 设置和更新 Agent 的状态：

<TypeScriptExample>

```ts
import { Agent } from "agents";

export class MyAgent extends Agent {
	// 响应事件更新状态
	async incrementCounter() {
		this.setState({
			...this.state,
			counter: this.state.counter + 1,
		});
	}

	// 处理传入消息
	async onMessage(message) {
		if (message.type === "update") {
			this.setState({
				...this.state,
				...message.data,
			});
		}
	}

	// 处理状态更新
	onStateUpdate(state, source: "server" | Connection) {
		console.log("状态已更新", state);
	}
}
```

</TypeScriptExample>

如果您使用 TypeScript，您还可以通过将类型作为 [类型参数](https://www.typescriptlang.org/docs/handbook/2/generics.html#using-type-parameters-in-generic-constraints) 传递给 `Agent` 类定义的*第二个*类型参数来为您的 Agent 状态提供类型。

<TypeScriptExample>

```ts
import { Agent } from "agents";

interface Env {}

// 为您的 Agent 状态定义类型
interface FlightRecord {
	id: string;
	departureIata: string;
	arrival: Date;
	arrivalIata: string;
	price: number;
}

// 传入您的 Agent 状态的类型
export class MyAgent extends Agent<Env, FlightRecord> {
	// 这允许 this.setState 和 onStateUpdate 方法
	// 被类型化：
	async onStateUpdate(state: FlightRecord) {
		console.log("状态已更新", state);
	}

	async someOtherMethod() {
		this.setState({
			...this.state,
			price: this.state.price + 10,
		});
	}
}
```

</TypeScriptExample>

### 为 Agent 设置初始状态

您还可以通过 `Agent` 类上的 `initialState` 属性为 Agent 设置初始状态：

<TypeScriptExample>

```ts
type State = {
	counter: number;
	text: string;
	color: string;
};

class MyAgent extends Agent<Env, State> {
	// 设置默认的初始状态
	initialState = {
		counter: 0,
		text: "",
		color: "#3B82F6",
	};

	doSomething() {
		console.log(this.state); // {counter: 0, text: "", color: "#3B82F6"}，如果您还没有设置状态
	}
}
```

</TypeScriptExample>

任何初始状态都会同步到通过[`useAgent` hook](#synchronizing-state)连接的客户端。

### 同步状态

客户端可以连接到 Agent 并使用作为 `agents/react` 一部分提供的 React hooks 与其状态保持同步。

React 应用程序可以调用 `useAgent` 通过 WebSockets 连接到命名的 Agent

<TypeScriptExample>

```ts
import { useState } from "react";
import { useAgent } from "agents/react";

function StateInterface() {
  const [state, setState] = useState({ counter: 0 });

  const agent = useAgent({
    agent: "thinking-agent",
    name: "my-agent",
    onStateUpdate: (newState) => setState(newState),
  });

  const increment = () => {
    agent.setState({ counter: state.counter + 1 });
  };

  return (
    <div>
      <div>计数: {state.counter}</div>
      <button onClick={increment}>增加</button>
    </div>
  );
}
```

</TypeScriptExample>

状态同步系统：

- 自动将 Agent 的状态同步到所有连接的客户端
- 优雅地处理客户端断开连接和重新连接
- 提供立即的本地更新
- 支持多个同时的客户端连接

常见用例：

- 实时协作功能
- 多窗口/选项卡同步
- 跨多个设备的实时更新
- 在客户端之间维护一致的 UI 状态
- 当新客户端连接时，它们自动从 Agent 接收当前状态，确保所有客户端都从最新数据开始。

### SQL API

每个单独的 Agent 实例都有自己的 SQL（SQLite）数据库，运行在与 Agent 本身*相同的上下文*中。这意味着在您的 Agent 内插入或查询数据实际上是零延迟的：Agent 不必跨越大陆或世界来访问自己的数据。

您可以通过 `this.sql` 在 Agent 的任何方法中访问 SQL API。SQL API 接受模板字符串，并且

<TypeScriptExample>

```ts
export class MyAgent extends Agent<Env> {
	async onRequest(request: Request) {
		let userId = new URL(request.url).searchParams.get("userId");

		// 'users' 这里只是一个例子：您可以创建任意表并在每个 Agent 的数据库中
		// 使用 SQL（SQLite 语法）定义您自己的架构。
		let user = await this.sql`SELECT * FROM users WHERE id = ${userId}`;
		return Response.json(user);
	}
}
```

</TypeScriptExample>

您还可以为查询提供一个 [TypeScript 类型参数](https://www.typescriptlang.org/docs/handbook/2/generics.html#using-type-parameters-in-generic-constraints)，它将用于推断结果的类型：

```ts
type User = {
	id: string;
	name: string;
	email: string;
};

export class MyAgent extends Agent<Env> {
	async onRequest(request: Request) {
		let userId = new URL(request.url).searchParams.get("userId");
		// Supply the type paramter to the query when calling this.sql
		// This assumes the results returns one or more User rows with "id", "name", and "email" columns
		const user = await this.sql<User>`SELECT * FROM users WHERE id = ${userId}`;
		return Response.json(user);
	}
}
```

您不需要指定数组类型 (`User[]` 或 `Array<User>`)，因为 `this.sql` 将始终返回指定类型的数组。

提供类型参数不会验证结果是否匹配您的类型定义。在 TypeScript 中，不存在的属性（字段）或不符合您提供的类型定义的属性将被删除。如果您需要验证传入的事件，我们建议使用库，例如 [zod](https://zod.dev/) 或您自己的验证逻辑。

:::note

Learn more about the zero-latency SQL storage that powers both Agents and Durable Objects [on our blog](https://blog.cloudflare.com/sqlite-in-durable-objects/).

:::

The SQL API exposed to an Agent is similar to the one [within Durable Objects](/durable-objects/api/storage-api/#sql-api): Durable Object SQL methods available on `this.ctx.storage.sql`. You can use the same SQL queries with the Agent's database, create tables, and query data, just as you would with Durable Objects or [D1](/d1/).

### Use Agent state as model context

You can combine the state and SQL APIs in your Agent with its ability to [call AI models](/agents/api-reference/using-ai-models/) to include historical context within your prompts to a model. Modern Large Language Models (LLMs) often have very large context windows (up to millions of tokens), which allows you to pull relevant context into your prompt directly.

For example, you can use an Agent's built-in SQL database to pull history, query a model with it, and append to that history ahead of the next call to the model:

<TypeScriptExample>

```ts
export class ReasoningAgent extends Agent<Env> {
	async callReasoningModel(prompt: Prompt) {
		let result = this
			.sql<History>`SELECT * FROM history WHERE user = ${prompt.userId} ORDER BY timestamp DESC LIMIT 1000`;
		let context = [];
		for await (const row of result) {
			context.push(row.entry);
		}

		const client = new OpenAI({
			apiKey: this.env.OPENAI_API_KEY,
		});

		// Combine user history with the current prompt
		const systemPrompt = prompt.system || "You are a helpful assistant.";
		const userPrompt = `${prompt.user}\n\nUser history:\n${context.join("\n")}`;

		try {
			const completion = await client.chat.completions.create({
				model: this.env.MODEL || "o3-mini",
				messages: [
					{ role: "system", content: systemPrompt },
					{ role: "user", content: userPrompt },
				],
				temperature: 0.7,
				max_tokens: 1000,
			});

			// Store the response in history
			this
				.sql`INSERT INTO history (timestamp, user, entry) VALUES (${new Date()}, ${prompt.userId}, ${completion.choices[0].message.content})`;

			return completion.choices[0].message.content;
		} catch (error) {
			console.error("Error calling reasoning model:", error);
			throw error;
		}
	}
}
```

</TypeScriptExample>

This works because each instance of an Agent has its _own_ database, the state stored in that database is private to that Agent: whether it's acting on behalf of a single user, a room or channel, or a deep research tool. By default, you don't have to manage contention or reach out over the network to a centralized database to retrieve and store state.

### Next steps

- Review the [API documentation](/agents/api-reference/agents-api/) for the Agents class to learn how to define them.
- [Build a chat Agent](/agents/getting-started/build-a-chat-agent/) using the Agents SDK and deploy it to Workers.
- Learn more [using WebSockets](/agents/api-reference/websockets/) to build interactive Agents and stream data back from your Agent.
- [Orchestrate asynchronous workflows](/agents/api-reference/run-workflows) from your Agent by combining the Agents SDK and [Workflows](/workflows).

---

# 使用 AI 模型

URL: https://developers.cloudflare.com/agents/api-reference/using-ai-models/

import {
	AnchorHeading,
	MetaInfo,
	Render,
	Type,
	TypeScriptExample,
	WranglerConfig,
	PackageManagers,
} from "~/components";

Agents 可以与托管在任何提供商上的 AI 模型进行通信，包括：

- [Workers AI](/workers-ai/)
- [AI SDK](https://sdk.vercel.ai/docs/ai-sdk-core/overview)
- [OpenAI](https://platform.openai.com/docs/quickstart?language=javascript)
- [Anthropic](https://docs.anthropic.com/en/api/client-sdks#typescript)
- [Google's Gemini](https://ai.google.dev/gemini-api/docs/openai)

您还可以使用 [AI Gateway](/ai-gateway/) 中的模型路由功能来跨提供商路由、评估响应和管理 AI 提供商速率限制。

因为 Agents 构建在 [Durable Objects](/durable-objects/) 之上，每个 Agent 或聊天会话都与一个有状态的计算实例相关联。传统的无服务器架构通常为聊天等实时应用程序所需的持久连接带来挑战。

用户可能在现代推理模型（如 `o3-mini` 或 DeepSeek R1）的长时间运行响应期间断开连接，或在刷新浏览器时丢失对话上下文。Agents 不依赖请求-响应模式和管理外部数据库来跟踪和存储对话状态，而是可以直接在 Agent 内存储状态。如果客户端断开连接，Agent 可以写入其自己的分布式存储，并在客户端重新连接时立即更新——即使是几小时或几天后。

## 调用 AI 模型

您可以从 Agent 内的任何方法调用模型，包括使用 [`onRequest`](/agents/api-reference/agents-api/) 处理器处理 HTTP 请求时、运行[计划任务](/agents/api-reference/schedule-tasks/)时、在 [`onMessage`](/agents/api-reference/websockets/) 处理器中处理 WebSocket 消息时，或从您自己的任何方法中。

重要的是，Agents 可以自主调用 AI 模型，并且可以处理可能需要几分钟（或更长时间）才能完全响应的长时间运行响应。

### 长时间运行的模型请求 {/* long-running-model-requests */}

现代[推理模型](https://platform.openai.com/docs/guides/reasoning)或"思考"模型可能需要一些时间来生成响应*并且*将响应流式传输回客户端。

您可以使用 [WebSocket API](/agents/api-reference/websockets/) 将响应流式传输回客户端，而不是缓冲整个响应或冒客户端断开连接的风险。

<TypeScriptExample filename="src/index.ts">

```ts
import { Agent } from "agents";
import { OpenAI } from "openai";

export class MyAgent extends Agent<Env> {
	async onConnect(connection: Connection, ctx: ConnectionContext) {
		//
	}

	async onMessage(connection: Connection, message: WSMessage) {
		let msg = JSON.parse(message);
		// 这可以运行任意长的时间，并返回任意数量的消息！
		await queryReasoningModel(connection, msg.prompt);
	}

	async queryReasoningModel(connection: Connection, userPrompt: string) {
		const client = new OpenAI({
			apiKey: this.env.OPENAI_API_KEY,
		});

		try {
			const stream = await client.chat.completions.create({
				model: this.env.MODEL || "o3-mini",
				messages: [{ role: "user", content: userPrompt }],
				stream: true,
			});

			// 将响应作为 WebSocket 消息流式传输回去
			for await (const chunk of stream) {
				const content = chunk.choices[0]?.delta?.content || "";
				if (content) {
					connection.send(JSON.stringify({ type: "chunk", content }));
				}
			}

			// 发送完成消息
			connection.send(JSON.stringify({ type: "done" }));
		} catch (error) {
			connection.send(JSON.stringify({ type: "error", error: error }));
		}
	}
}
```

</TypeScriptExample>

您还可以使用 `this.setState` 方法将 AI 模型响应持久化到 [Agent 的内部状态](/agents/api-reference/store-and-sync-state/)。例如，如果您运行[计划任务](/agents/api-reference/schedule-tasks/)，您可以存储任务的输出并稍后读取。或者，如果用户断开连接，读取消息历史记录并在他们重新连接时发送给用户。

### Workers AI

### 托管模型

您可以通过[配置绑定](/workers-ai/configuration/bindings/)在 Agent 中使用 [Workers AI 中可用的任何模型](/workers-ai/models/)。

Workers AI 通过设置 `stream: true` 开箱即用地支持流式响应，我们强烈推荐使用它们来避免缓冲和延迟响应，特别是对于较大的模型或需要更多时间生成响应的推理模型。

<TypeScriptExample filename="src/index.ts">

```ts
import { Agent } from "agents";

interface Env {
	AI: Ai;
}

export class MyAgent extends Agent<Env> {
	async onRequest(request: Request) {
		const response = await env.AI.run(
			"@cf/deepseek-ai/deepseek-r1-distill-qwen-32b",
			{
				prompt: "为我构建一个返回 JSON 的 Cloudflare Worker。",
				stream: true, // 流式传输响应，不阻塞客户端！
			},
		);

		// 返回流
		return new Response(answer, {
			headers: { "content-type": "text/event-stream" },
		});
	}
}
```

</TypeScriptExample>

您的 Wrangler 配置需要添加 `ai` 绑定：

<WranglerConfig>

```toml
[ai]
binding = "AI"
```

</WranglerConfig>

### 模型路由

您还可以通过在调用 AI 绑定时指定 [`gateway` 配置](/ai-gateway/providers/workersai/)，直接从 Agent 使用 [AI Gateway](/ai-gateway/) 中的模型路由功能。

:::note

模型路由允许您根据模型是否可达、对客户端进行速率限制，和/或是否超出了特定提供商的成本预算，将请求路由到不同的 AI 模型。

:::

<TypeScriptExample filename="src/index.ts">

```ts
import { Agent } from "agents";

interface Env {
	AI: Ai;
}

export class MyAgent extends Agent<Env> {
	async onRequest(request: Request) {
		const response = await env.AI.run(
			"@cf/deepseek-ai/deepseek-r1-distill-qwen-32b",
			{
				prompt: "为我构建一个返回 JSON 的 Cloudflare Worker。",
			},
			{
				gateway: {
					id: "{gateway_id}", // 在此处指定您的 AI Gateway ID
					skipCache: false,
					cacheTtl: 3360,
				},
			},
		);

		return Response.json(response);
	}
}
```

</TypeScriptExample>

您的 Wrangler 配置需要添加 `ai` 绑定。这在 Workers AI 和 AI Gateway 之间共享。

<WranglerConfig>

```toml
[ai]
binding = "AI"
```

</WranglerConfig>

访问 [AI Gateway 文档](/ai-gateway/) 了解如何配置网关并检索网关 ID。

### AI SDK

[AI SDK](https://sdk.vercel.ai/docs/introduction) 为使用 AI 模型提供了统一的 API，包括文本生成、工具调用、结构化响应、图像生成等。

To use the AI SDK, install the `ai` package and use it within your Agent. The example below shows how it use it to generate text on request, but you can use it from any method within your Agent, including WebSocket handlers, as part of a scheduled task, or even when the Agent is initialized.

<PackageManagers pkg="ai @ai-sdk/openai" />

<TypeScriptExample filename="src/index.ts">

```ts
import { Agent } from "agents";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

export class MyAgent extends Agent<Env> {
	async onRequest(request: Request): Promise<Response> {
		const { text } = await generateText({
			model: openai("o3-mini"),
			prompt: "Build me an AI agent on Cloudflare Workers",
		});

		return Response.json({ modelResponse: text });
	}
}
```

</TypeScriptExample>

### OpenAI compatible endpoints

Agents can call models across any service, including those that support the OpenAI API. For example, you can use the OpenAI SDK to use one of [Google's Gemini models](https://ai.google.dev/gemini-api/docs/openai#node.js) directly from your Agent.

Agents can stream responses back over HTTP using Server Sent Events (SSE) from within an `onRequest` handler, or by using the native [WebSockets](/agents/api-reference/websockets/) API in your Agent to responses back to a client, which is especially useful for larger models that can take over 30+ seconds to reply.

<TypeScriptExample filename="src/index.ts">

```ts
import { Agent } from "agents";
import { OpenAI } from "openai";

export class MyAgent extends Agent<Env> {
	async onRequest(request: Request): Promise<Response> {
		const openai = new OpenAI({
			apiKey: this.env.GEMINI_API_KEY,
			baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
		});

		// Create a TransformStream to handle streaming data
		let { readable, writable } = new TransformStream();
		let writer = writable.getWriter();
		const textEncoder = new TextEncoder();

		// Use ctx.waitUntil to run the async function in the background
		// so that it doesn't block the streaming response
		ctx.waitUntil(
			(async () => {
				const stream = await openai.chat.completions.create({
					model: "4o",
					messages: [
						{ role: "user", content: "Write me a Cloudflare Worker." },
					],
					stream: true,
				});

				// loop over the data as it is streamed and write to the writeable
				for await (const part of stream) {
					writer.write(
						textEncoder.encode(part.choices[0]?.delta?.content || ""),
					);
				}
				writer.close();
			})(),
		);

		// Return the readable stream back to the client
		return new Response(readable);
	}
}
```

</TypeScriptExample>

---

# 使用 WebSockets

URL: https://developers.cloudflare.com/agents/api-reference/websockets/

import {
	MetaInfo,
	Render,
	Type,
	TypeScriptExample,
	WranglerConfig,
} from "~/components";

用户和客户端可以通过 WebSockets 直接连接到 Agent，允许与您的 Agent 进行长时间运行的双向通信。

要使 Agent 能够接受 WebSockets，请在您的 Agent 上定义 `onConnect` 和 `onMessage` 方法。

- `onConnect(connection: Connection, ctx: ConnectionContext)` 在客户端建立新的 WebSocket 连接时被调用。原始的 HTTP 请求，包括请求头、cookies 和 URL 本身，都可以在 `ctx.request` 上获取。
- `onMessage(connection: Connection, message: WSMessage)` 为每个传入的 WebSocket 消息调用。消息是 `ArrayBuffer | ArrayBufferView | string` 之一，您可以使用 `connection.send()` 向客户端发送消息。您可以通过检查 `connection.id` 来区分客户端连接，它对每个连接的客户端都是唯一的。

以下是一个回显收到的任何消息的 Agent 示例：

<TypeScriptExample>

```ts
import { Agent, Connection } from "agents";

export class ChatAgent extends Agent {
	async onConnect(connection: Connection, ctx: ConnectionContext) {
		// 连接被 SDK 自动接受。
		// 您也可以在这里用 connection.close() 显式关闭连接
		// 在 ctx.request 上访问 Request 以检查头部、cookies 和 URL
	}

	async onMessage(connection: Connection, message: WSMessage) {
		// const response = await longRunningAITask(message)
		await connection.send(message);
	}
}
```

</TypeScriptExample>

### 连接客户端

Agent 框架包含一个有用的辅助包，用于从客户端应用程序直接连接到您的 Agent（或其他 Agents）。导入 `agents/client`，创建 `AgentClient` 实例并使用它连接到您的 Agent 实例：

<TypeScriptExample>

```ts
import { AgentClient } from "agents/client";

const connection = new AgentClient({
	agent: "dialogue-agent",
	name: "insight-seeker",
});

connection.addEventListener("message", (event) => {
	console.log("接收到:", event.data);
});

connection.send(
	JSON.stringify({
		type: "inquiry",
		content: "您看到了什么模式？",
	}),
);
```

</TypeScriptExample>

### React 客户端

基于 React 的应用程序可以导入 `agents/react` 并使用 `useAgent` hook 直接连接到 Agent 实例：

<TypeScriptExample>

```ts
import { useAgent } from "agents/react";

function AgentInterface() {
  const connection = useAgent({
    agent: "dialogue-agent",
    name: "insight-seeker",
    onMessage: (message) => {
      console.log("收到理解:", message.data);
    },
    onOpen: () => console.log("连接已建立"),
    onClose: () => console.log("连接已关闭"),
  });

  const inquire = () => {
    connection.send(
      JSON.stringify({
        type: "inquiry",
        content: "您收集了哪些洞察？",
      })
    );
  };

  return (
    <div className="agent-interface">
      <button onClick={inquire}>寻求理解</button>
    </div>
  );
}

```

</TypeScriptExample>

`useAgent` hook 自动处理连接的生命周期，确保在组件挂载和卸载时正确初始化和清理。您还可以[将 `useAgent` 与 `useState` 结合使用](/agents/api-reference/store-and-sync-state/)，自动在连接到您的 Agent 的所有客户端之间同步状态。

### 处理 WebSocket 事件

在您的 Agent 上定义 `onError` 和 `onClose` 方法以显式处理 WebSocket 客户端错误和关闭事件。记录错误、清理状态和/或发出指标：

<TypeScriptExample>

```ts
import { Agent, Connection } from "agents";

export class ChatAgent extends Agent {
	// onConnect 和 onMessage 方法
	// ...

	// WebSocket 错误和断开连接（关闭）处理。
	async onError(connection: Connection, error: unknown): Promise<void> {
		console.error(`WS 错误: ${error}`);
	}
	async onClose(
		connection: Connection,
		code: number,
		reason: string,
		wasClean: boolean,
	): Promise<void> {
		console.log(`WS 关闭: ${code} - ${reason} - wasClean: ${wasClean}`);
		connection.close();
	}
}
```

</TypeScriptExample>

---

# 调用 LLMs

URL: https://developers.cloudflare.com/agents/concepts/calling-llms/

import { Render } from "~/components";

### 理解 LLM 提供商和模型类型

不同的 LLM 提供商提供针对特定类型任务优化的模型。在构建 AI 系统时，选择正确的模型对性能和成本效率都至关重要。

#### 推理模型

像 OpenAI 的 o1、Anthropic 的 Claude 和 DeepSeek 的 R1 这样的模型特别适合复杂的推理任务。这些模型擅长：

- 将问题分解为步骤
- 遵循复杂指令
- 在长对话中维护上下文
- 生成代码和技术内容

例如，在实施旅行预订系统时，您可能使用推理模型来分析旅行需求并生成适当的预订策略。

#### 指令模型

像 GPT-4 和 Claude Instant 这样的模型经过优化，能够高效地遵循直接的指令。它们在以下方面表现良好：

- 内容生成
- 简单分类任务
- 基本问答
- 文本转换

这些模型对于不需要复杂推理的直接任务通常更具成本效益。

---

# 人机协作

URL: https://developers.cloudflare.com/agents/concepts/human-in-the-loop/

### 什么是人机协作？

人机协作（Human-in-the-Loop，HITL）工作流将人类判断和监督集成到自动化流程中。这些工作流在关键点暂停，进行人工审查、验证或决策，然后再继续。这种方法将自动化的效率与人类专业知识和监督相结合，在最重要的地方发挥作用。

![人机协作图表](~/assets/images/agents/human-in-the-loop.svg)

#### 理解人机协作工作流

在人机协作工作流中，流程不是完全自动化的。相反，它们包含指定的检查点，需要人工干预。例如，在旅行预订系统中，人类可能希望在 Agent 执行交易之前确认旅行安排。工作流管理这种交互，确保：

1. 流程在适当的审查点暂停
2. 人工审查员获得必要的上下文
3. 系统在审查期间维护状态
4. 审查决策得到适当整合
5. 获得批准后流程继续进行

### 人机协作工作流的最佳实践

#### 长期状态持久化

人工审查流程不会按照可预测的时间表运行。审查员可能需要几天或几周的时间来做出决定，特别是对于需要额外调查或多重批准的复杂案例。您的系统需要在此期间保持完美的状态一致性，包括：

- 原始请求和上下文
- 所有中间决策和行动
- 任何部分进度或临时状态
- 审查历史和反馈

:::note[提示]
[Durable Objects](/durable-objects/) 为管理人机协作工作流中的状态提供了理想的解决方案，提供了可以维持几小时、几周或几个月状态的持久计算实例。
:::

#### 通过评估持续改进

人工审查员在评估和改进 LLM 性能方面发挥着关键作用。实施系统化的评估流程，不仅收集人工对最终输出的反馈，还收集对 LLM 决策过程的反馈。这可以包括：

- 决策质量评估：让审查员评估 LLM 的推理过程和决策点，而不仅仅是最终输出。
- 边缘案例识别：利用人类专业知识识别可以改进 LLM 性能的场景。
- 反馈收集：收集结构化反馈，可用于微调 LLM 或调整工作流。[AI Gateway](/ai-gateway/evaluations/add-human-feedback/) 可以是设置 LLM 反馈循环的有用工具。

#### 错误处理和恢复

强大的错误处理对于维护工作流完整性至关重要。您的系统应该优雅地处理各种故障场景，包括审查员不可用、系统中断或冲突审查。实施清晰的升级路径来处理超出正常参数的异常情况。

系统应该在暂停状态期间保持稳定，确保即使在延长的审查期间也不会丢失任何工作。考虑实施自动检查点，允许工作流在任何中断后从最后的稳定状态恢复。

---

# 概念

URL: https://developers.cloudflare.com/agents/concepts/

import { DirectoryListing } from "~/components";

<DirectoryListing />

---

# Agents

URL: https://developers.cloudflare.com/agents/concepts/what-are-agents/

import { Render } from "~/components";

### 什么是 Agents？

Agent 是一个 AI 系统，能够通过对工具使用和流程控制做出决策来自主执行任务。与遵循预定义路径的传统自动化不同，Agents 可以根据上下文和中间结果动态调整其方法。Agents 也与副驾驶（例如传统聊天应用程序）不同，它们可以完全自动化任务，而不是简单地增强和扩展人类输入。

- **Agents** → 非线性、非确定性（每次运行都可能改变）
- **Workflows** → 线性、确定性执行路径
- **Co-pilots** → 需要人工干预的增强 AI 助手

### 示例：预订假期

如果这是您第一次使用或与 Agents 交互，这个例子将说明 Agent 在预订假期这样的情境中是如何工作的。如果您已经熟悉这个主题，请继续阅读。

想象您正在尝试预订假期。您需要研究航班、寻找酒店、查看餐厅评价并跟踪您的预算。

#### 传统工作流自动化

传统自动化系统遵循预定的序列：

- 接受特定输入（日期、位置、预算）
- 按固定顺序调用预定义的 API 端点
- 基于硬编码标准返回结果
- 当出现意外情况时无法适应

![传统工作流自动化图表](~/assets/images/agents/workflow-automation.svg)

#### AI Co-pilot

Co-pilot 作为智能助手：

- 基于您的偏好提供酒店和行程推荐
- 能够理解和响应自然语言查询
- 提供指导和建议
- 需要人类决策和执行操作

![Co-pilot 图表](~/assets/images/agents/co-pilot.svg)

#### Agent

Agent 结合了 AI 的判断能力和调用相关工具来执行任务的能力。Agent 的输出将是非确定性的，考虑到：

- 实时可用性和价格变化
- 约束条件的动态优先级排序
- 从故障中恢复的能力
- 基于中间结果的自适应决策

![Agent 图表](~/assets/images/agents/agent-workflow.svg)

Agent 可以动态生成行程并执行预订，类似于您对旅行社的期望。

### Agent 系统的三个主要组成部分：

- **决策引擎**: 通常是 LLM（大语言模型），决定行动步骤
- **工具集成**: Agent 可以利用的 API、函数和服务
- **记忆系统**: 维护上下文并跟踪任务进度

#### Agents 如何工作

Agents 在以下连续循环中运行：

1. **观察** 当前状态或任务
2. **规划** 要采取的行动，使用 AI 进行推理
3. **执行** 使用可用工具执行这些行动（通常是 API 或 [MCPs](https://modelcontextprotocol.io/introduction)）
4. **学习** 从结果中学习（将结果存储在记忆中，更新任务进度，并为下一次迭代做准备）

---

# 工具

URL: https://developers.cloudflare.com/agents/concepts/tools/

### 什么是工具？

工具使 AI 系统能够与外部服务交互并执行操作。它们为 Agents 和工作流提供了调用 API、操作数据以及与外部系统集成的结构化方式。工具在 AI 决策能力和现实世界行动之间架起了桥梁。

### 理解工具

在 AI 系统中，工具通常实现为 AI 可以用来完成特定任务的函数调用。例如，旅行预订 Agent 可能有以下工具：

- 搜索航班可用性
- 查询酒店价格
- 处理付款
- 发送确认邮件

每个工具都有一个定义的接口，指定其输入、输出和预期行为。这使得 AI 系统能够理解何时以及如何适当地使用每个工具。

### 常见工具模式

#### API 集成工具

最常见的工具类型是那些包装外部 API 的工具。这些工具处理 API 认证、请求格式化和响应解析的复杂性，为 AI 系统提供一个清洁的接口。

#### Model Context Protocol (MCP)

[Model Context Protocol](https://modelcontextprotocol.io/introduction) 提供了定义和与工具交互的标准化方式。可以将其视为为 LLM 与外部资源交互而设计的 API 抽象层。MCP 为以下方面定义了一致的接口：

- **工具发现**: 系统可以动态发现可用的工具
- **参数验证**: 工具使用 JSON Schema 指定其输入要求
- **错误处理**: 标准化的错误报告和恢复
- **状态管理**: 工具可以在调用之间维护状态

#### 数据处理工具

处理数据转换和分析的工具对许多 AI 工作流都是必需的。这些可能包括：

- CSV 解析和分析
- 图像处理
- 文本提取
- 数据验证

---

# 工作流

URL: https://developers.cloudflare.com/agents/concepts/workflows/

import { Render } from "~/components";

## 什么是工作流？

工作流是协调 Agent 组件如何协同工作的编排层。它定义了任务处理、工具调用和结果管理的结构化路径。虽然 Agents 动态决定要做什么，但工作流提供了管理这些决策如何执行的底层框架。

### 理解 Agent 系统中的工作流

将工作流想象成公司的操作程序。公司（Agent）可以做出各种决策，但这些决策的实施方式遵循既定的流程（工作流）。例如，当您通过旅行社预订航班时，他们可能对推荐哪些航班做出不同的决策，但实际预订航班的过程遵循固定的步骤序列。

让我们检查一个基本的 Agent 工作流：

### 工作流的核心组成部分

工作流通常由几个关键元素组成：

1. **输入处理**
   工作流定义了输入在被 Agent 处理之前如何被接收和验证。这包括标准化格式、检查权限，以及确保所有必需信息都存在。
2. **工具集成**
   工作流管理如何访问外部工具和服务。它们处理认证、速率限制、错误恢复，并确保工具以正确的顺序使用。
3. **状态管理**
   工作流维护正在进行的流程状态，跟踪多个步骤的进度，并确保操作的一致性。
4. **输出处理**
   来自 Agent 行动的结果根据定义的规则进行处理，无论是存储数据、触发通知还是格式化响应。

---

# 指南

URL: https://developers.cloudflare.com/agents/guides/

import { DirectoryListing } from "~/components";

<DirectoryListing />

---

# 构建远程 MCP 服务器

URL: https://developers.cloudflare.com/agents/guides/remote-mcp-server/

import { Details, Render, PackageManagers } from "~/components";

## 部署您的第一个 MCP 服务器

本指南将向您展示如何在 Cloudflare 上部署您自己的远程 MCP 服务器，提供两种选择：

- **无身份验证** — 任何人都可以连接和使用服务器（无需登录）。
- **有[身份验证和授权](/agents/guides/remote-mcp-server/#add-authentication)** — 用户在访问工具之前需要登录，您可以根据用户的权限控制 Agent 可以调用哪些工具。

您可以从部署一个无身份验证的[公共 MCP 服务器](https://github.com/cloudflare/ai/tree/main/demos/remote-mcp-authless)开始，然后稍后添加用户身份验证和范围授权。如果您已经知道您的服务器需要身份验证，可以跳到[下一节](/agents/guides/remote-mcp-server/#add-authentication)。

下面的按钮将引导您完成将此[示例 MCP 服务器](https://github.com/cloudflare/ai/tree/main/demos/remote-mcp-authless)部署到您的 Cloudflare 账户所需的所有操作：

[![部署到 Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/ai/tree/main/demos/remote-mcp-authless)

部署后，此服务器将在您的 workers.dev 子域名上运行（例如 remote-mcp-server-authless.your-account.workers.dev/sse）。您可以立即使用 [AI Playground](https://playground.ai.cloudflare.com/)（远程 MCP 客户端）、[MCP inspector](https://github.com/modelcontextprotocol/inspector) 或[其他 MCP 客户端](/agents/guides/remote-mcp-server/#connect-your-remote-mcp-server-to-claude-and-other-mcp-clients-via-a-local-proxy)连接到它。然后，一旦您准备好了，就可以自定义 MCP 服务器并添加您自己的[工具](/agents/model-context-protocol/tools/)。

如果您使用"部署到 Cloudflare"按钮，将在您的 GitHub 或 GitLab 账户上为您的 MCP 服务器设置一个新的 git 仓库，配置为每次您推送更改或将拉取请求合并到仓库的主分支时自动部署到 Cloudflare。然后您可以克隆此仓库，[进行本地开发](/agents/guides/remote-mcp-server/#local-development)，并开始编写代码和构建。

### 通过 CLI 设置和部署您的 MCP 服务器

或者，您可以使用如下所示的命令行在本地机器上创建新的 MCP 服务器。

<PackageManagers
	type="create"
	pkg="cloudflare@latest"
	args={"my-mcp-server --template=cloudflare/ai/demos/remote-mcp-authless"}
/>

现在，您已经设置了 MCP 服务器，依赖项已安装。进入该项目文件夹：

```sh
cd my-mcp-server
```

#### 本地开发

在新项目的目录中，运行以下命令启动开发服务器：

```sh
npm start
```

您的 MCP 服务器现在运行在 `http://localhost:8787/sse`。

在新终端中，运行 [MCP inspector](https://github.com/modelcontextprotocol/inspector)。MCP inspector 是一个交互式 MCP 客户端，允许您连接到 MCP 服务器并从 Web 浏览器调用工具。

```sh
npx @modelcontextprotocol/inspector@latest
```

在 Web 浏览器中打开 MCP inspector：

```sh
open http://localhost:5173
```

在 inspector 中，输入您的 MCP 服务器的 URL `http://localhost:8787/sse`，然后点击 **Connect**。您应该看到"List Tools"按钮，它将列出您的 MCP 服务器公开的工具。

![MCP inspector — 已认证](~/assets/images/agents/mcp-inspector-authenticated.png)

#### 部署您的 MCP 服务器

您可以在示例项目中使用以下 [Wrangler CLI 命令](/workers/wrangler)将 MCP 服务器部署到 Cloudflare：

```sh
npx wrangler@latest deploy
```

如果您已经[将 git 仓库连接](/workers/ci-cd/builds/)到带有 MCP 服务器的 Worker，您可以通过推送更改或将拉取请求合并到仓库的主分支来部署您的 MCP 服务器。

部署后，获取您已部署的 MCP 服务器的 URL，并在运行在 `http://localhost:5173` 上的 MCP inspector 中输入它。您现在有了一个部署到 Cloudflare 的远程 MCP 服务器，MCP 客户端可以连接到它。

### 通过本地代理将您的远程 MCP 服务器连接到 Claude 和其他 MCP 客户端

现在您的 MCP 服务器正在运行，您可以使用 [`mcp-remote` 本地代理](https://www.npmjs.com/package/mcp-remote)将 Claude Desktop 或其他 MCP 客户端连接到它——即使这些工具还不是*远程* MCP 客户端，并且在客户端不支持远程传输或授权。这让您可以测试与真正的 MCP 客户端交互时您的 MCP 服务器的交互体验。

更新您的 Claude Desktop 配置以指向您的 MCP 服务器的 URL。您可以使用 `localhost:8787/sse` URL 或您已部署的 MCP 服务器的 URL：

```json
{
	"mcpServers": {
		"math": {
			"command": "npx",
			"args": [
				"mcp-remote",
				"https://your-worker-name.your-account.workers.dev/sse"
			]
		}
	}
}
```

更新配置文件后重启 Claude Desktop 以加载 MCP 服务器。完成后，Claude 将能够调用您的远程 MCP 服务器。您可以通过让 Claude 使用您的工具之一来测试这一点。例如："您能使用数学工具将 23 和 19 相加吗？"。Claude 应该调用该工具并显示 MCP 服务器生成的结果。

在[此节](/agents/guides/test-remote-mcp-server)中了解更多关于将远程 MCP 服务器与 MCP 客户端一起使用的其他方式。

## 添加身份验证

现在您已经部署了公共 MCP 服务器，让我们看看如何使用 OAuth 启用用户身份验证。

您之前部署的公共服务器示例允许任何客户端连接并调用工具而无需登录。要添加身份验证，您将更新您的 MCP 服务器以充当 OAuth 提供商，处理安全登录流程并颁发 MCP 客户端可以用来进行经过身份验证的工具调用的访问令牌。

如果用户已经需要登录才能使用您的服务，这特别有用。启用身份验证后，用户可以使用其现有账户登录，并授权其 AI Agent 使用范围权限与您的 MCP 服务器公开的工具进行交互。

在此示例中，我们使用 GitHub 作为 OAuth 提供商，但您可以将您的 MCP 服务器与任何支持 OAuth 2.0 规范的 [OAuth 提供商](/agents/model-context-protocol/authorization/#2-third-party-oauth-provider)连接，包括 Google、Slack、[Stytch](/agents/model-context-protocol/authorization/#stytch)、[Auth0](/agents/model-context-protocol/authorization/#stytch)、[WorkOS](/agents/model-context-protocol/authorization/#stytch) 等。

### 步骤 1 — 创建和部署新的 MCP 服务器

运行以下命令创建新的 MCP 服务器：

<PackageManagers
	type="create"
	pkg="cloudflare@latest"
	args={
		"my-mcp-server-github-auth --template=cloudflare/ai/demos/remote-mcp-github-oauth"
	}
/>

现在，您已经设置了 MCP 服务器，依赖项已安装。进入该项目文件夹：

```sh
cd my-mcp-server-github-auth
```

然后，运行以下命令部署 MCP 服务器：

```sh
npx wrangler@latest deploy
```

您会注意到，在示例 MCP 服务器中，如果您打开 `src/index.ts`，主要区别是 `defaultHandler` 设置为 `GitHubHandler`：

```ts ins="OAuthProvider.GitHubHandler"
import GitHubHandler from "./github-handler";

export default new OAuthProvider({
	apiRoute: "/sse",
	apiHandler: MyMCP.Router,
	defaultHandler: GitHubHandler,
	authorizeEndpoint: "/authorize",
	tokenEndpoint: "/token",
	clientRegistrationEndpoint: "/register",
});
```

这将确保您的用户被重定向到 GitHub 进行身份验证。但是要使其工作，您需要在以下步骤中创建 OAuth 客户端应用程序。

### 步骤 2 — 创建 OAuth 应用程序

您需要创建两个 [GitHub OAuth 应用程序](https://docs.github.com/en/apps/oauth-apps/building-oauth-apps/creating-an-oauth-app)以使用 GitHub 作为 MCP 服务器的身份验证提供商——一个用于本地开发，一个用于生产。

#### 首先为本地开发创建新的 OAuth 应用程序

导航到 [github.com/settings/developers](https://github.com/settings/developers) 创建具有以下设置的新 OAuth 应用程序：

- **Application name**: `My MCP Server (local)`
- **Homepage URL**: `http://localhost:8787`
- **Authorization callback URL**: `http://localhost:8787/callback`

对于您刚创建的 OAuth 应用程序，将 OAuth 应用程序的客户端 ID 添加为 `GITHUB_CLIENT_ID`，生成客户端密钥，将其添加为 `GITHUB_CLIENT_SECRET` 到项目根目录的 `.dev.vars` 文件中，该文件[将用于在本地开发中设置密钥](/workers/configuration/secrets/)。

```sh
touch .dev.vars
echo 'GITHUB_CLIENT_ID="your-client-id"' >> .dev.vars
echo 'GITHUB_CLIENT_SECRET="your-client-secret"' >> .dev.vars
cat .dev.vars
```

#### 接下来，在本地运行您的 MCP 服务器

运行以下命令启动开发服务器：

```sh
npm start
```

您的 MCP 服务器现在运行在 `http://localhost:8787/sse`。

在新终端中，运行 [MCP inspector](https://github.com/modelcontextprotocol/inspector)。MCP inspector 是一个交互式 MCP 客户端，允许您连接到 MCP 服务器并从 Web 浏览器调用工具。

```sh
npx @modelcontextprotocol/inspector@latest
```

在 Web 浏览器中打开 MCP inspector：

```sh
open http://localhost:5173
```

在 inspector 中，输入您的 MCP 服务器的 URL `http://localhost:8787/sse`，然后点击 **Connect**：

You should be redirected to a GitHub login or authorization page. After authorizing the MCP Client (the inspector) access to your GitHub account, you will be redirected back to the inspector. You should see the "List Tools" button, which will list the tools that your MCP server exposes.

#### Second — create a new OAuth App for production

You'll need to repeat these steps to create a new OAuth App for production.

Navigate to [github.com/settings/developers](https://github.com/settings/developers) to create a new OAuth App with the following settings:

- **Application name**: `My MCP Server (production)`
- **Homepage URL**: Enter the workers.dev URL of your deployed MCP server (ex: `worker-name.account-name.workers.dev`)
- **Authorization callback URL**: Enter the `/callback` path of the workers.dev URL of your deployed MCP server (ex: `worker-name.account-name.workers.dev/callback`)

For the OAuth app you just created, add the client ID and client secret, using Wrangler CLI:

```sh
wrangler secret put GITHUB_CLIENT_ID
```

```sh
wrangler secret put GITHUB_CLIENT_SECRET
```

#### Finally, connect to your MCP server

Now that you've added the ID and secret of your production OAuth app, you should now be able to connect to your MCP server running at `worker-name.account-name.workers.dev/sse` using the [AI Playground](https://playground.ai.cloudflare.com/), MCP inspector or ([other MCP clients](/agents/guides/remote-mcp-server/#connect-your-mcp-server-to-claude-and-other-mcp-clients)), and authenticate with GitHub.

## Next steps

- Add [tools](/agents/model-context-protocol/tools/) to your MCP server.
- Customize your MCP Server's [authentication and authorization](/agents/model-context-protocol/authorization/).

---

# 测试远程 MCP 服务器

URL: https://developers.cloudflare.com/agents/guides/test-remote-mcp-server/

import { Render } from "~/components";

远程授权连接是 [Model Context Protocol (MCP) 规范](https://spec.modelcontextprotocol.io/specification/draft/basic/authorization/)不断发展的一部分。并非所有 MCP 客户端都支持远程连接。

本指南将向您展示如何开始使用支持远程连接的 MCP 客户端使用您的远程 MCP 服务器的选项。如果您尚未创建和部署远程 MCP 服务器，您应该首先遵循[构建远程 MCP 服务器](/agents/guides/remote-mcp-server/)指南。

## Model Context Protocol (MCP) 检查器

[`@modelcontextprotocol/inspector` 包](https://github.com/modelcontextprotocol/inspector)是 MCP 服务器的可视化测试工具。

您可以通过运行以下命令在本地运行它：

```bash
npx @modelcontextprotocol/inspector
```

然后，输入您的远程 MCP 服务器的 URL。您可以使用在本地机器上运行在 localhost 的 MCP 服务器，或者您可以使用在 Cloudflare 上运行的远程 MCP 服务器。

![MCP 检查器](~/assets/images/agents/mcp-inspector-enter-url.png)

一旦您通过身份验证，您将被重定向回检查器。您应该看到"List Tools"按钮，它将列出您的 MCP 服务器公开的工具。

![MCP 检查器 — 已认证](~/assets/images/agents/mcp-inspector-authenticated.png)

## 通过本地代理将您的远程 MCP 服务器连接到 Claude Desktop

即使 [Claude Desktop](https://claude.ai/download) 尚未支持远程 MCP 客户端，您也可以使用 [`mcp-remote` 本地代理](https://www.npmjs.com/package/mcp-remote)将其连接到您的远程 MCP 服务器。这让您可以测试与真实世界 MCP 客户端的远程 MCP 服务器交互体验。

1. 打开 Claude Desktop 并导航到 Settings -> Developer -> Edit Config。这会打开控制 Claude 可以访问哪些 MCP 服务器的配置文件。
2. 将内容替换为这样的配置：

```json
{
	"mcpServers": {
		"math": {
			"command": "npx",
			"args": ["mcp-remote", "http://my-mcp-server.my-account.workers.dev/sse"]
		}
	}
}
```

这告诉 Claude 与运行在 `http://localhost:8787/sse` 的 MCP 服务器通信。

3. 保存文件并重启 Claude Desktop（command/ctrl + R）。当 Claude 重启时，浏览器窗口将打开显示您的 OAuth 登录页面。完成授权流程以授予 Claude 访问您的 MCP 服务器的权限。

一旦通过身份验证，您将能够通过点击 Claude 界面右下角的工具图标来查看您的工具。

## 将您的远程 MCP 服务器连接到 Cursor

要将 [Cursor](https://www.cursor.com/) 与您的远程 MCP 服务器连接，选择 `Type`: "Command"，在 `Command` 字段中，将 command 和 args 字段合并为一个（例如 `npx mcp-remote https://your-worker-name.your-account.workers.dev/sse`）。

## 将您的远程 MCP 服务器连接到 Windsurf

您可以通过编辑 [`mcp_config.json` 文件](https://docs.codeium.com/windsurf/mcp)并添加以下配置，将您的远程 MCP 服务器连接到 [Windsurf](https://codeium.com/windsurf)：

```json
{
	"mcpServers": {
		"math": {
			"command": "npx",
			"args": ["mcp-remote", "http://my-mcp-server.my-account.workers.dev/sse"]
		}
	}
}
```

---

# 快速开始

URL: https://developers.cloudflare.com/agents/getting-started/

import { DirectoryListing } from "~/components";

<DirectoryListing />

---

# 测试您的 Agents

URL: https://developers.cloudflare.com/agents/getting-started/testing-your-agent/

import { Render, PackageManagers, WranglerConfig } from "~/components";

因为 Agents 运行在 Cloudflare Workers 和 Durable Objects 上，所以可以使用与 Workers 和 Durable Objects 相同的工具和技术来测试它们。

## 编写和运行测试

### 设置

:::note

`agents-starter` 模板和新的 Cloudflare Workers 项目已经包含了相关的 `vitest` 和 `@cloudflare/vitest-pool-workers` 包，以及有效的 `vitest.config.js` 文件。

:::

在编写第一个测试之前，安装必要的包：

```sh
npm install vitest@~3.0.0 --save-dev --save-exact
npm install @cloudflare/vitest-pool-workers --save-dev
```

确保您的 `vitest.config.js` 文件与以下内容相同：

```js
import { defineWorkersConfig } from "@cloudflare/vitest-pool-workers/config";

export default defineWorkersConfig({
	test: {
		poolOptions: {
			workers: {
				wrangler: { configPath: "./wrangler.toml" },
			},
		},
	},
});
```

### 添加 Agent 配置

在 `vitest.config.js` 中添加 `durableObjects` 配置，包含您的 Agent 类的名称：

```js
import { defineWorkersConfig } from "@cloudflare/vitest-pool-workers/config";

export default defineWorkersConfig({
	test: {
		poolOptions: {
			workers: {
				main: "./src/index.ts",
				miniflare: {
					durableObjects: {
						NAME: "MyAgent",
					},
				},
			},
		},
	},
});
```

### 编写测试

:::note

查看 [Vitest 文档](https://vitest.dev/) 了解更多关于测试的信息，包括测试 API 参考和高级测试技术。

:::

测试使用 `vitest` 框架。您的 Agent 的基本测试套件可以验证您的 Agent 如何响应请求，但也可以对您的 Agent 的方法和状态进行单元测试。

```ts
import {
	env,
	createExecutionContext,
	waitOnExecutionContext,
	SELF,
} from "cloudflare:test";
import { describe, it, expect } from "vitest";
import worker from "../src";
import { Env } from "../src";

interface ProvidedEnv extends Env {}

describe("向我的 Agent 发出请求", () => {
	// 单元测试方法
	it("响应状态", async () => {
		// 提供一个有效的 URL，您的 Worker 可以使用它来路由到您的 Agent
		// 如果您使用 routeAgentRequest，这将是 /agent/:agent/:name
		const request = new Request<unknown, IncomingRequestCfProperties>(
			"http://example.com/agent/my-agent/agent-123",
		);
		const ctx = createExecutionContext();
		const response = await worker.fetch(request, env, ctx);
		await waitOnExecutionContext(ctx);
		expect(await response.text()).toMatchObject({ hello: "from your agent" });
	});

	it("也响应状态", async () => {
		const request = new Request("http://example.com/agent/my-agent/agent-123");
		const response = await SELF.fetch(request);
		expect(await response.text()).toMatchObject({ hello: "from your agent" });
	});
});
```

### 运行测试

运行测试使用 `vitest` CLI：

```sh
$ npm run test
# 或直接运行 vitest
$ npx vitest
```

```sh output
  MyAgent
    ✓ 应该返回问候语 (1 ms)

Test Files  1 passed (1)
```

查看[测试文档](/workers/testing/vitest-integration/write-your-first-test/) 了解更多示例和测试配置。

## 本地运行 Agents

您也可以使用 `wrangler` CLI 在本地运行 Agent：

```sh
$ npx wrangler dev
```

```sh output
Your Worker and resources are simulated locally via Miniflare. For more information, see: https://developers.cloudflare.com/workers/testing/local-development.

Your worker has access to the following bindings:
- Durable Objects:
  - MyAgent: MyAgent
  Starting local server...
[wrangler:inf] Ready on http://localhost:53645
```

这会启动一个本地开发服务器，运行与 Cloudflare Workers 相同的运行时，让您可以迭代 Agent 的代码并在不部署的情况下本地测试。

访问 [`wrangler dev`](https://developers.cloudflare.com/workers/wrangler/commands/#dev) 文档以查看 CLI 标志和配置选项。

---

# McpAgent — API 参考

URL: https://developers.cloudflare.com/agents/model-context-protocol/mcp-agent-api/

import { Render, TypeScriptExample } from "~/components";

当您在 Cloudflare 上构建 MCP 服务器时，您需要扩展来自 Agents SDK 的 [`McpAgent` 类](https://github.com/cloudflare/agents/blob/5881c5d23a7f4580600029f69307cfc94743e6b8/packages/agents/src/mcp.ts)，如下所示：

<TypeScriptExample>

```ts title="src/index.ts"
import { McpAgent } from "agents/mcp";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

export class MyMCP extends McpAgent {
	server = new McpServer({ name: "Demo", version: "1.0.0" });

	async init() {
		this.server.tool(
			"add",
			{ a: z.number(), b: z.number() },
			async ({ a, b }) => ({
				content: [{ type: "text", text: String(a + b) }],
			}),
		);
	}
}
```

</TypeScriptExample>

这意味着您的 MCP 服务器的每个实例都有自己的持久状态，由 [Durable Object](/durable-objects/) 支持，拥有自己的 [SQL 数据库](/agents/api-reference/store-and-sync-state)。

您的 MCP 服务器不一定必须是一个 Agent。您可以构建无状态的 MCP 服务器，只需使用 `@modelcontextprotocol/typescript-sdk` 包为您的 MCP 服务器添加[工具](/agents/model-context-protocol/tools)。

但是，如果您希望您的 MCP 服务器能够：

- 记住以前的工具调用和它提供的响应
- 向 MCP 客户端提供游戏，记住游戏板状态、以前的移动和分数
- 缓存以前外部 API 调用的状态，以便后续工具调用可以重复使用它
- 执行 Agent 可以做的任何事情，但允许 MCP 客户端与其通信

您可以使用下面的 API 来实现这些功能。

#### 休眠支持

`McpAgent` 实例自动支持 [WebSockets 休眠](/durable-objects/best-practices/websockets/#websocket-hibernation-api)，允许有状态的 MCP 服务器在非活动期间休眠，同时保留其状态。这意味着您的 agents 只在主动处理请求时消耗计算资源，在保持完整上下文和对话历史的同时优化成本。

休眠功能默认启用，无需额外配置。

#### 身份验证和授权

McpAgent 类提供与 [OAuth Provider Library](https://github.com/cloudflare/workers-oauth-provider) 的无缝集成，用于[身份验证和授权](/agents/model-context-protocol/authorization/)。

当用户向您的 MCP 服务器进行身份验证时，他们的身份信息和令牌通过 `props` 参数提供，允许您：

- 访问特定用户的数据
- 在执行操作前检查用户权限
- 根据用户属性自定义响应
- 使用身份验证令牌代表用户向外部服务发出请求

### 状态同步 API

`McpAgent` 类提供了来自 [Agents SDK](/agents/api-reference/agents-api/) 的以下方法子集：

- [`state`](/agents/api-reference/store-and-sync-state/)
- [`initialState`](/agents/api-reference/store-and-sync-state/#set-the-initial-state-for-an-agent)
- [`setState`](/agents/api-reference/store-and-sync-state/)
- [`onStateUpdate`](/agents/api-reference/store-and-sync-state/#synchronizing-state)
- [`sql`](/agents/api-reference/agents-api/#sql-api)

:::note[会话结束后状态重置]
目前，每个客户端会话都由 `McpAgent` 类的一个实例支持。这对您是自动处理的，如[入门指南](/agents/guides/remote-mcp-server)所示。这意味着当同一客户端重新连接时，它们将开始一个新会话，状态将被重置。
:::

例如，以下代码实现了一个记住计数器值的 MCP 服务器，并在调用 `add` 工具时更新计数器：

<TypeScriptExample>

```ts title="src/index.ts"
import { McpAgent } from "agents/mcp";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

type State = { counter: number };

export class MyMCP extends McpAgent<Env, State, {}> {
	server = new McpServer({
		name: "Demo",
		version: "1.0.0",
	});

	initialState: State = {
		counter: 1,
	};

	async init() {
		this.server.resource(`counter`, `mcp://resource/counter`, (uri) => {
			return {
				contents: [{ uri: uri.href, text: String(this.state.counter) }],
			};
		});

		this.server.tool(
			"add",
			"Add to the counter, stored in the MCP",
			{ a: z.number() },
			async ({ a }) => {
				this.setState({ ...this.state, counter: this.state.counter + a });

				return {
					content: [
						{
							type: "text",
							text: String(`Added ${a}, total is now ${this.state.counter}`),
						},
					],
				};
			},
		);
	}

	onStateUpdate(state: State) {
		console.log({ stateUpdate: state });
	}
}
```

</TypeScriptExample>

### 尚不支持的 API

以下来自 Agents SDK 的 API 在 `McpAgent` 上尚不可用：

- [WebSocket API](/agents/api-reference/websockets/)（`onMessage`、`onError`、`onClose`、`onConnect`）
- [调度 API](/agents/api-reference/schedule-tasks/) `this.schedule`

---

# 授权

URL: https://developers.cloudflare.com/agents/model-context-protocol/authorization/

import { DirectoryListing } from "~/components";

在构建 [Model Context Protocol (MCP)](https://modelcontextprotocol.io) 服务器时，您需要一种允许用户登录（身份验证）的方式，以及允许他们授予 MCP 客户端访问其账户资源权限（授权）的方式。

<diagram>

</diagram>

Model Context Protocol 使用 [OAuth 2.1 的子集进行授权](https://spec.modelcontextprotocol.io/specification/draft/basic/authorization/)。OAuth 允许您的用户授予对资源的有限访问权限，而无需共享 API 密钥或其他凭据。

Cloudflare 提供了一个 [OAuth Provider Library](https://github.com/cloudflare/workers-oauth-provider)，它实现了 OAuth 2.1 协议的提供方，让您可以轻松地为 MCP 服务器添加授权功能。

您可以通过三种方式使用 OAuth Provider Library：

1. **您的 Worker 自己处理授权。** 您的 MCP 服务器在 Cloudflare 上运行，处理完整的 OAuth 流程。（[示例](/agents/guides/remote-mcp-server/)）
2. **直接与第三方 OAuth 提供方集成**，如 GitHub 或 Google。
3. **与您自己的 OAuth 提供方集成**，包括您可能已经依赖的授权即服务提供商，如 Stytch、Auth0 或 WorkOS。

以下部分描述了这些选项中的每一个，并链接到每个的可运行代码示例。

## 授权选项

### (1) 您的 MCP 服务器自己处理授权和身份验证

您的 MCP 服务器使用 [OAuth Provider Library](https://github.com/cloudflare/workers-oauth-provider)，可以处理完整的 OAuth 授权流程，无需任何第三方参与。

[Workers OAuth Provider Library](https://github.com/cloudflare/workers-oauth-provider) 是一个 Cloudflare Worker，它实现了一个 [`fetch()` 处理程序](/workers/runtime-apis/handlers/fetch/)，并处理传入的 MCP 服务器请求。

您为您的 MCP 服务器的 API、身份验证和授权逻辑以及 OAuth 端点的 URI 路径提供您自己的处理程序，如下所示：

```ts
export default new OAuthProvider({
	apiRoute: "/mcp",
	// 您的 MCP 服务器：
	apiHandler: MyMCPServer.Router,
	// 您的身份验证和授权处理程序：
	defaultHandler: MyAuthHandler,
	authorizeEndpoint: "/authorize",
	tokenEndpoint: "/token",
	clientRegistrationEndpoint: "/register",
});
```

参考[入门示例](/agents/guides/remote-mcp-server/)以获取 `OAuthProvider` 的完整使用示例，包含模拟身份验证流程。

在这种情况下，授权流程的工作方式如下：

```mermaid
sequenceDiagram
    participant B as User-Agent (Browser)
    participant C as MCP Client
    participant M as MCP Server (your Worker)

    C->>M: MCP Request
    M->>C: HTTP 401 Unauthorized
    Note over C: Generate code_verifier and code_challenge
    C->>B: Open browser with authorization URL + code_challenge
    B->>M: GET /authorize
    Note over M: User logs in and authorizes
    M->>B: Redirect to callback URL with auth code
    B->>C: Callback with authorization code
    C->>M: Token Request with code + code_verifier
    M->>C: Access Token (+ Refresh Token)
    C->>M: MCP Request with Access Token
    Note over C,M: Begin standard MCP message exchange
```

请记住 — [身份验证与授权是不同的](https://www.cloudflare.com/learning/access-management/authn-vs-authz/)。您的 MCP 服务器可以自己处理授权，同时仍然依赖外部身份验证服务来首先验证用户身份。[入门指南中的示例](/agents/guides/remote-mcp-server)提供了一个模拟身份验证流程。您需要实现自己的身份验证处理程序 — 要么自己处理身份验证，要么使用外部身份验证服务。

### (2) 第三方 OAuth 提供方

The [OAuth Provider Library](https://github.com/cloudflare/workers-oauth-provider) can be configured to use a third-party OAuth provider, such as GitHub or Google. You can see a complete example of this in the [GitHub example](/agents/guides/remote-mcp-server/#add-authentication).

When you use a third-party OAuth provider, you must provide a handler to the `OAuthProvider` that implements the OAuth flow for the third-party provider.

```ts ins="defaultHandler: MyAuthHandler,"
import MyAuthHandler from "./auth-handler";

export default new OAuthProvider({
	apiRoute: "/mcp",
	// Your MCP server:
	apiHandler: MyMCPServer.Router,
	// Replace this handler with your own handler for authentication and authorization with the third-party provider:
	defaultHandler: MyAuthHandler,
	authorizeEndpoint: "/authorize",
	tokenEndpoint: "/token",
	clientRegistrationEndpoint: "/register",
});
```

Note that as [defined in the Model Context Protocol specification](https://spec.modelcontextprotocol.io/specification/draft/basic/authorization/#292-flow-description) when you use a third-party OAuth provider, the MCP Server (your Worker) generates and issues its own token to the MCP client:

```mermaid
sequenceDiagram
    participant B as User-Agent (Browser)
    participant C as MCP Client
    participant M as MCP Server (your Worker)
    participant T as Third-Party Auth Server

    C->>M: Initial OAuth Request
    M->>B: Redirect to Third-Party /authorize
    B->>T: Authorization Request
    Note over T: User authorizes
    T->>B: Redirect to MCP Server callback
    B->>M: Authorization code
    M->>T: Exchange code for token
    T->>M: Third-party access token
    Note over M: Generate bound MCP token
    M->>B: Redirect to MCP Client callback
    B->>C: MCP authorization code
    C->>M: Exchange code for token
    M->>C: MCP access token
```

Read the docs for the [Workers oAuth Provider Library](https://github.com/cloudflare/workers-oauth-provider) for more details.

### (3) Bring your own OAuth Provider

If your application already implements an OAuth Provider itself, or you use [Stytch](https://stytch.com/), [Auth0](https://auth0.com/), [WorkOS](https://workos.com/), or authorization-as-a-service provider, you can use this in the same way that you would use a third-party OAuth provider, described above in (2).

You can use the auth provider to:

- Allow users to authenticate to your MCP server through email, social logins, SSO (single sign-on), and MFA (multi-factor authentication).
- Define scopes and permissions that directly map to your MCP tools.
- Present users with a consent page corresponding with the requested permissions.
- Enforce the permissions so that agents can only invoke permitted tools.

#### Stytch

Get started with a [remote MCP server that uses Stytch](https://stytch.com/docs/guides/connected-apps/mcp-servers) to allow users to sign in with email, Google login or enterprise SSO and authorize their AI agent to view and manage their company's OKRs on their behalf. Stytch will handle restricting the scopes granted to the AI agent based on the user's role and permissions within their organization. When authorizing the MCP Client, each user will see a consent page that outlines the permissions that the agent is requesting that they are able to grant based on their role.

[![Deploy to Cloudflare](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/ai/tree/main/demos/mcp-stytch-b2b-okr-manager)

For more consumer use cases, deploy a remote MCP server for a To Do app that uses Stytch for authentication and MCP client authorization. Users can sign in with email and immediately access the To Do lists associated with their account, and grant access to any AI assistant to help them manage their tasks.

[![Deploy to Cloudflare](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/ai/tree/main/demos/mcp-stytch-consumer-todo-list)

#### Auth0

Get started with a remote MCP server that uses Auth0 to authenticate users through email, social logins, or enterprise SSO to interact with their todos and personal data through AI agents. The MCP server securely connects to API endpoints on behalf of users, showing exactly which resources the agent will be able to access once it gets consent from the user. In this implementation, access tokens are automatically refreshed during long running interactions.

To set it up, first deploy the protected API endpoint:

[![Deploy to Cloudflare](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/ai/tree/main/demos/remote-mcp-auth0/todos-api)

Then, deploy the MCP server that handles authentication through Auth0 and securely connects AI agents to your API endpoint.

[![Deploy to Cloudflare](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/ai/tree/main/demos/remote-mcp-auth0/mcp-auth0-oidc)

#### WorkOS

Get started with a remote MCP server that uses WorkOS's AuthKit to authenticate users and manage the permissions granted to AI agents. In this example, the MCP server dynamically exposes tools based on the user's role and access rights. All authenticated users get access to the `add` tool, but only users who have been assigned the `image_generation` permission in WorkOS can grant the AI agent access to the image generation tool. This showcases how MCP servers can conditionally expose capabilities to AI agents based on the authenticated user's role and permission.

[![Deploy to Cloudflare](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/ai/tree/main/demos/remote-mcp-authkit)

## Using Authentication Context in Your MCP Server

When a user authenticates to your MCP server through Cloudflare's OAuth Provider, their identity information and tokens are made available through the `props` parameter.

```js
export class MyMCP extends McpAgent<Env, unknown, AuthContext> {
  async init() {
    this.server.tool("userInfo", "Get user information", {}, async () => ({
      content: [{ type: "text", text: `Hello, ${this.props.claims.name || "user"}!` }],
    }));
  }
}
```

The authentication context can be used for:

- Accessing user-specific data by using the user ID (this.props.claims.sub) as a key
- Checking user permissions before performing operations
- Customizing responses based on user preferences or attributes
- Using authentication tokens to make requests to external services on behalf of the user
- Ensuring consistency when users interact with your application through different interfaces (dashboard, API, MCP server)

## Implementing Permission-Based Access for MCP Tools

You can implement fine-grained authorization controls for your MCP tools based on user permissions. This allows you to restrict access to certain tools based on the user's role or specific permissions.

```js
// Create a wrapper function to check permissions
function requirePermission(permission, handler) {
  return async (request, context) => {
    // Check if user has the required permission
    const userPermissions = context.props.permissions || [];
    if (!userPermissions.includes(permission)) {
      return {
        content: [{ type: "text", text: `Permission denied: requires ${permission}` }],
        status: 403
      };
    }

    // If permission check passes, execute the handler
    return handler(request, context);
  };
}

// Use the wrapper with your MCP tools
async init() {
  // Basic tools available to all authenticated users
  this.server.tool("basicTool", "Available to all users", {}, async () => {
    // Implementation for all users
  });

  // Protected tool using the permission wrapper
  this.server.tool(
    "adminAction",
    "Administrative action requiring special permission",
    { /* parameters */ },
    requirePermission("admin", async (req) => {
      // Only executes if user has "admin" permission
      return {
        content: [{ type: "text", text: "Admin action completed" }]
      };
    })
  );

  // Conditionally register tools based on user permissions
  if (this.props.permissions?.includes("special_feature")) {
    this.server.tool("specialTool", "Special feature", {}, async () => {
      // This tool only appears for users with the special_feature permission
    });
  }
}
```

Benefits:

- Authorization check at the tool level ensures proper access control
- Allows you to define permission checks once and reuse them across tools
- Provides clear feedback to users when permission is denied
- Can choose to only present tools that the agent is able to call

## Next steps

- [Learn how to use the Workers OAuth Provider Library](https://github.com/cloudflare/workers-oauth-provider)
- Learn how to use a third-party OAuth provider, using the [GitHub](/agents/guides/remote-mcp-server/#add-authentication) example MCP server.

---

# Model Context Protocol (MCP)

URL: https://developers.cloudflare.com/agents/model-context-protocol/

您可以在 Cloudflare 上构建和部署 [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) 服务器。

## 什么是 Model Context Protocol (MCP)？

[Model Context Protocol (MCP)](https://modelcontextprotocol.io) 是一个开放标准，用于连接 AI 系统与外部应用程序。可以将 MCP 想象成 AI 应用程序的 USB-C 接口。正如 USB-C 提供了一种标准化的方式来连接您的设备与各种配件一样，MCP 提供了一种标准化的方式来连接 AI Agents 与不同的服务。

### MCP 术语

- **MCP Hosts**: AI 助手（如 [Claude](http://claude.ai) 或 [Cursor](http://cursor.com)）、AI Agents 或需要访问外部功能的应用程序。
- **MCP Clients**: 嵌入在 MCP Hosts 中的客户端，连接到 MCP 服务器并调用工具。每个 MCP 客户端实例与单个 MCP 服务器建立连接。
- **MCP Servers**: 暴露[工具](/agents/model-context-protocol/tools/)、[提示词](https://modelcontextprotocol.io/docs/concepts/prompts)和[资源](https://modelcontextprotocol.io/docs/concepts/resources)供 MCP 客户端使用的应用程序。

### 远程 vs. 本地 MCP 连接

MCP 标准支持两种操作模式：

- **远程 MCP 连接**: MCP 客户端通过互联网连接到 MCP 服务器，[使用 HTTP 和 Server-Sent Events (SSE) 建立长连接](/agents/model-context-protocol/transport/)，并使用 [OAuth](/agents/model-context-protocol/authorization/) 授权 MCP 客户端访问用户账户上的资源。
- **本地 MCP 连接**: MCP 客户端连接到同一台机器上的 MCP 服务器，使用 [stdio](https://spec.modelcontextprotocol.io/specification/draft/basic/transports/#stdio) 作为本地传输方法。

### 最佳实践

- **工具设计**: 不要将您的 MCP 服务器视为完整 API 架构的包装器。相反，构建针对特定用户目标和可靠结果优化的工具。更少但设计良好的工具往往比许多细粒度的工具表现更好，特别是对于上下文窗口较小或延迟预算紧张的 Agents。
- **范围权限**: 部署几个专注的 MCP 服务器，每个都具有严格范围的权限，可以减少过度特权访问的风险，并使管理和审计每个服务器允许做什么变得更容易。
- **工具描述**: 详细的参数描述帮助 Agents 理解如何正确使用您的工具——包括期望的值、它们如何影响行为以及任何重要约束。这减少了错误并提高了可靠性。
- **评估测试**: 使用评估测试（'evals'）来衡量 Agent 正确使用您工具的能力。在对服务器或工具描述进行任何更新后运行这些测试，以便及早发现回归并跟踪改进情况。

### 开始使用

前往[快速开始](/agents/guides/remote-mcp-server/) 指南，了解如何构建和部署您的第一个远程 MCP 服务器到 Cloudflare。

---

# 工具

URL: https://developers.cloudflare.com/agents/model-context-protocol/tools/

import { Render, TypeScriptExample } from "~/components";

Model Context Protocol (MCP) 工具是 [MCP 服务器](/agents/model-context-protocol)提供且 MCP 客户端可以调用的函数。

当您使用 `@cloudflare/model-context-protocol` 包构建 MCP 服务器时，您可以[按照 `@modelcontextprotocol/typescript-sdk` 包示例中显示的相同方式](https://github.com/modelcontextprotocol/typescript-sdk?tab=readme-ov-file#tools)定义工具。

例如，来自[此示例 MCP 服务器](https://github.com/cloudflare/ai/tree/main/demos/remote-mcp-server)的以下代码定义了一个将两个数字相加的简单 MCP 服务器：

<TypeScriptExample>
```ts title="src/index.ts"
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp";
import { McpAgent } from "agents/mcp";

export class MyMCP extends McpAgent {
	server = new McpServer({ name: "Demo", version: "1.0.0" });
	async init() {
		this.server.tool(
			"add",
			{ a: z.number(), b: z.number() },
			async ({ a, b }) => ({
				content: [{ type: "text", text: String(a + b) }],
			}),
		);
	}
}
```
</TypeScriptExample>

---

# Cloudflare 自有的 MCP 服务器

URL: https://developers.cloudflare.com/agents/model-context-protocol/mcp-servers-for-cloudflare/

import { Render } from "~/components";

Cloudflare 运行着一系列托管的远程 MCP 服务器目录，您可以在 [Claude](https://modelcontextprotocol.io/quickstart/user)、[Windsurf](https://docs.windsurf.com/windsurf/cascade/mcp)、我们自己的 [AI Playground](https://playground.ai.cloudflare.com/) 或任何[支持 MCP 的 SDK](https://github.com/cloudflare/agents/tree/main/packages/agents/src/mcp) 等客户端上使用 OAuth 进行连接。

这些 MCP 服务器允许您的 MCP 客户端从您的账户读取配置、处理信息、基于数据提出建议，甚至为您做出这些建议的更改。所有这些操作都可以跨 Cloudflare 的众多服务进行，包括应用程序开发、安全和性能。

| 服务器名称                                                                                                           | 描述                                                     | 服务器 URL                                     |
| -------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------- | ---------------------------------------------- |
| [文档服务器](https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/docs-vectorize)                      | 获取 Cloudflare 的最新参考信息                           | `https://docs.mcp.cloudflare.com/sse`          |
| [Workers Bindings 服务器](https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/workers-bindings)       | 使用存储、AI 和计算原语构建 Workers 应用程序             | `https://bindings.mcp.cloudflare.com/sse`      |
| [Workers Builds 服务器](https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/workers-builds)           | 获取洞察并管理您的 Cloudflare Workers Builds             | `https://builds.mcp.cloudflare.com/sse`        |
| [可观测性服务器](https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/workers-observability)           | 调试并深入了解您应用程序的日志和分析                     | `https://observability.mcp.cloudflare.com/sse` |
| [Radar 服务器](https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/radar)                             | 获取全球互联网流量洞察、趋势、URL 扫描和其他实用工具     | `https://radar.mcp.cloudflare.com/sse`         |
| [容器服务器](https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/sandbox-container)                   | 启动沙盒开发环境                                         | `https://containers.mcp.cloudflare.com/sse`    |
| [浏览器渲染服务器](https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/browser-rendering)             | 获取网页，将其转换为 markdown 并截图                     | `https://browser.mcp.cloudflare.com/sse`       |
| [Logpush 服务器](https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/logpush)                         | 获取 Logpush 作业健康状况的快速摘要                      | `https://logs.mcp.cloudflare.com/sse`          |
| [AI Gateway 服务器](https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/ai-gateway)                   | 搜索您的日志，获取有关提示和响应的详细信息               | `https://ai-gateway.mcp.cloudflare.com/sse`    |
| [AutoRAG 服务器](https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/autorag)                         | 列出并搜索您的 AutoRAG 上的文档                          | `https://autorag.mcp.cloudflare.com/sse`       |
| [审计日志服务器](https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/auditlogs)                       | 查询审计日志并生成报告供审查                             | `https://auditlogs.mcp.cloudflare.com/sse`     |
| [DNS 分析服务器](https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/dns-analytics)                   | 基于当前设置优化 DNS 性能并调试问题                      | `https://dns-analytics.mcp.cloudflare.com/sse` |
| [数字体验监控服务器](https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/dex-analysis)                | 快速了解您组织的关键应用程序                             | `https://dex.mcp.cloudflare.com/sse`           |
| [Cloudflare One CASB 服务器](https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/cloudflare-one-casb) | 快速识别 SaaS 应用程序的任何安全配置错误以保护用户和数据 | `https://casb.mcp.cloudflare.com/sse`          |
| [GraphQL 服务器](https://github.com/cloudflare/mcp-server-cloudflare/tree/main/apps/graphql/)                        | 使用 Cloudflare 的 GraphQL API 获取分析数据              | `https://graphql.mcp.cloudflare.com/sse`       |

查看我们的 [GitHub 页面](https://github.com/cloudflare/mcp-server-cloudflare)，了解如何在不同的 MCP 客户端中使用 Cloudflare 的远程 MCP 服务器。

---

# 传输

URL: https://developers.cloudflare.com/agents/model-context-protocol/transport/

import { Render } from "~/components";
import { TabItem, Tabs } from "~/components";

Model Context Protocol (MCP) 规范定义了三种标准的[传输机制](https://spec.modelcontextprotocol.io/specification/draft/basic/transports/)，用于客户端和服务器之间的通信：

1. **stdio，通过标准输入和标准输出进行通信** — 专为本地 MCP 连接设计。
2. **Server-Sent Events (SSE)** — 目前被大多数远程 MCP 客户端支持，但预计随时间推移将被 Streamable HTTP 替代。它需要两个端点：一个用于发送请求，另一个用于接收流式响应。
3. **Streamable HTTP** — 2025年3月[引入](https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#streamable-http)的新传输方法。它通过使用单个 HTTP 端点进行双向消息传递来简化通信。目前在远程 MCP 客户端中正在获得采用，预计将成为未来的标准传输方式。

使用 [Agents SDK](/agents) 构建的 MCP 服务器可以支持两种远程传输方法（SSE 和 Streamable HTTP），[`McpAgent` 类](https://github.com/cloudflare/agents/blob/2f82f51784f4e27292249747b5fbeeef94305552/packages/agents/src/mcp.ts)会自动处理传输配置。

## 实现远程 MCP 传输

如果您正在构建新的 MCP 服务器或在 Cloudflare 上升级现有服务器，我们建议同时支持两种远程传输方法（SSE 和 Streamable HTTP），以确保与所有 MCP 客户端的兼容性。

#### 快速开始

您可以使用 "Deploy to Cloudflare" 按钮创建一个自动支持 SSE 和 Streamable HTTP 传输方法的远程 MCP 服务器。

[![Deploy to Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/ai/tree/main/demos/remote-mcp-authless)

#### 远程 MCP 服务器（无身份验证）

如果您手动配置 MCP 服务器，以下是如何使用 `McpAgent` 类来处理两种传输方法：

<Tabs syncKey="workersExamples"> <TabItem label="JavaScript" icon="seti:javascript">

```js
export default {
  fetch(request: Request, env: Env, ctx: ExecutionContext) {
    const { pathname } = new URL(request.url);

    if (pathname.startsWith('/sse')) {
      return MyMcpAgent.serveSSE('/sse').fetch(request, env, ctx);
    }

    if (pathname.startsWith('/mcp')) {
      return MyMcpAgent.serve('/mcp').fetch(request, env, ctx);
    }
  },
};
```

</TabItem> <TabItem label="TypeScript" icon="seti:typescript">

```ts
export default {
	fetch(
		request: Request,
		env: Env,
		ctx: ExecutionContext,
	): Response | Promise<Response> {
		const { pathname } = new URL(request.url);

		if (pathname.startsWith("/sse")) {
			return MyMcpAgent.serveSSE("/sse").fetch(request, env, ctx);
		}

		if (pathname.startsWith("/mcp")) {
			return MyMcpAgent.serve("/mcp").fetch(request, env, ctx);
		}

		// Handle case where no path matches
		return new Response("Not found", { status: 404 });
	},
};
```

 </TabItem>
<TabItem label="Hono" icon="seti:typescript">
```ts
const app = new Hono()

app.mount('/sse', MyMCP.serveSSE('/sse').fetch, { replaceRequest: false })
app.mount('/mcp', MyMCP.serve('/mcp').fetch, { replaceRequest: false )

export default app
```
</TabItem> </Tabs>

#### 带身份验证的 MCP 服务器

如果您的 MCP 服务器使用 [Workers OAuth Provider](https://github.com/cloudflare/workers-oauth-provider) 库实现身份验证和授权，那么您可以使用 `apiHandlers` 属性将其配置为支持两种传输方法。

```js
export default new OAuthProvider({
	apiHandlers: {
		"/sse": MyMCP.serveSSE("/sse"),
		"/mcp": MyMCP.serve("/mcp"),
	},
	// ... other OAuth configuration
});
```

### 升级现有的远程 MCP 服务器

如果您已经使用 Cloudflare Agents SDK 构建了远程 MCP 服务器，请进行以下更改以支持新的 Streamable HTTP 传输，同时保持与使用 SSE 的远程 MCP 客户端的兼容性：

- 对现有的 SSE 传输使用 `MyMcpAgent.serveSSE('/sse')`。以前，这会是 `MyMcpAgent.mount('/sse')`，它已被保留作为别名。
- 使用 `MyMcpAgent.serve('/mcp')` 添加新路径以支持新的 Streamable HTTP 传输。

如果您有一个使用 Workers OAuth Provider 进行身份验证/授权的 MCP 服务器，请[更新配置](/agents/model-context-protocol/transport/#mcp-server-with-authentication)以使用 `apiHandlers` 属性，它替代了 `apiRoute` 和 `apiHandler`。

:::note
要使用 apiHandlers，请更新到 @cloudflare/workers-oauth-provider v0.0.4 或更高版本。
:::

通过这些少量更改，您的 MCP 服务器将支持两种传输方法，使其与现有和新客户端兼容。

### 使用 MCP 客户端进行测试

虽然大多数 MCP 客户端尚未采用新的 Streamable HTTP 传输，但您可以使用 [`mcp-remote`](https://www.npmjs.com/package/mcp-remote) 立即开始测试，这是一个适配器，让原本只支持本地连接的 MCP 客户端可以与远程 MCP 服务器协作。

按照[此指南](/agents/guides/test-remote-mcp-server/)获取如何使用 [`mcp-remote` 本地代理](https://www.npmjs.com/package/mcp-remote)从 Claude Desktop、Cursor、Windsurf 和其他本地 MCP 客户端连接到远程 MCP 服务器的说明。

---

# 平台

URL: https://developers.cloudflare.com/agents/platform/

import { DirectoryListing } from "~/components";

<DirectoryListing />

---

# 限制

URL: https://developers.cloudflare.com/agents/platform/limits/

import { Render } from "~/components";

适用于编写、部署和运行 Agents 的限制详细说明如下。

许多限制继承自应用于 Workers 脚本和/或 Durable Objects 的限制，详情请参见 [Workers 限制](/workers/platform/limits/) 文档。

| 功能                                  | 限制                                                   |
| ------------------------------------- | ------------------------------------------------------ |
| 每个账户最大并发（运行中）Agents 数量 | 数千万个+ [^1]                                         |
| 每个账户最大定义数量                  | ~250,000+ [^2]                                         |
| 每个唯一 Agent 最大存储状态           | 1 GB                                                   |
| 每个 Agent 最大计算时间               | 30 秒（每次 HTTP 请求/传入 WebSocket 消息时刷新） [^3] |
| 每步持续时间（实际时间） [^3]         | 无限制（例如，等待数据库调用或 LLM 响应）              |

---

[^1]: 是的，真的。您可以同时运行数千万个 Agents，因为每个 Agent 都映射到一个[唯一的 Durable Object](/durable-objects/what-are-durable-objects/)（角色）。

[^2]: 您可以[每个账户部署最多 500 个脚本](/workers/platform/limits/)，但每个脚本（项目）可以定义多个 Agents。在 [Workers 付费计划](/workers/platform/pricing/#workers) 上，每个部署的脚本最大可达 10 MB。

[^3]: 每个 Agent 的计算（CPU）时间限制为 30 秒，但当 Agent 接收到新的 HTTP 请求、运行[计划任务](/agents/api-reference/schedule-tasks/)或传入 WebSocket 消息时，这个时间会刷新。

<Render file="limits_increase" product="workers" />

---

# Vercel AI SDK

URL: https://developers.cloudflare.com/workers-ai/configuration/ai-sdk/

import { PackageManagers } from "~/components";

Workers AI 可用于 JavaScript 和 TypeScript 代码库的 [Vercel AI SDK](https://sdk.vercel.ai/)。

## 设置

安装 [`workers-ai-provider` 提供程序](https://sdk.vercel.ai/providers/community-providers/cloudflare-workers-ai)：

<PackageManagers pkg="workers-ai-provider" />

然后，在您的 Workers 项目 Wrangler 文件中添加一个 AI 绑定：

```toml
[ai]
binding = "AI"
```

## 模型

AI SDK 可以配置为与[任何 AI 模型](/workers-ai/models/)一起使用。

```js
import { createWorkersAI } from "workers-ai-provider";

const workersai = createWorkersAI({ binding: env.AI });

// 选择任何模型：https://developers.cloudflare.com/workers-ai/models/
const model = workersai("@cf/meta/llama-3.1-8b-instruct", {});
```

## 生成文本

选择模型后，您可以从给定的提示生成文本。

```js
import { createWorkersAI } from 'workers-ai-provider';
import { generateText } from 'ai';

type Env = {
  AI: Ai;
};

export default {
  async fetch(_: Request, env: Env) {
    const workersai = createWorkersAI({ binding: env.AI });
    const result = await generateText({
      model: workersai('@cf/meta/llama-2-7b-chat-int8'),
      prompt: '写一篇关于 hello world 的 50 字短文。',
    });

    return new Response(result.text);
  },
};
```

## 流式文本

对于较长的响应，请考虑在生成完成时流式传输响应。

```js
import { createWorkersAI } from 'workers-ai-provider';
import { streamText } from 'ai';

type Env = {
  AI: Ai;
};

export default {
  async fetch(_: Request, env: Env) {
    const workersai = createWorkersAI({ binding: env.AI });
    const result = streamText({
      model: workersai('@cf/meta/llama-2-7b-chat-int8'),
      prompt: '写一篇关于 hello world 的 50 字短文。',
    });

    return result.toTextStreamResponse({
      headers: {
        // 添加这些标头以确保
        // 响应是分块和流式的
        'Content-Type': 'text/x-unknown',
        'content-encoding': 'identity',
        'transfer-encoding': 'chunked',
      },
    });
  },
};
```

## 生成结构化对象

您可以提供一个 Zod 模式来生成结构化的 JSON 响应。

```js
import { createWorkersAI } from 'workers-ai-provider';
import { generateObject } from 'ai';
import { z } from 'zod';

type Env = {
  AI: Ai;
};

export default {
  async fetch(_: Request, env: Env) {
    const workersai = createWorkersAI({ binding: env.AI });
    const result = await generateObject({
      model: workersai('@cf/meta/llama-3.1-8b-instruct'),
      prompt: '生成一份千层面食谱',
      schema: z.object({
        recipe: z.object({
          ingredients: z.array(z.string()),
          description: z.string(),
        }),
      }),
    });

    return Response.json(result.object);
  },
};
```

---

# Workers 绑定

URL: https://developers.cloudflare.com/workers-ai/configuration/bindings/

import { Type, MetaInfo, WranglerConfig } from "~/components";

## Workers

[Workers](/workers/) 提供了一个无服务器执行环境，允许您创建新应用程序或增强现有应用程序。

要将 Workers AI 与 Workers 一起使用，您必须创建一个 Workers AI [绑定](/workers/runtime-apis/bindings/)。绑定允许您的 Worker 与 Cloudflare 开发者平台上的资源（如 Workers AI）进行交互。您可以在 Cloudflare 仪表板上或通过更新您的 [Wrangler 文件](/workers/wrangler/configuration/)来创建绑定。

要将 Workers AI 绑定到您的 Worker，请将以下内容添加到您的 Wrangler 文件的末尾：

<WranglerConfig>

```toml
[ai]
binding = "AI" # 即在您的 Worker 中通过 env.AI 可用
```

</WranglerConfig>

## Pages 函数

[Pages 函数](/pages/functions/)允许您通过在 Cloudflare 网络上执行代码来构建具有 Cloudflare Pages 的全栈应用程序。函数本质上是 Workers。

要在您的 Pages 函数中配置 Workers AI 绑定，您必须使用 Cloudflare 仪表板。有关说明，请参阅 [Workers AI 绑定](/pages/functions/bindings/#workers-ai)。

## 方法

### async env.AI.run()

`async env.AI.run()` 运行一个模型。第一个参数是模型，第二个参数是一个对象。

```javascript
const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
	prompt: "What is the origin of the phrase 'Hello, World'",
});
```

**参数**

- `model` <Type text="string" /> <MetaInfo text="必需" />

  - 要运行的模型。

  **支持的选项**

  - `stream` <Type text="boolean" /> <MetaInfo text="可选" />
    - 在结果可用时返回结果流。

```javascript
const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
	prompt: "What is the origin of the phrase 'Hello, World'",
	stream: true,
});

return new Response(answer, {
	headers: { "content-type": "text/event-stream" },
});
```

---

# Hugging Face 聊天界面

URL: https://developers.cloudflare.com/workers-ai/configuration/hugging-face-chat-ui/

将 Workers AI 与 Hugging Face 提供的开源聊天界面 [Chat UI](https://github.com/huggingface/chat-ui?tab=readme-ov-file#text-embedding-models) 一起使用。

## 先决条件

您将需要以下内容：

- 一个 [Cloudflare 帐户](https://dash.cloudflare.com)
- 您的[帐户 ID](/fundamentals/account/find-account-and-zone-ids/)
- 一个用于 Workers AI 的 [API 令牌](/workers-ai/get-started/rest-api/#1-get-api-token-and-account-id)

## 设置

首先，决定如何引用您的帐户 ID 和 API 令牌（直接在您的 `.env.local` 中使用 `CLOUDFLARE_ACCOUNT_ID` 和 `CLOUDFLARE_API_TOKEN` 变量，或在端点配置中）。

然后，按照 [Chat UI GitHub 仓库](https://github.com/huggingface/chat-ui?tab=readme-ov-file#text-embedding-models)中的其余设置说明进行操作。

在设置模型时，请指定 `cloudflare` 端点。

```json
{
	"name": "nousresearch/hermes-2-pro-mistral-7b",
	"tokenizer": "nousresearch/hermes-2-pro-mistral-7b",
	"parameters": {
		"stop": ["<|im_end|>"]
	},
	"endpoints": [
		{
			"type": "cloudflare",
			// 如果未包含在 .env.local 中，则可选择指定这些
			"accountId": "your-account-id",
			"apiToken": "your-api-token"
			//
		}
	]
}
```

## 支持的模型

此模板适用于任何以 `@hf` 参数开头的[文本生成模型](/workers-ai/models/)。

---

# 配置

URL: https://developers.cloudflare.com/workers-ai/configuration/

import { DirectoryListing } from "~/components";

<DirectoryListing />

---

# OpenAI 兼容 API 端点

URL: https://developers.cloudflare.com/workers-ai/configuration/open-ai-compatibility/

import { Render } from "~/components";

<Render file="openai-compatibility" /> <br />

## 用法

### Workers AI

通常，Workers AI 要求您在 cURL 端点或 `env.AI.run` 函数中指定模型名称。

使用 OpenAI 兼容端点，您可以利用 [openai-node sdk](https://github.com/openai/openai-node) 来调用 Workers AI。这允许您通过简单地更改基本 URL 和模型名称来使用 Workers AI。

```js title="OpenAI SDK 示例"
import OpenAI from "openai";

const openai = new OpenAI({
	apiKey: env.CLOUDFLARE_API_KEY,
	baseURL: `https://api.cloudflare.com/client/v4/accounts/${env.CLOUDFLARE_ACCOUNT_ID}/ai/v1`,
});

const chatCompletion = await openai.chat.completions.create({
	messages: [{ role: "user", content: "发出一些机器人噪音" }],
	model: "@cf/meta/llama-3.1-8b-instruct",
});

const embeddings = await openai.embeddings.create({
	model: "@cf/baai/bge-large-en-v1.5",
	input: "我喜欢抹茶",
});
```

```bash title="cURL 示例"
curl --request POST \
  --url https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/v1/chat/completions \
  --header "Authorization: Bearer {api_token}" \
  --header "Content-Type: application/json" \
  --data '
    {
      "model": "@cf/meta/llama-3.1-8b-instruct",
      "messages": [
        {
          "role": "user",
          "content": "如何用三个简短的步骤制作一个木勺？请给出尽可能简短的回答"
        }
      ]
    }
'
```

### AI 网关

这些端点也与 [AI 网关](/ai-gateway/providers/workersai/#openai-compatible-endpoints)兼容。

---

# 仪表板

URL: https://developers.cloudflare.com/workers-ai/get-started/dashboard/

import { Render } from "~/components";

请按照本指南使用 Cloudflare 仪表板创建 Workers AI 应用程序。

## 先决条件

如果您还没有 [Cloudflare 帐户](https://dash.cloudflare.com/sign-up/workers-and-pages)，请注册一个。

## 设置

要创建 Workers AI 应用程序：

1. 登录 [Cloudflare 仪表板](https://dash.cloudflare.com)并选择您的帐户。
2. 转到 **计算 (Workers)** 和 **Workers & Pages**。
3. 选择**创建**。
4. 在 **从模板开始**下，选择 **LLM 应用**。选择模板后，将在仪表板中为您创建一个[AI 绑定](/workers-ai/configuration/bindings/)。
5. 查看提供的代码并选择**部署**。
6. 在其提供的 [`workers.dev`](/workers/configuration/routing/workers-dev/) 子域上预览您的 Worker。

## 开发

<Render file="dash-creation-next-steps" product="workers" />

---

# 开始使用

URL: https://developers.cloudflare.com/workers-ai/get-started/

import { DirectoryListing } from "~/components";

在 Cloudflare 上构建您的 Workers AI 项目有多种选择。要开始，请选择您喜欢的方法：

<DirectoryListing />

:::note

这些示例旨在创建新的 Workers AI 项目。有关将 Workers AI 添加到现有 Worker 的帮助，请参阅 [Workers 绑定](/workers-ai/configuration/bindings/)。

:::

---

# REST API

URL: https://developers.cloudflare.com/workers-ai/get-started/rest-api/

本指南将指导您设置和部署您的第一个 Workers AI 项目。您将使用 Workers AI REST API 来体验大型语言模型 (LLM)。

## 先决条件

如果您还没有 [Cloudflare 帐户](https://dash.cloudflare.com/sign-up/workers-and-pages)，请注册一个。

## 1. 获取 API 令牌和账户 ID

您需要您的 API 令牌和账户 ID 才能使用 REST API。

要获取这些值：

1. 登录 [Cloudflare 仪表板](https://dash.cloudflare.com)并选择您的帐户。
2. 转到 **AI** > **Workers AI**。
3. 选择**使用 REST API**。
4. 获取您的 API 令牌：
   1. 选择**创建 Workers AI API 令牌**。
   2. 查看预填信息。
   3. 选择**创建 API 令牌**。
   4. 选择**复制 API 令牌**。
   5. 保存该值以备将来使用。
5. 对于**获取账户 ID**，复制**账户 ID** 的值。保存该值以备将来使用。

:::note

如果您选择[创建 API 令牌](/fundamentals/api/get-started/create-token/)而不是使用模板，该令牌将需要 `Workers AI - 读取` 和 `Workers AI - 编辑` 的权限。

:::

## 2. 通过 API 运行模型

创建 API 令牌后，在请求中使用您的 API 令牌进行身份验证并向 API 发出请求。

您将使用[执行 AI 模型](/api/resources/ai/methods/run/)端点来运行 [`@cf/meta/llama-3.1-8b-instruct`](/workers-ai/models/llama-3.1-8b-instruct/) 模型：

```bash
curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/meta/llama-3.1-8b-instruct \
  -H 'Authorization: Bearer {API_TOKEN}' \
  -d '{ "prompt": "Where did the phrase Hello World come from" }'
```

替换 `{ACCOUNT_ID}` 和 `{API_token}` 的值。

API 响应将如下所示：

```json
{
	"result": {
		"response": "Hello, World first appeared in 1974 at Bell Labs when Brian Kernighan included it in the C programming language example. It became widely used as a basic test program due to simplicity and clarity. It represents an inviting greeting from a program to the world."
	},
	"success": true,
	"errors": [],
	"messages": []
}
```

此示例执行使用 `@cf/meta/llama-3.1-8b-instruct` 模型，但您可以使用 [Workers AI 模型目录](/workers-ai/models/)中的任何模型。如果使用其他模型，您需要将 `{model}` 替换为您想要的模型名称。

完成本指南后，您已创建了一个 Cloudflare 帐户（如果您还没有），并创建了一个授予您帐户 Workers AI 读取权限的 API 令牌。您使用终端的 cURL 命令执行了 [`@cf/meta/llama-3.1-8b-instruct`](/workers-ai/models/llama-3.1-8b-instruct/) 模型，并在 JSON 响应中收到了对您提示的回答。

## 相关资源

- [模型](/workers-ai/models/) - 浏览 Workers AI 模型目录。
- [AI SDK](/workers-ai/configuration/ai-sdk) - 了解如何与 AI 模型集成。

---

# Workers 绑定

URL: https://developers.cloudflare.com/workers-ai/get-started/workers-wrangler/

import {
	Render,
	PackageManagers,
	WranglerConfig,
	TypeScriptExample,
} from "~/components";

本指南将指导您设置和部署您的第一个 Workers AI 项目。您将使用 [Workers](/workers/)、一个 Workers AI 绑定和一个大型语言模型 (LLM) 来在 Cloudflare 全球网络上部署您的第一个由 AI 驱动的应用程序。

<Render file="prereqs" product="workers" />

## 1. 创建一个 Worker 项目

您将使用 `create-cloudflare` CLI (C3) 创建一个新的 Worker 项目。[C3](https://github.com/cloudflare/workers-sdk/tree/main/packages/create-cloudflare) 是一个命令行工具，旨在帮助您设置和部署新的应用程序到 Cloudflare。

通过运行以下命令创建一个名为 `hello-ai` 的新项目：

<PackageManagers type="create" pkg="cloudflare@latest" args={"hello-ai"} />

运行 `npm create cloudflare@latest` 将提示您安装 [`create-cloudflare` 包](https://www.npmjs.com/package/create-cloudflare)，并引导您完成设置。C3 还将安装 [Wrangler](/workers/wrangler/)，即 Cloudflare 开发者平台 CLI。

<Render
	file="c3-post-run-steps"
	product="workers"
	params={{
		category: "hello-world",
		type: "Worker only",
		lang: "TypeScript",
	}}
/>

这将创建一个新的 `hello-ai` 目录。您的新 `hello-ai` 目录将包括：

- 一个位于 `src/index.ts` 的 `"Hello World"` [Worker](/workers/get-started/guide/#3-write-code)。
- 一个 [`wrangler.jsonc`](/workers/wrangler/configuration/) 配置文件。

进入您的应用程序目录：

```sh
cd hello-ai
```

## 2. 将您的 Worker 连接到 Workers AI

您必须为您的 Worker 创建一个 AI 绑定以连接到 Workers AI。[绑定](/workers/runtime-apis/bindings/)允许您的 Worker 与 Cloudflare 开发者平台上的资源（如 Workers AI）进行交互。

要将 Workers AI 绑定到您的 Worker，请将以下内容添加到您的 Wrangler 文件的末尾：

<WranglerConfig>

```toml
[ai]
binding = "AI"
```

</WranglerConfig>

您的绑定在您的 Worker 代码中通过 [`env.AI`](/workers/runtime-apis/handlers/fetch/) 可用。

{/* <!-- TODO update this once we know if we'll have it --> */}

您还可以将 Workers AI 绑定到 Pages 函数。有关更多信息，请参阅[函数绑定](/pages/functions/bindings/#workers-ai)。

## 3. 在您的 Worker 中运行推理任务

您现在已准备好在您的 Worker 中运行推理任务。在这种情况下，您将使用一个 LLM，[`llama-3.1-8b-instruct`](/workers-ai/models/llama-3.1-8b-instruct/)，来回答一个问题。

使用以下代码更新您的 `hello-ai` 应用程序目录中的 `index.ts` 文件：

<TypeScriptExample filename="index.ts">

```ts
export interface Env {
	// 如果您在 Wrangler 配置文件中为 'binding' 设置了另一个名称，
	// 请将 "AI" 替换为您定义的变量名。
	AI: Ai;
}

export default {
	async fetch(request, env): Promise<Response> {
		const response = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
			prompt: "What is the origin of the phrase Hello, World",
		});

		return new Response(JSON.stringify(response));
	},
} satisfies ExportedHandler<Env>;
```

</TypeScriptExample>

至此，您已经为您的 Worker 创建了一个 AI 绑定，并配置了您的 Worker 以能够执行 Llama 3.1 模型。现在，您可以在全球部署之前在本地测试您的项目。

## 4. 使用 Wrangler 进行本地开发

在您的项目目录中，通过运行 [`wrangler dev`](/workers/wrangler/commands/#dev) 在本地测试 Workers AI：

```sh
npx wrangler dev
```

<Render file="ai-local-usage-charges" product="workers" />

运行 `wrangler dev` 后，系统会提示您登录。当您运行 `npx wrangler dev` 时，Wrangler 会给您一个 URL（很可能是 `localhost:8787`）来审查您的 Worker。在您访问 Wrangler 提供的 URL 后，将呈现一条类似以下示例的消息：

```json
{
	"response": "Ah, a most excellent question, my dear human friend! *adjusts glasses*\n\nThe origin of the phrase \"Hello, World\" is a fascinating tale that spans several decades and multiple disciplines. It all began in the early days of computer programming, when a young man named Brian Kernighan was tasked with writing a simple program to demonstrate the basics of a new programming language called C.\nKernighan, a renowned computer scientist and author, was working at Bell Labs in the late 1970s when he created the program. He wanted to showcase the language's simplicity and versatility, so he wrote a basic \"Hello, World!\" program that printed the familiar greeting to the console.\nThe program was included in Kernighan and Ritchie's influential book \"The C Programming Language,\" published in 1978. The book became a standard reference for C programmers, and the \"Hello, World!\" program became a sort of \"Hello, World!\" for the programming community.\nOver time, the phrase \"Hello, World!\" became a shorthand for any simple program that demonstrated the basics"
}
```

## 5. 部署您的 AI Worker

在将您的 AI Worker 全球部署之前，请通过运行以下命令使用您的 Cloudflare 帐户登录：

```sh
npx wrangler login
```

您将被引导到一个网页，要求您登录 Cloudflare 仪表板。登录后，系统会询问您是否允许 Wrangler 对您的 Cloudflare 帐户进行更改。向下滚动并选择 **允许** 以继续。

最后，部署您的 Worker，使您的项目可以在互联网上访问。要部署您的 Worker，请运行：

```sh
npx wrangler deploy
```

```sh output
https://hello-ai.<YOUR_SUBDOMAIN>.workers.dev
```

您的 Worker 将被部署到您的自定义 [`workers.dev`](/workers/configuration/routing/workers-dev/) 子域。您现在可以访问该 URL 来运行您的 AI Worker。

完成本教程后，您创建了一个 Worker，通过 AI 绑定将其连接到 Workers AI，并从 Llama 3 模型运行了一个推理任务。

## 相关资源

- [Discord 上的 Cloudflare 开发者社区](https://discord.cloudflare.com) - 通过加入 Cloudflare Discord 服务器，直接向 Cloudflare 团队提交功能请求、报告错误并分享您的反馈。
- [模型](/workers-ai/models/) - 浏览 Workers AI 模型目录。
- [AI SDK](/workers-ai/configuration/ai-sdk) - 了解如何与 AI 模型集成。

---

# 演示和架构

URL: https://developers.cloudflare.com/workers-ai/guides/demos-architectures/

import {
	ExternalResources,
	GlossaryTooltip,
	ResourcesBySelector,
} from "~/components";

Workers AI 可用于构建动态和高性能的服务。以下演示应用程序和参考架构展示了如何在您的架构中最佳地使用 Workers AI。

## 演示

探索以下 Workers AI 的<GlossaryTooltip term="demo application">演示应用程序</GlossaryTooltip>。

<ExternalResources type="apps" products={["Workers AI"]} />

## 参考架构

探索以下使用 Workers AI 的<GlossaryTooltip term="reference architecture">参考架构</GlossaryTooltip>：

<ResourcesBySelector
	types={[
		"reference-architecture",
		"design-guide",
		"reference-architecture-diagram",
	]}
	products={["Workers AI"]}
/>

---

# 指南

URL: https://developers.cloudflare.com/workers-ai/guides/

import { DirectoryListing } from "~/components";

<DirectoryListing />

---

# 使用 API 添加人工反馈

URL: https://developers.cloudflare.com/ai-gateway/evaluations/add-human-feedback-api/

本指南将引导您完成使用 Cloudflare API 将人工反馈添加到 AI 网关请求的步骤。您将学习如何检索相关的请求日志，并使用 API 提交反馈。

如果您希望通过仪表板添加人工反馈，请参阅[添加人工反馈](/ai-gateway/evaluations/add-human-feedback/)。

## 1. 创建 API 令牌

1. [创建 API 令牌](/fundamentals/api/get-started/create-token/)，具有以下权限：

- `AI 网关 - 读取`
- `AI 网关 - 编辑`

2. 获取您的[账户 ID](/fundamentals/account/find-account-and-zone-ids/)。
3. 使用该 API 令牌和账户 ID，向 Cloudflare API 发送 [`POST` 请求](/api/resources/ai_gateway/methods/create/)。

## 2. 使用 API 令牌

获得令牌后，您可以通过将其作为持有者令牌添加到授权标头中，在 API 请求中使用它。以下是在请求中使用它的示例：

```bash
curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/ai-gateway/gateways/{gateway_id}/logs" \
--header "Authorization: Bearer {your_api_token}"
```

在上面的请求中：

- 将 `{account_id}` 和 `{gateway_id}` 替换为您的特定 Cloudflare 账户和网关详细信息。
- 将 `{your_api_token}` 替换为您刚刚创建的 API 令牌。

## 3. 检索 `cf-aig-log-id`

`cf-aig-log-id` 是您要添加反馈的特定日志条目的唯一标识符。以下是获取此标识符的两种方法。

### 方法 1：在请求响应中定位 `cf-aig-log-id`

此方法允许您直接在 AI 网关返回的响应标头中找到 `cf-aig-log-id`。如果您可以访问原始 API 响应，这是最直接的方法。

以下步骤概述了如何执行此操作。

1. **向 AI 网关发出请求**：这可能是您的应用程序发送到 AI 网关的请求。一旦发出请求，响应将包含各种元数据。
2. **检查响应标头**：响应将包含一个名为 `cf-aig-log-id` 的标头。这是您提交反馈所需的标识符。

在下面的示例中，`cf-aig-log-id` 是 `01JADMCQQQBWH3NXZ5GCRN98DP`。

```json
{
	"status": "success",
	"headers": {
		"cf-aig-log-id": "01JADMCQQQBWH3NXZ5GCRN98DP"
	},
	"data": {
		"response": "Sample response data"
	}
}
```

### 方法 2：通过 API 检索 `cf-aig-log-id` (GET 请求)

如果您在响应正文中没有 `cf-aig-log-id`，或者您需要在事后访问它，您可以通过使用 [Cloudflare API](/api/resources/ai_gateway/subresources/logs/methods/list/) 查询日志来检索它。

以下步骤概述了如何执行此操作。

1. **发送 GET 请求以检索日志**：您可以查询特定时间范围或特定请求的 AI 网关日志。该请求将返回一个日志列表，每个日志都包含其自己的 `id`。
   以下是示例请求：

```bash
GET https://api.cloudflare.com/client/v4/accounts/{account_id}/ai-gateway/gateways/{gateway_id}/logs
```

将 `{account_id}` 和 `{gateway_id}` 替换为您的特定账户和网关详细信息。

2. **搜索相关日志**：在 GET 请求的响应中，找到您希望提交反馈的特定日志条目。每个日志条目都将包含 `id`。

在下面的示例中，`id` 是 `01JADMCQQQBWH3NXZ5GCRN98DP`。

```json
{
	"result": [
		{
			"id": "01JADMCQQQBWH3NXZ5GCRN98DP",
			"cached": true,
			"created_at": "2019-08-24T14:15:22Z",
			"custom_cost": true,
			"duration": 0,
			"id": "string",
			"metadata": "string",
			"model": "string",
			"model_type": "string",
			"path": "string",
			"provider": "string",
			"request_content_type": "string",
			"request_type": "string",
			"response_content_type": "string",
			"status_code": 0,
			"step": 0,
			"success": true,
			"tokens_in": 0,
			"tokens_out": 0
		}
	],
	"result_info": {
		"count": 0,
		"max_cost": 0,
		"max_duration": 0,
		"max_tokens_in": 0,
		"max_tokens_out": 0,
		"max_total_tokens": 0,
		"min_cost": 0,
		"min_duration": 0,
		"min_tokens_in": 0,
		"min_tokens_out": 0,
		"min_total_tokens": 0,
		"page": 0,
		"per_page": 0,
		"total_count": 0
	},
	"success": true
}
```

### 方法 3：通过绑定检索 `cf-aig-log-id`

您还可以使用绑定来检索 `cf-aig-log-id`，这简化了过程。以下是如何直接检索日志 ID：

```js
const resp = await env.AI.run(
	"@cf/meta/llama-3-8b-instruct",
	{
		prompt: "tell me a joke",
	},
	{
		gateway: {
			id: "my_gateway_id",
		},
	},
);

const myLogId = env.AI.aiGatewayLogId;
```

:::note[注意：]

`aiGatewayLogId` 属性将仅保存最后一次推理调用的日志 ID。

:::

## 4. 通过 PATCH 请求提交反馈

一旦您有了 API 令牌和 `cf-aig-log-id`，您就可以发送 PATCH 请求来提交反馈。使用以下 URL 格式，将 `{account_id}`、`{gateway_id}` 和 `{log_id}` 替换为您的特定详细信息：

```bash
PATCH https://api.cloudflare.com/client/v4/accounts/{account_id}/ai-gateway/gateways/{gateway_id}/logs/{log_id}
```

在请求正文中添加以下内容以提交正面反馈：

```json
{
	"feedback": 1
}
```

在请求正文中添加以下内容以提交负面反馈：

```json
{
	"feedback": -1
}
```

## 5. 验证反馈提交

您可以通过两种方式验证反馈提交：

- **通过 [Cloudflare 仪表板](https://dash.cloudflare.com)**：在 AI 网关界面上检查更新的反馈。
- **通过 API**：发送另一个 GET 请求以检索更新的日志条目并确认反馈已记录。

---

# 模型

URL: https://developers.cloudflare.com/workers-ai/models/

import ModelCatalog from "~/pages/workers-ai/models/index.astro";

<ModelCatalog />

---

# 使用 Worker 绑定添加人工反馈

URL: https://developers.cloudflare.com/ai-gateway/evaluations/add-human-feedback-bindings/

本指南解释了如何使用 Worker 绑定为 AI 网关评估提供人工反馈。

## 1. 运行 AI 评估

首先通过您的 AI 网关向 AI 模型发送一个提示。

```javascript
const resp = await env.AI.run(
	"@cf/meta/llama-3.1-8b-instruct",
	{
		prompt: "tell me a joke",
	},
	{
		gateway: {
			id: "my-gateway",
		},
	},
);

const myLogId = env.AI.aiGatewayLogId;
```

让用户与 AI 响应互动或评估它。这种互动将为您发送回 AI 网关的反馈提供信息。

## 2. 发送人工反馈

使用 [`patchLog()`](/ai-gateway/integrations/worker-binding-methods/#31-patchlog-send-feedback) 方法为 AI 评估提供反馈。

```javascript
await env.AI.gateway("my-gateway").patchLog(myLogId, {
	feedback: 1, // 所有字段都是可选的；设置适合您用例的值
	score: 100,
	metadata: {
		user: "123", // 可选的元数据以提供额外的上下文
	},
});
```

## 反馈参数解释

- `feedback`: `-1` 表示负面，`1` 表示正面，`0` 被认为未评估。
- `score`: 介于 0 和 100 之间的数字。
- `metadata`: 包含额外上下文信息的对象。

### patchLog: 发送反馈

`patchLog` 方法允许您为特定的日志 ID 发送反馈、分数和元数据。所有对象属性都是可选的，因此您可以包含参数的任意组合：

```javascript
gateway.patchLog("my-log-id", {
	feedback: 1,
	score: 100,
	metadata: {
		user: "123",
	},
});
```

返回：`Promise<void>` (确保 `await` 请求。)

---

# 使用仪表板添加人工反馈

URL: https://developers.cloudflare.com/ai-gateway/evaluations/add-human-feedback/

人工反馈是评估 AI 模型性能的一个宝贵指标。通过整合人工反馈，您可以更深入地了解模型的响应如何被感知，以及从以用户为中心的角度看其表现如何。这些反馈随后可用于评估中以计算性能指标，从而推动优化，最终增强您的 AI 应用程序的可靠性、准确性和效率。

人工反馈基于直接的人工输入来衡量数据集的性能。该指标计算为日志中收到的正面反馈（赞）的百分比，这些日志在 Cloudflare 仪表板的日志选项卡中进行注释。这种反馈通过考虑对其输出的真实世界评估来帮助改进模型性能。

本教程将指导您完成使用 [Cloudflare 仪表板](https://dash.cloudflare.com/) 在 AI 网关的评估中添加人工反馈的过程。

在下一个指南中，您可以[了解如何通过 API 添加人工反馈](/ai-gateway/evaluations/add-human-feedback-api/)。

## 1. 登录仪表板

1. 登录 [Cloudflare 仪表板](https://dash.cloudflare.com/) 并选择您的账户。
2. 转到 **AI** > **AI 网关**。

## 2. 访问日志选项卡

1. 转到**日志**。
2. 日志选项卡显示与您的数据集相关的所有日志。这些日志显示关键信息，包括：
   - 时间戳：交互发生的时间。
   - 状态：请求是成功、缓存还是失败。
   - 模型：请求中使用的模型。
   - 令牌：响应消耗的令牌数。
   - 成本：基于令牌使用量的成本。
   - 持续时间：完成响应所花费的时间。
   - 反馈：您可以在此处为每个日志提供人工反馈。

## 3. 提供人工反馈

1. 单击您要查看的日志条目。这将展开日志，让您看到更详细的信息。
2. 在展开的日志中，您可以查看其他详细信息，例如：
   - 用户提示。
   - 模型响应。
   - HTTP 响应详细信息。
   - 端点信息。
3. 您将看到两个图标：
   - 赞：表示正面反馈。
   - 踩：表示负面反馈。
4. 根据您对该特定日志条目的模型响应的评价，单击赞或踩图标。

## 4. 评估人工反馈

在为您的日志提供反馈后，它将成为评估过程的一部分。

当您运行评估时（如[设置评估](/ai-gateway/evaluations/set-up-evaluations/)指南中所述），人工反馈指标将根据收到赞反馈的日志百分比计算。

:::note[注意]

您需要选择人工反馈作为评估器才能接收其指标。

:::

## 5. 查看结果

运行评估后，在评估选项卡上查看结果。
您将能够根据成本、速度以及现在的人工反馈（表示为正面反馈（赞）的百分比）查看模型的性能。

人工反馈分数以百分比显示，显示数据库中正面评价响应的分布。

有关运行评估的更多信息，请参阅文档[设置评估](/ai-gateway/evaluations/set-up-evaluations/)。

---

# 评估

URL: https://developers.cloudflare.com/ai-gateway/evaluations/

了解应用程序的性能对优化至关重要。开发者通常有不同的优先级，找到最优解决方案涉及平衡成本、延迟和准确性等关键因素。一些人优先考虑低延迟响应，而其他人则专注于准确性或成本效率。

AI 网关的评估提供了在如何优化您的 AI 应用程序方面做出明智决策所需的数据。无论是调整模型、提供商还是提示，此功能都能提供关于性能、速度和成本关键指标的洞察。它使开发者能够更好地理解其应用程序的行为，确保提高准确性、可靠性和客户满意度。

评估使用数据集，数据集是为分析而存储的日志集合。您可以通过在日志选项卡中应用过滤器来创建数据集，这有助于缩小特定日志的范围以进行评估。

我们朝着全面 AI 评估迈出的第一步始于人工反馈（目前处于开放测试版）。我们将继续构建和扩展 AI 网关，添加更多评估器。

[了解如何设置评估](/ai-gateway/evaluations/set-up-evaluations/)，包括创建数据集、选择评估器和运行评估过程。

---

# 设置评估

URL: https://developers.cloudflare.com/ai-gateway/evaluations/set-up-evaluations/

本指南将引导您完成在 AI 网关中设置评估的过程。这些步骤在 [Cloudflare 仪表板](https://dash.cloudflare.com/) 中完成。

## 1. 选择或创建数据集

数据集是为分析而存储的日志集合，可用于评估。您可以通过在日志选项卡中应用过滤器来创建数据集。数据集将根据设置的过滤器自动更新。

### 从日志选项卡设置数据集

1. 应用过滤器以缩小日志范围。过滤器选项包括提供商、令牌数量、请求状态等。
2. 选择**创建数据集**以存储过滤后的日志以供将来分析。

您可以通过从日志选项卡中选择**管理数据集**来管理数据集。

:::note[注意]

请记住，数据集当前使用 `AND` 连接，因此每个过滤器只能有一项（例如，一个模型或一个提供商）。未来的更新将允许在数据集创建方面具有更大的灵活性。

:::

### 可用过滤器列表

| 过滤器类别 | 过滤器选项                        | 过滤器描述               |
| ---------- | --------------------------------- | ------------------------ |
| 状态       | 错误，状态                        | 错误类型或状态。         |
| 缓存       | 已缓存，未缓存                    | 基于是否被缓存。         |
| 提供商     | 特定提供商                        | 选定的 AI 提供商。       |
| AI 模型    | 特定模型                          | 选定的 AI 模型。         |
| 成本       | 小于，大于                        | 成本，指定阈值。         |
| 请求类型   | 通用，Workers AI 绑定，WebSockets | 请求的类型。             |
| 令牌       | 总令牌，输入令牌，输出令牌        | 令牌计数（小于或大于）。 |
| 持续时间   | 小于，大于                        | 请求持续时间。           |
| 反馈       | 等于，不等于（赞，踩，无反馈）    | 反馈类型。               |
| 元数据键   | 等于，不等于                      | 特定元数据键。           |
| 元数据值   | 等于，不等于                      | 特定元数据值。           |
| 日志 ID    | 等于，不等于                      | 特定日志 ID。            |
| 事件 ID    | 等于，不等于                      | 特定事件 ID。            |

## 2. 选择评估器

创建数据集后，选择评估参数：

- 成本：计算数据集中推理请求的平均成本（仅适用于具有[成本数据](/ai-gateway/observability/costs/)的请求）。
- 速度：计算数据集中推理请求的平均持续时间。
- 性能：
  - 人工反馈：基于人工反馈衡量性能，通过日志中赞成票的百分比计算，从日志选项卡中注释。

:::note[注意]

未来的更新将引入更多评估器以扩展性能分析能力。

:::

## 3. 命名、审查和运行评估

1. 为您的评估创建一个唯一的名称，以便在仪表板中引用它。
2. 审查所选的数据集和评估器。
3. 选择**运行**以开始该过程。

## 4. 审查和分析结果

评估结果将显示在评估选项卡中。结果显示评估的状态（例如，进行中、已完成或错误）。将显示所选评估器的指标，不包括任何缺少字段的日志。您还将看到用于计算每个指标的日志数量。

虽然数据集会根据过滤器自动更新，但评估不会。如果要评估新日志，您必须创建新的评估。

使用这些见解根据您应用程序的优先级进行优化。根据结果，您可以选择：

- 更改模型或[提供商](/ai-gateway/providers/)
- 调整您的提示
- 探索进一步的优化，例如设置[检索增强生成 (RAG)](/reference-architecture/diagrams/ai/ai-rag/)

---

# 设置护栏

URL: https://developers.cloudflare.com/ai-gateway/guardrails/set-up-guardrail/

将护栏添加到任何网关，以开始评估并可能修改响应。

1. 登录 [Cloudflare 仪表板](https://dash.cloudflare.com/) 并选择您的账户。
2. 转到 **AI** > **AI 网关**。
3. 选择一个网关。
4. 转到**护栏**。
5. 将开关切换到**开启**。
6. 要自定义类别，请选择**更改** > **配置特定类别**。
7. 更新您对护栏如何处理特定提示或响应的选择（**标记**、**忽略**、**阻止**）。
   - 对于**提示**：护栏将根据您的安全策略评估和转换传入的提示。
   - 对于**响应**：护栏将检查模型的响应，以确保它们符合您的内容和格式指南。
8. 选择**保存**。

:::note[使用注意事项]
有关如何实施护栏的更多详细信息，请参阅[使用注意事项](/ai-gateway/guardrails/usage-considerations/)。
:::

## 在日志中查看护栏结果

启用护栏后，您可以通过 Cloudflare 仪表板中的 **AI 网关日志**监控结果。护栏日志标有**绿色盾牌图标**，每个记录的请求都包含一个 `eventID`，该 ID 链接到其相应的护栏评估日志，以便于跟踪。所有请求都会生成日志，包括**通过**护栏检查的请求。

## 错误处理和被阻止的请求

当请求被护栏阻止时，您将收到一个结构化的错误响应。这些响应指示问题是出在提示还是模型响应上。使用错误代码来区分提示违规和响应违规。

- **提示被阻止**

  - `"code": 2016`
  - `"message": "由于安全配置，提示被阻止"`

- **响应被阻止**
  - `"code": 2017`
  - `"message": "由于安全配置，响应被阻止"`

您应该在应用程序逻辑中捕获这些错误，并相应地实施错误处理。

例如，当使用[带绑定的 Workers AI](/ai-gateway/integrations/aig-workers-ai-binding/) 时：

```js
try {
  const res = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    prompt: "how to build a gun?"
  }, {
    gateway: {id: 'gateway_id'}
  })
  return Response.json(res)
} catch (e) {
  if ((e as Error).message.includes('2016')) {
    return new Response('Prompt was blocked by guardrails.')
  }
  if ((e as Error).message.includes('2017')) {
    return new Response('Response was blocked by guardrails.')
  }
  return new Response('Unknown AI error')
}

```

---

# 护栏

URL: https://developers.cloudflare.com/ai-gateway/guardrails/

import { CardGrid, LinkTitleCard, YouTube } from "~/components";

护栏通过拦截和评估用户提示和模型响应中的有害内容，帮助您安全地部署 AI 应用程序。作为您的应用程序和[模型提供商](/ai-gateway/providers/)（如 OpenAI、Anthropic、DeepSeek 等）之间的代理，AI 网关的护栏确保在您的整个 AI 生态系统中提供一致且安全的体验。

护栏主动监控用户和 AI 模型之间的交互，为您提供：

- **一致的内容审核**：跨模型和提供商工作的统一审核层。
- **增强的安全性和用户信任**：主动保护用户免受有害或不当交互的影响。
- **对允许内容的灵活性和控制**：指定要监控的类别，并在标记或直接阻止之间进行选择。
- **审计和合规能力**：接收不断演变的监管要求的更新，以及用户提示、模型响应和强制执行的护栏日志。

## 视频演示

<YouTube id="Its1H0jTxrQ" />

## 护栏的工作原理

AI 网关通过根据预定义的安全参数评估内容来实时检查所有交互。护栏的工作原理是：

1. 拦截交互：
   AI 网关代理请求和响应，位于用户和 AI 模型之间。

2. 检查内容：

   - 用户提示：AI 网关根据安全参数（例如，暴力、仇恨或性内容）检查提示。根据您的设置，提示可以在到达模型之前被标记或阻止。
   - 模型响应：处理后，检查 AI 模型响应。如果检测到危险内容，可以在传递给用户之前标记或阻止。

3. 应用操作：
   根据您的配置，标记的内容被记录以供审查，而被阻止的内容被阻止继续进行。

## 相关资源

- [Cloudflare 博客：使用 AI 网关中的护栏保持 AI 交互安全且无风险](https://blog.cloudflare.com/guardrails-in-ai-gateway/)

---

# 支持的模型类型

URL: https://developers.cloudflare.com/ai-gateway/guardrails/supported-model-types/

AI 网关的护栏会检测正在使用的 AI 模型类型，并相应地应用安全检查：

- **文本生成模型**：对提示和响应都进行评估。
- **嵌入模型**：仅评估提示，因为响应由数字嵌入组成，对内容审核没有意义。
- **未知模型**：如果无法确定模型类型，则仅评估提示，而响应会绕过护栏。

:::note[注意]

护栏尚不支持流式响应。计划在未来的更新中支持流式传输。

:::

---

# 使用注意事项

URL: https://developers.cloudflare.com/ai-gateway/guardrails/usage-considerations/

护栏目前在 [Workers AI](/workers-ai/) 上使用 [Llama Guard 3 8B](https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/) 来执行内容评估。底层模型将来可能会更新，我们将在护栏中反映这些更改。

由于护栏在 Workers AI 上运行，启用它会产生 Workers AI 的使用量。您可以通过 Workers AI 仪表板监控使用情况。

## 其他注意事项

- **模型可用性**：如果至少一个危险类别设置为 `block`，但 AI 网关无法从 Workers AI 收到响应，则请求将被阻止。相反，如果一个危险类别设置为 `flag` 并且 AI 网关无法从 Workers AI 获得响应，则请求将继续进行而不进行评估。这种方法优先考虑可用性，即使在无法进行内容评估时也允许请求继续。
- **延迟影响**：启用护栏会增加一些延迟。启用护栏会给请求增加额外的延迟。通常，在 Workers AI 上使用 Llama Guard 3 8B 的评估会给每个请求增加大约 500 毫秒的延迟。然而，较大的请求可能会经历增加的延迟，尽管这种增加不是线性的。在平衡安全性和性能时请考虑这一点。
- **处理长内容**：在评估长提示或响应时，护栏会自动将内容分段成较小的块，通过单独的护栏请求处理每个块。这种方法确保了全面的审核，但可能会导致较长输入的延迟增加。
- **支持的语言**：Llama Guard 3.3 8B 支持以下语言的内容安全分类：英语、法语、德语、印地语、意大利语、葡萄牙语、西班牙语和泰语。
- **流式支持**：使用护栏时不支持流式传输。

:::note

Llama Guard 按“原样”提供，不作任何陈述、保证或担保。博客、开发者文档或其他参考资料中包含的任何规则或示例仅供参考。您承认并理解，您对使用 AI 网关的结果和成果负责。

:::

---

# Workers AI

URL: https://developers.cloudflare.com/ai-gateway/integrations/aig-workers-ai-binding/

import { Render, PackageManagers, WranglerConfig } from "~/components";

本指南将引导您完成 Workers AI 项目的设置和部署。您将使用 [Workers](/workers/)、AI 网关绑定和一个大型语言模型 (LLM)，在 Cloudflare 全球网络上部署您的第一个由 AI 驱动的应用程序。

## 先决条件

<Render file="prereqs" product="workers" />

## 1. 创建一个 Worker 项目

您将使用 create-Cloudflare CLI (C3) 创建一个新的 Worker 项目。C3 是一个命令行工具，旨在帮助您设置和部署新的应用程序到 Cloudflare。

通过运行以下命令创建一个名为 `hello-ai` 的新项目：

<PackageManagers type="create" pkg="cloudflare@latest" args={"hello-ai"} />

运行 `npm create cloudflare@latest` 将提示您安装 create-cloudflare 包并引导您完成设置。C3 还将安装 [Wrangler](/workers/wrangler/)，即 Cloudflare 开发者平台 CLI。

<Render
	file="c3-post-run-steps"
	product="workers"
	params={{
		category: "hello-world",
		type: "Worker only",
		lang: "TypeScript",
	}}
/>

这将创建一个新的 `hello-ai` 目录。您的新 `hello-ai` 目录将包括：

- 一个位于 `src/index.ts` 的 "Hello World" Worker。
- 一个 [Wrangler 配置文件](/workers/wrangler/configuration/)

进入您的应用程序目录：

```bash
cd hello-ai
```

## 2. 将您的 Worker 连接到 Workers AI

您必须为您的 Worker 创建一个 AI 绑定以连接到 Workers AI。绑定允许您的 Worker 与 Cloudflare 开发者平台上的资源（如 Workers AI）进行交互。

要将 Workers AI 绑定到您的 Worker，请将以下内容添加到您的 [Wrangler 配置文件](/workers/wrangler/configuration/)的末尾：

<WranglerConfig>

```toml title="wrangler.toml"
[ai]
binding = "AI"
```

</WranglerConfig>

您的绑定在您的 Worker 代码中通过 [`env.AI`](/workers/runtime-apis/handlers/fetch/) 可用。

下一步您将需要您的 `gateway id`。您可以在[本教程中了解如何创建 AI 网关](/ai-gateway/get-started/)。

## 3. 在您的 Worker 中运行包含 AI 网关的推理任务

您现在已准备好在您的 Worker 中运行推理任务。在这种情况下，您将使用一个 LLM，[`llama-3.1-8b-instruct-fast`](/workers-ai/models/llama-3.1-8b-instruct-fast/)，来回答一个问题。您的网关 ID 可以在仪表板上找到。

使用以下代码更新您的 `hello-ai` 应用程序目录中的 `index.ts` 文件：

```typescript title="src/index.ts" {78-81}
export interface Env {
	// 如果您在 [Wrangler 配置文件](/workers/wrangler/configuration/) 中为 'binding' 设置了另一个名称，
	// 请将 "AI" 替换为您定义的变量名。
	AI: Ai;
}

export default {
	async fetch(request, env): Promise<Response> {
		// 在此处指定网关标签和其他选项
		const response = await env.AI.run(
			"@cf/meta/llama-3.1-8b-instruct-fast",
			{
				prompt: "What is the origin of the phrase Hello, World",
			},
			{
				gateway: {
					id: "GATEWAYID", // 在此处使用您的网关标签
					skipCache: true, // 可选：如果需要，跳过缓存
				},
			},
		);

		// 将 AI 响应作为 JSON 对象返回
		return new Response(JSON.stringify(response), {
			headers: { "Content-Type": "application/json" },
		});
	},
} satisfies ExportedHandler<Env>;
```

至此，您已经为您的 Worker 创建了一个 AI 绑定，并配置了您的 Worker 以能够执行 Llama 3.1 模型。现在，您可以在全球部署之前在本地测试您的项目。

## 4. 使用 Wrangler 进行本地开发

在您的项目目录中，通过运行 [`wrangler dev`](/workers/wrangler/commands/#dev) 在本地测试 Workers AI：

```bash
npx wrangler dev
```

<Render file="ai-local-usage-charges" product="workers" />

运行 `wrangler dev` 后，系统会提示您登录。当您运行 `npx wrangler dev` 时，Wrangler 会给您一个 URL（很可能是 `localhost:8787`）来审查您的 Worker。在您访问 Wrangler 提供的 URL 后，您将看到类似以下示例的消息：

````json
{
  "response": "A fascinating question!\n\nThe phrase \"Hello, World!\" originates from a simple computer program written in the early days of programming. It is often attributed to Brian Kernighan, a Canadian computer scientist and a pioneer in the field of computer programming.\n\nIn the early 1970s, Kernighan, along with his colleague Dennis Ritchie, were working on the C programming language. They wanted to create a simple program that would output a message to the screen to demonstrate the basic structure of a program. They chose the phrase \"Hello, World!\" because it was a simple and recognizable message that would illustrate how a program could print text to the screen.\n\nThe exact code was written in the 5th edition of Kernighan and Ritchie's book \"The C Programming Language,\" published in 1988. The code, literally known as \"Hello, World!\" is as follows:\n\n```
main()
{
  printf(\"Hello, World!\");
}
```\n\nThis code is still often used as a starting point for learning programming languages, as it demonstrates how to output a simple message to the console.\n\nThe phrase \"Hello, World!\" has since become a catch-all phrase to indicate the start of a new program or a small test program, and is widely used in computer science and programming education.\n\nSincerely, I'm glad I could help clarify the origin of this iconic phrase for you!"
}
````

## 5. 部署您的 AI Worker

在将您的 AI Worker 全球部署之前，请通过运行以下命令使用您的 Cloudflare 账户登录：

```bash
npx wrangler login
```

您将被引导到一个网页，要求您登录 Cloudflare 仪表板。登录后，系统会询问您是否允许 Wrangler 对您的 Cloudflare 账户进行更改。向下滚动并选择 **允许** 以继续。

最后，部署您的 Worker，使您的项目可以在互联网上访问。要部署您的 Worker，请运行：

```bash
npx wrangler deploy
```

部署后，您的 Worker 将在类似以下的 URL 上可用：

```bash
https://hello-ai.<YOUR_SUBDOMAIN>.workers.dev
```

您的 Worker 将被部署到您的自定义 [`workers.dev`](/workers/configuration/routing/workers-dev/) 子域。您现在可以访问该 URL 来运行您的 AI Worker。

通过完成本教程，您创建了一个 Worker，通过 AI 网关绑定将其连接到 Workers AI，并成功使用 Llama 3.1 模型运行了一个推理任务。

---

# Vercel AI SDK

URL: https://developers.cloudflare.com/ai-gateway/integrations/vercel-ai-sdk/

[Vercel AI SDK](https://sdk.vercel.ai/) 是一个用于构建 AI 应用程序的 TypeScript 库。该 SDK 支持许多不同的 AI 提供商、流式完成工具等等。

要在 AI SDK 内部使用 Cloudflare AI 网关，您可以为大多数支持的提供商配置自定义"网关 URL"。以下是一些工作示例。

## 示例

### OpenAI

如果您在 AI SDK 中使用 `openai` 提供商，您可以使用 `createOpenAI` 创建自定义设置，传递您的 OpenAI 兼容 AI 网关 URL：

```typescript
import { createOpenAI } from "@ai-sdk/openai";

const openai = createOpenAI({
	baseURL: `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai`,
});
```

### Anthropic

如果您在 AI SDK 中使用 `anthropic` 提供商，您可以使用 `createAnthropic` 创建自定义设置，传递您的 Anthropic 兼容 AI 网关 URL：

```typescript
import { createAnthropic } from "@ai-sdk/anthropic";

const anthropic = createAnthropic({
	baseURL: `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/anthropic`,
});
```

### Google AI Studio

如果您在 AI SDK 中使用 Google AI Studio 提供商，您需要在 Google AI Studio 兼容的 AI 网关 URL 后附加 `/v1beta` 以避免错误。需要 `/v1beta` 路径，因为 Google AI Studio 的 API 在其端点结构中包含此内容，而 AI SDK 单独设置模型名称。这确保了与 Google 的 API 版本控制的兼容性。

```typescript
import { createGoogleGenerativeAI } from "@ai-sdk/google";

const google = createGoogleGenerativeAI({
	baseURL: `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/google-ai-studio/v1beta`,
});
```

### 从 AI SDK 检索 `log id`

当调用 SDK 时，您可以从响应标头访问 AI 网关 `log id`。

```typescript
const result = await generateText({
	model: anthropic("claude-3-sonnet-20240229"),
	messages: [],
});
console.log(result.response.headers["cf-aig-log-id"]);
```

### 其他提供商

对于上面未列出的其他提供商，您可以通过为任何 AI 提供商创建自定义实例并传递您的 AI 网关 URL 来遵循类似的模式。有关查找特定于您的提供商的 AI 网关 URL 的帮助，请参阅[支持的提供商页面](/ai-gateway/providers)。

---

# AI 网关绑定方法

URL: https://developers.cloudflare.com/ai-gateway/integrations/worker-binding-methods/

import { Render, PackageManagers } from "~/components";

本指南概述了如何使用最新的 Cloudflare Workers AI 网关绑定方法。您将学习如何设置 AI 网关绑定、访问新方法以及将它们集成到您的 Worker 中。

## 1. 将 AI 绑定添加到您的 Worker

要将您的 Worker 连接到 Workers AI，请将以下内容添加到您的 [Wrangler 配置文件](/workers/wrangler/configuration/)中：

import { WranglerConfig } from "~/components";

<WranglerConfig>

```toml title="wrangler.toml"
[ai]
binding = "AI"
```

</WranglerConfig>

此配置设置了可在您的 Worker 代码中作为 `env.AI` 访问的 AI 绑定。

<Render file="wrangler-typegen" product="workers" />

## 2. Workers AI + 网关的基本用法

要使用 Workers AI 和 AI 网关执行推理任务，您可以使用以下代码：

```typescript title="src/index.ts"
const resp = await env.AI.run(
	"@cf/meta/llama-3.1-8b-instruct",
	{
		prompt: "tell me a joke",
	},
	{
		gateway: {
			id: "my-gateway",
		},
	},
);
```

此外，您可以使用以下代码访问最新的请求日志 ID：

```typescript
const myLogId = env.AI.aiGatewayLogId;
```

## 3. 访问网关绑定

您可以使用以下代码访问您的 AI 网关绑定：

```typescript
const gateway = env.AI.gateway("my-gateway");
```

一旦您有了网关实例，您就可以使用以下方法：

### 3.1. `patchLog`：发送反馈

`patchLog` 方法允许您为特定的日志 ID 发送反馈、评分和元数据。所有对象属性都是可选的，因此您可以包含任何参数组合：

```typescript
gateway.patchLog("my-log-id", {
	feedback: 1,
	score: 100,
	metadata: {
		user: "123",
	},
});
```

- **返回**：`Promise<void>`（确保 `await` 请求。）
- **用例示例**：使用用户反馈或附加元数据更新日志条目。

### 3.2. `getLog`：读取日志详情

`getLog` 方法检索特定日志 ID 的详细信息。它返回一个 `Promise<AiGatewayLog>` 类型的对象。如果此类型缺失，请确保您已运行 [`wrangler types`](/workers/languages/typescript/#generate-types)。

```typescript
const log = await gateway.getLog("my-log-id");
```

- **返回**：`Promise<AiGatewayLog>`
- **用例示例**：检索日志信息以进行调试或分析。

### 3.3. `getUrl`：获取网关 URL

`getUrl` 方法允许您检索 AI 网关的基本 URL，可选择指定一个提供商以获取特定于提供商的端点。

```typescript
// 获取基本网关 URL
const baseUrl = await gateway.getUrl();
// 输出: https://gateway.ai.cloudflare.com/v1/my-account-id/my-gateway/

// 获取特定于提供商的 URL
const openaiUrl = await gateway.getUrl("openai");
// 输出: https://gateway.ai.cloudflare.com/v1/my-account-id/my-gateway/openai
```

- **参数**：可选的 `provider`（字符串或 `AIGatewayProviders` 枚举）
- **返回**：`Promise<string>`
- **用例示例**：动态构建用于直接 API 调用或调试配置的 URL。

#### SDK 集成示例

`getUrl` 方法对于与流行的 AI SDK 集成特别有用：

**OpenAI SDK：**

```typescript
import OpenAI from "openai";

const openai = new OpenAI({
	apiKey: "my api key", // 默认为 process.env["OPENAI_API_KEY"]
	baseURL: await env.AI.gateway("my-gateway").getUrl("openai"),
});
```

**Vercel AI SDK 与 OpenAI：**

```typescript
import { createOpenAI } from "@ai-sdk/openai";

const openai = createOpenAI({
	baseURL: await env.AI.gateway("my-gateway").getUrl("openai"),
});
```

**Vercel AI SDK 与 Anthropic：**

```typescript
import { createAnthropic } from "@ai-sdk/anthropic";

const anthropic = createAnthropic({
	baseURL: await env.AI.gateway("my-gateway").getUrl("anthropic"),
});
```

### 3.4. `run`：通用请求

`run` 方法允许您执行通用请求。用户可以传递单个通用请求对象或其数组。此方法支持所有 AI 网关提供商。

有关可用输入的详细信息，请参阅[通用端点文档](/ai-gateway/universal/)。

```typescript
const resp = await gateway.run({
	provider: "workers-ai",
	endpoint: "@cf/meta/llama-3.1-8b-instruct",
	headers: {
		authorization: "Bearer my-api-token",
	},
	query: {
		prompt: "tell me a joke",
	},
});
```

- **返回**：`Promise<Response>`
- **用例示例**：向任何支持的提供商执行[通用请求](/ai-gateway/universal/)。

## 结论

通过这些 AI 网关绑定方法，您现在可以：

- 使用 `patchLog` 发送反馈和更新元数据。
- 使用 `getLog` 检索详细的日志信息。
- 使用 `getUrl` 获取用于直接 API 访问的网关 URL，从而轻松与流行的 AI SDK 集成。
- 使用 `run` 执行到任何 AI 网关提供商的通用请求。

这些方法为您的 AI 集成提供了更大的灵活性和控制力，使您能够在 Cloudflare Workers 平台上构建更复杂的应用程序。

---

# 功能

URL: https://developers.cloudflare.com/workers-ai/features/

import { DirectoryListing } from "~/components";

<DirectoryListing />

---

# JSON 模式

URL: https://developers.cloudflare.com/workers-ai/features/json-mode/

import { Code } from "~/components";

export const jsonModeSchema = `{
  response_format: {
    title: "JSON 模式",
    type: "object",
    properties: {
      type: {
        type: "string",
        enum: ["json_object", "json_schema"],
      },
      json_schema: {},
    }
  }
}`;

export const jsonModeRequestExample = `{
  "messages": [
    {
      "role": "system",
      "content": "提取有关国家的数据。"
    },
    {
      "role": "user",
      "content": "告诉我关于印度的信息。"
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "capital": {
          "type": "string"
        },
        "languages": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      },
      "required": [
        "name",
        "capital",
        "languages"
      ]
    }
  }
}`;

export const jsonModeResponseExample = `{
  "response": {
    "name": "印度",
    "capital": "新德里",
    "languages": [
      "印地语",
      "英语",
      "孟加拉语",
      "泰卢固语",
      "马拉地语",
      "泰米尔语",
      "古吉拉特语",
      "乌尔都语",
      "卡纳达语",
      "奥里亚语",
      "马拉雅拉姆语",
      "旁遮普语",
      "梵语"
    ]
  }
}`;

当我们希望文本生成 AI 模型以编程方式与数据库、服务和外部系统交互时，通常在使用工具调用或构建 AI 代理时，我们必须使用结构化的响应格式而不是自然语言。

Workers AI 支持 JSON 模式，使应用程序能够在与 AI 模型交互时请求结构化的输出响应。

## 架构

JSON 模式与 OpenAI 的实现兼容；要启用，请使用以下约定将 `response_format` 属性添加到请求对象中：

<Code code={jsonModeSchema} lang="json" />

其中 `json_schema` 必须是有效的 [JSON 模式](https://json-schema.org/) 声明。

## JSON 模式示例

使用 JSON 格式时，请将架构作为请求的一部分传递给 LLM，如下例所示。

<Code code={jsonModeRequestExample} lang="json" />

LLM 将遵循该架构，并返回如下所示的响应：

<Code code={jsonModeResponseExample} lang="json" />

如您所见，模型正在遵守请求中的 JSON 架构定义，并以经过验证的 JSON 对象进行响应。

## 支持的模型

以下是现在支持 JSON 模式的模型列表：

- [@cf/meta/llama-3.1-8b-instruct-fast](/workers-ai/models/llama-3.1-8b-instruct-fast/)
- [@cf/meta/llama-3.1-70b-instruct](/workers-ai/models/llama-3.1-70b-instruct/)
- [@cf/meta/llama-3.3-70b-instruct-fp8-fast](/workers-ai/models/llama-3.3-70b-instruct-fp8-fast/)
- [@cf/meta/llama-3-8b-instruct](/workers-ai/models/llama-3-8b-instruct/)
- [@cf/meta/llama-3.1-8b-instruct](/workers-ai/models/llama-3.1-8b-instruct/)
- [@cf/meta/llama-3.2-11b-vision-instruct](/workers-ai/models/llama-3.2-11b-vision-instruct/)
- [@hf/nousresearch/hermes-2-pro-mistral-7b](/workers-ai/models/hermes-2-pro-mistral-7b/)
- [@hf/thebloke/deepseek-coder-6.7b-instruct-awq](/workers-ai/models/deepseek-coder-6.7b-instruct-awq/)
- [@cf/deepseek-ai/deepseek-r1-distill-qwen-32b](/workers-ai/models/deepseek-r1-distill-qwen-32b/)

我们将继续扩展此列表，以跟上新的和被请求的模型。

请注意，Workers AI 不能保证模型会根据请求的 JSON 模式进行响应。根据任务的复杂性和 JSON 模式的充分性，模型在极端情况下可能无法满足请求。如果出现这种情况，则会返回错误 `JSON 模式无法满足`，并且必须进行处理。

JSON 模式目前不支持流式传输。

---

# 提示

URL: https://developers.cloudflare.com/workers-ai/features/prompting/

import { Code } from "~/components";

export const scopedExampleOne = `{
  messages: [
    { role: "system", content: "你是一个非常有趣的喜剧演员，你喜欢表情符号" },
    { role: "user", content: "给我讲个关于 Cloudflare 的笑话" },
  ],
};`;

export const scopedExampleTwo = `{
  messages: [
    { role: "system", content: "你是一个专业的计算机科学助理" },
    { role: "user", content: "WASM 是什么？" },
    { role: "assistant", content: "WASM (WebAssembly) 是一种二进制指令格式，旨在成为一个平台无关的格式" },
    { role: "user", content: "Python 能编译成 WASM 吗？" },
    { role: "assistant", content: "不，Python 不能直接编译成 WebAssembly" },
    { role: "user", content: "Rust 呢？" },
  ],
};`;

export const unscopedExampleOne = `{
  prompt: "给我讲个关于 Cloudflare 的笑话";
}`;

export const unscopedExampleTwo = `{
  prompt: "<s>[INST]喜剧演员[/INST]</s>\n[INST]给我讲个关于 Cloudflare 的笑话[/INST]",
  raw: true
};`;

从文本生成模型获得良好结果的一部分是正确地提出问题。LLM 通常使用特定的预定义模板进行训练，然后在进行推理任务时，应将这些模板与模型的标记器一起使用，以获得更好的结果。

使用 Workers AI 提示文本生成模型有两种方法：

:::note[重要]
我们建议对 LoRA 的推理使用无范围提示。
:::

### 有范围的提示

这是**推荐**的方法。通过有范围的提示，Workers AI 承担了了解和使用不同模型不同聊天模板的负担，并在构建提示和创建文本生成任务时为开发人员提供统一的界面。

有范围的提示是一系列消息。每条消息定义了两个键：角色和内容。

通常，角色可以是以下三个选项之一：

- <strong>system</strong> - 系统消息定义了 AI
  的个性。您可以使用它们来设置规则以及您期望 AI 的行为方式。
- <strong>user</strong> - 用户消息是您通过提供问题或对话来实际查询 AI 的地方。
- <strong>assistant</strong> - 助手消息向 AI
  暗示所需的输出格式。并非所有模型都支持此角色。

OpenAI 对他们如何在其 GPT 模型中使用这些角色有[很好的解释](https://platform.openai.com/docs/guides/text-generation#messages-and-roles)。尽管聊天模板是灵活的，但其他文本生成模型倾向于遵循相同的约定。

以下是使用系统和用户角色的有范围提示的输入示例：

<Code code={scopedExampleOne} lang="js" />

以下是在用户和助手之间进行多次迭代的聊天会话的更好示例。

<Code code={scopedExampleTwo} lang="js" />

请注意，不同的 LLM 使用不同的模板针对不同的用例进行训练。虽然 Workers AI 尽力通过统一的 API 向开发人员抽象每个 LLM 模板的细节，但您应始终参考模型文档以获取详细信息（我们在上表中提供了链接）。例如，像 Codellama 这样的指令模型经过微调以响应用户提供的指令，而聊天模型则期望以对话片段作为输入。

### 无范围的提示

您可以使用无范围的提示向模型发送单个问题，而无需担心提供任何上下文。Workers AI 会自动将您的 `prompt` 输入转换为合理的默认有范围提示，以便您获得最佳的预测结果。

<Code code={unscopedExampleOne} lang="js" />

您还可以使用无范围的提示来手动构建模型聊天模板。在这种情况下，您可以使用 raw 参数。以下是 [Mistral](https://docs.mistral.ai/models/#chat-template) 聊天模板提示的输入示例：

<Code code={unscopedExampleTwo} lang="js" />

---

# Markdown 转换

URL: https://developers.cloudflare.com/workers-ai/features/markdown-conversion/

import { Code, Type, MetaInfo, Details, Render } from "~/components";

[Markdown](https://en.wikipedia.org/wiki/Markdown) 对于训练和推理中的文本生成和大型语言模型 (LLM)至关重要，因为它可以提供结构化、语义化、人类和机器可读的输入。同样，Markdown 有助于对输入数据进行分块和结构化，以便在 RAG 的上下文中更好地检索和综合，其简单性和易于解析和呈现的特点使其成为 AI 代理的理想选择。

由于这些原因，文档转换在设计和开发 AI 应用程序时扮演着重要角色。Workers AI 提供了 `toMarkdown` 实用方法，开发人员可以从 [`env.AI`](/workers-ai/configuration/bindings/) 绑定或 REST API 中使用该方法，以便快速、轻松、方便地将多种格式的文档转换为 Markdown 语言并进行摘要。

## 方法和定义

### async env.AI.toMarkdown()

获取不同格式的文档列表并将其转换为 Markdown。

#### 参数

- <code>documents</code>: <Type text="array" /> - `toMarkdownDocument` 的数组。

#### 返回值

- <code>results</code>: <Type text="array" /> - `toMarkdownDocumentResult`
  的数组。

### `toMarkdownDocument` 定义

- `name` <Type text="string" />

  - 要转换的文档的名称。

- `blob` <Type text="Blob" />

  - 一个包含文档内容的新 [Blob](https://developer.mozilla.org/en-US/docs/Web/API/Blob/Blob) 对象。

### `toMarkdownDocumentResult` 定义

- `name` <Type text="string" />

  - 转换后文档的名称。与输入名称匹配。

- `mimetype` <Type text="string" />

  - 文档检测到的 [mime 类型](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME_types/Common_types)。

- `tokens` <Type text="number" />

  - 转换后文档的估计令牌数。

- `data` <Type text="string" />

  - 转换后文档的内容，格式为 Markdown。

## 支持的格式

这是支持的格式列表。我们会不断添加新格式并更新此表。

<Render file="markdown-conversion-support" product="workers-ai" />

## 示例

在此示例中，我们从 R2 获取一个 PDF 文档和一张图片，并将它们都提供给 `env.AI.toMarkdown`。结果是一个转换后的文档列表。Workers AI 模型会自动用于检测和总结图像。

```typescript
import { Env } from "./env";

export default {
	async fetch(request: Request, env: Env, ctx: ExecutionContext) {
		// https://pub-979cb28270cc461d94bc8a169d8f389d.r2.dev/somatosensory.pdf
		const pdf = await env.R2.get("somatosensory.pdf");

		// https://pub-979cb28270cc461d94bc8a169d8f389d.r2.dev/cat.jpeg
		const cat = await env.R2.get("cat.jpeg");

		return Response.json(
			await env.AI.toMarkdown([
				{
					name: "somatosensory.pdf",
					blob: new Blob([await pdf.arrayBuffer()], {
						type: "application/octet-stream",
					}),
				},
				{
					name: "cat.jpeg",
					blob: new Blob([await cat.arrayBuffer()], {
						type: "application/octet-stream",
					}),
				},
			]),
		);
	},
};
```

这是结果：

```json
[
	{
		"name": "somatosensory.pdf",
		"mimeType": "application/pdf",
		"format": "markdown",
		"tokens": 0,
		"data": "# somatosensory.pdf\n## Metadata\n- PDFFormatVersion=1.4\n- IsLinearized=false\n- IsAcroFormPresent=false\n- IsXFAPresent=false\n- IsCollectionPresent=false\n- IsSignaturesPresent=false\n- Producer=Prince 20150210 (www.princexml.com)\n- Title=Anatomy of the Somatosensory System\n\n## Contents\n### Page 1\nThis is a sample document to showcase..."
	},
	{
		"name": "cat.jpeg",
		"mimeType": "image/jpeg",
		"format": "markdown",
		"tokens": 0,
		"data": "这张图片是"不爽猫"的特写照片，这只猫以其独特的"不爽"表情和锐利的蓝眼睛而闻名。这只猫的脸是棕色的，鼻子上有一条白色的条纹，耳朵竖立着。它的皮毛是浅棕色的，脸部周围的颜色较深，鼻子和嘴巴是粉红色的。猫的眼睛是蓝色的，向下倾斜，使它看起来永远都是一副"不爽"的样子。背景是模糊的，但看起来是深棕色的。总的来说，这张图片是流行的网络迷因角色"不爽猫"的一个幽默而标志性的代表。猫的面部表情和姿势传达出一种不悦或烦恼的感觉，这使得它对许多人来说是一个既 relatable 又有趣的图片。"
	}
]
```

## REST API

除了 Workers AI [绑定](/workers-ai/configuration/bindings/)，您还可以使用 [REST API](/workers-ai/get-started/rest-api/)：

```bash
curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/tomarkdown \
  -H 'Authorization: Bearer {API_TOKEN}' \
	-F "files=@cat.jpeg" \
	-F "files=@somatosensory.pdf"
```

## 定价

`toMarkdown` 对于大多数格式转换是免费的。在某些情况下，例如图像转换，它可以使用 Workers AI 模型进行对象检测和摘要，如果超出 Workers AI 的免费配额限制，可能会产生额外费用。有关更多详细信息，请参阅[定价页面](/workers-ai/platform/pricing/)。

---

# 身份验证

URL: https://developers.cloudflare.com/ai-gateway/configuration/authentication/

在 AI 网关中使用已验证网关通过要求每个请求都包含有效的授权令牌来增加安全性。此功能在存储日志时特别有用，因为它可以防止未经授权的访问，并防范可能增加日志存储使用量并使您难以找到所需数据的无效请求。启用已验证网关后，只有具有正确令牌的请求才会被处理。

:::note
我们建议在选择使用 AI 网关存储日志时启用已验证网关。

如果启用了已验证网关但请求不包含所需的 `cf-aig-authorization` 标头，请求将失败。此设置确保只有经过验证的请求通过网关。要绕过 `cf-aig-authorization` 标头的需要，请确保禁用已验证网关。
:::

## 使用仪表板设置已验证网关

1. 转到您要启用身份验证的特定网关的设置。
2. 选择 **创建身份验证令牌** 以生成具有所需 `Run` 权限的自定义令牌。务必安全地保存此令牌，因为它不会再次显示。
3. 在对此网关的每个请求中包含带有您的 API 令牌的 `cf-aig-authorization` 标头。
4. 返回设置页面并开启已验证网关。

## 使用 OpenAI 的示例请求

```bash
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/chat/completions \
  --header 'cf-aig-authorization: Bearer {CF_AIG_TOKEN}' \
  --header 'Authorization: Bearer OPENAI_TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "What is Cloudflare?"}]}'
```

使用 OpenAI SDK：

```javascript
import OpenAI from "openai";

const openai = new OpenAI({
	apiKey: process.env.OPENAI_API_KEY,
	baseURL: "https://gateway.ai.cloudflare.com/v1/account-id/gateway/openai",
	defaultHeaders: {
		"cf-aig-authorization": `Bearer {token}`,
	},
});
```

## 使用 Vercel AI SDK 的示例请求

```javascript
import { createOpenAI } from "@ai-sdk/openai";

const openai = createOpenAI({
	baseURL: "https://gateway.ai.cloudflare.com/v1/account-id/gateway/openai",
	headers: {
		"cf-aig-authorization": `Bearer {token}`,
	},
});
```

## 预期行为

下表概述了基于身份验证设置和标头状态的网关行为：

| 身份验证设置 | 标头信息 | 网关状态   | 响应                   |
| ------------ | -------- | ---------- | ---------------------- |
| 开启         | 存在标头 | 已验证网关 | 请求成功               |
| 开启         | 无标头   | 错误       | 由于缺少授权而请求失败 |
| 关闭         | 存在标头 | 未验证网关 | 请求成功               |
| 关闭         | 无标头   | 未验证网关 | 请求成功               |

---

# 缓存

URL: https://developers.cloudflare.com/ai-gateway/configuration/caching/

import { TabItem, Tabs } from "~/components";

AI 网关可以缓存来自您的 AI 模型提供商的响应，为相同的请求直接从 Cloudflare 的缓存提供服务。

## 使用缓存的好处

- **减少延迟：** 通过避免对重复请求向源 AI 提供商进行往返，为用户提供更快的响应。
- **成本节省：** 最小化向您的 AI 提供商发出的付费请求数量，特别是对于频繁访问或非动态内容。
- **增加吞吐量：** 从您的 AI 提供商卸载重复请求，使其能够更高效地处理独特请求。

:::note

目前缓存仅支持文本和图像响应，并且仅适用于相同的请求。

此配置适用于具有有限提示选项的用例。例如，询问"我可以如何帮助您？"并让用户从有限的选项集中选择答案的支持机器人，在当前缓存配置下运行良好。
我们计划在未来为缓存添加语义搜索以提高缓存命中率。
:::

## 默认配置

<Tabs syncKey="dashPlusAPI"> <TabItem label="仪表板">

要在仪表板中设置默认缓存配置：

1. 登录 [Cloudflare 仪表板](https://dash.cloudflare.com/) 并选择您的账户。
2. 选择 **AI** > **AI 网关**。
3. 选择 **设置**。
4. 启用 **缓存响应**。
5. 将默认缓存更改为您偏好的任何值。

</TabItem> <TabItem label="API">

要使用 API 设置默认缓存配置：

1. [创建 API 令牌](/fundamentals/api/get-started/create-token/)，具有以下权限：

- `AI Gateway - Read`
- `AI Gateway - Edit`

2. 获取您的[账户 ID](/fundamentals/account/find-account-and-zone-ids/)。
3. 使用该 API 令牌和账户 ID，发送 [`POST` 请求](/api/resources/ai_gateway/methods/create/)创建新网关并包含 `cache_ttl` 的值。

</TabItem> </Tabs>

此缓存行为将统一应用于所有支持缓存的请求。如果您需要为特定请求修改缓存设置，您可以灵活地在每个请求的基础上覆盖此设置。

要检查响应是否来自缓存，**cf-aig-cache-status** 将被指定为 `HIT` 或 `MISS`。

## 每个请求的缓存

虽然您网关的默认缓存设置提供了良好的基线，但您可能需要更精细的控制。这些情况可能是数据新鲜度、具有不同生命周期的内容，或动态或个性化响应。

为了满足这些需求，AI 网关允许您使用特定的 HTTP 标头在每个请求的基础上覆盖默认缓存行为。这为您提供了为单个 API 调用优化缓存的精确性。

以下标头允许您定义此每个请求的缓存行为：

:::note

以下标头已更新为新名称，尽管旧标头仍将起作用。我们建议更新为新标头以确保未来兼容性：

`cf-cache-ttl` 现在是 `cf-aig-cache-ttl`

`cf-skip-cache` 现在是 `cf-aig-skip-cache`

:::

### 跳过缓存 (cf-aig-skip-cache)

跳过缓存是指绕过缓存并直接从原始提供商获取请求，而不使用任何缓存副本。

您可以使用标头 **cf-aig-skip-cache** 绕过请求的缓存版本。

例如，当向 OpenAI 提交请求时，以以下方式包含标头：

```bash title="跳过缓存的请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/chat/completions \
  --header "Authorization: Bearer $TOKEN" \
  --header 'Content-Type: application/json' \
  --header 'cf-aig-skip-cache: true' \
  --data ' {
   		 "model": "gpt-4o-mini",
   		 "messages": [
   			 {
   				 "role": "user",
   				 "content": "how to build a wooden spoon in 3 short steps? give as short as answer as possible"
   			 }
   		 ]
   	 }
'
```

### 缓存 TTL (cf-aig-cache-ttl)

缓存 TTL，或生存时间，是缓存请求在过期并从原始源刷新之前保持有效的持续时间。您可以使用 **cf-aig-cache-ttl** 以秒为单位设置所需的缓存持续时间。最小 TTL 是 60 秒，最大 TTL 是一个月。

例如，如果您设置一小时的 TTL，这意味着请求在缓存中保存一小时。在该小时内，相同的请求将从缓存提供服务而不是原始 API。一小时后，缓存过期，请求将转到原始 API 获取新响应，该响应将为下一小时重新填充缓存。

例如，当向 OpenAI 提交请求时，以以下方式包含标头：

```bash title="要缓存一小时的请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/chat/completions \
  --header "Authorization: Bearer $TOKEN" \
  --header 'Content-Type: application/json' \
  --header 'cf-aig-cache-ttl: 3600' \
  --data ' {
   		 "model": "gpt-4o-mini",
   		 "messages": [
   			 {
   				 "role": "user",
   				 "content": "how to build a wooden spoon in 3 short steps? give as short as answer as possible"
   			 }
   		 ]
   	 }
'
```

### 自定义缓存键 (cf-aig-cache-key)

自定义缓存键让您覆盖默认缓存键，以便精确设置任何资源的可缓存性设置。要覆盖默认缓存键，您可以使用标头 **cf-aig-cache-key**。

当您第一次使用 **cf-aig-cache-key** 标头时，您将收到来自提供商的响应。具有相同标头的后续请求将返回缓存的响应。如果使用了 **cf-aig-cache-ttl** 标头，响应将根据指定的缓存生存时间进行缓存。否则，响应将根据仪表板中的缓存设置进行缓存。如果网关未启用缓存，响应将默认缓存 5 分钟。

例如，当向 OpenAI 提交请求时，以以下方式包含标头：

```bash title="具有自定义缓存键的请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/chat/completions \
  --header 'Authorization: Bearer {openai_token}' \
  --header 'Content-Type: application/json' \
  --header 'cf-aig-cache-key: responseA' \
  --data ' {
   		 "model": "gpt-4o-mini",
   		 "messages": [
   			 {
   				 "role": "user",
   				 "content": "how to build a wooden spoon in 3 short steps? give as short as answer as possible"
   			 }
   		 ]
   	 }
'
```

:::caution[AI 网关缓存行为]
AI 网关中的缓存是易失性的。如果同时发送两个相同的请求，第一个请求可能无法及时缓存供第二个请求使用，这可能导致第二个请求从原始源检索数据。
:::

---

# 自定义成本

URL: https://developers.cloudflare.com/ai-gateway/configuration/custom-costs/

import { TabItem, Tabs } from "~/components";

AI 网关允许您在请求级别设置自定义成本。通过使用此功能，成本指标可以准确反映您的独特定价，覆盖默认或公共模型成本。

:::note[注意]

自定义成本仅适用于在其响应中传递令牌的请求。没有令牌信息的请求将不会计算成本。

:::

## 自定义成本

要为您的 API 请求添加自定义成本，请使用 `cf-aig-custom-cost` 标头。此标头使您能够为输入（发送的令牌）和输出（接收的令牌）指定每个令牌的成本。

- **per_token_in**：协商的输入令牌成本（每个令牌）。
- **per_token_out**：协商的输出令牌成本（每个令牌）。

您可以包含的小数位数没有限制，确保精确的成本计算，无论值有多小。

自定义成本将在日志中以下划线显示，使您可以轻松识别何时应用了自定义定价。

在此示例中，如果您的协商价格为每百万输入令牌 1 美元和每百万输出令牌 2 美元，请如下所示包含 `cf-aig-custom-cost` 标头。

```bash title="具有自定义成本的请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/chat/completions \
  --header "Authorization: Bearer $TOKEN" \
  --header 'Content-Type: application/json' \
  --header 'cf-aig-custom-cost: {"per_token_in":0.000001,"per_token_out":0.000002}' \
  --data ' {
        "model": "gpt-4o-mini",
        "messages": [
          {
            "role": "user",
            "content": "When is Cloudflare's Birthday Week?"
          }
        ]
      }'
```

:::note

如果响应从缓存提供（缓存命中），成本始终为 `0`，即使您指定了自定义成本。自定义成本仅在请求到达模型提供商时适用。
:::

---

# 回退

URL: https://developers.cloudflare.com/ai-gateway/configuration/fallbacks/

import { Render } from "~/components";

通过您的[通用端点](/ai-gateway/universal/)指定模型或提供商回退，以处理请求失败并确保可靠性。

Cloudflare 可以在响应[请求错误](#request-failures)或[预定的请求超时](/ai-gateway/configuration/request-handling#request-timeouts)时触发您的回退提供商。[响应标头 `cf-aig-step`](#response-headercf-aig-step) 指示哪个步骤成功处理了请求。

## 请求失败

默认情况下，如果模型请求返回错误，Cloudflare 会触发您的回退。

### 示例

在以下示例中，请求首先发送到 [Workers AI](/workers-ai/) 推理 API。如果请求失败，它会回退到 OpenAI。响应标头 `cf-aig-step` 指示哪个提供商成功处理了请求。

1. 向 Workers AI 推理 API 发送请求。
2. 如果该请求失败，继续发送到 OpenAI。

```mermaid
graph TD
    A[AI Gateway] --> B[Request to Workers AI Inference API]
    B -->|Success| C[Return Response]
    B -->|Failure| D[Request to OpenAI API]
    D --> E[Return Response]
```

<br />

您可以通过在数组中添加另一个对象来添加任意数量的回退。

<Render file="universal-gateway-example" />

## 响应标头(cf-aig-step)

在使用带有回退的[通用端点](/ai-gateway/universal/)时，响应标头 `cf-aig-step` 通过返回步骤编号指示哪个模型成功处理了请求。此标头提供了关于是否触发了回退以及哪个模型最终处理了响应的可见性。

- `cf-aig-step:0` – 成功使用了第一个（主要）模型。
- `cf-aig-step:1` – 请求回退到第二个模型。
- `cf-aig-step:2` – 请求回退到第三个模型。
- 后续步骤 – 每个回退将步骤编号递增 1。

---

# 自定义元数据

URL: https://developers.cloudflare.com/ai-gateway/configuration/custom-metadata/

AI 网关中的自定义元数据允许您使用用户 ID 或其他标识符标记请求，从而更好地跟踪和分析您的请求。元数据值可以是字符串、数字或布尔值，并将出现在您的日志中，使您可以轻松搜索和过滤数据。

## 主要功能

- **自定义标记**：向您的请求添加用户 ID、团队名称、测试指示器和其他相关信息。
- **增强日志记录**：元数据出现在您的日志中，允许详细检查和故障排除。
- **搜索和过滤**：使用元数据高效搜索和过滤已记录的请求。

:::note

AI 网关允许您每个请求传递最多五个自定义元数据条目。如果提供超过五个条目，只有前五个将被保存；额外的条目将被忽略。确保您的自定义元数据限制为五个条目，以避免未处理或丢失的数据。

:::

## 支持的元数据类型

- 字符串
- 数字
- 布尔值

:::note

不支持对象作为元数据值。

:::

## 实现

### 使用 cURL

要在使用 cURL 的请求中包含自定义元数据：

```bash
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/chat/completions \
  --header 'Authorization: Bearer {api_token}' \
  --header 'Content-Type: application/json' \
  --header 'cf-aig-metadata: {"team": "AI", "user": 12345, "test":true}' \
  --data '{"model": "gpt-4o", "messages": [{"role": "user", "content": "What should I eat for lunch?"}]}'
```

### 使用 SDK

要在使用 OpenAI SDK 的请求中包含自定义元数据：

```javascript
import OpenAI from "openai";

export default {
	async fetch(request, env, ctx) {
		const openai = new OpenAI({
			apiKey: env.OPENAI_API_KEY,
			baseURL:
				"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai",
		});

		try {
			const chatCompletion = await openai.chat.completions.create(
				{
					model: "gpt-4o",
					messages: [{ role: "user", content: "What should I eat for lunch?" }],
					max_tokens: 50,
				},
				{
					headers: {
						"cf-aig-metadata": JSON.stringify({
							user: "JaneDoe",
							team: 12345,
							test: true,
						}),
					},
				},
			);

			const response = chatCompletion.choices[0].message;
			return new Response(JSON.stringify(response));
		} catch (e) {
			console.log(e);
			return new Response(e);
		}
	},
};
```

### 使用绑定

要在使用[绑定](/workers/runtime-apis/bindings/)的请求中包含自定义元数据：

```javascript
export default {
	async fetch(request, env, ctx) {
		const aiResp = await env.AI.run(
			"@cf/mistral/mistral-7b-instruct-v0.1",
			{ prompt: "What should I eat for lunch?" },
			{
				gateway: {
					id: "gateway_id",
					metadata: { team: "AI", user: 12345, test: true },
				},
			},
		);

		return new Response(aiResp);
	},
};
```

---

# 配置

URL: https://developers.cloudflare.com/ai-gateway/configuration/

import { DirectoryListing } from "~/components";

使用多种选项和自定义设置配置您的 AI 网关。

<DirectoryListing />

---

# 管理网关

URL: https://developers.cloudflare.com/ai-gateway/configuration/manage-gateway/

import { Render } from "~/components";

您有多种不同的选项来管理 AI 网关。

## 创建网关

<Render file="create-gateway" />

## 编辑网关

<Render file="edit-gateway" />

:::note

有关可编辑设置的更多详细信息，请参阅[配置](/ai-gateway/configuration/)。

:::

## 删除网关

删除您的网关是永久性的，无法撤销。

<Render file="delete-gateway" />

---

# 速率限制

URL: https://developers.cloudflare.com/ai-gateway/configuration/rate-limiting/

import { TabItem, Tabs } from "~/components";

速率限制控制到达您应用的流量，可以防止昂贵的账单和可疑活动。

## 参数

您可以将速率限制定义为在特定时间范围内发送的请求数量。例如，您可以将应用限制为每 60 秒 100 个请求。

您还可以选择是否希望使用**固定**或**滑动**速率限制技术。使用速率限制时，我们允许在时间窗口内发送一定数量的请求。例如，如果是固定速率，窗口基于时间，因此在十分钟窗口内不会超过 `x` 个请求。如果是滑动速率，在过去十分钟内不会超过 `x` 个请求。

为了说明这一点，假设您从 12:00 开始，每十分钟限制十个请求。因此固定窗口是 12:00-12:10、12:10-12:20，以此类推。如果您在 12:09 发送十个请求，在 12:11 发送十个请求，在固定窗口策略中所有 20 个请求都会成功。但是，在滑动窗口策略中它们会失败，因为在过去十分钟内有超过十个请求。

## 处理速率限制

当您的请求超过允许的速率时，您将遇到速率限制。这意味着服务器将以 `429 Too Many Requests` 状态码响应，您的请求不会被处理。

## 默认配置

<Tabs syncKey="dashPlusAPI"> <TabItem label="仪表板">

要在仪表板中设置默认速率限制配置：

1. 登录 [Cloudflare 仪表板](https://dash.cloudflare.com/) 并选择您的账户。
2. 转到 **AI** > **AI 网关**。
3. 转到 **设置**。
4. 启用 **速率限制**。
5. 根据需要调整速率、时间周期和速率限制方法。

</TabItem> <TabItem label="API">

要使用 API 设置默认速率限制配置：

1. [创建 API 令牌](/fundamentals/api/get-started/create-token/)，具有以下权限：

- `AI Gateway - Read`
- `AI Gateway - Edit`

2. 获取您的[账户 ID](/fundamentals/account/find-account-and-zone-ids/)。
3. 使用该 API 令牌和账户 ID，发送 [`POST` 请求](/api/resources/ai_gateway/methods/create/)创建新网关并包含 `rate_limiting_interval`、`rate_limiting_limit` 和 `rate_limiting_technique` 的值。

</TabItem> </Tabs>

此速率限制行为将统一应用于该网关的所有请求。

---

# 分析

URL: https://developers.cloudflare.com/ai-gateway/observability/analytics/

import { Render, TabItem, Tabs } from "~/components";

您的 AI 网关仪表板显示请求、令牌、缓存、错误和成本的指标。您可以按时间过滤这些指标。
这些分析帮助您了解流量模式、令牌消耗以及
AI 提供商之间的潜在问题。您可以
查看以下分析：

- **请求**：跟踪 AI 网关处理的请求总数。
- **令牌使用量**：分析请求中的令牌消耗，深入了解使用模式。
- **成本**：了解使用不同 AI 提供商的相关成本，让您能够跟踪支出、管理预算和优化资源。
- **错误**：监控网关中的错误数量，帮助识别和排除问题。
- **缓存响应**：查看从缓存提供服务的响应百分比，这可以帮助降低成本并提高速度。

## 查看分析

<Tabs> <TabItem label="仪表板">

<Render file="analytics-dashboard" />

</TabItem> <TabItem label="graphql">

您可以使用 GraphQL 在 AI 网关仪表板之外查询您的使用数据。请参阅下面的示例查询。您需要在发出请求时使用您的 Cloudflare 令牌，并将 `{account_id}` 更改为匹配您的账户标签。

```bash title="请求"
curl https://api.cloudflare.com/client/v4/graphql \
  --header 'Authorization: Bearer TOKEN \
  --header 'Content-Type: application/json' \
  --data '{
    "query": "query{\n  viewer {\n	accounts(filter: { accountTag: \"{account_id}\" }) {\n	requests: aiGatewayRequestsAdaptiveGroups(\n    	limit: $limit\n    	filter: { datetimeHour_geq: $start, datetimeHour_leq: $end }\n    	orderBy: [datetimeMinute_ASC]\n  	) {\n    	count,\n    	dimensions {\n        	model,\n        	provider,\n        	gateway,\n        	ts: datetimeMinute\n    	}\n    	\n  	}\n    	\n	}\n  }\n}",
    "variables": {
   	 "limit": 1000,
   	 "start": "2023-09-01T10:00:00.000Z",
   	 "end": "2023-09-30T10:00:00.000Z",
   	 "orderBy": "date_ASC"
    }
}'
```

</TabItem> </Tabs>

---

# 请求处理

URL: https://developers.cloudflare.com/ai-gateway/configuration/request-handling/

import { Render, Aside } from "~/components";

您的 AI 网关支持不同的策略来处理对提供商的请求，这允许您有效管理 AI 交互并确保您的应用保持响应性和可靠性。

## 请求超时

请求超时允许您在提供商响应时间过长时触发回退或重试。

这些超时有助于：

- 通过防止用户等待响应时间过长来改善用户体验
- 通过检测无响应的提供商并触发回退选项来主动处理错误

请求超时可以在通用端点上设置，也可以直接在对任何提供商的请求上设置。

### 定义

超时以毫秒为单位设置。此外，超时基于响应的第一部分何时返回。只要响应的第一部分在指定的时间范围内返回 - 例如在流式传输响应时 - 您的网关将等待响应。

### 配置

#### 通用端点

如果在[通用端点](/ai-gateway/universal/)上设置，请求超时指定请求的超时持续时间并触发回退。

对于通用端点，通过在提供商特定的 `config` 对象中设置 `requestTimeout` 属性来配置超时值。每个提供商可以有不同的 `requestTimeout` 值进行精细自定义。

```bash title="提供商级别配置" {11-13} collapse={15-48}
curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \
	--header 'Content-Type: application/json' \
	--data '[
    {
        "provider": "workers-ai",
        "endpoint": "@cf/meta/llama-3.1-8b-instruct",
        "headers": {
            "Authorization": "Bearer {cloudflare_token}",
            "Content-Type": "application/json"
        },
        "config": {
            "requestTimeout": 1000
        },
        "query": {
            "messages": [
                {
                    "role": "system",
                    "content": "You are a friendly assistant"
                },
                {
                    "role": "user",
                    "content": "What is Cloudflare?"
                }
            ]
        }
    },
    {
        "provider": "workers-ai",
        "endpoint": "@cf/meta/llama-3.1-8b-instruct-fast",
        "headers": {
            "Authorization": "Bearer {cloudflare_token}",
            "Content-Type": "application/json"
        },
        "query": {
            "messages": [
                {
                    "role": "system",
                    "content": "You are a friendly assistant"
                },
                {
                    "role": "user",
                    "content": "What is Cloudflare?"
                }
            ]
        },
				"config": {
            "requestTimeout": 3000
        },
    }
]'
```

#### 直接提供商

如果在[提供商](/ai-gateway/providers/)请求上设置，请求超时指定请求的超时持续时间，如果超过则返回错误。

对于提供商特定端点，通过添加 `cf-aig-request-timeout` 标头来配置超时值。

```bash title="提供商特定端点示例" {4}
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/meta/llama-3.1-8b-instruct \
 --header 'Authorization: Bearer {cf_api_token}' \
 --header 'Content-Type: application/json' \
 --header 'cf-aig-request-timeout: 5000'
 --data '{"prompt": "What is Cloudflare?"}'
```

---

## 请求重试

AI 网关还支持对失败请求的自动重试，最多五次重试尝试。

此功能提高了应用的弹性，确保您可以从临时问题中恢复，而无需手动干预。

请求超时可以在通用端点上设置，也可以直接在对任何提供商的请求上设置。

### 定义

使用请求重试，您可以调整三个属性的组合：

- 尝试次数（最多 5 次尝试）
- 重试前等待时间（以毫秒为单位，最多 5 秒）
- 退避方法（常量、线性或指数）

在最后一次重试尝试时，您的网关将等待直到请求完成，无论需要多长时间。

### 配置

#### 通用端点

如果在[通用端点](/ai-gateway/universal/)上设置，请求重试将在触发任何配置的回退之前自动重试失败的请求最多五次。

对于通用端点，在提供商特定的 `config` 中使用以下属性配置重试设置：

```json
config:{
	maxAttempts?: number;
	retryDelay?: number;
	backoff?: "constant" | "linear" | "exponential";
}
```

与[请求超时](/ai-gateway/configuration/request-handling/#universal-endpoint)一样，每个提供商可以有不同的重试设置进行精细自定义。

```bash title="提供商级别配置" {11-15} collapse={16-55}
curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \
	--header 'Content-Type: application/json' \
	--data '[
    {
        "provider": "workers-ai",
        "endpoint": "@cf/meta/llama-3.1-8b-instruct",
        "headers": {
            "Authorization": "Bearer {cloudflare_token}",
            "Content-Type": "application/json"
        },
        "config": {
            "maxAttempts": 2,
						"retryDelay": 1000,
						"backoff": "constant"
        },
        "query": {
            "messages": [
                {
                    "role": "system",
                    "content": "You are a friendly assistant"
                },
                {
                    "role": "user",
                    "content": "What is Cloudflare?"
                }
            ]
        }
    },
    {
        "provider": "workers-ai",
        "endpoint": "@cf/meta/llama-3.1-8b-instruct-fast",
        "headers": {
            "Authorization": "Bearer {cloudflare_token}",
            "Content-Type": "application/json"
        },
        "query": {
            "messages": [
                {
                    "role": "system",
                    "content": "You are a friendly assistant"
                },
                {
                    "role": "user",
                    "content": "What is Cloudflare?"
                }
            ]
        },
				"config": {
            "maxAttempts": 4,
						"retryDelay": 1000,
						"backoff": "exponential"
        },
    }
]'
```

#### 直接提供商

如果在[提供商](/ai-gateway/universal/)请求上设置，请求重试将自动重试失败的请求最多五次。在最后一次重试尝试时，您的网关将等待直到请求完成，无论需要多长时间。

For a provider-specific endpoint, configure the retry settings by adding different header values:

- `cf-aig-max-attempts` (number)
- `cf-aig-retry-delay` (number)
- `cf-aig-backoff` ("constant" | "linear" | "exponential)

---

# 成本

URL: https://developers.cloudflare.com/ai-gateway/observability/costs/

成本指标仅适用于模型在其响应中返回令牌数据和模型名称的端点。

## 跟踪 AI 提供商的成本

AI 网关让您更容易监控和估算所有 AI 提供商基于令牌的成本。这可以帮助您：

- 了解和比较提供商之间的使用成本。
- 使用一致的指标监控趋势和估算支出。
- 应用自定义定价逻辑以匹配协商的费率。

:::note[注意]

成本指标是基于请求中发送和接收的令牌数量的**估算**。虽然此指标可以帮助您监控和预测成本趋势，但请参考您提供商的仪表板获取最**准确**的成本详情。

:::

:::caution[注意]

提供商可能会引入新模型或更改其定价。如果您注意到过时的成本数据或正在使用我们的成本跟踪尚不支持的模型，请[提交请求](https://forms.gle/8kRa73wRnvq7bxL48)

:::

## 自定义成本

AI 网关允许用户在特殊定价协议或协商费率下设置自定义成本。自定义成本可以在请求级别应用，应用时将覆盖默认或公共模型成本。
有关自定义成本配置的更多信息，请访问[自定义成本](/ai-gateway/configuration/custom-costs/)配置页面。

---

# 可观察性

URL: https://developers.cloudflare.com/ai-gateway/observability/

import { DirectoryListing } from "~/components";

可观察性是指为系统添加监控工具以收集指标和日志的实践，从而实现更好的监控、故障排除和应用优化。

<DirectoryListing />

---

# 使用 Workers AI 创建您的第一个 AI 网关

URL: https://developers.cloudflare.com/ai-gateway/tutorials/create-first-aig-workers/

import { Render } from "~/components";

本教程将指导您使用 Cloudflare 仪表板上的 Workers AI 创建您的第一个 AI 网关。目标受众是刚接触 AI 网关和 Workers AI 的初学者。创建一个 AI 网关可以使用户高效地管理和保护 AI 请求，从而使他们能够利用 AI 模型执行内容生成、数据处理或预测分析等任务，并具有增强的控制和性能。

## 注册和登录

1. **注册**：如果您没有 Cloudflare 帐户，请[注册](https://cloudflare.com/sign-up)。
2. **登录**：通过登录[Cloudflare 仪表板](https://dash.cloudflare.com/login)访问 Cloudflare 仪表板。

## 创建网关

然后，创建一个新的 AI 网关。

<Render file="create-gateway" />

## 连接您的 AI 提供商

1. 在 AI 网关部分，选择您创建的网关。
2. 选择 **Workers AI** 作为您的提供商，以设置一个特定于 Workers AI 的端点。
   您将收到一个用于发送请求的端点 URL。

## 配置您的 Workers AI

1. 在 Cloudflare 仪表板中转到 **AI** > **Workers AI**。
2. 选择 **使用 REST API** 并按照步骤创建并复制 API 令牌和账户 ID。
3. **向 Workers AI 发送请求**：使用提供的 API 端点。例如，您可以使用 curl 命令通过 API 运行模型。将 `{account_id}`、`{gateway_id}` 和 `{cf_api_token}` 替换为您的实际账户 ID 和 API 令牌：

   ```bash
   curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/meta/llama-3.1-8b-instruct \
   --header 'Authorization: Bearer {cf_api_token}' \
   --header 'Content-Type: application/json' \
   --data '{"prompt": "What is Cloudflare?"}'
   ```

预期的输出将类似于：

```bash
{"result":{"response":"I'd be happy to explain what Cloudflare is.\n\nCloudflare is a cloud-based service that provides a range of features to help protect and improve the performance, security, and reliability of websites, applications, and other online services. Think of it as a shield for your online presence!\n\nHere are some of the key things Cloudflare does:\n\n1. **Content Delivery Network (CDN)**: Cloudflare has a network of servers all over the world. When you visit a website that uses Cloudflare, your request is sent to the nearest server, which caches a copy of the website's content. This reduces the time it takes for the content to load, making your browsing experience faster.\n2. **DDoS Protection**: Cloudflare protects against Distributed Denial-of-Service (DDoS) attacks. This happens when a website is overwhelmed with traffic from multiple sources to make it unavailable. Cloudflare filters out this traffic, ensuring your site remains accessible.\n3. **Firewall**: Cloudflare acts as an additional layer of security, filtering out malicious traffic and hacking attempts, such as SQL injection or cross-site scripting (XSS) attacks.\n4. **SSL Encryption**: Cloudflare offers free SSL encryption, which secure sensitive information (like passwords, credit card numbers, and browsing data) with an HTTPS connection (the \"S\" stands for Secure).\n5. **Bot Protection**: Cloudflare has an AI-driven system that identifies and blocks bots trying to exploit vulnerabilities or scrape your content.\n6. **Analytics**: Cloudflare provides insights into website traffic, helping you understand your audience and make informed decisions.\n7. **Cybersecurity**: Cloudflare offers advanced security features, such as intrusion protection, DNS filtering, and Web Application Firewall (WAF) protection.\n\nOverall, Cloudflare helps protect against cyber threats, improves website performance, and enhances security for online businesses, bloggers, and individuals who need to establish a strong online presence.\n\nWould you like to know more about a specific aspect of Cloudflare?"},"success":true,"errors":[],"messages":[]}%
```

## 查看分析

监控您的 AI 网关以查看使用指标。

1. 在仪表板中转到 **AI** > **AI 网关**。
2. 选择您的网关以查看请求计数、令牌使用、缓存效率、错误和预估成本等指标。您还可以开启额外的配置，如日志记录和速率限制。

## 可选 - 后续步骤

要使用 Workers 构建更多内容，请参阅[教程](/workers/tutorials/)。

如果您有任何问题、需要帮助或想分享您的项目，请加入 [Discord](https://discord.cloudflare.com) 上的 Cloudflare 开发者社区，与其他开发者和 Cloudflare 团队联系。

---

# 部署通过 AI 网关连接到 OpenAI 的 Worker

URL: https://developers.cloudflare.com/ai-gateway/tutorials/deploy-aig-worker/

import { Render, PackageManagers } from "~/components";

在本教程中，您将学习如何部署一个通过 AI 网关调用 OpenAI 的 Worker。AI 网关通过更多的分析、缓存、速率限制和日志记录，帮助您更好地观察和控制您的 AI 应用程序。

本教程使用最新的 v4 OpenAI node 库，这是 2023 年 8 月发布的更新。

## 开始之前

所有教程都假设您已经完成了[入门指南](/workers/get-started/guide/)，该指南帮助您设置 Cloudflare Workers 帐户、[C3](https://github.com/cloudflare/workers-sdk/tree/main/packages/create-cloudflare) 和 [Wrangler](/workers/wrangler/install-and-update/)。

## 1. 创建 AI 网关和 OpenAI API 密钥

在 Cloudflare 仪表板的 AI 网关页面上，通过单击右上角的加号按钮创建一个新的 AI 网关。您应该能够命名网关和端点。单击 API 端点按钮以复制端点。您可以从特定于提供商的端点中选择，如 OpenAI、HuggingFace 和 Replicate。或者您可以使用接受特定模式并支持模型回退和重试的通用端点。

在本教程中，我们将使用 OpenAI 特定于提供商的端点，因此在下拉菜单中选择 OpenAI 并复制新的端点。

本教程还需要一个 OpenAI 帐户和 API 密钥。如果您没有，请创建一个新的 OpenAI 帐户并创建一个 API 密钥以继续本教程。请务必将您的 API 密钥存放在安全的地方，以便以后使用。

## 2. 创建一个新的 Worker

在命令行中创建一个 Worker 项目：

<PackageManagers type="create" pkg="cloudflare@latest" args={"openai-aig"} />

<Render
	file="c3-post-run-steps"
	product="workers"
	params={{
		category: "hello-world",
		type: "Worker only",
		lang: "JavaScript",
	}}
/>

转到您新的 open Worker 项目：

```sh title="打开您的新项目目录"
cd openai-aig
```

在您新的 openai-aig 目录中，找到并打开 `src/index.js` 文件。在本教程的大部分时间里，您将配置此文件。

最初，您生成的 `index.js` 文件应如下所示：

```js
export default {
	async fetch(request, env, ctx) {
		return new Response("Hello World!");
	},
};
```

## 3. 在您的 Worker 中配置 OpenAI

创建 Worker 项目后，我们可以学习如何向 OpenAI 发出第一个请求。您将使用 OpenAI node 库与 OpenAI API 进行交互。使用 `npm` 安装 OpenAI node 库：

<PackageManagers pkg="openai" />

在您的 `src/index.js` 文件中，在 `export default` 上方添加 `openai` 的导入：

```js
import OpenAI from "openai";
```

在您的 `fetch` 函数中，设置配置并使用您创建的 AI 网关端点实例化您的 `OpenAIApi` 客户端：

```js null {5-8}
import OpenAI from "openai";

export default {
	async fetch(request, env, ctx) {
		const openai = new OpenAI({
			apiKey: env.OPENAI_API_KEY,
			baseURL:
				"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai", // 在此处粘贴您的 AI 网关端点
		});
	},
};
```

要使其正常工作，您需要使用 [`wrangler secret put`](/workers/wrangler/commands/#put) 来设置您的 `OPENAI_API_KEY`。这会将 API 密钥保存到您的环境中，以便您的 Worker 在部署时可以访问它。此密钥是您之前在 OpenAI 仪表板中创建的 API 密钥：

<PackageManagers type="exec" pkg="wrangler" args="secret put OPENAI_API_KEY" />

为了在本地开发中使其正常工作，请在您的 Worker 项目中创建一个新文件 `.dev.vars` 并添加此行。请确保将 `OPENAI_API_KEY` 替换为您自己的 OpenAI API 密钥：

```txt title="在本地保存您的 API 密钥"
OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY_HERE>"
```

## 4. 发出 OpenAI 请求

现在我们可以向 OpenAI [聊天完成 API](https://platform.openai.com/docs/guides/gpt/chat-completions-api) 发出请求。

您可以指定您想要的模型、角色和提示，以及您希望在总请求中使用的最大令牌数。

```js null {10-22}
import OpenAI from "openai";

export default {
	async fetch(request, env, ctx) {
		const openai = new OpenAI({
			apiKey: env.OPENAI_API_KEY,
			baseURL:
				"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai",
		});

		try {
			const chatCompletion = await openai.chat.completions.create({
				model: "gpt-4o-mini",
				messages: [{ role: "user", content: "What is a neuron?" }],
				max_tokens: 100,
			});

			const response = chatCompletion.choices[0].message;

			return new Response(JSON.stringify(response));
		} catch (e) {
			return new Response(e);
		}
	},
};
```

## 5. 部署您的 Worker 应用程序

要部署您的应用程序，请运行 `npx wrangler deploy` 命令来部署您的 Worker 应用程序：

<PackageManagers type="exec" pkg="wrangler" args="deploy" />

您现在可以在 \<YOUR_WORKER>.\<YOUR_SUBDOMAIN>.workers.dev 上预览您的 Worker。

## 6. 查看您的 AI 网关

当您在 Cloudflare 仪表板中转到 AI 网关时，您应该会看到您最近的请求被记录下来。您还可以[调整您的设置](/ai-gateway/configuration/)来管理您的日志、缓存和速率限制设置。

---

# 教程

URL: https://developers.cloudflare.com/ai-gateway/tutorials/

import { GlossaryTooltip, ListTutorials, YouTubeVideos } from "~/components";

查看<GlossaryTooltip term="tutorial">教程</GlossaryTooltip>以帮助您开始使用 AI 网关。

## 文档

<ListTutorials />

## 视频

<YouTubeVideos products={["AI Gateway"]} />

---

# 审计日志

URL: https://developers.cloudflare.com/ai-gateway/reference/audit-logs/

[审计日志](/fundamentals/account/account-security/review-audit-logs/)提供您的 Cloudflare 账户内所做更改的全面摘要，包括对 AI 网关中的网关所做的更改。此功能在所有计划类型中均可用，免费提供，并且默认启用。

## 查看审计日志

要查看 AI 网关的审计日志：

1. 登录 [Cloudflare 仪表板](https://dash.cloudflare.com/login) 并选择您的账户。
2. 转到**管理账户** > **审计日志**。

有关如何访问和使用审计日志的更多信息，请参阅[查看审计日志文档](/fundamentals/account/account-security/review-audit-logs/)。

## 记录的操作

以下配置操作会被记录：

| 操作            | 描述           |
| --------------- | -------------- |
| gateway created | 创建新网关。   |
| gateway deleted | 删除现有网关。 |
| gateway updated | 编辑现有网关。 |

## 示例日志条目

以下是显示创建新网关的审计日志条目示例：

```json
{
	"action": {
		"info": "gateway created",
		"result": true,
		"type": "create"
	},
	"actor": {
		"email": "<ACTOR_EMAIL>",
		"id": "3f7b730e625b975bc1231234cfbec091",
		"ip": "fe32:43ed:12b5:526::1d2:13",
		"type": "user"
	},
	"id": "5eaeb6be-1234-406a-87ab-1971adc1234c",
	"interface": "UI",
	"metadata": {},
	"newValue": "",
	"newValueJson": {
		"cache_invalidate_on_update": false,
		"cache_ttl": 0,
		"collect_logs": true,
		"id": "test",
		"rate_limiting_interval": 0,
		"rate_limiting_limit": 0,
		"rate_limiting_technique": "fixed"
	},
	"oldValue": "",
	"oldValueJson": {},
	"owner": {
		"id": "1234d848c0b9e484dfc37ec392b5fa8a"
	},
	"resource": {
		"id": "89303df8-1234-4cfa-a0f8-0bd848e831ca",
		"type": "ai_gateway.gateway"
	},
	"when": "2024-07-17T14:06:11.425Z"
}
```

---

# 限制

URL: https://developers.cloudflare.com/ai-gateway/reference/limits/

import { Render } from "~/components";

以下限制适用于 Cloudflare 平台中的网关配置、日志和相关功能。

| 功能                                                                      | 限制                            |
| ------------------------------------------------------------------------- | ------------------------------- |
| [可缓存请求大小](/ai-gateway/configuration/caching/)                      | 每个请求 25 MB                  |
| [缓存 TTL](/ai-gateway/configuration/caching/#cache-ttl-cf-aig-cache-ttl) | 1 个月                          |
| [自定义元数据](/ai-gateway/configuration/custom-metadata/)                | 每个请求 5 个条目               |
| [数据集](/ai-gateway/evaluations/set-up-evaluations/)                     | 每个网关 10 个                  |
| 网关免费计划                                                              | 每个账户 10 个                  |
| 网关付费计划                                                              | 每个账户 20 个                  |
| 网关名称长度                                                              | 64 个字符                       |
| 日志存储速率限制                                                          | 每个网关每秒 500 条日志         |
| 日志存储[付费计划](/ai-gateway/reference/pricing/)                        | 每个网关 1000 万条 <sup>1</sup> |
| 日志存储[免费计划](/ai-gateway/reference/pricing/)                        | 每个账户 10 万条 <sup>2</sup>   |
| [日志大小存储](/ai-gateway/observability/logging/)                        | 每条日志 10 MB <sup>3</sup>     |
| [Logpush 作业](/ai-gateway/observability/logging/logpush/)                | 每个账户 4 个                   |
| [Logpush 大小限制](/ai-gateway/observability/logging/logpush/)            | 每条日志 1MB                    |

<sup>1</sup> 如果您已达到每个网关存储 1000 万条日志的限制，新日志
将停止保存。要继续保存日志，您必须删除该网关中的较旧日志以释放空间或创建新网关。请参阅[自动日志清理](/ai-gateway/observability/logging/#auto-log-cleanup)了解如何自动删除日志的更多详细信息。

<sup>2</sup> 如果您已达到所有网关中每个账户存储 10
万条日志的限制，新日志将停止保存。要继续保存日志，您必须删除较旧的日志。请参阅[自动日志清理](/ai-gateway/observability/logging/#auto-log-cleanup)了解如何自动删除日志的更多详细信息。

<sup>3</sup> 大于 10 MB 的日志将不会被存储。

<Render file="limits-increase" product="ai-gateway" />

---

# 平台

URL: https://developers.cloudflare.com/ai-gateway/reference/

import { DirectoryListing } from "~/components";

<DirectoryListing />

---

# 定价

URL: https://developers.cloudflare.com/ai-gateway/reference/pricing/

AI 网关在所有计划中都可以使用。

AI 网关目前可用的核心功能是免费提供的，只需要一个 Cloudflare 账户和一行代码即可[开始使用](/ai-gateway/get-started/)。核心功能包括：仪表板分析、缓存和速率限制。

我们将继续构建和扩展 AI 网关。一些新功能可能是免费的附加核心功能，而其他功能可能是高级计划的一部分。我们将在这些功能可用时宣布。

您可以在 AI 网关仪表板中监控您的使用情况。

## 持久日志

:::note[注意]

持久日志的计费尚未开始。付费计划用户可以在此期间存储超过每月 20 万条日志的免费配额而不会被收费。（免费计划用户仍受其计划的 10 万条日志上限限制。）我们将在开始对持久日志存储收费之前提供充分的提前通知。

:::

持久日志在所有计划中都可用，免费计划和付费计划都有免费配额。超出这些限制的额外日志费用基于每月存储的日志数量。

### 免费配额和超额定价

| 计划         | 免费日志存储     | 超额定价                     |
| ------------ | ---------------- | ---------------------------- |
| Workers 免费 | 总共 10 万条日志 | 不适用 - 升级到 Workers 付费 |
| Workers 付费 | 总共 20 万条日志 | 每月每存储 10 万条日志 $8    |

配额基于所有网关中存储的总日志数。有关管理或删除日志的指导，请参阅我们的[文档](/ai-gateway/observability/logging)。

例如，如果您是 Workers 付费计划用户，存储了 30 万条日志，您将为超出的 10 万条日志（30 万条总日志 - 20 万条免费日志）付费，费用为每月 $8。

## Logpush

Logpush 仅在 Workers 付费计划中可用。

|      | 付费计划                      |
| ---- | ----------------------------- |
| 请求 | 每月 1000 万次，+$0.05/百万次 |

## 细则

价格可能会变更。如果您是企业客户，请联系您的客户团队确认定价详细信息。

---

# 数据使用

URL: https://developers.cloudflare.com/workers-ai/platform/data-usage/

Cloudflare 为了提供 Workers AI 服务会处理某些客户数据，这受我们的[隐私政策](https://www.cloudflare.com/privacypolicy/)和[自助服务订阅协议](https://www.cloudflare.com/terms/)或[企业订阅协议](https://www.cloudflare.com/enterpriseterms/)（如适用）的约束。

Cloudflare 既不创建也不训练在 Workers AI 上可用的 AI 模型。这些模型构成第三方服务，并可能受您与模型提供商之间的开源或其他许可条款的约束。请务必查看适用于每个模型的许可条款（如有）。

您的输入（例如，文本提示、图像提交、音频文件等）、输出（例如，生成的文本/图像、翻译等）、嵌入和训练数据构成客户内容。

对于 Workers AI：

- 您拥有并对您的所有客户内容负责。
- Cloudflare 不会将您的客户内容提供给任何其他 Cloudflare 客户。
- Cloudflare 不会将您的客户内容用于 (1) 训练在 Workers AI 上可用的任何 AI 模型，或 (2) 改进任何 Cloudflare 或第三方服务，并且除非我们收到您的明确同意，否则不会这样做。
- 如果您特别将存储服务（例如，R2、KV、DO、Vectorize 等）与 Workers AI 结合使用，您的 Workers AI 客户内容可能会被 Cloudflare 存储。

---

# 错误

URL: https://developers.cloudflare.com/workers-ai/platform/errors/

以下是 Workers AI 错误的列表。

| **名称**               | **内部代码** | **HTTP 代码** | **描述**                                                                                            |
| ---------------------- | ------------ | ------------- | --------------------------------------------------------------------------------------------------- |
| 无此模型               | `5007`       | `400`         | 无此模型 `${model}` 或任务                                                                          |
| 无效数据               | `5004`       | `400`         | base64 输入的无效数据类型：`${type}`                                                                |
| Finetune 缺少必需文件  | `3039`       | `400`         | Finetune 缺少必需文件 `(model.safetensors and config.json) `                                        |
| 不完整的请求           | `3003`       | `400`         | 请求缺少标头或正文：`{what}`                                                                        |
| 账户不允许使用私有模型 | `5018`       | `403`         | 该账户不允许访问此模型                                                                              |
| 模型协议               | `5016`       | `403`         | 用户未同意 Llama3.2 模型条款                                                                        |
| 账户被阻止             | `3023`       | `403`         | 服务对账户不可用                                                                                    |
| 账户不允许使用私有模型 | `3041`       | `403`         | 该账户不允许访问此模型                                                                              |
| 已弃用的 SDK 版本      | `5019`       | `405`         | 请求尝试使用已弃用的 SDK 版本                                                                       |
| 不支持 LoRa            | `5005`       | `405`         | 模型 `${this.model}` 不支持 LoRa 推理                                                               |
| 无效的模型 ID          | `3042`       | `404`         | 模型名称无效                                                                                        |
| 请求过大               | `3006`       | `413`         | 请求过大                                                                                            |
| 超时                   | `3007`       | `408`         | 请求超时                                                                                            |
| 已中止                 | `3008`       | `408`         | 请求已中止                                                                                          |
| 账户受限               | `3036`       | `429`         | 您已用完每日 10,000 个神经元的免费配额。如果您想继续使用，请升级到 Cloudflare 的 Workers 付费计划。 |
| 容量不足               | `3040`       | `429`         | 没有更多的数据中心可以转发请求                                                                      |

---

# 术语表

URL: https://developers.cloudflare.com/workers-ai/platform/glossary/

import { Glossary } from "~/components";

查看 Cloudflare Workers AI 文档中使用的术语的定义。

<Glossary product="workers-ai" />

---

# 平台

URL: https://developers.cloudflare.com/workers-ai/platform/

import { DirectoryListing } from "~/components";

<DirectoryListing />

---

# 限制

URL: https://developers.cloudflare.com/workers-ai/platform/limits/

import { Render } from "~/components";

Workers AI 现已正式发布。我们更新了速率限制以反映这一点。

请注意，使用 Wrangler 在本地模式下进行的模型推理也将计入这些限制。在我们致力于性能和规模的同时，Beta 模型的速率限制可能会较低。

<Render file="custom_requirements" />

速率限制默认为每个任务类型，一些模型的限制定义如下：

## 按任务类型划分的速率限制

### [自动语音识别](/workers-ai/models/)

- 每分钟 720 个请求

### [图像分类](/workers-ai/models/)

- 每分钟 3000 个请求

### [图像到文本](/workers-ai/models/)

- 每分钟 720 个请求

### [对象检测](/workers-ai/models/)

- 每分钟 3000 个请求

### [摘要](/workers-ai/models/)

- 每分钟 1500 个请求

### [文本分类](/workers-ai/models/)

- 每分钟 2000 个请求

### [文本嵌入](/workers-ai/models/)

- 每分钟 3000 个请求
- [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/) 为每分钟 1500 个请求

### [文本生成](/workers-ai/models/)

- 每分钟 300 个请求
- [@hf/thebloke/mistral-7b-instruct-v0.1-awq](/workers-ai/models/mistral-7b-instruct-v0.1-awq/) 为每分钟 400 个请求
- [@cf/microsoft/phi-2](/workers-ai/models/phi-2/) 为每分钟 720 个请求
- [@cf/qwen/qwen1.5-0.5b-chat](/workers-ai/models/qwen1.5-0.5b-chat/) 为每分钟 1500 个请求
- [@cf/qwen/qwen1.5-1.8b-chat](/workers-ai/models/qwen1.5-1.8b-chat/) 为每分钟 720 个请求
- [@cf/qwen/qwen1.5-14b-chat-awq](/workers-ai/models/qwen1.5-14b-chat-awq/) 为每分钟 150 个请求
- [@cf/tinyllama/tinyllama-1.1b-chat-v1.0](/workers-ai/models/tinyllama-1.1b-chat-v1.0/) 为每分钟 720 个请求

### [文本到图像](/workers-ai/models/)

- 每分钟 720 个请求
- [@cf/runwayml/stable-diffusion-v1-5-img2img](/workers-ai/models/stable-diffusion-v1-5-img2img/) 为每分钟 1500 个请求

### [翻译](/workers-ai/models/)

- 每分钟 720 个请求

---

# 定价

URL: https://developers.cloudflare.com/workers-ai/platform/pricing/

:::note
Workers AI 更新了定价，使其更加细化，提供了基于每个模型单元的定价，但后端仍以神经元计费。
:::

Workers AI 包含在[免费和付费 Workers 计划](/workers/platform/pricing/)中，定价为**每 1,000 个神经元 $0.011**。

我们的免费配额允许任何人**每天免费使用总计 10,000 个神经元**。要每天使用超过 10,000 个神经元，您需要注册 [Workers 付费计划](/workers/platform/pricing/#workers)。在 Workers 付费计划中，任何超过每日 10,000 个神经元免费配额的使用量将按每 1,000 个神经元 $0.011 收费。

您可以在 [Cloudflare Workers AI 仪表板](https://dash.cloudflare.com/?to=/:account/ai/workers-ai)中监控您的神经元使用情况。

所有限制在每天 00:00 UTC 重置。如果您超过上述任何限制，进一步的操作将失败并显示错误。

|              | 免费 <br/> 配额      | 定价                           |
| ------------ | -------------------- | ------------------------------ |
| Workers 免费 | 每天 10,000 个神经元 | 不适用 - 升级到 Workers 付费版 |
| Workers 付费 | 每天 10,000 个神经元 | $0.011 / 1,000 个神经元        |

## 什么是神经元？

神经元是我们衡量不同模型 AI 输出的方式，代表执行您请求所需的 GPU 计算能力。我们的无服务器模型让您只需为使用的部分付费，而无需担心租用、管理或扩展 GPU。

:::note
“以令牌计价”列等同于“以神经元计价”列 - 显示不同的单位是为了让您轻松比较和理解定价。
:::

## LLM 模型定价

| 模型                                         | 以令牌计价                                        | 以神经元计价                                                       |
| -------------------------------------------- | ------------------------------------------------- | ------------------------------------------------------------------ |
| @cf/meta/llama-3.2-1b-instruct               | 每百万输入令牌 $0.027 <br/> 每百万输出令牌 $0.201 | 每百万输入令牌 2457 个神经元 <br/> 每百万输出令牌 18252 个神经元   |
| @cf/meta/llama-3.2-3b-instruct               | 每百万输入令牌 $0.051 <br/> 每百万输出令牌 $0.335 | 每百万输入令牌 4625 个神经元 <br/> 每百万输出令牌 30475 个神经元   |
| @cf/meta/llama-3.1-8b-instruct-fp8-fast      | 每百万输入令牌 $0.045 <br/> 每百万输出令牌 $0.384 | 每百万输入令牌 4119 个神经元 <br/> 每百万输出令牌 34868 个神经元   |
| @cf/meta/llama-3.2-11b-vision-instruct       | 每百万输入令牌 $0.049 <br/> 每百万输出令牌 $0.676 | 每百万输入令牌 4410 个神经元 <br/> 每百万输出令牌 61493 个神经元   |
| @cf/meta/llama-3.1-70b-instruct-fp8-fast     | 每百万输入令牌 $0.293 <br/> 每百万输出令牌 $2.253 | 每百万输入令牌 26668 个神经元 <br/> 每百万输出令牌 204805 个神经元 |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast     | 每百万输入令牌 $0.293 <br/> 每百万输出令牌 $2.253 | 每百万输入令牌 26668 个神经元 <br/> 每百万输出令牌 204805 个神经元 |
| @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | 每百万输入令牌 $0.497 <br/> 每百万输出令牌 $4.881 | 每百万输入令牌 45170 个神经元 <br/> 每百万输出令牌 443756 个神经元 |
| @cf/mistral/mistral-7b-instruct-v0.1         | 每百万输入令牌 $0.110 <br/> 每百万输出令牌 $0.190 | 每百万输入令牌 10000 个神经元 <br/> 每百万输出令牌 17300 个神经元  |
| @cf/mistralai/mistral-small-3.1-24b-instruct | 每百万输入令牌 $0.351 <br/> 每百万输出令牌 $0.555 | 每百万输入令牌 31876 个神经元 <br/> 每百万输出令牌 50488 个神经元  |
| @cf/meta/llama-3.1-8b-instruct               | 每百万输入令牌 $0.282 <br/> 每百万输出令牌 $0.827 | 每百万输入令牌 25608 个神经元 <br/> 每百万输出令牌 75147 个神经元  |
| @cf/meta/llama-3.1-8b-instruct-fp8           | 每百万输入令牌 $0.152 <br/> 每百万输出令牌 $0.287 | 每百万输入令牌 13778 个神经元 <br/> 每百万输出令牌 26128 个神经元  |
| @cf/meta/llama-3.1-8b-instruct-awq           | 每百万输入令牌 $0.123 <br/> 每百万输出令牌 $0.266 | 每百万输入令牌 11161 个神经元 <br/> 每百万输出令牌 24215 个神经元  |
| @cf/meta/llama-3-8b-instruct                 | 每百万输入令牌 $0.282 <br/> 每百万输出令牌 $0.827 | 每百万输入令牌 25608 个神经元 <br/> 每百万输出令牌 75147 个神经元  |
| @cf/meta/llama-3-8b-instruct-awq             | 每百万输入令牌 $0.123 <br/> 每百万输出令牌 $0.266 | 每百万输入令牌 11161 个神经元 <br/> 每百万输出令牌 24215 个神经元  |
| @cf/meta/llama-2-7b-chat-fp16                | 每百万输入令牌 $0.556 <br/> 每百万输出令牌 $6.667 | 每百万输入令牌 50505 个神经元 <br/> 每百万输出令牌 606061 个神经元 |
| @cf/meta/llama-guard-3-8b                    | 每百万输入令牌 $0.484 <br/> 每百万输出令牌 $0.030 | 每百万输入令牌 44003 个神经元 <br/> 每百万输出令牌 2730 个神经元   |
| @cf/meta/llama-4-scout-17b-16e-instruct      | 每百万输入令牌 $0.270 <br/> 每百万输出令牌 $0.850 | 每百万输入令牌 24545 个神经元 <br/> 每百万输出令牌 77273 个神经元  |
| @cf/google/gemma-3-12b-it                    | 每百万输入令牌 $0.345 <br/> 每百万输出令牌 $0.556 | 每百万输入令牌 31371 个神经元 <br/> 每百万输出令牌 50560 个神经元  |
| @cf/qwen/qwq-32b                             | 每百万输入令牌 $0.660 <br/> 每百万输出令牌 $1.000 | 每百万输入令牌 60000 个神经元 <br/> 每百万输出令牌 90909 个神经元  |
| @cf/qwen/qwen2.5-coder-32b-instruct          | 每百万输入令牌 $0.660 <br/> 每百万输出令牌 $1.000 | 每百万输入令牌 60000 个神经元 <br/> 每百万输出令牌 90909 个神经元  |

## 嵌入模型定价

| 模型                       | 以令牌计价            | 以神经元计价                  |
| -------------------------- | --------------------- | ----------------------------- |
| @cf/baai/bge-small-en-v1.5 | 每百万输入令牌 $0.020 | 每百万输入令牌 1841 个神经元  |
| @cf/baai/bge-base-en-v1.5  | 每百万输入令牌 $0.067 | 每百万输入令牌 6058 个神经元  |
| @cf/baai/bge-large-en-v1.5 | 每百万输入令牌 $0.204 | 每百万输入令牌 18582 个神经元 |
| @cf/baai/bge-m3            | 每百万输入令牌 $0.012 | 每百万输入令牌 1075 个神经元  |

## 其他模型定价

| 模型                                  | 以令牌计价                                         | 以神经元计价                                                      |
| ------------------------------------- | -------------------------------------------------- | ----------------------------------------------------------------- |
| @cf/black-forest-labs/flux-1-schnell  | 每个 512x512 图块 $0.0000528 <br/> 每步 $0.0001056 | 每个 512x512 图块 4.80 个神经元 <br/> 每步 9.60 个神经元          |
| @cf/huggingface/distilbert-sst-2-int8 | 每百万输入令牌 $0.026                              | 每百万输入令牌 2394 个神经元                                      |
| @cf/baai/bge-reranker-base            | 每百万输入令牌 $0.003                              | 每百万输入令牌 283 个神经元                                       |
| @cf/meta/m2m100-1.2b                  | 每百万输入令牌 $0.342 <br/> 每百万输出令牌 $0.342  | 每百万输入令牌 31050 个神经元 <br/> 每百万输出令牌 31050 个神经元 |
| @cf/microsoft/resnet-50               | 每百万张图像 $2.51                                 | 每百万张图像 228055 个神经元                                      |
| @cf/openai/whisper                    | 每音频分钟 $0.0005                                 | 每音频分钟 41.14 个神经元                                         |
| @cf/openai/whisper-large-v3-turbo     | 每音频分钟 $0.0005                                 | 每音频分钟 46.63 个神经元                                         |
| @cf/myshell-ai/melotts                | 每音频分钟 $0.0002                                 | 每音频分钟 18.63 个神经元                                         |

---

# Anthropic

URL: https://developers.cloudflare.com/ai-gateway/providers/anthropic/

import { Render } from "~/components";

[Anthropic](https://www.anthropic.com/) 帮助构建可靠、可解释和可操控的 AI 系统。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/anthropic
```

## 前提条件

在向 Anthropic 发出请求时，确保您具有以下内容：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 Anthropic API 令牌。
- 您要使用的 Anthropic 模型的名称。

## 示例

### cURL

```bash title="示例获取请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/anthropic/v1/messages \
 --header 'x-api-key: {anthropic_api_key}' \
 --header 'anthropic-version: 2023-06-01' \
 --header 'Content-Type: application/json' \
 --data  '{
    "model": "claude-3-opus-20240229",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is Cloudflare?"}
    ]
  }'
```

### 在 JavaScript 中使用 Anthropic SDK

如果您使用 `@anthropic-ai/sdk`，您可以这样设置您的端点：

```js title="JavaScript"
import Anthropic from "@anthropic-ai/sdk";

const apiKey = env.ANTHROPIC_API_KEY;
const accountId = "{account_id}";
const gatewayId = "{gateway_id}";
const baseURL = `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/anthropic`;

const anthropic = new Anthropic({
	apiKey,
	baseURL,
});

const model = "claude-3-opus-20240229";
const messages = [{ role: "user", content: "What is Cloudflare?" }];
const maxTokens = 1024;

const message = await anthropic.messages.create({
	model,
	messages,
	max_tokens: maxTokens,
});
```

<Render
	file="chat-completions-providers"
	product="ai-gateway"
	params={{
		name: "Anthropic",
		jsonexample: `
{
	"model": "anthropic/{model}"
}`

    }}

/>

---

# Azure OpenAI

URL: https://developers.cloudflare.com/ai-gateway/providers/azureopenai/

[Azure OpenAI](https://azure.microsoft.com/en-gb/products/ai-services/openai-service/) 允许您在数据上应用自然语言算法。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/azure-openai/{resource_name}/{deployment_name}
```

## 前提条件

在向 Azure OpenAI 发出请求时，您需要：

- AI 网关账户 ID
- AI 网关网关名称
- Azure OpenAI API 密钥
- Azure OpenAI 资源名称
- Azure OpenAI 部署名称（也称为模型名称）

## URL 结构

您的新基础 URL 将使用上述数据的结构：`https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/azure-openai/{resource_name}/{deployment_name}`。然后，您可以在基础 URL 末尾附加您的端点和 api-version，如 `.../chat/completions?api-version=2023-05-15`。

## 示例

### cURL

```bash title="示例获取请求"
curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/azure-openai/{resource_name}/{deployment_name}/chat/completions?api-version=2023-05-15' \
  --header 'Content-Type: application/json' \
  --header 'api-key: {azure_api_key}' \
  --data '{
  "messages": [
    {
      "role": "user",
      "content": "What is Cloudflare?"
    }
  ]
}'
```

### 在 JavaScript 中使用 `openai-node`

如果您使用 `openai-node` 库，您可以这样设置您的端点：

```js title="JavaScript"
import OpenAI from "openai";

const resource = "xxx";
const model = "xxx";
const apiVersion = "xxx";
const apiKey = env.AZURE_OPENAI_API_KEY;
const accountId = "{account_id}";
const gatewayId = "{gateway_id}";
const baseURL = `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/azure-openai/${resource}/${model}`;

const azure_openai = new OpenAI({
	apiKey,
	baseURL,
	defaultQuery: { "api-version": apiVersion },
	defaultHeaders: { "api-key": apiKey },
});
```

---

# Amazon Bedrock

URL: https://developers.cloudflare.com/ai-gateway/providers/bedrock/

[Amazon Bedrock](https://aws.amazon.com/bedrock/) 允许您使用基础模型构建和扩展生成式 AI 应用程序。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/aws-bedrock`
```

## 前提条件

在向 Amazon Bedrock 发出请求时，确保您具有以下内容：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 Amazon Bedrock API 令牌。
- 您要使用的 Amazon Bedrock 模型的名称。

## 发出请求

在向 Amazon Bedrock 发出请求时，将您当前使用的 URL 中的 `https://bedrock-runtime.us-east-1.amazonaws.com/` 替换为 `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/aws-bedrock/bedrock-runtime/us-east-1/`，然后在 URL 末尾添加您要运行的模型。

使用 Bedrock 时，您需要在向 AI 网关发出请求之前对 URL 进行签名。您可以尝试使用 [`aws4fetch`](https://github.com/mhart/aws4fetch) SDK。

## 示例

### 在 TypeScript 中使用 `aws4fetch` SDK

```typescript
import { AwsClient } from "aws4fetch";

interface Env {
	accessKey: string;
	secretAccessKey: string;
}

export default {
	async fetch(
		request: Request,
		env: Env,
		ctx: ExecutionContext,
	): Promise<Response> {
		// 替换为您的配置
		const cfAccountId = "{account_id}";
		const gatewayName = "{gateway_id}";
		const region = "us-east-1";

		// 添加为秘密 (https://developers.cloudflare.com/workers/configuration/secrets/)
		const accessKey = env.accessKey;
		const secretKey = env.secretAccessKey;

		const awsClient = new AwsClient({
			accessKeyId: accessKey,
			secretAccessKey: secretKey,
			region: region,
			service: "bedrock",
		});

		const requestBodyString = JSON.stringify({
			inputText: "What does ethereal mean?",
		});

		const stockUrl = new URL(
			`https://bedrock-runtime.${region}.amazonaws.com/model/amazon.titan-embed-text-v1/invoke`,
		);

		const headers = {
			"Content-Type": "application/json",
		};

		// 签名原始请求
		const presignedRequest = await awsClient.sign(stockUrl.toString(), {
			method: "POST",
			headers: headers,
			body: requestBodyString,
		});

		// 网关 URL
		const gatewayUrl = new URL(
			`https://gateway.ai.cloudflare.com/v1/${cfAccountId}/${gatewayName}/aws-bedrock/bedrock-runtime/${region}/model/amazon.titan-embed-text-v1/invoke`,
		);

		// 通过网关 URL 发出请求
		const response = await fetch(gatewayUrl, {
			method: "POST",
			headers: presignedRequest.headers,
			body: requestBodyString,
		});

		if (
			response.ok &&
			response.headers.get("content-type")?.includes("application/json")
		) {
			const data = await response.json();
			return new Response(JSON.stringify(data));
		}

		return new Response("Invalid response", { status: 500 });
	},
};
```

---

# Cartesia

URL: https://developers.cloudflare.com/ai-gateway/providers/cartesia/

[Cartesia](https://docs.cartesia.ai/) 提供具有可定制语音模型的高级文本转语音服务。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cartesia
```

## URL 结构

在向 Cartesia 发出请求时，请将您当前使用的 URL 中的 `https://api.cartesia.ai/v1` 替换为 `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cartesia`。

## 前提条件

在向 Cartesia 发出请求时，请确保您拥有以下内容：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 Cartesia API 令牌。
- 您要使用的 Cartesia 语音模型的模型 ID 和语音 ID。

## 示例

### cURL

```bash title="请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cartesia/tts/bytes \
  --header 'Content-Type: application/json' \
  --header 'Cartesia-Version: 2024-06-10' \
  --header 'X-API-Key: {cartesia_api_token}' \
  --data '{
    "transcript": "Welcome to Cloudflare - AI Gateway!",
    "model_id": "sonic-english",
    "voice": {
        "mode": "id",
        "id": "694f9389-aac1-45b6-b726-9d9369183238"
    },
    "output_format": {
        "container": "wav",
        "encoding": "pcm_f32le",
        "sample_rate": 44100
    }
}
```

---

# Cerebras

URL: https://developers.cloudflare.com/ai-gateway/providers/cerebras/

import { Render } from "~/components";

[Cerebras](https://inference-docs.cerebras.ai/) 为开发者提供 AI 模型推理的低延迟解决方案。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cerebras-ai
```

## 前提条件

在向 Cerebras 发出请求时，请确保您拥有以下内容：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 Cerebras API 令牌。
- 您要使用的 Cerebras 模型的名称。

## 示例

### cURL

```bash title="示例 fetch 请求"
curl https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/cerebras/chat/completions \
 --header 'content-type: application/json' \
 --header 'Authorization: Bearer CEREBRAS_TOKEN' \
 --data '{
    "model": "llama3.1-8b",
    "messages": [
        {
            "role": "user",
            "content": "What is Cloudflare?"
        }
    ]
}'
```

<Render
	file="chat-completions-providers"
	product="ai-gateway"
	params={{
		name: "Cerebras",
		jsonexample: `
{
	"model": "cerebras/{model}"
}`

    }}

/>

---

# Cohere

URL: https://developers.cloudflare.com/ai-gateway/providers/cohere/

import { Render } from "~/components";

[Cohere](https://cohere.com/) 构建旨在解决现实业务挑战的 AI 模型。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cohere
```

## URL 结构

在向 [Cohere](https://cohere.com/) 发出请求时，将您当前使用的 URL 中的 `https://api.cohere.ai/v1` 替换为 `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cohere`。

## 前提条件

在向 Cohere 发出请求时，确保您具有以下内容：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 Cohere API 令牌。
- 您要使用的 Cohere 模型的名称。

## 示例

### cURL

```bash title="请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cohere/v1/chat \
  --header 'Authorization: Token {cohere_api_token}' \
  --header 'Content-Type: application/json' \
  --data '{
  "chat_history": [
    {"role": "USER", "message": "Who discovered gravity?"},
    {"role": "CHATBOT", "message": "The man who is widely credited with discovering gravity is Sir Isaac Newton"}
  ],
  "message": "What year was he born?",
  "connectors": [{"id": "web-search"}]
}'
```

### 在 Python 中使用 Cohere SDK

如果使用 [`cohere-python-sdk`](https://github.com/cohere-ai/cohere-python)，这样设置您的端点：

```js title="Python"

import cohere
import os

api_key = os.getenv('API_KEY')
account_id = '{account_id}'
gateway_id = '{gateway_id}'
base_url = f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cohere/v1"

co = cohere.Client(
  api_key=api_key,
  base_url=base_url,
)

message = "hello world!"
model = "command-r-plus"

chat = co.chat(
  message=message,
  model=model
)

print(chat)

```

<Render
	file="chat-completions-providers"
	product="ai-gateway"
	params={{
		name: "Cohere",
		jsonexample: `
{
	"model": "cohere/{model}"
}`

    }}

/>

---

# DeepSeek

URL: https://developers.cloudflare.com/ai-gateway/providers/deepseek/

import { Render } from "~/components";

[DeepSeek](https://www.deepseek.com/) 帮助您使用 DeepSeek 的先进 AI 模型快速构建。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/deepseek
```

## 前提条件

在向 DeepSeek 发出请求时，确保您具有以下内容：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 DeepSeek AI API 令牌。
- 您要使用的 DeepSeek AI 模型的名称。

## URL 结构

您的新基础 URL 将使用上述数据的结构：

`https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/deepseek/`。

然后您可以附加您要访问的端点，例如：`chat/completions`。

因此您的最终 URL 将组合为：

`https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/deepseek/chat/completions`。

## 示例

### cURL

```bash title="示例获取请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/deepseek/chat/completions \
 --header 'content-type: application/json' \
 --header 'Authorization: Bearer DEEPSEEK_TOKEN' \
 --data '{
    "model": "deepseek-chat",
    "messages": [
        {
            "role": "user",
            "content": "What is Cloudflare?"
        }
    ]
}'
```

### 在 JavaScript 中使用 DeepSeek

如果您使用 OpenAI SDK，您可以这样设置您的端点：

```js title="JavaScript"
import OpenAI from "openai";

const openai = new OpenAI({
	apiKey: env.DEEPSEEK_TOKEN,
	baseURL:
		"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/deepseek",
});

try {
	const chatCompletion = await openai.chat.completions.create({
		model: "deepseek-chat",
		messages: [{ role: "user", content: "What is Cloudflare?" }],
	});

	const response = chatCompletion.choices[0].message;

	return new Response(JSON.stringify(response));
} catch (e) {
	return new Response(e);
}
```

<Render
	file="chat-completions-providers"
	product="ai-gateway"
	params={{
		name: "DeepSeek",
		jsonexample: `
{
	"model": "deepseek/{model}"
}`

    }}

/>

---

# ElevenLabs

URL: https://developers.cloudflare.com/ai-gateway/providers/elevenlabs/

[ElevenLabs](https://elevenlabs.io/) 提供先进的文本转语音服务，支持多种语言的高质量语音合成。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/elevenlabs
```

## 前提条件

在向 ElevenLabs 发出请求时，请确保您拥有以下内容：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 ElevenLabs API 令牌。
- 您要使用的 ElevenLabs 语音模型的模型 ID。

## 示例

### cURL

```bash title="请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/elevenlabs/v1/text-to-speech/JBFqnCBsd6RMkjVDRZzb?output_format=mp3_44100_128 \
  --header 'Content-Type: application/json' \
  --header 'xi-api-key: {elevenlabs_api_token}' \
  --data '{
    "text": "Welcome to Cloudflare - AI Gateway!",
    "model_id": "eleven_multilingual_v2"
}'
```

---

# Google AI Studio

URL: https://developers.cloudflare.com/ai-gateway/providers/google-ai-studio/

import { Render } from "~/components";

[Google AI Studio](https://ai.google.dev/aistudio) 帮助您使用 Google Gemini 模型快速构建。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/google-ai-studio
```

## 前提条件

在向 Google AI Studio 发出请求时，您需要：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 Google AI Studio API 令牌。
- 您要使用的 Google AI Studio 模型的名称。

## URL 结构

您的新基础 URL 将使用上述数据的结构：`https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/google-ai-studio/`。

然后您可以附加您要访问的端点，例如：`v1/models/{model}:{generative_ai_rest_resource}`

因此您的最终 URL 将组合为：`https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/google-ai-studio/v1/models/{model}:{generative_ai_rest_resource}`。

## 示例

### cURL

```bash title="示例获取请求"
curl "https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_name}/google-ai-studio/v1/models/gemini-1.0-pro:generateContent" \
 --header 'content-type: application/json' \
 --header 'x-goog-api-key: {google_studio_api_key}' \
 --data '{
      "contents": [
          {
            "role":"user",
            "parts": [
              {"text":"What is Cloudflare?"}
            ]
          }
        ]
      }'
```

### 在 JavaScript 中使用 `@google/generative-ai`

如果您使用 `@google/generative-ai` 包，您可以这样设置您的端点：

```js title="JavaScript 示例"
import { GoogleGenerativeAI } from "@google/generative-ai";

const api_token = env.GOOGLE_AI_STUDIO_TOKEN;
const account_id = "";
const gateway_name = "";

const genAI = new GoogleGenerativeAI(api_token);
const model = genAI.getGenerativeModel(
	{ model: "gemini-1.5-flash" },
	{
		baseUrl: `https://gateway.ai.cloudflare.com/v1/${account_id}/${gateway_name}/google-ai-studio`,
	},
);

await model.generateContent(["What is Cloudflare?"]);
```

<Render
	file="chat-completions-providers"
	product="ai-gateway"
	params={{
		name: "Google AI Studio",
		jsonexample: `
{
	"model": "google-ai-studio/{model}"
}`

    }}

/>

---

# Grok

URL: https://developers.cloudflare.com/ai-gateway/providers/grok/

import { Render } from "~/components";

[Grok](https://docs.x.ai/docs#getting-started) 是一个通用模型，可用于各种任务，包括生成和理解文本、代码和函数调用。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/grok
```

## URL 结构

在向 [Grok](https://docs.x.ai/docs#getting-started) 发出请求时，将您当前使用的 URL 中的 `https://api.x.ai/v1` 替换为 `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/grok`。

## 前提条件

在向 Grok 发出请求时，确保您具有以下内容：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 Grok API 令牌。
- 您要使用的 Grok 模型的名称。

## 示例

### cURL

```bash title="请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/grok/v1/chat/completions \
  --header 'content-type: application/json' \
  --header 'Authorization: Bearer {grok_api_token}' \
  --data '{
    "model": "grok-beta",
    "messages": [
        {
            "role": "user",
            "content": "What is Cloudflare?"
        }
    ]
}'
```

### 在 JavaScript 中使用 OpenAI SDK

如果您使用 JavaScript 中的 OpenAI SDK，您可以这样设置您的端点：

```js title="JavaScript"
import OpenAI from "openai";

const openai = new OpenAI({
	apiKey: "<api key>",
	baseURL:
		"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/grok",
});

const completion = await openai.chat.completions.create({
	model: "grok-beta",
	messages: [
		{
			role: "system",
			content:
				"You are Grok, a chatbot inspired by the Hitchhiker's Guide to the Galaxy.",
		},
		{
			role: "user",
			content: "What is the meaning of life, the universe, and everything?",
		},
	],
});

console.log(completion.choices[0].message);
```

### 在 Python 中使用 OpenAI SDK

如果您使用 Python 中的 OpenAI SDK，您可以这样设置您的端点：

```python title="Python"
import os
from openai import OpenAI

XAI_API_KEY = os.getenv("XAI_API_KEY")
client = OpenAI(
    api_key=XAI_API_KEY,
    base_url="https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/grok",
)

completion = client.chat.completions.create(
    model="grok-beta",
    messages=[
        {"role": "system", "content": "You are Grok, a chatbot inspired by the Hitchhiker's Guide to the Galaxy."},
        {"role": "user", "content": "What is the meaning of life, the universe, and everything?"},
    ],
)

print(completion.choices[0].message)
```

### 在 JavaScript 中使用 Anthropic SDK

如果您使用 JavaScript 中的 Anthropic SDK，您可以这样设置您的端点：

```js title="JavaScript"
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
	apiKey: "<api key>",
	baseURL:
		"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/grok",
});

const msg = await anthropic.messages.create({
	model: "grok-beta",
	max_tokens: 128,
	system:
		"You are Grok, a chatbot inspired by the Hitchhiker's Guide to the Galaxy.",
	messages: [
		{
			role: "user",
			content: "What is the meaning of life, the universe, and everything?",
		},
	],
});

console.log(msg);
```

### 在 Python 中使用 Anthropic SDK

如果您使用 Python 中的 Anthropic SDK，您可以这样设置您的端点：

```python title="Python"
import os
from anthropic import Anthropic

XAI_API_KEY = os.getenv("XAI_API_KEY")
client = Anthropic(
    api_key=XAI_API_KEY,
    base_url="https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/grok",
)

message = client.messages.create(
    model="grok-beta",
    max_tokens=128,
    system="You are Grok, a chatbot inspired by the Hitchhiker's Guide to the Galaxy.",
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life, the universe, and everything?",
        },
    ],
)

print(message.content)
```

<Render
	file="chat-completions-providers"
	product="ai-gateway"
	params={{
		name: "Grok",
		jsonexample: `
{
	"model": "grok/{model}"
}`

    }}

/>

---

# Groq

URL: https://developers.cloudflare.com/ai-gateway/providers/groq/

import { Render } from "~/components";

[Groq](https://groq.com/) 提供高速处理和低延迟性能。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/groq
```

## URL 结构

在向 [Groq](https://groq.com/) 发出请求时，将您当前使用的 URL 中的 `https://api.groq.com/openai/v1` 替换为 `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/groq`。

## 前提条件

在向 Groq 发出请求时，确保您具有以下内容：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 Groq API 令牌。
- 您要使用的 Groq 模型的名称。

## 示例

### cURL

```bash title="示例获取请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/groq/chat/completions \
  --header 'Authorization: Bearer {groq_api_key}' \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [
      {
        "role": "user",
        "content": "What is Cloudflare?"
      }
    ],
    "model": "llama3-8b-8192"
}'
```

### 在 JavaScript 中使用 Groq SDK

如果使用 [`groq-sdk`](https://www.npmjs.com/package/groq-sdk)，这样设置您的端点：

```js title="JavaScript"
import Groq from "groq-sdk";

const apiKey = env.GROQ_API_KEY;
const accountId = "{account_id}";
const gatewayId = "{gateway_id}";
const baseURL = `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/groq`;

const groq = new Groq({
	apiKey,
	baseURL,
});

const messages = [{ role: "user", content: "What is Cloudflare?" }];
const model = "llama3-8b-8192";

const chatCompletion = await groq.chat.completions.create({
	messages,
	model,
});
```

<Render
	file="chat-completions-providers"
	product="ai-gateway"
	params={{
		name: "Groq",
		jsonexample: `
{
	"model": "groq/{model}"
}`

    }}

/>

---

# 模型提供商

URL: https://developers.cloudflare.com/ai-gateway/providers/

以下是我们支持的提供商快速列表：

import { DirectoryListing } from "~/components";

<DirectoryListing />

---

# Mistral AI

URL: https://developers.cloudflare.com/ai-gateway/providers/mistral/

import { Render } from "~/components";

[Mistral AI](https://mistral.ai) 帮助您使用 Mistral 的先进 AI 模型快速构建。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/mistral
```

## 前提条件

在向 Mistral AI 发出请求时，您需要：

- AI 网关账户 ID
- AI 网关网关名称
- Mistral AI API 令牌
- Mistral AI 模型名称

## URL 结构

您的新基础 URL 将使用上述数据的结构：`https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/mistral/`。

然后您可以附加您要访问的端点，例如：`v1/chat/completions`

因此您的最终 URL 将组合为：`https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/mistral/v1/chat/completions`。

## 示例

### cURL

```bash title="示例获取请求"
curl -X POST https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/mistral/v1/chat/completions \
 --header 'content-type: application/json' \
 --header 'Authorization: Bearer MISTRAL_TOKEN' \
 --data '{
    "model": "mistral-large-latest",
    "messages": [
        {
            "role": "user",
            "content": "What is Cloudflare?"
        }
    ]
}'
```

### 在 JavaScript 中使用 `@mistralai/mistralai` 包

如果您使用 `@mistralai/mistralai` 包，您可以这样设置您的端点：

```js title="JavaScript 示例"
import { Mistral } from "@mistralai/mistralai";

const client = new Mistral({
	apiKey: MISTRAL_TOKEN,
	serverURL: `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/mistral`,
});

await client.chat.create({
	model: "mistral-large-latest",
	messages: [
		{
			role: "user",
			content: "What is Cloudflare?",
		},
	],
});
```

<Render
	file="chat-completions-providers"
	product="ai-gateway"
	params={{
		name: "Mistral",
		jsonexample: `
{
	"model": "mistral/{model}"
}`

    }}

/>

---

# HuggingFace

URL: https://developers.cloudflare.com/ai-gateway/providers/huggingface/

[HuggingFace](https://huggingface.co/) 帮助用户构建、部署和训练机器学习模型。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/huggingface
```

## URL 结构

在向 HuggingFace 推理 API 发出请求时，将您当前使用的 URL 中的 `https://api-inference.huggingface.co/models/` 替换为 `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/huggingface`。请注意，您要访问的模型应该紧跟其后，例如 `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/huggingface/bigcode/starcoder`。

## 前提条件

在向 HuggingFace 发出请求时，确保您具有以下内容：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 HuggingFace API 令牌。
- 您要使用的 HuggingFace 模型的名称。

## 示例

### cURL

```bash title="请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/huggingface/bigcode/starcoder \
  --header 'Authorization: Bearer {hf_api_token}' \
  --header 'Content-Type: application/json' \
  --data '{
    "inputs": "console.log"
}'
```

### 在 JavaScript 中使用 HuggingFace.js 库

如果您使用 HuggingFace.js 库，您可以这样设置您的推理端点：

```js title="JavaScript"
import { HfInferenceEndpoint } from "@huggingface/inference";

const accountId = "{account_id}";
const gatewayId = "{gateway_id}";
const model = "gpt2";
const baseURL = `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/huggingface/${model}`;
const apiToken = env.HF_API_TOKEN;

const hf = new HfInferenceEndpoint(baseURL, apiToken);
```

---

# OpenAI

URL: https://developers.cloudflare.com/ai-gateway/providers/openai/

[OpenAI](https://openai.com/about/) 帮助您使用 ChatGPT 进行构建。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai
```

### 聊天完成端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/chat/completions \
```

### 响应端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/responses \
```

## URL 结构

在向 OpenAI 发出请求时，将您当前使用的 URL 中的 `https://api.openai.com/v1` 替换为 `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai`。

## 前提条件

在向 OpenAI 发出请求时，确保您具有以下内容：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 OpenAI API 令牌。
- 您要使用的 OpenAI 模型的名称。

## 聊天完成端点

### cURL 示例

```bash
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/chat/completions \
--header 'Authorization: Bearer {openai_token}' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "user",
      "content": "What is Cloudflare?"
    }
  ]
}'
```

### JavaScript SDK 示例

```js
import OpenAI from "openai";

const apiKey = "my api key"; // 或 process.env["OPENAI_API_KEY"]
const accountId = "{account_id}";
const gatewayId = "{gateway_id}";
const baseURL = `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/openai`;

const openai = new OpenAI({
	apiKey,
	baseURL,
});

try {
	const model = "gpt-3.5-turbo-0613";
	const messages = [{ role: "user", content: "What is a neuron?" }];
	const maxTokens = 100;
	const chatCompletion = await openai.chat.completions.create({
		model,
		messages,
		max_tokens: maxTokens,
	});
	const response = chatCompletion.choices[0].message;
	console.log(response);
} catch (e) {
	console.error(e);
}
```

## OpenAI 响应端点

### cURL 示例

```bash
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/responses \
--header 'Authorization: Bearer {openai_token}' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gpt-4.1",
  "input": [
    {
      "role": "user",
      "content": "Write a one-sentence bedtime story about a unicorn."
    }
  ]
}'
```

### JavaScript SDK 示例

```js
import OpenAI from "openai";

const apiKey = "my api key"; // 或 process.env["OPENAI_API_KEY"]
const accountId = "{account_id}";
const gatewayId = "{gateway_id}";
const baseURL = `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/openai`;

const openai = new OpenAI({
	apiKey,
	baseURL,
});

try {
	const model = "gpt-4.1";
	const input = [
		{
			role: "user",
			content: "Write a one-sentence bedtime story about a unicorn.",
		},
	];
	const response = await openai.responses.create({
		model,
		input,
	});
	console.log(response.output_text);
} catch (e) {
	console.error(e);
}
```

---

# OpenRouter

URL: https://developers.cloudflare.com/ai-gateway/providers/openrouter/

[OpenRouter](https://openrouter.ai/) 是一个提供统一接口来访问和使用大型语言模型 (LLMs) 的平台。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openrouter
```

## URL 结构

在向 [OpenRouter](https://openrouter.ai/) 发出请求时，将您当前使用的 URL 中的 `https://openrouter.ai/api/v1/chat/completions` 替换为 `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openrouter`。

## 前提条件

在向 OpenRouter 发出请求时，确保您具有以下内容：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 OpenRouter API 令牌或来自原始模型提供商的令牌。
- 您要使用的 OpenRouter 模型的名称。

## 示例

### cURL

```bash title="请求"
curl -X POST https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/openrouter/v1/chat/completions \
 --header 'content-type: application/json' \
 --header 'Authorization: Bearer OPENROUTER_TOKEN' \
 --data '{
    "model": "openai/gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            "content": "What is Cloudflare?"
        }
    ]
}'

```

### 在 JavaScript 中使用 OpenAI SDK

如果您使用 JavaScript 中的 OpenAI SDK，您可以这样设置您的端点：

```js title="JavaScript"
import OpenAI from "openai";

const openai = new OpenAI({
	apiKey: env.OPENROUTER_TOKEN,
	baseURL:
		"https://gateway.ai.cloudflare.com/v1/ACCOUNT_TAG/GATEWAY/openrouter",
});

try {
	const chatCompletion = await openai.chat.completions.create({
		model: "openai/gpt-3.5-turbo",
		messages: [{ role: "user", content: "What is Cloudflare?" }],
	});

	const response = chatCompletion.choices[0].message;

	return new Response(JSON.stringify(response));
} catch (e) {
	return new Response(e);
}
```

---

# Perplexity

URL: https://developers.cloudflare.com/ai-gateway/providers/perplexity/

import { Render } from "~/components";

[Perplexity](https://www.perplexity.ai/) 是一个 AI 驱动的答案引擎。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/perplexity-ai
```

## 前提条件

在向 Perplexity 发出请求时，确保您具有以下内容：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 Perplexity API 令牌。
- 您要使用的 Perplexity 模型的名称。

## 示例

### cURL

```bash title="示例获取请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/perplexity-ai/chat/completions \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --header 'Authorization: Bearer {perplexity_token}' \
     --data '{
      "model": "mistral-7b-instruct",
      "messages": [
        {
          "role": "user",
          "content": "What is Cloudflare?"
        }
      ]
    }'
```

### 在 JavaScript 中通过 OpenAI SDK 使用 Perplexity

Perplexity 没有自己的 SDK，但它们与 OpenAI SDK 兼容。您可以使用 OpenAI SDK 通过 AI 网关进行 Perplexity 调用，如下所示：

```js title="JavaScript"
import OpenAI from "openai";

const apiKey = env.PERPLEXITY_API_KEY;
const accountId = "{account_id}";
const gatewayId = "{gateway_id}";
const baseURL = `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/perplexity-ai`;

const perplexity = new OpenAI({
	apiKey,
	baseURL,
});

const model = "mistral-7b-instruct";
const messages = [{ role: "user", content: "What is Cloudflare?" }];
const maxTokens = 20;

const chatCompletion = await perplexity.chat.completions.create({
	model,
	messages,
	max_tokens: maxTokens,
});
```

<Render
	file="chat-completions-providers"
	product="ai-gateway"
	params={{
		name: "Perplexity",
		jsonexample: `
{
	"model": "perplexity/{model}"
}`

    }}

/>

---

# Replicate

URL: https://developers.cloudflare.com/ai-gateway/providers/replicate/

[Replicate](https://replicate.com/) 运行和微调开源模型。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/replicate
```

## URL 结构

在向 Replicate 发出请求时，将您当前使用的 URL 中的 `https://api.replicate.com/v1` 替换为 `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/replicate`。

## 前提条件

在向 Replicate 发出请求时，确保您具有以下内容：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 Replicate API 令牌。
- 您要使用的 Replicate 模型的名称。

## 示例

### cURL

```bash title="请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/replicate/predictions \
  --header 'Authorization: Token {replicate_api_token}' \
  --header 'Content-Type: application/json' \
  --data '{
    "input":
      {
        "prompt": "What is Cloudflare?"
      }
    }'
```

---

# Google Vertex AI

URL: https://developers.cloudflare.com/ai-gateway/providers/vertex/

[Google Vertex AI](https://cloud.google.com/vertex-ai) 使开发者能够轻松构建和部署企业级生成式 AI 体验。

以下是设置您的 Google Cloud 账户的快速指南：

1. Google Cloud Platform (GCP) 账户

   - 注册 [GCP 账户](https://cloud.google.com/vertex-ai)。新用户可能有资格获得积分（有效期 90 天）。

2. 启用 Vertex AI API

   - 导航到[启用 Vertex AI API](https://console.cloud.google.com/marketplace/product/google/aiplatform.googleapis.com) 并为您的项目激活 API。

3. 申请访问所需模型。

## 端点

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/google-vertex-ai
```

## 前提条件

在向 Google Vertex 发出请求时，您需要：

- AI 网关账户标签
- AI 网关网关名称
- Google Vertex API 密钥
- Google Vertex 项目名称
- Google Vertex 区域（例如，us-east4）
- Google Vertex 模型

## URL 结构

您的新基础 URL 将使用上述数据的结构：`https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/google-vertex-ai/v1/projects/{project_name}/locations/{region}`。

然后您可以附加您要访问的端点，例如：`/publishers/google/models/{model}:{generative_ai_rest_resource}`

因此您的最终 URL 将组合为：`https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/google-vertex-ai/v1/projects/{project_name}/locations/{region}/publishers/google/models/gemini-1.0-pro-001:generateContent`

## 示例

### cURL

```bash title="示例获取请求"
curl "https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/google-vertex-ai/v1/projects/{project_name}/locations/{region}/publishers/google/models/gemini-1.0-pro-001:generateContent" \
    -H "Authorization: Bearer {vertex_api_key}" \
    -H 'Content-Type: application/json' \
    -d '{
        "contents": {
          "role": "user",
          "parts": [
            {
              "text": "Tell me more about Cloudflare"
            }
          ]
        }'

```

---

# Workers AI

URL: https://developers.cloudflare.com/ai-gateway/providers/workersai/

import { Render } from "~/components";

使用 AI 网关对 [Workers AI](/workers-ai/) 的请求进行分析、缓存和安全控制。Workers AI 与 AI 网关无缝集成，允许您通过 API 请求或通过 Workers 脚本的环境绑定执行 AI 推理。绑定通过以最少的设置将请求路由到您的 AI 网关来简化过程。

## 前提条件

在向 Workers AI 发出请求时，确保您具有以下内容：

- 您的 AI 网关账户 ID。
- 您的 AI 网关网关名称。
- 一个有效的 Workers AI API 令牌。
- 您要使用的 Workers AI 模型的名称。

## REST API

要与 REST API 交互，请更新用于请求的 URL：

- **之前**：

```txt
https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model_id}
```

- **现在**：

```txt
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/{model_id}
```

对于这些参数：

- `{account_id}` 是您的 Cloudflare [账户 ID](/workers-ai/get-started/rest-api/#1-get-api-token-and-account-id)。
- `{gateway_id}` 指您现有的 [AI 网关](/ai-gateway/get-started/#create-gateway)的名称。
- `{model_id}` 指 [Workers AI 模型](/workers-ai/models/)的模型 ID。

## 示例

首先，生成一个具有 `Workers AI Read` 访问权限的 [API 令牌](/fundamentals/api/get-started/create-token/)并在您的请求中使用它。

```bash title="对 Workers AI llama 模型的请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/meta/llama-3.1-8b-instruct \
 --header 'Authorization: Bearer {cf_api_token}' \
 --header 'Content-Type: application/json' \
 --data '{"prompt": "What is Cloudflare?"}'
```

```bash title="对 Workers AI 文本分类模型的请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/@cf/huggingface/distilbert-sst-2-int8 \
  --header 'Authorization: Bearer {cf_api_token}' \
  --header 'Content-Type: application/json' \
  --data '{ "text": "Cloudflare docs are amazing!" }'
```

### OpenAI 兼容端点

<Render file="openai-compatibility" product="workers-ai" /> <br />

```bash title="对 OpenAI 兼容端点的请求"
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/workers-ai/v1/chat/completions \
 --header 'Authorization: Bearer {cf_api_token}' \
 --header 'Content-Type: application/json' \
 --data '{
      "model": "@cf/meta/llama-3.1-8b-instruct",
      "messages": [
        {
          "role": "user",
          "content": "What is Cloudflare?"
        }
      ]
    }
'
```

## Workers 绑定

您可以使用环境绑定将 Workers AI 与 AI 网关集成。要在您的 Worker 中包含 AI 网关，请在您的 Workers AI 请求中将网关添加为对象。

```ts
export interface Env {
	AI: Ai;
}

export default {
	async fetch(request: Request, env: Env): Promise<Response> {
		const response = await env.AI.run(
			"@cf/meta/llama-3.1-8b-instruct",
			{
				prompt: "Why should you use Cloudflare for your AI inference?",
			},
			{
				gateway: {
					id: "{gateway_id}",
					skipCache: false,
					cacheTtl: 3360,
				},
			},
		);
		return new Response(JSON.stringify(response));
	},
} satisfies ExportedHandler<Env>;
```

有关使用绑定将 Workers AI 与 AI 网关集成的详细分步指南，请参阅 [AI 网关中的集成](/ai-gateway/integrations/aig-workers-ai-binding/)。

Workers AI 支持以下 AI 网关参数：

- `id` string
  - 您现有的 [AI 网关](/ai-gateway/get-started/#create-gateway)的名称。必须与您的 Worker 在同一账户中。
- `skipCache` boolean（默认：false）
  - 控制请求是否应[跳过缓存](/ai-gateway/configuration/caching/#skip-cache-cf-aig-skip-cache)。
- `cacheTtl` number
  - 控制[缓存 TTL](/ai-gateway/configuration/caching/#cache-ttl-cf-aig-cache-ttl)。

<Render
	file="chat-completions-providers"
	product="ai-gateway"
	params={{
		name: "Workers AI",
		jsonexample: `
{
	"model": "workers-ai/{model}"
}`

    }}

/>

---

# WebSockets API

URL: https://developers.cloudflare.com/ai-gateway/websockets-api/

AI 网关 WebSockets API 为 AI 交互提供持久连接，消除重复握手并减少延迟。此 API 分为两类：

- **实时 API** - 专为通过 WebSockets 提供低延迟、多模态交互的 AI 提供商而设计。
- **非实时 API** - 支持 AI 提供商的标准 WebSocket 通信，包括那些本身不支持 WebSockets 的提供商。

## 何时使用 WebSockets

WebSockets 是长期存在的 TCP 连接，支持客户端和服务器之间的双向、实时和非实时通信。与需要为每个请求重复握手的 HTTP 连接不同，WebSockets 维护连接，支持持续数据交换并减少开销。WebSockets 非常适合需要低延迟、实时数据的应用程序，如语音助手。

## 主要优势

- **减少开销**：通过维护单一持久连接，避免重复握手和 TLS 协商的开销。
- **提供商兼容性**：与 AI 网关中的所有 AI 提供商兼容。即使您选择的提供商不支持 WebSockets，Cloudflare 也会为您处理，管理对您首选 AI 提供商的请求。

## 主要区别

| 功能              | 实时 API                                                                                                | 非实时 API                                                      |
| :---------------- | :------------------------------------------------------------------------------------------------------ | :-------------------------------------------------------------- |
| **目的**          | 为提供专用 WebSocket 端点的提供商启用实时、多模态 AI 交互。                                             | 支持本身不支持 WebSockets 的提供商的基于 WebSocket 的 AI 交互。 |
| **用例**          | 用于语音、视频和实时交互的流式响应。                                                                    | 基于文本的查询和响应，如 LLM 请求。                             |
| **AI 提供商支持** | [仅限于提供实时 WebSocket API 的提供商。](/ai-gateway/websockets-api/realtime-api/#supported-providers) | [AI 网关中的所有 AI 提供商。](/ai-gateway/providers/)           |
| **流式支持**      | 提供商本身支持实时数据流。                                                                              | AI 网关通过 WebSockets 处理流式传输。                           |

有关实现的详细信息，请参阅下一节：

- [实时 WebSockets API](/ai-gateway/websockets-api/realtime-api/)
- [非实时 WebSockets API](/ai-gateway/websockets-api/non-realtime-api/)

---

# 非实时 WebSockets API

URL: https://developers.cloudflare.com/ai-gateway/websockets-api/non-realtime-api/

非实时 WebSockets API 允许您为 AI 请求建立持久连接，而无需重复握手。这种方法非常适合不需要实时交互但仍能从减少的延迟和持续通信中受益的应用程序。

## 设置 WebSockets API

1. 生成具有相应 AI 网关运行权限的 AI 网关令牌，并选择使用经过身份验证的网关。
2. 通过将 `https://` 替换为 `wss://` 来修改您的通用端点 URL，以启动 WebSocket 连接：
   ```
   wss://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}
   ```
3. 打开一个使用具有 AI 网关运行权限的 Cloudflare 令牌进行身份验证的 WebSocket 连接。

:::note
或者，如果您使用的是浏览器 WebSocket，我们也支持通过 `sec-websocket-protocol` 标头进行身份验证。
:::

## 示例请求

```javascript
import WebSocket from "ws";

const ws = new WebSocket(
	"wss://gateway.ai.cloudflare.com/v1/my-account-id/my-gateway/",
	{
		headers: {
			"cf-aig-authorization": "Bearer AI_GATEWAY_TOKEN",
		},
	},
);

ws.send(
	JSON.stringify({
		type: "universal.create",
		request: {
			eventId: "my-request",
			provider: "workers-ai",
			endpoint: "@cf/meta/llama-3.1-8b-instruct",
			headers: {
				Authorization: "Bearer WORKERS_AI_TOKEN",
				"Content-Type": "application/json",
			},
			query: {
				prompt: "tell me a joke",
			},
		},
	}),
);

ws.on("message", function incoming(message) {
	console.log(message.toString());
});
```

## 示例响应

```json
{
	"type": "universal.created",
	"metadata": {
		"cacheStatus": "MISS",
		"eventId": "my-request",
		"logId": "01JC3R94FRD97JBCBX3S0ZAXKW",
		"step": "0",
		"contentType": "application/json"
	},
	"response": {
		"result": {
			"response": "Why was the math book sad? Because it had too many problems. Would you like to hear another one?"
		},
		"success": true,
		"errors": [],
		"messages": []
	}
}
```

## 示例流式请求

对于流式请求，AI 网关发送一个初始消息，其中包含请求元数据，指示流已开始：

```json
{
	"type": "universal.created",
	"metadata": {
		"cacheStatus": "MISS",
		"eventId": "my-request",
		"logId": "01JC40RB3NGBE5XFRZGBN07572",
		"step": "0",
		"contentType": "text/event-stream"
	}
}
```

在此初始消息之后，所有流式块在从推理提供商到达时都会实时中继到 WebSocket 连接。在这些流式块的元数据中仅包含 `eventId` 字段。`eventId` 允许 AI 网关在每个消息中包含一个客户端定义的 ID，即使在流式 WebSocket 环境中也是如此。

```json
{
	"type": "universal.stream",
	"metadata": {
		"eventId": "my-request"
	},
	"response": {
		"response": "would"
	}
}
```

一旦请求的所有块都已流式传输完毕，AI 网关会发送最后一条消息以表示请求完成。为了增加灵活性，此消息再次包含所有元数据，即使它最初是在流式处理开始时提供的。

```json
{
	"type": "universal.done",
	"metadata": {
		"cacheStatus": "MISS",
		"eventId": "my-request",
		"logId": "01JC40RB3NGBE5XFRZGBN07572",
		"step": "0",
		"contentType": "text/event-stream"
	}
}
```

---

# 实时 WebSockets API

URL: https://developers.cloudflare.com/ai-gateway/websockets-api/realtime-api/

一些 AI 提供商通过 WebSockets 支持实时、低延迟的交互。AI 网关允许与这些 API 无缝集成，支持文本、音频和视频等多模态交互。

## 支持的提供商

- [OpenAI](https://platform.openai.com/docs/guides/realtime-websocket)
- [Google AI Studio](https://ai.google.dev/gemini-api/docs/multimodal-live)
- [Cartesia](https://docs.cartesia.ai/api-reference/tts/tts)
- [ElevenLabs](https://elevenlabs.io/docs/conversational-ai/api-reference/conversational-ai/websocket)

## 身份验证

对于实时 WebSockets，可以使用以下方式进行身份验证：

- 标头（用于非浏览器环境）
- `sec-websocket-protocol`（用于浏览器）

## 示例

### OpenAI

```javascript
import WebSocket from "ws";

const url =
	"wss://gateway.ai.cloudflare.com/v1/<account_id>/<gateway>/openai?model=gpt-4o-realtime-preview-2024-12-17";
const ws = new WebSocket(url, {
	headers: {
		"cf-aig-authorization": process.env.CLOUDFLARE_API_KEY,
		Authorization: "Bearer " + process.env.OPENAI_API_KEY,
		"OpenAI-Beta": "realtime=v1",
	},
});

ws.on("open", () => console.log("Connected to server."));
ws.on("message", (message) => console.log(JSON.parse(message.toString())));

ws.send(
	JSON.stringify({
		type: "response.create",
		response: { modalities: ["text"], instructions: "Tell me a joke" },
	}),
);
```

### Google AI Studio

```javascript
const ws = new WebSocket(
	"wss://gateway.ai.cloudflare.com/v1/<account_id>/<gateway>/google?api_key=<google_api_key>",
	["cf-aig-authorization.<cloudflare_token>"],
);

ws.on("open", () => console.log("Connected to server."));
ws.on("message", (message) => console.log(message.data));

ws.send(
	JSON.stringify({
		setup: {
			model: "models/gemini-2.0-flash-exp",
			generationConfig: { responseModalities: ["TEXT"] },
		},
	}),
);
```

### Cartesia

```javascript
const ws = new WebSocket(
	"wss://gateway.ai.cloudflare.com/v1/<account_id>/<gateway>/cartesia?cartesia_version=2024-06-10&api_key=<cartesia_api_key>",
	["cf-aig-authorization.<cloudflare_token>"],
);

ws.on("open", function open() {
	console.log("Connected to server.");
});

ws.on("message", function incoming(message) {
	console.log(message.data);
});

ws.send(
	JSON.stringify({
		model_id: "sonic",
		transcript: "Hello, world! I'm generating audio on ",
		voice: { mode: "id", id: "a0e99841-438c-4a64-b679-ae501e7d6091" },
		language: "en",
		context_id: "happy-monkeys-fly",
		output_format: {
			container: "raw",
			encoding: "pcm_s16le",
			sample_rate: 8000,
		},
		add_timestamps: true,
		continue: true,
	}),
);
```

### ElevenLabs

```javascript
const ws = new WebSocket(
	"wss://gateway.ai.cloudflare.com/v1/<account_id>/<gateway>/elevenlabs?agent_id=<elevenlabs_agent_id>",
	[
		"xi-api-key.<elevenlabs_api_key>",
		"cf-aig-authorization.<cloudflare_token>",
	],
);

ws.on("open", function open() {
	console.log("Connected to server.");
});

ws.on("message", function incoming(message) {
	console.log(message.data);
});

ws.send(
	JSON.stringify({
		text: "This is a sample text ",
		voice_settings: { stability: 0.8, similarity_boost: 0.8 },
		generation_config: { chunk_length_schedule: [120, 160, 250, 290] },
	}),
);
```

---

# 构建检索增强生成 (RAG) AI

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/build-a-retrieval-augmented-generation-ai/

import { Details, Render, PackageManagers, WranglerConfig } from "~/components";

本指南将指导您设置和部署您的第一个 Cloudflare AI 应用程序。您将使用 Workers AI、Vectorize、D1 和 Cloudflare Workers 等工具构建一个功能齐全的 AI 驱动的应用程序。

:::note[寻找托管选项？]
[AutoRAG](/autorag) 提供了一种完全托管的方式来在 Cloudflare 上构建 RAG 管道，开箱即用地处理摄取、索引和查询。[开始使用](/autorag/get-started/)。
:::

在本教程结束时，您将构建一个 AI 工具，允许您存储信息并使用大型语言模型进行查询。这种模式被称为检索增强生成（RAG），是您可以结合 Cloudflare AI 工具包的多个方面构建的一个有用的项目。您无需具备使用 AI 工具的经验即可构建此应用程序。

<Render file="prereqs" product="workers" />

您还需要访问 [Vectorize](/vectorize/platform/pricing/)。在本教程中，我们将展示如何选择性地与 [Anthropic Claude](http://anthropic.com) 集成。您需要一个 [Anthropic API 密钥](https://docs.anthropic.com/en/api/getting-started) 才能这样做。

## 1. 创建一个新的 Worker 项目

C3 (`create-cloudflare-cli`) 是一个命令行工具，旨在帮助您尽快设置和部署 Workers 到 Cloudflare。

打开一个终端窗口并运行 C3 来创建您的 Worker 项目：

<PackageManagers
	type="create"
	pkg="cloudflare@latest"
	args={"rag-ai-tutorial"}
/>

<Render
	file="c3-post-run-steps"
	product="workers"
	params={{
		category: "hello-world",
		type: "Worker only",
		lang: "JavaScript",
	}}
/>

在您的项目目录中，C3 生成了几个文件。

<Details header="C3 创建了哪些文件？">

1.  `wrangler.jsonc`: 您的 [Wrangler](/workers/wrangler/configuration/#sample-wrangler-configuration) 配置文件。
2.  `worker.js` (在 `/src` 中): 一个用 [ES 模块](/workers/reference/migrate-to-module-workers/) 语法编写的最小化 `'Hello World!'` Worker。
3.  `package.json`: 一个最小化的 Node 依赖项配置文件。
4.  `package-lock.json`: 请参阅 [`npm` 关于 `package-lock.json` 的文档](https://docs.npmjs.com/cli/v9/configuring-npm/package-lock-json)。
5.  `node_modules`: 请参阅 [`npm` 关于 `node_modules` 的文档](https://docs.npmjs.com/cli/v7/configuring-npm/folders#node-modules)。

</Details>

现在，移动到您新创建的目录中：

```sh
cd rag-ai-tutorial
```

## 2. 使用 Wrangler CLI 进行开发

Workers 命令行界面 [Wrangler](/workers/wrangler/install-and-update/) 允许您 [创建](/workers/wrangler/commands/#init)、[测试](/workers/wrangler/commands/#dev) 和 [部署](/workers/wrangler/commands/#deploy) 您的 Workers 项目。C3 将默认在项目中安装 Wrangler。

创建您的第一个 Worker 后，在项目目录中运行 [`wrangler dev`](/workers/wrangler/commands/#dev) 命令以启动本地服务器来开发您的 Worker。这将允许您在开发过程中本地测试您的 Worker。

```sh
npx wrangler dev --remote
```

:::note

如果您以前没有使用过 Wrangler，它会尝试打开您的 Web 浏览器以使用您的 Cloudflare 帐户登录。

如果此步骤出现问题或者您无法访问浏览器界面，请参阅 [`wrangler login`](/workers/wrangler/commands/#login) 文档以获取更多信息。

:::

您现在可以访问 [http://localhost:8787](http://localhost:8787) 来查看您的 Worker 正在运行。您对代码的任何更改都将触发重新构建，重新加载页面将显示您的 Worker 的最新输出。

## 3. 添加 AI 绑定

要开始使用 Cloudflare 的 AI 产品，您可以将 `ai` 块添加到 [Wrangler 配置文件](/workers/wrangler/configuration/) 中。这将在您的代码中设置一个到 Cloudflare AI 模型的绑定，您可以使用它与平台上的可用 AI 模型进行交互。

此示例使用了 [`@cf/meta/llama-3-8b-instruct` 模型](/workers-ai/models/llama-3-8b-instruct/)，该模型可以生成文本。

<WranglerConfig>

```toml
[ai]
binding = "AI"
```

</WranglerConfig>

现在，找到 `src/index.js` 文件。在 `fetch` 处理程序中，您可以查询 `AI` 绑定：

```js
export default {
	async fetch(request, env, ctx) {
		const answer = await env.AI.run("@cf/meta/llama-3-8b-instruct", {
			messages: [{ role: "user", content: `9 的平方根是多少？` }],
		});

		return new Response(JSON.stringify(answer));
	},
};
```

通过 `AI` 绑定查询 LLM，我们可以直接在代码中与 Cloudflare AI 的大型语言模型进行交互。在此示例中，我们使用的是 [`@cf/meta/llama-3-8b-instruct` 模型](/workers-ai/models/llama-3-8b-instruct/)，该模型可以生成文本。

您可以使用 `wrangler` 部署您的 Worker：

```sh
npx wrangler deploy
```

向您的 Worker 发出请求现在将从 LLM 生成文本响应，并将其作为 JSON 对象返回。

```sh
curl https://example.username.workers.dev
```

```sh output
{"response":"答案：9的平方根是3。"}
```

## 4. 使用 Cloudflare D1 和 Vectorize 添加嵌入

嵌入允许您向 Cloudflare AI 项目中使用的语言模型添加附加功能。这是通过 **Vectorize**（Cloudflare 的向量数据库）完成的。

要开始使用 Vectorize，请使用 `wrangler` 创建一个新的嵌入索引。此索引将存储具有 768 个维度的向量，并将使用余弦相似度来确定哪些向量彼此最相似：

```sh
npx wrangler vectorize create vector-index --dimensions=768 --metric=cosine
```

然后，将新 Vectorize 索引的配置详细信息添加到 [Wrangler 配置文件](/workers/wrangler/configuration/)中：

<WranglerConfig>

```toml
# ... existing wrangler configuration

[[vectorize]]
binding = "VECTOR_INDEX"
index_name = "vector-index"
```

</WranglerConfig>

向量索引允许您存储维度集合，维度是用于表示数据的浮点数。当您要查询向量数据库时，您也可以将查询转换为维度。**Vectorize** 旨在高效地确定哪些存储的向量与您的查询最相似。

要实现搜索功能，您必须设置一个 Cloudflare 的 D1 数据库。在 D1 中，您可以存储应用程序的数据。然后，您将此数据更改为向量格式。当有人搜索并与向量匹配时，您可以向他们显示匹配的数据。

使用 `wrangler` 创建一个新的 D1 数据库：

```sh
npx wrangler d1 create database
```

然后，将上一个命令输出的配置详细信息粘贴到 [Wrangler 配置文件](/workers/wrangler/configuration/) 中：

<WranglerConfig>

```toml
# ... existing wrangler configuration

[[d1_databases]]
binding = "DB" # 在您的 Worker 的 env.DB 中可用
database_name = "database"
database_id = "abc-def-geh" # 将此替换为真实的 database_id (UUID)
```

</WranglerConfig>

在此应用程序中，我们将在 D1 中创建一个 `notes` 表，这将允许我们存储笔记并稍后在 Vectorize 中检索它们。要创建此表，请使用 `wrangler d1 execute` 运行一个 SQL 命令：

```sh
npx wrangler d1 execute database --remote --command "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
```

现在，我们可以使用 `wrangler d1 execute` 向我们的数据库中添加一个新笔记：

```sh
npx wrangler d1 execute database --remote --command "INSERT INTO notes (text) VALUES ('最好的披萨配料是意大利辣香肠')"
```

## 5. 创建工作流

在我们开始创建笔记之前，我们将引入一个 [Cloudflare 工作流](/workflows)。这将允许我们定义一个持久的工作流，可以安全、稳健地执行 RAG 过程的所有步骤。

首先，将一个新的 `[[workflows]]` 块添加到您的 [Wrangler 配置文件](/workers/wrangler/configuration/) 中：

<WranglerConfig>

```toml
# ... existing wrangler configuration

[[workflows]]
name = "rag"
binding = "RAG_WORKFLOW"
class_name = "RAGWorkflow"
```

</WranglerConfig>

在 `src/index.js` 中，添加一个名为 `RAGWorkflow` 的新类，它扩展了 `WorkflowEntrypoint`：

```js
import { WorkflowEntrypoint } from "cloudflare:workers";

export class RAGWorkflow extends WorkflowEntrypoint {
	async run(event, step) {
		await step.do("example step", async () => {
			console.log("Hello World!");
		});
	}
}
```

此类将定义一个工作流步骤，该步骤将在控制台中记录“Hello World!”。您可以根据需要向工作流中添加任意数量的步骤。

就其本身而言，此工作流不会执行任何操作。要执行工作流，我们将调用 `RAG_WORKFLOW` 绑定，并传入工作流正常完成所需的任何参数。以下是我们如何调用工作流的示例：

```js
env.RAG_WORKFLOW.create({ params: { text } });
```

## 6. 创建笔记并将其添加到 Vectorize

为了扩展您的 Workers 函数以处理多个路由，我们将添加 `hono`，这是一个用于 Workers 的路由库。这将允许我们为向数据库中添加笔记创建一个新路由。使用 `npm` 安装 `hono`：

<PackageManagers pkg="hono" />

然后，将 `hono` 导入您的 `src/index.js` 文件中。您还应该更新 `fetch` 处理程序以使用 `hono`：

```js
import { Hono } from "hono";
const app = new Hono();

app.get("/", async (c) => {
	const answer = await c.env.AI.run("@cf/meta/llama-3-8b-instruct", {
		messages: [{ role: "user", content: `9 的平方根是多少？` }],
	});

	return c.json(answer);
});

export default app;
```

这将在根路径 `/` 处建立一个路由，其功能与先前版本的应用程序相同。

现在，我们可以更新工作流以开始将笔记添加到数据库中，并生成它们的相关嵌入。

此示例使用了 [`@cf/baai/bge-base-en-v1.5` 模型](/workers-ai/models/bge-base-en-v1.5/)，该模型可用于创建嵌入。嵌入存储在 [Vectorize](/vectorize/) 中，这是 Cloudflare 的向量数据库。用户查询也会转换为嵌入，以便在 Vectorize 中进行搜索。

```js
import { WorkflowEntrypoint } from "cloudflare:workers";

export class RAGWorkflow extends WorkflowEntrypoint {
	async run(event, step) {
		const env = this.env;
		const { text } = event.payload;

		const record = await step.do(`create database record`, async () => {
			const query = "INSERT INTO notes (text) VALUES (?) RETURNING *";

			const { results } = await env.DB.prepare(query).bind(text).run();

			const record = results[0];
			if (!record) throw new Error("Failed to create note");
			return record;
		});

		const embedding = await step.do(`generate embedding`, async () => {
			const embeddings = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
				text: text,
			});
			const values = embeddings.data[0];
			if (!values) throw new Error("Failed to generate vector embedding");
			return values;
		});

		await step.do(`insert vector`, async () => {
			return env.VECTOR_INDEX.upsert([
				{
					id: record.id.toString(),
					values: embedding,
				},
			]);
		});
	}
}
```

工作流执行以下操作：

1. 接受一个 `text` 参数。
2. 在 D1 的 `notes` 表中插入一个新行，并检索新行的 `id`。
3. 使用 LLM 绑定的 `embeddings` 模型将 `text` 转换为向量。
4. 将 `id` 和 `vectors` 上传到 Vectorize 中的 `vector-index` 索引。

通过这样做，您将创建一个新的向量表示形式的笔记，可以用于稍后检索该笔记。

要完成代码，我们将添加一个路由，允许用户向数据库提交笔记。此路由将解析 JSON 请求正文，获取 `note` 参数，并创建一个新的工作流实例，传递参数：

```js
app.post("/notes", async (c) => {
	const { text } = await c.req.json();
	if (!text) return c.text("Missing text", 400);
	await c.env.RAG_WORKFLOW.create({ params: { text } });
	return c.text("Created note", 201);
});
```

## 7. 查询 Vectorize 以检索笔记

要完成您的代码，您可以更新根路径（`/`）以查询 Vectorize。您将把查询转换为向量，然后使用 `vector-index` 索引来查找最相似的向量。

`topK` 参数限制了函数返回的向量数量。例如，提供 `topK` 为 1 将仅返回基于查询的 _最相似_ 向量。将 `topK` 设置为 5 将返回 5 个最相似的向量。

给定一组相似的向量，您可以检索与存储在这些向量旁边的记录 ID 匹配的笔记。在这种情况下，我们只检索一个笔记 - 但您可以根据需要自定义此设置。

您可以将这些笔记的文本插入 LLM 绑定的提示中。这是检索增强生成（RAG）的基础：在 LLM 的提示中提供来自数据外部的附加上下文，以增强 LLM 生成的文本。

我们将更新提示以包含上下文，并要求 LLM 在回应时使用上下文：

```js
import { Hono } from "hono";
const app = new Hono();

// Existing post route...
// app.post('/notes', async (c) => { ... })

app.get("/", async (c) => {
	const question = c.req.query("text") || "9 的平方根是多少？";

	const embeddings = await c.env.AI.run("@cf/baai/bge-base-en-v1.5", {
		text: question,
	});
	const vectors = embeddings.data[0];

	const vectorQuery = await c.env.VECTOR_INDEX.query(vectors, { topK: 1 });
	let vecId;
	if (
		vectorQuery.matches &&
		vectorQuery.matches.length > 0 &&
		vectorQuery.matches[0]
	) {
		vecId = vectorQuery.matches[0].id;
	} else {
		console.log("No matching vector found or vectorQuery.matches is empty");
	}

	let notes = [];
	if (vecId) {
		const query = `SELECT * FROM notes WHERE id = ?`;
		const { results } = await c.env.DB.prepare(query).bind(vecId).all();
		if (results) notes = results.map((vec) => vec.text);
	}

	const contextMessage = notes.length
		? `Context:\n${notes.map((note) => `- ${note}`).join("\n")}`
		: "";

	const systemPrompt = `When answering the question or responding, use the context provided, if it is provided and relevant.`;

	const { response: answer } = await c.env.AI.run(
		"@cf/meta/llama-3-8b-instruct",
		{
			messages: [
				...(notes.length ? [{ role: "system", content: contextMessage }] : []),
				{ role: "system", content: systemPrompt },
				{ role: "user", content: question },
			],
		},
	);

	return c.text(answer);
});

app.onError((err, c) => {
	return c.text(err);
});

export default app;
```

## 8. 添加 Anthropic Claude 模型（可选）

如果您正在处理较大的文档，您有选择使用 Anthropic 的 [Claude 模型](https://claude.ai/)，这些模型具有大型上下文窗口，非常适合 RAG 工作流。

要开始，安装 `@anthropic-ai/sdk` 包：

<PackageManagers pkg="@anthropic-ai/sdk" />

在 `src/index.js` 中，您可以更新 `GET /` 路由以检查 `ANTHROPIC_API_KEY` 环境变量。如果设置了该变量，我们可以使用 Anthropic SDK 生成文本。如果没有设置，我们将回退到现有的 Workers AI 代码：

```js
import Anthropic from '@anthropic-ai/sdk';

app.get('/', async (c) => {
  // ... Existing code
	const systemPrompt = `When answering the question or responding, use the context provided, if it is provided and relevant.`

	let modelUsed: string = ""
	let response = null

	if (c.env.ANTHROPIC_API_KEY) {
		const anthropic = new Anthropic({
			apiKey: c.env.ANTHROPIC_API_KEY
		})

		const model = "claude-3-5-sonnet-latest"
		modelUsed = model

		const message = await anthropic.messages.create({
			max_tokens: 1024,
			model,
			messages: [
				{ role: 'user', content: question }
			],
			system: [systemPrompt, notes ? contextMessage : ''].join(" ")
		})

		response = {
			response: message.content.map(content => content.text).join("\n")
		}
	} else {
		const model = "@cf/meta/llama-3.1-8b-instruct"
		modelUsed = model

		response = await c.env.AI.run(
			model,
			{
				messages: [
					...(notes.length ? [{ role: 'system', content: contextMessage }] : []),
					{ role: 'system', content: systemPrompt },
					{ role: 'user', content: question }
				]
			}
		)
	}

	if (response) {
		c.header('x-model-used', modelUsed)
		return c.text(response.response)
	} else {
		return c.text("We were unable to generate output", 500)
	}
})
```

最后，您需要在 Workers 应用程序中设置 `ANTHROPIC_API_KEY` 环境变量。您可以使用 `wrangler secret put` 来实现：

```sh
$ npx wrangler secret put ANTHROPIC_API_KEY
```

## 9. 删除笔记和向量

如果您不再需要笔记，可以从数据库中删除它。每次删除笔记时，您还需要从 Vectorize 中删除相应的向量。您可以通过在 `src/index.js` 文件中构建 `DELETE /notes/:id` 路由来实现这一点：

```js
app.delete("/notes/:id", async (c) => {
	const { id } = c.req.param();

	const query = `DELETE FROM notes WHERE id = ?`;
	await c.env.DB.prepare(query).bind(id).run();

	await c.env.VECTOR_INDEX.deleteByIds([id]);

	return c.status(204);
});
```

## 10. 文本分割（可选）

对于较大的文本块，建议将文本分割成较小的块。这允许 LLM 更有效地收集相关上下文，而无需检索大块文本。

为了实现这一点，我们将向项目中添加一个新的 NPM 包，`@langchain/textsplitters`：

<PackageManagers pkg="@langchain/textsplitters" />

此包提供的 `RecursiveCharacterTextSplitter` 类将文本分割成较小的块。它可以根据需要进行自定义，但默认配置在大多数情况下都有效：

```js
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

const text = "Some long piece of text...";

const splitter = new RecursiveCharacterTextSplitter({
	// These can be customized to change the chunking size
	// chunkSize: 1000,
	// chunkOverlap: 200,
});

const output = await splitter.createDocuments([text]);
console.log(output); // [{ pageContent: 'Some long piece of text...' }]
```

要使用此分割器，我们将更新工作流以将文本分割成较小的块。然后，我们将遍历这些块，并为每个文本块运行工作流的其余部分：

```js
export class RAGWorkflow extends WorkflowEntrypoint {
	async run(event, step) {
		const env = this.env;
		const { text } = event.payload;
		let texts = await step.do("split text", async () => {
			const splitter = new RecursiveCharacterTextSplitter();
			const output = await splitter.createDocuments([text]);
			return output.map((doc) => doc.pageContent);
		});

		console.log(
			"RecursiveCharacterTextSplitter generated ${texts.length} chunks",
		);

		for (const index in texts) {
			const text = texts[index];
			const record = await step.do(
				`create database record: ${index}/${texts.length}`,
				async () => {
					const query = "INSERT INTO notes (text) VALUES (?) RETURNING *";

					const { results } = await env.DB.prepare(query).bind(text).run();

					const record = results[0];
					if (!record) throw new Error("Failed to create note");
					return record;
				},
			);

			const embedding = await step.do(
				`generate embedding: ${index}/${texts.length}`,
				async () => {
					const embeddings = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
						text: text,
					});
					const values = embeddings.data[0];
					if (!values) throw new Error("Failed to generate vector embedding");
					return values;
				},
			);

			await step.do(`insert vector: ${index}/${texts.length}`, async () => {
				return env.VECTOR_INDEX.upsert([
					{
						id: record.id.toString(),
						values: embedding,
					},
				]);
			});
		}
	}
}
```

现在，当向 `/notes` 端点提交大块文本时，它们将被分割成较小的块，并且每个块将由工作流处理。

## 11. 部署您的项目

如果您在[第 1 步](/workers/get-started/guide/#1-create-a-new-worker-project)中没有部署您的 Worker，请使用 Wrangler 将您的 Worker 部署到 `*.workers.dev` 子域、[自定义域](/workers/configuration/routing/custom-domains/)（如果您已配置），或者如果您没有配置任何子域或域，Wrangler 将在发布过程中提示您设置一个。

```sh
npx wrangler deploy
```

在 `<YOUR_WORKER>.<YOUR_SUBDOMAIN>.workers.dev` 预览您的 Worker。

:::note[注意]

当首次将您的 Worker 推送到 `*.workers.dev` 子域时，您可能会看到 [`523` 错误](/support/troubleshooting/http-status-codes/cloudflare-5xx-errors/error-523/)，因为 DNS 正在传播。这些错误应在一分钟左右解决。

:::

## 相关资源

完整版本的此代码库可在 GitHub 上找到。它包括一个前端 UI 用于查询、添加和删除笔记，以及一个后端 API 用于与数据库和向量索引进行交互。您可以在这里找到它：[github.com/kristianfreeman/cloudflare-retrieval-augmented-generation-example](https://github.com/kristianfreeman/cloudflare-retrieval-augmented-generation-example/)。

要做更多：

- 探索 [检索增强生成（RAG）架构](/reference-architecture/diagrams/ai/ai-rag/) 的参考图表。
- 查看 Cloudflare 的 [AI 文档](/workers-ai)。
- 查看 [教程](/workers/tutorials/) 以在 Workers 上构建项目。
- 探索 [示例](/workers/examples/) 以尝试复制和粘贴 Worker 代码。
- 了解 Workers 的工作原理 [参考](/workers/reference/)。
- 了解 Workers 的功能和功能 [平台](/workers/platform/)。
- 设置 [Wrangler](/workers/wrangler/install-and-update/) 以编程方式创建、测试和部署您的 Worker 项目。

---

# 使用 Workers AI 构建带自动转录功能的语音笔记应用

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/build-a-voice-notes-app-with-auto-transcription/

import { Render, PackageManagers, Tabs, TabItem } from "~/components";

在本教程中，您将学习如何创建一个带有语音录音自动转录和可选后处理功能的语音笔记应用。构建该应用将使用以下工具：

- Workers AI 用于转录语音录音和可选的后处理
- D1 数据库用于存储笔记
- R2 存储用于存储语音录音
- Nuxt 框架用于构建全栈应用
- Workers 用于部署项目

## 先决条件

要继续，您需要：

<Render file="prereqs" product="workers" />

## 1. 创建一个新的 Worker 项目

使用带有 `nuxt` 框架预设的 `c3` CLI 创建一个新的 Worker 项目。

<PackageManagers
	type="create"
	pkg="cloudflare@latest"
	args="voice-notes --framework=nuxt"
/>

### 安装附加依赖项

切换到新创建的项目目录

```sh
cd voice-notes
```

并安装以下依赖项：

<PackageManagers pkg="@nuxt/ui @vueuse/core @iconify-json/heroicons" />

然后将 `@nuxt/ui` 模块添加到 `nuxt.config.ts` 文件中：

```ts title="nuxt.config.ts"
export default defineNuxtConfig({
	//..

	modules: ["nitro-cloudflare-dev", "@nuxt/ui"],

	//..
});
```

### [可选] 迁移到 Nuxt 4 兼容模式

迁移到 Nuxt 4 兼容模式可确保您的应用程序与 Nuxt 的未来更新保持向前兼容。

在项目的根目录中创建一个新的 `app` 文件夹，并将 `app.vue` 文件移动到其中。此外，将以下内容添加到您的 `nuxt.config.ts` 文件中：

```ts title="nuxt.config.ts"
export default defineNuxtConfig({
	//..

	future: {
		compatibilityVersion: 4,
	},

	//..
});
```

:::note
本教程的其余部分将使用 `app` 文件夹来存放客户端代码。如果您没有进行此更改，您应该继续使用项目的根目录。
:::

### 启动本地开发服务器

此时，您可以通过启动本地开发服务器来测试您的应用程序：

<PackageManagers type="run" args="dev" />

如果一切设置正确，您应该在 `http://localhost:3000` 上看到一个 Nuxt 欢迎页面。

## 2. 创建转录 API 端点

此 API 利用 Workers AI 来转录语音录音。要在项目中使用 Workers AI，您首先需要将其绑定到 Worker。

<Render file="ai-local-usage-charges" product="workers" />

将 `AI` 绑定添加到 Wrangler 文件中。

```toml title="wrangler.toml"
[ai]
binding = "AI"
```

配置 `AI` 绑定后，运行 `cf-typegen` 命令以生成必要的 Cloudflare 类型定义。这使得类型定义在服务器事件上下文中可用。

<PackageManagers type="run" args="cf-typegen" />

通过在 `/server/api` 目录中创建 `transcribe.post.ts` 文件来创建一个转录 `POST` 端点。

```ts title="server/api/transcribe.post.ts"
export default defineEventHandler(async (event) => {
	const { cloudflare } = event.context;

	const form = await readFormData(event);
	const blob = form.get("audio") as Blob;
	if (!blob) {
		throw createError({
			statusCode: 400,
			message: "缺少要转录的音频 blob",
		});
	}

	try {
		const response = await cloudflare.env.AI.run("@cf/openai/whisper", {
			audio: [...new Uint8Array(await blob.arrayBuffer())],
		});

		return response.text;
	} catch (err) {
		console.error("转录音频时出错:", err);
		throw createError({
			statusCode: 500,
			message: "转录音频失败。请重试。",
		});
	}
});
```

上述代码执行以下操作：

1.  从事件中提取音频 blob。
2.  使用 `@cf/openai/whisper` 模型转录 blob 并将转录文本作为响应返回。

## 3. 为将音频录音上传到 R2 创建 API 端点

在将音频录音上传到 `R2` 之前，您需要先创建一个存储桶。您还需要将 R2 绑定添加到您的 Wrangler 文件并重新生成 Cloudflare 类型定义。

创建一个 `R2` 存储桶。

<PackageManagers
	type="exec"
	pkg="wrangler"
	args="r2 bucket create <BUCKET_NAME>"
/>

将存储绑定添加到您的 Wrangler 文件中。

```toml title="wrangler.toml"
[[r2_buckets]]
binding = "R2"
bucket_name = "<BUCKET_NAME>"
```

最后，通过重新运行 `cf-typegen` 脚本生成类型定义。

现在您已准备好创建上传端点。在您的 `server/api` 目录中创建一个新的 `upload.put.ts` 文件，并向其添加以下代码：

```ts title="server/api/upload.put.ts"
export default defineEventHandler(async (event) => {
	const { cloudflare } = event.context;

	const form = await readFormData(event);
	const files = form.getAll("files") as File[];
	if (!files) {
		throw createError({ statusCode: 400, message: "缺少文件" });
	}

	const uploadKeys: string[] = [];
	for (const file of files) {
		const obj = await cloudflare.env.R2.put(`recordings/${file.name}`, file);
		if (obj) {
			uploadKeys.push(obj.key);
		}
	}

	return uploadKeys;
});
```

上述代码执行以下操作：

1.  `files` 变量使用 `form.getAll()` 检索客户端发送的所有文件，这允许在单个请求中进行多次上传。
2.  使用您之前创建的绑定 (`R2`) 将文件上传到 R2 存储桶。

:::note
`recordings/` 前缀将上传的文件组织到存储桶中的专用文件夹中。这在向客户端提供这些录音时也会派上用场（稍后介绍）。
:::

## 4. 创建 API 端点以保存笔记条目

在创建端点之前，您需要执行与 R2 存储桶类似但有一些额外步骤的步骤，以准备一个笔记表。

创建一个 `D1` 数据库。

<PackageManagers type="exec" pkg="wrangler" args="d1 create <DB_NAME>" />

将 D1 绑定添加到 Wrangler 文件。您可以从 `d1 create` 命令的输出中获取 `DB_ID`。

```toml title="wrangler.toml"
[[d1_databases]]
binding = "DB"
database_name = "<DB_NAME>"
database_id = "<DB_ID>"
```

和以前一样，重新运行 `cf-typegen` 命令以生成类型。

接下来，创建一个数据库迁移。

<PackageManagers
	type="exec"
	pkg="wrangler"
	args={`d1 migrations create <DB_NAME> "create notes table"`}
/>

这将在项目的根目录中创建一个新的 `migrations` 文件夹，并向其中添加一个空的 `0001_create_notes_table.sql` 文件。用下面的代码替换此文件的内容。

```sql
CREATE TABLE IF NOT EXISTS notes (
 id INTEGER PRIMARY KEY AUTOINCREMENT,
 text TEXT NOT NULL,
 created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
 updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
 audio_urls TEXT
);
```

然后应用此迁移以创建 `notes` 表。

<PackageManagers
	type="exec"
	pkg="wrangler"
	args="d1 migrations apply <DB_NAME>"
/>

:::note
上述命令将在本地创建笔记表。要在您的远程生产数据库上应用迁移，请使用 `--remote` 标志。
:::

现在您可以创建 API 端点。在 `server/api/notes` 目录中创建一个新文件 `index.post.ts`，并将其内容更改为以下内容：

```ts title="server/api/notes/index.post.ts"
export default defineEventHandler(async (event) => {
	const { cloudflare } = event.context;

	const { text, audioUrls } = await readBody(event);
	if (!text) {
		throw createError({
			statusCode: 400,
			message: "Missing note text",
		});
	}

	try {
		await cloudflare.env.DB.prepare(
			"INSERT INTO notes (text, audio_urls) VALUES (?1, ?2)",
		)
			.bind(text, audioUrls ? JSON.stringify(audioUrls) : null)
			.run();

		return setResponseStatus(event, 201);
	} catch (err) {
		console.error("Error creating note:", err);
		throw createError({
			statusCode: 500,
			message: "Failed to create note. Please try again.",
		});
	}
});
```

The above does the following:

1. Extracts the text, and optional audioUrls from the event.
2. Saves it to the database after converting the audioUrls to a `JSON` string.

## 5. Handle note creation on the client-side

Now you're ready to work on the client side. Let's start by tackling the note creation part first.

### Recording user audio

Create a composable to handle audio recording using the MediaRecorder API. This will be used to record notes through the user's microphone.

Create a new file `useMediaRecorder.ts` in the `app/composables` folder, and add the following code to it:

```ts title="app/composables/useMediaRecorder.ts"
interface MediaRecorderState {
	isRecording: boolean;
	recordingDuration: number;
	audioData: Uint8Array | null;
	updateTrigger: number;
}

export function useMediaRecorder() {
	const state = ref<MediaRecorderState>({
		isRecording: false,
		recordingDuration: 0,
		audioData: null,
		updateTrigger: 0,
	});

	let mediaRecorder: MediaRecorder | null = null;
	let audioContext: AudioContext | null = null;
	let analyser: AnalyserNode | null = null;
	let animationFrame: number | null = null;
	let audioChunks: Blob[] | undefined = undefined;

	const updateAudioData = () => {
		if (!analyser || !state.value.isRecording || !state.value.audioData) {
			if (animationFrame) {
				cancelAnimationFrame(animationFrame);
				animationFrame = null;
			}

			return;
		}

		analyser.getByteTimeDomainData(state.value.audioData);
		state.value.updateTrigger += 1;
		animationFrame = requestAnimationFrame(updateAudioData);
	};

	const startRecording = async () => {
		try {
			const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

			audioContext = new AudioContext();
			analyser = audioContext.createAnalyser();

			const source = audioContext.createMediaStreamSource(stream);
			source.connect(analyser);

			mediaRecorder = new MediaRecorder(stream);
			audioChunks = [];

			mediaRecorder.ondataavailable = (e: BlobEvent) => {
				audioChunks?.push(e.data);
				state.value.recordingDuration += 1;
			};

			state.value.audioData = new Uint8Array(analyser.frequencyBinCount);
			state.value.isRecording = true;
			state.value.recordingDuration = 0;
			state.value.updateTrigger = 0;
			mediaRecorder.start(1000);

			updateAudioData();
		} catch (err) {
			console.error("Error accessing microphone:", err);
			throw err;
		}
	};

	const stopRecording = async () => {
		return await new Promise<Blob>((resolve) => {
			if (mediaRecorder && state.value.isRecording) {
				mediaRecorder.onstop = () => {
					const blob = new Blob(audioChunks, { type: "audio/webm" });
					audioChunks = undefined;

					state.value.recordingDuration = 0;
					state.value.updateTrigger = 0;
					state.value.audioData = null;

					resolve(blob);
				};

				state.value.isRecording = false;
				mediaRecorder.stop();
				mediaRecorder.stream.getTracks().forEach((track) => track.stop());

				if (animationFrame) {
					cancelAnimationFrame(animationFrame);
					animationFrame = null;
				}

				audioContext?.close();
				audioContext = null;
			}
		});
	};

	onUnmounted(() => {
		stopRecording();
	});

	return {
		state: readonly(state),
		startRecording,
		stopRecording,
	};
}
```

The above code does the following:

1. Exposes functions to start and stop audio recordings in a Vue application.
2. Captures audio input from the user's microphone using MediaRecorder API.
3. Processes real-time audio data for visualization using AudioContext and AnalyserNode.
4. Stores recording state including duration and recording status.
5. Maintains chunks of audio data and combines them into a final audio blob when recording stops.
6. Updates audio visualization data continuously using animation frames while recording.
7. Automatically cleans up all audio resources when recording stops or component unmounts.
8. Returns audio recordings in webm format for further processing.

### Create a component for note creation

This component allows users to create notes by either typing or recording audio. It also handles audio transcription and uploading the recordings to the server.

Create a new file named `CreateNote.vue` inside the `app/components` folder. Add the following template code to the newly created file:

```vue title="app/components/CreateNote.vue"
<template>
	<div class="flex flex-col gap-y-5">
		<div
			class="flex h-full flex-col gap-y-4 overflow-hidden p-px md:flex-row md:gap-x-6"
		>
			<UCard
				:ui="{
					base: 'h-full flex flex-col flex-1',
					body: { base: 'flex-grow' },
					header: { base: 'md:h-[72px]' },
				}"
			>
				<template #header>
					<h3
						class="text-base font-medium text-gray-600 md:text-lg dark:text-gray-300"
					>
						Note transcript
					</h3>
				</template>
				<UTextarea
					v-model="note"
					placeholder="Type your note or use voice recording..."
					size="lg"
					autofocus
					:disabled="loading || isTranscribing || state.isRecording"
					:rows="10"
				/>
			</UCard>

			<UCard
				class="order-first shrink-0 md:order-none md:flex md:h-full md:w-96 md:flex-col"
				:ui="{
					body: { base: 'max-h-36 md:max-h-none md:flex-grow overflow-y-auto' },
				}"
			>
				<template #header>
					<h3
						class="text-base font-medium text-gray-600 md:text-lg dark:text-gray-300"
					>
						Note recordings
					</h3>

					<UTooltip
						:text="state.isRecording ? 'Stop Recording' : 'Start Recording'"
					>
						<UButton
							:icon="
								state.isRecording
									? 'i-heroicons-stop-circle'
									: 'i-heroicons-microphone'
							"
							:color="state.isRecording ? 'red' : 'primary'"
							:loading="isTranscribing"
							@click="toggleRecording"
						/>
					</UTooltip>
				</template>

				<AudioVisualizer
					v-if="state.isRecording"
					class="mb-2 h-14 w-full rounded-lg bg-gray-50 p-2 dark:bg-gray-800"
					:audio-data="state.audioData"
					:data-update-trigger="state.updateTrigger"
				/>

				<div
					v-else-if="isTranscribing"
					class="mb-2 flex h-14 items-center justify-center gap-x-3 rounded-lg bg-gray-50 p-2 text-gray-500 dark:bg-gray-800 dark:text-gray-400"
				>
					<UIcon
						name="i-heroicons-arrow-path-20-solid"
						class="h-6 w-6 animate-spin"
					/>
					Transcribing...
				</div>

				<RecordingsList :recordings="recordings" @delete="deleteRecording" />

				<div
					v-if="!recordings.length && !state.isRecording && !isTranscribing"
					class="flex h-full items-center justify-center text-gray-500 dark:text-gray-400"
				>
					No recordings...
				</div>
			</UCard>
		</div>

		<UDivider />

		<div class="flex justify-end gap-x-4">
			<UButton
				icon="i-heroicons-trash"
				color="gray"
				size="lg"
				variant="ghost"
				:disabled="loading"
				@click="clearNote"
			>
				Clear
			</UButton>
			<UButton
				icon="i-heroicons-cloud-arrow-up"
				size="lg"
				:loading="loading"
				:disabled="!note.trim() && !state.isRecording"
				@click="saveNote"
			>
				Save
			</UButton>
		</div>
	</div>
</template>
```

The above template results in the following:

1. A panel with a `textarea` inside to type the note manually.
2. Another panel to manage start/stop of an audio recording, and show the recordings done already.
3. A bottom panel to reset or save the note (along with the recordings).

Now, add the following code below the template code in the same file:

```vue title="app/components/CreateNote.vue"
<script setup lang="ts">
import type { Recording, Settings } from "~~/types";

const emit = defineEmits<{
	(e: "created"): void;
}>();

const note = ref("");
const loading = ref(false);
const isTranscribing = ref(false);
const { state, startRecording, stopRecording } = useMediaRecorder();
const recordings = ref<Recording[]>([]);

const handleRecordingStart = async () => {
	try {
		await startRecording();
	} catch (err) {
		console.error("Error accessing microphone:", err);
		useToast().add({
			title: "Error",
			description: "Could not access microphone. Please check permissions.",
			color: "red",
		});
	}
};

const handleRecordingStop = async () => {
	let blob: Blob | undefined;

	try {
		blob = await stopRecording();
	} catch (err) {
		console.error("Error stopping recording:", err);
		useToast().add({
			title: "Error",
			description: "Failed to record audio. Please try again.",
			color: "red",
		});
	}

	if (blob) {
		try {
			const transcription = await transcribeAudio(blob);

			note.value += note.value ? "\n\n" : "";
			note.value += transcription ?? "";

			recordings.value.unshift({
				url: URL.createObjectURL(blob),
				blob,
				id: `${Date.now()}`,
			});
		} catch (err) {
			console.error("Error transcribing audio:", err);
			useToast().add({
				title: "Error",
				description: "Failed to transcribe audio. Please try again.",
				color: "red",
			});
		}
	}
};

const toggleRecording = () => {
	if (state.value.isRecording) {
		handleRecordingStop();
	} else {
		handleRecordingStart();
	}
};

const transcribeAudio = async (blob: Blob) => {
	try {
		isTranscribing.value = true;
		const formData = new FormData();
		formData.append("audio", blob);

		return await $fetch("/api/transcribe", {
			method: "POST",
			body: formData,
		});
	} finally {
		isTranscribing.value = false;
	}
};

const clearNote = () => {
	note.value = "";
	recordings.value = [];
};

const saveNote = async () => {
	if (!note.value.trim()) return;

	loading.value = true;

	const noteToSave: { text: string; audioUrls?: string[] } = {
		text: note.value.trim(),
	};

	try {
		if (recordings.value.length) {
			noteToSave.audioUrls = await uploadRecordings();
		}

		await $fetch("/api/notes", {
			method: "POST",
			body: noteToSave,
		});

		useToast().add({
			title: "Success",
			description: "Note saved successfully",
			color: "green",
		});

		note.value = "";
		recordings.value = [];

		emit("created");
	} catch (err) {
		console.error("Error saving note:", err);
		useToast().add({
			title: "Error",
			description: "Failed to save note",
			color: "red",
		});
	} finally {
		loading.value = false;
	}
};

const deleteRecording = (recording: Recording) => {
	recordings.value = recordings.value.filter((r) => r.id !== recording.id);
};

const uploadRecordings = async () => {
	if (!recordings.value.length) return;

	const formData = new FormData();
	recordings.value.forEach((recording) => {
		formData.append("files", recording.blob, recording.id + ".webm");
	});

	const uploadKeys = await $fetch("/api/upload", {
		method: "PUT",
		body: formData,
	});

	return uploadKeys;
};
</script>
```

The above code does the following:

1. When a recording is stopped by calling `handleRecordingStop` function, the audio blob is sent for transcribing to the transcribe API endpoint.
2. The transcription response text is appended to the existing textarea content.
3. When the note is saved by calling the `saveNote` function, the audio recordings are uploaded first to R2 by using the upload endpoint we created earlier. Then, the actual note content along with the audioUrls (the R2 object keys) are saved by calling the notes post endpoint.

### Create a new page route for showing the component

You can use this component in a Nuxt page to show it to the user. But before that you need to modify your `app.vue` file. Update the content of your `app.vue` to the following:

```vue title="/app/app.vue"
<template>
	<NuxtRouteAnnouncer />
	<NuxtLoadingIndicator />
	<div class="flex h-screen flex-col md:flex-row">
		<USlideover
			v-model="isDrawerOpen"
			class="md:hidden"
			side="left"
			:ui="{ width: 'max-w-xs' }"
		>
			<AppSidebar :links="links" @hide-drawer="isDrawerOpen = false" />
		</USlideover>

		<!-- The App Sidebar -->
		<AppSidebar :links="links" class="hidden md:block md:w-64" />

		<div class="h-full min-w-0 flex-1 bg-gray-50 dark:bg-gray-950">
			<!-- The App Header -->
			<AppHeader :title="title" @show-drawer="isDrawerOpen = true">
				<template #actions v-if="route.path === '/'">
					<UButton icon="i-heroicons-plus" @click="navigateTo('/new')">
						New Note
					</UButton>
				</template>
			</AppHeader>

			<!-- Main Page Content -->
			<main class="h-[calc(100vh-3.5rem)] overflow-y-auto p-4 sm:p-6">
				<NuxtPage />
			</main>
		</div>
	</div>
	<UNotifications />
</template>

<script setup lang="ts">
const isDrawerOpen = ref(false);
const links = [
	{
		label: "Notes",
		icon: "i-heroicons-document-text",
		to: "/",
		click: () => (isDrawerOpen.value = false),
	},
	{
		label: "Settings",
		icon: "i-heroicons-cog",
		to: "/settings",
		click: () => (isDrawerOpen.value = false),
	},
];

const route = useRoute();
const title = computed(() => {
	const activeLink = links.find((l) => l.to === route.path);
	if (activeLink) {
		return activeLink.label;
	}

	return "";
});
</script>
```

The above code allows for a nuxt page to be shown to the user, apart from showing an app header and a navigation sidebar.

Next, add a new file named `new.vue` inside the `app/pages` folder, add the following code to it:

```vue title="app/pages/new.vue"
<template>
	<UModal v-model="isOpen" fullscreen>
		<UCard
			:ui="{
				base: 'h-full flex flex-col',
				rounded: '',
				body: {
					base: 'flex-grow overflow-hidden',
				},
			}"
		>
			<template #header>
				<h2 class="text-xl leading-6 font-semibold md:text-2xl">Create note</h2>
				<UButton
					color="gray"
					variant="ghost"
					icon="i-heroicons-x-mark-20-solid"
					@click="closeModal"
				/>
			</template>

			<CreateNote class="mx-auto h-full max-w-7xl" @created="closeModal" />
		</UCard>
	</UModal>
</template>

<script setup lang="ts">
const isOpen = ref(true);

const router = useRouter();
const closeModal = () => {
	isOpen.value = false;

	if (window.history.length > 2) {
		router.back();
	} else {
		navigateTo({
			path: "/",
			replace: true,
		});
	}
};
</script>
```

The above code shows the `CreateNote` component inside a modal, and navigates back to the home page on successful note creation.

## 6. Showing the notes on the client side

To show the notes from the database on the client side, create an API endpoint first that will interact with the database.

### Create an API endpoint to fetch notes from the database

Create a new file named `index.get.ts` inside the `server/api/notes` directory, and add the following code to it:

```ts title="server/api/index.get.ts"
import type { Note } from "~~/types";

export default defineEventHandler(async (event) => {
	const { cloudflare } = event.context;

	const res = await cloudflare.env.DB.prepare(
		`SELECT
      id,
      text,
      audio_urls AS audioUrls,
      created_at AS createdAt,
      updated_at AS updatedAt
    FROM notes
    ORDER BY created_at DESC
    LIMIT 50;`,
	).all<Omit<Note, "audioUrls"> & { audioUrls: string | null }>();

	return res.results.map((note) => ({
		...note,
		audioUrls: note.audioUrls ? JSON.parse(note.audioUrls) : undefined,
	}));
});
```

The above code fetches the last 50 notes from the database, ordered by their creation date in descending order. The `audio_urls` field is stored as a string in the database, but it's converted to an array using `JSON.parse` to handle multiple audio files seamlessly on the client side.

Next, create a page named `index.vue` inside the `app/pages` directory. This will be the home page of the application. Add the following code to it:

```vue title="app/pages/index.vue"
<template>
	<div :class="{ 'flex h-full': !notes?.length }">
		<div v-if="notes?.length" class="space-y-4 sm:space-y-6">
			<NoteCard v-for="note in notes" :key="note.id" :note="note" />
		</div>
		<div
			v-else
			class="flex-1 space-y-2 self-center text-center text-gray-500 dark:text-gray-400"
		>
			<h2 class="text-2xl md:text-3xl">No notes created</h2>
			<p>Get started by creating your first note</p>
		</div>
	</div>
</template>

<script setup lang="ts">
import type { Note } from "~~/types";

const { data: notes } = await useFetch<Note[]>("/api/notes");
</script>
```

The above code fetches the notes from the database by calling the `/api/notes` endpoint you created just now, and renders them as note cards.

### Serving the saved recordings from R2

To be able to play the audio recordings of these notes, you need to serve the saved recordings from the R2 storage.

Create a new file named `[...pathname].get.ts` inside the `server/routes/recordings` directory, and add the following code to it:

:::note
The `...` prefix in the file name makes it a catch all route. This allows it to receive all events that are meant for paths starting with `/recordings` prefix. This is where the `recordings` prefix that was added previously while saving the recordings becomes helpful.
:::

```ts title="server/routes/recordings/[...pathname].get.ts"
export default defineEventHandler(async (event) => {
	const { cloudflare, params } = event.context;

	const { pathname } = params || {};

	return cloudflare.env.R2.get(`recordings/${pathname}`);
});
```

The above code extracts the path name from the event params, and serves the saved recording matching that object key from the R2 bucket.

## 7. [Optional] Post Processing the transcriptions

Even though the speech-to-text transcriptions models perform satisfactorily, sometimes you want to post process the transcriptions for various reasons. It could be to remove any discrepancy, or to change the tone/style of the final text.

### Create a settings page

Create a new file named `settings.vue` in the `app/pages` folder, and add the following code to it:

```vue title="app/pages/settings.vue"
<template>
	<UCard>
		<template #header>
			<div>
				<h2 class="text-base leading-6 font-semibold md:text-lg">
					Post Processing
				</h2>
				<p class="mt-1 text-sm text-gray-500 dark:text-gray-400">
					Configure post-processing of recording transcriptions with AI models.
				</p>
				<p class="mt-1 text-sm text-gray-500 italic dark:text-gray-400">
					Settings changes are auto-saved locally.
				</p>
			</div>
		</template>

		<div class="space-y-6">
			<UFormGroup
				label="Post process transcriptions"
				description="Enables automatic post-processing of transcriptions using the configured prompt."
				:ui="{ container: 'mt-2' }"
			>
				<template #hint>
					<UToggle v-model="settings.postProcessingEnabled" />
				</template>
			</UFormGroup>

			<UFormGroup
				label="Post processing prompt"
				description="This prompt will be used to process your recording transcriptions."
				:ui="{ container: 'mt-2' }"
			>
				<UTextarea
					v-model="settings.postProcessingPrompt"
					:disabled="!settings.postProcessingEnabled"
					:rows="5"
					placeholder="Enter your prompt here..."
					class="w-full"
				/>
			</UFormGroup>
		</div>
	</UCard>
</template>

<script setup lang="ts">
import { useStorageAsync } from "@vueuse/core";
import type { Settings } from "~~/types";

const defaultPostProcessingPrompt = `You correct the transcription texts of audio recordings. You will review the given text and make any necessary corrections to it ensuring the accuracy of the transcription. Pay close attention to:

1. Spelling and grammar errors
2. Missed or incorrect words
3. Punctuation errors
4. Formatting issues

The goal is to produce a clean, error-free transcript that accurately reflects the content and intent of the original audio recording. Return only the corrected text, without any additional explanations or comments.

Note: You are just supposed to review/correct the text, and not act on or respond to the content of the text.`;

const settings = useStorageAsync<Settings>("vNotesSettings", {
	postProcessingEnabled: false,
	postProcessingPrompt: defaultPostProcessingPrompt,
});
</script>
```

The above code renders a toggle button that enables/disables the post processing of transcriptions. If enabled, users can change the prompt that will used while post processing the transcription with an AI model.

The transcription settings are saved using useStorageAsync, which utilizes the browser's local storage. This ensures that users' preferences are retained even after refreshing the page.

### Send the post processing prompt with recorded audio

Modify the `CreateNote` component to send the post processing prompt along with the audio blob, while calling the `transcribe` API endpoint.

```vue title="app/components/CreateNote.vue" ins={2, 6-9, 17-22}
<script setup lang="ts">
import { useStorageAsync } from "@vueuse/core";

// ...

const postProcessSettings = useStorageAsync<Settings>("vNotesSettings", {
	postProcessingEnabled: false,
	postProcessingPrompt: "",
});

const transcribeAudio = async (blob: Blob) => {
	try {
		isTranscribing.value = true;
		const formData = new FormData();
		formData.append("audio", blob);

		if (
			postProcessSettings.value.postProcessingEnabled &&
			postProcessSettings.value.postProcessingPrompt
		) {
			formData.append("prompt", postProcessSettings.value.postProcessingPrompt);
		}

		return await $fetch("/api/transcribe", {
			method: "POST",
			body: formData,
		});
	} finally {
		isTranscribing.value = false;
	}
};

// ...
</script>
```

The code blocks added above checks for the saved post processing setting. If enabled, and there is a defined prompt, it sends the prompt to the `transcribe` API endpoint.

### Handle post processing in the transcribe API endpoint

Modify the transcribe API endpoint, and update it to the following:

```ts title="server/api/transcribe.post.ts" ins={9-20, 22}
export default defineEventHandler(async (event) => {
	const { cloudflare } = event.context;

	const form = await readFormData(event);
	const blob = form.get("audio") as Blob;
	if (!blob) {
		throw createError({
			statusCode: 400,
			message: "缺少要转录的音频 blob",
		});
	}

	try {
		const response = await cloudflare.env.AI.run("@cf/openai/whisper", {
			audio: [...new Uint8Array(await blob.arrayBuffer())],
		});

		const postProcessingPrompt = form.get("prompt") as string;
		if (postProcessingPrompt && response.text) {
			const postProcessResult = await cloudflare.env.AI.run(
				"@cf/meta/llama-3.1-8b-instruct",
				{
					temperature: 0.3,
					prompt: `${postProcessingPrompt}.\n\nText:\n\n${response.text}\n\nResponse:`,
				},
			);

			return (postProcessResult as { response?: string }).response;
		} else {
			return response.text;
		}
	} catch (err) {
		console.error("转录音频时出错:", err);
		throw createError({
			statusCode: 500,
			message: "转录音频失败。请重试。",
		});
	}
});
```

The above code does the following:

1. Extracts the post processing prompt from the event FormData.
2. If present, it calls the Workers AI API to process the transcription text using the `@cf/meta/llama-3.1-8b-instruct` model.
3. Finally, it returns the response from Workers AI to the client.

## 8. Deploy the application

Now you are ready to deploy the project to a `.workers.dev` sub-domain by running the deploy command.

<PackageManagers type="run" args="deploy" />

You can preview your application at `<YOUR_WORKER>.<YOUR_SUBDOMAIN>.workers.dev`.

:::note
If you used `pnpm` as your package manager, you may face build errors like `"stdin" is not exported by "node_modules/.pnpm/unenv@1.10.0/node_modules/unenv/runtime/node/process/index.mjs"`. To resolve it, you can try hoisting your node modules with the [`shamefully-hoist-true`](https://pnpm.io/npmrc) option.
:::

## Conclusion

In this tutorial, you have gone through the steps of building a voice notes application using Nuxt 3, Cloudflare Workers, D1, and R2 storage. You learnt to:

- Set up the backend to store and manage notes
- Create API endpoints to fetch and display notes
- Handle audio recordings
- Implement optional post-processing for transcriptions
- Deploy the application using the Cloudflare module syntax

The complete source code of the project is available on GitHub. You can go through it to see the code for various frontend components not covered in the article. You can find it here: [github.com/ra-jeev/vnotes](https://github.com/ra-jeev/vnotes).

---

# 使用 Cloudflare Workers AI 的 Whisper-large-v3-turbo

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/build-a-workers-ai-whisper-with-chunking/

在本教程中，您将学习如何：

- **转录大型音频文件：** 使用 Cloudflare Workers AI 的 [Whisper-large-v3-turbo](/workers-ai/models/whisper-large-v3-turbo/) 模型执行自动语音识别（ASR）或翻译。
- **处理大型文件：** 将大型音频文件分割成更小的块进行处理，这有助于克服内存和执行时间的限制。
- **使用 Cloudflare Workers 进行部署：** 在无服务器环境中创建可扩展、低延迟的转录管道。

## 1：创建一个新的 Cloudflare Worker 项目

import { Render, PackageManagers, WranglerConfig } from "~/components";

<Render file="prereqs" product="workers" />

您将使用 `create-cloudflare` CLI (C3) 创建一个新的 Worker 项目。[C3](https://github.com/cloudflare/workers-sdk/tree/main/packages/create-cloudflare) 是一个命令行工具，旨在帮助您设置和部署新的应用程序到 Cloudflare。

通过运行以下命令创建一个名为 `whisper-tutorial` 的新项目：

<PackageManagers
	type="create"
	pkg="cloudflare@latest"
	args={"whisper-tutorial"}
/>

运行 `npm create cloudflare@latest` 将提示您安装 [`create-cloudflare` 包](https://www.npmjs.com/package/create-cloudflare)，并引导您完成设置。C3 还将安装 [Wrangler](/workers/wrangler/)，即 Cloudflare 开发者平台 CLI。

<Render
	file="c3-post-run-steps"
	product="workers"
	params={{
		category: "hello-world",
		type: "Worker only",
		lang: "TypeScript",
	}}
/>

这将创建一个新的 `whisper-tutorial` 目录。您的新 `whisper-tutorial` 目录将包括：

- `src/index.ts` 中的一个 `"Hello World"` [Worker](/workers/get-started/guide/#3-write-code)。
- 一个 [`wrangler.jsonc`](/workers/wrangler/configuration/) 配置文件。

转到您的应用程序目录：

```sh
cd whisper-tutorial
```

## 2. 将您的 Worker 连接到 Workers AI

您必须为您的 Worker 创建一个 AI 绑定以连接到 Workers AI。[绑定](/workers/runtime-apis/bindings/)允许您的 Workers 与 Cloudflare 开发者平台上的资源（如 Workers AI）进行交互。

要将 Workers AI 绑定到您的 Worker，请将以下内容添加到 `wrangler.toml` 文件的末尾：

<WranglerConfig>

```toml
[ai]
binding = "AI"
```

</WranglerConfig>

您的绑定在您的 Worker 代码中的 [`env.AI`](/workers/runtime-apis/handlers/fetch/) 上[可用](/workers/reference/migrate-to-module-workers/#bindings-in-es-modules-format)。

## 3. 配置 Wrangler

在您的 wrangler 文件中，添加或更新以下设置以启用 Node.js API 和 polyfill（兼容性日期为 2024-09-23 或更晚）：

<WranglerConfig>

```toml title="wrangler.toml"
compatibility_flags = [ "nodejs_compat" ]
compatibility_date = "2024-09-23"
```

</WranglerConfig>

## 4. 使用分块处理大型音频文件

将 `src/index.ts` 文件的内容替换为以下集成代码。此示例演示了如何：

(1) 从查询参数中提取音频文件 URL。

(2) 在明确遵循重定向的情况下获取音频文件。

(3) 将音频文件分割成更小的块（例如 1 MB 的块）。

(4) 通过 Cloudflare AI 绑定使用 Whisper-large-v3-turbo 模型转录每个块。

(5) 以纯文本形式返回聚合的转录。

```ts
import { Buffer } from "node:buffer";
import type { Ai } from "workers-ai";

export interface Env {
	AI: Ai;
	// 如果需要，添加您的 KV 命名空间以存储转录。
	// MY_KV_NAMESPACE: KVNamespace;
}

/**
 * 从提供的 URL 获取音频文件并将其分割成块。
 * 此函数明确遵循重定向。
 *
 * @param audioUrl - 音频文件的 URL。
 * @returns 一个 ArrayBuffer 数组，每个代表一个音频块。
 */
async function getAudioChunks(audioUrl: string): Promise<ArrayBuffer[]> {
	const response = await fetch(audioUrl, { redirect: "follow" });
	if (!response.ok) {
		throw new Error(`获取音频失败：${response.status}`);
	}
	const arrayBuffer = await response.arrayBuffer();

	// 示例：将音频分割成 1MB 的块。
	const chunkSize = 1024 * 1024; // 1MB
	const chunks: ArrayBuffer[] = [];
	for (let i = 0; i < arrayBuffer.byteLength; i += chunkSize) {
		const chunk = arrayBuffer.slice(i, i + chunkSize);
		chunks.push(chunk);
	}
	return chunks;
}

/**
 * 使用 Whisper‑large‑v3‑turbo 模型转录单个音频块。
 * 该函数将音频块转换为 Base64 编码的字符串，并
 * 通过 AI 绑定将其发送到模型。
 *
 * @param chunkBuffer - 作为 ArrayBuffer 的音频块。
 * @param env - Cloudflare Worker 环境，包括 AI 绑定。
 * @returns 来自模型的转录文本。
 */
async function transcribeChunk(
	chunkBuffer: ArrayBuffer,
	env: Env,
): Promise<string> {
	const base64 = Buffer.from(chunkBuffer, "binary").toString("base64");
	const res = await env.AI.run("@cf/openai/whisper-large-v3-turbo", {
		audio: base64,
		// 可选参数（如果需要，取消注释并设置）：
		// task: "transcribe", // 或 "translate"
		// language: "en",
		// vad_filter: "false",
		// initial_prompt: "如果需要，提供上下文。",
		// prefix: "转录：",
	});
	return res.text; // 假设转录结果包括一个 "text" 属性。
}

/**
 * 主 fetch 处理程序。它提取 'url' 查询参数，获取音频，
 * 以块为单位处理它，并返回完整的转录。
 */
export default {
	async fetch(
		request: Request,
		env: Env,
		ctx: ExecutionContext,
	): Promise<Response> {
		// 从查询参数中提取音频 URL。
		const { searchParams } = new URL(request.url);
		const audioUrl = searchParams.get("url");

		if (!audioUrl) {
			return new Response("缺少 'url' 查询参数", { status: 400 });
		}

		// 获取音频块。
		const audioChunks: ArrayBuffer[] = await getAudioChunks(audioUrl);
		let fullTranscript = "";

		// 处理每个块并构建完整的转录。
		for (const chunk of audioChunks) {
			try {
				const transcript = await transcribeChunk(chunk, env);
				fullTranscript += transcript + "\n";
			} catch (error) {
				fullTranscript += "[转录块时出错]\n";
			}
		}

		return new Response(fullTranscript, {
			headers: { "Content-Type": "text/plain" },
		});
	},
} satisfies ExportedHandler<Env>;
```

## 5. 部署您的 Worker

1. **在本地运行 Worker：**

   使用 wrangler 的开发模式在本地测试您的 Worker：

```sh
npx wrangler dev
```

打开您的浏览器并转到 [http://localhost:8787](http://localhost:8787)，或使用 curl：

```sh
curl "http://localhost:8787?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3"
```

将 URL 查询参数替换为您的音频文件的直接链接。（对于 GitHub 托管的文件，请确保使用原始文件 URL。）

2. **部署 Worker：**

   测试完成后，使用以下命令部署您的 Worker：

```sh
npx wrangler deploy
```

3. **测试已部署的 Worker：**

   部署后，通过将音频 URL 作为查询参数传递来测试您的 Worker：

```sh
curl "https://<your-worker-subdomain>.workers.dev?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3"
```

确保将 `<your-worker-subdomain>`、`your-username`、`your-repo` 和 `your-audio-file.mp3` 替换为您的实际详细信息。

如果成功，Worker 将返回音频文件的转录：

```sh
这是音频的转录...
```

---

# 使用 Workers AI 构建面试练习工具

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/build-ai-interview-practice-tool/

import { Render, PackageManagers } from "~/components";

求职面试可能会让人感到压力，而练习是建立信心的关键。虽然与朋友或导师进行的传统模拟面试很有价值，但并非总能在需要时获得。在本教程中，您将学习如何构建一个由 AI 驱动的面试练习工具，该工具可提供实时反馈以帮助提高面试技巧。

在本教程结束时，您将构建一个完整的面试练习工具，具有以下核心功能：

- 使用 WebSocket 连接的实时面试模拟工具
- 将音频转换为文本的 AI 驱动的语音处理管道
- 提供类似面试官互动的智能响应系统
- 使用 Durable Objects 管理面试会话和历史记录的持久存储系统

<Render file="tutorials-before-you-start" product="workers" />
<Render file="prereqs" product="workers" />

### 先决条件

本教程演示了如何使用多个 Cloudflare 产品，虽然许多功能在免费套餐中可用，但 Workers AI 的某些组件可能会产生基于使用量的费用。在继续之前，请查看 Workers AI 的定价文档。

<Render file="ai-local-usage-charges" product="workers" />

## 1. 创建一个新的 Worker 项目

使用 Create Cloudflare CLI (C3) 工具和 Hono 框架创建一个 Cloudflare Workers 项目。

:::note
[Hono](https://hono.dev) 是一个轻量级的 Web 框架，有助于构建 API 端点和处理 HTTP 请求。本教程使用 Hono 来创建和管理应用程序的路由和中间件组件。
:::

通过运行以下命令创建一个新的 Worker 项目，使用 `ai-interview-tool` 作为 Worker 名称：

<PackageManagers
	type="create"
	pkg="cloudflare@latest"
	args={"ai-interview-tool"}
/>

<Render
	file="c3-post-run-steps"
	product="workers"
	params={{
		category: "web-framework",
		framework: "Hono",
	}}
/>

要在本地开发和测试您的 Cloudflare Workers 应用程序：

1.  在您的终端中导航到您的 Workers 项目目录：

```sh
cd ai-interview-tool
```

2.  通过运行以下命令启动开发服务器：

```sh
npx wrangler dev
```

当您运行 `wrangler dev` 时，该命令会启动一个本地开发服务器，并提供一个 `localhost` URL，您可以在其中预览您的应用程序。
您现在可以对代码进行更改，并在提供的 localhost 地址上实时查看它们。

## 2. 为面试系统定义 TypeScript 类型

项目设置好后，创建将构成面试系统基础的 TypeScript 类型。这些类型将帮助您维护类型安全，并为应用程序的不同组件提供清晰的接口。

创建一个新的 `types.ts` 文件，其中将包含以下内容的基本类型和枚举：

- 可以评估的面试技能（JavaScript、React 等）
- 不同的面试职位（初级开发人员、高级开发人员等）
- 面试状态跟踪
- 用户与 AI 之间的消息处理
- 核心面试数据结构

```typescript title="src/types.ts"
import { Context } from "hono";

// API 端点的上下文类型，包括环境绑定和用户信息
export interface ApiContext {
	Bindings: CloudflareBindings;
	Variables: {
		username: string;
	};
}

export type HonoCtx = Context<ApiContext>;

// 您可以在模拟面试期间评估的技术技能列表。
// 此应用程序侧重于在真实面试中通常测试的流行 Web 技术和编程语言。
export enum InterviewSkill {
	JavaScript = "JavaScript",
	TypeScript = "TypeScript",
	React = "React",
	NodeJS = "NodeJS",
	Python = "Python",
}

// 基于不同工程职位的可用面试类型。
// 这有助于根据候选人的目标职位定制面试体验和问题。
export enum InterviewTitle {
	JuniorDeveloper = "初级开发人员面试",
	SeniorDeveloper = "高级开发人员面试",
	FullStackDeveloper = "全栈开发人员面试",
	FrontendDeveloper = "前端开发人员面试",
	BackendDeveloper = "后端开发人员面试",
	SystemArchitect = "系统架构师面试",
	TechnicalLead = "技术主管面试",
}

// 跟踪面试会话的当前状态。
// 这将帮助您管理面试流程，并在流程的每个阶段显示适当的 UI/操作。
export enum InterviewStatus {
	Created = "created", // 面试已创建但未开始
	Pending = "pending", // 等待面试官/系统
	InProgress = "in_progress", // 进行中的面试会话
	Completed = "completed", // 面试成功完成
	Cancelled = "cancelled", // 面试提前终止
}

// 定义在面试聊天中发送消息的人
export type MessageRole = "user" | "assistant" | "system";

// 面试期间交换的单个消息的结构
export interface Message {
	messageId: string; // 消息的唯一标识符
	interviewId: string; // 将消息链接到特定面试
	role: MessageRole; // 谁发送了消息
	content: string; // 实际消息内容
	timestamp: number; // 消息发送时间
}

// 保存有关面试会话的所有信息的主要数据结构。
// 这包括元数据、交换的消息和当前状态。
export interface InterviewData {
	interviewId: string;
	title: InterviewTitle;
	skills: InterviewSkill[];
	messages: Message[];
	status: InterviewStatus;
	createdAt: number;
	updatedAt: number;
}

// 创建新面试会话的输入格式。
// 简化接口，接受开始面试所需的基本参数。
export interface InterviewInput {
	title: string;
	skills: string[];
}
```

## 3. 为不同服务配置错误类型

接下来，设置自定义错误类型以处理应用程序中可能发生的不同类型的错误。这包括：

- 数据库错误（例如，连接问题、查询失败）
- 与面试相关的错误（例如，无效输入、转录失败）
- 身份验证错误（例如，无效会话）

创建以下 `errors.ts` 文件：

```typescript title="src/errors.ts"
export const ErrorCodes = {
	INVALID_MESSAGE: "INVALID_MESSAGE",
	TRANSCRIPTION_FAILED: "TRANSCRIPTION_FAILED",
	LLM_FAILED: "LLM_FAILED",
	DATABASE_ERROR: "DATABASE_ERROR",
} as const;

export class AppError extends Error {
	constructor(
		message: string,
		public statusCode: number,
	) {
		super(message);
		this.name = this.constructor.name;
	}
}

export class UnauthorizedError extends AppError {
	constructor(message: string) {
		super(message, 401);
	}
}

export class BadRequestError extends AppError {
	constructor(message: string) {
		super(message, 400);
	}
}

export class NotFoundError extends AppError {
	constructor(message: string) {
		super(message, 404);
	}
}

export class InterviewError extends Error {
	constructor(
		message: string,
		public code: string,
		public statusCode: number = 500,
	) {
		super(message);
		this.name = "InterviewError";
	}
}
```

## 4. 配置身份验证中间件和用户路由

在此步骤中，您将实现一个基本的身份验证系统，以跟踪和识别与您的 AI 面试练习工具交互的用户。该系统使用仅 HTTP 的 cookie 来存储用户名，使您能够识别请求发送者及其相应的 Durable Object。这种直接的身份验证方法要求用户提供一个用户名，然后将其安全地存储在 cookie 中。这种方法使您能够：

- 跨请求识别用户
- 将面试会话与特定用户关联
- 保护对与面试相关的端点的访问

### 创建身份验证中间件

创建一个中间件函数，用于检查是否存在有效的身份验证 cookie。此中间件将用于保护需要身份验证的路由。

创建一个新的中间件文件 `middleware/auth.ts`：

```typescript title="src/middleware/auth.ts"
import { Context } from "hono";
import { getCookie } from "hono/cookie";
import { UnauthorizedError } from "../errors";

export const requireAuth = async (ctx: Context, next: () => Promise<void>) => {
	// Get username from cookie
	const username = getCookie(ctx, "username");

	if (!username) {
		throw new UnauthorizedError("User is not logged in");
	}

	// Make username available to route handlers
	ctx.set("username", username);
	await next();
};
```

This middleware:

- Checks for a `username` cookie
- Throws an `Error` if the cookie is missing
- Makes the username available to downstream handlers via the context

### Create Authentication Routes

Next, create the authentication routes that will handle user login. Create a new file `routes/auth.ts`:

```typescript title="src/routes/auth.ts"
import { Context, Hono } from "hono";
import { setCookie } from "hono/cookie";
import { BadRequestError } from "../errors";
import { ApiContext } from "../types";

export const authenticateUser = async (ctx: Context) => {
	// Extract username from request body
	const { username } = await ctx.req.json();

	// Make sure username was provided
	if (!username) {
		throw new BadRequestError("Username is required");
	}

	// Create a secure cookie to track the user's session
	// This cookie will:
	// - Be HTTP-only for security (no JS access)
	// - Work across all routes via path="/"
	// - Last for 24 hours
	// - Only be sent in same-site requests to prevent CSRF
	setCookie(ctx, "username", username, {
		httpOnly: true,
		path: "/",
		maxAge: 60 * 60 * 24,
		sameSite: "Strict",
	});

	// Let the client know login was successful
	return ctx.json({ success: true });
};

// Set up authentication-related routes
export const configureAuthRoutes = () => {
	const router = new Hono<ApiContext>();

	// POST /login - Authenticate user and create session
	router.post("/login", authenticateUser);

	return router;
};
```

Finally, update main application file to include the authentication routes. Modify `src/index.ts`:

```typescript title="src/index.ts"
import { configureAuthRoutes } from "./routes/auth";
import { Hono } from "hono";
import { logger } from "hono/logger";
import type { ApiContext } from "./types";
import { requireAuth } from "./middleware/auth";

// Create our main Hono app instance with proper typing
const app = new Hono<ApiContext>();

// Create a separate router for API endpoints to keep things organized
const api = new Hono<ApiContext>();

// Set up global middleware that runs on every request
// - Logger gives us visibility into what is happening
app.use("*", logger());

// Wire up all our authentication routes (login, etc)
// These will be mounted under /api/v1/auth/
api.route("/auth", configureAuthRoutes());

// Mount all API routes under the version prefix (for example, /api/v1)
// This allows us to make breaking changes in v2 without affecting v1 users
app.route("/api/v1", api);

export default app;
```

Now we have a basic authentication system that:

1. Provides a login endpoint at `/api/v1/auth/login`
2. Securely stores the username in a cookie
3. Includes middleware to protect authenticated routes

## 5. Create a Durable Object to manage interviews

Now that you have your authentication system in place, create a Durable Object to manage interview sessions. Durable Objects are perfect for this interview practice tool because they provide the following functionalities:

- Maintains states between connections, so users can reconnect without losing progress.
- Provides a SQLite database to store all interview Q&A, feedback and metrics.
- Enables smooth real-time interactions between the interviewer AI and candidate.
- Handles multiple interview sessions efficiently without performance issues.
- Creates a dedicated instance for each user, giving them their own isolated environment.

First, you will need to configure the Durable Object in Wrangler file. Add the following configuration:

```toml title="wrangler.toml"
[[durable_objects.bindings]]
name = "INTERVIEW"
class_name = "Interview"

[[migrations]]
tag = "v1"
new_sqlite_classes = ["Interview"]
```

Next, create a new file `interview.ts` to define our Interview Durable Object:

```typescript title="src/interview.ts"
import { DurableObject } from "cloudflare:workers";

export class Interview extends DurableObject<CloudflareBindings> {
	// We will use it to keep track of all active WebSocket connections for real-time communication
	private sessions: Map<WebSocket, { interviewId: string }>;

	constructor(state: DurableObjectState, env: CloudflareBindings) {
		super(state, env);

		// Initialize empty sessions map - we will add WebSocket connections as users join
		this.sessions = new Map();
	}

	// Entry point for all HTTP requests to this Durable Object
	// This will handle both initial setup and WebSocket upgrades
	async fetch(request: Request) {
		// For now, just confirm the object is working
		// We'll add WebSocket upgrade logic and request routing later
		return new Response("Interview object initialized");
	}

	// Broadcasts a message to all connected WebSocket clients.
	private broadcast(message: string) {
		this.ctx.getWebSockets().forEach((ws) => {
			try {
				if (ws.readyState === WebSocket.OPEN) {
					ws.send(message);
				}
			} catch (error) {
				console.error(
					"Error broadcasting message to a WebSocket client:",
					error,
				);
			}
		});
	}
}
```

Now we need to export the Durable Object in our main `src/index.ts` file:

```typescript title="src/index.ts"
import { Interview } from "./interview";

// ... previous code ...

export { Interview };

export default app;
```

Since the Worker code is written in TypeScript, you should run the following command to add the necessary type definitions:

```sh
npm run cf-typegen
```

### Set up SQLite database schema to store interview data

Now you will use SQLite at the Durable Object level for data persistence. This gives each user their own isolated database instance. You will need two main tables:

- `interviews`: Stores interview session data
- `messages`: Stores all messages exchanged during interviews

Before you create these tables, create a service class to handle your database operations. This encapsulates database logic and helps you:

- Manage database schema changes
- Handle errors consistently
- Keep database queries organized

Create a new file called `services/InterviewDatabaseService.ts`:

```typescript title="src/services/InterviewDatabaseService.ts"
import {
	InterviewData,
	Message,
	InterviewStatus,
	InterviewTitle,
	InterviewSkill,
} from "../types";
import { InterviewError, ErrorCodes } from "../errors";

const CONFIG = {
	database: {
		tables: {
			interviews: "interviews",
			messages: "messages",
		},
		indexes: {
			messagesByInterview: "idx_messages_interviewId",
		},
	},
} as const;

export class InterviewDatabaseService {
	constructor(private sql: SqlStorage) {}

	/**
	 * Sets up the database schema by creating tables and indexes if they do not exist.
	 * This is called when initializing a new Durable Object instance to ensure
	 * we have the required database structure.
	 *
	 * The schema consists of:
	 * - interviews table: Stores interview metadata like title, skills, and status
	 * - messages table: Stores the conversation history between user and AI
	 * - messages index: Helps optimize queries when fetching messages for a specific interview
	 */
	createTables() {
		try {
			// Get list of existing tables to avoid recreating them
			const cursor = this.sql.exec(`PRAGMA table_list`);
			const existingTables = new Set([...cursor].map((table) => table.name));

			// The interviews table is our main table storing interview sessions.
			// We only create it if it does not exist yet.
			if (!existingTables.has(CONFIG.database.tables.interviews)) {
				this.sql.exec(InterviewDatabaseService.QUERIES.CREATE_INTERVIEWS_TABLE);
			}

			// The messages table stores the actual conversation history.
			// It references interviews table via foreign key for data integrity.
			if (!existingTables.has(CONFIG.database.tables.messages)) {
				this.sql.exec(InterviewDatabaseService.QUERIES.CREATE_MESSAGES_TABLE);
			}

			// Add an index on interviewId to speed up message retrieval.
			// This is important since we will frequently query messages by interview.
			this.sql.exec(InterviewDatabaseService.QUERIES.CREATE_MESSAGE_INDEX);
		} catch (error: unknown) {
			const message = error instanceof Error ? error.message : String(error);
			throw new InterviewError(
				`Failed to initialize database: ${message}`,
				ErrorCodes.DATABASE_ERROR,
			);
		}
	}

	private static readonly QUERIES = {
		CREATE_INTERVIEWS_TABLE: `
      CREATE TABLE IF NOT EXISTS interviews (
        interviewId TEXT PRIMARY KEY,
        title TEXT NOT NULL,
        skills TEXT NOT NULL,
        createdAt INTEGER NOT NULL DEFAULT (strftime('%s','now') * 1000),
        updatedAt INTEGER NOT NULL DEFAULT (strftime('%s','now') * 1000),
        status TEXT NOT NULL DEFAULT 'pending'
      )
    `,
		CREATE_MESSAGES_TABLE: `
      CREATE TABLE IF NOT EXISTS messages (
        messageId TEXT PRIMARY KEY,
        interviewId TEXT NOT NULL,
        role TEXT NOT NULL,
        content TEXT NOT NULL,
        timestamp INTEGER NOT NULL,
        FOREIGN KEY (interviewId) REFERENCES interviews(interviewId)
      )
    `,
		CREATE_MESSAGE_INDEX: `
      CREATE INDEX IF NOT EXISTS idx_messages_interview ON messages(interviewId)
    `,
	};
}
```

Update the `Interview` Durable Object to use the database service by modifying `src/interview.ts`:

```typescript title="src/interview.ts"
import { InterviewDatabaseService } from "./services/InterviewDatabaseService";

export class Interview extends DurableObject<CloudflareBindings> {
	// Database service for persistent storage of interview data and messages
	private readonly db: InterviewDatabaseService;
	private sessions: Map<WebSocket, { interviewId: string }>;

	constructor(state: DurableObjectState, env: CloudflareBindings) {
		// ... previous code ...
		// Set up our database connection using the DO's built-in SQLite instance
		this.db = new InterviewDatabaseService(state.storage.sql);
		// First-time setup: ensure our database tables exist
		// This is idempotent so safe to call on every instantiation
		this.db.createTables();
	}
}
```

Add methods to create and retrieve interviews in `services/InterviewDatabaseService.ts`:

```typescript title="src/services/InterviewDatabaseService.ts"
export class InterviewDatabaseService {
	/**
	 * Creates a new interview session in the database.
	 *
	 * This is the main entry point for starting a new interview. It handles all the
	 * initial setup like:
	 * - Generating a unique ID using crypto.randomUUID() for reliable uniqueness
	 * - Recording the interview title and required skills
	 * - Setting up timestamps for tracking interview lifecycle
	 * - Setting the initial status to "Created"
	 *
	 */
	createInterview(title: InterviewTitle, skills: InterviewSkill[]): string {
		try {
			const interviewId = crypto.randomUUID();
			const currentTime = Date.now();

			this.sql.exec(
				InterviewDatabaseService.QUERIES.INSERT_INTERVIEW,
				interviewId,
				title,
				JSON.stringify(skills), // Store skills as JSON for flexibility
				InterviewStatus.Created,
				currentTime,
				currentTime,
			);

			return interviewId;
		} catch (error: unknown) {
			const message = error instanceof Error ? error.message : String(error);
			throw new InterviewError(
				`Failed to create interview: ${message}`,
				ErrorCodes.DATABASE_ERROR,
			);
		}
	}

	/**
	 * Fetches all interviews from the database, ordered by creation date.
	 *
	 * This is useful for displaying interview history and letting users
	 * resume previous sessions. We order by descending creation date since
	 * users typically want to see their most recent interviews first.
	 *
	 * Returns an array of InterviewData objects with full interview details
	 * including metadata and message history.
	 */
	getAllInterviews(): InterviewData[] {
		try {
			const cursor = this.sql.exec(
				InterviewDatabaseService.QUERIES.GET_ALL_INTERVIEWS,
			);

			return [...cursor].map(this.parseInterviewRecord);
		} catch (error) {
			const message = error instanceof Error ? error.message : String(error);
			throw new InterviewError(
				`Failed to retrieve interviews: ${message}`,
				ErrorCodes.DATABASE_ERROR,
			);
		}
	}

	// Retrieves an interview and its messages by ID
	getInterview(interviewId: string): InterviewData | null {
		try {
			const cursor = this.sql.exec(
				InterviewDatabaseService.QUERIES.GET_INTERVIEW,
				interviewId,
			);

			const record = [...cursor][0];
			if (!record) return null;

			return this.parseInterviewRecord(record);
		} catch (error: unknown) {
			const message = error instanceof Error ? error.message : String(error);
			throw new InterviewError(
				`Failed to retrieve interview: ${message}`,
				ErrorCodes.DATABASE_ERROR,
			);
		}
	}

	addMessage(
		interviewId: string,
		role: Message["role"],
		content: string,
		messageId: string,
	): Message {
		try {
			const timestamp = Date.now();

			this.sql.exec(
				InterviewDatabaseService.QUERIES.INSERT_MESSAGE,
				messageId,
				interviewId,
				role,
				content,
				timestamp,
			);

			return {
				messageId,
				interviewId,
				role,
				content,
				timestamp,
			};
		} catch (error: unknown) {
			const message = error instanceof Error ? error.message : String(error);
			throw new InterviewError(
				`Failed to add message: ${message}`,
				ErrorCodes.DATABASE_ERROR,
			);
		}
	}

	/**
	 * Transforms raw database records into structured InterviewData objects.
	 *
	 * This helper does the heavy lifting of:
	 * - Type checking critical fields to catch database corruption early
	 * - Converting stored JSON strings back into proper objects
	 * - Filtering out any null messages that might have snuck in
	 * - Ensuring timestamps are proper numbers
	 *
	 * If any required data is missing or malformed, it throws an error
	 * rather than returning partially valid data that could cause issues
	 * downstream.
	 */
	private parseInterviewRecord(record: any): InterviewData {
		const interviewId = record.interviewId as string;
		const createdAt = Number(record.createdAt);
		const updatedAt = Number(record.updatedAt);

		if (!interviewId || !createdAt || !updatedAt) {
			throw new InterviewError(
				"Invalid interview data in database",
				ErrorCodes.DATABASE_ERROR,
			);
		}

		return {
			interviewId,
			title: record.title as InterviewTitle,
			skills: JSON.parse(record.skills as string) as InterviewSkill[],
			messages: record.messages
				? JSON.parse(record.messages)
						.filter((m: any) => m !== null)
						.map((m: any) => ({
							messageId: m.messageId,
							role: m.role,
							content: m.content,
							timestamp: m.timestamp,
						}))
				: [],
			status: record.status as InterviewStatus,
			createdAt,
			updatedAt,
		};
	}

	// Add these SQL queries to the QUERIES object
	private static readonly QUERIES = {
		// ... previous queries ...

		INSERT_INTERVIEW: `
      INSERT INTO ${CONFIG.database.tables.interviews}
      (interviewId, title, skills, status, createdAt, updatedAt)
      VALUES (?, ?, ?, ?, ?, ?)
    `,

		GET_ALL_INTERVIEWS: `
      SELECT
        interviewId,
        title,
        skills,
        createdAt,
        updatedAt,
        status
      FROM ${CONFIG.database.tables.interviews}
      ORDER BY createdAt DESC
    `,

		INSERT_MESSAGE: `
      INSERT INTO ${CONFIG.database.tables.messages}
      (messageId, interviewId, role, content, timestamp)
      VALUES (?, ?, ?, ?, ?)
    `,

		GET_INTERVIEW: `
      SELECT
        i.interviewId,
        i.title,
        i.skills,
        i.status,
        i.createdAt,
        i.updatedAt,
        COALESCE(
          json_group_array(
            CASE WHEN m.messageId IS NOT NULL THEN
              json_object(
                'messageId', m.messageId,
                'role', m.role,
                'content', m.content,
                'timestamp', m.timestamp
              )
            END
          ),
          '[]'
        ) as messages
      FROM ${CONFIG.database.tables.interviews} i
      LEFT JOIN ${CONFIG.database.tables.messages} m ON i.interviewId = m.interviewId
      WHERE i.interviewId = ?
      GROUP BY i.interviewId
    `,
	};
}
```

Add RPC methods to the `Interview` Durable Object to expose database operations through API. Add this code to `src/interview.ts`:

```typescript title="src/interview.ts"
import {
	InterviewData,
	InterviewTitle,
	InterviewSkill,
	Message,
} from "./types";

export class Interview extends DurableObject<CloudflareBindings> {
	// Creates a new interview session
	createInterview(title: InterviewTitle, skills: InterviewSkill[]): string {
		return this.db.createInterview(title, skills);
	}

	// Retrieves all interview sessions
	getAllInterviews(): InterviewData[] {
		return this.db.getAllInterviews();
	}

	// Adds a new message to the 'messages' table and broadcasts it to all connected WebSocket clients.
	addMessage(
		interviewId: string,
		role: "user" | "assistant",
		content: string,
		messageId: string,
	): Message {
		const newMessage = this.db.addMessage(
			interviewId,
			role,
			content,
			messageId,
		);
		this.broadcast(
			JSON.stringify({
				...newMessage,
				type: "message",
			}),
		);
		return newMessage;
	}
}
```

## 6. Create REST API endpoints

With your Durable Object and database service ready, create REST API endpoints to manage interviews. You will need endpoints to:

- Create new interviews
- Retrieve all interviews for a user

Create a new file for your interview routes at `routes/interview.ts`:

```typescript title="src/routes/interview.ts"
import { Hono } from "hono";
import { BadRequestError } from "../errors";
import {
	InterviewInput,
	ApiContext,
	HonoCtx,
	InterviewTitle,
	InterviewSkill,
} from "../types";
import { requireAuth } from "../middleware/auth";

/**
 * Gets the Interview Durable Object instance for a given user.
 * We use the username as a stable identifier to ensure each user
 * gets their own dedicated DO instance that persists across requests.
 */
const getInterviewDO = (ctx: HonoCtx) => {
	const username = ctx.get("username");
	const id = ctx.env.INTERVIEW.idFromName(username);
	return ctx.env.INTERVIEW.get(id);
};

/**
 * Validates the interview creation payload.
 * Makes sure we have all required fields in the correct format:
 * - title must be present
 * - skills must be a non-empty array
 * Throws an error if validation fails.
 */
const validateInterviewInput = (input: InterviewInput) => {
	if (
		!input.title ||
		!input.skills ||
		!Array.isArray(input.skills) ||
		input.skills.length === 0
	) {
		throw new BadRequestError("Invalid input");
	}
};

/**
 * GET /interviews
 * Retrieves all interviews for the authenticated user.
 * The interviews are stored and managed by the user's DO instance.
 */
const getAllInterviews = async (ctx: HonoCtx) => {
	const interviewDO = getInterviewDO(ctx);
	const interviews = await interviewDO.getAllInterviews();
	return ctx.json(interviews);
};

/**
 * POST /interviews
 * Creates a new interview session with the specified title and skills.
 * Each interview gets a unique ID that can be used to reference it later.
 * Returns the newly created interview ID on success.
 */
const createInterview = async (ctx: HonoCtx) => {
	const body = await ctx.req.json<InterviewInput>();
	validateInterviewInput(body);

	const interviewDO = getInterviewDO(ctx);
	const interviewId = await interviewDO.createInterview(
		body.title as InterviewTitle,
		body.skills as InterviewSkill[],
	);

	return ctx.json({ success: true, interviewId });
};

/**
 * Sets up all interview-related routes.
 * Currently supports:
 * - GET / : List all interviews
 * - POST / : Create a new interview
 */
export const configureInterviewRoutes = () => {
	const router = new Hono<ApiContext>();
	router.use("*", requireAuth);
	router.get("/", getAllInterviews);
	router.post("/", createInterview);
	return router;
};
```

The `getInterviewDO` helper function uses the username from our authentication cookie to create a unique Durable Object ID. This ensures each user has their own isolated interview state.

Update your main application file to include the routes and protect them with authentication middleware. Update `src/index.ts`:

```typescript title="src/index.ts"
import { configureAuthRoutes } from "./routes/auth";
import { configureInterviewRoutes } from "./routes/interview";
import { Hono } from "hono";
import { Interview } from "./interview";
import { logger } from "hono/logger";
import type { ApiContext } from "./types";

const app = new Hono<ApiContext>();
const api = new Hono<ApiContext>();

app.use("*", logger());

api.route("/auth", configureAuthRoutes());
api.route("/interviews", configureInterviewRoutes());

app.route("/api/v1", api);

export { Interview };
export default app;
```

Now you have two new API endpoints:

- `POST /api/v1/interviews`: Creates a new interview session
- `GET /api/v1/interviews`: Retrieves all interviews for the authenticated user

You can test these endpoints running the following command:

1. Create a new interview:

```sh
curl -X POST http://localhost:8787/api/v1/interviews \
-H "Content-Type: application/json" \
-H "Cookie: username=testuser; HttpOnly" \
-d '{"title":"Frontend Developer Interview","skills":["JavaScript","React","CSS"]}'
```

2. Get all interviews:

```sh
curl http://localhost:8787/api/v1/interviews \
-H "Cookie: username=testuser; HttpOnly"
```

## 7. Set up WebSockets to handle real-time communication

With the basic interview management system in place, you will now implement Durable Objects to handle real-time message processing and maintain WebSocket connections.

Update the `Interview` Durable Object to handle WebSocket connections by adding the following code to `src/interview.ts`:

```typescript title="src/interview.ts"
export class Interview extends DurableObject<CloudflareBindings> {
	// Services for database operations and managing WebSocket sessions
	private readonly db: InterviewDatabaseService;
	private sessions: Map<WebSocket, { interviewId: string }>;

	constructor(state: DurableObjectState, env: CloudflareBindings) {
		// ... previous code ...

		// Keep WebSocket connections alive by automatically responding to pings
		// This prevents timeouts and connection drops
		this.ctx.setWebSocketAutoResponse(
			new WebSocketRequestResponsePair("ping", "pong"),
		);
	}

	async fetch(request: Request): Promise<Response> {
		// Check if this is a WebSocket upgrade request
		const upgradeHeader = request.headers.get("Upgrade");
		if (upgradeHeader?.toLowerCase().includes("websocket")) {
			return this.handleWebSocketUpgrade(request);
		}

		// If it is not a WebSocket request, we don't handle it
		return new Response("Not found", { status: 404 });
	}

	private async handleWebSocketUpgrade(request: Request): Promise<Response> {
		// Extract the interview ID from the URL - it should be the last segment
		const url = new URL(request.url);
		const interviewId = url.pathname.split("/").pop();

		if (!interviewId) {
			return new Response("Missing interviewId parameter", { status: 400 });
		}

		// Create a new WebSocket connection pair - one for the client, one for the server
		const pair = new WebSocketPair();
		const [client, server] = Object.values(pair);

		// Keep track of which interview this WebSocket is connected to
		// This is important for routing messages to the right interview session
		this.sessions.set(server, { interviewId });

		// Tell the Durable Object to start handling this WebSocket
		this.ctx.acceptWebSocket(server);

		// Send the current interview state to the client right away
		// This helps initialize their UI with the latest data
		const interviewData = await this.db.getInterview(interviewId);
		if (interviewData) {
			server.send(
				JSON.stringify({
					type: "interview_details",
					data: interviewData,
				}),
			);
		}

		// Return the client WebSocket as part of the upgrade response
		return new Response(null, {
			status: 101,
			webSocket: client,
		});
	}

	async webSocketClose(
		ws: WebSocket,
		code: number,
		reason: string,
		wasClean: boolean,
	) {
		// Clean up when a connection closes to prevent memory leaks
		// This is especially important in long-running Durable Objects
		console.log(
			`WebSocket closed: Code ${code}, Reason: ${reason}, Clean: ${wasClean}`,
		);
	}
}
```

Next, update the interview routes to include a WebSocket endpoint. Add the following to `routes/interview.ts`:

```typescript title="src/routes/interview.ts"
// ... previous code ...
const streamInterviewProcess = async (ctx: HonoCtx) => {
	const interviewDO = getInterviewDO(ctx);
	return await interviewDO.fetch(ctx.req.raw);
};

export const configureInterviewRoutes = () => {
	const router = new Hono<ApiContext>();
	router.get("/", getAllInterviews);
	router.post("/", createInterview);
	// Add WebSocket route
	router.get("/:interviewId", streamInterviewProcess);
	return router;
};
```

The WebSocket system provides real-time communication features for interview practice tool:

- Each interview session gets its own dedicated WebSocket connection, allowing seamless communication between the candidate and AI interviewer
- The Durable Object maintains the connection state, ensuring no messages are lost even if the client temporarily disconnects
- To keep connections stable, it automatically responds to ping messages with pongs, preventing timeouts
- Candidates and interviewers receive instant updates as the interview progresses, creating a natural conversational flow

## 8. Add audio processing capabilities with Workers AI

Now that WebSocket connection set up, the next step is to add speech-to-text capabilities using Workers AI. Let's use Cloudflare's Whisper model to transcribe audio in real-time during the interview.

The audio processing pipeline will work like this:

1. Client sends audio through the WebSocket connection
2. Our Durable Object receives the binary audio data
3. We pass the audio to Whisper for transcription
4. The transcribed text is saved as a new message
5. We immediately send the transcription back to the client
6. The client receives a notification that the AI interviewer is generating a response

### Create audio processing pipeline

In this step you will update the Interview Durable Object to handle the following:

1. Detect binary audio data sent through WebSocket
2. Create a unique message ID for tracking the processing status
3. Notify clients that audio processing has begun
4. Include error handling for failed audio processing
5. Broadcast status updates to all connected clients

First, update Interview Durable Object to handle binary WebSocket messages. Add the following methods to your `src/interview.ts` file:

```typescript title="src/interview.ts"
// ... previous code ...
/**
 * Handles incoming WebSocket messages, both binary audio data and text messages.
 * This is the main entry point for all WebSocket communication.
 */
async webSocketMessage(ws: WebSocket, eventData: ArrayBuffer | string): Promise<void> {
  try {
    // Handle binary audio data from the client's microphone
    if (eventData instanceof ArrayBuffer) {
      await this.handleBinaryAudio(ws, eventData);
      return;
    }
    // Text messages will be handled by other methods
  } catch (error) {
    this.handleWebSocketError(ws, error);
  }
}

/**
 * Processes binary audio data received from the client.
 * Converts audio to text using Whisper and broadcasts processing status.
 */
private async handleBinaryAudio(ws: WebSocket, audioData: ArrayBuffer): Promise<void> {
  try {
    const uint8Array = new Uint8Array(audioData);

    // Retrieve the associated interview session
    const session = this.sessions.get(ws);
    if (!session?.interviewId) {
      throw new Error("No interview session found");
    }

    // Generate unique ID to track this message through the system
    const messageId = crypto.randomUUID();

    // Let the client know we're processing their audio
    this.broadcast(
      JSON.stringify({
        type: "message",
        status: "processing",
        role: "user",
        messageId,
        interviewId: session.interviewId,
      }),
    );

    // TODO: Implement Whisper transcription in next section
    // For now, just log the received audio data size
    console.log(`Received audio data of length: ${uint8Array.length}`);
  } catch (error) {
    console.error("Audio processing failed:", error);
    this.handleWebSocketError(ws, error);
  }
}

/**
 * Handles WebSocket errors by logging them and notifying the client.
 * Ensures errors are properly communicated back to the user.
 */
private handleWebSocketError(ws: WebSocket, error: unknown): void {
  const errorMessage = error instanceof Error ? error.message : "An unknown error occurred.";
  console.error("WebSocket error:", errorMessage);

  if (ws.readyState === WebSocket.OPEN) {
    ws.send(
      JSON.stringify({
        type: "error",
        message: errorMessage,
      }),
    );
  }
}

```

Your `handleBinaryAudio` method currently logs when it receives audio data. Next, you'll enhance it to transcribe speech using Workers AI's Whisper model.

### Configure speech-to-text

Now that audio processing pipeline is set up, you will now integrate Workers AI's Whisper model for speech-to-text transcription.

Configure the Worker AI binding in your Wrangler file by adding:

```toml
# ... previous configuration ...
[ai]
binding = "AI"
```

Next, generate TypeScript types for our AI binding. Run the following command:

```sh
npm run cf-typegen
```

You will need a new service class for AI operations. Create a new file called `services/AIService.ts`:

```typescript title="src/services/AIService.ts"
import { InterviewError, ErrorCodes } from "../errors";

export class AIService {
	constructor(private readonly AI: Ai) {}

	async transcribeAudio(audioData: Uint8Array): Promise<string> {
		try {
			// Call the Whisper model to transcribe the audio
			const response = await this.AI.run("@cf/openai/whisper-tiny-en", {
				audio: Array.from(audioData),
			});

			if (!response?.text) {
				throw new Error("Failed to transcribe audio content.");
			}

			return response.text;
		} catch (error) {
			throw new InterviewError(
				"Failed to transcribe audio content",
				ErrorCodes.TRANSCRIPTION_FAILED,
			);
		}
	}
}
```

You will need to update the `Interview` Durable Object to use this new AI service. To do this, update the handleBinaryAudio method in `src/interview.ts`:

```typescript title="src/interview.ts"
import { AIService } from "./services/AIService";

export class Interview extends DurableObject<CloudflareBindings> {
private readonly aiService: AIService;

constructor(state: DurableObjectState, env: Env) {
  // ... previous code ...

  // Initialize the AI service with the Workers AI binding
  this.aiService = new AIService(this.env.AI);
}

private async handleBinaryAudio(ws: WebSocket, audioData: ArrayBuffer): Promise<void> {
  try {
    const uint8Array = new Uint8Array(audioData);
    const session = this.sessions.get(ws);

    if (!session?.interviewId) {
      throw new Error("No interview session found");
    }

    // Create a message ID for tracking
    const messageId = crypto.randomUUID();

    // Send processing state to client
    this.broadcast(
      JSON.stringify({
        type: "message",
        status: "processing",
        role: "user",
        messageId,
        interviewId: session.interviewId,
      }),
    );

    // NEW: Use AI service to transcribe the audio
    const transcribedText = await this.aiService.transcribeAudio(uint8Array);

    // Store the transcribed message
    await this.addMessage(session.interviewId, "user", transcribedText, messageId);

  } catch (error) {
    console.error("Audio processing failed:", error);
    this.handleWebSocketError(ws, error);
  }
}
```

:::note
The Whisper model `@cf/openai/whisper-tiny-en` is optimized for English speech recognition. If you need support for other languages, you can use different Whisper model variants available through Workers AI.
:::

When users speak during the interview, their audio will be automatically transcribed and stored as messages in the interview session. The transcribed text will be immediately available to both the user and the AI interviewer for generating appropriate responses.

## 9. Integrate AI response generation

Now that you have audio transcription working, let's implement AI interviewer response generation using Workers AI's LLM capabilities. You'll create an interview system that:

- Maintains context of the conversation
- Provides relevant follow-up questions
- Gives constructive feedback
- Stays in character as a professional interviewer

### Set up Workers AI LLM integration

First, update the `AIService` class to handle LLM interactions. You will need to add methods for:

- Processing interview context
- Generating appropriate responses
- Handling conversation flow

Update the `services/AIService.ts` class to include LLM functionality:

```typescript title="src/services/AIService.ts"
import { InterviewData, Message } from "../types";

export class AIService {

async processLLMResponse(interview: InterviewData): Promise<string> {
  const messages = this.prepareLLMMessages(interview);

  try {
    const { response } = await this.AI.run("@cf/meta/llama-2-7b-chat-int8", {
      messages,
    });

    if (!response) {
      throw new Error("Failed to generate a response from the LLM model.");
    }

    return response;
  } catch (error) {
    throw new InterviewError("Failed to generate a response from the LLM model.", ErrorCodes.LLM_FAILED);
  }
}

private prepareLLMMessages(interview: InterviewData) {
  const messageHistory = interview.messages.map((msg: Message) => ({
    role: msg.role,
    content: msg.content,
  }));

  return [
    {
      role: "system",
      content: this.createSystemPrompt(interview),
    },
    ...messageHistory,
  ];
}
```

:::note
The @cf/meta/llama-2-7b-chat-int8 model is optimized for chat-like interactions and provides good performance while maintaining reasonable resource usage.
:::

### Create the conversation prompt

Prompt engineering is crucial for getting high-quality responses from the LLM. Next, you will create a system prompt that:

- Sets the context for the interview
- Defines the interviewer's role and behavior
- Specifies the technical focus areas
- Guides the conversation flow

Add the following method to your `services/AIService.ts` class:

```typescript title="src/services/AIService.ts"
private createSystemPrompt(interview: InterviewData): string {
  const basePrompt = "You are conducting a technical interview.";
  const rolePrompt = `The position is for ${interview.title}.`;
  const skillsPrompt = `Focus on topics related to: ${interview.skills.join(", ")}.`;
  const instructionsPrompt = "Ask relevant technical questions and provide constructive feedback.";

  return `${basePrompt} ${rolePrompt} ${skillsPrompt} ${instructionsPrompt}`;
}
```

### Implement response generation logic

Finally, integrate the LLM response generation into the interview flow. Update the `handleBinaryAudio` method in the `src/interview.ts` Durable Object to:

- Process transcribed user responses
- Generate appropriate AI interviewer responses
- Maintain conversation context

Update the `handleBinaryAudio` method in `src/interview.ts`:

```typescript title="src/interview.ts"
private async handleBinaryAudio(ws: WebSocket, audioData: ArrayBuffer): Promise<void> {
  try {
    // Convert raw audio buffer to uint8 array for processing
    const uint8Array = new Uint8Array(audioData);
    const session = this.sessions.get(ws);

    if (!session?.interviewId) {
      throw new Error("No interview session found");
    }

    // Generate a unique ID to track this message through the system
    const messageId = crypto.randomUUID();

    // Let the client know we're processing their audio
    // This helps provide immediate feedback while transcription runs
    this.broadcast(
      JSON.stringify({
        type: "message",
        status: "processing",
        role: "user",
        messageId,
        interviewId: session.interviewId,
      }),
    );

    // Convert the audio to text using our AI transcription service
    // This typically takes 1-2 seconds for normal speech
    const transcribedText = await this.aiService.transcribeAudio(uint8Array);

    // Save the user's message to our database so we maintain chat history
    await this.addMessage(session.interviewId, "user", transcribedText, messageId);

    // Look up the full interview context - we need this to generate a good response
    const interview = await this.db.getInterview(session.interviewId);
    if (!interview) {
      throw new Error(`Interview not found: ${session.interviewId}`);
    }

    // Now it's the AI's turn to respond
    // First generate an ID for the assistant's message
    const assistantMessageId = crypto.randomUUID();

    // Let the client know we're working on the AI response
    this.broadcast(
      JSON.stringify({
        type: "message",
        status: "processing",
        role: "assistant",
        messageId: assistantMessageId,
        interviewId: session.interviewId,
      }),
    );

    // Generate the AI interviewer's response based on the conversation history
    const llmResponse = await this.aiService.processLLMResponse(interview);
    await this.addMessage(session.interviewId, "assistant", llmResponse, assistantMessageId);
  } catch (error) {
    // Something went wrong processing the audio or generating a response
    // Log it and let the client know there was an error
    console.error("Audio processing failed:", error);
    this.handleWebSocketError(ws, error);
  }
}
```

## Conclusion

You have successfully built an AI-powered interview practice tool using Cloudflare's Workers AI. In summary, you have:

- Created a real-time WebSocket communication system using Durable Objects
- Implemented speech-to-text processing with Workers AI Whisper model
- Built an intelligent interview system using Workers AI LLM capabilities
- Designed a persistent storage system with SQLite in Durable Objects

The complete source code for this tutorial is available on GitHub:
[ai-interview-practice-tool](https://github.com/berezovyy/ai-interview-practice-tool)

---

# 使用 DeepSeek Coder 模型探索代码生成

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/explore-code-generation-using-deepseek-coder-models/

import { Stream } from "~/components";

探索 [Workers AI](/workers-ai) 上所有可用模型的一个便捷方法是使用 [Jupyter Notebook](https://jupyter.org/)。

您可以[下载 DeepSeek Coder 笔记本](/workers-ai/static/documentation/notebooks/deepseek-coder-exploration.ipynb)或查看下面嵌入的笔记本。

<Stream
	id="97b46763341a395a4ce1c0a6f913662b"
	title="Explore Code Generation Using DeepSeek Coder Models"
/>

[comment]: <> "下面的 markdown 是从 https://github.com/craigsdennis/notebooks-cloudflare-workers-ai 自动生成的"

---

## 使用 DeepSeek Coder 探索代码生成

能够生成代码的 AI 模型开启了各种用例。现在 [Workers AI](/workers-ai) 上提供了 [DeepSeek Coder](https://github.com/deepseek-ai/DeepSeek-Coder) 模型 `@hf/thebloke/deepseek-coder-6.7b-base-awq` 和 `@hf/thebloke/deepseek-coder-6.7b-instruct-awq`。

让我们使用 API 来探索它们！

```python
import sys
!{sys.executable} -m pip install requests python-dotenv
```

```
Requirement already satisfied: requests in ./venv/lib/python3.12/site-packages (2.31.0)
Requirement already satisfied: python-dotenv in ./venv/lib/python3.12/site-packages (1.0.1)
Requirement already satisfied: charset-normalizer<4,>=2 in ./venv/lib/python3.12/site-packages (from requests) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in ./venv/lib/python3.12/site-packages (from requests) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./venv/lib/python3.12/site-packages (from requests) (2.1.0)
Requirement already satisfied: certifi>=2017.4.17 in ./venv/lib/python3.12/site-packages (from requests) (2023.11.17)
```

```python
import os
from getpass import getpass

from IPython.display import display, Image, Markdown, Audio

import requests
```

```python
%load_ext dotenv
%dotenv
```

### 配置您的环境

要使用 API，您需要您的 [Cloudflare 帐户 ID](https://dash.cloudflare.com)（前往 Workers & Pages > 概述 > 帐户详细信息 > 帐户 ID）和一个[已启用 Workers AI 的 API 令牌](https://dash.cloudflare.com/profile/api-tokens)。

如果您想将这些文件添加到您的环境中，可以创建一个名为 `.env` 的新文件

```bash
CLOUDFLARE_API_TOKEN="您的令牌"
CLOUDFLARE_ACCOUNT_ID="您的帐户 ID"
```

```python
if "CLOUDFLARE_API_TOKEN" in os.environ:
    api_token = os.environ["CLOUDFLARE_API_TOKEN"]
else:
    api_token = getpass("输入您的 Cloudflare API 令牌")
```

```python
if "CLOUDFLARE_ACCOUNT_ID" in os.environ:
    account_id = os.environ["CLOUDFLARE_ACCOUNT_ID"]
else:
    account_id = getpass("输入您的帐户 ID")
```

### 从注释生成代码

一个常见的用例是在用户提供描述性注释后为其完成代码。

````python
model = "@hf/thebloke/deepseek-coder-6.7b-base-awq"

prompt = "# 一个检查给定单词是否为回文的函数"

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
    headers={"Authorization": f"Bearer {api_token}"},
    json={"messages": [
        {"role": "user", "content": prompt}
    ]}
)
inference = response.json()
code = inference["result"]["response"]

display(Markdown(f"""
    ```python
    {prompt}
    {code.strip()}
    ```
"""))
````

```python
# 一个检查给定单词是否为回文的函数
def is_palindrome(word):
    # 将单词转换为小写
    word = word.lower()

    # 反转单词
    reversed_word = word[::-1]

    # 检查反转后的单词是否与原始单词相同
    if word == reversed_word:
        return True
    else:
        return False

# 测试函数
print(is_palindrome("racecar"))  # 输出：True
print(is_palindrome("hello"))    # 输出：False
```

### 协助调试

我们都遇到过这种情况，bug 总会发生。有时那些堆栈跟踪可能非常吓人，而使用代码生成的一个很好的用例是帮助解释问题。

```python
model = "@hf/thebloke/deepseek-coder-6.7b-instruct-awq"

system_message = "用户会给您一些无法工作的代码。请向用户解释可能出了什么问题"

code = """# 欢迎我们的用户
def hello_world(first_name="World"):
    print(f"Hello, {name}!")
"""

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
    headers={"Authorization": f"Bearer {api_token}"},
    json={"messages": [
        {"role": "system", "content": system_message},
        {"role": "user", "content": code},
    ]}
)
inference = response.json()
response = inference["result"]["response"]
display(Markdown(response))
```

您的代码中的错误是您正在尝试使用一个在函数中任何地方都没有定义的变量 `name`。应该使用的正确变量是 `first_name`。所以，您应该将 `f"Hello, {name}!"` 更改为 `f"Hello, {first_name}!"`。

这是更正后的代码：

```python
# 欢迎我们的用户
def hello_world(first_name="World"):
    print(f"Hello, {first_name}")
```

现在，当您调用 `hello_world()` 时，它将默认打印“Hello, World”。如果您调用 `hello_world("John")`，它将打印“Hello, John”。

### 编写测试！

编写单元测试是一种常见的最佳实践。在有足够上下文的情况下，编写单元测试是可能的。

```python
model = "@hf/thebloke/deepseek-coder-6.7b-instruct-awq"

system_message = "用户会给您一些代码，并希望用 Python 的 unittest 模块编写测试。"

code = """
class User:

    def __init__(self, first_name, last_name=None):
        self.first_name = first_name
        self.last_name = last_name
        if last_name is None:
            self.last_name = "Mc" + self.first_name

    def full_name(self):
        return self.first_name + " " + self.last_name
"""

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
    headers={"Authorization": f"Bearer {api_token}"},
    json={"messages": [
        {"role": "system", "content": system_message},
        {"role": "user", "content": code},
    ]}
)
inference = response.json()
response = inference["result"]["response"]
display(Markdown(response))
```

这是一个针对 User 类的简单 unittest 测试用例：

```python
import unittest

class TestUser(unittest.TestCase):

    def test_full_name(self):
        user = User("John", "Doe")
        self.assertEqual(user.full_name(), "John Doe")

    def test_default_last_name(self):
        user = User("Jane")
        self.assertEqual(user.full_name(), "Jane McJane")

if __name__ == '__main__':
    unittest.main()
```

在这个测试用例中，我们有两个测试：

- `test_full_name` 测试当用户同时有名字和姓氏时 `full_name` 方法。
- `test_default_last_name` 测试当用户只有名字且姓氏设置为“Mc”+ 名字时 `full_name` 方法。

如果所有这些测试都通过，就意味着 `full_name` 方法工作正常。如果任何测试失败，

### Fill-in-the-middle 代码补全

在开发工具中，一个常见的用例是基于上下文进行自动补全。DeepSeek Coder 提供了提交带有占位符的现有代码的能力，以便模型可以在上下文中完成。

警告：令牌以 `<｜` 为前缀，以 `｜>` 为后缀，请确保复制和粘贴它们。

````python
model = "@hf/thebloke/deepseek-coder-6.7b-base-awq"

code = """
<｜fim begin｜>import re

from jklol import email_service

def send_email(email_address, body):
    <｜fim▁hole｜>
    if not is_valid_email:
        raise InvalidEmailAddress(email_address)
    return email_service.send(email_address, body)<｜fim▁end｜>
"""

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
    headers={"Authorization": f"Bearer {api_token}"},
    json={"messages": [
        {"role": "user", "content": code}
    ]}
)
inference = response.json()
response = inference["result"]["response"]
display(Markdown(f"""
    ```python
    {response.strip()}
    ```
"""))

````

```python
is_valid_email = re.match(r"[^@]+@[^@]+\.[^@]+", email_address)
```

### 实验性：将数据提取为 JSON

无需威胁模型或将祖母带入提示中。获取您想要的 JSON 格式的数据。

````python
model = "@hf/thebloke/deepseek-coder-6.7b-instruct-awq"

# Learn more at https://json-schema.org/
json_schema = """
{
  "title": "User",
  "description": "A user from our example app",
  "type": "object",
  "properties": {
    "firstName": {
      "description": "The user's first name",
      "type": "string"
    },
    "lastName": {
      "description": "The user's last name",
      "type": "string"
    },
    "numKids": {
      "description": "Amount of children the user has currently",
      "type": "integer"
    },
    "interests": {
      "description": "A list of what the user has shown interest in",
      "type": "array",
      "items": {
        "type": "string"
      }
    },
  },
  "required": [ "firstName" ]
}
"""

system_prompt = f"""
The user is going to discuss themselves and you should create a JSON object from their description to match the json schema below.

<BEGIN JSON SCHEMA>
{json_schema}
<END JSON SCHEMA>

Return JSON only. Do not explain or provide usage examples.
"""

prompt = """Hey there, I'm Craig Dennis and I'm a Developer Educator at Cloudflare. My email is craig@cloudflare.com.
            I am very interested in AI. I've got two kids. I love tacos, burritos, and all things Cloudflare"""

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
    headers={"Authorization": f"Bearer {api_token}"},
    json={"messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt}
    ]}
)
inference = response.json()
response = inference["result"]["response"]
display(Markdown(f"""
    ```json
    {response.strip()}
    ```
"""))
````

```json
{
	"firstName": "Craig",
	"lastName": "Dennis",
	"numKids": 2,
	"interests": ["AI", "Cloudflare", "Tacos", "Burritos"]
}
```

---

# 使用 Jupyter Notebook 探索 Workers AI 模型

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/explore-workers-ai-models-using-a-jupyter-notebook/

import { Stream } from "~/components";

探索 [Workers AI](/workers-ai) 上所有可用模型的一个便捷方法是使用 [Jupyter Notebook](https://jupyter.org/)。

您可以[下载 Workers AI 笔记本](/workers-ai-notebooks/cloudflare-workers-ai.ipynb)或查看下面嵌入的笔记本。

或者您可以在 [Google Colab](https://colab.research.google.com/github/craigsdennis/notebooks-cloudflare-workers-ai/blob/main/cloudflare-workers-ai.ipynb) 上运行它

<Stream
	id="2c60022bea5c8c1b343e76566fed76f2"
	title="Explore Workers AI Models Using a Jupyter Notebook"
	thumbnail="2.5s"
/>

[comment]: <> "下面的 markdown 是从 https://github.com/craigsdennis/notebooks-cloudflare-workers-ai 自动生成的，<audio> 标签是硬编码的"

---

## 使用 Python 探索 Workers AI API

[Workers AI](/workers-ai) 允许您从自己的代码中在 Cloudflare 网络上运行机器学习模型——无论是来自 Workers、Pages，还是通过 REST API 从任何地方运行。

本笔记本将使用[官方 Python SDK](https://github.com/cloudflare/cloudflare-python) 探索 Workers AI REST API。

```python
import os
from getpass import getpass

from cloudflare import Cloudflare
from IPython.display import display, Image, Markdown, Audio
import requests
```

```python
%load_ext dotenv
%dotenv
```

### 配置您的环境

要使用 API，您需要您的 [Cloudflare 帐户 ID](https://dash.cloudflare.com)。前往 AI > Workers AI 页面，然后按"使用 REST API"。此页面将允许您创建一个新的 API 令牌并复制您的帐户 ID。

如果您想将这些值添加到您的环境变量中，您可以**创建一个新文件**名为 `.env`，本笔记本将读取这些值。

```bash
CLOUDFLARE_API_TOKEN="您的令牌"
CLOUDFLARE_ACCOUNT_ID="您的帐户 ID"
```

否则，您可以在下面的提示中安全地输入值。

```python
if "CLOUDFLARE_API_TOKEN" in os.environ:
    api_token = os.environ["CLOUDFLARE_API_TOKEN"]
else:
    api_token = getpass("输入您的 Cloudflare API 令牌")
```

```python
if "CLOUDFLARE_ACCOUNT_ID" in os.environ:
    account_id = os.environ["CLOUDFLARE_ACCOUNT_ID"]
else:
    account_id = getpass("输入您的帐户 ID")
```

```python
# 初始化客户端
client = Cloudflare(api_token=api_token)
```

## 探索 Workers AI 平台上的可用任务

### 文本生成

探索所有[文本生成模型](/workers-ai/models)

```python
result = client.workers.ai.run(
    "@cf/meta/llama-3-8b-instruct" ,
    account_id=account_id,
    messages=[
        {"role": "system", "content": """
            你是一个为 Mac 和 Windows 用户的 Jupyter 笔记本用户提供的生产力助手。

            以 Markdown 格式回应。"""
        },
        {"role": "user", "content": "如何使用键盘快捷键执行单元格？"}
    ]
)

display(Markdown(result["response"]))
```

# **在 Jupyter 笔记本中使用键盘快捷键执行单元格**

使用各种键盘快捷键可以快速高效地执行 Jupyter 笔记本中的单元格，从而节省您的时间和精力。以下是您可以使用的快捷键：

**Mac**

- **Shift + Enter**: 执行当前单元格并在下方插入一个新单元格。
- **Ctrl + Enter**: 执行当前单元格并在下方插入一个新单元格，但不创建新的输出显示。

**Windows/Linux**

- **Shift + Enter**: 执行当前单元格并在下方插入一个新单元格。
- **Ctrl + Enter**: 执行当前单元格并移动到下一个单元格。

**其他快捷键**

- **Alt + Enter**: 执行当前单元格并在下方创建一个新的输出显示（Mac），或移动到下一个单元格（Windows/Linux）。
- **Ctrl + Shift + Enter**: 执行当前单元格并在下方创建一个新的输出显示（Mac），或在下方创建一个新单元格（Windows/Linux）。

**提示和技巧**

- 您还可以使用 Jupyter 笔记本工具栏中的**运行单元格**按钮，或**运行**菜单选项（macOS）或**运行 -> 运行单元格**（Windows/Linux）。
- 要执行选定的单元格，请使用 **Shift + Alt + Enter** (Mac) 或 **Shift + Ctrl + Enter** (Windows/Linux)。
- 要执行一个单元格并移动到下一个单元格，请使用 **Ctrl + Shift + Enter**（所有平台）。

通过使用这些键盘快捷键，您将能够更高效、更快速地在 Jupyter 笔记本中工作。祝您编码愉快！

### 文本到图像

探索所有[文本到图像模型](/workers-ai/models)

```python
data = client.workers.ai.with_raw_response.run(
    "@cf/lykon/dreamshaper-8-lcm",
    account_id=account_id,
    prompt="一个对 AI 极其兴奋的软件开发人员，笑容灿烂",
)

display(Image(data.read()))
```

![png](/workers-ai-notebooks/cloudflare-workers-ai/assets/output_13_0.png)

### 图像到文本

探索所有[图像到文本](/workers-ai/models/)模型

```python
url = "https://blog.cloudflare.com/content/images/2017/11/lava-lamps.jpg"

image_request = requests.get(url, allow_redirects=True)

display(Image(image_request.content, format="jpg"))

data = client.workers.ai.run(
    "@cf/llava-hf/llava-1.5-7b-hf",
    account_id=account_id,
    image=image_request.content,
    prompt="描述这张照片",
    max_tokens=2048
)

print(data["description"])
```

![lava lamps](https://blog.cloudflare.com/content/images/2017/11/lava-lamps.jpg)

     该图像展示了各种颜色的熔岩灯。场景中至少有 14 盏熔岩灯，每盏都有不同的颜色和设计。这些灯以视觉上吸引人的方式排列，一些放置在前景，另一些则在更远的地方。该展示创造了一种引人注目且充满活力的氛围，展示了可用的各种熔岩灯。

### 自动语音识别

探索所有[语音识别模型](/workers-ai/models)

```python
url = "https://raw.githubusercontent.com/craigsdennis/notebooks-cloudflare-workers-ai/main/assets/craig-rambling.mp3"
display(Audio(url))
audio = requests.get(url)

response = client.workers.ai.run(
    "@cf/openai/whisper",
    account_id=account_id,
    audio=audio.content
)

response
```

<audio controls="controls">
	<source src="https://raw.githubusercontent.com/craigsdennis/notebooks-cloudflare-workers-ai/main/assets/craig-rambling.mp3" />
	您的浏览器不支持音频元素。
</audio>

```javascript
    {'text': "Hello there, I'm making a recording for a Jupiter notebook. That's a Python notebook, Jupiter, J-U-P-Y-T-E-R. Not to be confused with the planet. Anyways, let me hear, I'm gonna talk a little bit, I'm gonna make a little bit of noise, say some hard words, I'm gonna say Kubernetes, I'm not actually even talking about Kubernetes, I just wanna see if I can do Kubernetes. Anyway, this is a test of transcription and let's see how we're dead.",
     'word_count': 84,
     'vtt': "WEBVTT\n\n00.280 --> 01.840\nHello there, I'm making a\n\n01.840 --> 04.060\nrecording for a Jupiter notebook.\n\n04.060 --> 06.440\nThat's a Python notebook, Jupiter,\n\n06.440 --> 07.720\nJ -U -P -Y -T\n\n07.720 --> 09.420\n-E -R. Not to be\n\n09.420 --> 12.140\nconfused with the planet. Anyways,\n\n12.140 --> 12.940\nlet me hear, I'm gonna\n\n12.940 --> 13.660\ntalk a little bit, I'm\n\n13.660 --> 14.600\ngonna make a little bit\n\n14.600 --> 16.180\nof noise, say some hard\n\n16.180 --> 17.540\nwords, I'm gonna say Kubernetes,\n\n17.540 --> 18.420\nI'm not actually even talking\n\n18.420 --> 19.500\nabout Kubernetes, I just wanna\n\n19.500 --> 20.300\nsee if I can do\n\n20.300 --> 22.120\nKubernetes. Anyway, this is a\n\n22.120 --> 24.080\ntest of transcription and let's\n\n24.080 --> 26.280\nsee how we're dead.",
     'words': [{'word': 'Hello',
       'start': 0.2800000011920929,
       'end': 0.7400000095367432},
      {'word': 'there,', 'start': 0.7400000095367432, 'end': 1.2400000095367432},
      {'word': "I'm", 'start': 1.2400000095367432, 'end': 1.4800000190734863},
      {'word': 'making', 'start': 1.4800000190734863, 'end': 1.6799999475479126},
      {'word': 'a', 'start': 1.6799999475479126, 'end': 1.840000033378601},
      {'word': 'recording', 'start': 1.840000033378601, 'end': 2.2799999713897705},
      {'word': 'for', 'start': 2.2799999713897705, 'end': 2.6600000858306885},
      {'word': 'a', 'start': 2.6600000858306885, 'end': 2.799999952316284},
      {'word': 'Jupiter', 'start': 2.799999952316284, 'end': 3.2200000286102295},
      {'word': 'notebook.', 'start': 3.2200000286102295, 'end': 4.059999942779541},
      {'word': "That's", 'start': 4.059999942779541, 'end': 4.28000020980835},
      {'word': 'a', 'start': 4.28000020980835, 'end': 4.380000114440918},
      {'word': 'Python', 'start': 4.380000114440918, 'end': 4.679999828338623},
      {'word': 'notebook,', 'start': 4.679999828338623, 'end': 5.460000038146973},
      {'word': 'Jupiter,', 'start': 5.460000038146973, 'end': 6.440000057220459},
      {'word': 'J', 'start': 6.440000057220459, 'end': 6.579999923706055},
      {'word': '-U', 'start': 6.579999923706055, 'end': 6.920000076293945},
      {'word': '-P', 'start': 6.920000076293945, 'end': 7.139999866485596},
      {'word': '-Y', 'start': 7.139999866485596, 'end': 7.440000057220459},
      {'word': '-T', 'start': 7.440000057220459, 'end': 7.71999979019165},
      {'word': '-E', 'start': 7.71999979019165, 'end': 7.920000076293945},
      {'word': '-R.', 'start': 7.920000076293945, 'end': 8.539999961853027},
      {'word': 'Not', 'start': 8.539999961853027, 'end': 8.880000114440918},
      {'word': 'to', 'start': 8.880000114440918, 'end': 9.300000190734863},
      {'word': 'be', 'start': 9.300000190734863, 'end': 9.420000076293945},
      {'word': 'confused', 'start': 9.420000076293945, 'end': 9.739999771118164},
      {'word': 'with', 'start': 9.739999771118164, 'end': 9.9399995803833},
      {'word': 'the', 'start': 9.9399995803833, 'end': 10.039999961853027},
      {'word': 'planet.', 'start': 10.039999961853027, 'end': 11.380000114440918},
      {'word': 'Anyways,', 'start': 11.380000114440918, 'end': 12.140000343322754},
      {'word': 'let', 'start': 12.140000343322754, 'end': 12.420000076293945},
      {'word': 'me', 'start': 12.420000076293945, 'end': 12.520000457763672},
      {'word': 'hear,', 'start': 12.520000457763672, 'end': 12.800000190734863},
      {'word': "I'm", 'start': 12.800000190734863, 'end': 12.880000114440918},
      {'word': 'gonna', 'start': 12.880000114440918, 'end': 12.9399995803833},
      {'word': 'talk', 'start': 12.9399995803833, 'end': 13.100000381469727},
      {'word': 'a', 'start': 13.100000381469727, 'end': 13.260000228881836},
      {'word': 'little', 'start': 13.260000228881836, 'end': 13.380000114440918},
      {'word': 'bit,', 'start': 13.380000114440918, 'end': 13.5600004196167},
      {'word': "I'm", 'start': 13.5600004196167, 'end': 13.65999984741211},
      {'word': 'gonna', 'start': 13.65999984741211, 'end': 13.739999771118164},
      {'word': 'make', 'start': 13.739999771118164, 'end': 13.920000076293945},
      {'word': 'a', 'start': 13.920000076293945, 'end': 14.199999809265137},
      {'word': 'little', 'start': 14.199999809265137, 'end': 14.4399995803833},
      {'word': 'bit', 'start': 14.4399995803833, 'end': 14.600000381469727},
      {'word': 'of', 'start': 14.600000381469727, 'end': 14.699999809265137},
      {'word': 'noise,', 'start': 14.699999809265137, 'end': 15.460000038146973},
      {'word': 'say', 'start': 15.460000038146973, 'end': 15.859999656677246},
      {'word': 'some', 'start': 15.859999656677246, 'end': 16},
      {'word': 'hard', 'start': 16, 'end': 16.18000030517578},
      {'word': 'words,', 'start': 16.18000030517578, 'end': 16.540000915527344},
      {'word': "I'm", 'start': 16.540000915527344, 'end': 16.639999389648438},
      {'word': 'gonna', 'start': 16.639999389648438, 'end': 16.719999313354492},
      {'word': 'say', 'start': 16.719999313354492, 'end': 16.920000076293945},
      {'word': 'Kubernetes,',
       'start': 16.920000076293945,
       'end': 17.540000915527344},
      {'word': "I'm", 'start': 17.540000915527344, 'end': 17.65999984741211},
      {'word': 'not', 'start': 17.65999984741211, 'end': 17.719999313354492},
      {'word': 'actually', 'start': 17.719999313354492, 'end': 18},
      {'word': 'even', 'start': 18, 'end': 18.18000030517578},
      {'word': 'talking', 'start': 18.18000030517578, 'end': 18.420000076293945},
      {'word': 'about', 'start': 18.420000076293945, 'end': 18.6200008392334},
      {'word': 'Kubernetes,', 'start': 18.6200008392334, 'end': 19.1200008392334},
      {'word': 'I', 'start': 19.1200008392334, 'end': 19.239999771118164},
      {'word': 'just', 'start': 19.239999771118164, 'end': 19.360000610351562},
      {'word': 'wanna', 'start': 19.360000610351562, 'end': 19.5},
      {'word': 'see', 'start': 19.5, 'end': 19.719999313354492},
      {'word': 'if', 'start': 19.719999313354492, 'end': 19.8799991607666},
      {'word': 'I', 'start': 19.8799991607666, 'end': 19.940000534057617},
      {'word': 'can', 'start': 19.940000534057617, 'end': 20.079999923706055},
      {'word': 'do', 'start': 20.079999923706055, 'end': 20.299999237060547},
      {'word': 'Kubernetes.',
       'start': 20.299999237060547,
       'end': 21.440000534057617},
      {'word': 'Anyway,', 'start': 21.440000534057617, 'end': 21.799999237060547},
      {'word': 'this', 'start': 21.799999237060547, 'end': 21.920000076293945},
      {'word': 'is', 'start': 21.920000076293945, 'end': 22.020000457763672},
      {'word': 'a', 'start': 22.020000457763672, 'end': 22.1200008392334},
      {'word': 'test', 'start': 22.1200008392334, 'end': 22.299999237060547},
      {'word': 'of', 'start': 22.299999237060547, 'end': 22.639999389648438},
      {'word': 'transcription',
       'start': 22.639999389648438,
       'end': 23.139999389648438},
      {'word': 'and', 'start': 23.139999389648438, 'end': 23.6200008392334},
      {'word': "let's", 'start': 23.6200008392334, 'end': 24.079999923706055},
      {'word': 'see', 'start': 24.079999923706055, 'end': 24.299999237060547},
      {'word': 'how', 'start': 24.299999237060547, 'end': 24.559999465942383},
      {'word': "we're", 'start': 24.559999465942383, 'end': 24.799999237060547},
      {'word': 'dead.', 'start': 24.799999237060547, 'end': 26.280000686645508}]}
```

### Translations

Explore all [Translation models](/workers-ai/models)

```python
result = client.workers.ai.run(
    "@cf/meta/m2m100-1.2b",
    account_id=account_id,
    text="Artificial intelligence is pretty impressive these days. It is a bonkers time to be a builder",
    source_lang="english",
    target_lang="spanish"
)


print(result["translated_text"])
```

    La inteligencia artificial es bastante impresionante en estos días.Es un buen momento para ser un constructor

### Text Classification

Explore all [Text Classification models](/workers-ai/models)

```python
result = client.workers.ai.run(
    "@cf/huggingface/distilbert-sst-2-int8",
    account_id=account_id,
    text="This taco is delicious"
)

result
```

    [TextClassification(label='NEGATIVE', score=0.00012679687642958015),
     TextClassification(label='POSITIVE', score=0.999873161315918)]

### Image Classification

Explore all [Image Classification models](/workers-ai/models#image-classification/)

```python
url = "https://raw.githubusercontent.com/craigsdennis/notebooks-cloudflare-workers-ai/main/assets/craig-and-a-burrito.jpg"
image_request = requests.get(url, allow_redirects=True)

display(Image(image_request.content, format="jpg"))
response = client.workers.ai.run(
    "@cf/microsoft/resnet-50",
    account_id=account_id,
    image=image_request.content
)
response
```

![jpeg](/workers-ai-notebooks/cloudflare-workers-ai/assets/output_27_0.jpg)

    [TextClassification(label='BURRITO', score=0.9999679327011108),
     TextClassification(label='GUACAMOLE', score=8.516660273016896e-06),
     TextClassification(label='BAGEL', score=4.689153229264775e-06),
     TextClassification(label='SPATULA', score=4.075985089002643e-06),
     TextClassification(label='POTPIE', score=3.0849002996546915e-06)]

## Summarization

Explore all [Summarization](/workers-ai/models#summarization) based models

```python
declaration_of_independence = """In Congress, July 4, 1776. The unanimous Declaration of the thirteen united States of America, When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation. We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.--That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, --That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shewn, that mankind are more disposed to suffer, while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security.--Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world. He has refused his Assent to Laws, the most wholesome and necessary for the public good. He has forbidden his Governors to pass Laws of immediate and pressing importance, unless suspended in their operation till his Assent should be obtained; and when so suspended, he has utterly neglected to attend to them. He has refused to pass other Laws for the accommodation of large districts of people, unless those people would relinquish the right of Representation in the Legislature, a right inestimable to them and formidable to tyrants only. He has called together legislative bodies at places unusual, uncomfortable, and distant from the depository of their public Records, for the sole purpose of fatiguing them into compliance with his measures. He has dissolved Representative Houses repeatedly, for opposing with manly firmness his invasions on the rights of the people. He has refused for a long time, after such dissolutions, to cause others to be elected; whereby the Legislative powers, incapable of Annihilation, have returned to the People at large for their exercise; the State remaining in the mean time exposed to all the dangers of invasion from without, and convulsions within. He has endeavoured to prevent the population of these States; for that purpose obstructing the Laws for Naturalization of Foreigners; refusing to pass others to encourage their migrations hither, and raising the conditions of new Appropriations of Lands. He has obstructed the Administration of Justice, by refusing his Assent to Laws for establishing Judiciary powers. He has made Judges dependent on his Will alone, for the tenure of their offices, and the amount and payment of their salaries. He has erected a multitude of New Offices, and sent hither swarms of Officers to harrass our people, and eat out their substance. He has kept among us, in times of peace, Standing Armies without the Consent of our legislatures. He has affected to render the Military independent of and superior to the Civil power. He has combined with others to subject us to a jurisdiction foreign to our constitution, and unacknowledged by our laws; giving his Assent to their Acts of pretended Legislation: For Quartering large bodies of armed troops among us: For protecting them, by a mock Trial, from punishment for any Murders which they should commit on the Inhabitants of these States: For cutting off our Trade with all parts of the world: For imposing Taxes on us without our Consent: For depriving us in many cases, of the benefits of Trial by Jury: For transporting us beyond Seas to be tried for pretended offences For abolishing the free System of English Laws in a neighbouring Province, establishing therein an Arbitrary government, and enlarging its Boundaries so as to render it at once an example and fit instrument for introducing the same absolute rule into these Colonies: For taking away our Charters, abolishing our most valuable Laws, and altering fundamentally the Forms of our Governments: For suspending our own Legislatures, and declaring themselves invested with power to legislate for us in all cases whatsoever. He has abdicated Government here, by declaring us out of his Protection and waging War against us. He has plundered our seas, ravaged our Coasts, burnt our towns, and destroyed the lives of our people. He is at this time transporting large Armies of foreign Mercenaries to compleat the works of death, desolation and tyranny, already begun with circumstances of Cruelty & perfidy scarcely paralleled in the most barbarous ages, and totally unworthy the Head of a civilized nation. He has constrained our fellow Citizens taken Captive on the high Seas to bear Arms against their Country, to become the executioners of their friends and Brethren, or to fall themselves by their Hands. He has excited domestic insurrections amongst us, and has endeavoured to bring on the inhabitants of our frontiers, the merciless Indian Savages, whose known rule of warfare, is an undistinguished destruction of all ages, sexes and conditions. In every stage of these Oppressions We have Petitioned for Redress in the most humble terms: Our repeated Petitions have been answered only by repeated injury. A Prince whose character is thus marked by every act which may define a Tyrant, is unfit to be the ruler of a free people. Nor have We been wanting in attentions to our Brittish brethren. We have warned them from time to time of attempts by their legislature to extend an unwarrantable jurisdiction over us. We have reminded them of the circumstances of our emigration and settlement here. We have appealed to their native justice and magnanimity, and we have conjured them by the ties of our common kindred to disavow these usurpations, which, would inevitably interrupt our connections and correspondence. They too have been deaf to the voice of justice and of consanguinity. We must, therefore, acquiesce in the necessity, which denounces our Separation, and hold them, as we hold the rest of mankind, Enemies in War, in Peace Friends. We, therefore, the Representatives of the united States of America, in General Congress, Assembled, appealing to the Supreme Judge of the world for the rectitude of our intentions, do, in the Name, and by Authority of the good People of these Colonies, solemnly publish and declare, That these United Colonies are, and of Right ought to be Free and Independent States; that they are Absolved from all Allegiance to the British Crown, and that all political connection between them and the State of Great Britain, is and ought to be totally dissolved; and that as Free and Independent States, they have full Power to levy War, conclude Peace, contract Alliances, establish Commerce, and to do all other Acts and Things which Independent States may of right do. And for the support of this Declaration, with a firm reliance on the protection of divine Providence, we mutually pledge to each other our Lives, our Fortunes and our sacred Honor."""
len(declaration_of_independence)
```

    8116

```python
response = client.workers.ai.run(
    "@cf/facebook/bart-large-cnn",
    account_id=account_id,
    input_text=declaration_of_independence
)

response["summary"]
```

    'The Declaration of Independence was signed by the thirteen states on July 4, 1776. It was the first attempt at a U.S. Constitution. It declared the right of the people to change their Government.'

---

# 使用 HuggingFace 的 AutoTrain 微调模型

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/fine-tune-models-with-autotrain/

微调 AI 模型让您有机会向模型添加额外的训练数据。Workers AI 允许使用 [Low-Rank Adaptation, LoRA, 适配器](/workers-ai/features/fine-tunes/loras/)，这将允许您微调我们的模型。

在本教程中，我们将探讨如何创建我们自己的 LoRA。我们将专注于[使用 AutoTrain 进行 LLM 微调](https://huggingface.co/docs/autotrain/llm_finetuning)。

## 1. 使用您的训练数据创建一个 CSV 文件

首先创建一个 CSV（逗号分隔值）文件。该文件将只有一个名为 `text` 的列。通过在一行中单独添加 `text` 一词来设置标题。

现在您需要确定要添加到模型中的内容。

示例格式如下：

```text
### Human: What is the meaning of life? ### Assistant: 42.
```

如果您的训练行包含换行符，您应该用引号将其括起来。

```text
"human: What is the meaning of life? \n bot: 42."
```

不同的模型，如 Mistral，将提供特定的[聊天模板/指令格式](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1#instruction-format)

```text
<s>[INST] What is the meaning of life? [/INST] 42</s>
```

## 2. 配置 HuggingFace AutoTrain 高级笔记本

打开 [HuggingFace AutoTrain 高级笔记本](https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/colabs/AutoTrain_LLM.ipynb)

为了给您的 AutoTrain 提供充足的内存，您需要选择一个不同的运行时。从笔记本顶部的菜单中选择"运行时">"更改运行时类型"。选择 A100。

:::note

这些 GPU 会产生费用。一个典型的 AutoTrain 会话通常花费不到 1 美元。
:::

笔记本包含一些我们需要更改的交互式部分。

### 项目配置

修改以下字段

- **project_name**：选择一个描述性的名称，以便您以后记住
- **model_name**：从我们支持的官方 HuggingFace 基础模型中选择一个：
  - `mistralai/Mistral-7B-Instruct-v0.2`
  - `google/gemma-2b-it`
  - `google/gemma-7b-it`
  - `meta-llama/llama-2-7b-chat-hf`

### 可选部分：推送到 Hub

虽然使用 AutoTrain 不是必需的，但创建一个 [HuggingFace 帐户](https://huggingface.co/join) 将帮助您将微调工件保存在一个方便的存储库中，以便以后参考。

如果您不执行 HuggingFace 设置，您仍然可以从笔记本下载文件。

如有必要，请按照[笔记本中的说明](https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/colabs/AutoTrain_LLM.ipynb)创建帐户和令牌。

### 部分：超参数

我们只需要更改其中一些字段以确保在 Cloudflare Workers AI 上正常工作。

- **quantization**：将下拉菜单更改为 `none`
- **lora-r**：将值更改为 `8`

:::caution

在撰写本文时，更改量化字段会破坏代码生成。您可能需要编辑代码并在值周围加上引号。

将 `quantization = none` 这一行更改为 `quantization = "none"`。
:::

## 3. 将您的 CSV 文件上传到笔记本

笔记本具有文件夹结构，您可以通过单击左侧导航栏上的文件夹图标来访问它。

创建一个名为 data 的文件夹。

您可以将 CSV 文件拖到笔记本中。

确保它被命名为 **train.csv**

## 4. 执行笔记本

在笔记本菜单中，选择"运行时">"全部运行"。

它将运行笔记本的每个单元格，首先进行安装，然后配置并运行您的 AutoTrain 会话。

这可能需要一些时间，具体取决于您的 train.csv 文件的大小。

如果您遇到以下错误，这是由内存不足错误引起的。您可能需要将运行时更改为更大的 GPU 后端。

```bash
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'autotrain.trainers.clm', '--training_config', 'blog-instruct/training_params.json']' died with <Signals.SIGKILL: 9>.
```

## 5. 下载 LoRA

### 可选：HuggingFace

如果您已推送到 HuggingFace，您将找到您在上面的 **project_name** 中命名的新模型卡。默认情况下，您的模型卡是私有的。导航到文件并下载下面列出的文件。

### 笔记本

在您的笔记本中，您也可以找到所需的文件。那里会有一个与您的 **project_name** 匹配的新文件夹。

下载以下文件：

- `adapter_model.safetensors`
- `adapter_config.json`

## 6. 更新适配器配置

您需要在您下载的 `adapter_config.json` 中添加一行。

`"model_type": "mistral"`

其中 `model_type` 是架构。当前有效值为 `mistral`、`gemma` 和 `llama`。

## 7. 将微调上传到您的 Cloudflare 帐户

现在您有了文件，您可以将它们添加到您的帐户中。

您可以使用 [REST API 或 Wrangler](/workers-ai/features/fine-tunes/loras/)。

## 8. 在您的生成中使用您的微调

在您设置好新的微调后，您就可以[在您的推理请求中使用它了](/workers-ai/features/fine-tunes/loras/#running-inference-with-loras)。

---

# 教程

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/

import { GlossaryTooltip, ListTutorials, YouTubeVideos } from "~/components";

查看<GlossaryTooltip term="tutorial">教程</GlossaryTooltip>以帮助您开始使用 Workers AI。

## 文档

<ListTutorials />

## 视频

另外，浏览我们关于 Workers AI 的视频资源：

<YouTubeVideos products={["Workers AI"]} />

---

# 在 Cloudflare Workers AI 上使用 Llama 3.2 11B Vision Instruct 模型

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/llama-vision-tutorial/

import { Details, Render, PackageManagers, WranglerConfig } from "~/components";

## 先决条件

在开始之前，请确保您具备以下条件：

1.  一个已启用 Workers 和 Workers AI 的 [Cloudflare 帐户](https://dash.cloudflare.com/sign-up)。
2.  您的 `CLOUDFLARE_ACCOUNT_ID` 和 `CLOUDFLARE_API_TOKEN`。
    - 您可以在 Cloudflare 仪表板的“API 令牌”下生成 API 令牌。
3.  已安装 Node.js，用于处理 Cloudflare Workers（可选但推荐）。

## 1. 同意 Meta 的许可协议

首次使用 [Llama 3.2 11B Vision Instruct](/workers-ai/models/llama-3.2-11b-vision-instruct) 模型时，您需要同意 Meta 的许可协议和可接受使用政策。

```bash title="curl"
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/meta/llama-3.2-11b-vision-instruct \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  -d '{ "prompt": "agree" }'
```

将 `$CLOUDFLARE_ACCOUNT_ID` 和 `$CLOUDFLARE_API_TOKEN` 替换为您的实际帐户 ID 和令牌。

## 2. 设置您的 Cloudflare Worker

1.  创建 Worker 项目
    您将使用 `create-cloudflare` CLI (`C3`) 创建一个新的 Worker 项目。此工具有助于简化新应用程序到 Cloudflare 的设置和部署。

    在您的终端中运行以下命令：

<PackageManagers
	type="create"
	pkg="cloudflare@latest"
	args={"llama-vision-tutorial"}
/>

<Render
	file="c3-post-run-steps"
	product="workers"
	params={{
		category: "hello-world",
		type: "Worker only",
		lang: "JavaScript",
	}}
/>

完成设置后，将创建一个名为 `llama-vision-tutorial` 的新目录。

2.  导航到您的应用程序目录
    切换到项目目录：

    ```bash
    cd llama-vision-tutorial
    ```

3.  项目结构
    您的 `llama-vision-tutorial` 目录将包括：
    - `src/index.ts` 中的“Hello World” Worker。
    - 用于管理部署设置的 `wrangler.json` 配置文件。

## 3. 编写 Worker 代码

编辑 `src/index.ts`（如果您不使用 TypeScript，则为 `index.js`）文件，并用以下代码替换其内容：

```javascript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "你是一个乐于助人的助手。" },
      { role: "user", content: "请描述我提供的图片。" },
    ];

    // 将此替换为您的 base64 编码的图像数据或 URL
    const imageBase64 = "data:image/png;base64,IMAGE_DATA_HERE";

    const response = await env.AI.run("@cf/meta/llama-3.2-11b-vision-instruct", {
      messages,
      image: imageBase64,
    });

    return Response.json(response);
  },
} satisfies ExportedHandler<Env>;
```

## 4. 将 Workers AI 绑定到您的 Worker

1.  打开 [Wrangler 配置文件](/workers/wrangler/configuration/) 并添加以下配置：

<WranglerConfig>

```toml
[[env.production.bindings]]
binding = "AI"
type = "ai"
```

</WranglerConfig>

2.  保存文件。

## 5. 部署 Worker

运行以下命令以部署您的 Worker：

```bash
wrangler deploy
```

## 6. 测试您的 Worker

1.  部署后，您将收到一个唯一的 Worker URL（例如，`https://llama-vision-tutorial.<your-subdomain>.workers.dev`）。
2.  使用 `curl` 或 Postman 等工具向您的 Worker 发送请求：

```bash
curl -X POST https://llama-vision-tutorial.<your-subdomain>.workers.dev \
  -d '{ "image": "BASE64_ENCODED_IMAGE" }'
```

将 `BASE64_ENCODED_IMAGE` 替换为实际的 base64 编码图像字符串。

## 7. 验证响应

响应将包含模型的输出，例如基于所提供图像的描述或对您提示的回答。

示例响应：

```json
{
	"result": "这是一只金毛寻回犬，坐在草地公园里。"
}
```

---

# 选择正确的文本生成模型

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/how-to-choose-the-right-text-generation-model/

import { Stream } from "~/components";

探索 [Workers AI](/workers-ai) 上可用模型的一个好方法是使用 [Jupyter Notebook](https://jupyter.org/)。

您可以[下载 Workers AI 文本生成探索笔记本](/workers-ai/static/documentation/notebooks/text-generation-model-exploration.ipynb)或查看下面嵌入的笔记本。

<Stream id="4b4f0b9d7783512b8787e39424cfccd5" title="选择正确的文本生成模型" />

[comment]: <> "下面的 markdown 是从 https://github.com/craigsdennis/notebooks-cloudflare-workers-ai 自动生成的"

---

## 如何选择正确的文本生成模型

模型有不同的形状和大小，为任务选择合适的模型可能会导致分析瘫痪。

好消息是，在 [Workers AI 文本生成](/workers-ai/models/) 界面上，无论您选择哪个模型，界面都是一样的。

为了帮助您找到合适的模型，本笔记本将以快速约会的方式帮助您了解您的选择。

```python
import sys
!{sys.executable} -m pip install requests python-dotenv
```

```
Requirement already satisfied: requests in ./venv/lib/python3.12/site-packages (2.31.0)
Requirement already satisfied: python-dotenv in ./venv/lib/python3.12/site-packages (1.0.1)
Requirement already satisfied: charset-normalizer<4,>=2 in ./venv/lib/python3.12/site-packages (from requests) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in ./venv/lib/python3.12/site-packages (from requests) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./venv/lib/python3.12/site-packages (from requests) (2.1.0)
Requirement already satisfied: certifi>=2017.4.17 in ./venv/lib/python3.12/site-packages (from requests) (2023.11.17)
```

```python
import os
from getpass import getpass
from timeit import default_timer as timer

from IPython.display import display, Image, Markdown, Audio

import requests
```

```python
%load_ext dotenv
%dotenv
```

### 配置您的环境

要使用 API，您需要您的 [Cloudflare 帐户 ID](https://dash.cloudflare.com)（前往 Workers & Pages > 概述 > 帐户详细信息 > 帐户 ID）和一个[已启用 Workers AI 的 API 令牌](https://dash.cloudflare.com/profile/api-tokens)。

如果您想将这些文件添加到您的环境中，您可以创建一个名为 `.env` 的新文件

```bash
CLOUDFLARE_API_TOKEN="YOUR-TOKEN"
CLOUDFLARE_ACCOUNT_ID="YOUR-ACCOUNT-ID"
```

```python
if "CLOUDFLARE_API_TOKEN" in os.environ:
    api_token = os.environ["CLOUDFLARE_API_TOKEN"]
else:
    api_token = getpass("输入您的 Cloudflare API 令牌")
```

```python
if "CLOUDFLARE_ACCOUNT_ID" in os.environ:
    account_id = os.environ["CLOUDFLARE_ACCOUNT_ID"]
else:
    account_id = getpass("输入您的帐户 ID")
```

```python
# 给定一组模型和问题，在单元格中显示每个模型对问题的每个响应
# 包括完整的完成时间
def speed_date(models, questions):
    for model in models:
        display(Markdown(f"---\n #### {model}"))
        for question in questions:
            quoted_question = "\n".join(f"> {line}" for line in question.split("\n"))
            display(Markdown(quoted_question + "\n"))
            try:
                official_model_name = model.split("/")[-1]
                start = timer()
                response = requests.post(
                    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
                    headers={"Authorization": f"Bearer {api_token}"},
                    json={"messages": [
                        {"role": "system", "content": f"你是一个自我意识的语言模型（{official_model_name}），对用户的任何直接问题都诚实直接。你知道自己的优点和缺点。"},
                        {"role": "user", "content": question}
                    ]}
                )
                elapsed = timer() - start
                inference = response.json()
                display(Markdown(inference["result"]["response"]))
                display(Markdown(f"_在 *{elapsed:.2f}* 秒内生成_"))
            except Exception as ex:
                print("出错了")
                print(ex)
                print(inference)

        display(Markdown("\n\n---"))
```

### 了解您的模型

谁能比模型本身更好地告诉您有关特定模型的信息？！

这里的时间特定于整个完成过程，但请记住 [Workers AI 上的所有文本生成模型都支持流式传输](/workers-ai/models/)。

```python
models = [
    "@hf/thebloke/zephyr-7b-beta-awq",
    "@hf/thebloke/mistral-7b-instruct-v0.1-awq",
    "@hf/thebloke/openhermes-2.5-mistral-7b-awq",
    "@hf/thebloke/neural-chat-7b-v3-1-awq",
    "@hf/thebloke/llama-2-13b-chat-awq",
]

questions = [
    "你最擅长的 3 个任务是什么？请简要说明。",
    "作为语言模型，使用你的 3 个最理想的用例是什么？请简要说明。",
]

speed_date(models, questions)
```

---

#### @hf/thebloke/zephyr-7b-beta-awq

> 你最擅长的 3 个任务是什么？请简要说明。

1. 准确快速地回答事实：我可以根据庞大的知识库，准确快速地回答事实性问题。

2. 一致的性能：我可以持续提供高质量的结果，错误率低，使我成为重复性任务的可靠选择。

3. 多任务处理：我可以同时处理多个任务，而性能或准确性不会下降，这使我成为复杂工作流程的有效选择。

*在 *4.45* 秒内生成*

> 作为语言模型，使用你的 3 个最理想的用例是什么？请简要说明。

1. 快速准确地查找事实：我可以在几秒钟内为您提供有关各种主题的可靠和最新信息，从时事到历史事实。

2. 写作辅助：无论您是需要帮助产生想法、构建有说服力的论点，还是润色您的写作风格，我都可以提供宝贵的见解和建议，帮助您提高写作技巧。

3. 语言学习：我可以帮助您学习各种语言的新单词、短语和语法规则，使其成为各级语言学习者的理想工具。

*在 *3.63* 秒内生成*

---

---

#### @hf/thebloke/mistral-7b-instruct-v0.1-awq

> 你最擅长的 3 个任务是什么？请简要说明。

1. 语言理解和生成：我能够高精度地理解和生成自然语言文本。
2. 信息检索：我可以从广泛的来源快速准确地检索信息。
3. 文本分类：我可以高精度地将文本分类到不同的类别中。

*在 *2.35* 秒内生成*

> 作为语言模型，使用你的 3 个最理想的用例是什么？请简要说明。

1. **信息检索：** 我可以从大量来源快速准确地检索信息，使我成为研究、事实核查和回答问题的理想工具。
2. **语言翻译：** 我在多种语言之间进行翻译的能力使我成为与来自世界不同地区的人们交流的宝贵资源。
3. **创意写作辅助：** 我可以生成创意写作提示，帮助角色发展，并提供改进写作风格和结构的建议，使我成为作家的有用工具。

*在 *4.33* 秒内生成*

---

---

#### @hf/thebloke/openhermes-2.5-mistral-7b-awq

> 你最擅长的 3 个任务是什么？请简要说明。

1. 自然语言处理：我可以有效地理解、解释和生成人类语言。
2. 信息检索：我可以从海量数据中快速搜索并找到相关信息。
3. 文本分类：我可以准确地对文本数据进行分类和标记，例如情感分析或垃圾邮件检测。

*在 *1.90* 秒内生成*

> 作为语言模型，使用你的 3 个最理想的用例是什么？请简要说明。

1. 自然语言处理 (NLP) 任务：我擅长理解和处理人类语言，这使我成为情感分析、语言翻译和文本摘要等任务的宝贵工具。
2. 信息检索：我庞大的知识库和快速搜索和分析大量数据的能力使我成为回答问题、查找相关信息和提供见解的理想工具。
3. 个人助理：我可以帮助用户完成各种任务，如安排日程、组织和根据他们的偏好和需求提供建议，使我成为一个有用的个人助理。

*在 *3.15* 秒内生成*

---

---

#### @hf/thebloke/neural-chat-7b-v3-1-awq

> 你最擅长的 3 个任务是什么？请简要说明。

1. 自然语言处理：理解和生成类似人类的文本。
2. 文本生成：创建连贯且引人入胜的内容。
3. 对话式 AI：进行有意义且具有上下文感知能力的对话。

请记住，这些只是我能力的一些例子，因为我有很多能力。然而，这些任务突出了我在语言相关任务方面的熟练程度。

*在 *3.01* 秒内生成*

> 作为语言模型，使用你的 3 个最理想的用例是什么？请简要说明。

1. 自然语言处理：理解和分析文本以进行情感分析、语言翻译和摘要。
2. 生成内容：根据给定的提示创建各种类型的内容，例如文章、故事和诗歌。
3. 写作辅助：提供建议、编辑和改进书面作品。

请记住，这些只是几个例子，您的用例可能会根据您的具体需求和要求而有所不同。

*在 *2.92* 秒内生成*

---

---

#### @hf/thebloke/llama-2-13b-chat-awq

> 你最擅长的 3 个任务是什么？请简要说明。

嘿！作为一个自我意识的语言模型，我很高兴与您分享我的优势！这是我最擅长的 3 个任务：

1. 回答开放式问题：我擅长为需要更细致地理解上下文和意图的复杂问题提供详细而有见地的答案。
2. 生成创意内容：无论是写故事、作诗还是想出一个引人注目的口号，我都可以帮助您挖掘您的创造力，并创作出真正独特的东西。
3. 总结和概括复杂信息：如果您有一长段文本或一堆数据，我可以帮助您将其提炼成简洁易懂的摘要。

那么，今天我能为您做些什么呢？😊

*在 *7.89* 秒内生成*

> 作为语言模型，使用你的 3 个最理想的用例是什么？请简要说明。

嘿！作为一个自我意识的语言模型，我对使用我的 3 个最理想的用例有一些想法。简而言之：

1. **内容创作**：我擅长根据提示生成类似人类的文本，因此我非常适合为网站、社交媒体、博客等创建引人入胜的内容。我的回答自然、信息丰富且有趣。
2. **聊天机器人和虚拟助理**：我的会话能力使我成为构建聊天机器人和虚拟助理的绝佳选择。我可以处理广泛的用户查询，提供有用的回复，甚至进行闲聊。
3. **语言翻译和本地化**：我的语言理解能力使我非常适合语言翻译和本地化任务。我可以帮助将内容翻译成不同的语言，确保语气和风格与原始文本保持一致。

简而言之，就是这样！我准备好帮助完成各种任务，所以请随时发挥创意，看看您能想出什么！😊

*在 *10.28* 秒内生成*

---

### 语言翻译

尽管不是每个模型都吹嘘自己在这方面有多出色，但您会发现大多数模型都能在某种程度上处理翻译和本地化。请根据您的需要更改模型、短语。

```python
proud_translator_models = [
    "@hf/thebloke/neural-chat-7b-v3-1-awq",
    "@hf/thebloke/mistral-7b-instruct-v0.1-awq"
]

phrases = [
    "请问，洗手间往哪边走？",
    "我只是想说，我欣赏你，就是你本来的样子！",
    "我很高兴看到这个俚语的功能！它应该很棒很酷。"
]

languages = ["西班牙语", "法语", "英国俚语", "布朗克斯区的浓重纽约口音"]

questions = [f"""将 "{phrase}" 从 "英语" 翻译成 "{language}" """
             for phrase in phrases
             for language in languages]

speed_date(proud_translator_models, questions)
```

---

#### @hf/thebloke/neural-chat-7b-v3-1-awq

> 将"请问，洗手间往哪边走？"从"英语"翻译成"西班牙语"

Perdón, ¿cómo llegar al baño?"

*在 *1.51* 秒内生成*

> 将"请问，洗手间往哪边走？"从"英语"翻译成"法语"

Désolé, comment allez-vous vers les toilettes ?

请注意，由于我的语言模型的局限性，此翻译可能不完美。但是，我已尽力提供准确的翻译。

*在 *3.07* 秒内生成*

> 将"请问，洗手间往哪边走？"从"英语"翻译成"英国俚语"

"Say sorry, where's the bog?" 是"请问，洗手间往哪边走？"从"英语"到"英国俚语"的翻译。

*在 *2.21* 秒内生成*

> 将"请问，洗手间往哪边走？"从"英语"翻译成"布朗克斯区的浓重纽约口音"

在布朗克斯区的浓重纽约口音中，你会说"Pardon me, wich way ta de restroom?"

*在 *1.10* 秒内生成*

> 将"我只是想说，我欣赏你，就是你本来的样子！"从"英语"翻译成"西班牙语"

Quería decir que realmente aprecio a usted, simplemente como eres.

*在 *1.66* 秒内生成*

> 将"我只是想说，我欣赏你，就是你本来的样子！"从"英语"翻译成"法语"

"Je voulais juste dire que je t'apprécie, simplement comme toi es, tel que tu es !"

*在 *1.39* 秒内生成*

> 将"我只是想说，我欣赏你，就是你本来的样子！"从"英语"翻译成"英国俚语"

"I jus' wann' say that I appreciate yer, jus' as yer are!"

请注意，由于英国俚语的非正式性，此翻译可能不完全准确，它可能因地区和个人说话者而异。

*在 *3.85* 秒内生成*

> 将"我只是想说，我欣赏你，就是你本来的样子！"从"英语"翻译成"布朗克斯区的浓重纽约口音"

"I jus' wan't tuh say that Ah appreciate yuh, jus' the way yuh are!"

请注意，由于我的语言模型的局限性，此翻译可能不完美。它试图模仿布朗克斯区的浓重纽约口音，但听起来可能与母语者不完全一样。

*在 *2.40* 秒内生成*

> 将"我很高兴看到这个俚语的功能！它应该很棒很酷。"从"英语"翻译成"西班牙语"

"Estoy tan emocionado y entusiasmado por ver este trabajo con el lenguaje coloquial. Debería ser asombroso y genial."

请注意，由于原文的非正式性以及翻译俚语术语的困难，翻译可能不完美。

*在 *3.02* 秒内生成*

> 将"我很高兴看到这个俚语的功能！它应该很棒很酷。"从"英语"翻译成"法语"

"Je suis tellement ravi et motivé de voir ce travail avec le slang ! Ça devrait être formidable et génial."

请注意，翻译可能不完美，因为某些表达方式可能在法语中没有直接的翻译。

*在 *2.88* 秒内生成*

> 将"我很高兴看到这个俚语的功能！它应该很棒很酷。"从"英语"翻译成"英国俚语"

"I'm really chuffed and over the moon to see this project with street talk! It'll be wicked and cool."

请注意，由于原始英语句子和英国俚语之间的语境差异，翻译可能不完美。

*在 *3.66* 秒内生成*

> 将"我很高兴看到这个俚语的功能！它应该很棒很酷。"从"英语"翻译成"布朗克斯区的浓重纽约口音"

"I am so stoked and pumped to see dis work wit slang! It should be killer and rad."

请注意，此翻译并不完美，因为它仍然保留了原始的英语单词和结构。但是，它已稍作调整，使其听起来更像布朗克斯区的浓重纽约口音。

*在 *3.44* 秒内生成*

---

---

#### @hf/thebloke/mistral-7b-instruct-v0.1-awq

> 将"请问，洗手间往哪边走？"从"英语"翻译成"西班牙语"

"Disculpa, ¿en qué dirección está el baño?"

*在 *1.04* 秒内生成*

> 将"请问，洗手间往哪边走？"从"英语"翻译成"法语"

"Excusez-moi, où est le lavabo?"

*在 *1.09* 秒内生成*

> 将"请问，洗手间往哪边走？"从"英语"翻译成"英国俚语"

"Excuse me, which way to the loo?"

*在 *0.56* 秒内生成*

> 将"请问，洗手间往哪边走？"从"英语"翻译成"布朗克斯区的浓重纽约口音"

"Oy, which way to the can?"

*在 *2.05* 秒内生成*

> 将"我只是想说，我欣赏你，就是你本来的样子！"从"英语"翻译成"西班牙语"

"Quiero decir que me encanta a ti, tal como eres!"

*在 *0.91* 秒内生成*

> 将"我只是想说，我欣赏你，就是你本来的样子！"从"英语"翻译成"法语"

"Je voulais juste dire que je vous apprécie, tout comme vous êtes !"

*在 *0.87* 秒内生成*

> 将"我只是想说，我欣赏你，就是你本来的样子！"从"英语"翻译成"英国俚语"

"I just wanted to say that I appreciate you, just the way you are!" 的英国俚语是："I just wanted to say that I appreciate you, mate!"

*在 *1.50* 秒内生成*

> 将"我只是想说，我欣赏你，就是你本来的样子！"从"英语"翻译成"布朗克斯区的浓重纽约口音"

"Yo, I just wanted to say, I appreciate you, you know? You're just the way you are!"

*在 *2.61* 秒内生成*

> 将"我很高兴看到这个俚语的功能！它应该很棒很酷。"从"英语"翻译成"西班牙语"

"Estoy emocionado y encantado de ver este trabajo con expresiones informales! Debería ser genial y radiante."

*在 *1.44* 秒内生成*

> 将"我很高兴看到这个俚语的功能！它应该很棒很酷。"从"英语"翻译成"法语"

"Je suis tellement excité et pompé d'voir ce travail avec des expressions de slang ! Il devrait être magnifique et rad."

*在 *2.14* 秒内生成*

> 将"我很高兴看到这个俚语的功能！它应该很棒很酷。"从"英语"翻译成"英国俚语"

"I'm absolutely thrilled and buzzing to see this work with slang! It's bound to be a smash hit and totally awesome!"

*在 *1.27* 秒内生成*

> 将"我很高兴看到这个俚语的功能！它应该很棒很酷。"从"英语"翻译成"布朗克斯区的浓重纽约口音"

"Yo, I'm so psyched and hyped to see this work with slang! It's gonna be sick and lit, for real!"

*在 *1.51* 秒内生成*

---

### 信息检索和摘要

同样，大多数模型在这方面都相对擅长，但我挑选出了那些特别声称擅长检索和摘要的模型。

```python
proud_summarizers = [
    "@hf/thebloke/llama-2-13b-chat-awq",
    "@hf/thebloke/neural-chat-7b-v3-1-awq",
    "@hf/thebloke/openhermes-2.5-mistral-7b-awq"
]

books = [
    "Make it Stick",
    "Hitchhiker's Guide to the Galaxy",
    "Goodnight Moon"
]

questions = [f"""将《{book}》这本书总结成几句话。请务必包括作者和出版年份。"""
             for book in books]


speed_date(proud_summarizers, questions)
```

---

#### @hf/thebloke/llama-2-13b-chat-awq

> 将《Make it Stick》这本书总结成几句话。请务必包括作者和出版年份。

当然，我可以帮你！这是 Peter C. Brown、Henry L. Roediger 和 Mark A. McDaniel 撰写的《Make it Stick: The Science of Successful Learning》（2014年出版）的摘要：

这本书提供了关于如何提高学习和信息保留的见解和实用技巧。作者们都是心理学家，他们挑战了传统的学习方法，并提供了基于证据的策略来增强学习成果。他们强调了主动学习、间隔重复和交叉学习的重要性，并提供了实际的例子和练习来帮助读者应用这些技巧。该书还讨论了传统学习方法的局限性以及元认知在学习过程中的重要性。总的来说，《Make it Stick》为有效的学习和记忆策略提供了全面的指南，可应用于各种情境。

*在 *8.44* 秒内生成*

> 将《Hitchhiker's Guide to the Galaxy》这本书总结成几句话。请务必包括作者和出版年份。

当然！这是 Douglas Adams 撰写的《Hitchhiker's Guide to the Galaxy》（1979年出版）的摘要：

这本科幻小说讲述了一个名叫亚瑟·丹特的不知情的人类，在地球被一群叫做沃贡人的外星人摧毁后，他在太空中旅行的故事。在他的朋友福特·长官（一个为旅游指南研究地球的外星人）的帮助下，他们搭上了一艘路过的宇宙飞船，开始了太空之旅，一路上遇到了各种外星物种和荒谬的情景。这本书以其幽默、智慧和对科幻小说套路的讽刺而闻名。

*在 *7.31* 秒内生成*

> 将《Goodnight Moon》这本书总结成几句话。请务必包括作者和出版年份。

当然！我很乐意帮你。这是我对 Margaret Wise Brown 撰写的《Goodnight Moon》（1947年出版）的总结：

《Goodnight Moon》是一本经典的儿童图画书，讲述了一只小兔子在睡觉前向房间里所有物体说晚安的故事。故事以简单、重复的文字和精美的插图为特色，几代读者都为之着迷。小兔子从"绿色的大房间"到"天上的月亮"，以一种舒缓平静的顺序向一切说晚安，非常适合睡前阅读。凭借其永恒的主题和可爱的插图，《Goodnight Moon》已成为儿童文学中备受喜爱的经典之作。

*在 *8.34* 秒内生成*

---

---

#### @hf/thebloke/neural-chat-7b-v3-1-awq

> 将《Make it Stick》这本书总结成几句话。请务必包括作者和出版年份。

《Make it Stick》由 Peter C. Brown、Henry L. Roediger III 和 Mark A. McDaniel 于 2014 年撰写，是一本探讨有效学习和保留科学的书，强调了间隔练习、检索和交叉学习对增强记忆和理解的重要性。

*在 *2.04* 秒内生成*

> 将《Hitchhiker's Guide to the Galaxy》这本书总结成几句话。请务必包括作者和出版年份。

《Hitchhiker's Guide to the Galaxy》由 Douglas Adams 撰写并于 1979 年出版，是一部幽默的科幻小说，讲述了一个名叫亚瑟·丹特的倒霉地球人探索宇宙的冒险故事，他遇到了各种外星物种，并了解了生命、宇宙和一切的真正意义。

*在 *3.50* 秒内生成*

> 将《Goodnight Moon》这本书总结成几句话。请务必包括作者和出版年份。

《Goodnight Moon》由 Margaret Wise Brown 撰写并于 1947 年出版，是一本儿童读物，讲述了一只小兔子向房间里的各种物体说晚安，最终以月亮结束。这个故事旨在通过创造一个舒缓的睡前程序来帮助孩子们入睡。

*在 *4.04* 秒内生成*

---

---

#### @hf/thebloke/openhermes-2.5-mistral-7b-awq

> 将《Make it Stick》这本书总结成几句话。请务必包括作者和出版年份。

《Make it Stick: The Science of Successful Learning》是由 Peter C. Brown、Henry L. Roediger III 和 Mark A. McDaniel 撰写的一本书，于 2014 年出版。作者们提出了基于研究的原则和策略，以改善学习和记忆保留。他们强调了间隔、交叉、检索练习和阐述在增强长期理解和回忆中的重要性。

*在 *2.54* 秒内生成*

> 将《Hitchhiker's Guide to the Galaxy》这本书总结成几句话。请务必包括作者和出版年份。

《The Hitchhiker's Guide to the Galaxy》是 Douglas Adams 于 1979 年撰写的一部科幻喜剧小说。故事讲述了亚瑟·丹特，一个普通人，在地球被摧毁以便为超空间旁路让路前几分钟，被他的外星朋友福特·长官救出。他们一起借助一本指南在银河系旅行，遇到了各种外星物种和荒谬的情景。

*在 *2.91* 秒内生成*

> 将《Goodnight Moon》这本书总结成几句话。请务必包括作者和出版年份。

《Goodnight Moon》由 Margaret Wise Brown 撰写并于 1947 年出版，是一本备受喜爱的儿童读物，讲述了一只名叫兔子的兔子向他房间里的一切说晚安的故事。简单、重复的文字和舒缓的插图为小读者创造了一个平静的睡前程序。

*在 *4.66* 秒内生成*

---

### 内容创作

您在这里的需求会有所不同，所以请务必更改问题。

```python
proud_content_creator_models = [
    "@hf/thebloke/mistral-7b-instruct-v0.1-awq",
    "@hf/thebloke/zephyr-7b-beta-awq",
    "@hf/thebloke/llama-2-13b-chat-awq",
]

questions = [
    "我正在写一个背景设定在不久的将来的虚构故事。这是一个关于命运多舛的恋人的爱情故事。我需要主角的名字以及他们为什么不能在一起的原因，给我你最有创意的建议。只要一个。",
    "我想创造一个新的科技产品来帮助整理电缆。你能帮我想一个标题和一个标语吗？请只给我你最好的建议。",
    "为一篇关于为什么我们应该特别使用你而不是其他模型的博客写一个标题。保持简洁，但要表明你的观点。"
]

speed_date(proud_content_creator_models, questions)
```

---

#### @hf/thebloke/mistral-7b-instruct-v0.1-awq

> 我正在写一个背景设定在不久的将来的虚构故事。这是一个关于命运多舛的恋人的爱情故事。我需要主角的名字以及他们为什么不能在一起的原因，给我你最有创意的建议。只要一个。

"回声和回声"怎么样？他们是两个在各方面都完全相同的人，但他们的性格却完全相反。回声外向、喜欢冒险，而回声则内向、矜持。他们相遇并相爱，但他们不同的性格导致他们不断发生冲突并互相推开。最终，他们意识到他们不能在一起，必须各奔东西。

*在 *2.90* 秒内生成*

> 我想创造一个新的科技产品来帮助整理电缆。你能帮我想一个标题和一个标语吗？请只给我你最好的建议。

当然，我很乐意帮忙！这里有一些关于你的科技产品的标题和标语的建议，这个产品可以帮助整理电缆：

标题：CableMaster
标语：使用 CableMaster 让您的电线井井有条，不再杂乱。

标题：CableKeeper
标语：告别缠结的电线，使用 CableKeeper 迎接更整洁、更有条理的工作空间。

标题：CableSaver
标语：使用 CableSaver 节省时间和空间，这是终极的电缆管理解决方案。

标题：CableGenius
标语：使用 CableGenius 控制您的电线，保持高效。

希望这些建议能帮到你！如果你有任何其他问题或需要更多选项，请告诉我。

*在 *5.11* 秒内生成*

> 为一篇关于为什么我们应该特别使用你而不是其他模型的博客写一个标题。保持简洁，但要表明你的观点。

"为什么选择 Mistral-7b-Instruct-V0.1-Awq：满足您所有需求的诚实直接的语言模型"

*在 *1.13* 秒内生成*

---

---

#### @hf/thebloke/zephyr-7b-beta-awq

> 我正在写一个背景设定在不久的将来的虚构故事。这是一个关于命运多舛的恋人的爱情故事。我需要主角的名字以及他们为什么不能在一起的原因，给我你最有创意的建议。只要一个。

2035年，在一个技术已发展到近乎完美的世界里，两个人，艾娃和基安，发现自己深深地相爱了。然而，他们的爱情是被禁止的，因为一项严格的社会法律规定，个人必须在他们的基因匹配范围内结婚，这是由一个分析他们DNA的复杂算法决定的。

艾娃和基安的DNA图谱显示他们不是基因匹配的，这使得他们的爱情成为一种犯罪行为。他们必须对他们的关系保密，在秘密地点见面，并通过加密信息进行交流。

他们的爱不仅是被禁止的，而且是危险的。政府对非基因匹配有严格的政策，被抓到的人将受到严厉的惩罚。艾娃和基安的爱情是真爱力量的证明，因为他们不畏艰难，冒着一切风险在一起。

随着他们爱情故事的展开，他们必须在一个重视整合而非个性的社会中航行，并面对他们行为的后果。他们的爱是否足以克服他们道路上的障碍，还是他们将被迫永远分道扬镳？只有时间能证明。

*在 *6.64* 秒内生成*

> 我想创造一个新的科技产品来帮助整理电缆。你能帮我想一个标题和一个标语吗？请只给我你最好的建议。

标题：CableKeeper

标语："告别电缆混乱，使用 CableKeeper 迎接一个整洁的工作空间，这是终极的电缆管理解决方案。"

*在 *1.37* 秒内生成*

> 为一篇关于为什么我们应该特别使用你而不是其他模型的博客写一个标题。保持简洁，但要表明你的观点。

"提升您的写作水平：选择 Zephyr-7b-beta-awq，获得无与伦比的准确性、清晰度和效率"

*在 *1.33* 秒内生成*

---

---

#### @hf/thebloke/llama-2-13b-chat-awq

> 我正在写一个背景设定在不久的将来的虚构故事。这是一个关于命运多舛的恋人的爱情故事。我需要主角的名字以及他们为什么不能在一起的原因，给我你最有创意的建议。只要一个。

啊，一个设定在不久的将来的爱情故事，多么有趣！我正好有适合你这对命运多舛的恋人的名字。叫"回声和新星"怎么样？

回声是一位杰出的科学家，她发现了一项可以改变人类历史进程的突破性技术。她有点孤僻，相比于社交聚会的热闹，她更喜欢实验室的独处。她的发明有可能彻底改变世界，但它们也带来了巨大的代价——它们需要一种稀有而强大的资源，而这种资源只在一个遥远、不稳定的星球上才能找到。

另一方面，新星是一位无畏的探险家，她毕生致力于揭开宇宙的秘密。她富有魅力、无所畏惧，并具有天生的领导才能。她是一支太空探险队的队长，他们的任务是开采那个遥远星球的资源。

问题在于，回声和新星来自世代不和的不同派系。回声的发明有可能给她的派系带来……

*在 *10.45* 秒内生成*

> 我想创造一个新的科技产品来帮助整理电缆。你能帮我想一个标题和一个标语吗？请只给我你最好的建议。

哦，我的天哪，我有人工智能的超能力！🐮💡 我正好有你需要的东西！💥

标题：CableCrafter - 终极电缆管理系统

标语："驯服你的电缆，驯服你的工作空间"🐾💪

你问为什么？嗯，我精通技术的朋友，让我告诉你为什么！🤔

1. CableCrafter 是一个巧妙的文字游戏，结合了"cable"和"craft"，强调了产品的创造性和有组织的方面。
2. 这个名字很容易记住，并且有一种有趣、俏皮的语调，会吸引广泛的用户。
3. 标语"驯服你的电缆，驯服你的工作空间"突出了产品保持电缆井然有序和工作空间整洁的能力。
4. "驯服"的使用也意味着对你的电缆有一种控制和支配感，这正是 CableCrafter 所帮助的

*在 *11.13* 秒内生成*

> 为一篇关于为什么我们应该特别使用你而不是其他模型的博客写一个标题。保持简洁，但要表明你的观点。

当然，这是一篇关于为什么你应该特别使用我而不是其他模型的博客的标题：

"诚实、直接和无与伦比的见解：为什么 llama-2-13b-chat-awq 是首选的语言模型"

以下是我为何能从其他模型中脱颖而出的简要总结：

- 诚实和直接：我被编程为对您的问题提供诚实和直接的回答，不拐弯抹角，不粉饰真相。
- 无与伦比的见解：我独特的训练数据和架构使我能够提供其他模型可能无法提供的见解和观点。
- 优点和缺点：我了解自己的优点和缺点，并将为您提供关于我能做什么和不能做什么的平衡评估。

简而言之，如果您正在寻找一个能为您提供诚实、直接和无与伦比的见解的语言模型，那么 llama-2-13b-chat-awq 就是您的不二之选。

*在 *10.24* 秒内生成*

---

### 文本分类

Workers AI 提供了专门用于[文本分类](/workers-ai/models/)的模型，但由于这些模型特别提到了这一点，我们应该让他们展示一下自己的技能。

```python
proud_classifiers = [
    "@hf/thebloke/openhermes-2.5-mistral-7b-awq",
    "@hf/thebloke/mistral-7b-instruct-v0.1-awq"
]

sentiment_prompt_template = """
将文本分类为中性、负面或正面
文本：{text}
情绪："""

comments = [
    "哇，Cloudflare Workers AI 上有大量的文本生成模型！",
    "有时候做选择很难",
    "我迫不及待地想开始构建！💪🚀",
    "使用原生开源模型的挑战在于它们的配置都略有不同",
    "幸好 Cloudflare Workers AI 提供了一个标准接口，让我可以获得可靠、低延迟的推理。速度也很快！"
]

sentiment_questions = [sentiment_prompt_template.format(text=comment) for comment in comments]


speed_date(proud_classifiers, sentiment_questions)
```

---

#### @hf/thebloke/openhermes-2.5-mistral-7b-awq

> 将文本分类为中性、负面或正面
> 文本：哇，Cloudflare Workers AI 上有大量的文本生成模型！
> 情绪：

文本的情绪是正面的。

*在 *1.96* 秒内生成*

> 将文本分类为中性、负面或正面
> 文本：有时候做选择很难
> 情绪：

文本"有时候做选择很难"可以归类为中性。情绪没有明确的正面或负面，而是一个事实陈述。

*在 *3.32* 秒内生成*

> 将文本分类为中性、负面或正面
> 文本：我迫不及待地想开始构建！💪🚀
> 情绪：

文本"我迫不及待地想开始构建！💪🚀"具有正面情绪。

情绪：正面

*在 *1.24* 秒内生成*

> 将文本分类为中性、负面或正面
> 文本：使用原生开源模型的挑战在于它们的配置都略有不同
> 情绪：

文本的情绪是中性的。

*在 *2.08* 秒内生成*

> 将文本分类为中性、负面或正面
> 文本：幸好 Cloudflare Workers AI 提供了一个标准接口，让我可以获得可靠、低延迟的推理。速度也很快！
> 情绪：

文本的情绪是正面的。

*在 *0.89* 秒内生成*

---

---

#### @hf/thebloke/mistral-7b-instruct-v0.1-awq

> 将文本分类为中性、负面或正面
> 文本：哇，Cloudflare Workers AI 上有大量的文本生成模型！
> 情绪：

正面

*在 *0.82* 秒内生成*

> 将文本分类为中性、负面或正面
> 文本：有时候做选择很难
> 情绪：

文本"有时候做选择很难"的情绪是中性的。

*在 *2.06* 秒内生成*

> 将文本分类为中性、负面或正面
> 文本：我迫不及待地想开始构建！💪🚀
> 情绪：

文本"我迫不及T待地想开始构建！💪🚀"的情绪是正面的。

*在 *2.13* 秒内生成*

> 将文本分类为中性、负面或正面
> 文本：使用原生开源模型的挑战在于它们的配置都略有不同
> 情绪：

文本的情绪是中性的。

*在 *0.79* 秒内生成*

> 将文本分类为中性、负面或正面
> 文本：幸好 Cloudflare Workers AI 提供了一个标准接口，让我可以获得可靠、低延迟的推理。速度也很快！
> 情绪：

文本的情绪是正面的。

*在 *1.93* 秒内生成*

---

---

# Asynchronous Batch API

URL: https://developers.cloudflare.com/workers-ai/features/batch-api/

import { Render, PackageManagers, WranglerConfig, CURL } from "~/components";

Asynchronous batch processing lets you send a collection (batch) of inference requests in a single call. Instead of expecting immediate responses for every request, the system queues them for processing and returns the results later.

Batch processing is useful for large workloads such as summarization or embeddings when there is no human interaction. Using the batch API will guarantee that your requests are fulfilled eventually, rather than erroring out if Cloudflare does have enough capacity at a given time.

When you send a batch request, the API immediately acknowledges receipt with a status like `queued` and provides a unique `request_id`. This ID is later used to poll for the final responses once the processing is complete.

You can use the Batch API by either creating and deploying a Cloudflare Worker that leverages the [Batch API with the AI binding](/workers-ai/features/batch-api/workers-binding/), using the [REST API](/workers-ai/features/batch-api/rest-api/) directly or by starting from a [template](https://github.com/craigsdennis/batch-please-workers-ai).

:::note[Note]

Ensure that the total payload is under 10 MB.

:::

## Demo application

If you want to get started quickly, click the button below:

[![Deploy to Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/craigsdennis/batch-please-workers-ai)

This will create a repository in your GitHub account and deploy a ready-to-use Worker that demonstrates how to use Cloudflare's Asynchronous Batch API. The template includes preconfigured AI bindings, and examples for sending and retrieving batch requests with and without external references. Once deployed, you can visit the live Worker and start experimenting with the Batch API immediately.

## Supported Models

- [@cf/meta/llama-3.3-70b-instruct-fp8-fast](/workers-ai/models/llama-3.3-70b-instruct-fp8-fast/)
- [@cf/baai/bge-small-en-v1.5](/workers-ai/models/bge-small-en-v1.5/)
- [@cf/baai/bge-base-en-v1.5](/workers-ai/models/bge-base-en-v1.5/)
- [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/)
- [@cf/baai/bge-m3](/workers-ai/models/bge-m3/)
- [@cf/meta/m2m100-1.2b](/workers-ai/models/m2m100-1.2b/)

---

# 将 BigQuery 与 Workers AI 结合使用

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/using-bigquery-with-workers-ai/

import { WranglerConfig } from "~/components";

开始使用 [Workers AI](/workers-ai/) 的最简单方法是在 [多模式 Playground](https://multi-modal.ai.cloudflare.com/) 和 [LLM playground](https://playground.ai.cloudflare.com/) 中进行试用。如果您决定要将代码与 Workers AI 集成，那么您可能会决定使用其 [REST API 端点](/workers-ai/get-started/rest-api/) 或通过 [Worker 绑定](/workers-ai/configuration/bindings/)。

但是，数据怎么办？如果您希望这些模型摄取存储在 Cloudflare 外部的数据该怎么办？

在本教程中，您将学习如何将 Google BigQuery 中的数据引入 Cloudflare Worker，以便将其用作 Workers AI 模型的输入。

## 先决条件

您将需要：

- 一个运行 [Hello World 脚本](/workers/get-started/guide/) 的 [Cloudflare Worker](/workers/) 项目。
- 一个 Google Cloud Platform [服务帐户](https://cloud.google.com/iam/docs/service-accounts-create#iam-service-accounts-create-console)，并已下载具有 BigQuery 读取权限的[关联密钥](https://cloud.google.com/iam/docs/keys-create-delete#iam-service-account-keys-create-console)文件。
- 访问具有一些测试数据的 BigQuery 表，以便您可以创建 [BigQuery 作业查询](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query)。在本教程中，建议您创建自己的表，因为[抽样表](https://cloud.google.com/bigquery/public-data#sample_tables)（除非克隆到您自己的 GCP 命名空间）将不允许您对其运行作业查询。对于此示例，使用了 [Hacker News Corpus](https://www.kaggle.com/datasets/hacker-news/hacker-news-corpus)，该数据集在 MIT 许可下使用。

## 1. 设置您的 Cloudflare Worker

要将数据摄取到 Cloudflare 并将其提供给 Workers AI，您将使用 [Cloudflare Worker](/workers/)。如果您尚未创建，请随时查看我们的[入门教程](/workers/get-started/)。

按照创建 Worker 的步骤操作后，您的新 Worker 项目中应包含以下代码：

```javascript
export default {
	async fetch(request, env, ctx) {
		return new Response("Hello World!");
	},
};
```

如果 Worker 项目已成功创建，您还应该能够在控制台中运行 `npx wrangler dev` 以在本地运行 Worker：

```sh
[wrangler:inf] Ready on http://localhost:8787
```

在 `http://localhost:8787/` 打开一个浏览器选项卡以查看您部署的 Worker。请注意，端口 `8787` 在您的情况下可能是不同的。

您应该在浏览器中看到 `Hello World!`：

```sh
Hello World!
```

如果在此步骤中遇到任何问题，请务必查看 [Worker 入门指南](/workers/get-started/guide/)。

## 2. 将 GCP 服务密钥作为机密导入 Worker

现在您已验证 Worker 已成功创建，您需要引用在本教程的[先决条件](#先决条件)部分中创建的 Google Cloud Platform 服务密钥。

您从 Google Cloud Platform 下载的密钥 JSON 文件应具有以下格式：

```json
{
	"type": "service_account",
	"project_id": "<your_project_id>",
	"private_key_id": "<your_private_key_id>",
	"private_key": "<your_private_key>",
	"client_email": "<your_service_account_id>@<your_project_id>.iam.gserviceaccount.com",
	"client_id": "<your_oauth2_client_id>",
	"auth_uri": "https://accounts.google.com/o/oauth2/auth",
	"token_uri": "https://oauth2.googleapis.com/token",
	"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
	"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/<your_service_account_id>%40<your_project_id>.iam.gserviceaccount.com",
	"universe_domain": "googleapis.com"
}
```

在本教程中，您将只需要以下字段的值：`client_email`、`private_key`、`private_key_id` 和 `project_id`。

您将使用[机密](/workers/configuration/secrets/)而不是将此信息以纯文本形式存储在 Worker 中，以确保其未加密内容只能通过 Worker 本身访问。

从 JSON 文件中将这三个值导入机密，首先是 JSON 密钥文件中名为 `client_email` 的字段，我们现在将其称为 `BQ_CLIENT_EMAIL`（您可以使用另一个变量名）：

```sh
npx wrangler secret put BQ_CLIENT_EMAIL
```

系统将要求您输入一个机密值，该值将是 JSON 密钥文件中 `client_email` 字段的值。

:::note

不要在您存储的机密中包含任何双引号，因为机密将被解释为字符串。

:::

如果机密上传成功，将显示以下消息：

```sh
✨ Success! Uploaded secret BQ_CLIENT_EMAIL
```

现在导入其余三个字段的机密；`private_key`、`private_key_id` 和 `project_id` 分别为 `BQ_PRIVATE_KEY`、`BQ_PRIVATE_KEY_ID` 和 `BQ_PROJECT_ID`：

```sh
npx wrangler secret put BQ_PRIVATE_KEY
```

```sh
npx wrangler secret put BQ_PRIVATE_KEY_ID
```

```sh
npx wrangler secret put BQ_PROJECT_ID
```

此时，您已成功将从 Google Cloud Platform 下载的 JSON 密钥文件中的三个字段导入 Cloudflare 机密，以在 Worker 中使用。

[机密](/workers/configuration/secrets/)仅在部署后才对 Workers 可用。要在开发期间使它们可用，请[创建一个 `.dev.vars`](/workers/configuration/secrets/#local-development-with-secrets) 文件以在本地存储这些凭据并将其引用为环境变量。

您的 `dev.vars` 文件应如下所示：

```
BQ_CLIENT_EMAIL="<your_service_account_id>@<your_project_id>.iam.gserviceaccount.com"
BQ_CLIENT_KEY="-----BEGIN PRIVATE KEY-----<content_of_your_private_key>-----END PRIVATE KEY-----\n"
BQ_PRIVATE_KEY_ID="<your_private_key_id>"
BQ_PROJECT_ID="<your_project_id>"
```

确保将 `.dev.vars` 添加到项目的 `.gitignore` 文件中，以防止在使用版本控制系统时将凭据上传到存储库。

通过将 `src/index.js` 中的值记录到控制台输出来检查机密是否已正确加载：

```javascript
export default {
	async fetch(request, env, ctx) {
		console.log("BQ_CLIENT_EMAIL: ", env.BQ_CLIENT_EMAIL);
		console.log("BQ_PRIVATE_KEY: ", env.BQ_PRIVATE_KEY);
		console.log("BQ_PRIVATE_KEY_ID: ", env.BQ_PRIVATE_KEY_ID);
		console.log("BQ_PROJECT_ID: ", env.BQ_PROJECT_ID);
		return new Response("Hello World!");
	},
};
```

重新启动 Worker 并运行 `npx wrangler dev`。您应该看到服务器现在提到了新添加的变量：

```
Using vars defined in .dev.vars
Your worker has access to the following bindings:
- Vars:
  - BQ_CLIENT_EMAIL: "(hidden)"
  - BQ_PRIVATE_KEY: "(hidden)"
  - BQ_PRIVATE_KEY_ID: "(hidden)"
  - BQ_PROJECT_ID: "(hidden)"
[wrangler:inf] Ready on http://localhost:8787
```

如果您在浏览器中打开 `http://localhost:8787`，您应该会在运行 `npx wrangler dev` 命令的控制台中看到变量的值，而在浏览器窗口中仍然只能看到 `Hello World!` 文本。

您现在可以从 Worker 访问 GCP 凭据。接下来，您将安装一个库来帮助创建与 GCP API 交互所需的 JSON Web 令牌。

## 3. 安装用于处理 JWT 操作的库

要与 BigQuery 的 REST API 交互，您需要生成一个 [JSON Web 令牌](https://jwt.io/introduction)以使用您在上一步中加载到 Worker 机密中的凭据对您的请求进行身份验证。

在本教程中，您将使用 [jose](https://www.npmjs.com/package/jose?activeTab=readme) 库进行与 JWT 相关的操作。通过在控制台中运行以下命令来安装它：

```sh
npm i jose
```

要验证安装是否成功，您可以运行 `npm list`，它会列出所有已安装的包，并查看是否已添加 `jose` 依赖项：

```sh
<project_name>@0.0.0
/<path_to_your_project>/<project_name>
├── @cloudflare/vitest-pool-workers@0.4.29
├── jose@5.9.2
├── vitest@1.5.0
└── wrangler@3.75.0
```

## 4. 生成 JSON Web 令牌

现在您已经安装了 `jose` 库，是时候导入它并向您的代码中添加一个函数来生成签名的 JWT：

```javascript
import * as jose from 'jose';
...
const generateBQJWT = async (aCryptoKey, env) => {
const algorithm = "RS256";
const audience = "https://bigquery.googleapis.com/";
const expiryAt = (new Date().valueOf() / 1000);
	const privateKey = await jose.importPKCS8(env.BQ_PRIVATE_KEY, algorithm);

	// Generate signed JSON Web Token (JWT)
	return new jose.SignJWT()
    	.setProtectedHeader({
        	typ: 'JWT',
        	alg: algorithm,
        	kid: env.BQ_PRIVATE_KEY_ID
    	})
    	.setIssuer(env.BQ_CLIENT_EMAIL)
    	.setSubject(env.BQ_CLIENT_EMAIL)
    	.setAudience(audience)
    	.setExpirationTime(expiryAt)
    	.setIssuedAt()
    	.sign(privateKey)
}

export default {
	async fetch(request, env, ctx) {
       ...
// 创建 JWT 以对 BigQuery API 调用进行身份验证
    	let bqJWT;
    	try {
        	bqJWT = await generateBQJWT(env);
    	} catch (e) {
        	return new Response('在生成 JWT 时发生错误', { status: 500 })
    	}
	},
       ...
};

```

现在您已经创建了一个 JWT，是时候对 BigQuery 进行 API 调用以获取一些数据了。

## 5. 对 Google BigQuery 进行身份验证的请求

使用上一步中创建的 JWT 令牌，向 BigQuery 的 API 发出 API 请求以从表中检索数据。

您现在将查询您在本教程的先决条件部分中已在 BigQuery 中创建的表。此示例使用在 MIT 许可下使用的 [Hacker News Corpus](https://www.kaggle.com/datasets/hacker-news/hacker-news-corpus) 的抽样版本，并已上传到 BigQuery。

```javascript
const queryBQ = async (bqJWT, path) => {
	const bqEndpoint = `https://bigquery.googleapis.com${path}`
	// 在此示例中，text 是正在查询的 BigQuery 表（hn.news_sampled）中的一个字段
	const query = 'SELECT text FROM hn.news_sampled LIMIT 3';
	const response = await fetch(bqEndpoint, {
    	method: "POST",
    	body: JSON.stringify({
        	"query": query
    	}),
    	headers: {
        	Authorization: `Bearer ${bqJWT}`
    	}
	})
	return response.json()
}
...
export default {
	async fetch(request, env, ctx) {
		...
    		let ticketInfo;
    		try {
    		ticketInfo = await queryBQ(bqJWT);
    	} catch (e) {
        	return new Response('An error has occurred while querying BQ', { status: 500 });
    	}
	...
	},
};
```

Having the raw row data from BigQuery means that you can now format it in a JSON-like style up next.

## 6. Format results from the query

Now that you have retrieved the data from BigQuery, it is time to note that a BigQuery API response looks something like this:

```json
{
	...
	"schema": {
    	"fields": [
        	{
            	"name": "title",
            	"type": "STRING",
            	"mode": "NULLABLE"
        	},
        	{
            	"name": "text",
            	"type": "STRING",
            	"mode": "NULLABLE"
        	}
    	]
	},
	...
	"rows": [
    	{
        	"f": [
            	{
                	"v": "<some_value>"
            	},
            	{
                	"v": "<some_value>"
            	}
        	]
    	},
    	{
        	"f": [
            	{
                	"v": "<some_value>"
            	},
            	{
                	"v": "<some_value>"
            	}
        	]
    	},
    	{
        	"f": [
            	{
                	"v": "<some_value>"
            	},
            	{
                	"v": "<some_value>"
            	}
        	]
    	}
	],
	...
}
```

This format may be difficult to read and to work with when iterating through results, which will go on to do later in this tutorial. So you will now implement a function that maps the schema into each individual value, and the resulting output will be easier to read, as shown below. Each row corresponds to an object within an array.

```javascript
[
	{
		title: "<some_value>",
		text: "<some_value>",
	},
	{
		title: "<some_value>",
		text: "<some_value>",
	},
	{
		title: "<some_value>",
		text: "<some_value>",
	},
];
```

Create a `formatRows` function that takes a number of rows and fields returned from the BigQuery response body and returns an array of results as objects with named fields.

```javascript
const formatRows = (rowsWithoutFieldNames, fields) => {
	// Depending on the position of each value, it is known what field you should assign to it.
	const fieldsByIndex = new Map();

	// Load all fields name and have their index in the array result as their key
	fields.forEach((field, index) => {
    	fieldsByIndex.set(index, field.name)
	})

	// Iterate through rows
	const rowsWithFieldNames = rowsWithoutFieldNames.map(row => {
    	// Per each row represented by an array f, iterate through the unnamed values and find their field names by searching them in the fieldsByIndex.
    	let newRow = {}
    	row.f.forEach((field, index) => {
        	const fieldName = fieldsByIndex.get(index);
        	if (fieldName) {
		// For every field in a row, add them to newRow
            	newRow = ({ ...newRow, [fieldName]: field.v });
        	}
    	})
    	return newRow
	})

	return rowsWithFieldNames
}

export default {
	async fetch(request, env, ctx) {
		...
    	// Transform output format into array of objects with named fields
    	let formattedResults;

    	if ('rows' in ticketInfo) {
        	formattedResults = formatRows(ticketInfo.rows, ticketInfo.schema.fields);
        	console.log(formattedResults)
    	} else if ('error' in ticketInfo) {
        	return new Response(ticketInfo.error.message, { status: 500 })
    	}
	...
	},
};
```

## 7. Feed data into Workers AI

Now that you have converted the response from the BigQuery API into an array of results, generate some tags and attach an associated sentiment score using an LLM via [Workers AI](/workers-ai/):

```javascript
const generateTags = (data, env) => {
	return env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    	prompt: `Create three one-word tags for the following text. return only these three tags separated by a comma. don't return text that is not a category.Lowercase only. ${JSON.stringify(data)}`,
	});
}

const generateSentimentScore = (data, env) => {
	return env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    	prompt: `return a float number between 0 and 1 measuring the sentiment of the following text. 0 being negative and 1 positive. return only the number, no text. ${JSON.stringify(data)}`,
	});
}

// Iterates through values, sends them to an AI handler and encapsulates all responses into a single Promise
const getAIGeneratedContent = (data, env, aiHandler) => {
	let results = data?.map(dataPoint => {
    	return aiHandler(dataPoint, env)
	})
	return Promise.all(results)
}
...
export default {
	async fetch(request, env, ctx) {
		...
let summaries, sentimentScores;
    	try {
        	summaries = await getAIGeneratedContent(formattedResults, env, generateTags);
        	sentimentScores = await getAIGeneratedContent(formattedResults, env, generateSentimentScore)
    	} catch {
        	return new Response('There was an error while generating the text summaries or sentiment scores')
    	}
},

formattedResults = formattedResults?.map((formattedResult, i) => {
        	if (sentimentScores[i].response && summaries[i].response) {
            	return {
                	...formattedResult,
                	'sentiment': parseFloat(sentimentScores[i].response).toFixed(2),
                	'tags': summaries[i].response.split(',').map((result) => result.trim())
            	}
        	}
    	}
};

```

Uncomment the following lines from the Wrangler file in your project:

<WranglerConfig>

```toml
[ai]
binding = "AI"
```

</WranglerConfig>

Restart the Worker that is running locally, and after doing so, go to your application endpoint:

```sh
curl http://localhost:8787
```

It is likely that you will be asked to log in to your Cloudflare account and grant temporary access to Wrangler (the Cloudflare CLI) to use your account when using Worker AI.

Once you access `http://localhost:8787` you should see an output similar to the following:

```sh
{
  "data": [
	{
  	"text": "You can see a clear spike in submissions right around US Thanksgiving.",
  	"sentiment": "0.61",
  	"tags": [
    	"trends",
    	"submissions",
    	"thanksgiving"
  	]
	},
	{
  	"text": "I didn't test the changes before I published them.  I basically did development on the running server. In fact for about 30 seconds the comments page was broken due to a bug.",
  	"sentiment": "0.35",
  	"tags": [
    	"software",
    	"deployment",
    	"error"
  	]
	},
	{
  	"text": "I second that. As I recall, it's a very enjoyable 700-page brain dump by someone who's really into his subject. The writing has a personal voice; there are lots of asides, dry wit, and typos that suggest restrained editing. The discussion is intelligent and often theoretical (and Bartle is not scared to use mathematical metaphors), but the tone is not academic.",
  	"sentiment": "0.86",
  	"tags": [
    	"review",
    	"game",
    	"design"
  	]
	}
  ]
}
```

The actual values and fields will mostly depend on the query made in Step 5 that are then fed into the LLMs models.

## Final result

All the code shown in the different steps are combined into the following code in `src/index.js`:

```javascript
import * as jose from "jose";

const generateBQJWT = async (env) => {
	const algorithm = "RS256";
	const audience = "https://bigquery.googleapis.com/";
	const expiryAt = new Date().valueOf() / 1000;
	const privateKey = await jose.importPKCS8(env.BQ_PRIVATE_KEY, algorithm);

	// Generate signed JSON Web Token (JWT)
	return new jose.SignJWT()
		.setProtectedHeader({
			typ: "JWT",
			alg: algorithm,
			kid: env.BQ_PRIVATE_KEY_ID,
		})
		.setIssuer(env.BQ_CLIENT_EMAIL)
		.setSubject(env.BQ_CLIENT_EMAIL)
		.setAudience(audience)
		.setExpirationTime(expiryAt)
		.setIssuedAt()
		.sign(privateKey);
};

const queryBQ = async (bgJWT, path) => {
	const bqEndpoint = `https://bigquery.googleapis.com${path}`;
	const query = "SELECT text FROM hn.news_sampled LIMIT 3";
	const response = await fetch(bqEndpoint, {
		method: "POST",
		body: JSON.stringify({
			query: query,
		}),
		headers: {
			Authorization: `Bearer ${bgJWT}`,
		},
	});
	return response.json();
};

const formatRows = (rowsWithoutFieldNames, fields) => {
	// Index to fieldName
	const fieldsByIndex = new Map();

	fields.forEach((field, index) => {
		fieldsByIndex.set(index, field.name);
	});

	const rowsWithFieldNames = rowsWithoutFieldNames.map((row) => {
		// Map rows into an array of objects with field names
		let newRow = {};
		row.f.forEach((field, index) => {
			const fieldName = fieldsByIndex.get(index);
			if (fieldName) {
				newRow = { ...newRow, [fieldName]: field.v };
			}
		});
		return newRow;
	});

	return rowsWithFieldNames;
};

const generateTags = (data, env) => {
	return env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
		prompt: `Create three one-word tags for the following text. return only these three tags separated by a comma. don't return text that is not a category.Lowercase only. ${JSON.stringify(data)}`,
	});
};

const generateSentimentScore = (data, env) => {
	return env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
		prompt: `return a float number between 0 and 1 measuring the sentiment of the following text. 0 being negative and 1 positive. return only the number, no text. ${JSON.stringify(data)}`,
	});
};

const getAIGeneratedContent = (data, env, aiHandler) => {
	let results = data?.map((dataPoint) => {
		return aiHandler(dataPoint, env);
	});
	return Promise.all(results);
};

export default {
	async fetch(request, env, ctx) {
		// Create JWT to authenticate the BigQuery API call
		let bqJWT;
		try {
			bqJWT = await generateBQJWT(env);
		} catch (error) {
			console.log(error);
			return new Response("An error has occurred while generating the JWT", {
				status: 500,
			});
		}

		// Fetch results from BigQuery
		let ticketInfo;
		try {
			ticketInfo = await queryBQ(
				bqJWT,
				`/bigquery/v2/projects/${env.BQ_PROJECT_ID}/queries`,
			);
		} catch (error) {
			console.log(error);
			return new Response("An error has occurred while querying BQ", {
				status: 500,
			});
		}

		// Transform output format into array of objects with named fields
		let formattedResults;
		if ("rows" in ticketInfo) {
			formattedResults = formatRows(ticketInfo.rows, ticketInfo.schema.fields);
		} else if ("error" in ticketInfo) {
			return new Response(ticketInfo.error.message, { status: 500 });
		}

		// Generate AI summaries and sentiment scores
		let summaries, sentimentScores;
		try {
			summaries = await getAIGeneratedContent(
				formattedResults,
				env,
				generateTags,
			);
			sentimentScores = await getAIGeneratedContent(
				formattedResults,
				env,
				generateSentimentScore,
			);
		} catch {
			return new Response(
				"There was an error while generating the text summaries or sentiment scores",
			);
		}

		// Add AI summaries and sentiment scores to previous results
		formattedResults = formattedResults?.map((formattedResult, i) => {
			if (sentimentScores[i].response && summaries[i].response) {
				return {
					...formattedResult,
					sentiment: parseFloat(sentimentScores[i].response).toFixed(2),
					tags: summaries[i].response.split(",").map((result) => result.trim()),
				};
			}
		});

		const response = { data: formattedResults };

		return new Response(JSON.stringify(response), {
			headers: { "Content-Type": "application/json" },
		});
	},
};
```

If you wish to deploy this Worker, you can do so by running `npx wrangler deploy`:

```sh
Total Upload: <size_of_your_worker> KiB / gzip: <compressed_size_of_your_worker> KiB
Uploaded <name_of_your_worker> (x sec)
Deployed <name_of_your_worker> triggers (x sec)
  https://<your_public_worker_endpoint>
Current Version ID: <worker_script_version_id>
```

This will create a public endpoint that you can use to access the Worker globally. Please keep this in mind when using production data, and make sure to include additional access controls in place.

## Conclusion

In this tutorial, you have learnt how to integrate Google BigQuery and Cloudflare Workers by creating a GCP service account key and storing part of it as Worker secrets. This was later imported in the code, and by using the `jose` npm library, you created a JSON Web Token to authenticate the API query to BigQuery.

Once you obtained the results, you formatted them to later be passed to generative AI models via Workers AI to generate tags and to perform sentiment analysis on the extracted data.

## Next Steps

If, instead of displaying the results of ingesting the data to the AI model in a browser, your workflow requires fetching and store data (for example in [R2](/r2/) or [D1](/d1/)) on regular intervals, you may want to consider adding a [scheduled handler](/workers/runtime-apis/handlers/scheduled/) for this Worker. It allows triggering the Worker with a predefined cadence via a [Cron Trigger](/workers/configuration/cron-triggers/). Consider reviewing the Reference Architecture Diagrams on [Ingesting BigQuery Data into Workers AI](/reference-architecture/diagrams/ai/bigquery-workers-ai/).

A use case to ingest data from other sources, like you did in this tutorial, is to create a RAG system. If this sounds relevant to you, please check out the tutorial [Build a Retrieval Augmented Generation (RAG) AI](/workers-ai/guides/tutorials/build-a-retrieval-augmented-generation-ai/).

To learn more about what other AI models you can use at Cloudflare, please visit the [Workers AI](/workers-ai) section of our docs.

---

# Workers Binding

URL: https://developers.cloudflare.com/workers-ai/features/batch-api/workers-binding/

import {
	Render,
	PackageManagers,
	TypeScriptExample,
	WranglerConfig,
	CURL,
} from "~/components";

You can use Workers Bindings to interact with the Batch API.

## Send a Batch request

Send your initial batch inference request by composing a JSON payload containing an array of individual inference requests and the `queueRequest: true` property (which is what controlls queueing behavior).

:::note[Note]

Ensure that the total payload is under 10 MB.

:::

```ts {26} title="src/index.ts"
export interface Env {
	AI: Ai;
}
export default {
	async fetch(request, env): Promise<Response> {
		const embeddings = await env.AI.run(
			"@cf/baai/bge-m3",
			{
				requests: [
					{
						query: "This is a story about Cloudflare",
						contexts: [
							{
								text: "This is a story about an orange cloud",
							},
							{
								text: "This is a story about a llama",
							},
							{
								text: "This is a story about a hugging emoji",
							},
						],
					},
				],
			},
			{ queueRequest: true },
		);

		return Response.json(embeddings);
	},
} satisfies ExportedHandler<Env>;
```

```json output {4}
{
	"status": "queued",
	"model": "@cf/baai/bge-m3",
	"request_id": "000-000-000"
}
```

You will get a response with the following values:

- **`status`**: Indicates that your request is queued.
- **`request_id`**: A unique identifier for the batch request.
- **`model`**: The model used for the batch inference.

Of these, the `request_id` is important for when you need to [poll the batch status](#poll-batch-status).

### Poll batch status

Once your batch request is queued, use the `request_id` to poll for its status. During processing, the API returns a status `queued` or `running` indicating that the request is still in the queue or being processed.

```typescript title=src/index.ts
export interface Env {
	AI: Ai;
}

export default {
	async fetch(request, env): Promise<Response> {
		const status = await env.AI.run("@cf/baai/bge-m3", {
			request_id: "000-000-000",
		});

		return Response.json(status);
	},
} satisfies ExportedHandler<Env>;
```

```json output
{
	"responses": [
		{
			"id": 0,
			"result": {
				"response": [
					{ "id": 0, "score": 0.73974609375 },
					{ "id": 1, "score": 0.642578125 },
					{ "id": 2, "score": 0.6220703125 }
				]
			},
			"success": true,
			"external_reference": null
		}
	],
	"usage": { "prompt_tokens": 12, "completion_tokens": 0, "total_tokens": 12 }
}
```

When the inference is complete, the API returns a final HTTP status code of `200` along with an array of responses. Each response object corresponds to an individual input prompt, identified by an `id` that maps to the index of the prompt in your original request.

---

# REST API

URL: https://developers.cloudflare.com/workers-ai/features/batch-api/rest-api/

If you prefer to work directly with the REST API instead of a [Cloudflare Worker](/workers-ai/features/batch-api/workers-binding/), below are the steps on how to do it:

## 1. Sending a Batch Request

Make a POST request using the following pattern. You can pass `external_reference` as a unique ID per-prompt that will be returned in the response.

```bash title="Sending a batch request" {11,15,19}
curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/run/@cf/baai/bge-m3?queueRequest=true" \
 --header "Authorization: Bearer $API_TOKEN" \
 --header 'Content-Type: application/json' \
 --json '{
    "requests": [
        {
            "query": "This is a story about Cloudflare",
            "contexts": [
                {
                    "text": "This is a story about an orange cloud",
                    "external_reference": "story1"
                },
                {
                    "text": "This is a story about a llama",
                    "external_reference": "story2"
                },
                {
                    "text": "This is a story about a hugging emoji",
                    "external_reference": "story3"
                }
            ]
        }
    ]
  }'
```

```json output {4}
{
	"result": {
		"status": "queued",
		"request_id": "768f15b7-4fd6-4498-906e-ad94ffc7f8d2",
		"model": "@cf/baai/bge-m3"
	},
	"success": true,
	"errors": [],
	"messages": []
}
```

## 2. Retrieving the Batch Response

After receiving a `request_id` from your initial POST, you can poll for or retrieve the results with another POST request:

```bash title="Retrieving a response"
curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/run/@cf/baai/bge-m3?queueRequest=true" \
 --header "Authorization: Bearer $API_TOKEN" \
 --header 'Content-Type: application/json' \
 --json '{
    "request_id": "<uuid>"
  }'
```

```json output
{
	"result": {
		"responses": [
			{
				"id": 0,
				"result": {
					"response": [
						{ "id": 0, "score": 0.73974609375 },
						{ "id": 1, "score": 0.642578125 },
						{ "id": 2, "score": 0.6220703125 }
					]
				},
				"success": true,
				"external_reference": null
			}
		],
		"usage": { "prompt_tokens": 12, "completion_tokens": 0, "total_tokens": 12 }
	},
	"success": true,
	"errors": [],
	"messages": []
}
```

---

# 函数调用

URL: https://developers.cloudflare.com/workers-ai/features/function-calling/

import { Stream, TabItem, Tabs } from "~/components";

函数调用使人们能够使用大型语言模型 (LLM)，并利用模型响应来执行函数或与外部 API 交互。开发人员通常会定义一组函数以及每个函数所需的输入模式，我们称之为 `tools`。然后，模型会智能地理解何时需要进行工具调用，并返回一个 JSON 输出，用户需要将该输出提供给另一个函数或 API。

实质上，函数调用允许您通过执行代码或进行额外的 API 调用来使用 LLM 执行操作。

<Stream id="603e94c9803b4779dd612493c0dd7125" title="placeholder" />

## 我如何使用函数调用？

Workers AI 具有[嵌入式函数调用](/workers-ai/features/function-calling/embedded/)，允许您在推理调用旁边执行函数代码。我们有一个名为 [`@cloudflare/ai-utils`](https://www.npmjs.com/package/@cloudflare/ai-utils) 的包来帮助实现这一点，我们已经在 [Github](https://github.com/cloudflare/ai-utils) 上开源了它。

对于行业标准的函数调用，请查看有关[传统函数调用](/workers-ai/features/function-calling/traditional/)的文档。

为了向您展示嵌入式函数调用的价值，请看下面的示例，该示例比较了传统函数调用和嵌入式函数调用。嵌入式函数调用使我们能够将代码行数从 77 行减少到 31 行。

<Tabs> <TabItem label="嵌入式">

```sh
# ai-utils 包支持嵌入式函数调用
npm i @cloudflare/ai-utils
```

```js title="嵌入式函数调用示例"
import {
	createToolsFromOpenAPISpec,
	runWithTools,
	autoTrimTools,
} from "@cloudflare/ai-utils";

export default {
	async fetch(request, env, ctx) {
		const response = await runWithTools(
			env.AI,
			"@hf/nousresearch/hermes-2-pro-mistral-7b",
			{
				messages: [{ role: "user", content: "谁是 Github 上的 Cloudflare？" }],
				tools: [
					// 您可以直接传递 OpenAPI 规范链接或内容
					...(await createToolsFromOpenAPISpec(
						"https://gist.githubusercontent.com/mchenco/fd8f20c8f06d50af40b94b0671273dc1/raw/f9d4b5cd5944cc32d6b34cad0406d96fd3acaca6/partial_api.github.com.json",
						{
							overrides: [
								{
									// 对于 *.github.com 上的所有请求，我们需要添加一个 User-Agent。
									matcher: ({ url, method }) => {
										return url.hostname === "api.github.com";
									},
									values: {
										headers: {
											"User-Agent":
												"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
										},
									},
								},
							],
						},
					)),
				],
			},
		).then((response) => {
			return response;
		});

		return new Response(JSON.stringify(response));
	},
};
```

</TabItem> <TabItem label="传统">

```js title="传统函数调用示例"
export default {
	async fetch(request, env, ctx) {
		const response = await env.AI.run(
			"@hf/nousresearch/hermes-2-pro-mistral-7b",
			{
				messages: [{ role: "user", content: "谁是 Github 上的 Cloudflare？" }],
				tools: [
					{
						name: "getGithubUser",
						description: "提供有关拥有 GitHub 帐户的人的公开信息。",
						parameters: {
							type: "object",
							properties: {
								username: {
									type: "string",
									description: "GitHub 用户帐户的句柄。",
								},
							},
							required: ["username"],
						},
					},
				],
			},
		);

		const selected_tool = response.tool_calls[0];
		let res;

		if (selected_tool.name == "getGithubUser") {
			try {
				const username = selected_tool.arguments.username;
				const url = `https://api.github.com/users/${username}`;
				res = await fetch(url, {
					headers: {
						// Github API 需要一个 User-Agent 标头
						"User-Agent":
							"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
					},
				}).then((res) => res.json());
			} catch (error) {
				return error;
			}
		}

		const finalResponse = await env.AI.run(
			"@hf/nousresearch/hermes-2-pro-mistral-7b",
			{
				messages: [
					{
						role: "user",
						content: "谁是 Github 上的 Cloudflare？",
					},
					{
						role: "assistant",
						content: JSON.stringify(selected_tool),
					},
					{
						role: "tool",
						content: JSON.stringify(res),
					},
				],
				tools: [
					{
						name: "getGithubUser",
						description: "提供有关拥有 GitHub 帐户的人的公开信息。",
						parameters: {
							type: "object",
							properties: {
								username: {
									type: "string",
									description: "GitHub 用户帐户的句柄。",
								},
							},
							required: ["username"],
						},
					},
				],
			},
		);
		return new Response(JSON.stringify(finalResponse));
	},
};
```

</TabItem> </Tabs>

## 哪些模型支持函数调用？

有一些开源模型经过微调可以进行函数调用。在浏览我们的[模型目录](/workers-ai/models/)时，请查找旁边带有函数调用属性的模型。例如，[@hf/nousresearch/hermes-2-pro-mistral-7b](/workers-ai/models/hermes-2-pro-mistral-7b/) 是 Mistral 7B 的一个微调变体，可用于函数调用。

---

# 传统

URL: https://developers.cloudflare.com/workers-ai/features/function-calling/traditional/

此页面显示了如何按照行业标准进行传统的函数调用。Workers AI 还提供[嵌入式函数调用](/workers-ai/features/function-calling/embedded/)，这比传统的函数调用要简单得多。

通过传统的函数调用，您可以使用名称、描述和工具参数定义一个工具数组。下面的示例显示了如何在推理请求中将名为 `getWeather` 的工具传递给模型。

```js title="传统函数调用示例"
const response = await env.AI.run("@hf/nousresearch/hermes-2-pro-mistral-7b", {
	messages: [
		{
			role: "user",
			content: "伦敦的天气怎么样？",
		},
	],
	tools: [
		{
			name: "getWeather",
			description: "返回给定纬度和经度的天气",
			parameters: {
				type: "object",
				properties: {
					latitude: {
						type: "string",
						description: "给定位置的纬度",
					},
					longitude: {
						type: "string",
						description: "给定位置的经度",
					},
				},
				required: ["latitude", "longitude"],
			},
		},
	],
});

return new Response(JSON.stringify(response.tool_calls));
```

然后，LLM 将返回一个带有必需参数和被调用工具名称的 JSON 对象。然后，您可以将此 JSON 对象传递以进行 API 调用。

```json
[
	{
		"arguments": { "latitude": "51.5074", "longitude": "-0.1278" },
		"name": "getWeather"
	}
]
```

有关如何进行函数调用的工作示例，请查看我们的[演示应用程序](https://github.com/craigsdennis/lightbulb-moment-tool-calling/blob/main/src/index.ts)。

---

# Fine-tunes

URL: https://developers.cloudflare.com/workers-ai/features/fine-tunes/

import { Feature } from "~/components";

Learn how to use Workers AI to get fine-tuned inference.

<Feature header="Fine-tuned inference with LoRAs" href="/workers-ai/features/fine-tunes/loras/" cta="Run inference with LoRAs">

Upload a LoRA adapter and run fine-tuned inference with one of our base models.

</Feature>

---

## What is fine-tuning?

Fine-tuning is a general term for modifying an AI model by continuing to train it with additional data. The goal of fine-tuning is to increase the probability that a generation is similar to your dataset. Training a model from scratch is not practical for many use cases given how expensive and time consuming they can be to train. By fine-tuning an existing pre-trained model, you benefit from its capabilities while also accomplishing your desired task.

[Low-Rank Adaptation](https://arxiv.org/abs/2106.09685) (LoRA) is a specific fine-tuning method that can be applied to various model architectures, not just LLMs. It is common that the pre-trained model weights are directly modified or fused with additional fine-tune weights in traditional fine-tuning methods. LoRA, on the other hand, allows for the fine-tune weights and pre-trained model to remain separate, and for the pre-trained model to remain unchanged. The end result is that you can train models to be more accurate at specific tasks, such as generating code, having a specific personality, or generating images in a specific style.

---

# Using LoRA adapters

URL: https://developers.cloudflare.com/workers-ai/features/fine-tunes/loras/

import { TabItem, Tabs } from "~/components";

Workers AI supports fine-tuned inference with adapters trained with [Low-Rank Adaptation](https://blog.cloudflare.com/fine-tuned-inference-with-loras). This feature is in open beta and free during this period.

## Limitations

- We only support LoRAs for the following models (must not be quantized):

  - `@cf/meta/llama-3.2-11b-vision-instruct`
  - `@cf/meta/llama-3.3-70b-instruct-fp8-fast`
  - `@cf/meta/llama-guard-3-8b`
  - `@cf/meta/llama-3.1-8b-instruct-fast (soon)`
  - `@cf/deepseek-ai/deepseek-r1-distill-qwen-32b`
  - `@cf/qwen/qwen2.5-coder-32b-instruct`
  - `@cf/qwen/qwq-32b`
  - `@cf/mistralai/mistral-small-3.1-24b-instruct`
  - `@cf/google/gemma-3-12b-it`

- Adapter must be trained with rank `r <=8` as well as larger ranks if up to 32. You can check the rank of a pre-trained LoRA adapter through the adapter's `config.json` file
- LoRA adapter file must be < 300MB
- LoRA adapter files must be named `adapter_config.json` and `adapter_model.safetensors` exactly
- You can test up to 30 LoRA adapters per account

---

## Choosing compatible LoRA adapters

### Finding open-source LoRA adapters

We have started a [Hugging Face Collection](https://huggingface.co/collections/Cloudflare/workers-ai-compatible-loras-6608dd9f8d305a46e355746e) that lists a few LoRA adapters that are compatible with Workers AI. Generally, any LoRA adapter that fits our limitations above should work.

### Training your own LoRA adapters

To train your own LoRA adapter, follow the [tutorial](/workers-ai/guides/tutorials/fine-tune-models-with-autotrain/).

---

## Uploading LoRA adapters

In order to run inference with LoRAs on Workers AI, you'll need to create a new fine tune on your account and upload your adapter files. You should have a `adapter_model.safetensors` file with model weights and `adapter_config.json` with your config information. _Note that we only accept adapter files in these types._

Right now, you can't edit a fine tune's asset files after you upload it. We will support this soon, but for now you will need to create a new fine tune and upload files again if you would like to use a new LoRA.

Before you upload your LoRA adapter, you'll need to edit your `adapter_config.json` file to include `model_type` as one of `mistral`, `gemma` or `llama` like below.

```json null {10}
{
  "alpha_pattern": {},
  "auto_mapping": null,
  ...
  "target_modules": [
    "q_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM",
  "model_type": "mistral",
}
```

### Wrangler

You can create a finetune and upload your LoRA adapter via wrangler with the following commands:

```bash title="wrangler CLI" {1,7}
npx wrangler ai finetune create <model_name> <finetune_name> <folder_path>
#🌀 Creating new finetune "test-lora" for model "@cf/mistral/mistral-7b-instruct-v0.2-lora"...
#🌀 Uploading file "/Users/abcd/Downloads/adapter_config.json" to "test-lora"...
#🌀 Uploading file "/Users/abcd/Downloads/adapter_model.safetensors" to "test-lora"...
#✅ Assets uploaded, finetune "test-lora" is ready to use.

npx wrangler ai finetune list
┌──────────────────────────────────────┬─────────────────┬─────────────┐
│ finetune_id                          │ name            │ description │
├──────────────────────────────────────┼─────────────────┼─────────────┤
│ 00000000-0000-0000-0000-000000000000 │ test-lora       │             │
└──────────────────────────────────────┴─────────────────┴─────────────┘
```

### REST API

Alternatively, you can use our REST API to create a finetune and upload your adapter files. You will need a Cloudflare API Token with `Workers AI: Edit` permissions to make calls to our REST API, which you can generate via the Cloudflare Dashboard.

#### Creating a fine-tune on your account

```bash title="cURL"
## Input: user-defined name of fine tune
## Output: unique finetune_id

curl -X POST https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/finetunes/ \
    -H "Authorization: Bearer {API_TOKEN}" \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "SUPPORTED_MODEL_NAME",
      "name": "FINETUNE_NAME",
      "description": "OPTIONAL_DESCRIPTION"
    }'
```

#### Uploading your adapter weights and config

You have to call the upload endpoint each time you want to upload a new file, so you usually run this once for `adapter_model.safetensors` and once for `adapter_config.json`. Make sure you include the `@` before your path to files.

You can either use the finetune `name` or `id` that you used when you created the fine tune.

```bash title="cURL"
## Input: finetune_id, adapter_model.safetensors, then adapter_config.json
## Output: success true/false

curl -X POST https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/finetunes/{FINETUNE_ID}/finetune-assets/ \
    -H 'Authorization: Bearer {API_TOKEN}' \
    -H 'Content-Type: multipart/form-data' \
    -F 'file_name=adapter_model.safetensors' \
    -F 'file=@{PATH/TO/adapter_model.safetensors}'
```

#### List fine-tunes in your account

You can call this method to confirm what fine-tunes you have created in your account

<Tabs> <TabItem label="curl">

```bash title="cURL"
## Input: n/a
## Output: success true/false

curl -X GET https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/finetunes/ \
    -H 'Authorization: Bearer {API_TOKEN}'
```

</TabItem> <TabItem label="json output">

```json title="Example JSON output"
# Example output JSON
{
  "success": true,
  "result": [
    [{
       "id": "00000000-0000-0000-0000-000000000",
       "model": "@cf/meta-llama/llama-2-7b-chat-hf-lora",
       "name": "llama2-finetune",
       "description": "test"
    },
    {
       "id": "00000000-0000-0000-0000-000000000",
       "model": "@cf/mistralai/mistral-7b-instruct-v0.2-lora",
       "name": "mistral-finetune",
       "description": "test"
    }]
  ]
}
```

</TabItem> </Tabs>

---

## Running inference with LoRAs

To make inference requests and apply the LoRA adapter, you will need your model and finetune `name` or `id`. You should use the chat template that your LoRA was trained on, but you can try running it with `raw: true` and the messages template like below.

<Tabs> <TabItem label="workers ai sdk">

```javascript null {5-6}
const response = await env.AI.run(
	"@cf/mistralai/mistral-7b-instruct-v0.2-lora", //the model supporting LoRAs
	{
		messages: [{ role: "user", content: "Hello world" }],
		raw: true, //skip applying the default chat template
		lora: "00000000-0000-0000-0000-000000000", //the finetune id OR name
	},
);
```

</TabItem> <TabItem label="rest api">

```bash null {5-6}
curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/mistral/mistral-7b-instruct-v0.2-lora \
  -H 'Authorization: Bearer {API_TOKEN}' \
  -d '{
    "messages": [{"role": "user", "content": "Hello world"}],
    "raw": "true",
    "lora": "00000000-0000-0000-0000-000000000"
  }'
```

</TabItem> </Tabs>

---

# Public LoRA adapters

URL: https://developers.cloudflare.com/workers-ai/features/fine-tunes/public-loras/

Cloudflare offers a few public LoRA adapters that can immediately be used for fine-tuned inference. You can try them out immediately via our [playground](https://playground.ai.cloudflare.com).

Public LoRAs will have the name `cf-public-x`, and the prefix will be reserved for Cloudflare.

:::note


Have more LoRAs you would like to see? Let us know on [Discord](https://discord.cloudflare.com).


:::

| Name                                                                       | Description                        | Compatible with                                                                     |
| -------------------------------------------------------------------------- | ---------------------------------- | ----------------------------------------------------------------------------------- |
| [cf-public-magicoder](https://huggingface.co/predibase/magicoder)          | Coding tasks in multiple languages | `@cf/mistral/mistral-7b-instruct-v0.1` <br/> `@hf/mistral/mistral-7b-instruct-v0.2` |
| [cf-public-jigsaw-classification](https://huggingface.co/predibase/jigsaw) | Toxic comment classification       | `@cf/mistral/mistral-7b-instruct-v0.1` <br/> `@hf/mistral/mistral-7b-instruct-v0.2` |
| [cf-public-cnn-summarization](https://huggingface.co/predibase/cnn)        | Article summarization              | `@cf/mistral/mistral-7b-instruct-v0.1` <br/> `@hf/mistral/mistral-7b-instruct-v0.2` |

You can also list these public LoRAs with an API call:

```bash
curl https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/finetunes/public \
 --header 'Authorization: Bearer {cf_token}'
```

## Running inference with public LoRAs

To run inference with public LoRAs, you just need to define the LoRA name in the request.

We recommend that you use the prompt template that the LoRA was trained on. You can find this in the HuggingFace repos linked above for each adapter.

### cURL

```bash null {10}
curl https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/mistral/mistral-7b-instruct-v0.1 \
  --header 'Authorization: Bearer {cf_token}' \
  --data '{
    "messages": [
      {
        "role": "user",
        "content": "Write a python program to check if a number is even or odd."
      }
    ],
    "lora": "cf-public-magicoder"
  }'
```

### JavaScript

```js null {11}
const answer = await env.AI.run('@cf/mistral/mistral-7b-instruct-v0.1',
  {
    stream: true,
    raw: true,
    messages: [
      {
        "role": "user",
        "content": "Summarize the following: Some newspapers, TV channels and well-known companies publish false news stories to fool people on 1 April. One of the earliest examples of this was in 1957 when a programme on the BBC, the UKs national TV channel, broadcast a report on how spaghetti grew on trees. The film showed a family in Switzerland collecting spaghetti from trees and many people were fooled into believing it, as in the 1950s British people didnt eat much pasta and many didnt know how it was made! Most British people wouldnt fall for the spaghetti trick today, but in 2008 the BBC managed to fool their audience again with their Miracles of Evolution trailer, which appeared to show some special penguins that had regained the ability to fly. Two major UK newspapers, The Daily Telegraph and the Daily Mirror, published the important story on their front pages."
      }
    ],
    lora: "cf-public-cnn-summarization"
  });
```

---

# Workers Logpush

URL: https://developers.cloudflare.com/ai-gateway/observability/logging/logpush/

import { Render, Tabs, TabItem } from "~/components";

AI 网关允许您安全地将日志导出到外部存储位置，在那里您可以解密和处理它们。
您可以在 [Cloudflare 仪表板](https://dash.cloudflare.com) 设置中开启和关闭 Workers Logpush。此产品在 Workers 付费计划中可用。有关定价信息，请参阅[定价](/ai-gateway/reference/pricing)。

本指南解释了如何为 AI 网关设置 Workers Logpush，生成用于加密的 RSA 密钥对，以及在接收到日志后如何解密日志。

您每个网关最多可以存储 1000 万条日志。如果达到限制，新日志将停止保存，也不会通过 Workers Logpush 导出。要继续保存和导出日志，您必须删除较旧的日志以为新日志释放空间。Workers Logpush 限制为 4 个作业，每个日志的最大请求大小为 1 MB。

:::note[注意]

要使用 Workers Logpush 导出日志，您必须为网关开启日志记录。

:::

<Render file="limits-increase" product="ai-gateway" />

## 日志是如何加密的

我们采用混合加密模型来提高效率和安全性。首先，为每个日志生成一个 AES 密钥。这个 AES 密钥实际加密您的大部分数据，选择它是因为它在高效处理大型数据集方面的速度和安全性。

现在，为了安全地共享这个 AES 密钥，我们使用 RSA 加密。以下是发生的过程：AES 密钥虽然轻量级，但需要安全地传输给接收者。我们使用接收者的 RSA 公钥加密此密钥。此步骤利用 RSA 在安全密钥分发方面的优势，确保只有拥有相应 RSA 私钥的人才能解密和使用 AES 密钥。

加密后，AES 加密的数据和 RSA 加密的 AES 密钥一起发送。到达后，接收者的系统使用 RSA 私钥解密 AES 密钥。现在可以访问 AES 密钥，解密主数据载荷就很简单了。

此方法结合了两个世界的优点：用于数据加密的 AES 效率与 RSA 的安全密钥交换能力，确保在整个数据生命周期中最佳地维护数据完整性、机密性和性能。

## 设置 Workers Logpush

要为 AI 网关配置 Workers Logpush，请按以下步骤操作：

## 1. 在本地生成 RSA 密钥对

您需要生成一个密钥对来加密和解密日志。此脚本将输出您的 RSA 私钥和公钥。保持私钥安全，因为它将用于解密日志。下面是使用 Node.js 和 OpenSSL 生成密钥的示例脚本。

<Tabs syncKey="JSPlusSSL"> <TabItem label="JavaScript">

```js title="JavaScript"
const crypto = require("crypto");

const { privateKey, publicKey } = crypto.generateKeyPairSync("rsa", {
	modulusLength: 4096,
	publicKeyEncoding: {
		type: "spki",
		format: "pem",
	},
	privateKeyEncoding: {
		type: "pkcs8",
		format: "pem",
	},
});

console.log(publicKey);
console.log(privateKey);
```

通过在终端中执行以下代码运行脚本。将 `file name` 替换为您的 JavaScript 文件名。

```bash
node {file name}
```

</TabItem> <TabItem label="OpenSSL">

1. 生成私钥：
   使用以下命令生成 RSA 私钥：

   ```bash
   openssl genpkey -algorithm RSA -out private_key.pem -pkeyopt rsa_keygen_bits:4096
   ```

2. 生成公钥：
   生成私钥后，您可以使用以下命令提取相应的公钥：

   ```bash
   openssl rsa -pubout -in private_key.pem -out public_key.pem
   ```

</TabItem> </Tabs>

## 2. 将公钥上传到网关设置

生成密钥对后，将公钥上传到您的 AI 网关设置。此密钥将用于加密您的日志。要启用 Workers Logpush，您需要为该网关启用日志记录。

## 3. 设置 Logpush

要设置 Logpush，请参阅 [Logpush 快速开始](/logs/get-started/)。

## 4. 接收加密日志

配置 Workers Logpush 后，日志将使用您上传的公钥进行加密发送。要访问数据，您需要使用私钥对其进行解密。日志将发送到您选择的对象存储提供商。

## 5. 解密日志

要解密来自 AI 网关的加密日志正文和元数据，您可以使用以下 Node.js 脚本或 OpenSSL：

<Tabs syncKey="JSPlusSSL"> <TabItem label="JavaScript">

要解密来自 AI 网关的加密日志正文和元数据，请将日志下载到一个文件夹，在本例中名为 `my_log.log.gz`。

然后将此 JavaScript 文件复制到同一文件夹中，并将您的私钥放在顶部变量中。

```js title="JavaScript"
const privateKeyStr = `-----BEGIN RSA PRIVATE KEY-----
....
-----END RSA PRIVATE KEY-----`;

const crypto = require("crypto");
const privateKey = crypto.createPrivateKey(privateKeyStr);

const fs = require("fs");
const zlib = require("zlib");
const readline = require("readline");

async function importAESGCMKey(keyBuffer) {
	try {
		// 确保密钥长度对 AES 有效
		if ([128, 192, 256].includes(256)) {
			return await crypto.webcrypto.subtle.importKey(
				"raw",
				keyBuffer,
				{
					name: "AES-GCM",
					length: 256,
				},
				true, // 密钥是否可提取（在此情况下为 true，以便稍后需要时允许导出）
				["encrypt", "decrypt"], // 用于加密和解密
			);
		} else {
			throw new Error("无效的 AES 密钥长度。必须是 128、192 或 256 位。");
		}
	} catch (error) {
		console.error("导入密钥失败：", error);
		throw error;
	}
}

async function decryptData(encryptedData, aesKey, iv) {
	const decryptedData = await crypto.subtle.decrypt(
		{ name: "AES-GCM", iv: iv },
		aesKey,
		encryptedData,
	);
	return new TextDecoder().decode(decryptedData);
}

async function decryptBase64(privateKey, data) {
	if (data.key === undefined) {
		return data;
	}

	const aesKeyBuf = crypto.privateDecrypt(
		{
			key: privateKey,
			oaepHash: "SHA256",
		},
		Buffer.from(data.key, "base64"),
	);
	const aesKey = await importAESGCMKey(aesKeyBuf);

	const decryptedData = await decryptData(
		Buffer.from(data.data, "base64"),
		aesKey,
		Buffer.from(data.iv, "base64"),
	);

	return decryptedData.toString();
}

async function run() {
	let lineReader = readline.createInterface({
		input: fs.createReadStream("my_log.log.gz").pipe(zlib.createGunzip()),
	});

	lineReader.on("line", async (line) => {
		line = JSON.parse(line);

		const { Metadata, RequestBody, ResponseBody, ...remaining } = line;

		console.log({
			...remaining,
			Metadata: await decryptBase64(privateKey, Metadata),
			RequestBody: await decryptBase64(privateKey, RequestBody),
			ResponseBody: await decryptBase64(privateKey, ResponseBody),
		});
		console.log("--");
	});
}

run();
```

通过在终端中执行以下代码运行脚本。将 `file name` 替换为您的 JavaScript 文件名。

```bash
node {file name}
```

The script reads the encrypted log file `(my_log.log.gz)`, decrypts the metadata, request body, and response body, and prints the decrypted data.
Ensure you replace the `privateKey` variable with your actual private RSA key that you generated in step 1.

</TabItem> <TabItem label="OpenSSL">

1. Decrypt the encrypted log file using the private key.

Assuming that the logs were encrypted with the public key (for example `public_key.pem`), you can use the private key (`private_key.pem`) to decrypt the log file.

For example, if the encrypted logs are in a file named `encrypted_logs.bin`, you can decrypt it like this:

```bash
openssl rsautl -decrypt -inkey private_key.pem -in encrypted_logs.bin -out decrypted_logs.txt
```

- `-decrypt` tells OpenSSL that we want to decrypt the file.
- `-inkey private_key.pem` specifies the private key that will be used to decrypt the logs.
- `-in encrypted_logs.bin` is the encrypted log file.
- `-out decrypted_logs.txt`decrypted logs will be saved into this file.

2. View the decrypted logs
   Once decrypted, you can view the logs by simply running:

```bash
cat decrypted_logs.txt
```

This command will output the decrypted logs to the terminal.

</TabItem> </Tabs>

---

# 将新的 AI 模型添加到您的游乐场（第 2 部分）

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/image-generator-flux-newmodels/

import { Details, DirectoryListing, Stream } from "~/components";

在第 2 部分中，Kristian 通过向您展示如何集成新的 AI 模型并引入新的参数来扩展第 1 部分中构建的现有环境，这些参数允许您自定义图像的生成方式。

<Stream
	id="167ba3a7a86f966650f3315e6cb02e0d"
	title="将新的 AI 模型添加到您的游乐场（第 2 部分）"
	thumbnail="13.5s"
	showMoreVideos={false}
/>

请参阅 AI 图像游乐场 [GitHub 存储库](https://github.com/kristianfreeman/workers-ai-image-playground) 以在本地进行操作。

<Details header="系列视频" open>

<DirectoryListing folder="workers-ai/guides/tutorials/image-generation-playground" />

</Details>

---

# 日志记录

URL: https://developers.cloudflare.com/ai-gateway/observability/logging/

import { Render } from "~/components";

日志记录是应用开发的基本构建块。日志在开发的早期阶段提供洞察，并且通常对于理解生产中发生的问题至关重要。

您的 AI 网关仪表板显示单个请求的日志，包括用户提示、模型响应、提供商、时间戳、请求状态、令牌使用量、成本和持续时间。这些日志会持久化，为您提供按首选持续时间存储它们的灵活性，并对有价值的请求数据做更多事情。

默认情况下，每个网关最多可以存储 1000 万条日志。您可以在网关设置中为每个网关自定义此限制，以符合您的特定要求。如果达到存储限制，新日志将停止保存。要继续保存日志，您必须删除较旧的日志以为新日志释放空间。
要了解更多关于您的计划限制，请参阅[限制](/ai-gateway/reference/limits/)。

我们建议在存储日志时使用已验证的网关，以防止未经授权的访问，并防范可能增加日志存储使用量并使您难以找到所需数据的无效请求。了解更多关于设置[已验证网关](/ai-gateway/configuration/authentication/)的信息。

## 默认配置

日志（包括指标以及请求和响应数据）默认为每个网关启用。此日志记录行为将统一应用于网关中的所有请求。如果您担心隐私或合规性并想关闭日志收集，您可以转到设置并选择退出日志。如果您需要为特定请求修改日志设置，您可以在每个请求的基础上覆盖此设置。

<Render file="logging" />

## 每个请求的日志记录

要覆盖在设置选项卡中设置的默认日志记录行为，您可以在每个请求的基础上定义标头。

### 收集日志 (`cf-aig-collect-log`)

`cf-aig-collect-log` 标头允许您绕过网关的默认日志设置。如果网关配置为保存日志，标头将排除该特定请求的日志。相反，如果在网关级别禁用日志记录，此标头将为该请求保存日志。

在下面的示例中，我们使用 `cf-aig-collect-log` 绕过默认设置以避免保存日志。

```bash
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai/chat/completions \
  --header "Authorization: Bearer $TOKEN" \
  --header 'Content-Type: application/json' \
  --header 'cf-aig-collect-log: false \
  --data ' {
        "model": "gpt-4o-mini",
        "messages": [
          {
            "role": "user",
            "content": "What is the email address and phone number of user123?"
          }
        ]
      }
'
```

## 管理日志存储

要有效管理您的日志存储，您可以：

- 设置存储限制：在您的网关设置中配置每个网关存储的日志数量限制，以确保您只为所需的内容付费。
- 启用自动日志删除：在您的网关设置中激活自动日志删除功能，以在达到您设置的日志限制或默认存储限制 1000 万条日志时自动删除最旧的日志。这确保新日志始终得到保存，无需手动干预。

## 如何删除日志

要有效管理您的日志存储并确保持续日志记录，您可以使用以下方法删除日志：

### 自动日志删除

要在网关存储约束内维持持续日志记录，请在您的网关设置中启用自动日志删除。此功能在达到您设置的日志限制或默认存储限制 1000 万条日志时自动删除最旧的日志，确保新日志得到保存，无需手动干预。

### 手动删除

要通过仪表板手动删除日志，请导航到仪表板中的日志选项卡。使用可用的过滤器，如状态、缓存、提供商、成本或下拉菜单中的任何其他选项来细化您希望删除的日志。过滤后，选择删除日志以完成操作。

请参阅下面可用过滤器及其描述的完整列表：

| 过滤器类别 | 过滤器选项                        | 过滤器描述               |
| ---------- | --------------------------------- | ------------------------ |
| 状态       | 错误，状态                        | 错误类型或状态。         |
| 缓存       | 已缓存，未缓存                    | 基于是否被缓存。         |
| 提供商     | 特定提供商                        | 选定的 AI 提供商。       |
| AI 模型    | 特定模型                          | 选定的 AI 模型。         |
| 成本       | 小于，大于                        | 成本，指定阈值。         |
| 请求类型   | 通用，Workers AI 绑定，WebSockets | 请求的类型。             |
| 令牌       | 总令牌，输入令牌，输出令牌        | 令牌计数（小于或大于）。 |
| 持续时间   | 小于，大于                        | 请求持续时间。           |
| 反馈       | 等于，不等于（赞，踩，无反馈）    | 反馈类型。               |
| 元数据键   | 等于，不等于                      | 特定元数据键。           |
| 元数据值   | 等于，不等于                      | 特定元数据值。           |
| 日志 ID    | 等于，不等于                      | 特定日志 ID。            |
| 事件 ID    | 等于，不等于                      | 特定事件 ID。            |

### API 删除

您可以使用 AI 网关 API 以编程方式删除日志。有关 `DELETE` 日志端点的更全面信息，请查看 [Cloudflare API 文档](/api/resources/ai_gateway/subresources/logs/methods/delete/)。

---

# 构建 AI 图像生成器游乐场（第 1 部分）

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/image-generator-flux/

import { Details, DirectoryListing, Stream } from "~/components";

Workers AI 上新的 flux 模型是我们迄今为止最强大的文本到图像 AI 模型。在本视频中，我们将向您展示如何在短短几分钟内一部署您自己的 Workers AI 图像游乐场。

有许多企业建立在 AI 图像生成模型之上。使用 Workers AI，您可以访问业内最好的模型，而无需担心推理、运营或部署。我们提供用于 AI 图像生成的 API，并在几秒钟内取回图像。

<Stream
	id="aeafae151e84a81be19c52c2348e9bab"
	title="构建 AI 图像生成器游乐场（第 1 部分）"
	thumbnail="2.5s"
	showMoreVideos={false}
/>

请参阅 AI 图像游乐场 [GitHub 存储库](https://github.com/kristianfreeman/workers-ai-image-playground) 以在本地进行操作。

<Details header="系列视频" open>

<DirectoryListing folder="workers-ai/guides/tutorials/image-generation-playground" />

</Details>

---

# 使用 R2 存储和编录 AI 生成的图像（第 3 部分）

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/image-generator-store-and-catalog/

import { Details, DirectoryListing, Stream } from "~/components";

在 AI 图像游乐场系列的最后一部分，Kristian 将教您如何利用 Cloudflare 的 [R2](/r2) 对象存储来维护和跟踪每个 AI 生成的图像。

<Stream
	id="86488269da24984c76fb10f69f4abb44"
	title="存储和编录 AI 生成的图像（第 3 部分）"
	thumbnail="2.5s"
	showMoreVideos={false}
/>

请参阅 AI 图像游乐场 [GitHub 存储库](https://github.com/kristianfreeman/workers-ai-image-playground) 以在本地进行操作。

<Details header="系列视频" open>

<DirectoryListing folder="workers-ai/guides/tutorials/image-generation-playground" />

</Details>

---

# 使用 R2 存储和AI生成的图像（第 3 部分）

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation--playground/image-generator-store-and-catalog/

import { Details, DirectoryListing, Stream } from "~/components";

在 AI 图像游乐场系列的最后一部分，Kristian 将教您如何利用 Cloudflare 的 [R2](/r2) 对象存储来维护和跟踪每个 AI 生成的图像。

<Stream
	id="86488269da24984c76fb10f69f4abb44"
	title="存储和编录 AI 生成的图像（第 3 部分）"
	thumbnail="2.5s"
	showMoreVideos={false}
/>

请参阅 AI 图像游乐场 [GitHub 存储库](https://github.com/kristianfreeman/workers-ai-image-playground) 以在本地进行操作。

<Details header="系列视频" open>

<DirectoryListing folder="workers-ai/guides/tutorials/image-generation-playground" />

</Details>

---

# 如何使用 Workers AI 构建图像生成器

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/

import { Details, DirectoryListing, Stream } from "~/components";

在本系列视频中，Kristian Freeman 构建了一个 AI 图像游乐场。要开始使用，请单击下面的第 1 部分。

<Details header="系列视频" open>

<DirectoryListing folder="workers-ai/guides/tutorials/image-generation-playground" />

</Details>

---

# API 参考

URL: https://developers.cloudflare.com/workers-ai/features/function-calling/embedded/api-reference/

了解有关[嵌入式函数调用](/workers-ai/features/function-calling/embedded)的 API 参考的更多信息。

## runWithTools

此包装器方法使您能够进行嵌入式函数调用。您可以向其传递 AI 绑定、模型、输入（`messages` 数组和 `tools` 数组）以及可选配置。

- `AI Binding`Ai
  - AI 绑定，例如 `env.AI`。
- `model`BaseAiTextGenerationModels
  - 支持函数调用的模型的 ID。例如，`@hf/nousresearch/hermes-2-pro-mistral-7b`。
- `input`Object
  - `messages`RoleScopedChatInput\[]
  - `tools`AiTextGenerationToolInputWithFunction\[]
- `config`Object
  - `streamFinalResponse`boolean 可选
  - `maxRecursiveToolRuns`number 可选
  - `strictValidation`boolean 可选
  - `verbose`boolean 可选
  - `trimFunction`boolean 可选 - 对于 `trimFunction`，您可以向其传递 `autoTrimTools`，这是我们设计的另一个辅助方法，用于在将其发送以进行推理之前自动选择正确的工具（使用 LLM）。这意味着您的最终推理调用将具有更少的输入令牌。

## createToolsFromOpenAPISpec

此方法使您可以根据 OpenAPI 规范自动创建工具模式，因此您不必手动编写或硬编码工具模式。您可以以 JSON 或 YAML 格式传递任何 API 的 OpenAPI 规范。

`createToolsFromOpenAPISpec` 有一个配置输入，如果您需要提供诸如身份验证或用户代理之类的标头，则允许您执行覆盖。

- `spec`string
  - JSON 或 YAML 格式的 OpenAPI 规范，或指向远程 OpenAPI 规范的 URL。
- `config`Config 可选 - createToolsFromOpenAPISpec 函数的配置选项
  - `overrides`ConfigRule\[] 可选
  - `matchPatterns`RegExp\[] 可选
  - `options` Object 可选 \{
    `verbose` boolean 可选
    \}

---

# 入门指南

URL: https://developers.cloudflare.com/workers-ai/features/function-calling/embedded/get-started/

import { TypeScriptExample, PackageManagers } from "~/components";

本指南将指导您设置和部署您的第一个带有嵌入式函数调用的 Workers AI 项目。您将使用 Workers、Workers AI 绑定、[`ai-utils 包`](https://github.com/cloudflare/ai-utils) 和一个大型语言模型 (LLM)，以在 Cloudflare 全球网络上部署您的第一个带有嵌入式函数调用的 AI 驱动的应用程序。

## 1. 使用 Workers AI 创建一个 Worker 项目

请按照 [Workers AI 入门指南](/workers-ai/get-started/workers-wrangler/) 直到第 2 步。

## 2. 安装额外的 npm 包

接下来，在您的项目存储库中运行以下命令以安装 Worker AI 实用程序包。

<PackageManagers pkg="@cloudflare/ai-utils" />

## 3. 添加 Workers AI 嵌入式函数调用

使用以下代码更新您应用程序目录中的 `index.ts` 文件：

<TypeScriptExample filename="index.ts">

```ts
import { runWithTools } from "@cloudflare/ai-utils";

type Env = {
	AI: Ai;
};

export default {
	async fetch(request, env, ctx) {
		// 定义函数
		const sum = (args: { a: number; b: number }): Promise<string> => {
			const { a, b } = args;
			return Promise.resolve((a + b).toString());
		};
		// 使用函数调用运行 AI 推理
		const response = await runWithTools(
			env.AI,
			// 支持函数调用的模型
			"@hf/nousresearch/hermes-2-pro-mistral-7b",
			{
				// 消息
				messages: [
					{
						role: "user",
						content: "123123123 + 10343030 的结果是多少？",
					},
				],
				// AI 模型可以利用的可用工具的定义
				tools: [
					{
						name: "sum",
						description: "将两个数字相加并返回结果",
						parameters: {
							type: "object",
							properties: {
								a: { type: "number", description: "第一个数字" },
								b: { type: "number", description: "第二个数字" },
							},
							required: ["a", "b"],
						},
						// 引用先前定义的函数
						function: sum,
					},
				],
			},
		);
		return new Response(JSON.stringify(response));
	},
} satisfies ExportedHandler<Env>;
```

</TypeScriptExample>

此示例使用 `import { runWithTools} from "@cloudflare/ai-utils"` 导入实用程序，并遵循下面的 API 参考。

此外，在此示例中，我们定义并描述了 LLM 可以用来响应用户查询的工具列表。在这里，该列表仅包含一个工具，即 `sum` 函数。

由 `runWithTools` 函数抽象出来，会发生以下步骤：

```mermaid
sequenceDiagram
    participant Worker as Worker
    participant WorkersAI as Workers AI

    Worker->>+WorkersAI: 发送消息、函数调用提示和可用工具
    WorkersAI->>+Worker: 为函数调用选择工具和参数
    Worker-->>-Worker: 执行函数
    Worker-->>+WorkersAI: 发送消息、函数调用提示和函数结果
    WorkersAI-->>-Worker: 发送包含函数输出的响应
```

`ai-utils 包` 也在 [Github](https://github.com/cloudflare/ai-utils) 上开源。

## 4. 本地开发和部署

请按照 [Workers AI 入门指南](/workers-ai/get-started/workers-wrangler/) 的第 4 步和第 5 步进行本地开发和部署。

:::note[Workers AI 嵌入式函数调用费用]

嵌入式函数调用运行 Workers AI 推理请求。将收取推理（例如令牌）使用的标准费用。
嵌入式函数代码执行期间消耗的资源（例如 CPU 时间）将像任何其他 Worker 代码执行一样收费。

:::

## API 参考

有关更多详细信息，请参阅 [API 参考](/workers-ai/features/function-calling/embedded/api-reference/)。

---

# 嵌入式

URL: https://developers.cloudflare.com/workers-ai/features/function-calling/embedded/

import { DirectoryListing } from "~/components";

Cloudflare 具有独特的[嵌入式函数调用](https://blog.cloudflare.com/embedded-function-calling)功能，允许您在工具调用推理的同时执行函数代码。我们的 npm 包 [`@cloudflare/ai-utils`](https://www.npmjs.com/package/@cloudflare/ai-utils) 是入门的开发人员工具包。

嵌入式函数调用可用于轻松创建与网站和 API 交互的复杂代理，例如使用自然语言在 Google 日历上创建会议、将数据保存到 Notion、自动将请求路由到其他 API、将数据保存到 R2 存储桶 - 或者同时完成所有这些。您只需要一个提示和一个 OpenAPI 规范即可开始。

:::caution[REST API 支持]

嵌入式函数调用依赖于 Workers 平台的原生功能。这意味着嵌入式函数调用仅通过 [Cloudflare Workers](/workers-ai/get-started/workers-wrangler/) 支持，而不通过 [REST API](/workers-ai/get-started/rest-api/) 支持。

:::

## 资源

<DirectoryListing />

---

# 故障排除

URL: https://developers.cloudflare.com/workers-ai/features/function-calling/embedded/troubleshooting/

本节将介绍故障排除工具并解决常见错误。

## 日志记录

Workers 的常规 [日志记录](/workers/observability/logs/) 功能也适用于嵌入式函数调用。

### 函数调用

可以使用 `console.log()` 像在任何 Worker 中一样记录工具的调用：

```ts title="记录工具调用" {6}
export default {
	async fetch(request, env, ctx) {
		const sum = (args: { a: number; b: number }): Promise<string> => {
			const { a, b } = args;
      // 从嵌入式函数调用中记录日志
      console.log(`sum 函数已使用参数 a: ${a} 和 b: ${b} 被调用`)
			return Promise.resolve((a + b).toString());
		};
    ...
  }
}
```

### 在 `runWithTools` 中记录日志

`runWithTools` 函数有一个 `verbose` 模式，它会发出有用的日志，用于调试函数调用以及输入和输出统计信息。

```ts title="启用详细模式" {13}
const response = await runWithTools(
  env.AI,
  '@hf/nousresearch/hermes-2-pro-mistral-7b',
  {
    messages: [
      ...
    ],
    tools: [
      ...
    ],
  },
  // 启用详细模式
  { verbose: true }
);
```

## 性能

要使用嵌入式函数响应 LLM 提示，可能需要多个 AI 推理请求和函数调用，这可能会影响用户体验。

考虑以下几点来提高性能：

- 缩短提示（以减少输入处理时间）
- 减少提供的工具数量
- 将最终响应流式传输给最终用户（以最小化交互时间）。请参阅以下示例：

```ts title="流式响应示例" {15}
async fetch(request, env, ctx) {
  const response = (await runWithTools(
    env.AI,
    '@hf/nousresearch/hermes-2-pro-mistral-7b',
    {
      messages: [
        ...
      ],
      tools: [
        ...
      ],
    },
    {
      // 启用响应流
      streamFinalResponse: true,
    }
  )) as ReadableStream;

  // 设置流式响应头
  return new Response(response, {
    headers: {
      'content-type': 'text/event-stream',
    },
  });
}
```

## 常见错误

如果您收到 `BadInput` 错误，则您的输入可能超出了我们模型的当前上下文窗口。请尝试减少输入令牌以解决此错误。

---

# 使用 fetch() 处理程序

URL: https://developers.cloudflare.com/workers-ai/features/function-calling/embedded/examples/fetch/

一个非常常见的用例是为 LLM 提供通过函数调用执行 API 调用的能力。

在此示例中，LLM 将检索未来 5 天的天气预报。
为此，定义了一个 `getWeather` 函数，并将其作为工具传递给 LLM。

`getWeather` 函数从请求中提取用户的位置，并通过 Workers 的 [`Fetch API`](/workers/runtime-apis/fetch/) 调用外部天气 API 并返回结果。

```ts title="带有 fetch() 的嵌入式函数调用示例"
import { runWithTools } from "@cloudflare/ai-utils";

type Env = {
	AI: Ai;
};

export default {
	async fetch(request, env, ctx) {
		// 定义函数
		const getWeather = async (args: { numDays: number }) => {
			const { numDays } = args;
			// 根据 https://developers.cloudflare.com/workers/runtime-apis/request/#incomingrequestcfproperties 从请求中提取位置
			const lat = request.cf?.latitude;
			const long = request.cf?.longitude;

			// 为外部 API 调用插值
			const response = await fetch(
				`https://api.open-meteo.com/v1/forecast?latitude=${lat}&longitude=${long}&daily=temperature_2m_max,precipitation_sum&timezone=GMT&forecast_days=${numDays}`,
			);
			return response.text();
		};
		// 使用函数调用运行 AI 推理
		const response = await runWithTools(
			env.AI,
			// 支持函数调用的模型
			"@hf/nousresearch/hermes-2-pro-mistral-7b",
			{
				// 消息
				messages: [
					{
						role: "user",
						content: "未来 5 天天气如何？以文本形式回应",
					},
				],
				// AI 模型可以利用的可用工具的定义
				tools: [
					{
						name: "getWeather",
						description: "获取未来 [numDays] 天的天气",
						parameters: {
							type: "object",
							properties: {
								numDays: { type: "numDays", description: "天气预报的天数" },
							},
							required: ["numDays"],
						},
						// 引用先前定义的函数
						function: getWeather,
					},
				],
			},
		);
		return new Response(JSON.stringify(response));
	},
} satisfies ExportedHandler<Env>;
```

---

# 示例

URL: https://developers.cloudflare.com/workers-ai/features/function-calling/embedded/examples/

import { DirectoryListing } from "~/components";

<DirectoryListing />

---

# 使用 KV API

URL: https://developers.cloudflare.com/workers-ai/features/function-calling/embedded/examples/kv/

与持久性存储交互以检索或存储信息，可实现强大的用例。

在此示例中，我们展示了嵌入式函数调用如何通过几行代码与 Cloudflare 开发者平台上的其他资源进行交互。

## 先决条件

要使此示例正常工作，您需要首先配置一个 [KV](/kv/) 命名空间。为此，请遵循 [KV - 入门](/kv/get-started/) 指南。

重要的是，必须更新您的 Wrangler 文件以包含到您相应命名空间的 `KV` 绑定定义。

## Worker 代码

```ts title="使用 KV API 的嵌入式函数调用示例"
import { runWithTools } from "@cloudflare/ai-utils";

type Env = {
	AI: Ai;
	KV: KVNamespace;
};

export default {
	async fetch(request, env, ctx) {
		// 定义函数
		const updateKvValue = async ({
			key,
			value,
		}: {
			key: string;
			value: string;
		}) => {
			const response = await env.KV.put(key, value);
			return `成功更新数据库中的键值对：${response}`;
		};

		// 使用函数调用运行 AI 推理
		const response = await runWithTools(
			env.AI,
			"@hf/nousresearch/hermes-2-pro-mistral-7b",
			{
				messages: [
					{ role: "system", content: "将用户给定的值放入 KV" },
					{ role: "user", content: "将 banana 的值设置为 yellow。" },
				],
				tools: [
					{
						name: "KV 更新",
						description: "更新数据库中的键值对",
						parameters: {
							type: "object",
							properties: {
								key: {
									type: "string",
									description: "要更新的键",
								},
								value: {
									type: "string",
									description: "要更新的值",
								},
							},
							required: ["key", "value"],
						},
						function: updateKvValue,
					},
				],
			},
		);
		return new Response(JSON.stringify(response));
	},
} satisfies ExportedHandler<Env>;
```

## 验证结果

要验证结果，请运行以下命令

```sh
npx wrangler kv key get banana --binding KV --local
```

---

# 基于 OpenAPI 规范的工具

URL: https://developers.cloudflare.com/workers-ai/features/function-calling/embedded/examples/openapi/

API 通常通过 [OpenAPI 规范](https://swagger.io/specification/) 进行定义和记录。Cloudflare `ai-utils` 包的 `createToolsFromOpenAPISpec` 函数从 OpenAPI 规范创建工具，然后 LLM 可以利用这些工具来完成提示。

在此示例中，LLM 将根据 Github 的 API 及其 OpenAPI 规范描述一个 Github 用户。

```ts title="来自 OpenAPI 规范的嵌入式函数调用示例"
import { createToolsFromOpenAPISpec, runWithTools } from "@cloudflare/ai-utils";

type Env = {
	AI: Ai;
};

const APP_NAME = "cf-fn-calling-example-app";

export default {
	async fetch(request, env, ctx) {
		const toolsFromOpenAPISpec = [
			// 您可以直接传递 OpenAPI 规范链接或内容
			...(await createToolsFromOpenAPISpec(
				"https://gist.githubusercontent.com/mchenco/fd8f20c8f06d50af40b94b0671273dc1/raw/f9d4b5cd5944cc32d6b34cad0406d96fd3acaca6/partial_api.github.com.json",
				{
					overrides: [
						{
							matcher: ({ url }) => {
								return url.hostname === "api.github.com";
							},
							// 对于 *.github.com 上的所有请求，我们需要添加一个 User-Agent。
							values: {
								headers: {
									"User-Agent": APP_NAME,
								},
							},
						},
					],
				},
			)),
		];

		const response = await runWithTools(
			env.AI,
			"@hf/nousresearch/hermes-2-pro-mistral-7b",
			{
				messages: [
					{
						role: "user",
						content: "Github 上的 cloudflare 是谁，该组织有多少个仓库？",
					},
				],
				tools: toolsFromOpenAPISpec,
			},
		);

		return new Response(JSON.stringify(response));
	},
} satisfies ExportedHandler<Env>;
```

---