# 变更日志

URL: https://developers.cloudflare.com/workers-ai/changelog/

import { ProductReleaseNotes } from "~/components";

{/* */}

---

# 代理

URL: https://developers.cloudflare.com/workers-ai/agents/

import { LinkButton } from "~/components";

使用 Cloudflare Workers AI 和代理构建能够代表您的用户执行复杂任务的 AI 助手。

转到代理文档
---

# Cloudflare Workers AI

URL: https://developers.cloudflare.com/workers-ai/

import { CardGrid, Description, Feature, LinkTitleCard, Plan, RelatedProduct, Render, LinkButton, Flex, } from "~/components";

在 Cloudflare 的全球网络上,借助无服务器 GPU 运行机器学习模型。

Workers AI 允许您以无服务器的方式运行 AI 模型,无需担心扩展、维护基础设施,也无需为闲置的基础设施付费。您可以从自己的代码中(无论是 [Workers](/workers/)、[Pages](/pages/),还是通过 [Cloudflare API](/api/resources/ai/methods/run/))调用运行在 Cloudflare 网络 GPU 上的模型。

Workers AI 让您可以访问:

- **50 多种[开源模型](/workers-ai/models/)**,作为我们模型目录的一部分提供
- 无服务器、**按使用付费**的[定价模型](/workers-ai/platform/pricing/)
- 所有这些都作为**功能齐全的开发者平台**的一部分,包括 [AI 网关](/ai-gateway/)、[Vectorize](/vectorize/)、[Workers](/workers/) 等等
开始使用

观看 Workers AI 演示
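下面是一个最小示意,展示在 Worker 代码中通过 AI 绑定调用模型的大致形态。示例假设您已按照下文"Workers 绑定"一节在 Wrangler 配置中将绑定命名为 `AI`;所用模型与提示词仅作演示,并非唯一选择。

```ts
// 最小示意:在 Worker 中通过 AI 绑定运行一个 Workers AI 模型。
// 前提(假设):Wrangler 配置中已添加 [ai] binding = "AI"。
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // 可替换为模型目录中的任意模型,这里以 Llama 3.1 8B Instruct 为例
    const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      prompt: "用一句话介绍 Workers AI。",
    });
    return Response.json(answer);
  },
} satisfies ExportedHandler<Env>;
```

创建项目、配置绑定与部署的完整步骤,请参阅下文的"开始使用"与"Workers 绑定"章节。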
--- ## 功能 Workers AI 配备了一系列精选的流行开源模型,使您能够执行图像分类、文本生成、对象检测等任务。 --- ## 相关产品 通过缓存、速率限制、请求重试、模型回退等功能,观察和控制您的 AI 应用程序。 使用 Cloudflare 的矢量数据库 Vectorize 构建全栈 AI 应用程序。添加 Vectorize 使您能够执行语义搜索、推荐、异常检测等任务,或用于为 LLM 提供上下文和记忆。 构建无服务器应用程序并立即在全球范围内部署,以获得卓越的性能、可靠性和规模。 创建立即部署到 Cloudflare 全球网络的全栈应用程序。 存储大量非结构化数据,而无需支付与典型云存储服务相关的昂贵出口带宽费用。 创建新的无服务器 SQL 数据库,以便从您的 Workers 和 Pages 项目中查询。 具有强一致性存储的全球分布式协调 API。 创建全球性、低延迟的键值数据存储。 --- ## 更多资源 构建和部署您的第一个 Workers AI 应用程序。 了解免费和付费计划。 了解 Workers AI 的限制。 了解如何构建和部署雄心勃勃的 AI 应用程序到 Cloudflare 的全球网络。 了解哪种存储选项最适合您的项目。 在 Discord 上与 Workers 社区联系,提出问题,分享您正在构建的内容,并与其他开发者讨论平台。 在 Twitter 上关注 @CloudflareDev,了解产品公告和 Cloudflare Workers 的新功能。 --- # Vercel AI SDK URL: https://developers.cloudflare.com/workers-ai/configuration/ai-sdk/ import { PackageManagers } from "~/components"; Workers AI 可用于 JavaScript 和 TypeScript 代码库的 [Vercel AI SDK](https://sdk.vercel.ai/)。 ## 设置 安装 [`workers-ai-provider` 提供程序](https://sdk.vercel.ai/providers/community-providers/cloudflare-workers-ai): 然后,在您的 Workers 项目 Wrangler 文件中添加一个 AI 绑定: ```toml [ai] binding = "AI" ``` ## 模型 AI SDK 可以配置为与[任何 AI 模型](/workers-ai/models/)一起使用。 ```js import { createWorkersAI } from "workers-ai-provider"; const workersai = createWorkersAI({ binding: env.AI }); // 选择任何模型:https://developers.cloudflare.com/workers-ai/models/ const model = workersai("@cf/meta/llama-3.1-8b-instruct", {}); ``` ## 生成文本 选择模型后,您可以从给定的提示生成文本。 ```js import { createWorkersAI } from 'workers-ai-provider'; import { generateText } from 'ai'; type Env = { AI: Ai; }; export default { async fetch(_: Request, env: Env) { const workersai = createWorkersAI({ binding: env.AI }); const result = await generateText({ model: workersai('@cf/meta/llama-2-7b-chat-int8'), prompt: '写一篇关于 hello world 的 50 字短文。', }); return new Response(result.text); }, }; ``` ## 流式文本 对于较长的响应,请考虑在生成完成时流式传输响应。 ```js import { createWorkersAI } from 'workers-ai-provider'; import { streamText } from 'ai'; type Env = { AI: Ai; }; export default { async fetch(_: Request, env: Env) { const workersai = createWorkersAI({ binding: env.AI }); const result = streamText({ model: workersai('@cf/meta/llama-2-7b-chat-int8'), prompt: '写一篇关于 hello world 的 50 字短文。', }); return result.toTextStreamResponse({ headers: { // 添加这些标头以确保 // 响应是分块和流式的 'Content-Type': 'text/x-unknown', 'content-encoding': 'identity', 'transfer-encoding': 'chunked', }, }); }, }; ``` ## 生成结构化对象 您可以提供一个 Zod 模式来生成结构化的 JSON 响应。 ```js import { createWorkersAI } from 'workers-ai-provider'; import { generateObject } from 'ai'; import { z } from 'zod'; type Env = { AI: Ai; }; export default { async fetch(_: Request, env: Env) { const workersai = createWorkersAI({ binding: env.AI }); const result = await generateObject({ model: workersai('@cf/meta/llama-3.1-8b-instruct'), prompt: '生成一份千层面食谱', schema: z.object({ recipe: z.object({ ingredients: z.array(z.string()), description: z.string(), }), }), }); return Response.json(result.object); }, }; ``` --- # Workers 绑定 URL: https://developers.cloudflare.com/workers-ai/configuration/bindings/ import { Type, MetaInfo, WranglerConfig } from "~/components"; ## Workers [Workers](/workers/) 提供了一个无服务器执行环境,允许您创建新应用程序或增强现有应用程序。 要将 Workers AI 与 Workers 一起使用,您必须创建一个 Workers AI [绑定](/workers/runtime-apis/bindings/)。绑定允许您的 Worker 与 Cloudflare 开发者平台上的资源(如 Workers AI)进行交互。您可以在 Cloudflare 仪表板上或通过更新您的 [Wrangler 文件](/workers/wrangler/configuration/)来创建绑定。 要将 Workers AI 绑定到您的 Worker,请将以下内容添加到您的 Wrangler 文件的末尾: ```toml [ai] binding = "AI" # 即在您的 Worker 中通过 env.AI 可用 ``` ## Pages 函数 [Pages 函数](/pages/functions/)允许您通过在 Cloudflare 
网络上执行代码来构建具有 Cloudflare Pages 的全栈应用程序。函数本质上是 Workers。 要在您的 Pages 函数中配置 Workers AI 绑定,您必须使用 Cloudflare 仪表板。有关说明,请参阅 [Workers AI 绑定](/pages/functions/bindings/#workers-ai)。 ## 方法 ### async env.AI.run() `async env.AI.run()` 运行一个模型。第一个参数是模型,第二个参数是一个对象。 ```javascript const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", { prompt: "What is the origin of the phrase 'Hello, World'", }); ``` **参数** - `model` - 要运行的模型。 **支持的选项** - `stream` - 在结果可用时返回结果流。 ```javascript const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", { prompt: "What is the origin of the phrase 'Hello, World'", stream: true, }); return new Response(answer, { headers: { "content-type": "text/event-stream" }, }); ``` --- # Hugging Face 聊天界面 URL: https://developers.cloudflare.com/workers-ai/configuration/hugging-face-chat-ui/ 将 Workers AI 与 Hugging Face 提供的开源聊天界面 [Chat UI](https://github.com/huggingface/chat-ui?tab=readme-ov-file#text-embedding-models) 一起使用。 ## 先决条件 您将需要以下内容: - 一个 [Cloudflare 帐户](https://dash.cloudflare.com) - 您的[帐户 ID](/fundamentals/account/find-account-and-zone-ids/) - 一个用于 Workers AI 的 [API 令牌](/workers-ai/get-started/rest-api/#1-get-api-token-and-account-id) ## 设置 首先,决定如何引用您的帐户 ID 和 API 令牌(直接在您的 `.env.local` 中使用 `CLOUDFLARE_ACCOUNT_ID` 和 `CLOUDFLARE_API_TOKEN` 变量,或在端点配置中)。 然后,按照 [Chat UI GitHub 仓库](https://github.com/huggingface/chat-ui?tab=readme-ov-file#text-embedding-models)中的其余设置说明进行操作。 在设置模型时,请指定 `cloudflare` 端点。 ```json { "name": "nousresearch/hermes-2-pro-mistral-7b", "tokenizer": "nousresearch/hermes-2-pro-mistral-7b", "parameters": { "stop": ["<|im_end|>"] }, "endpoints": [ { "type": "cloudflare", // 如果未包含在 .env.local 中,则可选择指定这些 "accountId": "your-account-id", "apiToken": "your-api-token" // } ] } ``` ## 支持的模型 此模板适用于任何以 `@hf` 参数开头的[文本生成模型](/workers-ai/models/)。 --- # 配置 URL: https://developers.cloudflare.com/workers-ai/configuration/ import { DirectoryListing } from "~/components"; --- # OpenAI 兼容 API 端点 URL: https://developers.cloudflare.com/workers-ai/configuration/open-ai-compatibility/ import { Render } from "~/components";
## 用法 ### Workers AI 通常,Workers AI 要求您在 cURL 端点或 `env.AI.run` 函数中指定模型名称。 使用 OpenAI 兼容端点,您可以利用 [openai-node sdk](https://github.com/openai/openai-node) 来调用 Workers AI。这允许您通过简单地更改基本 URL 和模型名称来使用 Workers AI。 ```js title="OpenAI SDK 示例" import OpenAI from "openai"; const openai = new OpenAI({ apiKey: env.CLOUDFLARE_API_KEY, baseURL: `https://api.cloudflare.com/client/v4/accounts/${env.CLOUDFLARE_ACCOUNT_ID}/ai/v1`, }); const chatCompletion = await openai.chat.completions.create({ messages: [{ role: "user", content: "发出一些机器人噪音" }], model: "@cf/meta/llama-3.1-8b-instruct", }); const embeddings = await openai.embeddings.create({ model: "@cf/baai/bge-large-en-v1.5", input: "我喜欢抹茶", }); ``` ```bash title="cURL 示例" curl --request POST \ --url https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/v1/chat/completions \ --header "Authorization: Bearer {api_token}" \ --header "Content-Type: application/json" \ --data ' { "model": "@cf/meta/llama-3.1-8b-instruct", "messages": [ { "role": "user", "content": "如何用三个简短的步骤制作一个木勺?请给出尽可能简短的回答" } ] } ' ``` ### AI 网关 这些端点也与 [AI 网关](/ai-gateway/providers/workersai/#openai-compatible-endpoints)兼容。 --- # 仪表板 URL: https://developers.cloudflare.com/workers-ai/get-started/dashboard/ import { Render } from "~/components"; 请按照本指南使用 Cloudflare 仪表板创建 Workers AI 应用程序。 ## 先决条件 如果您还没有 [Cloudflare 帐户](https://dash.cloudflare.com/sign-up/workers-and-pages),请注册一个。 ## 设置 要创建 Workers AI 应用程序: 1. 登录 [Cloudflare 仪表板](https://dash.cloudflare.com)并选择您的帐户。 2. 转到 **计算 (Workers)** 和 **Workers & Pages**。 3. 选择**创建**。 4. 在 **从模板开始**下,选择 **LLM 应用**。选择模板后,将在仪表板中为您创建一个[AI 绑定](/workers-ai/configuration/bindings/)。 5. 查看提供的代码并选择**部署**。 6. 在其提供的 [`workers.dev`](/workers/configuration/routing/workers-dev/) 子域上预览您的 Worker。 ## 开发 --- # 开始使用 URL: https://developers.cloudflare.com/workers-ai/get-started/ import { DirectoryListing } from "~/components"; 在 Cloudflare 上构建您的 Workers AI 项目有多种选择。要开始,请选择您喜欢的方法: :::note 这些示例旨在创建新的 Workers AI 项目。有关将 Workers AI 添加到现有 Worker 的帮助,请参阅 [Workers 绑定](/workers-ai/configuration/bindings/)。 ::: --- # REST API URL: https://developers.cloudflare.com/workers-ai/get-started/rest-api/ 本指南将指导您设置和部署您的第一个 Workers AI 项目。您将使用 Workers AI REST API 来体验大型语言模型 (LLM)。 ## 先决条件 如果您还没有 [Cloudflare 帐户](https://dash.cloudflare.com/sign-up/workers-and-pages),请注册一个。 ## 1. 获取 API 令牌和账户 ID 您需要您的 API 令牌和账户 ID 才能使用 REST API。 要获取这些值: 1. 登录 [Cloudflare 仪表板](https://dash.cloudflare.com)并选择您的帐户。 2. 转到 **AI** > **Workers AI**。 3. 选择**使用 REST API**。 4. 获取您的 API 令牌: 1. 选择**创建 Workers AI API 令牌**。 2. 查看预填信息。 3. 选择**创建 API 令牌**。 4. 选择**复制 API 令牌**。 5. 保存该值以备将来使用。 5. 对于**获取账户 ID**,复制**账户 ID** 的值。保存该值以备将来使用。 :::note 如果您选择[创建 API 令牌](/fundamentals/api/get-started/create-token/)而不是使用模板,该令牌将需要 `Workers AI - 读取` 和 `Workers AI - 编辑` 的权限。 ::: ## 2. 通过 API 运行模型 创建 API 令牌后,在请求中使用您的 API 令牌进行身份验证并向 API 发出请求。 您将使用[执行 AI 模型](/api/resources/ai/methods/run/)端点来运行 [`@cf/meta/llama-3.1-8b-instruct`](/workers-ai/models/llama-3.1-8b-instruct/) 模型: ```bash curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/meta/llama-3.1-8b-instruct \ -H 'Authorization: Bearer {API_TOKEN}' \ -d '{ "prompt": "Where did the phrase Hello World come from" }' ``` 替换 `{ACCOUNT_ID}` 和 `{API_token}` 的值。 API 响应将如下所示: ```json { "result": { "response": "Hello, World first appeared in 1974 at Bell Labs when Brian Kernighan included it in the C programming language example. It became widely used as a basic test program due to simplicity and clarity. It represents an inviting greeting from a program to the world." 
}, "success": true, "errors": [], "messages": [] } ``` 此示例执行使用 `@cf/meta/llama-3.1-8b-instruct` 模型,但您可以使用 [Workers AI 模型目录](/workers-ai/models/)中的任何模型。如果使用其他模型,您需要将 `{model}` 替换为您想要的模型名称。 完成本指南后,您已创建了一个 Cloudflare 帐户(如果您还没有),并创建了一个授予您帐户 Workers AI 读取权限的 API 令牌。您使用终端的 cURL 命令执行了 [`@cf/meta/llama-3.1-8b-instruct`](/workers-ai/models/llama-3.1-8b-instruct/) 模型,并在 JSON 响应中收到了对您提示的回答。 ## 相关资源 - [模型](/workers-ai/models/) - 浏览 Workers AI 模型目录。 - [AI SDK](/workers-ai/configuration/ai-sdk) - 了解如何与 AI 模型集成。 --- # Workers 绑定 URL: https://developers.cloudflare.com/workers-ai/get-started/workers-wrangler/ import { Render, PackageManagers, WranglerConfig, TypeScriptExample, } from "~/components"; 本指南将指导您设置和部署您的第一个 Workers AI 项目。您将使用 [Workers](/workers/)、一个 Workers AI 绑定和一个大型语言模型 (LLM) 来在 Cloudflare 全球网络上部署您的第一个由 AI 驱动的应用程序。 ## 1. 创建一个 Worker 项目 您将使用 `create-cloudflare` CLI (C3) 创建一个新的 Worker 项目。[C3](https://github.com/cloudflare/workers-sdk/tree/main/packages/create-cloudflare) 是一个命令行工具,旨在帮助您设置和部署新的应用程序到 Cloudflare。 通过运行以下命令创建一个名为 `hello-ai` 的新项目: 运行 `npm create cloudflare@latest` 将提示您安装 [`create-cloudflare` 包](https://www.npmjs.com/package/create-cloudflare),并引导您完成设置。C3 还将安装 [Wrangler](/workers/wrangler/),即 Cloudflare 开发者平台 CLI。 这将创建一个新的 `hello-ai` 目录。您的新 `hello-ai` 目录将包括: - 一个位于 `src/index.ts` 的 `"Hello World"` [Worker](/workers/get-started/guide/#3-write-code)。 - 一个 [`wrangler.jsonc`](/workers/wrangler/configuration/) 配置文件。 进入您的应用程序目录: ```sh cd hello-ai ``` ## 2. 将您的 Worker 连接到 Workers AI 您必须为您的 Worker 创建一个 AI 绑定以连接到 Workers AI。[绑定](/workers/runtime-apis/bindings/)允许您的 Worker 与 Cloudflare 开发者平台上的资源(如 Workers AI)进行交互。 要将 Workers AI 绑定到您的 Worker,请将以下内容添加到您的 Wrangler 文件的末尾: ```toml [ai] binding = "AI" ``` 您的绑定在您的 Worker 代码中通过 [`env.AI`](/workers/runtime-apis/handlers/fetch/) 可用。 {/* */} 您还可以将 Workers AI 绑定到 Pages 函数。有关更多信息,请参阅[函数绑定](/pages/functions/bindings/#workers-ai)。 ## 3. 在您的 Worker 中运行推理任务 您现在已准备好在您的 Worker 中运行推理任务。在这种情况下,您将使用一个 LLM,[`llama-3.1-8b-instruct`](/workers-ai/models/llama-3.1-8b-instruct/),来回答一个问题。 使用以下代码更新您的 `hello-ai` 应用程序目录中的 `index.ts` 文件: ```ts export interface Env { // 如果您在 Wrangler 配置文件中为 'binding' 设置了另一个名称, // 请将 "AI" 替换为您定义的变量名。 AI: Ai; } export default { async fetch(request, env): Promise { const response = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", { prompt: "What is the origin of the phrase Hello, World", }); return new Response(JSON.stringify(response)); }, } satisfies ExportedHandler; ``` 至此,您已经为您的 Worker 创建了一个 AI 绑定,并配置了您的 Worker 以能够执行 Llama 3.1 模型。现在,您可以在全球部署之前在本地测试您的项目。 ## 4. 使用 Wrangler 进行本地开发 在您的项目目录中,通过运行 [`wrangler dev`](/workers/wrangler/commands/#dev) 在本地测试 Workers AI: ```sh npx wrangler dev ``` 运行 `wrangler dev` 后,系统会提示您登录。当您运行 `npx wrangler dev` 时,Wrangler 会给您一个 URL(很可能是 `localhost:8787`)来审查您的 Worker。在您访问 Wrangler 提供的 URL 后,将呈现一条类似以下示例的消息: ```json { "response": "Ah, a most excellent question, my dear human friend! *adjusts glasses*\n\nThe origin of the phrase \"Hello, World\" is a fascinating tale that spans several decades and multiple disciplines. It all began in the early days of computer programming, when a young man named Brian Kernighan was tasked with writing a simple program to demonstrate the basics of a new programming language called C.\nKernighan, a renowned computer scientist and author, was working at Bell Labs in the late 1970s when he created the program. 
He wanted to showcase the language's simplicity and versatility, so he wrote a basic \"Hello, World!\" program that printed the familiar greeting to the console.\nThe program was included in Kernighan and Ritchie's influential book \"The C Programming Language,\" published in 1978. The book became a standard reference for C programmers, and the \"Hello, World!\" program became a sort of \"Hello, World!\" for the programming community.\nOver time, the phrase \"Hello, World!\" became a shorthand for any simple program that demonstrated the basics" } ``` ## 5. 部署您的 AI Worker 在将您的 AI Worker 全球部署之前,请通过运行以下命令使用您的 Cloudflare 帐户登录: ```sh npx wrangler login ``` 您将被引导到一个网页,要求您登录 Cloudflare 仪表板。登录后,系统会询问您是否允许 Wrangler 对您的 Cloudflare 帐户进行更改。向下滚动并选择 **允许** 以继续。 最后,部署您的 Worker,使您的项目可以在互联网上访问。要部署您的 Worker,请运行: ```sh npx wrangler deploy ``` ```sh output https://hello-ai..workers.dev ``` 您的 Worker 将被部署到您的自定义 [`workers.dev`](/workers/configuration/routing/workers-dev/) 子域。您现在可以访问该 URL 来运行您的 AI Worker。 完成本教程后,您创建了一个 Worker,通过 AI 绑定将其连接到 Workers AI,并从 Llama 3 模型运行了一个推理任务。 ## 相关资源 - [Discord 上的 Cloudflare 开发者社区](https://discord.cloudflare.com) - 通过加入 Cloudflare Discord 服务器,直接向 Cloudflare 团队提交功能请求、报告错误并分享您的反馈。 - [模型](/workers-ai/models/) - 浏览 Workers AI 模型目录。 - [AI SDK](/workers-ai/configuration/ai-sdk) - 了解如何与 AI 模型集成。 --- # 演示和架构 URL: https://developers.cloudflare.com/workers-ai/guides/demos-architectures/ import { ExternalResources, GlossaryTooltip, ResourcesBySelector, } from "~/components"; Workers AI 可用于构建动态和高性能的服务。以下演示应用程序和参考架构展示了如何在您的架构中最佳地使用 Workers AI。 ## 演示 探索以下 Workers AI 的演示应用程序 ## 参考架构 探索以下使用 Workers AI 的参考架构 --- # 指南 URL: https://developers.cloudflare.com/workers-ai/guides/ import { DirectoryListing } from "~/components"; --- # 模型 URL: https://developers.cloudflare.com/workers-ai/models/ import ModelCatalog from "~/pages/workers-ai/models/index.astro"; --- # 功能 URL: https://developers.cloudflare.com/workers-ai/features/ import { DirectoryListing } from "~/components"; --- # JSON 模式 URL: https://developers.cloudflare.com/workers-ai/features/json-mode/ import { Code } from "~/components"; export const jsonModeSchema = `{ response_format: { title: "JSON 模式", type: "object", properties: { type: { type: "string", enum: ["json_object", "json_schema"], }, json_schema: {}, } } }`; export const jsonModeRequestExample = `{ "messages": [ { "role": "system", "content": "提取有关国家的数据。" }, { "role": "user", "content": "告诉我关于印度的信息。" } ], "response_format": { "type": "json_schema", "json_schema": { "type": "object", "properties": { "name": { "type": "string" }, "capital": { "type": "string" }, "languages": { "type": "array", "items": { "type": "string" } } }, "required": [ "name", "capital", "languages" ] } } }`; export const jsonModeResponseExample = `{ "response": { "name": "印度", "capital": "新德里", "languages": [ "印地语", "英语", "孟加拉语", "泰卢固语", "马拉地语", "泰米尔语", "古吉拉特语", "乌尔都语", "卡纳达语", "奥里亚语", "马拉雅拉姆语", "旁遮普语", "梵语" ] } }`; 当我们希望文本生成 AI 模型以编程方式与数据库、服务和外部系统交互时,通常在使用工具调用或构建 AI 代理时,我们必须使用结构化的响应格式而不是自然语言。 Workers AI 支持 JSON 模式,使应用程序能够在与 AI 模型交互时请求结构化的输出响应。 ## 架构 JSON 模式与 OpenAI 的实现兼容;要启用,请使用以下约定将 `response_format` 属性添加到请求对象中: 其中 `json_schema` 必须是有效的 [JSON 模式](https://json-schema.org/) 声明。 ## JSON 模式示例 使用 JSON 格式时,请将架构作为请求的一部分传递给 LLM,如下例所示。 LLM 将遵循该架构,并返回如下所示的响应: 如您所见,模型正在遵守请求中的 JSON 架构定义,并以经过验证的 JSON 对象进行响应。 ## 支持的模型 以下是现在支持 JSON 模式的模型列表: - [@cf/meta/llama-3.1-8b-instruct-fast](/workers-ai/models/llama-3.1-8b-instruct-fast/) - [@cf/meta/llama-3.1-70b-instruct](/workers-ai/models/llama-3.1-70b-instruct/) - 
[@cf/meta/llama-3.3-70b-instruct-fp8-fast](/workers-ai/models/llama-3.3-70b-instruct-fp8-fast/) - [@cf/meta/llama-3-8b-instruct](/workers-ai/models/llama-3-8b-instruct/) - [@cf/meta/llama-3.1-8b-instruct](/workers-ai/models/llama-3.1-8b-instruct/) - [@cf/meta/llama-3.2-11b-vision-instruct](/workers-ai/models/llama-3.2-11b-vision-instruct/) - [@hf/nousresearch/hermes-2-pro-mistral-7b](/workers-ai/models/hermes-2-pro-mistral-7b/) - [@hf/thebloke/deepseek-coder-6.7b-instruct-awq](/workers-ai/models/deepseek-coder-6.7b-instruct-awq/) - [@cf/deepseek-ai/deepseek-r1-distill-qwen-32b](/workers-ai/models/deepseek-r1-distill-qwen-32b/) 我们将继续扩展此列表,以跟上新的和被请求的模型。 请注意,Workers AI 不能保证模型会根据请求的 JSON 模式进行响应。根据任务的复杂性和 JSON 模式的充分性,模型在极端情况下可能无法满足请求。如果出现这种情况,则会返回错误 `JSON 模式无法满足`,并且必须进行处理。 JSON 模式目前不支持流式传输。 --- # 提示 URL: https://developers.cloudflare.com/workers-ai/features/prompting/ import { Code } from "~/components"; export const scopedExampleOne = `{ messages: [ { role: "system", content: "你是一个非常有趣的喜剧演员,你喜欢表情符号" }, { role: "user", content: "给我讲个关于 Cloudflare 的笑话" }, ], };`; export const scopedExampleTwo = `{ messages: [ { role: "system", content: "你是一个专业的计算机科学助理" }, { role: "user", content: "WASM 是什么?" }, { role: "assistant", content: "WASM (WebAssembly) 是一种二进制指令格式,旨在成为一个平台无关的格式" }, { role: "user", content: "Python 能编译成 WASM 吗?" }, { role: "assistant", content: "不,Python 不能直接编译成 WebAssembly" }, { role: "user", content: "Rust 呢?" }, ], };`; export const unscopedExampleOne = `{ prompt: "给我讲个关于 Cloudflare 的笑话"; }`; export const unscopedExampleTwo = `{ prompt: "[INST]喜剧演员[/INST]\n[INST]给我讲个关于 Cloudflare 的笑话[/INST]", raw: true };`; 从文本生成模型获得良好结果的一部分是正确地提出问题。LLM 通常使用特定的预定义模板进行训练,然后在进行推理任务时,应将这些模板与模型的标记器一起使用,以获得更好的结果。 使用 Workers AI 提示文本生成模型有两种方法: :::note[重要] 我们建议对 LoRA 的推理使用无范围提示。 ::: ### 有范围的提示 这是**推荐**的方法。通过有范围的提示,Workers AI 承担了了解和使用不同模型不同聊天模板的负担,并在构建提示和创建文本生成任务时为开发人员提供统一的界面。 有范围的提示是一系列消息。每条消息定义了两个键:角色和内容。 通常,角色可以是以下三个选项之一: - system - 系统消息定义了 AI 的个性。您可以使用它们来设置规则以及您期望 AI 的行为方式。 - user - 用户消息是您通过提供问题或对话来实际查询 AI 的地方。 - assistant - 助手消息向 AI 暗示所需的输出格式。并非所有模型都支持此角色。 OpenAI 对他们如何在其 GPT 模型中使用这些角色有[很好的解释](https://platform.openai.com/docs/guides/text-generation#messages-and-roles)。尽管聊天模板是灵活的,但其他文本生成模型倾向于遵循相同的约定。 以下是使用系统和用户角色的有范围提示的输入示例: 以下是在用户和助手之间进行多次迭代的聊天会话的更好示例。 请注意,不同的 LLM 使用不同的模板针对不同的用例进行训练。虽然 Workers AI 尽力通过统一的 API 向开发人员抽象每个 LLM 模板的细节,但您应始终参考模型文档以获取详细信息(我们在上表中提供了链接)。例如,像 Codellama 这样的指令模型经过微调以响应用户提供的指令,而聊天模型则期望以对话片段作为输入。 ### 无范围的提示 您可以使用无范围的提示向模型发送单个问题,而无需担心提供任何上下文。Workers AI 会自动将您的 `prompt` 输入转换为合理的默认有范围提示,以便您获得最佳的预测结果。 您还可以使用无范围的提示来手动构建模型聊天模板。在这种情况下,您可以使用 raw 参数。以下是 [Mistral](https://docs.mistral.ai/models/#chat-template) 聊天模板提示的输入示例: --- # Markdown 转换 URL: https://developers.cloudflare.com/workers-ai/features/markdown-conversion/ import { Code, Type, MetaInfo, Details, Render } from "~/components"; [Markdown](https://en.wikipedia.org/wiki/Markdown) 对于训练和推理中的文本生成和大型语言模型 (LLM)至关重要,因为它可以提供结构化、语义化、人类和机器可读的输入。同样,Markdown 有助于对输入数据进行分块和结构化,以便在 RAG 的上下文中更好地检索和综合,其简单性和易于解析和呈现的特点使其成为 AI 代理的理想选择。 由于这些原因,文档转换在设计和开发 AI 应用程序时扮演着重要角色。Workers AI 提供了 `toMarkdown` 实用方法,开发人员可以从 [`env.AI`](/workers-ai/configuration/bindings/) 绑定或 REST API 中使用该方法,以便快速、轻松、方便地将多种格式的文档转换为 Markdown 语言并进行摘要。 ## 方法和定义 ### async env.AI.toMarkdown() 获取不同格式的文档列表并将其转换为 Markdown。 #### 参数 - documents: - `toMarkdownDocument` 的数组。 #### 返回值 - results: - `toMarkdownDocumentResult` 的数组。 ### `toMarkdownDocument` 定义 - `name` - 要转换的文档的名称。 - `blob` - 一个包含文档内容的新 [Blob](https://developer.mozilla.org/en-US/docs/Web/API/Blob/Blob) 对象。 ### `toMarkdownDocumentResult` 定义 - 
`name` - 转换后文档的名称。与输入名称匹配。 - `mimetype` - 文档检测到的 [mime 类型](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME_types/Common_types)。 - `tokens` - 转换后文档的估计令牌数。 - `data` - 转换后文档的内容,格式为 Markdown。 ## 支持的格式 这是支持的格式列表。我们会不断添加新格式并更新此表。 ## 示例 在此示例中,我们从 R2 获取一个 PDF 文档和一张图片,并将它们都提供给 `env.AI.toMarkdown`。结果是一个转换后的文档列表。Workers AI 模型会自动用于检测和总结图像。 ```typescript import { Env } from "./env"; export default { async fetch(request: Request, env: Env, ctx: ExecutionContext) { // https://pub-979cb28270cc461d94bc8a169d8f389d.r2.dev/somatosensory.pdf const pdf = await env.R2.get("somatosensory.pdf"); // https://pub-979cb28270cc461d94bc8a169d8f389d.r2.dev/cat.jpeg const cat = await env.R2.get("cat.jpeg"); return Response.json( await env.AI.toMarkdown([ { name: "somatosensory.pdf", blob: new Blob([await pdf.arrayBuffer()], { type: "application/octet-stream", }), }, { name: "cat.jpeg", blob: new Blob([await cat.arrayBuffer()], { type: "application/octet-stream", }), }, ]), ); }, }; ``` 这是结果: ```json [ { "name": "somatosensory.pdf", "mimeType": "application/pdf", "format": "markdown", "tokens": 0, "data": "# somatosensory.pdf\n## Metadata\n- PDFFormatVersion=1.4\n- IsLinearized=false\n- IsAcroFormPresent=false\n- IsXFAPresent=false\n- IsCollectionPresent=false\n- IsSignaturesPresent=false\n- Producer=Prince 20150210 (www.princexml.com)\n- Title=Anatomy of the Somatosensory System\n\n## Contents\n### Page 1\nThis is a sample document to showcase..." }, { "name": "cat.jpeg", "mimeType": "image/jpeg", "format": "markdown", "tokens": 0, "data": "这张图片是"不爽猫"的特写照片,这只猫以其独特的"不爽"表情和锐利的蓝眼睛而闻名。这只猫的脸是棕色的,鼻子上有一条白色的条纹,耳朵竖立着。它的皮毛是浅棕色的,脸部周围的颜色较深,鼻子和嘴巴是粉红色的。猫的眼睛是蓝色的,向下倾斜,使它看起来永远都是一副"不爽"的样子。背景是模糊的,但看起来是深棕色的。总的来说,这张图片是流行的网络迷因角色"不爽猫"的一个幽默而标志性的代表。猫的面部表情和姿势传达出一种不悦或烦恼的感觉,这使得它对许多人来说是一个既 relatable 又有趣的图片。" } ] ``` ## REST API 除了 Workers AI [绑定](/workers-ai/configuration/bindings/),您还可以使用 [REST API](/workers-ai/get-started/rest-api/): ```bash curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/tomarkdown \ -H 'Authorization: Bearer {API_TOKEN}' \ -F "files=@cat.jpeg" \ -F "files=@somatosensory.pdf" ``` ## 定价 `toMarkdown` 对于大多数格式转换是免费的。在某些情况下,例如图像转换,它可以使用 Workers AI 模型进行对象检测和摘要,如果超出 Workers AI 的免费配额限制,可能会产生额外费用。有关更多详细信息,请参阅[定价页面](/workers-ai/platform/pricing/)。 --- # 数据使用 URL: https://developers.cloudflare.com/workers-ai/platform/data-usage/ Cloudflare 为了提供 Workers AI 服务会处理某些客户数据,这受我们的[隐私政策](https://www.cloudflare.com/privacypolicy/)和[自助服务订阅协议](https://www.cloudflare.com/terms/)或[企业订阅协议](https://www.cloudflare.com/enterpriseterms/)(如适用)的约束。 Cloudflare 既不创建也不训练在 Workers AI 上可用的 AI 模型。这些模型构成第三方服务,并可能受您与模型提供商之间的开源或其他许可条款的约束。请务必查看适用于每个模型的许可条款(如有)。 您的输入(例如,文本提示、图像提交、音频文件等)、输出(例如,生成的文本/图像、翻译等)、嵌入和训练数据构成客户内容。 对于 Workers AI: - 您拥有并对您的所有客户内容负责。 - Cloudflare 不会将您的客户内容提供给任何其他 Cloudflare 客户。 - Cloudflare 不会将您的客户内容用于 (1) 训练在 Workers AI 上可用的任何 AI 模型,或 (2) 改进任何 Cloudflare 或第三方服务,并且除非我们收到您的明确同意,否则不会这样做。 - 如果您特别将存储服务(例如,R2、KV、DO、Vectorize 等)与 Workers AI 结合使用,您的 Workers AI 客户内容可能会被 Cloudflare 存储。 --- # 错误 URL: https://developers.cloudflare.com/workers-ai/platform/errors/ 以下是 Workers AI 错误的列表。 | **名称** | **内部代码** | **HTTP 代码** | **描述** | | ---------------------- | ------------ | ------------- | --------------------------------------------------------------------------------------------------- | | 无此模型 | `5007` | `400` | 无此模型 `${model}` 或任务 | | 无效数据 | `5004` | `400` | base64 输入的无效数据类型:`${type}` | | Finetune 缺少必需文件 | `3039` | `400` | Finetune 缺少必需文件 `(model.safetensors and config.json) ` | | 不完整的请求 | `3003` | `400` | 请求缺少标头或正文:`{what}` | | 
账户不允许使用私有模型 | `5018` | `403` | 该账户不允许访问此模型 | | 模型协议 | `5016` | `403` | 用户未同意 Llama3.2 模型条款 | | 账户被阻止 | `3023` | `403` | 服务对账户不可用 | | 账户不允许使用私有模型 | `3041` | `403` | 该账户不允许访问此模型 | | 已弃用的 SDK 版本 | `5019` | `405` | 请求尝试使用已弃用的 SDK 版本 | | 不支持 LoRa | `5005` | `405` | 模型 `${this.model}` 不支持 LoRa 推理 | | 无效的模型 ID | `3042` | `404` | 模型名称无效 | | 请求过大 | `3006` | `413` | 请求过大 | | 超时 | `3007` | `408` | 请求超时 | | 已中止 | `3008` | `408` | 请求已中止 | | 账户受限 | `3036` | `429` | 您已用完每日 10,000 个神经元的免费配额。如果您想继续使用,请升级到 Cloudflare 的 Workers 付费计划。 | | 容量不足 | `3040` | `429` | 没有更多的数据中心可以转发请求 | --- # 术语表 URL: https://developers.cloudflare.com/workers-ai/platform/glossary/ import { Glossary } from "~/components"; 查看 Cloudflare Workers AI 文档中使用的术语的定义。 --- # 平台 URL: https://developers.cloudflare.com/workers-ai/platform/ import { DirectoryListing } from "~/components"; --- # 限制 URL: https://developers.cloudflare.com/workers-ai/platform/limits/ import { Render } from "~/components"; Workers AI 现已正式发布。我们更新了速率限制以反映这一点。 请注意,使用 Wrangler 在本地模式下进行的模型推理也将计入这些限制。在我们致力于性能和规模的同时,Beta 模型的速率限制可能会较低。 速率限制默认为每个任务类型,一些模型的限制定义如下: ## 按任务类型划分的速率限制 ### [自动语音识别](/workers-ai/models/) - 每分钟 720 个请求 ### [图像分类](/workers-ai/models/) - 每分钟 3000 个请求 ### [图像到文本](/workers-ai/models/) - 每分钟 720 个请求 ### [对象检测](/workers-ai/models/) - 每分钟 3000 个请求 ### [摘要](/workers-ai/models/) - 每分钟 1500 个请求 ### [文本分类](/workers-ai/models/) - 每分钟 2000 个请求 ### [文本嵌入](/workers-ai/models/) - 每分钟 3000 个请求 - [@cf/baai/bge-large-en-v1.5](/workers-ai/models/bge-large-en-v1.5/) 为每分钟 1500 个请求 ### [文本生成](/workers-ai/models/) - 每分钟 300 个请求 - [@hf/thebloke/mistral-7b-instruct-v0.1-awq](/workers-ai/models/mistral-7b-instruct-v0.1-awq/) 为每分钟 400 个请求 - [@cf/microsoft/phi-2](/workers-ai/models/phi-2/) 为每分钟 720 个请求 - [@cf/qwen/qwen1.5-0.5b-chat](/workers-ai/models/qwen1.5-0.5b-chat/) 为每分钟 1500 个请求 - [@cf/qwen/qwen1.5-1.8b-chat](/workers-ai/models/qwen1.5-1.8b-chat/) 为每分钟 720 个请求 - [@cf/qwen/qwen1.5-14b-chat-awq](/workers-ai/models/qwen1.5-14b-chat-awq/) 为每分钟 150 个请求 - [@cf/tinyllama/tinyllama-1.1b-chat-v1.0](/workers-ai/models/tinyllama-1.1b-chat-v1.0/) 为每分钟 720 个请求 ### [文本到图像](/workers-ai/models/) - 每分钟 720 个请求 - [@cf/runwayml/stable-diffusion-v1-5-img2img](/workers-ai/models/stable-diffusion-v1-5-img2img/) 为每分钟 1500 个请求 ### [翻译](/workers-ai/models/) - 每分钟 720 个请求 --- # 定价 URL: https://developers.cloudflare.com/workers-ai/platform/pricing/ :::note Workers AI 更新了定价,使其更加细化,提供了基于每个模型单元的定价,但后端仍以神经元计费。 ::: Workers AI 包含在[免费和付费 Workers 计划](/workers/platform/pricing/)中,定价为**每 1,000 个神经元 $0.011**。 我们的免费配额允许任何人**每天免费使用总计 10,000 个神经元**。要每天使用超过 10,000 个神经元,您需要注册 [Workers 付费计划](/workers/platform/pricing/#workers)。在 Workers 付费计划中,任何超过每日 10,000 个神经元免费配额的使用量将按每 1,000 个神经元 $0.011 收费。 您可以在 [Cloudflare Workers AI 仪表板](https://dash.cloudflare.com/?to=/:account/ai/workers-ai)中监控您的神经元使用情况。 所有限制在每天 00:00 UTC 重置。如果您超过上述任何限制,进一步的操作将失败并显示错误。 | | 免费
配额 | 定价 | | ------------ | -------------------- | ------------------------------ | | Workers 免费 | 每天 10,000 个神经元 | 不适用 - 升级到 Workers 付费版 | | Workers 付费 | 每天 10,000 个神经元 | $0.011 / 1,000 个神经元 | ## 什么是神经元? 神经元是我们衡量不同模型 AI 输出的方式,代表执行您请求所需的 GPU 计算能力。我们的无服务器模型让您只需为使用的部分付费,而无需担心租用、管理或扩展 GPU。 :::note “以令牌计价”列等同于“以神经元计价”列 - 显示不同的单位是为了让您轻松比较和理解定价。 ::: ## LLM 模型定价 | 模型 | 以令牌计价 | 以神经元计价 | | -------------------------------------------- | ------------------------------------------------- | ------------------------------------------------------------------ | | @cf/meta/llama-3.2-1b-instruct | 每百万输入令牌 $0.027
每百万输出令牌 $0.201 | 每百万输入令牌 2457 个神经元
每百万输出令牌 18252 个神经元 | | @cf/meta/llama-3.2-3b-instruct | 每百万输入令牌 $0.051
每百万输出令牌 $0.335 | 每百万输入令牌 4625 个神经元
每百万输出令牌 30475 个神经元 | | @cf/meta/llama-3.1-8b-instruct-fp8-fast | 每百万输入令牌 $0.045
每百万输出令牌 $0.384 | 每百万输入令牌 4119 个神经元
每百万输出令牌 34868 个神经元 | | @cf/meta/llama-3.2-11b-vision-instruct | 每百万输入令牌 $0.049
每百万输出令牌 $0.676 | 每百万输入令牌 4410 个神经元
每百万输出令牌 61493 个神经元 | | @cf/meta/llama-3.1-70b-instruct-fp8-fast | 每百万输入令牌 $0.293
每百万输出令牌 $2.253 | 每百万输入令牌 26668 个神经元
每百万输出令牌 204805 个神经元 | | @cf/meta/llama-3.3-70b-instruct-fp8-fast | 每百万输入令牌 $0.293
每百万输出令牌 $2.253 | 每百万输入令牌 26668 个神经元
每百万输出令牌 204805 个神经元 | | @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | 每百万输入令牌 $0.497
每百万输出令牌 $4.881 | 每百万输入令牌 45170 个神经元
每百万输出令牌 443756 个神经元 | | @cf/mistral/mistral-7b-instruct-v0.1 | 每百万输入令牌 $0.110
每百万输出令牌 $0.190 | 每百万输入令牌 10000 个神经元
每百万输出令牌 17300 个神经元 | | @cf/mistralai/mistral-small-3.1-24b-instruct | 每百万输入令牌 $0.351
每百万输出令牌 $0.555 | 每百万输入令牌 31876 个神经元
每百万输出令牌 50488 个神经元 | | @cf/meta/llama-3.1-8b-instruct | 每百万输入令牌 $0.282
每百万输出令牌 $0.827 | 每百万输入令牌 25608 个神经元
每百万输出令牌 75147 个神经元 | | @cf/meta/llama-3.1-8b-instruct-fp8 | 每百万输入令牌 $0.152
每百万输出令牌 $0.287 | 每百万输入令牌 13778 个神经元
每百万输出令牌 26128 个神经元 | | @cf/meta/llama-3.1-8b-instruct-awq | 每百万输入令牌 $0.123
每百万输出令牌 $0.266 | 每百万输入令牌 11161 个神经元
每百万输出令牌 24215 个神经元 | | @cf/meta/llama-3-8b-instruct | 每百万输入令牌 $0.282
每百万输出令牌 $0.827 | 每百万输入令牌 25608 个神经元
每百万输出令牌 75147 个神经元 | | @cf/meta/llama-3-8b-instruct-awq | 每百万输入令牌 $0.123
每百万输出令牌 $0.266 | 每百万输入令牌 11161 个神经元
每百万输出令牌 24215 个神经元 | | @cf/meta/llama-2-7b-chat-fp16 | 每百万输入令牌 $0.556
每百万输出令牌 $6.667 | 每百万输入令牌 50505 个神经元
每百万输出令牌 606061 个神经元 | | @cf/meta/llama-guard-3-8b | 每百万输入令牌 $0.484
每百万输出令牌 $0.030 | 每百万输入令牌 44003 个神经元
每百万输出令牌 2730 个神经元 | | @cf/meta/llama-4-scout-17b-16e-instruct | 每百万输入令牌 $0.270
每百万输出令牌 $0.850 | 每百万输入令牌 24545 个神经元
每百万输出令牌 77273 个神经元 | | @cf/google/gemma-3-12b-it | 每百万输入令牌 $0.345
每百万输出令牌 $0.556 | 每百万输入令牌 31371 个神经元
每百万输出令牌 50560 个神经元 | | @cf/qwen/qwq-32b | 每百万输入令牌 $0.660
每百万输出令牌 $1.000 | 每百万输入令牌 60000 个神经元
每百万输出令牌 90909 个神经元 | | @cf/qwen/qwen2.5-coder-32b-instruct | 每百万输入令牌 $0.660
每百万输出令牌 $1.000 | 每百万输入令牌 60000 个神经元
每百万输出令牌 90909 个神经元 | ## 嵌入模型定价 | 模型 | 以令牌计价 | 以神经元计价 | | -------------------------- | --------------------- | ----------------------------- | | @cf/baai/bge-small-en-v1.5 | 每百万输入令牌 $0.020 | 每百万输入令牌 1841 个神经元 | | @cf/baai/bge-base-en-v1.5 | 每百万输入令牌 $0.067 | 每百万输入令牌 6058 个神经元 | | @cf/baai/bge-large-en-v1.5 | 每百万输入令牌 $0.204 | 每百万输入令牌 18582 个神经元 | | @cf/baai/bge-m3 | 每百万输入令牌 $0.012 | 每百万输入令牌 1075 个神经元 | ## 其他模型定价 | 模型 | 以令牌计价 | 以神经元计价 | | ------------------------------------- | -------------------------------------------------- | ----------------------------------------------------------------- | | @cf/black-forest-labs/flux-1-schnell | 每个 512x512 图块 $0.0000528
每步 $0.0001056 | 每个 512x512 图块 4.80 个神经元
每步 9.60 个神经元 | | @cf/huggingface/distilbert-sst-2-int8 | 每百万输入令牌 $0.026 | 每百万输入令牌 2394 个神经元 | | @cf/baai/bge-reranker-base | 每百万输入令牌 $0.003 | 每百万输入令牌 283 个神经元 | | @cf/meta/m2m100-1.2b | 每百万输入令牌 $0.342
每百万输出令牌 $0.342 | 每百万输入令牌 31050 个神经元
每百万输出令牌 31050 个神经元 | | @cf/microsoft/resnet-50 | 每百万张图像 $2.51 | 每百万张图像 228055 个神经元 | | @cf/openai/whisper | 每音频分钟 $0.0005 | 每音频分钟 41.14 个神经元 | | @cf/openai/whisper-large-v3-turbo | 每音频分钟 $0.0005 | 每音频分钟 46.63 个神经元 | | @cf/myshell-ai/melotts | 每音频分钟 $0.0002 | 每音频分钟 18.63 个神经元 | --- # 构建检索增强生成 (RAG) AI URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/build-a-retrieval-augmented-generation-ai/ import { Details, Render, PackageManagers, WranglerConfig } from "~/components"; 本指南将指导您设置和部署您的第一个 Cloudflare AI 应用程序。您将使用 Workers AI、Vectorize、D1 和 Cloudflare Workers 等工具构建一个功能齐全的 AI 驱动的应用程序。 :::note[寻找托管选项?] [AutoRAG](/autorag) 提供了一种完全托管的方式来在 Cloudflare 上构建 RAG 管道,开箱即用地处理摄取、索引和查询。[开始使用](/autorag/get-started/)。 ::: 在本教程结束时,您将构建一个 AI 工具,允许您存储信息并使用大型语言模型进行查询。这种模式被称为检索增强生成(RAG),是您可以结合 Cloudflare AI 工具包的多个方面构建的一个有用的项目。您无需具备使用 AI 工具的经验即可构建此应用程序。 您还需要访问 [Vectorize](/vectorize/platform/pricing/)。在本教程中,我们将展示如何选择性地与 [Anthropic Claude](http://anthropic.com) 集成。您需要一个 [Anthropic API 密钥](https://docs.anthropic.com/en/api/getting-started) 才能这样做。 ## 1. 创建一个新的 Worker 项目 C3 (`create-cloudflare-cli`) 是一个命令行工具,旨在帮助您尽快设置和部署 Workers 到 Cloudflare。 打开一个终端窗口并运行 C3 来创建您的 Worker 项目: 在您的项目目录中,C3 生成了几个文件。

1. `wrangler.jsonc`: 您的 [Wrangler](/workers/wrangler/configuration/#sample-wrangler-configuration) 配置文件。
2. `worker.js`(在 `/src` 中): 一个用 [ES 模块](/workers/reference/migrate-to-module-workers/) 语法编写的最小化 `'Hello World!'` Worker。
3. `package.json`: 一个最小化的 Node 依赖项配置文件。
4. `package-lock.json`: 请参阅 [`npm` 关于 `package-lock.json` 的文档](https://docs.npmjs.com/cli/v9/configuring-npm/package-lock-json)。
5. `node_modules`: 请参阅 [`npm` 关于 `node_modules` 的文档](https://docs.npmjs.com/cli/v7/configuring-npm/folders#node-modules)。
现在,移动到您新创建的目录中: ```sh cd rag-ai-tutorial ``` ## 2. 使用 Wrangler CLI 进行开发 Workers 命令行界面 [Wrangler](/workers/wrangler/install-and-update/) 允许您 [创建](/workers/wrangler/commands/#init)、[测试](/workers/wrangler/commands/#dev) 和 [部署](/workers/wrangler/commands/#deploy) 您的 Workers 项目。C3 将默认在项目中安装 Wrangler。 创建您的第一个 Worker 后,在项目目录中运行 [`wrangler dev`](/workers/wrangler/commands/#dev) 命令以启动本地服务器来开发您的 Worker。这将允许您在开发过程中本地测试您的 Worker。 ```sh npx wrangler dev --remote ``` :::note 如果您以前没有使用过 Wrangler,它会尝试打开您的 Web 浏览器以使用您的 Cloudflare 帐户登录。 如果此步骤出现问题或者您无法访问浏览器界面,请参阅 [`wrangler login`](/workers/wrangler/commands/#login) 文档以获取更多信息。 ::: 您现在可以访问 [http://localhost:8787](http://localhost:8787) 来查看您的 Worker 正在运行。您对代码的任何更改都将触发重新构建,重新加载页面将显示您的 Worker 的最新输出。 ## 3. 添加 AI 绑定 要开始使用 Cloudflare 的 AI 产品,您可以将 `ai` 块添加到 [Wrangler 配置文件](/workers/wrangler/configuration/) 中。这将在您的代码中设置一个到 Cloudflare AI 模型的绑定,您可以使用它与平台上的可用 AI 模型进行交互。 此示例使用了 [`@cf/meta/llama-3-8b-instruct` 模型](/workers-ai/models/llama-3-8b-instruct/),该模型可以生成文本。 ```toml [ai] binding = "AI" ``` 现在,找到 `src/index.js` 文件。在 `fetch` 处理程序中,您可以查询 `AI` 绑定: ```js export default { async fetch(request, env, ctx) { const answer = await env.AI.run("@cf/meta/llama-3-8b-instruct", { messages: [{ role: "user", content: `9 的平方根是多少?` }], }); return new Response(JSON.stringify(answer)); }, }; ``` 通过 `AI` 绑定查询 LLM,我们可以直接在代码中与 Cloudflare AI 的大型语言模型进行交互。在此示例中,我们使用的是 [`@cf/meta/llama-3-8b-instruct` 模型](/workers-ai/models/llama-3-8b-instruct/),该模型可以生成文本。 您可以使用 `wrangler` 部署您的 Worker: ```sh npx wrangler deploy ``` 向您的 Worker 发出请求现在将从 LLM 生成文本响应,并将其作为 JSON 对象返回。 ```sh curl https://example.username.workers.dev ``` ```sh output {"response":"答案:9的平方根是3。"} ``` ## 4. 使用 Cloudflare D1 和 Vectorize 添加嵌入 嵌入允许您向 Cloudflare AI 项目中使用的语言模型添加附加功能。这是通过 **Vectorize**(Cloudflare 的向量数据库)完成的。 要开始使用 Vectorize,请使用 `wrangler` 创建一个新的嵌入索引。此索引将存储具有 768 个维度的向量,并将使用余弦相似度来确定哪些向量彼此最相似: ```sh npx wrangler vectorize create vector-index --dimensions=768 --metric=cosine ``` 然后,将新 Vectorize 索引的配置详细信息添加到 [Wrangler 配置文件](/workers/wrangler/configuration/)中: ```toml # ... existing wrangler configuration [[vectorize]] binding = "VECTOR_INDEX" index_name = "vector-index" ``` 向量索引允许您存储维度集合,维度是用于表示数据的浮点数。当您要查询向量数据库时,您也可以将查询转换为维度。**Vectorize** 旨在高效地确定哪些存储的向量与您的查询最相似。 要实现搜索功能,您必须设置一个 Cloudflare 的 D1 数据库。在 D1 中,您可以存储应用程序的数据。然后,您将此数据更改为向量格式。当有人搜索并与向量匹配时,您可以向他们显示匹配的数据。 使用 `wrangler` 创建一个新的 D1 数据库: ```sh npx wrangler d1 create database ``` 然后,将上一个命令输出的配置详细信息粘贴到 [Wrangler 配置文件](/workers/wrangler/configuration/) 中: ```toml # ... existing wrangler configuration [[d1_databases]] binding = "DB" # 在您的 Worker 的 env.DB 中可用 database_name = "database" database_id = "abc-def-geh" # 将此替换为真实的 database_id (UUID) ``` 在此应用程序中,我们将在 D1 中创建一个 `notes` 表,这将允许我们存储笔记并稍后在 Vectorize 中检索它们。要创建此表,请使用 `wrangler d1 execute` 运行一个 SQL 命令: ```sh npx wrangler d1 execute database --remote --command "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, text TEXT NOT NULL)" ``` 现在,我们可以使用 `wrangler d1 execute` 向我们的数据库中添加一个新笔记: ```sh npx wrangler d1 execute database --remote --command "INSERT INTO notes (text) VALUES ('最好的披萨配料是意大利辣香肠')" ``` ## 5. 创建工作流 在我们开始创建笔记之前,我们将引入一个 [Cloudflare 工作流](/workflows)。这将允许我们定义一个持久的工作流,可以安全、稳健地执行 RAG 过程的所有步骤。 首先,将一个新的 `[[workflows]]` 块添加到您的 [Wrangler 配置文件](/workers/wrangler/configuration/) 中: ```toml # ... 
existing wrangler configuration [[workflows]] name = "rag" binding = "RAG_WORKFLOW" class_name = "RAGWorkflow" ``` 在 `src/index.js` 中,添加一个名为 `RAGWorkflow` 的新类,它扩展了 `WorkflowEntrypoint`: ```js import { WorkflowEntrypoint } from "cloudflare:workers"; export class RAGWorkflow extends WorkflowEntrypoint { async run(event, step) { await step.do("example step", async () => { console.log("Hello World!"); }); } } ``` 此类将定义一个工作流步骤,该步骤将在控制台中记录“Hello World!”。您可以根据需要向工作流中添加任意数量的步骤。 就其本身而言,此工作流不会执行任何操作。要执行工作流,我们将调用 `RAG_WORKFLOW` 绑定,并传入工作流正常完成所需的任何参数。以下是我们如何调用工作流的示例: ```js env.RAG_WORKFLOW.create({ params: { text } }); ``` ## 6. 创建笔记并将其添加到 Vectorize 为了扩展您的 Workers 函数以处理多个路由,我们将添加 `hono`,这是一个用于 Workers 的路由库。这将允许我们为向数据库中添加笔记创建一个新路由。使用 `npm` 安装 `hono`: 然后,将 `hono` 导入您的 `src/index.js` 文件中。您还应该更新 `fetch` 处理程序以使用 `hono`: ```js import { Hono } from "hono"; const app = new Hono(); app.get("/", async (c) => { const answer = await c.env.AI.run("@cf/meta/llama-3-8b-instruct", { messages: [{ role: "user", content: `9 的平方根是多少?` }], }); return c.json(answer); }); export default app; ``` 这将在根路径 `/` 处建立一个路由,其功能与先前版本的应用程序相同。 现在,我们可以更新工作流以开始将笔记添加到数据库中,并生成它们的相关嵌入。 此示例使用了 [`@cf/baai/bge-base-en-v1.5` 模型](/workers-ai/models/bge-base-en-v1.5/),该模型可用于创建嵌入。嵌入存储在 [Vectorize](/vectorize/) 中,这是 Cloudflare 的向量数据库。用户查询也会转换为嵌入,以便在 Vectorize 中进行搜索。 ```js import { WorkflowEntrypoint } from "cloudflare:workers"; export class RAGWorkflow extends WorkflowEntrypoint { async run(event, step) { const env = this.env; const { text } = event.payload; const record = await step.do(`create database record`, async () => { const query = "INSERT INTO notes (text) VALUES (?) RETURNING *"; const { results } = await env.DB.prepare(query).bind(text).run(); const record = results[0]; if (!record) throw new Error("Failed to create note"); return record; }); const embedding = await step.do(`generate embedding`, async () => { const embeddings = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: text, }); const values = embeddings.data[0]; if (!values) throw new Error("Failed to generate vector embedding"); return values; }); await step.do(`insert vector`, async () => { return env.VECTOR_INDEX.upsert([ { id: record.id.toString(), values: embedding, }, ]); }); } } ``` 工作流执行以下操作: 1. 接受一个 `text` 参数。 2. 在 D1 的 `notes` 表中插入一个新行,并检索新行的 `id`。 3. 使用 LLM 绑定的 `embeddings` 模型将 `text` 转换为向量。 4. 将 `id` 和 `vectors` 上传到 Vectorize 中的 `vector-index` 索引。 通过这样做,您将创建一个新的向量表示形式的笔记,可以用于稍后检索该笔记。 要完成代码,我们将添加一个路由,允许用户向数据库提交笔记。此路由将解析 JSON 请求正文,获取 `note` 参数,并创建一个新的工作流实例,传递参数: ```js app.post("/notes", async (c) => { const { text } = await c.req.json(); if (!text) return c.text("Missing text", 400); await c.env.RAG_WORKFLOW.create({ params: { text } }); return c.text("Created note", 201); }); ``` ## 7. 查询 Vectorize 以检索笔记 要完成您的代码,您可以更新根路径(`/`)以查询 Vectorize。您将把查询转换为向量,然后使用 `vector-index` 索引来查找最相似的向量。 `topK` 参数限制了函数返回的向量数量。例如,提供 `topK` 为 1 将仅返回基于查询的 _最相似_ 向量。将 `topK` 设置为 5 将返回 5 个最相似的向量。 给定一组相似的向量,您可以检索与存储在这些向量旁边的记录 ID 匹配的笔记。在这种情况下,我们只检索一个笔记 - 但您可以根据需要自定义此设置。 您可以将这些笔记的文本插入 LLM 绑定的提示中。这是检索增强生成(RAG)的基础:在 LLM 的提示中提供来自数据外部的附加上下文,以增强 LLM 生成的文本。 我们将更新提示以包含上下文,并要求 LLM 在回应时使用上下文: ```js import { Hono } from "hono"; const app = new Hono(); // Existing post route... // app.post('/notes', async (c) => { ... 
}) app.get("/", async (c) => { const question = c.req.query("text") || "9 的平方根是多少?"; const embeddings = await c.env.AI.run("@cf/baai/bge-base-en-v1.5", { text: question, }); const vectors = embeddings.data[0]; const vectorQuery = await c.env.VECTOR_INDEX.query(vectors, { topK: 1 }); let vecId; if ( vectorQuery.matches && vectorQuery.matches.length > 0 && vectorQuery.matches[0] ) { vecId = vectorQuery.matches[0].id; } else { console.log("No matching vector found or vectorQuery.matches is empty"); } let notes = []; if (vecId) { const query = `SELECT * FROM notes WHERE id = ?`; const { results } = await c.env.DB.prepare(query).bind(vecId).all(); if (results) notes = results.map((vec) => vec.text); } const contextMessage = notes.length ? `Context:\n${notes.map((note) => `- ${note}`).join("\n")}` : ""; const systemPrompt = `When answering the question or responding, use the context provided, if it is provided and relevant.`; const { response: answer } = await c.env.AI.run( "@cf/meta/llama-3-8b-instruct", { messages: [ ...(notes.length ? [{ role: "system", content: contextMessage }] : []), { role: "system", content: systemPrompt }, { role: "user", content: question }, ], }, ); return c.text(answer); }); app.onError((err, c) => { return c.text(err); }); export default app; ``` ## 8. 添加 Anthropic Claude 模型(可选) 如果您正在处理较大的文档,您有选择使用 Anthropic 的 [Claude 模型](https://claude.ai/),这些模型具有大型上下文窗口,非常适合 RAG 工作流。 要开始,安装 `@anthropic-ai/sdk` 包: 在 `src/index.js` 中,您可以更新 `GET /` 路由以检查 `ANTHROPIC_API_KEY` 环境变量。如果设置了该变量,我们可以使用 Anthropic SDK 生成文本。如果没有设置,我们将回退到现有的 Workers AI 代码: ```js import Anthropic from '@anthropic-ai/sdk'; app.get('/', async (c) => { // ... Existing code const systemPrompt = `When answering the question or responding, use the context provided, if it is provided and relevant.` let modelUsed: string = "" let response = null if (c.env.ANTHROPIC_API_KEY) { const anthropic = new Anthropic({ apiKey: c.env.ANTHROPIC_API_KEY }) const model = "claude-3-5-sonnet-latest" modelUsed = model const message = await anthropic.messages.create({ max_tokens: 1024, model, messages: [ { role: 'user', content: question } ], system: [systemPrompt, notes ? contextMessage : ''].join(" ") }) response = { response: message.content.map(content => content.text).join("\n") } } else { const model = "@cf/meta/llama-3.1-8b-instruct" modelUsed = model response = await c.env.AI.run( model, { messages: [ ...(notes.length ? [{ role: 'system', content: contextMessage }] : []), { role: 'system', content: systemPrompt }, { role: 'user', content: question } ] } ) } if (response) { c.header('x-model-used', modelUsed) return c.text(response.response) } else { return c.text("We were unable to generate output", 500) } }) ``` 最后,您需要在 Workers 应用程序中设置 `ANTHROPIC_API_KEY` 环境变量。您可以使用 `wrangler secret put` 来实现: ```sh $ npx wrangler secret put ANTHROPIC_API_KEY ``` ## 9. 删除笔记和向量 如果您不再需要笔记,可以从数据库中删除它。每次删除笔记时,您还需要从 Vectorize 中删除相应的向量。您可以通过在 `src/index.js` 文件中构建 `DELETE /notes/:id` 路由来实现这一点: ```js app.delete("/notes/:id", async (c) => { const { id } = c.req.param(); const query = `DELETE FROM notes WHERE id = ?`; await c.env.DB.prepare(query).bind(id).run(); await c.env.VECTOR_INDEX.deleteByIds([id]); return c.status(204); }); ``` ## 10. 
文本分割(可选) 对于较大的文本块,建议将文本分割成较小的块。这允许 LLM 更有效地收集相关上下文,而无需检索大块文本。 为了实现这一点,我们将向项目中添加一个新的 NPM 包,`@langchain/textsplitters`: 此包提供的 `RecursiveCharacterTextSplitter` 类将文本分割成较小的块。它可以根据需要进行自定义,但默认配置在大多数情况下都有效: ```js import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters"; const text = "Some long piece of text..."; const splitter = new RecursiveCharacterTextSplitter({ // These can be customized to change the chunking size // chunkSize: 1000, // chunkOverlap: 200, }); const output = await splitter.createDocuments([text]); console.log(output); // [{ pageContent: 'Some long piece of text...' }] ``` 要使用此分割器,我们将更新工作流以将文本分割成较小的块。然后,我们将遍历这些块,并为每个文本块运行工作流的其余部分: ```js export class RAGWorkflow extends WorkflowEntrypoint { async run(event, step) { const env = this.env; const { text } = event.payload; let texts = await step.do("split text", async () => { const splitter = new RecursiveCharacterTextSplitter(); const output = await splitter.createDocuments([text]); return output.map((doc) => doc.pageContent); }); console.log( "RecursiveCharacterTextSplitter generated ${texts.length} chunks", ); for (const index in texts) { const text = texts[index]; const record = await step.do( `create database record: ${index}/${texts.length}`, async () => { const query = "INSERT INTO notes (text) VALUES (?) RETURNING *"; const { results } = await env.DB.prepare(query).bind(text).run(); const record = results[0]; if (!record) throw new Error("Failed to create note"); return record; }, ); const embedding = await step.do( `generate embedding: ${index}/${texts.length}`, async () => { const embeddings = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: text, }); const values = embeddings.data[0]; if (!values) throw new Error("Failed to generate vector embedding"); return values; }, ); await step.do(`insert vector: ${index}/${texts.length}`, async () => { return env.VECTOR_INDEX.upsert([ { id: record.id.toString(), values: embedding, }, ]); }); } } } ``` 现在,当向 `/notes` 端点提交大块文本时,它们将被分割成较小的块,并且每个块将由工作流处理。 ## 11. 部署您的项目 如果您在[第 1 步](/workers/get-started/guide/#1-create-a-new-worker-project)中没有部署您的 Worker,请使用 Wrangler 将您的 Worker 部署到 `*.workers.dev` 子域、[自定义域](/workers/configuration/routing/custom-domains/)(如果您已配置),或者如果您没有配置任何子域或域,Wrangler 将在发布过程中提示您设置一个。 ```sh npx wrangler deploy ``` 在 `..workers.dev` 预览您的 Worker。 :::note[注意] 当首次将您的 Worker 推送到 `*.workers.dev` 子域时,您可能会看到 [`523` 错误](/support/troubleshooting/http-status-codes/cloudflare-5xx-errors/error-523/),因为 DNS 正在传播。这些错误应在一分钟左右解决。 ::: ## 相关资源 完整版本的此代码库可在 GitHub 上找到。它包括一个前端 UI 用于查询、添加和删除笔记,以及一个后端 API 用于与数据库和向量索引进行交互。您可以在这里找到它:[github.com/kristianfreeman/cloudflare-retrieval-augmented-generation-example](https://github.com/kristianfreeman/cloudflare-retrieval-augmented-generation-example/)。 要做更多: - 探索 [检索增强生成(RAG)架构](/reference-architecture/diagrams/ai/ai-rag/) 的参考图表。 - 查看 Cloudflare 的 [AI 文档](/workers-ai)。 - 查看 [教程](/workers/tutorials/) 以在 Workers 上构建项目。 - 探索 [示例](/workers/examples/) 以尝试复制和粘贴 Worker 代码。 - 了解 Workers 的工作原理 [参考](/workers/reference/)。 - 了解 Workers 的功能和功能 [平台](/workers/platform/)。 - 设置 [Wrangler](/workers/wrangler/install-and-update/) 以编程方式创建、测试和部署您的 Worker 项目。 --- # 使用 Workers AI 构建带自动转录功能的语音笔记应用 URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/build-a-voice-notes-app-with-auto-transcription/ import { Render, PackageManagers, Tabs, TabItem } from "~/components"; 在本教程中,您将学习如何创建一个带有语音录音自动转录和可选后处理功能的语音笔记应用。构建该应用将使用以下工具: - Workers AI 用于转录语音录音和可选的后处理 - D1 数据库用于存储笔记 - R2 存储用于存储语音录音 - Nuxt 框架用于构建全栈应用 - Workers 用于部署项目 ## 先决条件 要继续,您需要: ## 1. 
创建一个新的 Worker 项目 使用带有 `nuxt` 框架预设的 `c3` CLI 创建一个新的 Worker 项目。 ### 安装附加依赖项 切换到新创建的项目目录 ```sh cd voice-notes ``` 并安装以下依赖项: 然后将 `@nuxt/ui` 模块添加到 `nuxt.config.ts` 文件中: ```ts title="nuxt.config.ts" export default defineNuxtConfig({ //.. modules: ["nitro-cloudflare-dev", "@nuxt/ui"], //.. }); ``` ### [可选] 迁移到 Nuxt 4 兼容模式 迁移到 Nuxt 4 兼容模式可确保您的应用程序与 Nuxt 的未来更新保持向前兼容。 在项目的根目录中创建一个新的 `app` 文件夹,并将 `app.vue` 文件移动到其中。此外,将以下内容添加到您的 `nuxt.config.ts` 文件中: ```ts title="nuxt.config.ts" export default defineNuxtConfig({ //.. future: { compatibilityVersion: 4, }, //.. }); ``` :::note 本教程的其余部分将使用 `app` 文件夹来存放客户端代码。如果您没有进行此更改,您应该继续使用项目的根目录。 ::: ### 启动本地开发服务器 此时,您可以通过启动本地开发服务器来测试您的应用程序: 如果一切设置正确,您应该在 `http://localhost:3000` 上看到一个 Nuxt 欢迎页面。 ## 2. 创建转录 API 端点 此 API 利用 Workers AI 来转录语音录音。要在项目中使用 Workers AI,您首先需要将其绑定到 Worker。 将 `AI` 绑定添加到 Wrangler 文件中。 ```toml title="wrangler.toml" [ai] binding = "AI" ``` 配置 `AI` 绑定后,运行 `cf-typegen` 命令以生成必要的 Cloudflare 类型定义。这使得类型定义在服务器事件上下文中可用。 通过在 `/server/api` 目录中创建 `transcribe.post.ts` 文件来创建一个转录 `POST` 端点。 ```ts title="server/api/transcribe.post.ts" export default defineEventHandler(async (event) => { const { cloudflare } = event.context; const form = await readFormData(event); const blob = form.get("audio") as Blob; if (!blob) { throw createError({ statusCode: 400, message: "缺少要转录的音频 blob", }); } try { const response = await cloudflare.env.AI.run("@cf/openai/whisper", { audio: [...new Uint8Array(await blob.arrayBuffer())], }); return response.text; } catch (err) { console.error("转录音频时出错:", err); throw createError({ statusCode: 500, message: "转录音频失败。请重试。", }); } }); ``` 上述代码执行以下操作: 1. 从事件中提取音频 blob。 2. 使用 `@cf/openai/whisper` 模型转录 blob 并将转录文本作为响应返回。 ## 3. 为将音频录音上传到 R2 创建 API 端点 在将音频录音上传到 `R2` 之前,您需要先创建一个存储桶。您还需要将 R2 绑定添加到您的 Wrangler 文件并重新生成 Cloudflare 类型定义。 创建一个 `R2` 存储桶。 将存储绑定添加到您的 Wrangler 文件中。 ```toml title="wrangler.toml" [[r2_buckets]] binding = "R2" bucket_name = "" ``` 最后,通过重新运行 `cf-typegen` 脚本生成类型定义。 现在您已准备好创建上传端点。在您的 `server/api` 目录中创建一个新的 `upload.put.ts` 文件,并向其添加以下代码: ```ts title="server/api/upload.put.ts" export default defineEventHandler(async (event) => { const { cloudflare } = event.context; const form = await readFormData(event); const files = form.getAll("files") as File[]; if (!files) { throw createError({ statusCode: 400, message: "缺少文件" }); } const uploadKeys: string[] = []; for (const file of files) { const obj = await cloudflare.env.R2.put(`recordings/${file.name}`, file); if (obj) { uploadKeys.push(obj.key); } } return uploadKeys; }); ``` 上述代码执行以下操作: 1. `files` 变量使用 `form.getAll()` 检索客户端发送的所有文件,这允许在单个请求中进行多次上传。 2. 使用您之前创建的绑定 (`R2`) 将文件上传到 R2 存储桶。 :::note `recordings/` 前缀将上传的文件组织到存储桶中的专用文件夹中。这在向客户端提供这些录音时也会派上用场(稍后介绍)。 ::: ## 4. 
创建 API 端点以保存笔记条目 在创建端点之前,您需要执行与 R2 存储桶类似但有一些额外步骤的步骤,以准备一个笔记表。 创建一个 `D1` 数据库。 将 D1 绑定添加到 Wrangler 文件。您可以从 `d1 create` 命令的输出中获取 `DB_ID`。 ```toml title="wrangler.toml" [[d1_databases]] binding = "DB" database_name = "" database_id = "" ``` 和以前一样,重新运行 `cf-typegen` 命令以生成类型。 接下来,创建一个数据库迁移。 "create notes table"`} /> 这将在项目的根目录中创建一个新的 `migrations` 文件夹,并向其中添加一个空的 `0001_create_notes_table.sql` 文件。用下面的代码替换此文件的内容。 ```sql CREATE TABLE IF NOT EXISTS notes ( id INTEGER PRIMARY KEY AUTOINCREMENT, text TEXT NOT NULL, created_at DATETIME DEFAULT CURRENT_TIMESTAMP, updated_at DATETIME DEFAULT CURRENT_TIMESTAMP, audio_urls TEXT ); ``` 然后应用此迁移以创建 `notes` 表。 :::note 上述命令将在本地创建笔记表。要在您的远程生产数据库上应用迁移,请使用 `--remote` 标志。 ::: 现在您可以创建 API 端点。在 `server/api/notes` 目录中创建一个新文件 `index.post.ts`,并将其内容更改为以下内容: ```ts title="server/api/notes/index.post.ts" export default defineEventHandler(async (event) => { const { cloudflare } = event.context; const { text, audioUrls } = await readBody(event); if (!text) { throw createError({ statusCode: 400, message: "Missing note text", }); } try { await cloudflare.env.DB.prepare( "INSERT INTO notes (text, audio_urls) VALUES (?1, ?2)", ) .bind(text, audioUrls ? JSON.stringify(audioUrls) : null) .run(); return setResponseStatus(event, 201); } catch (err) { console.error("Error creating note:", err); throw createError({ statusCode: 500, message: "Failed to create note. Please try again.", }); } }); ``` The above does the following: 1. Extracts the text, and optional audioUrls from the event. 2. Saves it to the database after converting the audioUrls to a `JSON` string. ## 5. Handle note creation on the client-side Now you're ready to work on the client side. Let's start by tackling the note creation part first. ### Recording user audio Create a composable to handle audio recording using the MediaRecorder API. This will be used to record notes through the user's microphone. 
Create a new file `useMediaRecorder.ts` in the `app/composables` folder, and add the following code to it: ```ts title="app/composables/useMediaRecorder.ts" interface MediaRecorderState { isRecording: boolean; recordingDuration: number; audioData: Uint8Array | null; updateTrigger: number; } export function useMediaRecorder() { const state = ref({ isRecording: false, recordingDuration: 0, audioData: null, updateTrigger: 0, }); let mediaRecorder: MediaRecorder | null = null; let audioContext: AudioContext | null = null; let analyser: AnalyserNode | null = null; let animationFrame: number | null = null; let audioChunks: Blob[] | undefined = undefined; const updateAudioData = () => { if (!analyser || !state.value.isRecording || !state.value.audioData) { if (animationFrame) { cancelAnimationFrame(animationFrame); animationFrame = null; } return; } analyser.getByteTimeDomainData(state.value.audioData); state.value.updateTrigger += 1; animationFrame = requestAnimationFrame(updateAudioData); }; const startRecording = async () => { try { const stream = await navigator.mediaDevices.getUserMedia({ audio: true }); audioContext = new AudioContext(); analyser = audioContext.createAnalyser(); const source = audioContext.createMediaStreamSource(stream); source.connect(analyser); mediaRecorder = new MediaRecorder(stream); audioChunks = []; mediaRecorder.ondataavailable = (e: BlobEvent) => { audioChunks?.push(e.data); state.value.recordingDuration += 1; }; state.value.audioData = new Uint8Array(analyser.frequencyBinCount); state.value.isRecording = true; state.value.recordingDuration = 0; state.value.updateTrigger = 0; mediaRecorder.start(1000); updateAudioData(); } catch (err) { console.error("Error accessing microphone:", err); throw err; } }; const stopRecording = async () => { return await new Promise((resolve) => { if (mediaRecorder && state.value.isRecording) { mediaRecorder.onstop = () => { const blob = new Blob(audioChunks, { type: "audio/webm" }); audioChunks = undefined; state.value.recordingDuration = 0; state.value.updateTrigger = 0; state.value.audioData = null; resolve(blob); }; state.value.isRecording = false; mediaRecorder.stop(); mediaRecorder.stream.getTracks().forEach((track) => track.stop()); if (animationFrame) { cancelAnimationFrame(animationFrame); animationFrame = null; } audioContext?.close(); audioContext = null; } }); }; onUnmounted(() => { stopRecording(); }); return { state: readonly(state), startRecording, stopRecording, }; } ``` The above code does the following: 1. Exposes functions to start and stop audio recordings in a Vue application. 2. Captures audio input from the user's microphone using MediaRecorder API. 3. Processes real-time audio data for visualization using AudioContext and AnalyserNode. 4. Stores recording state including duration and recording status. 5. Maintains chunks of audio data and combines them into a final audio blob when recording stops. 6. Updates audio visualization data continuously using animation frames while recording. 7. Automatically cleans up all audio resources when recording stops or component unmounts. 8. Returns audio recordings in webm format for further processing. ### Create a component for note creation This component allows users to create notes by either typing or recording audio. It also handles audio transcription and uploading the recordings to the server. Create a new file named `CreateNote.vue` inside the `app/components` folder. 
Add the following template code to the newly created file: ```vue title="app/components/CreateNote.vue" ``` The above template results in the following: 1. A panel with a `textarea` inside to type the note manually. 2. Another panel to manage start/stop of an audio recording, and show the recordings done already. 3. A bottom panel to reset or save the note (along with the recordings). Now, add the following code below the template code in the same file: ```vue title="app/components/CreateNote.vue" ``` The above code does the following: 1. When a recording is stopped by calling `handleRecordingStop` function, the audio blob is sent for transcribing to the transcribe API endpoint. 2. The transcription response text is appended to the existing textarea content. 3. When the note is saved by calling the `saveNote` function, the audio recordings are uploaded first to R2 by using the upload endpoint we created earlier. Then, the actual note content along with the audioUrls (the R2 object keys) are saved by calling the notes post endpoint. ### Create a new page route for showing the component You can use this component in a Nuxt page to show it to the user. But before that you need to modify your `app.vue` file. Update the content of your `app.vue` to the following: ```vue title="/app/app.vue" ``` The above code allows for a nuxt page to be shown to the user, apart from showing an app header and a navigation sidebar. Next, add a new file named `new.vue` inside the `app/pages` folder, add the following code to it: ```vue title="app/pages/new.vue" ``` The above code shows the `CreateNote` component inside a modal, and navigates back to the home page on successful note creation. ## 6. Showing the notes on the client side To show the notes from the database on the client side, create an API endpoint first that will interact with the database. ### Create an API endpoint to fetch notes from the database Create a new file named `index.get.ts` inside the `server/api/notes` directory, and add the following code to it: ```ts title="server/api/index.get.ts" import type { Note } from "~~/types"; export default defineEventHandler(async (event) => { const { cloudflare } = event.context; const res = await cloudflare.env.DB.prepare( `SELECT id, text, audio_urls AS audioUrls, created_at AS createdAt, updated_at AS updatedAt FROM notes ORDER BY created_at DESC LIMIT 50;`, ).all & { audioUrls: string | null }>(); return res.results.map((note) => ({ ...note, audioUrls: note.audioUrls ? JSON.parse(note.audioUrls) : undefined, })); }); ``` The above code fetches the last 50 notes from the database, ordered by their creation date in descending order. The `audio_urls` field is stored as a string in the database, but it's converted to an array using `JSON.parse` to handle multiple audio files seamlessly on the client side. Next, create a page named `index.vue` inside the `app/pages` directory. This will be the home page of the application. Add the following code to it: ```vue title="app/pages/index.vue" ``` The above code fetches the notes from the database by calling the `/api/notes` endpoint you created just now, and renders them as note cards. ### Serving the saved recordings from R2 To be able to play the audio recordings of these notes, you need to serve the saved recordings from the R2 storage. Create a new file named `[...pathname].get.ts` inside the `server/routes/recordings` directory, and add the following code to it: :::note The `...` prefix in the file name makes it a catch all route. 
This allows it to receive all events that are meant for paths starting with `/recordings` prefix. This is where the `recordings` prefix that was added previously while saving the recordings becomes helpful. ::: ```ts title="server/routes/recordings/[...pathname].get.ts" export default defineEventHandler(async (event) => { const { cloudflare, params } = event.context; const { pathname } = params || {}; return cloudflare.env.R2.get(`recordings/${pathname}`); }); ``` The above code extracts the path name from the event params, and serves the saved recording matching that object key from the R2 bucket. ## 7. [Optional] Post Processing the transcriptions Even though the speech-to-text transcriptions models perform satisfactorily, sometimes you want to post process the transcriptions for various reasons. It could be to remove any discrepancy, or to change the tone/style of the final text. ### Create a settings page Create a new file named `settings.vue` in the `app/pages` folder, and add the following code to it: ```vue title="app/pages/settings.vue" ``` The above code renders a toggle button that enables/disables the post processing of transcriptions. If enabled, users can change the prompt that will used while post processing the transcription with an AI model. The transcription settings are saved using useStorageAsync, which utilizes the browser's local storage. This ensures that users' preferences are retained even after refreshing the page. ### Send the post processing prompt with recorded audio Modify the `CreateNote` component to send the post processing prompt along with the audio blob, while calling the `transcribe` API endpoint. ```vue title="app/components/CreateNote.vue" ins={2, 6-9, 17-22} ``` The code blocks added above checks for the saved post processing setting. If enabled, and there is a defined prompt, it sends the prompt to the `transcribe` API endpoint. ### Handle post processing in the transcribe API endpoint Modify the transcribe API endpoint, and update it to the following: ```ts title="server/api/transcribe.post.ts" ins={9-20, 22} export default defineEventHandler(async (event) => { const { cloudflare } = event.context; const form = await readFormData(event); const blob = form.get("audio") as Blob; if (!blob) { throw createError({ statusCode: 400, message: "缺少要转录的音频 blob", }); } try { const response = await cloudflare.env.AI.run("@cf/openai/whisper", { audio: [...new Uint8Array(await blob.arrayBuffer())], }); const postProcessingPrompt = form.get("prompt") as string; if (postProcessingPrompt && response.text) { const postProcessResult = await cloudflare.env.AI.run( "@cf/meta/llama-3.1-8b-instruct", { temperature: 0.3, prompt: `${postProcessingPrompt}.\n\nText:\n\n${response.text}\n\nResponse:`, }, ); return (postProcessResult as { response?: string }).response; } else { return response.text; } } catch (err) { console.error("转录音频时出错:", err); throw createError({ statusCode: 500, message: "转录音频失败。请重试。", }); } }); ``` The above code does the following: 1. Extracts the post processing prompt from the event FormData. 2. If present, it calls the Workers AI API to process the transcription text using the `@cf/meta/llama-3.1-8b-instruct` model. 3. Finally, it returns the response from Workers AI to the client. ## 8. Deploy the application Now you are ready to deploy the project to a `.workers.dev` sub-domain by running the deploy command. You can preview your application at `..workers.dev`. 
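The exact command depends on how the project was scaffolded. As a rough sketch, assuming the template provides the usual Nitro Cloudflare build output and a Wrangler configuration (your project may wrap both steps in a single `npm run deploy` script), the deploy step looks something like this:

```sh
# Build the Nuxt app, then publish the output with Wrangler
npm run build
npx wrangler deploy
```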
:::note
If you used `pnpm` as your package manager, you may face build errors like `"stdin" is not exported by "node_modules/.pnpm/unenv@1.10.0/node_modules/unenv/runtime/node/process/index.mjs"`. To resolve it, you can try hoisting your node modules with the [`shamefully-hoist=true`](https://pnpm.io/npmrc) option.
:::

## Conclusion

In this tutorial, you have gone through the steps of building a voice notes application using Nuxt 3, Cloudflare Workers, D1, and R2 storage. You learned to:

- Set up the backend to store and manage notes
- Create API endpoints to fetch and display notes
- Handle audio recordings
- Implement optional post-processing for transcriptions
- Deploy the application using the Cloudflare module syntax

The complete source code of the project is available on GitHub. You can go through it to see the code for the various frontend components not covered in this article. You can find it here: [github.com/ra-jeev/vnotes](https://github.com/ra-jeev/vnotes).

---

# 使用 Cloudflare Workers AI 的 Whisper-large-v3-turbo

URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/build-a-workers-ai-whisper-with-chunking/

在本教程中,您将学习如何:

- **转录大型音频文件:** 使用 Cloudflare Workers AI 的 [Whisper-large-v3-turbo](/workers-ai/models/whisper-large-v3-turbo/) 模型执行自动语音识别(ASR)或翻译。
- **处理大型文件:** 将大型音频文件分割成更小的块进行处理,这有助于克服内存和执行时间的限制。
- **使用 Cloudflare Workers 进行部署:** 在无服务器环境中创建可扩展、低延迟的转录管道。

## 1. 创建一个新的 Cloudflare Worker 项目

import { Render, PackageManagers, WranglerConfig } from "~/components";

您将使用 `create-cloudflare` CLI (C3) 创建一个新的 Worker 项目。[C3](https://github.com/cloudflare/workers-sdk/tree/main/packages/create-cloudflare) 是一个命令行工具,旨在帮助您设置和部署新的应用程序到 Cloudflare。

通过运行以下命令创建一个名为 `whisper-tutorial` 的新项目:

运行 `npm create cloudflare@latest` 将提示您安装 [`create-cloudflare` 包](https://www.npmjs.com/package/create-cloudflare),并引导您完成设置。C3 还将安装 [Wrangler](/workers/wrangler/),即 Cloudflare 开发者平台 CLI。

这将创建一个新的 `whisper-tutorial` 目录。您的新 `whisper-tutorial` 目录将包括:

- `src/index.ts` 中的一个 `"Hello World"` [Worker](/workers/get-started/guide/#3-write-code)。
- 一个 [`wrangler.jsonc`](/workers/wrangler/configuration/) 配置文件。

转到您的应用程序目录:

```sh
cd whisper-tutorial
```

## 2. 将您的 Worker 连接到 Workers AI

您必须为您的 Worker 创建一个 AI 绑定以连接到 Workers AI。[绑定](/workers/runtime-apis/bindings/)允许您的 Workers 与 Cloudflare 开发者平台上的资源(如 Workers AI)进行交互。

要将 Workers AI 绑定到您的 Worker,请将以下内容添加到 `wrangler.toml` 文件的末尾:

```toml
[ai]
binding = "AI"
```

您的绑定在您的 Worker 代码中的 [`env.AI`](/workers/runtime-apis/handlers/fetch/) 上[可用](/workers/reference/migrate-to-module-workers/#bindings-in-es-modules-format)。

## 3. 配置 Wrangler

在您的 wrangler 文件中,添加或更新以下设置以启用 Node.js API 和 polyfill(兼容性日期为 2024-09-23 或更晚):

```toml title="wrangler.toml"
compatibility_flags = [ "nodejs_compat" ]
compatibility_date = "2024-09-23"
```

## 4. 使用分块处理大型音频文件
将 `src/index.ts` 文件的内容替换为以下集成代码。此示例演示了如何:

(1) 从查询参数中提取音频文件 URL。
(2) 在明确遵循重定向的情况下获取音频文件。
(3) 将音频文件分割成更小的块(例如 1 MB 的块)。
(4) 通过 Cloudflare AI 绑定使用 Whisper-large-v3-turbo 模型转录每个块。
(5) 以纯文本形式返回聚合的转录。

```ts
import { Buffer } from "node:buffer";
import type { Ai } from "workers-ai";

export interface Env {
  AI: Ai;
  // 如果需要,添加您的 KV 命名空间以存储转录。
  // MY_KV_NAMESPACE: KVNamespace;
}

/**
 * 从提供的 URL 获取音频文件并将其分割成块。
 * 此函数明确遵循重定向。
 *
 * @param audioUrl - 音频文件的 URL。
 * @returns 一个 ArrayBuffer 数组,每个代表一个音频块。
 */
async function getAudioChunks(audioUrl: string): Promise<ArrayBuffer[]> {
  const response = await fetch(audioUrl, { redirect: "follow" });
  if (!response.ok) {
    throw new Error(`获取音频失败:${response.status}`);
  }
  const arrayBuffer = await response.arrayBuffer();

  // 示例:将音频分割成 1MB 的块。
  const chunkSize = 1024 * 1024; // 1MB
  const chunks: ArrayBuffer[] = [];
  for (let i = 0; i < arrayBuffer.byteLength; i += chunkSize) {
    const chunk = arrayBuffer.slice(i, i + chunkSize);
    chunks.push(chunk);
  }
  return chunks;
}

/**
 * 使用 Whisper-large-v3-turbo 模型转录单个音频块。
 * 该函数将音频块转换为 Base64 编码的字符串,并
 * 通过 AI 绑定将其发送到模型。
 *
 * @param chunkBuffer - 作为 ArrayBuffer 的音频块。
 * @param env - Cloudflare Worker 环境,包括 AI 绑定。
 * @returns 来自模型的转录文本。
 */
async function transcribeChunk(
  chunkBuffer: ArrayBuffer,
  env: Env,
): Promise<string> {
  const base64 = Buffer.from(chunkBuffer, "binary").toString("base64");
  const res = await env.AI.run("@cf/openai/whisper-large-v3-turbo", {
    audio: base64,
    // 可选参数(如果需要,取消注释并设置):
    // task: "transcribe",   // 或 "translate"
    // language: "en",
    // vad_filter: "false",
    // initial_prompt: "如果需要,提供上下文。",
    // prefix: "转录:",
  });

  return res.text; // 假设转录结果包括一个 "text" 属性。
}

/**
 * 主 fetch 处理程序。它提取 'url' 查询参数,获取音频,
 * 以块为单位处理它,并返回完整的转录。
 */
export default {
  async fetch(
    request: Request,
    env: Env,
    ctx: ExecutionContext,
  ): Promise<Response> {
    // 从查询参数中提取音频 URL。
    const { searchParams } = new URL(request.url);
    const audioUrl = searchParams.get("url");

    if (!audioUrl) {
      return new Response("缺少 'url' 查询参数", { status: 400 });
    }

    // 获取音频块。
    const audioChunks: ArrayBuffer[] = await getAudioChunks(audioUrl);
    let fullTranscript = "";

    // 处理每个块并构建完整的转录。
    for (const chunk of audioChunks) {
      try {
        const transcript = await transcribeChunk(chunk, env);
        fullTranscript += transcript + "\n";
      } catch (error) {
        fullTranscript += "[转录块时出错]\n";
      }
    }

    return new Response(fullTranscript, {
      headers: { "Content-Type": "text/plain" },
    });
  },
} satisfies ExportedHandler<Env>;
```

## 5. 部署您的 Worker

1. **在本地运行 Worker:** 使用 wrangler 的开发模式在本地测试您的 Worker:

```sh
npx wrangler dev
```

打开您的浏览器并转到 [http://localhost:8787](http://localhost:8787),或使用 curl:

```sh
curl "http://localhost:8787?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3"
```

将 URL 查询参数替换为您的音频文件的直接链接。(对于 GitHub 托管的文件,请确保使用原始文件 URL。)

2. **部署 Worker:** 测试完成后,使用以下命令部署您的 Worker:

```sh
npx wrangler deploy
```

3. **测试已部署的 Worker:** 部署后,通过将音频 URL 作为查询参数传递来测试您的 Worker:

```sh
curl "https://.workers.dev?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3"
```

确保将 ``、`your-username`、`your-repo` 和 `your-audio-file.mp3` 替换为您的实际详细信息。

如果成功,Worker 将返回音频文件的转录:

```sh
这是音频的转录...
``` --- # 使用 Workers AI 构建面试练习工具 URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/build-ai-interview-practice-tool/ import { Render, PackageManagers } from "~/components"; 求职面试可能会让人感到压力,而练习是建立信心的关键。虽然与朋友或导师进行的传统模拟面试很有价值,但并非总能在需要时获得。在本教程中,您将学习如何构建一个由 AI 驱动的面试练习工具,该工具可提供实时反馈以帮助提高面试技巧。 在本教程结束时,您将构建一个完整的面试练习工具,具有以下核心功能: - 使用 WebSocket 连接的实时面试模拟工具 - 将音频转换为文本的 AI 驱动的语音处理管道 - 提供类似面试官互动的智能响应系统 - 使用 Durable Objects 管理面试会话和历史记录的持久存储系统 ### 先决条件 本教程演示了如何使用多个 Cloudflare 产品,虽然许多功能在免费套餐中可用,但 Workers AI 的某些组件可能会产生基于使用量的费用。在继续之前,请查看 Workers AI 的定价文档。 ## 1. 创建一个新的 Worker 项目 使用 Create Cloudflare CLI (C3) 工具和 Hono 框架创建一个 Cloudflare Workers 项目。 :::note [Hono](https://hono.dev) 是一个轻量级的 Web 框架,有助于构建 API 端点和处理 HTTP 请求。本教程使用 Hono 来创建和管理应用程序的路由和中间件组件。 ::: 通过运行以下命令创建一个新的 Worker 项目,使用 `ai-interview-tool` 作为 Worker 名称: 要在本地开发和测试您的 Cloudflare Workers 应用程序: 1. 在您的终端中导航到您的 Workers 项目目录: ```sh cd ai-interview-tool ``` 2. 通过运行以下命令启动开发服务器: ```sh npx wrangler dev ``` 当您运行 `wrangler dev` 时,该命令会启动一个本地开发服务器,并提供一个 `localhost` URL,您可以在其中预览您的应用程序。 您现在可以对代码进行更改,并在提供的 localhost 地址上实时查看它们。 ## 2. 为面试系统定义 TypeScript 类型 项目设置好后,创建将构成面试系统基础的 TypeScript 类型。这些类型将帮助您维护类型安全,并为应用程序的不同组件提供清晰的接口。 创建一个新的 `types.ts` 文件,其中将包含以下内容的基本类型和枚举: - 可以评估的面试技能(JavaScript、React 等) - 不同的面试职位(初级开发人员、高级开发人员等) - 面试状态跟踪 - 用户与 AI 之间的消息处理 - 核心面试数据结构 ```typescript title="src/types.ts" import { Context } from "hono"; // API 端点的上下文类型,包括环境绑定和用户信息 export interface ApiContext { Bindings: CloudflareBindings; Variables: { username: string; }; } export type HonoCtx = Context; // 您可以在模拟面试期间评估的技术技能列表。 // 此应用程序侧重于在真实面试中通常测试的流行 Web 技术和编程语言。 export enum InterviewSkill { JavaScript = "JavaScript", TypeScript = "TypeScript", React = "React", NodeJS = "NodeJS", Python = "Python", } // 基于不同工程职位的可用面试类型。 // 这有助于根据候选人的目标职位定制面试体验和问题。 export enum InterviewTitle { JuniorDeveloper = "初级开发人员面试", SeniorDeveloper = "高级开发人员面试", FullStackDeveloper = "全栈开发人员面试", FrontendDeveloper = "前端开发人员面试", BackendDeveloper = "后端开发人员面试", SystemArchitect = "系统架构师面试", TechnicalLead = "技术主管面试", } // 跟踪面试会话的当前状态。 // 这将帮助您管理面试流程,并在流程的每个阶段显示适当的 UI/操作。 export enum InterviewStatus { Created = "created", // 面试已创建但未开始 Pending = "pending", // 等待面试官/系统 InProgress = "in_progress", // 进行中的面试会话 Completed = "completed", // 面试成功完成 Cancelled = "cancelled", // 面试提前终止 } // 定义在面试聊天中发送消息的人 export type MessageRole = "user" | "assistant" | "system"; // 面试期间交换的单个消息的结构 export interface Message { messageId: string; // 消息的唯一标识符 interviewId: string; // 将消息链接到特定面试 role: MessageRole; // 谁发送了消息 content: string; // 实际消息内容 timestamp: number; // 消息发送时间 } // 保存有关面试会话的所有信息的主要数据结构。 // 这包括元数据、交换的消息和当前状态。 export interface InterviewData { interviewId: string; title: InterviewTitle; skills: InterviewSkill[]; messages: Message[]; status: InterviewStatus; createdAt: number; updatedAt: number; } // 创建新面试会话的输入格式。 // 简化接口,接受开始面试所需的基本参数。 export interface InterviewInput { title: string; skills: string[]; } ``` ## 3. 
为不同服务配置错误类型 接下来,设置自定义错误类型以处理应用程序中可能发生的不同类型的错误。这包括: - 数据库错误(例如,连接问题、查询失败) - 与面试相关的错误(例如,无效输入、转录失败) - 身份验证错误(例如,无效会话) 创建以下 `errors.ts` 文件: ```typescript title="src/errors.ts" export const ErrorCodes = { INVALID_MESSAGE: "INVALID_MESSAGE", TRANSCRIPTION_FAILED: "TRANSCRIPTION_FAILED", LLM_FAILED: "LLM_FAILED", DATABASE_ERROR: "DATABASE_ERROR", } as const; export class AppError extends Error { constructor( message: string, public statusCode: number, ) { super(message); this.name = this.constructor.name; } } export class UnauthorizedError extends AppError { constructor(message: string) { super(message, 401); } } export class BadRequestError extends AppError { constructor(message: string) { super(message, 400); } } export class NotFoundError extends AppError { constructor(message: string) { super(message, 404); } } export class InterviewError extends Error { constructor( message: string, public code: string, public statusCode: number = 500, ) { super(message); this.name = "InterviewError"; } } ``` ## 4. 配置身份验证中间件和用户路由 在此步骤中,您将实现一个基本的身份验证系统,以跟踪和识别与您的 AI 面试练习工具交互的用户。该系统使用仅 HTTP 的 cookie 来存储用户名,使您能够识别请求发送者及其相应的 Durable Object。这种直接的身份验证方法要求用户提供一个用户名,然后将其安全地存储在 cookie 中。这种方法使您能够: - 跨请求识别用户 - 将面试会话与特定用户关联 - 保护对与面试相关的端点的访问 ### 创建身份验证中间件 创建一个中间件函数,用于检查是否存在有效的身份验证 cookie。此中间件将用于保护需要身份验证的路由。 创建一个新的中间件文件 `middleware/auth.ts`: ```typescript title="src/middleware/auth.ts" import { Context } from "hono"; import { getCookie } from "hono/cookie"; import { UnauthorizedError } from "../errors"; export const requireAuth = async (ctx: Context, next: () => Promise) => { // Get username from cookie const username = getCookie(ctx, "username"); if (!username) { throw new UnauthorizedError("User is not logged in"); } // Make username available to route handlers ctx.set("username", username); await next(); }; ``` This middleware: - Checks for a `username` cookie - Throws an `Error` if the cookie is missing - Makes the username available to downstream handlers via the context ### Create Authentication Routes Next, create the authentication routes that will handle user login. Create a new file `routes/auth.ts`: ```typescript title="src/routes/auth.ts" import { Context, Hono } from "hono"; import { setCookie } from "hono/cookie"; import { BadRequestError } from "../errors"; import { ApiContext } from "../types"; export const authenticateUser = async (ctx: Context) => { // Extract username from request body const { username } = await ctx.req.json(); // Make sure username was provided if (!username) { throw new BadRequestError("Username is required"); } // Create a secure cookie to track the user's session // This cookie will: // - Be HTTP-only for security (no JS access) // - Work across all routes via path="/" // - Last for 24 hours // - Only be sent in same-site requests to prevent CSRF setCookie(ctx, "username", username, { httpOnly: true, path: "/", maxAge: 60 * 60 * 24, sameSite: "Strict", }); // Let the client know login was successful return ctx.json({ success: true }); }; // Set up authentication-related routes export const configureAuthRoutes = () => { const router = new Hono(); // POST /login - Authenticate user and create session router.post("/login", authenticateUser); return router; }; ``` Finally, update main application file to include the authentication routes. 
Modify `src/index.ts`: ```typescript title="src/index.ts" import { configureAuthRoutes } from "./routes/auth"; import { Hono } from "hono"; import { logger } from "hono/logger"; import type { ApiContext } from "./types"; import { requireAuth } from "./middleware/auth"; // Create our main Hono app instance with proper typing const app = new Hono(); // Create a separate router for API endpoints to keep things organized const api = new Hono(); // Set up global middleware that runs on every request // - Logger gives us visibility into what is happening app.use("*", logger()); // Wire up all our authentication routes (login, etc) // These will be mounted under /api/v1/auth/ api.route("/auth", configureAuthRoutes()); // Mount all API routes under the version prefix (for example, /api/v1) // This allows us to make breaking changes in v2 without affecting v1 users app.route("/api/v1", api); export default app; ``` Now we have a basic authentication system that: 1. Provides a login endpoint at `/api/v1/auth/login` 2. Securely stores the username in a cookie 3. Includes middleware to protect authenticated routes ## 5. Create a Durable Object to manage interviews Now that you have your authentication system in place, create a Durable Object to manage interview sessions. Durable Objects are perfect for this interview practice tool because they provide the following functionalities: - Maintains states between connections, so users can reconnect without losing progress. - Provides a SQLite database to store all interview Q&A, feedback and metrics. - Enables smooth real-time interactions between the interviewer AI and candidate. - Handles multiple interview sessions efficiently without performance issues. - Creates a dedicated instance for each user, giving them their own isolated environment. First, you will need to configure the Durable Object in Wrangler file. Add the following configuration: ```toml title="wrangler.toml" [[durable_objects.bindings]] name = "INTERVIEW" class_name = "Interview" [[migrations]] tag = "v1" new_sqlite_classes = ["Interview"] ``` Next, create a new file `interview.ts` to define our Interview Durable Object: ```typescript title="src/interview.ts" import { DurableObject } from "cloudflare:workers"; export class Interview extends DurableObject { // We will use it to keep track of all active WebSocket connections for real-time communication private sessions: Map; constructor(state: DurableObjectState, env: CloudflareBindings) { super(state, env); // Initialize empty sessions map - we will add WebSocket connections as users join this.sessions = new Map(); } // Entry point for all HTTP requests to this Durable Object // This will handle both initial setup and WebSocket upgrades async fetch(request: Request) { // For now, just confirm the object is working // We'll add WebSocket upgrade logic and request routing later return new Response("Interview object initialized"); } // Broadcasts a message to all connected WebSocket clients. private broadcast(message: string) { this.ctx.getWebSockets().forEach((ws) => { try { if (ws.readyState === WebSocket.OPEN) { ws.send(message); } } catch (error) { console.error( "Error broadcasting message to a WebSocket client:", error, ); } }); } } ``` Now we need to export the Durable Object in our main `src/index.ts` file: ```typescript title="src/index.ts" import { Interview } from "./interview"; // ... previous code ... 
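// Exporting the Durable Object class from the Worker entry module is required:
// the runtime looks it up by the class_name declared in the Wrangler file, so the
// INTERVIEW binding and the new_sqlite_classes migration can attach to it.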
export { Interview }; export default app; ``` Since the Worker code is written in TypeScript, you should run the following command to add the necessary type definitions: ```sh npm run cf-typegen ``` ### Set up SQLite database schema to store interview data Now you will use SQLite at the Durable Object level for data persistence. This gives each user their own isolated database instance. You will need two main tables: - `interviews`: Stores interview session data - `messages`: Stores all messages exchanged during interviews Before you create these tables, create a service class to handle your database operations. This encapsulates database logic and helps you: - Manage database schema changes - Handle errors consistently - Keep database queries organized Create a new file called `services/InterviewDatabaseService.ts`: ```typescript title="src/services/InterviewDatabaseService.ts" import { InterviewData, Message, InterviewStatus, InterviewTitle, InterviewSkill, } from "../types"; import { InterviewError, ErrorCodes } from "../errors"; const CONFIG = { database: { tables: { interviews: "interviews", messages: "messages", }, indexes: { messagesByInterview: "idx_messages_interviewId", }, }, } as const; export class InterviewDatabaseService { constructor(private sql: SqlStorage) {} /** * Sets up the database schema by creating tables and indexes if they do not exist. * This is called when initializing a new Durable Object instance to ensure * we have the required database structure. * * The schema consists of: * - interviews table: Stores interview metadata like title, skills, and status * - messages table: Stores the conversation history between user and AI * - messages index: Helps optimize queries when fetching messages for a specific interview */ createTables() { try { // Get list of existing tables to avoid recreating them const cursor = this.sql.exec(`PRAGMA table_list`); const existingTables = new Set([...cursor].map((table) => table.name)); // The interviews table is our main table storing interview sessions. // We only create it if it does not exist yet. if (!existingTables.has(CONFIG.database.tables.interviews)) { this.sql.exec(InterviewDatabaseService.QUERIES.CREATE_INTERVIEWS_TABLE); } // The messages table stores the actual conversation history. // It references interviews table via foreign key for data integrity. if (!existingTables.has(CONFIG.database.tables.messages)) { this.sql.exec(InterviewDatabaseService.QUERIES.CREATE_MESSAGES_TABLE); } // Add an index on interviewId to speed up message retrieval. // This is important since we will frequently query messages by interview. this.sql.exec(InterviewDatabaseService.QUERIES.CREATE_MESSAGE_INDEX); } catch (error: unknown) { const message = error instanceof Error ? 
error.message : String(error); throw new InterviewError( `Failed to initialize database: ${message}`, ErrorCodes.DATABASE_ERROR, ); } } private static readonly QUERIES = { CREATE_INTERVIEWS_TABLE: ` CREATE TABLE IF NOT EXISTS interviews ( interviewId TEXT PRIMARY KEY, title TEXT NOT NULL, skills TEXT NOT NULL, createdAt INTEGER NOT NULL DEFAULT (strftime('%s','now') * 1000), updatedAt INTEGER NOT NULL DEFAULT (strftime('%s','now') * 1000), status TEXT NOT NULL DEFAULT 'pending' ) `, CREATE_MESSAGES_TABLE: ` CREATE TABLE IF NOT EXISTS messages ( messageId TEXT PRIMARY KEY, interviewId TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL, timestamp INTEGER NOT NULL, FOREIGN KEY (interviewId) REFERENCES interviews(interviewId) ) `, CREATE_MESSAGE_INDEX: ` CREATE INDEX IF NOT EXISTS idx_messages_interview ON messages(interviewId) `, }; } ``` Update the `Interview` Durable Object to use the database service by modifying `src/interview.ts`: ```typescript title="src/interview.ts" import { InterviewDatabaseService } from "./services/InterviewDatabaseService"; export class Interview extends DurableObject { // Database service for persistent storage of interview data and messages private readonly db: InterviewDatabaseService; private sessions: Map; constructor(state: DurableObjectState, env: CloudflareBindings) { // ... previous code ... // Set up our database connection using the DO's built-in SQLite instance this.db = new InterviewDatabaseService(state.storage.sql); // First-time setup: ensure our database tables exist // This is idempotent so safe to call on every instantiation this.db.createTables(); } } ``` Add methods to create and retrieve interviews in `services/InterviewDatabaseService.ts`: ```typescript title="src/services/InterviewDatabaseService.ts" export class InterviewDatabaseService { /** * Creates a new interview session in the database. * * This is the main entry point for starting a new interview. It handles all the * initial setup like: * - Generating a unique ID using crypto.randomUUID() for reliable uniqueness * - Recording the interview title and required skills * - Setting up timestamps for tracking interview lifecycle * - Setting the initial status to "Created" * */ createInterview(title: InterviewTitle, skills: InterviewSkill[]): string { try { const interviewId = crypto.randomUUID(); const currentTime = Date.now(); this.sql.exec( InterviewDatabaseService.QUERIES.INSERT_INTERVIEW, interviewId, title, JSON.stringify(skills), // Store skills as JSON for flexibility InterviewStatus.Created, currentTime, currentTime, ); return interviewId; } catch (error: unknown) { const message = error instanceof Error ? error.message : String(error); throw new InterviewError( `Failed to create interview: ${message}`, ErrorCodes.DATABASE_ERROR, ); } } /** * Fetches all interviews from the database, ordered by creation date. * * This is useful for displaying interview history and letting users * resume previous sessions. We order by descending creation date since * users typically want to see their most recent interviews first. * * Returns an array of InterviewData objects with full interview details * including metadata and message history. */ getAllInterviews(): InterviewData[] { try { const cursor = this.sql.exec( InterviewDatabaseService.QUERIES.GET_ALL_INTERVIEWS, ); return [...cursor].map(this.parseInterviewRecord); } catch (error) { const message = error instanceof Error ? 
error.message : String(error); throw new InterviewError( `Failed to retrieve interviews: ${message}`, ErrorCodes.DATABASE_ERROR, ); } } // Retrieves an interview and its messages by ID getInterview(interviewId: string): InterviewData | null { try { const cursor = this.sql.exec( InterviewDatabaseService.QUERIES.GET_INTERVIEW, interviewId, ); const record = [...cursor][0]; if (!record) return null; return this.parseInterviewRecord(record); } catch (error: unknown) { const message = error instanceof Error ? error.message : String(error); throw new InterviewError( `Failed to retrieve interview: ${message}`, ErrorCodes.DATABASE_ERROR, ); } } addMessage( interviewId: string, role: Message["role"], content: string, messageId: string, ): Message { try { const timestamp = Date.now(); this.sql.exec( InterviewDatabaseService.QUERIES.INSERT_MESSAGE, messageId, interviewId, role, content, timestamp, ); return { messageId, interviewId, role, content, timestamp, }; } catch (error: unknown) { const message = error instanceof Error ? error.message : String(error); throw new InterviewError( `Failed to add message: ${message}`, ErrorCodes.DATABASE_ERROR, ); } } /** * Transforms raw database records into structured InterviewData objects. * * This helper does the heavy lifting of: * - Type checking critical fields to catch database corruption early * - Converting stored JSON strings back into proper objects * - Filtering out any null messages that might have snuck in * - Ensuring timestamps are proper numbers * * If any required data is missing or malformed, it throws an error * rather than returning partially valid data that could cause issues * downstream. */ private parseInterviewRecord(record: any): InterviewData { const interviewId = record.interviewId as string; const createdAt = Number(record.createdAt); const updatedAt = Number(record.updatedAt); if (!interviewId || !createdAt || !updatedAt) { throw new InterviewError( "Invalid interview data in database", ErrorCodes.DATABASE_ERROR, ); } return { interviewId, title: record.title as InterviewTitle, skills: JSON.parse(record.skills as string) as InterviewSkill[], messages: record.messages ? JSON.parse(record.messages) .filter((m: any) => m !== null) .map((m: any) => ({ messageId: m.messageId, role: m.role, content: m.content, timestamp: m.timestamp, })) : [], status: record.status as InterviewStatus, createdAt, updatedAt, }; } // Add these SQL queries to the QUERIES object private static readonly QUERIES = { // ... previous queries ... INSERT_INTERVIEW: ` INSERT INTO ${CONFIG.database.tables.interviews} (interviewId, title, skills, status, createdAt, updatedAt) VALUES (?, ?, ?, ?, ?, ?) `, GET_ALL_INTERVIEWS: ` SELECT interviewId, title, skills, createdAt, updatedAt, status FROM ${CONFIG.database.tables.interviews} ORDER BY createdAt DESC `, INSERT_MESSAGE: ` INSERT INTO ${CONFIG.database.tables.messages} (messageId, interviewId, role, content, timestamp) VALUES (?, ?, ?, ?, ?) `, GET_INTERVIEW: ` SELECT i.interviewId, i.title, i.skills, i.status, i.createdAt, i.updatedAt, COALESCE( json_group_array( CASE WHEN m.messageId IS NOT NULL THEN json_object( 'messageId', m.messageId, 'role', m.role, 'content', m.content, 'timestamp', m.timestamp ) END ), '[]' ) as messages FROM ${CONFIG.database.tables.interviews} i LEFT JOIN ${CONFIG.database.tables.messages} m ON i.interviewId = m.interviewId WHERE i.interviewId = ? GROUP BY i.interviewId `, }; } ``` Add RPC methods to the `Interview` Durable Object to expose database operations through API. 
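These methods are exposed over the Durable Object's RPC interface, which is what the route handlers in the next section rely on: the stub returned by `env.INTERVIEW.get()` lets the Worker call the class's public methods directly. A minimal sketch (the `username` variable is an assumption standing in for the authenticated user):

```ts
// Illustrative only: with RPC-enabled Durable Objects, the stub exposes the
// class's public methods, so no custom fetch protocol is needed for these calls.
const id = env.INTERVIEW.idFromName(username);
const interviewDO = env.INTERVIEW.get(id);
const interviews = await interviewDO.getAllInterviews();
```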
Add this code to `src/interview.ts`: ```typescript title="src/interview.ts" import { InterviewData, InterviewTitle, InterviewSkill, Message, } from "./types"; export class Interview extends DurableObject { // Creates a new interview session createInterview(title: InterviewTitle, skills: InterviewSkill[]): string { return this.db.createInterview(title, skills); } // Retrieves all interview sessions getAllInterviews(): InterviewData[] { return this.db.getAllInterviews(); } // Adds a new message to the 'messages' table and broadcasts it to all connected WebSocket clients. addMessage( interviewId: string, role: "user" | "assistant", content: string, messageId: string, ): Message { const newMessage = this.db.addMessage( interviewId, role, content, messageId, ); this.broadcast( JSON.stringify({ ...newMessage, type: "message", }), ); return newMessage; } } ``` ## 6. Create REST API endpoints With your Durable Object and database service ready, create REST API endpoints to manage interviews. You will need endpoints to: - Create new interviews - Retrieve all interviews for a user Create a new file for your interview routes at `routes/interview.ts`: ```typescript title="src/routes/interview.ts" import { Hono } from "hono"; import { BadRequestError } from "../errors"; import { InterviewInput, ApiContext, HonoCtx, InterviewTitle, InterviewSkill, } from "../types"; import { requireAuth } from "../middleware/auth"; /** * Gets the Interview Durable Object instance for a given user. * We use the username as a stable identifier to ensure each user * gets their own dedicated DO instance that persists across requests. */ const getInterviewDO = (ctx: HonoCtx) => { const username = ctx.get("username"); const id = ctx.env.INTERVIEW.idFromName(username); return ctx.env.INTERVIEW.get(id); }; /** * Validates the interview creation payload. * Makes sure we have all required fields in the correct format: * - title must be present * - skills must be a non-empty array * Throws an error if validation fails. */ const validateInterviewInput = (input: InterviewInput) => { if ( !input.title || !input.skills || !Array.isArray(input.skills) || input.skills.length === 0 ) { throw new BadRequestError("Invalid input"); } }; /** * GET /interviews * Retrieves all interviews for the authenticated user. * The interviews are stored and managed by the user's DO instance. */ const getAllInterviews = async (ctx: HonoCtx) => { const interviewDO = getInterviewDO(ctx); const interviews = await interviewDO.getAllInterviews(); return ctx.json(interviews); }; /** * POST /interviews * Creates a new interview session with the specified title and skills. * Each interview gets a unique ID that can be used to reference it later. * Returns the newly created interview ID on success. */ const createInterview = async (ctx: HonoCtx) => { const body = await ctx.req.json(); validateInterviewInput(body); const interviewDO = getInterviewDO(ctx); const interviewId = await interviewDO.createInterview( body.title as InterviewTitle, body.skills as InterviewSkill[], ); return ctx.json({ success: true, interviewId }); }; /** * Sets up all interview-related routes. 
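 * All of these routes are registered behind the requireAuth middleware below,
 * so requests must include the username cookie set by the login endpoint.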
* Currently supports: * - GET / : List all interviews * - POST / : Create a new interview */ export const configureInterviewRoutes = () => { const router = new Hono(); router.use("*", requireAuth); router.get("/", getAllInterviews); router.post("/", createInterview); return router; }; ``` The `getInterviewDO` helper function uses the username from our authentication cookie to create a unique Durable Object ID. This ensures each user has their own isolated interview state. Update your main application file to include the routes and protect them with authentication middleware. Update `src/index.ts`: ```typescript title="src/index.ts" import { configureAuthRoutes } from "./routes/auth"; import { configureInterviewRoutes } from "./routes/interview"; import { Hono } from "hono"; import { Interview } from "./interview"; import { logger } from "hono/logger"; import type { ApiContext } from "./types"; const app = new Hono(); const api = new Hono(); app.use("*", logger()); api.route("/auth", configureAuthRoutes()); api.route("/interviews", configureInterviewRoutes()); app.route("/api/v1", api); export { Interview }; export default app; ``` Now you have two new API endpoints: - `POST /api/v1/interviews`: Creates a new interview session - `GET /api/v1/interviews`: Retrieves all interviews for the authenticated user You can test these endpoints running the following command: 1. Create a new interview: ```sh curl -X POST http://localhost:8787/api/v1/interviews \ -H "Content-Type: application/json" \ -H "Cookie: username=testuser; HttpOnly" \ -d '{"title":"Frontend Developer Interview","skills":["JavaScript","React","CSS"]}' ``` 2. Get all interviews: ```sh curl http://localhost:8787/api/v1/interviews \ -H "Cookie: username=testuser; HttpOnly" ``` ## 7. Set up WebSockets to handle real-time communication With the basic interview management system in place, you will now implement Durable Objects to handle real-time message processing and maintain WebSocket connections. Update the `Interview` Durable Object to handle WebSocket connections by adding the following code to `src/interview.ts`: ```typescript title="src/interview.ts" export class Interview extends DurableObject { // Services for database operations and managing WebSocket sessions private readonly db: InterviewDatabaseService; private sessions: Map; constructor(state: DurableObjectState, env: CloudflareBindings) { // ... previous code ... 
// Keep WebSocket connections alive by automatically responding to pings // This prevents timeouts and connection drops this.ctx.setWebSocketAutoResponse( new WebSocketRequestResponsePair("ping", "pong"), ); } async fetch(request: Request): Promise { // Check if this is a WebSocket upgrade request const upgradeHeader = request.headers.get("Upgrade"); if (upgradeHeader?.toLowerCase().includes("websocket")) { return this.handleWebSocketUpgrade(request); } // If it is not a WebSocket request, we don't handle it return new Response("Not found", { status: 404 }); } private async handleWebSocketUpgrade(request: Request): Promise { // Extract the interview ID from the URL - it should be the last segment const url = new URL(request.url); const interviewId = url.pathname.split("/").pop(); if (!interviewId) { return new Response("Missing interviewId parameter", { status: 400 }); } // Create a new WebSocket connection pair - one for the client, one for the server const pair = new WebSocketPair(); const [client, server] = Object.values(pair); // Keep track of which interview this WebSocket is connected to // This is important for routing messages to the right interview session this.sessions.set(server, { interviewId }); // Tell the Durable Object to start handling this WebSocket this.ctx.acceptWebSocket(server); // Send the current interview state to the client right away // This helps initialize their UI with the latest data const interviewData = await this.db.getInterview(interviewId); if (interviewData) { server.send( JSON.stringify({ type: "interview_details", data: interviewData, }), ); } // Return the client WebSocket as part of the upgrade response return new Response(null, { status: 101, webSocket: client, }); } async webSocketClose( ws: WebSocket, code: number, reason: string, wasClean: boolean, ) { // Clean up when a connection closes to prevent memory leaks // This is especially important in long-running Durable Objects console.log( `WebSocket closed: Code ${code}, Reason: ${reason}, Clean: ${wasClean}`, ); } } ``` Next, update the interview routes to include a WebSocket endpoint. Add the following to `routes/interview.ts`: ```typescript title="src/routes/interview.ts" // ... previous code ... const streamInterviewProcess = async (ctx: HonoCtx) => { const interviewDO = getInterviewDO(ctx); return await interviewDO.fetch(ctx.req.raw); }; export const configureInterviewRoutes = () => { const router = new Hono(); router.get("/", getAllInterviews); router.post("/", createInterview); // Add WebSocket route router.get("/:interviewId", streamInterviewProcess); return router; }; ``` The WebSocket system provides real-time communication features for interview practice tool: - Each interview session gets its own dedicated WebSocket connection, allowing seamless communication between the candidate and AI interviewer - The Durable Object maintains the connection state, ensuring no messages are lost even if the client temporarily disconnects - To keep connections stable, it automatically responds to ping messages with pongs, preventing timeouts - Candidates and interviewers receive instant updates as the interview progresses, creating a natural conversational flow ## 8. Add audio processing capabilities with Workers AI Now that WebSocket connection set up, the next step is to add speech-to-text capabilities using Workers AI. Let's use Cloudflare's Whisper model to transcribe audio in real-time during the interview. The audio processing pipeline will work like this: 1. 
Client sends audio through the WebSocket connection 2. Our Durable Object receives the binary audio data 3. We pass the audio to Whisper for transcription 4. The transcribed text is saved as a new message 5. We immediately send the transcription back to the client 6. The client receives a notification that the AI interviewer is generating a response ### Create audio processing pipeline In this step you will update the Interview Durable Object to handle the following: 1. Detect binary audio data sent through WebSocket 2. Create a unique message ID for tracking the processing status 3. Notify clients that audio processing has begun 4. Include error handling for failed audio processing 5. Broadcast status updates to all connected clients First, update Interview Durable Object to handle binary WebSocket messages. Add the following methods to your `src/interview.ts` file: ```typescript title="src/interview.ts" // ... previous code ... /** * Handles incoming WebSocket messages, both binary audio data and text messages. * This is the main entry point for all WebSocket communication. */ async webSocketMessage(ws: WebSocket, eventData: ArrayBuffer | string): Promise { try { // Handle binary audio data from the client's microphone if (eventData instanceof ArrayBuffer) { await this.handleBinaryAudio(ws, eventData); return; } // Text messages will be handled by other methods } catch (error) { this.handleWebSocketError(ws, error); } } /** * Processes binary audio data received from the client. * Converts audio to text using Whisper and broadcasts processing status. */ private async handleBinaryAudio(ws: WebSocket, audioData: ArrayBuffer): Promise { try { const uint8Array = new Uint8Array(audioData); // Retrieve the associated interview session const session = this.sessions.get(ws); if (!session?.interviewId) { throw new Error("No interview session found"); } // Generate unique ID to track this message through the system const messageId = crypto.randomUUID(); // Let the client know we're processing their audio this.broadcast( JSON.stringify({ type: "message", status: "processing", role: "user", messageId, interviewId: session.interviewId, }), ); // TODO: Implement Whisper transcription in next section // For now, just log the received audio data size console.log(`Received audio data of length: ${uint8Array.length}`); } catch (error) { console.error("Audio processing failed:", error); this.handleWebSocketError(ws, error); } } /** * Handles WebSocket errors by logging them and notifying the client. * Ensures errors are properly communicated back to the user. */ private handleWebSocketError(ws: WebSocket, error: unknown): void { const errorMessage = error instanceof Error ? error.message : "An unknown error occurred."; console.error("WebSocket error:", errorMessage); if (ws.readyState === WebSocket.OPEN) { ws.send( JSON.stringify({ type: "error", message: errorMessage, }), ); } } ``` Your `handleBinaryAudio` method currently logs when it receives audio data. Next, you'll enhance it to transcribe speech using Workers AI's Whisper model. ### Configure speech-to-text Now that audio processing pipeline is set up, you will now integrate Workers AI's Whisper model for speech-to-text transcription. Configure the Worker AI binding in your Wrangler file by adding: ```toml # ... previous configuration ... [ai] binding = "AI" ``` Next, generate TypeScript types for our AI binding. Run the following command: ```sh npm run cf-typegen ``` You will need a new service class for AI operations. 
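Before adding that service, it may help to see what the browser side of this exchange could look like. The sketch below is an assumption for illustration (the tutorial does not cover the frontend code): it opens the WebSocket route registered earlier at `/api/v1/interviews/:interviewId` and streams microphone chunks as binary frames; the three-second chunk interval is arbitrary.

```ts
// Hypothetical browser-side sketch (not part of the Worker code): stream
// microphone audio to the interview WebSocket as binary messages, which the
// Durable Object's webSocketMessage handler above receives as ArrayBuffers.
async function streamAudio(interviewId: string) {
  const ws = new WebSocket(
    `wss://${location.host}/api/v1/interviews/${interviewId}`,
  );
  ws.binaryType = "arraybuffer";

  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);

  recorder.ondataavailable = async (event) => {
    if (event.data.size > 0 && ws.readyState === WebSocket.OPEN) {
      // Binary frames are routed to handleBinaryAudio on the server
      ws.send(await event.data.arrayBuffer());
    }
  };

  // Status updates (processing, transcriptions, AI replies) arrive as JSON text frames
  ws.onmessage = (event) => {
    console.log("Server update:", JSON.parse(event.data as string));
  };

  recorder.start(3000); // emit an audio chunk roughly every 3 seconds
}
```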
Create a new file called `services/AIService.ts`: ```typescript title="src/services/AIService.ts" import { InterviewError, ErrorCodes } from "../errors"; export class AIService { constructor(private readonly AI: Ai) {} async transcribeAudio(audioData: Uint8Array): Promise { try { // Call the Whisper model to transcribe the audio const response = await this.AI.run("@cf/openai/whisper-tiny-en", { audio: Array.from(audioData), }); if (!response?.text) { throw new Error("Failed to transcribe audio content."); } return response.text; } catch (error) { throw new InterviewError( "Failed to transcribe audio content", ErrorCodes.TRANSCRIPTION_FAILED, ); } } } ``` You will need to update the `Interview` Durable Object to use this new AI service. To do this, update the handleBinaryAudio method in `src/interview.ts`: ```typescript title="src/interview.ts" import { AIService } from "./services/AIService"; export class Interview extends DurableObject { private readonly aiService: AIService; constructor(state: DurableObjectState, env: Env) { // ... previous code ... // Initialize the AI service with the Workers AI binding this.aiService = new AIService(this.env.AI); } private async handleBinaryAudio(ws: WebSocket, audioData: ArrayBuffer): Promise { try { const uint8Array = new Uint8Array(audioData); const session = this.sessions.get(ws); if (!session?.interviewId) { throw new Error("No interview session found"); } // Create a message ID for tracking const messageId = crypto.randomUUID(); // Send processing state to client this.broadcast( JSON.stringify({ type: "message", status: "processing", role: "user", messageId, interviewId: session.interviewId, }), ); // NEW: Use AI service to transcribe the audio const transcribedText = await this.aiService.transcribeAudio(uint8Array); // Store the transcribed message await this.addMessage(session.interviewId, "user", transcribedText, messageId); } catch (error) { console.error("Audio processing failed:", error); this.handleWebSocketError(ws, error); } } ``` :::note The Whisper model `@cf/openai/whisper-tiny-en` is optimized for English speech recognition. If you need support for other languages, you can use different Whisper model variants available through Workers AI. ::: When users speak during the interview, their audio will be automatically transcribed and stored as messages in the interview session. The transcribed text will be immediately available to both the user and the AI interviewer for generating appropriate responses. ## 9. Integrate AI response generation Now that you have audio transcription working, let's implement AI interviewer response generation using Workers AI's LLM capabilities. You'll create an interview system that: - Maintains context of the conversation - Provides relevant follow-up questions - Gives constructive feedback - Stays in character as a professional interviewer ### Set up Workers AI LLM integration First, update the `AIService` class to handle LLM interactions. 
You will need to add methods for: - Processing interview context - Generating appropriate responses - Handling conversation flow Update the `services/AIService.ts` class to include LLM functionality: ```typescript title="src/services/AIService.ts" import { InterviewData, Message } from "../types"; export class AIService { async processLLMResponse(interview: InterviewData): Promise { const messages = this.prepareLLMMessages(interview); try { const { response } = await this.AI.run("@cf/meta/llama-2-7b-chat-int8", { messages, }); if (!response) { throw new Error("Failed to generate a response from the LLM model."); } return response; } catch (error) { throw new InterviewError("Failed to generate a response from the LLM model.", ErrorCodes.LLM_FAILED); } } private prepareLLMMessages(interview: InterviewData) { const messageHistory = interview.messages.map((msg: Message) => ({ role: msg.role, content: msg.content, })); return [ { role: "system", content: this.createSystemPrompt(interview), }, ...messageHistory, ]; } ``` :::note The @cf/meta/llama-2-7b-chat-int8 model is optimized for chat-like interactions and provides good performance while maintaining reasonable resource usage. ::: ### Create the conversation prompt Prompt engineering is crucial for getting high-quality responses from the LLM. Next, you will create a system prompt that: - Sets the context for the interview - Defines the interviewer's role and behavior - Specifies the technical focus areas - Guides the conversation flow Add the following method to your `services/AIService.ts` class: ```typescript title="src/services/AIService.ts" private createSystemPrompt(interview: InterviewData): string { const basePrompt = "You are conducting a technical interview."; const rolePrompt = `The position is for ${interview.title}.`; const skillsPrompt = `Focus on topics related to: ${interview.skills.join(", ")}.`; const instructionsPrompt = "Ask relevant technical questions and provide constructive feedback."; return `${basePrompt} ${rolePrompt} ${skillsPrompt} ${instructionsPrompt}`; } ``` ### Implement response generation logic Finally, integrate the LLM response generation into the interview flow. 
Update the `handleBinaryAudio` method in the `src/interview.ts` Durable Object to: - Process transcribed user responses - Generate appropriate AI interviewer responses - Maintain conversation context Update the `handleBinaryAudio` method in `src/interview.ts`: ```typescript title="src/interview.ts" private async handleBinaryAudio(ws: WebSocket, audioData: ArrayBuffer): Promise { try { // Convert raw audio buffer to uint8 array for processing const uint8Array = new Uint8Array(audioData); const session = this.sessions.get(ws); if (!session?.interviewId) { throw new Error("No interview session found"); } // Generate a unique ID to track this message through the system const messageId = crypto.randomUUID(); // Let the client know we're processing their audio // This helps provide immediate feedback while transcription runs this.broadcast( JSON.stringify({ type: "message", status: "processing", role: "user", messageId, interviewId: session.interviewId, }), ); // Convert the audio to text using our AI transcription service // This typically takes 1-2 seconds for normal speech const transcribedText = await this.aiService.transcribeAudio(uint8Array); // Save the user's message to our database so we maintain chat history await this.addMessage(session.interviewId, "user", transcribedText, messageId); // Look up the full interview context - we need this to generate a good response const interview = await this.db.getInterview(session.interviewId); if (!interview) { throw new Error(`Interview not found: ${session.interviewId}`); } // Now it's the AI's turn to respond // First generate an ID for the assistant's message const assistantMessageId = crypto.randomUUID(); // Let the client know we're working on the AI response this.broadcast( JSON.stringify({ type: "message", status: "processing", role: "assistant", messageId: assistantMessageId, interviewId: session.interviewId, }), ); // Generate the AI interviewer's response based on the conversation history const llmResponse = await this.aiService.processLLMResponse(interview); await this.addMessage(session.interviewId, "assistant", llmResponse, assistantMessageId); } catch (error) { // Something went wrong processing the audio or generating a response // Log it and let the client know there was an error console.error("Audio processing failed:", error); this.handleWebSocketError(ws, error); } } ``` ## Conclusion You have successfully built an AI-powered interview practice tool using Cloudflare's Workers AI. 
In summary, you have: - Created a real-time WebSocket communication system using Durable Objects - Implemented speech-to-text processing with Workers AI Whisper model - Built an intelligent interview system using Workers AI LLM capabilities - Designed a persistent storage system with SQLite in Durable Objects The complete source code for this tutorial is available on GitHub: [ai-interview-practice-tool](https://github.com/berezovyy/ai-interview-practice-tool) --- # 使用 DeepSeek Coder 模型探索代码生成 URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/explore-code-generation-using-deepseek-coder-models/ import { Stream } from "~/components"; 探索 [Workers AI](/workers-ai) 上所有可用模型的一个便捷方法是使用 [Jupyter Notebook](https://jupyter.org/)。 您可以[下载 DeepSeek Coder 笔记本](/workers-ai/static/documentation/notebooks/deepseek-coder-exploration.ipynb)或查看下面嵌入的笔记本。 [comment]: <> "下面的 markdown 是从 https://github.com/craigsdennis/notebooks-cloudflare-workers-ai 自动生成的" --- ## 使用 DeepSeek Coder 探索代码生成 能够生成代码的 AI 模型开启了各种用例。现在 [Workers AI](/workers-ai) 上提供了 [DeepSeek Coder](https://github.com/deepseek-ai/DeepSeek-Coder) 模型 `@hf/thebloke/deepseek-coder-6.7b-base-awq` 和 `@hf/thebloke/deepseek-coder-6.7b-instruct-awq`。 让我们使用 API 来探索它们! ```python import sys !{sys.executable} -m pip install requests python-dotenv ``` ``` Requirement already satisfied: requests in ./venv/lib/python3.12/site-packages (2.31.0) Requirement already satisfied: python-dotenv in ./venv/lib/python3.12/site-packages (1.0.1) Requirement already satisfied: charset-normalizer<4,>=2 in ./venv/lib/python3.12/site-packages (from requests) (3.3.2) Requirement already satisfied: idna<4,>=2.5 in ./venv/lib/python3.12/site-packages (from requests) (3.6) Requirement already satisfied: urllib3<3,>=1.21.1 in ./venv/lib/python3.12/site-packages (from requests) (2.1.0) Requirement already satisfied: certifi>=2017.4.17 in ./venv/lib/python3.12/site-packages (from requests) (2023.11.17) ``` ```python import os from getpass import getpass from IPython.display import display, Image, Markdown, Audio import requests ``` ```python %load_ext dotenv %dotenv ``` ### 配置您的环境 要使用 API,您需要您的 [Cloudflare 帐户 ID](https://dash.cloudflare.com)(前往 Workers & Pages > 概述 > 帐户详细信息 > 帐户 ID)和一个[已启用 Workers AI 的 API 令牌](https://dash.cloudflare.com/profile/api-tokens)。 如果您想将这些文件添加到您的环境中,可以创建一个名为 `.env` 的新文件 ```bash CLOUDFLARE_API_TOKEN="您的令牌" CLOUDFLARE_ACCOUNT_ID="您的帐户 ID" ``` ```python if "CLOUDFLARE_API_TOKEN" in os.environ: api_token = os.environ["CLOUDFLARE_API_TOKEN"] else: api_token = getpass("输入您的 Cloudflare API 令牌") ``` ```python if "CLOUDFLARE_ACCOUNT_ID" in os.environ: account_id = os.environ["CLOUDFLARE_ACCOUNT_ID"] else: account_id = getpass("输入您的帐户 ID") ``` ### 从注释生成代码 一个常见的用例是在用户提供描述性注释后为其完成代码。 ````python model = "@hf/thebloke/deepseek-coder-6.7b-base-awq" prompt = "# 一个检查给定单词是否为回文的函数" response = requests.post( f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}", headers={"Authorization": f"Bearer {api_token}"}, json={"messages": [ {"role": "user", "content": prompt} ]} ) inference = response.json() code = inference["result"]["response"] display(Markdown(f""" ```python {prompt} {code.strip()} ``` """)) ```` ```python # 一个检查给定单词是否为回文的函数 def is_palindrome(word): # 将单词转换为小写 word = word.lower() # 反转单词 reversed_word = word[::-1] # 检查反转后的单词是否与原始单词相同 if word == reversed_word: return True else: return False # 测试函数 print(is_palindrome("racecar")) # 输出:True print(is_palindrome("hello")) # 输出:False ``` ### 协助调试 我们都遇到过这种情况,bug 
总会发生。有时那些堆栈跟踪可能非常吓人,而使用代码生成的一个很好的用例是帮助解释问题。 ```python model = "@hf/thebloke/deepseek-coder-6.7b-instruct-awq" system_message = "用户会给您一些无法工作的代码。请向用户解释可能出了什么问题" code = """# 欢迎我们的用户 def hello_world(first_name="World"): print(f"Hello, {name}!") """ response = requests.post( f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}", headers={"Authorization": f"Bearer {api_token}"}, json={"messages": [ {"role": "system", "content": system_message}, {"role": "user", "content": code}, ]} ) inference = response.json() response = inference["result"]["response"] display(Markdown(response)) ``` 您的代码中的错误是您正在尝试使用一个在函数中任何地方都没有定义的变量 `name`。应该使用的正确变量是 `first_name`。所以,您应该将 `f"Hello, {name}!"` 更改为 `f"Hello, {first_name}!"`。 这是更正后的代码: ```python # 欢迎我们的用户 def hello_world(first_name="World"): print(f"Hello, {first_name}") ``` 现在,当您调用 `hello_world()` 时,它将默认打印“Hello, World”。如果您调用 `hello_world("John")`,它将打印“Hello, John”。 ### 编写测试! 编写单元测试是一种常见的最佳实践。在有足够上下文的情况下,编写单元测试是可能的。 ```python model = "@hf/thebloke/deepseek-coder-6.7b-instruct-awq" system_message = "用户会给您一些代码,并希望用 Python 的 unittest 模块编写测试。" code = """ class User: def __init__(self, first_name, last_name=None): self.first_name = first_name self.last_name = last_name if last_name is None: self.last_name = "Mc" + self.first_name def full_name(self): return self.first_name + " " + self.last_name """ response = requests.post( f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}", headers={"Authorization": f"Bearer {api_token}"}, json={"messages": [ {"role": "system", "content": system_message}, {"role": "user", "content": code}, ]} ) inference = response.json() response = inference["result"]["response"] display(Markdown(response)) ``` 这是一个针对 User 类的简单 unittest 测试用例: ```python import unittest class TestUser(unittest.TestCase): def test_full_name(self): user = User("John", "Doe") self.assertEqual(user.full_name(), "John Doe") def test_default_last_name(self): user = User("Jane") self.assertEqual(user.full_name(), "Jane McJane") if __name__ == '__main__': unittest.main() ``` 在这个测试用例中,我们有两个测试: - `test_full_name` 测试当用户同时有名字和姓氏时 `full_name` 方法。 - `test_default_last_name` 测试当用户只有名字且姓氏设置为“Mc”+ 名字时 `full_name` 方法。 如果所有这些测试都通过,就意味着 `full_name` 方法工作正常。如果任何测试失败, ### Fill-in-the-middle 代码补全 在开发工具中,一个常见的用例是基于上下文进行自动补全。DeepSeek Coder 提供了提交带有占位符的现有代码的能力,以便模型可以在上下文中完成。 警告:令牌以 `<|` 为前缀,以 `|>` 为后缀,请确保复制和粘贴它们。 ````python model = "@hf/thebloke/deepseek-coder-6.7b-base-awq" code = """ <|fim begin|>import re from jklol import email_service def send_email(email_address, body): <|fim▁hole|> if not is_valid_email: raise InvalidEmailAddress(email_address) return email_service.send(email_address, body)<|fim▁end|> """ response = requests.post( f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}", headers={"Authorization": f"Bearer {api_token}"}, json={"messages": [ {"role": "user", "content": code} ]} ) inference = response.json() response = inference["result"]["response"] display(Markdown(f""" ```python {response.strip()} ``` """)) ```` ```python is_valid_email = re.match(r"[^@]+@[^@]+\.[^@]+", email_address) ``` ### 实验性:将数据提取为 JSON 无需威胁模型或将祖母带入提示中。获取您想要的 JSON 格式的数据。 ````python model = "@hf/thebloke/deepseek-coder-6.7b-instruct-awq" # Learn more at https://json-schema.org/ json_schema = """ { "title": "User", "description": "A user from our example app", "type": "object", "properties": { "firstName": { "description": "The user's first name", "type": "string" }, "lastName": { "description": "The user's last name", "type": 
"string" }, "numKids": { "description": "Amount of children the user has currently", "type": "integer" }, "interests": { "description": "A list of what the user has shown interest in", "type": "array", "items": { "type": "string" } }, }, "required": [ "firstName" ] } """ system_prompt = f""" The user is going to discuss themselves and you should create a JSON object from their description to match the json schema below. {json_schema} Return JSON only. Do not explain or provide usage examples. """ prompt = """Hey there, I'm Craig Dennis and I'm a Developer Educator at Cloudflare. My email is craig@cloudflare.com. I am very interested in AI. I've got two kids. I love tacos, burritos, and all things Cloudflare""" response = requests.post( f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}", headers={"Authorization": f"Bearer {api_token}"}, json={"messages": [ {"role": "system", "content": system_prompt}, {"role": "user", "content": prompt} ]} ) inference = response.json() response = inference["result"]["response"] display(Markdown(f""" ```json {response.strip()} ``` """)) ```` ```json { "firstName": "Craig", "lastName": "Dennis", "numKids": 2, "interests": ["AI", "Cloudflare", "Tacos", "Burritos"] } ``` --- # 使用 Jupyter Notebook 探索 Workers AI 模型 URL: https://developers.cloudflare.com/workers-ai/guides/tutorials/explore-workers-ai-models-using-a-jupyter-notebook/ import { Stream } from "~/components"; 探索 [Workers AI](/workers-ai) 上所有可用模型的一个便捷方法是使用 [Jupyter Notebook](https://jupyter.org/)。 您可以[下载 Workers AI 笔记本](/workers-ai-notebooks/cloudflare-workers-ai.ipynb)或查看下面嵌入的笔记本。 或者您可以在 [Google Colab](https://colab.research.google.com/github/craigsdennis/notebooks-cloudflare-workers-ai/blob/main/cloudflare-workers-ai.ipynb) 上运行它 [comment]: <> "下面的 markdown 是从 https://github.com/craigsdennis/notebooks-cloudflare-workers-ai 自动生成的,