Skip to content
Cloudflare Docs
非官方翻译 - 此文档为非官方中文翻译版本,仅供参考。如有疑问请以 英文官方文档 为准。

使用 Workers AI 构建带自动转录功能的语音笔记应用

Last reviewed: 6 months ago

Developer Spotlight community contribution

Written by: Rajeev R. Sharma

Profile: LinkedIn

在本教程中,您将学习如何创建一个带有语音录音自动转录和可选后处理功能的语音笔记应用。构建该应用将使用以下工具:

  • Workers AI 用于转录语音录音和可选的后处理
  • D1 数据库用于存储笔记
  • R2 存储用于存储语音录音
  • Nuxt 框架用于构建全栈应用
  • Workers 用于部署项目

先决条件

要继续,您需要:

  1. Sign up for a Cloudflare account.
  2. Install Node.js.

Node.js version manager

Use a Node version manager like Volta or nvm to avoid permission issues and change Node.js versions. Wrangler, discussed later in this guide, requires a Node version of 16.17.0 or later.

1. 创建一个新的 Worker 项目

使用带有 nuxt 框架预设的 c3 CLI 创建一个新的 Worker 项目。

Terminal window
npm create cloudflare@latest -- voice-notes --framework=nuxt

安装附加依赖项

切换到新创建的项目目录

Terminal window
cd voice-notes

并安装以下依赖项:

Terminal window
npm i @nuxt/ui @vueuse/core @iconify-json/heroicons

然后将 @nuxt/ui 模块添加到 nuxt.config.ts 文件中:

nuxt.config.ts
export default defineNuxtConfig({
//..
modules: ["nitro-cloudflare-dev", "@nuxt/ui"],
//..
});

[可选] 迁移到 Nuxt 4 兼容模式

迁移到 Nuxt 4 兼容模式可确保您的应用程序与 Nuxt 的未来更新保持向前兼容。

在项目的根目录中创建一个新的 app 文件夹,并将 app.vue 文件移动到其中。此外,将以下内容添加到您的 nuxt.config.ts 文件中:

nuxt.config.ts
export default defineNuxtConfig({
//..
future: {
compatibilityVersion: 4,
},
//..
});

启动本地开发服务器

此时,您可以通过启动本地开发服务器来测试您的应用程序:

Terminal window
npm run dev

如果一切设置正确,您应该在 http://localhost:3000 上看到一个 Nuxt 欢迎页面。

2. 创建转录 API 端点

此 API 利用 Workers AI 来转录语音录音。要在项目中使用 Workers AI,您首先需要将其绑定到 Worker。

AI 绑定添加到 Wrangler 文件中。

wrangler.toml
[ai]
binding = "AI"

配置 AI 绑定后,运行 cf-typegen 命令以生成必要的 Cloudflare 类型定义。这使得类型定义在服务器事件上下文中可用。

Terminal window
npm run cf-typegen

通过在 /server/api 目录中创建 transcribe.post.ts 文件来创建一个转录 POST 端点。

server/api/transcribe.post.ts
export default defineEventHandler(async (event) => {
const { cloudflare } = event.context;
const form = await readFormData(event);
const blob = form.get("audio") as Blob;
if (!blob) {
throw createError({
statusCode: 400,
message: "缺少要转录的音频 blob",
});
}
try {
const response = await cloudflare.env.AI.run("@cf/openai/whisper", {
audio: [...new Uint8Array(await blob.arrayBuffer())],
});
return response.text;
} catch (err) {
console.error("转录音频时出错:", err);
throw createError({
statusCode: 500,
message: "转录音频失败。请重试。",
});
}
});

上述代码执行以下操作:

  1. 从事件中提取音频 blob。
  2. 使用 @cf/openai/whisper 模型转录 blob 并将转录文本作为响应返回。

3. 为将音频录音上传到 R2 创建 API 端点

在将音频录音上传到 R2 之前,您需要先创建一个存储桶。您还需要将 R2 绑定添加到您的 Wrangler 文件并重新生成 Cloudflare 类型定义。

创建一个 R2 存储桶。

Terminal window
npx wrangler r2 bucket create <BUCKET_NAME>

将存储绑定添加到您的 Wrangler 文件中。

wrangler.toml
[[r2_buckets]]
binding = "R2"
bucket_name = "<BUCKET_NAME>"

最后,通过重新运行 cf-typegen 脚本生成类型定义。

现在您已准备好创建上传端点。在您的 server/api 目录中创建一个新的 upload.put.ts 文件,并向其添加以下代码:

server/api/upload.put.ts
export default defineEventHandler(async (event) => {
const { cloudflare } = event.context;
const form = await readFormData(event);
const files = form.getAll("files") as File[];
if (!files) {
throw createError({ statusCode: 400, message: "缺少文件" });
}
const uploadKeys: string[] = [];
for (const file of files) {
const obj = await cloudflare.env.R2.put(`recordings/${file.name}`, file);
if (obj) {
uploadKeys.push(obj.key);
}
}
return uploadKeys;
});

上述代码执行以下操作:

  1. files 变量使用 form.getAll() 检索客户端发送的所有文件,这允许在单个请求中进行多次上传。
  2. 使用您之前创建的绑定 (R2) 将文件上传到 R2 存储桶。

4. 创建 API 端点以保存笔记条目

在创建端点之前,您需要执行与 R2 存储桶类似但有一些额外步骤的步骤,以准备一个笔记表。

创建一个 D1 数据库。

Terminal window
npx wrangler d1 create <DB_NAME>

将 D1 绑定添加到 Wrangler 文件。您可以从 d1 create 命令的输出中获取 DB_ID

wrangler.toml
[[d1_databases]]
binding = "DB"
database_name = "<DB_NAME>"
database_id = "<DB_ID>"

和以前一样,重新运行 cf-typegen 命令以生成类型。

接下来,创建一个数据库迁移。

Terminal window
npx wrangler d1 migrations create <DB_NAME> "create notes table"

这将在项目的根目录中创建一个新的 migrations 文件夹,并向其中添加一个空的 0001_create_notes_table.sql 文件。用下面的代码替换此文件的内容。

CREATE TABLE IF NOT EXISTS notes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
text TEXT NOT NULL,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
audio_urls TEXT
);

然后应用此迁移以创建 notes 表。

Terminal window
npx wrangler d1 migrations apply <DB_NAME>

现在您可以创建 API 端点。在 server/api/notes 目录中创建一个新文件 index.post.ts,并将其内容更改为以下内容:

server/api/notes/index.post.ts
export default defineEventHandler(async (event) => {
const { cloudflare } = event.context;
const { text, audioUrls } = await readBody(event);
if (!text) {
throw createError({
statusCode: 400,
message: "Missing note text",
});
}
try {
await cloudflare.env.DB.prepare(
"INSERT INTO notes (text, audio_urls) VALUES (?1, ?2)",
)
.bind(text, audioUrls ? JSON.stringify(audioUrls) : null)
.run();
return setResponseStatus(event, 201);
} catch (err) {
console.error("Error creating note:", err);
throw createError({
statusCode: 500,
message: "Failed to create note. Please try again.",
});
}
});

The above does the following:

  1. Extracts the text, and optional audioUrls from the event.
  2. Saves it to the database after converting the audioUrls to a JSON string.

5. Handle note creation on the client-side

Now you're ready to work on the client side. Let's start by tackling the note creation part first.

Recording user audio

Create a composable to handle audio recording using the MediaRecorder API. This will be used to record notes through the user's microphone.

Create a new file useMediaRecorder.ts in the app/composables folder, and add the following code to it:

app/composables/useMediaRecorder.ts
interface MediaRecorderState {
isRecording: boolean;
recordingDuration: number;
audioData: Uint8Array | null;
updateTrigger: number;
}
export function useMediaRecorder() {
const state = ref<MediaRecorderState>({
isRecording: false,
recordingDuration: 0,
audioData: null,
updateTrigger: 0,
});
let mediaRecorder: MediaRecorder | null = null;
let audioContext: AudioContext | null = null;
let analyser: AnalyserNode | null = null;
let animationFrame: number | null = null;
let audioChunks: Blob[] | undefined = undefined;
const updateAudioData = () => {
if (!analyser || !state.value.isRecording || !state.value.audioData) {
if (animationFrame) {
cancelAnimationFrame(animationFrame);
animationFrame = null;
}
return;
}
analyser.getByteTimeDomainData(state.value.audioData);
state.value.updateTrigger += 1;
animationFrame = requestAnimationFrame(updateAudioData);
};
const startRecording = async () => {
try {
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
audioContext = new AudioContext();
analyser = audioContext.createAnalyser();
const source = audioContext.createMediaStreamSource(stream);
source.connect(analyser);
mediaRecorder = new MediaRecorder(stream);
audioChunks = [];
mediaRecorder.ondataavailable = (e: BlobEvent) => {
audioChunks?.push(e.data);
state.value.recordingDuration += 1;
};
state.value.audioData = new Uint8Array(analyser.frequencyBinCount);
state.value.isRecording = true;
state.value.recordingDuration = 0;
state.value.updateTrigger = 0;
mediaRecorder.start(1000);
updateAudioData();
} catch (err) {
console.error("Error accessing microphone:", err);
throw err;
}
};
const stopRecording = async () => {
return await new Promise<Blob>((resolve) => {
if (mediaRecorder && state.value.isRecording) {
mediaRecorder.onstop = () => {
const blob = new Blob(audioChunks, { type: "audio/webm" });
audioChunks = undefined;
state.value.recordingDuration = 0;
state.value.updateTrigger = 0;
state.value.audioData = null;
resolve(blob);
};
state.value.isRecording = false;
mediaRecorder.stop();
mediaRecorder.stream.getTracks().forEach((track) => track.stop());
if (animationFrame) {
cancelAnimationFrame(animationFrame);
animationFrame = null;
}
audioContext?.close();
audioContext = null;
}
});
};
onUnmounted(() => {
stopRecording();
});
return {
state: readonly(state),
startRecording,
stopRecording,
};
}

The above code does the following:

  1. Exposes functions to start and stop audio recordings in a Vue application.
  2. Captures audio input from the user's microphone using MediaRecorder API.
  3. Processes real-time audio data for visualization using AudioContext and AnalyserNode.
  4. Stores recording state including duration and recording status.
  5. Maintains chunks of audio data and combines them into a final audio blob when recording stops.
  6. Updates audio visualization data continuously using animation frames while recording.
  7. Automatically cleans up all audio resources when recording stops or component unmounts.
  8. Returns audio recordings in webm format for further processing.

Create a component for note creation

This component allows users to create notes by either typing or recording audio. It also handles audio transcription and uploading the recordings to the server.

Create a new file named CreateNote.vue inside the app/components folder. Add the following template code to the newly created file:

app/components/CreateNote.vue
<template>
<div class="flex flex-col gap-y-5">
<div
class="flex h-full flex-col gap-y-4 overflow-hidden p-px md:flex-row md:gap-x-6"
>
<UCard
:ui="{
base: 'h-full flex flex-col flex-1',
body: { base: 'flex-grow' },
header: { base: 'md:h-[72px]' },
}"
>
<template #header>
<h3
class="text-base font-medium text-gray-600 md:text-lg dark:text-gray-300"
>
Note transcript
</h3>
</template>
<UTextarea
v-model="note"
placeholder="Type your note or use voice recording..."
size="lg"
autofocus
:disabled="loading || isTranscribing || state.isRecording"
:rows="10"
/>
</UCard>
<UCard
class="order-first shrink-0 md:order-none md:flex md:h-full md:w-96 md:flex-col"
:ui="{
body: { base: 'max-h-36 md:max-h-none md:flex-grow overflow-y-auto' },
}"
>
<template #header>
<h3
class="text-base font-medium text-gray-600 md:text-lg dark:text-gray-300"
>
Note recordings
</h3>
<UTooltip
:text="state.isRecording ? 'Stop Recording' : 'Start Recording'"
>
<UButton
:icon="
state.isRecording
? 'i-heroicons-stop-circle'
: 'i-heroicons-microphone'
"
:color="state.isRecording ? 'red' : 'primary'"
:loading="isTranscribing"
@click="toggleRecording"
/>
</UTooltip>
</template>
<AudioVisualizer
v-if="state.isRecording"
class="mb-2 h-14 w-full rounded-lg bg-gray-50 p-2 dark:bg-gray-800"
:audio-data="state.audioData"
:data-update-trigger="state.updateTrigger"
/>
<div
v-else-if="isTranscribing"
class="mb-2 flex h-14 items-center justify-center gap-x-3 rounded-lg bg-gray-50 p-2 text-gray-500 dark:bg-gray-800 dark:text-gray-400"
>
<UIcon
name="i-heroicons-arrow-path-20-solid"
class="h-6 w-6 animate-spin"
/>
Transcribing...
</div>
<RecordingsList :recordings="recordings" @delete="deleteRecording" />
<div
v-if="!recordings.length && !state.isRecording && !isTranscribing"
class="flex h-full items-center justify-center text-gray-500 dark:text-gray-400"
>
No recordings...
</div>
</UCard>
</div>
<UDivider />
<div class="flex justify-end gap-x-4">
<UButton
icon="i-heroicons-trash"
color="gray"
size="lg"
variant="ghost"
:disabled="loading"
@click="clearNote"
>
Clear
</UButton>
<UButton
icon="i-heroicons-cloud-arrow-up"
size="lg"
:loading="loading"
:disabled="!note.trim() && !state.isRecording"
@click="saveNote"
>
Save
</UButton>
</div>
</div>
</template>

The above template results in the following:

  1. A panel with a textarea inside to type the note manually.
  2. Another panel to manage start/stop of an audio recording, and show the recordings done already.
  3. A bottom panel to reset or save the note (along with the recordings).

Now, add the following code below the template code in the same file:

app/components/CreateNote.vue
<script setup lang="ts">
import type { Recording, Settings } from "~~/types";
const emit = defineEmits<{
(e: "created"): void;
}>();
const note = ref("");
const loading = ref(false);
const isTranscribing = ref(false);
const { state, startRecording, stopRecording } = useMediaRecorder();
const recordings = ref<Recording[]>([]);
const handleRecordingStart = async () => {
try {
await startRecording();
} catch (err) {
console.error("Error accessing microphone:", err);
useToast().add({
title: "Error",
description: "Could not access microphone. Please check permissions.",
color: "red",
});
}
};
const handleRecordingStop = async () => {
let blob: Blob | undefined;
try {
blob = await stopRecording();
} catch (err) {
console.error("Error stopping recording:", err);
useToast().add({
title: "Error",
description: "Failed to record audio. Please try again.",
color: "red",
});
}
if (blob) {
try {
const transcription = await transcribeAudio(blob);
note.value += note.value ? "\n\n" : "";
note.value += transcription ?? "";
recordings.value.unshift({
url: URL.createObjectURL(blob),
blob,
id: `${Date.now()}`,
});
} catch (err) {
console.error("Error transcribing audio:", err);
useToast().add({
title: "Error",
description: "Failed to transcribe audio. Please try again.",
color: "red",
});
}
}
};
const toggleRecording = () => {
if (state.value.isRecording) {
handleRecordingStop();
} else {
handleRecordingStart();
}
};
const transcribeAudio = async (blob: Blob) => {
try {
isTranscribing.value = true;
const formData = new FormData();
formData.append("audio", blob);
return await $fetch("/api/transcribe", {
method: "POST",
body: formData,
});
} finally {
isTranscribing.value = false;
}
};
const clearNote = () => {
note.value = "";
recordings.value = [];
};
const saveNote = async () => {
if (!note.value.trim()) return;
loading.value = true;
const noteToSave: { text: string; audioUrls?: string[] } = {
text: note.value.trim(),
};
try {
if (recordings.value.length) {
noteToSave.audioUrls = await uploadRecordings();
}
await $fetch("/api/notes", {
method: "POST",
body: noteToSave,
});
useToast().add({
title: "Success",
description: "Note saved successfully",
color: "green",
});
note.value = "";
recordings.value = [];
emit("created");
} catch (err) {
console.error("Error saving note:", err);
useToast().add({
title: "Error",
description: "Failed to save note",
color: "red",
});
} finally {
loading.value = false;
}
};
const deleteRecording = (recording: Recording) => {
recordings.value = recordings.value.filter((r) => r.id !== recording.id);
};
const uploadRecordings = async () => {
if (!recordings.value.length) return;
const formData = new FormData();
recordings.value.forEach((recording) => {
formData.append("files", recording.blob, recording.id + ".webm");
});
const uploadKeys = await $fetch("/api/upload", {
method: "PUT",
body: formData,
});
return uploadKeys;
};
</script>

The above code does the following:

  1. When a recording is stopped by calling handleRecordingStop function, the audio blob is sent for transcribing to the transcribe API endpoint.
  2. The transcription response text is appended to the existing textarea content.
  3. When the note is saved by calling the saveNote function, the audio recordings are uploaded first to R2 by using the upload endpoint we created earlier. Then, the actual note content along with the audioUrls (the R2 object keys) are saved by calling the notes post endpoint.

Create a new page route for showing the component

You can use this component in a Nuxt page to show it to the user. But before that you need to modify your app.vue file. Update the content of your app.vue to the following:

/app/app.vue
<template>
<NuxtRouteAnnouncer />
<NuxtLoadingIndicator />
<div class="flex h-screen flex-col md:flex-row">
<USlideover
v-model="isDrawerOpen"
class="md:hidden"
side="left"
:ui="{ width: 'max-w-xs' }"
>
<AppSidebar :links="links" @hide-drawer="isDrawerOpen = false" />
</USlideover>
<!-- The App Sidebar -->
<AppSidebar :links="links" class="hidden md:block md:w-64" />
<div class="h-full min-w-0 flex-1 bg-gray-50 dark:bg-gray-950">
<!-- The App Header -->
<AppHeader :title="title" @show-drawer="isDrawerOpen = true">
<template #actions v-if="route.path === '/'">
<UButton icon="i-heroicons-plus" @click="navigateTo('/new')">
New Note
</UButton>
</template>
</AppHeader>
<!-- Main Page Content -->
<main class="h-[calc(100vh-3.5rem)] overflow-y-auto p-4 sm:p-6">
<NuxtPage />
</main>
</div>
</div>
<UNotifications />
</template>
<script setup lang="ts">
const isDrawerOpen = ref(false);
const links = [
{
label: "Notes",
icon: "i-heroicons-document-text",
to: "/",
click: () => (isDrawerOpen.value = false),
},
{
label: "Settings",
icon: "i-heroicons-cog",
to: "/settings",
click: () => (isDrawerOpen.value = false),
},
];
const route = useRoute();
const title = computed(() => {
const activeLink = links.find((l) => l.to === route.path);
if (activeLink) {
return activeLink.label;
}
return "";
});
</script>

The above code allows for a nuxt page to be shown to the user, apart from showing an app header and a navigation sidebar.

Next, add a new file named new.vue inside the app/pages folder, add the following code to it:

app/pages/new.vue
<template>
<UModal v-model="isOpen" fullscreen>
<UCard
:ui="{
base: 'h-full flex flex-col',
rounded: '',
body: {
base: 'flex-grow overflow-hidden',
},
}"
>
<template #header>
<h2 class="text-xl leading-6 font-semibold md:text-2xl">Create note</h2>
<UButton
color="gray"
variant="ghost"
icon="i-heroicons-x-mark-20-solid"
@click="closeModal"
/>
</template>
<CreateNote class="mx-auto h-full max-w-7xl" @created="closeModal" />
</UCard>
</UModal>
</template>
<script setup lang="ts">
const isOpen = ref(true);
const router = useRouter();
const closeModal = () => {
isOpen.value = false;
if (window.history.length > 2) {
router.back();
} else {
navigateTo({
path: "/",
replace: true,
});
}
};
</script>

The above code shows the CreateNote component inside a modal, and navigates back to the home page on successful note creation.

6. Showing the notes on the client side

To show the notes from the database on the client side, create an API endpoint first that will interact with the database.

Create an API endpoint to fetch notes from the database

Create a new file named index.get.ts inside the server/api/notes directory, and add the following code to it:

server/api/index.get.ts
import type { Note } from "~~/types";
export default defineEventHandler(async (event) => {
const { cloudflare } = event.context;
const res = await cloudflare.env.DB.prepare(
`SELECT
id,
text,
audio_urls AS audioUrls,
created_at AS createdAt,
updated_at AS updatedAt
FROM notes
ORDER BY created_at DESC
LIMIT 50;`,
).all<Omit<Note, "audioUrls"> & { audioUrls: string | null }>();
return res.results.map((note) => ({
...note,
audioUrls: note.audioUrls ? JSON.parse(note.audioUrls) : undefined,
}));
});

The above code fetches the last 50 notes from the database, ordered by their creation date in descending order. The audio_urls field is stored as a string in the database, but it's converted to an array using JSON.parse to handle multiple audio files seamlessly on the client side.

Next, create a page named index.vue inside the app/pages directory. This will be the home page of the application. Add the following code to it:

app/pages/index.vue
<template>
<div :class="{ 'flex h-full': !notes?.length }">
<div v-if="notes?.length" class="space-y-4 sm:space-y-6">
<NoteCard v-for="note in notes" :key="note.id" :note="note" />
</div>
<div
v-else
class="flex-1 space-y-2 self-center text-center text-gray-500 dark:text-gray-400"
>
<h2 class="text-2xl md:text-3xl">No notes created</h2>
<p>Get started by creating your first note</p>
</div>
</div>
</template>
<script setup lang="ts">
import type { Note } from "~~/types";
const { data: notes } = await useFetch<Note[]>("/api/notes");
</script>

The above code fetches the notes from the database by calling the /api/notes endpoint you created just now, and renders them as note cards.

Serving the saved recordings from R2

To be able to play the audio recordings of these notes, you need to serve the saved recordings from the R2 storage.

Create a new file named [...pathname].get.ts inside the server/routes/recordings directory, and add the following code to it:

server/routes/recordings/[...pathname].get.ts
export default defineEventHandler(async (event) => {
const { cloudflare, params } = event.context;
const { pathname } = params || {};
return cloudflare.env.R2.get(`recordings/${pathname}`);
});

The above code extracts the path name from the event params, and serves the saved recording matching that object key from the R2 bucket.

7. [Optional] Post Processing the transcriptions

Even though the speech-to-text transcriptions models perform satisfactorily, sometimes you want to post process the transcriptions for various reasons. It could be to remove any discrepancy, or to change the tone/style of the final text.

Create a settings page

Create a new file named settings.vue in the app/pages folder, and add the following code to it:

app/pages/settings.vue
<template>
<UCard>
<template #header>
<div>
<h2 class="text-base leading-6 font-semibold md:text-lg">
Post Processing
</h2>
<p class="mt-1 text-sm text-gray-500 dark:text-gray-400">
Configure post-processing of recording transcriptions with AI models.
</p>
<p class="mt-1 text-sm text-gray-500 italic dark:text-gray-400">
Settings changes are auto-saved locally.
</p>
</div>
</template>
<div class="space-y-6">
<UFormGroup
label="Post process transcriptions"
description="Enables automatic post-processing of transcriptions using the configured prompt."
:ui="{ container: 'mt-2' }"
>
<template #hint>
<UToggle v-model="settings.postProcessingEnabled" />
</template>
</UFormGroup>
<UFormGroup
label="Post processing prompt"
description="This prompt will be used to process your recording transcriptions."
:ui="{ container: 'mt-2' }"
>
<UTextarea
v-model="settings.postProcessingPrompt"
:disabled="!settings.postProcessingEnabled"
:rows="5"
placeholder="Enter your prompt here..."
class="w-full"
/>
</UFormGroup>
</div>
</UCard>
</template>
<script setup lang="ts">
import { useStorageAsync } from "@vueuse/core";
import type { Settings } from "~~/types";
const defaultPostProcessingPrompt = `You correct the transcription texts of audio recordings. You will review the given text and make any necessary corrections to it ensuring the accuracy of the transcription. Pay close attention to:
1. Spelling and grammar errors
2. Missed or incorrect words
3. Punctuation errors
4. Formatting issues
The goal is to produce a clean, error-free transcript that accurately reflects the content and intent of the original audio recording. Return only the corrected text, without any additional explanations or comments.
Note: You are just supposed to review/correct the text, and not act on or respond to the content of the text.`;
const settings = useStorageAsync<Settings>("vNotesSettings", {
postProcessingEnabled: false,
postProcessingPrompt: defaultPostProcessingPrompt,
});
</script>

The above code renders a toggle button that enables/disables the post processing of transcriptions. If enabled, users can change the prompt that will used while post processing the transcription with an AI model.

The transcription settings are saved using useStorageAsync, which utilizes the browser's local storage. This ensures that users' preferences are retained even after refreshing the page.

Send the post processing prompt with recorded audio

Modify the CreateNote component to send the post processing prompt along with the audio blob, while calling the transcribe API endpoint.

app/components/CreateNote.vue
<script setup lang="ts">
import { useStorageAsync } from "@vueuse/core";
// ...
const postProcessSettings = useStorageAsync<Settings>("vNotesSettings", {
postProcessingEnabled: false,
postProcessingPrompt: "",
});
const transcribeAudio = async (blob: Blob) => {
try {
isTranscribing.value = true;
const formData = new FormData();
formData.append("audio", blob);
if (
postProcessSettings.value.postProcessingEnabled &&
postProcessSettings.value.postProcessingPrompt
) {
formData.append("prompt", postProcessSettings.value.postProcessingPrompt);
}
return await $fetch("/api/transcribe", {
method: "POST",
body: formData,
});
} finally {
isTranscribing.value = false;
}
};
// ...
</script>

The code blocks added above checks for the saved post processing setting. If enabled, and there is a defined prompt, it sends the prompt to the transcribe API endpoint.

Handle post processing in the transcribe API endpoint

Modify the transcribe API endpoint, and update it to the following:

server/api/transcribe.post.ts
export default defineEventHandler(async (event) => {
const { cloudflare } = event.context;
const form = await readFormData(event);
const blob = form.get("audio") as Blob;
if (!blob) {
throw createError({
statusCode: 400,
message: "缺少要转录的音频 blob",
});
}
try {
const response = await cloudflare.env.AI.run("@cf/openai/whisper", {
audio: [...new Uint8Array(await blob.arrayBuffer())],
});
const postProcessingPrompt = form.get("prompt") as string;
if (postProcessingPrompt && response.text) {
const postProcessResult = await cloudflare.env.AI.run(
"@cf/meta/llama-3.1-8b-instruct",
{
temperature: 0.3,
prompt: `${postProcessingPrompt}.\n\nText:\n\n${response.text}\n\nResponse:`,
},
);
return (postProcessResult as { response?: string }).response;
} else {
return response.text;
}
} catch (err) {
console.error("转录音频时出错:", err);
throw createError({
statusCode: 500,
message: "转录音频失败。请重试。",
});
}
});

The above code does the following:

  1. Extracts the post processing prompt from the event FormData.
  2. If present, it calls the Workers AI API to process the transcription text using the @cf/meta/llama-3.1-8b-instruct model.
  3. Finally, it returns the response from Workers AI to the client.

8. Deploy the application

Now you are ready to deploy the project to a .workers.dev sub-domain by running the deploy command.

Terminal window
npm run deploy

You can preview your application at <YOUR_WORKER>.<YOUR_SUBDOMAIN>.workers.dev.

Conclusion

In this tutorial, you have gone through the steps of building a voice notes application using Nuxt 3, Cloudflare Workers, D1, and R2 storage. You learnt to:

  • Set up the backend to store and manage notes
  • Create API endpoints to fetch and display notes
  • Handle audio recordings
  • Implement optional post-processing for transcriptions
  • Deploy the application using the Cloudflare module syntax

The complete source code of the project is available on GitHub. You can go through it to see the code for various frontend components not covered in the article. You can find it here: github.com/ra-jeev/vnotes.