The Deepgram Speech-to-Text Live plugin enables real-time audio transcription directly inside your Bubble application by connecting to Deepgram’s WebSocket streaming API. It captures microphone input from the user’s browser, streams audio in low-latency chunks, and returns live interim and final transcripts with support for speaker diarization, punctuation, smart formatting, profanity filtering, and more.
In addition to live streaming, the plugin includes API calls for pre-recorded audio transcription, AI-powered audio intelligence (summarization, sentiment, topics, intents), speaker diarization, and usage analytics — making it a complete speech-to-text solution for any Bubble project.
Prerequisites
You must have a Deepgram account to use this plugin. Create your account and obtain an API key at: https://console.deepgram.com
Security note: Never expose your Deepgram API key on the frontend. The plugin uses a short-lived temporary token (valid for 30 seconds) obtained via a Bubble Backend Workflow. This token is only used for the initial WebSocket handshake — once the connection is open, the session remains active regardless of token expiry.
ℹ️ Copy your secret key when you create it: it is shown only once.
How to set up
Step 1 - Install the plugin from the Bubble marketplace and open the plugin settings tab.
Step 2 - Enter your Deepgram API key in the plugin’s private key field (labeled Token). This key is used server-side only.
Step 3 - Add the SpeechToText element to your page. Configure the desired transcription properties (model, language, punctuation, etc.).
Step 4 - Build your recording workflow:
On button click → call the action to get a temp token.
Pass the token to the Start Recording element action.
Use the is recording state to update your UI.
Use the On Final Result or On Recording Stopped event to save transcripts to the database.
Step 5 - Add Stop / Pause / Resume actions to the appropriate buttons on your page.
Plugin Element - SpeechToText
The plugin contains one visual element - SpeechToText - that should be placed on the page. The element handles all microphone capture and WebSocket communication logic.
Fields
Title
Description
Type
Model
Deepgram model used for transcription. Recommended: nova-3 (highest accuracy). Other options: nova-2, base, enhanced, whisper. nova-3 is required for key term prompting.
Text
Language
BCP-47 language tag, e.g. en-US. Use multi for nova-2 and nova-3 multilingual detection. Leave empty for automatic detection.
Text
Punctuation
Add punctuation and capitalization to the transcript. Recommended for readable output.
Boolean
Smart format
Apply smart formatting to transcript output to improve readability (dates, currencies, etc.).
Boolean
Diarize
Recognize speaker changes. Each word will be assigned a speaker number starting at 0.
Boolean
Numerals
Convert numbers from written format to numerical format, e.g. “forty two” → “42”.
Boolean
Profanity filter
Replace recognized profanity with the nearest non-profane word or remove it from the transcript.
Boolean
Interim results
Send continuous transcription updates as audio is received. When enabled, results may evolve over time until finalized.
Boolean
Utterance end ms
Milliseconds of silence after which an utterance is considered complete. Only relevant when Interim Results is enabled.
Number
Vad events
Fire the On Speech Started event each time Deepgram detects the beginning of speech. Useful for microphone animations or “Listening…” UI states.
Boolean
Endpointing
Milliseconds of silence before Deepgram finalizes a transcript segment.
Number
Max words
Automatically stop recording after N words are transcribed. Set to 0 to disable.
Number
Max seconds
Automatically stop recording after N seconds. Set to 0 to disable.
Number
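The fields above map onto query parameters of Deepgram's streaming endpoint (wss://api.deepgram.com/v1/listen). As an illustration of how such a URL is assembled (a sketch, not the plugin's actual internals; the `buildDeepgramUrl` helper and its property names are hypothetical, but the query parameter names follow Deepgram's documented options):

```javascript
// Sketch: map element properties onto Deepgram streaming query parameters.
function buildDeepgramUrl(props) {
  const params = new URLSearchParams();
  if (props.model) params.set("model", props.model);
  if (props.language) params.set("language", props.language);
  if (props.punctuate) params.set("punctuate", "true");
  if (props.smartFormat) params.set("smart_format", "true");
  if (props.diarize) params.set("diarize", "true");
  if (props.numerals) params.set("numerals", "true");
  if (props.profanityFilter) params.set("profanity_filter", "true");
  if (props.interimResults) params.set("interim_results", "true");
  if (props.utteranceEndMs > 0) params.set("utterance_end_ms", String(props.utteranceEndMs));
  if (props.vadEvents) params.set("vad_events", "true");
  if (props.endpointing > 0) params.set("endpointing", String(props.endpointing));
  return "wss://api.deepgram.com/v1/listen?" + params.toString();
}
```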
Element Actions
Start Recording
Opens the microphone, builds the Deepgram WebSocket URL from element properties, and begins streaming audio in 250 ms chunks. Requires a short-lived token obtained from the Get Temp Token backend API call.
Title
Description
Type
Deepgram Temp Token
Short-lived JWT obtained from the Get Temp Token API Call, triggered server-side in a Bubble Backend Workflow. Valid for 30 seconds — enough for the WebSocket handshake. Never use your real Deepgram API key here.
Text
Stop Recording
Runs the full stop sequence: releases microphone tracks, sends a CloseStream message to Deepgram, and closes the WebSocket. Sets is_recording and is_connected to false. Fires the On Recording Stopped event. Safe to call even if recording is not active.
Pause Recording
Pauses the MediaRecorder so no audio is sent to Deepgram. Keeps the WebSocket alive by sending a KeepAlive message every 8 seconds (Deepgram closes idle connections after ~12 seconds). Sets is_recording to false and is_paused to true. The WebSocket connection (is_connected) remains open.
Resume Recording
Clears the KeepAlive interval and resumes the MediaRecorder on the existing open WebSocket. No new token is required. Sets is_recording back to true and is_paused to false. The elapsed seconds counter resumes from where it was paused.
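The pause mechanics described above can be sketched as follows. KeepAlive is Deepgram's documented text message for holding a streaming connection open while no audio is being sent; the helper and its wiring are illustrative, not the plugin's actual code:

```javascript
const KEEPALIVE_MSG = JSON.stringify({ type: "KeepAlive" });
const KEEPALIVE_INTERVAL_MS = 8000; // stays under Deepgram's ~12 s idle timeout

// Begin sending KeepAlive messages on a paused session; returns a stop
// function that Resume Recording would call before restarting audio.
function startKeepAlive(socket, send = (s, m) => s.send(m)) {
  const id = setInterval(() => send(socket, KEEPALIVE_MSG), KEEPALIVE_INTERVAL_MS);
  return function stopKeepAlive() { clearInterval(id); };
}
```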
Clear Transcript
Resets all transcript states (transcript_interim, transcript_final, transcript_full, speaker_lines) and the word counter to empty/zero. Does not stop recording — safe to call mid-session to give users a “start fresh” option. elapsed_seconds is intentionally not reset as it reflects real clock time since recording started.
Reset
Resets all states and internal runtime variables to their initial values. Can only be called when there is no active recording in progress (the action is guarded — if a WebSocket is open, it does nothing).
Exposed States
Name
Description
Type
transcript interim
Current partial transcription result while the user is still speaking. Updates frequently. Only populated when Interim Results is enabled.
Text
transcript final
The last finalized phrase returned by Deepgram. Updated each time a segment is confirmed.
Text
transcript full
Full accumulated session transcript. All finalized phrases appended in order.
Text
speaker index
The speaker number assigned to the most recently finalized phrase. Only meaningful when Diarize is enabled. Starts at 0.
Number
word count
Running total of words transcribed in the current session.
Number
elapsed seconds
Seconds elapsed since Start Recording was called. Pauses when recording is paused.
Number
limit reason
Reason why auto-stop was triggered. Value is "words" or "time". Empty when no limit has been hit.
Text
is recording
true when the microphone and WebSocket are both active and sending audio.
Boolean
is connected
true when the WebSocket connection to Deepgram is open (including during pause).
Boolean
is paused
true when the recording is paused (microphone muted but WebSocket still open).
Boolean
error message
The last error message. Populated on microphone denial, missing token, or WebSocket failure.
Text
speaker lines
List of speaker-tagged transcript lines in the format {speaker_index}|{text}. Only populated when Diarize is enabled.
List of Text
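The auto-stop states interact like this (a minimal sketch; the helper is hypothetical, but its return values mirror the documented limit reason values, and 0 disables a limit as described in the Max words / Max seconds fields):

```javascript
// Returns "words", "time", or "" — matching the limit reason state.
function checkLimit(wordCount, elapsedSeconds, maxWords, maxSeconds) {
  if (maxWords > 0 && wordCount >= maxWords) return "words";
  if (maxSeconds > 0 && elapsedSeconds >= maxSeconds) return "time";
  return ""; // no limit hit, keep recording
}
```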
Element Events
Name
Description
On Interim Result
Fires each time Deepgram sends a partial transcription while the user is still speaking. Read transcript interim for the current partial text. Only fires when Interim Results is enabled. Fires frequently — avoid heavy database writes in this handler.
On Final Result
Fires when Deepgram finalizes a phrase. At this point transcript final, transcript full, and word count are all updated. This is the recommended event for saving transcripts to the database.
On Utterance End
Fires when Deepgram detects the end of a complete utterance. Only fires when Utterance end ms is set. Use as a trigger to process or display a completed thought.
On Speech Started
Fires when Deepgram’s VAD detects the start of speech. Only fires when Vad events is enabled. Use to show a pulsing microphone animation or “Listening…” indicator.
On Recording Started
Fires once the microphone is open and the WebSocket to Deepgram is established. Use to switch the UI to recording mode (show stop button, start a visible timer, etc.).
On Recording Stopped
Fires after a manual Stop Recording action completes. is_recording and is_connected are both false when this fires. Not fired on auto-stop — use On Limit Reached for that.
On Limit Reached
Fires when recording is automatically stopped because a word or time limit was hit. Read limit reason ("words" or "time") to know which limit was triggered. Save transcript full in this handler.
On Error
Fires when any error occurs during the recording lifecycle. Read error message for the description. Common causes: microphone permission denied, missing token, WebSocket connection failed.
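Behind the interim/final events, Deepgram's live Results messages carry an is_final flag and the text at channel.alternatives[0].transcript. A sketch of how such messages could be folded into the exposed states (the state object and helper are illustrative, not the plugin's internals):

```javascript
// Apply one Deepgram live "Results" message to a transcript state object.
function applyResult(state, msg) {
  if (msg.type !== "Results") return state;
  const text = (msg.channel?.alternatives?.[0]?.transcript || "").trim();
  if (!text) return state;
  if (msg.is_final) {
    // Finalized segment: append to the full transcript, clear the interim.
    state.transcriptFinal = text;
    state.transcriptFull = (state.transcriptFull + " " + text).trim();
    state.transcriptInterim = "";
    state.wordCount = state.transcriptFull.split(/\s+/).length;
  } else {
    state.transcriptInterim = text; // evolves until finalized
  }
  return state;
}
```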
Plugin Data/Action Calls (API Calls only)
Get Temp Token
Requests a short-lived authentication token from Deepgram. This token should be passed to the Start Recording element action. The token is valid for 30 seconds — sufficient for the WebSocket handshake.
Fields:
Name
Description
Type
ttl_seconds
Token time-to-live in seconds. Default: 30. Maximum: 3600. Increase only if users have high-latency connections.
Number
Return values:
Name
Description
Type
body access_token
The short-lived JWT token to pass to the Start Recording action.
Text
body expires_in
Number of seconds until the token expires.
Number
returned_an_error
true if the API returned an error.
Boolean
error status_code
HTTP status code of the error response.
Number
error status_message
Human-readable error status message.
Text
error body
Full error response body.
Text
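Server-side, this call corresponds to Deepgram's token-grant endpoint. A sketch of the request it is expected to build, assuming the POST /v1/auth/grant endpoint from Deepgram's documentation (verify against current docs before relying on this; the helper name is illustrative):

```javascript
// Build the server-side token-grant request. The secret API key is used
// only here, never in the browser.
function buildTokenRequest(apiKey, ttlSeconds = 30) {
  return {
    url: "https://api.deepgram.com/v1/auth/grant",
    method: "POST",
    headers: {
      "Authorization": "Token " + apiKey, // secret key, server-side only
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ ttl_seconds: ttlSeconds }),
  };
}
```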
Transcribe URL
Transcribes a pre-recorded audio or video file from a public URL using Deepgram’s synchronous transcription API.
Fields:
Name
Description
Type
audio_url
Publicly accessible URL to the audio or video file to transcribe.
Text
model
Transcription model.
nova-3 = best accuracy,
nova-2 = fast,
enhanced / base = legacy,
whisper = OpenAI via Deepgram.
Default: nova-3.
Text
language
BCP-47 tag, e.g. en-US, fr, de. Leave empty for automatic language detection.
Text
paragraphs
Split transcript into paragraphs by pause/topic. Requires punctuate=true.
Boolean
utterances
Split into utterances (natural speech segments) with per-segment timing and confidence.
Boolean
callback
Optional URL for Deepgram to POST results to asynchronously.
Text
Return values:
Name
Description
Type
body request_id
Unique ID for this transcription request.
Text
body metadata duration
Duration of the audio file in seconds.
Number
body metadata channels
Number of audio channels detected.
Number
body channels
List of channel result objects, each containing alternatives.
List
alternatives transcript
Full transcript text for the channel.
Text
alternatives confidence
Overall confidence score (0–1).
Number
alternatives words
List of word-level objects with timing and confidence.
List
returned_an_error
true if the API returned an error.
Boolean
error status_code
HTTP status code of the error response.
Number
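This call wraps Deepgram's synchronous pre-recorded endpoint. A sketch of the underlying request, assuming the documented POST /v1/listen API (the helper name is illustrative); note that enabling paragraphs also sets punctuate=true, as required by the field description above:

```javascript
// Sketch of the pre-recorded transcription request this call performs.
function buildTranscribeRequest(apiKey, opts) {
  const params = new URLSearchParams({ model: opts.model || "nova-3" });
  if (opts.language) params.set("language", opts.language);
  if (opts.paragraphs) {
    params.set("paragraphs", "true");
    params.set("punctuate", "true"); // paragraphs requires punctuation
  }
  if (opts.utterances) params.set("utterances", "true");
  if (opts.callback) params.set("callback", opts.callback);
  return {
    url: "https://api.deepgram.com/v1/listen?" + params.toString(),
    method: "POST",
    headers: { "Authorization": "Token " + apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ url: opts.audioUrl }), // public file URL
  };
}
```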
Transcribe with Intelligence
Transcribes a pre-recorded audio file and enriches the result with AI-powered audio intelligence features including summarization, topic detection, intent detection, and sentiment analysis.
Fields:
Name
Description
Type
audio_url
Publicly accessible URL to the audio or video file to transcribe.
Text
model
Transcription model.
Default: nova-3.
Text
language
BCP-47 language tag. Leave empty for auto-detection.
Text
topics
Detect key topics discussed. Returns topic segments with confidence scores.
Boolean
intents
Detect speaker intents, e.g. “make a purchase”, “request support”.
Boolean
callback
Optional URL for asynchronous delivery of results.
Text
Return values:
Name
Description
Type
body request_id
Unique ID for this transcription request.
Text
body summary
Full AI-generated summary of the audio content.
Text
body summary short
Short version of the AI-generated summary.
Text
body channels
List of channel result objects with alternatives, paragraphs, and words.
List
alternatives transcript
Full transcript text.
Text
alternatives confidence
Overall confidence score (0–1).
Number
alternatives paragraphs transcript
Paragraph-formatted transcript.
Text
returned_an_error
true if the API returned an error.
Boolean
error status_code
HTTP status code of the error response.
Number
Transcribe with Diarization
Transcribes a pre-recorded audio file with full speaker diarization — each word in the transcript is tagged with a speaker number. Returns utterances (natural speech segments) attributed to individual speakers.
Fields:
Name
Description
Type
audio_url
Publicly accessible URL to the audio or video file to transcribe.
Text
model
Transcription model. Default: nova-3.
Text
language
BCP-47 language tag. Default: en.
Text
detect_language
Automatically identify the dominant language spoken.
Boolean
callback
Optional URL for asynchronous delivery of results.
Text
Return values:
Name
Description
Type
body channels
List of channel result objects, each with detected_language and language_confidence.
List
alternatives transcript
Full transcript text.
Text
alternatives words
List of word-level objects including speaker and speaker_confidence fields.
List
body utterances
List of utterance objects — each with start, end, transcript, speaker, confidence, and words.
List
returned_an_error
true if the API returned an error.
Boolean
error status_code
HTTP status code of the error response.
Number
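The returned utterances make it straightforward to build a speaker-labelled transcript. A small illustrative helper (the output format is an assumption, not a plugin feature):

```javascript
// Turn Deepgram utterance objects into readable speaker-labelled lines.
function formatUtterances(utterances) {
  return utterances.map(u => "Speaker " + u.speaker + ": " + u.transcript);
}
```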
Get Usage Breakdown
Retrieves a breakdown of API usage (hours transcribed, requests made, tokens used, etc.) for a specific Deepgram project over a date range.
Fields:
Name
Description
Type
project_id
Your Deepgram Project ID (found in the Deepgram Console).
Text
start
Start of the date range in ISO 8601 format, e.g. 2024-01-01.
Text
end
End of the date range in ISO 8601 format, e.g. 2024-12-31.
Text
method
Filter by request method: sync, async, or streaming.
Text
Return values:
Name
Description
Type
body results
List of usage result objects grouped by time period.
List
body results hours
Audio hours transcribed in the period.
Number
body results total_hours
Total hours including agent usage.
Number
body results requests
Number of API requests made.
Number
body results tokens_in
Input tokens consumed (for AI intelligence features).
Number
body results tokens_out
Output tokens generated.
Number
body results tts_characters
Text-to-speech characters used.
Number
body start
Start of the reported period.
Text
body end
End of the reported period.
Text
returned_an_error
true if the API returned an error.
Boolean
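This call corresponds to Deepgram's management API. A sketch of the request URL, assuming the GET /v1/projects/{project_id}/usage path from Deepgram's documentation (verify against current docs; the helper is illustrative):

```javascript
// Build the usage-breakdown query URL for a project and date range.
function buildUsageUrl(projectId, start, end, method) {
  const params = new URLSearchParams({ start, end });
  if (method) params.set("method", method); // sync, async, or streaming
  return "https://api.deepgram.com/v1/projects/" +
    encodeURIComponent(projectId) + "/usage?" + params.toString();
}
```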
Workflow examples
1. Basic Live Transcription Setup
This workflow demonstrates how to start a live transcription session securely.
User clicks Start button → trigger a Backend Workflow that calls Get Temp Token.
In the Backend Workflow, return the body access_token value to the page via a custom state.
On the page, when the custom state changes → call SpeechToText → Start Recording with the token value.
On Recording Started event fires → show a “Recording…” indicator and enable the Stop button.
On Final Result event fires → append transcript final to a multiline input or save to the database.
2. Pause and Resume Recording
This workflow shows how to implement a pause/resume feature without losing the WebSocket session.
User clicks Pause → call SpeechToText → Pause Recording.
is_paused becomes true, is_recording becomes false → update UI to show “Paused” state.
User clicks Resume → call SpeechToText → Resume Recording.
is_recording becomes true again → update UI back to “Recording…” state.
3. Auto-Stop with Word or Time Limits
This workflow demonstrates automatic stopping when a limit is reached.
Set Max words or Max seconds on the SpeechToText element.
When the limit is hit, On Limit Reached event fires automatically.
In the On Limit Reached handler: save transcript full to the database and display a notification.
Read limit reason ("words" or "time") to show a contextual message to the user.
4. Speaker Diarization Display
This workflow shows how to display a multi-speaker transcript.
Enable Diarize on the SpeechToText element.
On Final Result fires → read speaker lines (list of {speaker_index}|{text} strings).
Use a Repeating Group to display each line, parsing the | separator to show speaker label and text.
Color-code rows based on speaker index for a visually clear conversation view.
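The parsing in steps 3–4 can be sketched as follows (hypothetical helper; it splits on the first | only, so any pipes inside the spoken text survive):

```javascript
// Parse one speaker lines entry of the form "{speaker_index}|{text}".
function parseSpeakerLine(line) {
  const i = line.indexOf("|");
  return { speaker: Number(line.slice(0, i)), text: line.slice(i + 1) };
}
```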