Text to Speech & Speech Recognition

Demo to preview the plugin:

Introduction

Add voice interaction to your Bubble app with ease using the Speech Capabilities Plugin. Powered by the Web Speech API, this plugin introduces Text-to-Speech (TTS) and Speech Recognition (SR) functionalities, enabling your app to speak to users and listen to them in real time. With extensive customization options and real-time transcription, you can build immersive, accessible, and interactive user experiences.
Key Features

Prerequisites

  • Ensure your users are on modern browsers (Chrome, Edge, Safari) for best support.
  • Have a basic understanding of how to use workflows and elements in Bubble.
  • No API key is needed as this relies on native browser functionality.
⚠️
Note: These features are experimental and browser-dependent. Check compatibility on caniuse.com.
Image without caption

How to setup

Step 1: Installation

Step 2: Place the Element (Text to Speech)

Step 3: Place the Element (Speech Recognition)

Step 4: Text to Speech Required Page Elements

Step 5: Text to Speech Workflow Setup

Step 6: Text to Speech Required Page Elements

Step 7: Speech to Text Workflow Setup

Plugin Element Properties

Text to Speech

Element Actions

  1. Speak - Speak text (add to queue).
    1. Title
      Description
      Type
      Text
      The message you want the speech engine to say aloud. This should be plain text, such as a sentence or paragraph.
      Text
      Rate
      Controls the speed at which the text is spoken. The default is 1.0. Lower values (e.g. 0.5) make the speech slower, and higher values (e.g. 2.0) make it faster.
      Number
      Volume
      Sets the loudness of the speech output. Accepts values from 0.0 (muted) to 1.0 (full volume). The default is 1.0.
      Number
      Pitch
      Adjusts the pitch (tone) of the voice. The default is 1.0. Values below 1.0 result in a deeper voice, and values above 1.0 produce a higher-pitched voice. Range is usually from 0.0 to 2.0.
      Number
      Voice
      The specific voice to use for speech synthesis (e.g. "Google UK English Male"). Leave blank to use the browser’s default voice. Voice options vary by device and browser.
      Text (optional)
      OR Language
      Set the language locale (e.g., "en-US", "fr-FR") to influence which voice is chosen. This can help match the voice with the appropriate language accent if you’re not specifying a custom voice.
      Text (optional)
  1. Pause - Pause speaking.
  1. Resume - Resume speaking.
  1. Stop - Stop speaking and clear queue.

Exposed states

Title
Description
Type
Available voices
List of all voices available on the current device
Text
Speech synthesis supported
Returns yes if speech synthesis API is supported on current device
Checkbox (yes/no)
Is speaking
Returns yes if text is currently being spoken
Checkbox (yes/no)
Is paused
Returns yes if text is queued but currently paused
Checkbox (yes/no)
Default voice
The current device’s default voice
Text
Available languages
List of the languages of all voices available on the current device
Text
Is Local Service
List of whether each voice is a local service (or remote)
Checkbox (yes/no)

Speech Recognition

Fields:
Title
Description
Type
Language
Language code for the speech you want to recognize (e.g. "en-US" for English - United States). Note: There’s no guaranteed method to fetch all supported languages, but using the same language codes from the Text-to-Speech element is a reliable approach.
Text
Continuous recognition
When checked, the speech recognition continues listening until manually stopped, allowing longer or ongoing speech input. If unchecked, it will stop automatically after detecting a pause.
Checkbox (yes/no)
Keep full transcript
When enabled, the plugin will keep building a full transcript of everything heard in one session. If disabled, only the latest phrase will be available.
Checkbox (yes/no)
Capitalize first word
Automatically capitalizes the first word of each new result, helpful for formatting transcripts more naturally.
Checkbox (yes/no)
Period at end of sentence
Appends a period (.) at the end of each recognized sentence or phrase, useful for auto-punctuation in generated transcripts.
Checkbox (yes/no)

Element Actions

  1. Start recognition - Begins listening for the user’s speech input and starts transcribing it into text based on the selected configuration (language, continuous mode, etc.). This action uses the browser's speech recognition engine to capture audio in real time.
  1. Stop recognition - Stops the speech recognition process manually. This action ends the current transcription session and prevents further audio input from being processed.

Exposed states

Title
Description
Type
Transcript
Returns the full or partial transcribed text captured during the speech recognition session. This text updates in real time as the user speaks.
Text
Is Listening
Indicates whether the plugin is currently active and listening for speech input. Useful for toggling UI elements or controlling flow logic.
Checkbox (yes/no)
Sound Happening
Shows whether sound (not necessarily speech) is currently being detected. Can be used to visualize audio activity (like mic input waves) or signal that the user is speaking.
Checkbox (yes/no)
Speech Recognition Supported
Checks if the current browser supports the Web Speech Recognition API. If this is no, the feature won’t work and a fallback should be considered.
Checkbox (yes/no)

Element Events

Title
Description
Speech Recognized
This event is triggered each time speech is successfully recognized and converted into text. You can use this to run workflows based on the transcribed content, such as storing the text, triggering search, or interacting with other elements in your app.
Image without caption

Changelogs