Audio Services
Convert text to natural-sounding speech, transcribe audio to text, or generate sound effects — all through a single API with no account setup required.
Quick Example
Using Axios:

```typescript
import { withSapiom } from "@sapiom/axios";
import axios from "axios";
import fs from "fs";

// Create a Sapiom-wrapped Axios client
const client = withSapiom(
  axios.create({ baseURL: "https://elevenlabs.services.sapiom.ai" }),
  {
    apiKey: process.env.SAPIOM_API_KEY,
    baseURL: "https://api.sapiom.ai",
    serviceName: "ElevenLabs TTS",
    agentName: "my-agent",
  }
);

// Convert text to speech - Sapiom tracks cost automatically
const { data } = await client.post(
  "/v1/text-to-speech/EXAVITQu4vr4xnSDxMaL",
  {
    text: "Hello! Welcome to Sapiom. This is a test of the text-to-speech API.",
    model_id: "eleven_multilingual_v2",
  },
  { responseType: "arraybuffer" }
);

// Save the audio to a file
fs.writeFileSync("output.mp3", Buffer.from(data));
console.log("Audio saved to output.mp3");
```

Using Fetch:

```typescript
import { createFetch } from "@sapiom/fetch";
import fs from "fs";

// Create a Sapiom-tracked fetch function
const sapiomFetch = createFetch({
  apiKey: process.env.SAPIOM_API_KEY,
  baseURL: "https://api.sapiom.ai",
  serviceName: "ElevenLabs TTS",
  agentName: "my-agent",
});

// Convert text to speech - the SDK handles payment and auth automatically
const response = await sapiomFetch(
  "https://elevenlabs.services.sapiom.ai/v1/text-to-speech/EXAVITQu4vr4xnSDxMaL",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: "Hello! Welcome to Sapiom. This is a test of the text-to-speech API.",
      model_id: "eleven_multilingual_v2",
    }),
  }
);

// fetch does not reject on HTTP error statuses, so check before saving
if (!response.ok) {
  throw new Error(`Text-to-speech failed with status ${response.status}`);
}

// Save the audio to a file
const buffer = await response.arrayBuffer();
fs.writeFileSync("output.mp3", Buffer.from(buffer));
console.log("Audio saved to output.mp3");
```

How It Works
Sapiom routes audio requests to ElevenLabs for processing. The Sapiom SDK handles payment negotiation automatically, and you pay based on character count for text-to-speech, audio duration for speech-to-text, or a flat rate per generation for sound effects.
The service supports three operations:
- Text-to-Speech — Convert text to natural-sounding audio
- Speech-to-Text — Transcribe audio files to text
- Sound Effects — Generate sound effects from text descriptions
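All three operations live under the same host, and each one has a `/price` companion endpoint (see Price Estimation). A minimal sketch of a lookup that keeps those URLs in one place follows. `AudioOperation`, `endpointUrl`, and `priceUrl` are hypothetical names for illustration, not part of the Sapiom SDK; only the URLs come from this page.

```typescript
// Hypothetical helpers; the URL layout is taken from the API reference on this page.
const SERVICE_BASE = "https://elevenlabs.services.sapiom.ai/v1";

type AudioOperation = "text-to-speech" | "speech-to-text" | "sound-effects";

// Build the URL for an operation. Text-to-speech carries the voice ID in the path.
function endpointUrl(op: AudioOperation, voiceId?: string): string {
  if (op === "text-to-speech") {
    if (!voiceId) {
      throw new Error("text-to-speech requires a voice ID");
    }
    return `${SERVICE_BASE}/text-to-speech/${encodeURIComponent(voiceId)}`;
  }
  return `${SERVICE_BASE}/${op}`;
}

// Build the matching price-estimation URL (no voice ID in the path).
function priceUrl(op: AudioOperation): string {
  return `${SERVICE_BASE}/${op}/price`;
}

console.log(endpointUrl("text-to-speech", "EXAVITQu4vr4xnSDxMaL"));
console.log(priceUrl("sound-effects"));
```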
Provider
Powered by ElevenLabs. ElevenLabs provides industry-leading voice synthesis with natural intonation and emotional range across 29 languages.
API Reference
Text-to-Speech
Endpoint: `POST https://elevenlabs.services.sapiom.ai/v1/text-to-speech/{voiceId}`
Convert text to natural-sounding speech. The voice ID is specified in the URL path.
Popular voice IDs:
- `EXAVITQu4vr4xnSDxMaL`: Sarah (female, soft)
- `JBFqnCBsd6RMkjVDRZzb`: George (male, narrative)
- `21m00Tcm4TlvDq8ikWAM`: Rachel (female, calm)
- `AZnzlk1XvdvUeBnXmlld`: Domi (female, strong)
Request
| Parameter | Type | Required | Description |
|---|---|---|---|
| `text` | string | Yes | Text to convert to speech (max 5,000 characters) |
| `model_id` | string | No | Model for synthesis (default: `eleven_multilingual_v2`) |
| `output_format` | string | No | Audio format (default: `mp3_44100_128`) |
Output format options:
- MP3: `mp3_22050_32`, `mp3_44100_64`, `mp3_44100_128`, `mp3_44100_192`
- PCM: `pcm_16000`, `pcm_22050`, `pcm_24000`, `pcm_44100`
- Opus: `opus_48000_64`, `opus_48000_128`
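The request limits above can be checked locally before any paid call is made: `text` is required and capped at 5,000 characters, and `output_format` must be one of the listed values. The format names appear to follow a `codec_sampleRate_bitrate` pattern (sample rate in Hz, bitrate in kbps, no bitrate for PCM); that reading is inferred from the names, not stated on this page. `buildTtsBody`, `parseOutputFormat`, and `isOutputFormat` are hypothetical helpers, and counting characters by Unicode code point is an assumption about how the service measures length.

```typescript
// Output formats documented on this page.
const OUTPUT_FORMATS = [
  "mp3_22050_32",
  "mp3_44100_64",
  "mp3_44100_128",
  "mp3_44100_192",
  "pcm_16000",
  "pcm_22050",
  "pcm_24000",
  "pcm_44100",
  "opus_48000_64",
  "opus_48000_128",
] as const;

type OutputFormat = (typeof OUTPUT_FORMATS)[number];

const MAX_TTS_CHARS = 5000;

interface TtsBody {
  text: string;
  model_id?: string;
  output_format?: OutputFormat;
}

interface ParsedFormat {
  codec: string;
  sampleRateHz: number;
  bitrateKbps?: number; // PCM names carry no bitrate
}

function isOutputFormat(value: string): value is OutputFormat {
  return (OUTPUT_FORMATS as readonly string[]).includes(value);
}

// Assumed naming scheme: codec_sampleRateHz[_bitrateKbps]
function parseOutputFormat(format: OutputFormat): ParsedFormat {
  const [codec, rate, bitrate] = format.split("_");
  const parsed: ParsedFormat = { codec, sampleRateHz: Number(rate) };
  if (bitrate !== undefined) {
    parsed.bitrateKbps = Number(bitrate);
  }
  return parsed;
}

// Validate against the documented limits, then build the JSON body.
function buildTtsBody(text: string, outputFormat?: string, modelId?: string): TtsBody {
  // Counting code points is an assumption about how the service counts characters.
  const length = [...text].length;
  if (length === 0) {
    throw new Error("text must not be empty");
  }
  if (length > MAX_TTS_CHARS) {
    throw new Error(`text has ${length} characters; the limit is ${MAX_TTS_CHARS}`);
  }
  const body: TtsBody = { text };
  if (modelId !== undefined) {
    body.model_id = modelId;
  }
  if (outputFormat !== undefined) {
    if (!isOutputFormat(outputFormat)) {
      throw new Error(`unsupported output_format: ${outputFormat}`);
    }
    body.output_format = outputFormat;
  }
  return body;
}
```

The returned object can be passed directly as the body of a `POST` to `/v1/text-to-speech/{voiceId}`.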
Example request:

```json
{
  "text": "Welcome to our application. How can I help you today?",
  "model_id": "eleven_multilingual_v2"
}
```

Response
The response is binary audio data with the appropriate `Content-Type` header:

- `audio/mpeg` for MP3 formats
- `audio/pcm` for PCM formats
- `audio/basic` for μ-law/A-law formats

The `X-Character-Count` header contains the number of characters processed.
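When saving the audio, a client can derive a file extension from `Content-Type` and read the processed character count from `X-Character-Count`. A sketch under those assumptions follows; `extensionForContentType` and `parseCharacterCount` are hypothetical helpers, and the extension strings are arbitrary choices rather than anything the API specifies.

```typescript
// Map the documented Content-Type values to a file extension.
// Parameters such as "; rate=16000" are ignored; unknown types fall back to "bin".
function extensionForContentType(contentType: string | null | undefined): string {
  const mediaType = (contentType ?? "").split(";")[0].trim().toLowerCase();
  switch (mediaType) {
    case "audio/mpeg":
      return "mp3";
    case "audio/pcm":
      return "pcm";
    case "audio/basic":
      return "raw"; // μ-law/A-law samples
    default:
      return "bin";
  }
}

// Read X-Character-Count; returns undefined if the header is absent or not numeric.
function parseCharacterCount(header: string | null | undefined): number | undefined {
  if (header == null) {
    return undefined;
  }
  const count = Number.parseInt(header, 10);
  return Number.isNaN(count) ? undefined : count;
}
```

With fetch, pass `response.headers.get("content-type")` and `response.headers.get("x-character-count")`; with Axios, read the same header names from `response.headers`.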
Speech-to-Text
Endpoint: `POST https://elevenlabs.services.sapiom.ai/v1/speech-to-text`
Transcribe audio to text.
Request
| Parameter | Type | Required | Description |
|---|---|---|---|
| `audioBase64` | string | Yes | Base64-encoded audio content |
| `durationSeconds` | number | Yes | Audio duration in seconds (required for pricing) |
| `fileName` | string | No | Original filename for logging |
| `modelId` | string | No | Transcription model (default: `scribe_v1`) |
| `languageCode` | string | No | Language code (auto-detected if not specified) |
Supported languages: English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Japanese, Chinese, Korean, and more.
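Because `durationSeconds` is required for pricing, the client needs the clip length before uploading. For headerless PCM the duration follows from the byte count: bytes ÷ (sample rate × channels × bytes per sample). Compressed formats such as MP3 need a decoder or probing tool, which this sketch does not cover. `pcmDurationSeconds` and `buildSttBody` are hypothetical helpers, and the 16-bit mono defaults are assumptions; only the field names come from the table above.

```typescript
interface SttBody {
  audioBase64: string;
  durationSeconds: number;
  fileName?: string;
  modelId?: string;
  languageCode?: string;
}

// Duration of headerless PCM: bytes / (sampleRate * channels * bytesPerSample).
// Defaults assume 16-bit mono audio.
function pcmDurationSeconds(
  byteLength: number,
  sampleRateHz: number,
  channels = 1,
  bytesPerSample = 2
): number {
  return byteLength / (sampleRateHz * channels * bytesPerSample);
}

// Build the JSON body for POST /v1/speech-to-text from raw file bytes.
function buildSttBody(
  audio: Buffer,
  durationSeconds: number,
  options: { fileName?: string; modelId?: string; languageCode?: string } = {}
): SttBody {
  // The !(x > 0) form also rejects NaN.
  if (!(durationSeconds > 0)) {
    throw new Error("durationSeconds must be a positive number");
  }
  return {
    audioBase64: audio.toString("base64"),
    durationSeconds,
    ...options,
  };
}
```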
Example request:

```json
{
  "audioBase64": "SGVsbG8gV29ybGQh...",
  "durationSeconds": 30.5,
  "fileName": "meeting-recording.mp3",
  "languageCode": "en"
}
```

Response
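A successful transcription returns JSON like the example response for this endpoint. The `SttResponse` and `SttWord` interfaces here are inferred from that single sample rather than from a published schema, and treating `start`/`end` as seconds is an assumption based on the sample values. `formatTimestamps` is a hypothetical helper that turns the word list into a timestamped transcript.

```typescript
// Shapes inferred from the example response; not an official schema.
interface SttWord {
  text: string;
  start: number; // assumed: seconds from the start of the audio
  end: number;
}

interface SttResponse {
  text: string;
  language_code: string;
  language_probability: number;
  words: SttWord[];
}

// Render each word as "[start-end] word", with times shown to two decimals.
function formatTimestamps(response: SttResponse): string {
  return response.words
    .map((w) => `[${w.start.toFixed(2)}-${w.end.toFixed(2)}] ${w.text}`)
    .join("\n");
}
```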
```json
{
  "text": "Hello and welcome to today's meeting. We have several items on the agenda...",
  "language_code": "en",
  "language_probability": 0.98,
  "words": [
    { "text": "Hello", "start": 0.0, "end": 0.5 }
  ]
}
```

Sound Effects
Endpoint: `POST https://elevenlabs.services.sapiom.ai/v1/sound-effects`
Generate sound effects from text descriptions.
Request
| Parameter | Type | Required | Description |
|---|---|---|---|
| `text` | string | Yes | Description of the sound effect to generate |
| `durationSeconds` | number | No | Duration in seconds, from 0.5 to 22.0 (default: 2.0) |
| `promptInfluence` | number | No | How literally to follow the prompt, from 0.0 to 1.0 (default: 0.3) |
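The documented defaults and ranges can be applied on the client so that out-of-range values fail before a paid request. `buildSoundEffectBody` is a hypothetical helper; it sends the defaults explicitly, which is assumed to behave the same as omitting them.

```typescript
interface SoundEffectBody {
  text: string;
  durationSeconds: number;
  promptInfluence: number;
}

// Fill in the documented defaults and reject values outside the documented ranges.
// The !(a && b) form also rejects NaN.
function buildSoundEffectBody(
  text: string,
  durationSeconds = 2.0,
  promptInfluence = 0.3
): SoundEffectBody {
  if (text.trim() === "") {
    throw new Error("text must describe the sound effect");
  }
  if (!(durationSeconds >= 0.5 && durationSeconds <= 22.0)) {
    throw new Error("durationSeconds must be between 0.5 and 22.0");
  }
  if (!(promptInfluence >= 0.0 && promptInfluence <= 1.0)) {
    throw new Error("promptInfluence must be between 0.0 and 1.0");
  }
  return { text, durationSeconds, promptInfluence };
}
```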
Example request:

```json
{
  "text": "Cinematic braam, horror atmosphere",
  "durationSeconds": 3.0,
  "promptInfluence": 0.5
}
```

Response
The response is binary MP3 audio data with `Content-Type: audio/mpeg`.
Price Estimation
Endpoints:
- `POST https://elevenlabs.services.sapiom.ai/v1/text-to-speech/price`
- `POST https://elevenlabs.services.sapiom.ai/v1/speech-to-text/price`
- `POST https://elevenlabs.services.sapiom.ai/v1/sound-effects/price`
Each price endpoint returns the estimated cost of a request before you make it, and accepts the same parameters as its corresponding operation endpoint.
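A quote can gate the real request. The sketch assumes `price` is always a dollar-prefixed decimal string, as in the example response; `parsePrice` and `withinBudget` are hypothetical helpers.

```typescript
interface PriceQuote {
  price: string; // e.g. "$0.012"
  currency: string; // e.g. "USD"
}

// Parse a dollar-prefixed price string such as "$0.012" into a number.
function parsePrice(quote: PriceQuote): number {
  const digits = quote.price.trim().replace(/^\$/, "");
  const amount = Number(digits);
  if (digits === "" || !Number.isFinite(amount)) {
    throw new Error(`unexpected price format: ${quote.price}`);
  }
  return amount;
}

// Proceed only if the quote is in USD and does not exceed the caller's budget.
function withinBudget(quote: PriceQuote, maxUsd: number): boolean {
  return quote.currency === "USD" && parsePrice(quote) <= maxUsd;
}
```

In practice, you would send the same body you plan to use for `/v1/sound-effects` to `/v1/sound-effects/price`, pass the parsed JSON to `withinBudget`, and make the real request only when it returns `true`.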
Example response:

```json
{
  "price": "$0.012",
  "currency": "USD"
}
```

Error Codes
| Code | Description |
|---|---|
| 400 | Invalid request — check parameters |
| 402 | Payment required — ensure you’re using the Sapiom SDK |
| 404 | Voice or model not found |
| 413 | Text or audio too large |
| 429 | Rate limit exceeded |
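Of the codes above, only 429 looks transient: 400, 404, and 413 point to a problem with the request itself, and 402 points to a client setup issue, so retrying those unchanged is unlikely to help. A sketch of retrying on 429 with exponential backoff follows. `withRetry`, the attempt count, and the delays are illustrative choices; this page does not document a `Retry-After` header, so none is read.

```typescript
interface HasStatus {
  status: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call `send` until it returns a non-429 status or attempts run out.
// Delays double each time: baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...
// The last response is returned either way, so the caller still checks its status.
async function withRetry<T extends HasStatus>(
  send: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500
): Promise<T> {
  let response = await send();
  for (let attempt = 1; attempt < maxAttempts && response.status === 429; attempt++) {
    await sleep(baseDelayMs * 2 ** (attempt - 1));
    response = await send();
  }
  return response;
}
```

This works directly with a fetch `Response`. Axios rejects non-2xx responses by default, so with Axios either pass `validateStatus: () => true` in the request config or catch the error and inspect `error.response.status`.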
Complete Example
Using Axios:

```typescript
import { withSapiom } from "@sapiom/axios";
import axios from "axios";

const client = withSapiom(axios.create(), {
  apiKey: process.env.SAPIOM_API_KEY,
});

const baseUrl = "https://elevenlabs.services.sapiom.ai/v1";

async function createPodcastIntro(title: string, host: string) {
  // Generate a podcast intro with TTS; the voice ID goes in the URL path
  const script = `Welcome to ${title}. I'm your host, ${host}. Let's dive in.`;
  const voiceId = "JBFqnCBsd6RMkjVDRZzb"; // George (male, narrative)

  const response = await client.post(
    `${baseUrl}/text-to-speech/${voiceId}`,
    {
      text: script,
      output_format: "mp3_44100_192",
    },
    { responseType: "arraybuffer" }
  );

  return Buffer.from(response.data);
}

async function transcribeRecording(audioBase64: string, duration: number) {
  // Transcribe an audio recording; durationSeconds is required for pricing
  const { data } = await client.post(`${baseUrl}/speech-to-text`, {
    audioBase64,
    durationSeconds: duration,
    languageCode: "en",
  });

  return data.text;
}

async function generateTransitionSound() {
  // Create a custom sound effect
  const response = await client.post(
    `${baseUrl}/sound-effects`,
    {
      text: "Soft whoosh transition, podcast style",
      durationSeconds: 1.5,
    },
    { responseType: "arraybuffer" }
  );

  return Buffer.from(response.data);
}

// Usage
const introAudio = await createPodcastIntro("Tech Weekly", "Alex");
console.log("Intro audio size:", introAudio.byteLength, "bytes");

const transitionSfx = await generateTransitionSound();
console.log("Transition audio size:", transitionSfx.byteLength, "bytes");
```

Using Fetch:

```typescript
import { createFetch } from "@sapiom/fetch";

const sapiomFetch = createFetch({
  apiKey: process.env.SAPIOM_API_KEY,
});

const baseUrl = "https://elevenlabs.services.sapiom.ai/v1";

// fetch does not reject on HTTP error statuses, so check them explicitly
function assertOk(response: { ok: boolean; status: number }, operation: string) {
  if (!response.ok) {
    throw new Error(`${operation} failed with status ${response.status}`);
  }
}

async function createPodcastIntro(title: string, host: string) {
  // Generate a podcast intro with TTS; the voice ID goes in the URL path
  const script = `Welcome to ${title}. I'm your host, ${host}. Let's dive in.`;
  const voiceId = "JBFqnCBsd6RMkjVDRZzb"; // George (male, narrative)

  const response = await sapiomFetch(`${baseUrl}/text-to-speech/${voiceId}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: script,
      output_format: "mp3_44100_192",
    }),
  });
  assertOk(response, "Text-to-speech");

  return Buffer.from(await response.arrayBuffer());
}

async function transcribeRecording(audioBase64: string, duration: number) {
  // Transcribe an audio recording; durationSeconds is required for pricing
  const response = await sapiomFetch(`${baseUrl}/speech-to-text`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      audioBase64,
      durationSeconds: duration,
      languageCode: "en",
    }),
  });
  assertOk(response, "Speech-to-text");

  const data = (await response.json()) as { text: string };
  return data.text;
}

async function generateTransitionSound() {
  // Create a custom sound effect
  const response = await sapiomFetch(`${baseUrl}/sound-effects`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: "Soft whoosh transition, podcast style",
      durationSeconds: 1.5,
    }),
  });
  assertOk(response, "Sound effect generation");

  return Buffer.from(await response.arrayBuffer());
}

// Usage
const introAudio = await createPodcastIntro("Tech Weekly", "Alex");
console.log("Intro audio size:", introAudio.byteLength, "bytes");

const transitionSfx = await generateTransitionSound();
console.log("Transition audio size:", transitionSfx.byteLength, "bytes");
```

Pricing
| Operation | Price | Unit |
|---|---|---|
| Text-to-Speech | $0.24 | per 1,000 characters |
| Speech-to-Text | $0.08 | per minute |
| Sound Effects | $0.08 | flat per generation |
Minimums:
- Text-to-Speech: $0.001 minimum per request
- Speech-to-Text: $0.01 minimum per request
Example costs:
- 500-character TTS request: ~$0.12 (500 ÷ 1,000 × $0.24)
- 5-minute transcription: ~$0.40 (5 × $0.08)
- One sound effect: $0.08
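The rates and minimums combine into a per-request estimate: text-to-speech costs max(characters ÷ 1,000 × $0.24, $0.001), speech-to-text costs max(minutes × $0.08, $0.01), and each sound effect costs $0.08. This sketch assumes transcription is billed pro rata by the second rather than rounded up to whole minutes, which the page does not specify; the price-estimation endpoints remain the authoritative source. `estimateCostUsd` is a hypothetical helper.

```typescript
type CostInput =
  | { op: "text-to-speech"; characters: number }
  | { op: "speech-to-text"; durationSeconds: number }
  | { op: "sound-effects" };

// Rates and minimums from the pricing table (USD).
const TTS_PER_1K_CHARS = 0.24;
const TTS_MINIMUM = 0.001;
const STT_PER_MINUTE = 0.08;
const STT_MINIMUM = 0.01;
const SFX_FLAT = 0.08;

function estimateCostUsd(input: CostInput): number {
  switch (input.op) {
    case "text-to-speech":
      return Math.max((input.characters / 1000) * TTS_PER_1K_CHARS, TTS_MINIMUM);
    case "speech-to-text":
      // Assumes per-second proration; the page only gives a per-minute rate.
      return Math.max((input.durationSeconds / 60) * STT_PER_MINUTE, STT_MINIMUM);
    case "sound-effects":
      return SFX_FLAT;
  }
}

console.log(estimateCostUsd({ op: "text-to-speech", characters: 500 }));
console.log(estimateCostUsd({ op: "speech-to-text", durationSeconds: 300 }));
```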