Audio Services

Convert text to natural-sounding speech or generate sound effects — all through a single API with no account setup required.

import { createFetch } from "@sapiom/fetch";
import fs from "fs";

// Create a Sapiom-tracked fetch function
const sapiomFetch = createFetch({
  apiKey: process.env.SAPIOM_API_KEY,
  agentName: "my-agent",
});

// Convert text to speech - SDK handles payment/auth automatically
const response = await sapiomFetch(
  "https://elevenlabs.services.sapiom.ai/v1/text-to-speech/EXAVITQu4vr4xnSDxMaL",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: "Hello! Welcome to Sapiom. This is a test of the text-to-speech API.",
      model_id: "eleven_multilingual_v2",
    }),
  }
);

// Save the audio to a file
const buffer = await response.arrayBuffer();
fs.writeFileSync("output.mp3", Buffer.from(buffer));
console.log("Audio saved to output.mp3");

Sapiom routes audio requests to ElevenLabs, which provides state-of-the-art voice AI technology. The SDK handles payment negotiation automatically — you pay based on character count (TTS) or a flat rate (sound effects).

The service supports two operations:

  1. Text-to-Speech — Convert text to natural-sounding audio
  2. Sound Effects — Generate sound effects from text descriptions

Powered by ElevenLabs. ElevenLabs provides industry-leading voice synthesis with natural intonation and emotional range across 29 languages.

Endpoint: POST https://elevenlabs.services.sapiom.ai/v1/text-to-speech/{voiceId}

Convert text to natural-sounding speech. The voice ID is specified in the URL path.

Popular voice IDs:

  • EXAVITQu4vr4xnSDxMaL — Sarah (female, soft)
  • JBFqnCBsd6RMkjVDRZzb — George (male, narrative)
  • 21m00Tcm4TlvDq8ikWAM — Rachel (female, calm)
  • AZnzlk1XvdvUeBnXmlld — Domi (female, strong)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | Text to convert to speech (max 5000 characters) |
| model_id | string | No | Model for synthesis (default: eleven_multilingual_v2) |
| output_format | string | No | Audio format (default: mp3_44100_128) |

Output format options:

  • MP3: mp3_22050_32, mp3_44100_64, mp3_44100_128, mp3_44100_192
  • PCM: pcm_16000, pcm_22050, pcm_24000, pcm_44100
  • Opus: opus_48000_64, opus_48000_128

{
  "text": "Welcome to our application. How can I help you today?",
  "model_id": "eleven_multilingual_v2"
}
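
The limits in the parameter table can be checked locally before a request is sent. The following is a minimal sketch, not part of the Sapiom SDK: `buildTtsRequest` is a hypothetical helper, it assumes `output_format` travels in the JSON body alongside `text` (as the parameter table suggests), and it assumes the 5000-character limit can be approximated with JavaScript's string length.

```typescript
// Values copied from the parameter table and the output format list.
const MAX_TEXT_LENGTH = 5000;
const OUTPUT_FORMATS = [
  "mp3_22050_32", "mp3_44100_64", "mp3_44100_128", "mp3_44100_192",
  "pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100",
  "opus_48000_64", "opus_48000_128",
] as const;

type OutputFormat = (typeof OUTPUT_FORMATS)[number];

interface TtsRequestBody {
  text: string;
  model_id?: string;
  output_format?: OutputFormat;
}

// Hypothetical helper: validates the body locally, then returns the URL and
// fetch options for a text-to-speech call.
function buildTtsRequest(voiceId: string, body: TtsRequestBody) {
  // Assumption: string length (UTF-16 code units) approximates the service's character count.
  if (body.text.length === 0 || body.text.length > MAX_TEXT_LENGTH) {
    throw new RangeError(`text must be 1-${MAX_TEXT_LENGTH} characters`);
  }
  if (body.output_format !== undefined && !OUTPUT_FORMATS.includes(body.output_format)) {
    throw new RangeError(`unsupported output_format: ${body.output_format}`);
  }
  return {
    url: `https://elevenlabs.services.sapiom.ai/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

The returned `url` and `init` can be passed straight to a fetch function created with `createFetch`, for example `sapiomFetch(req.url, req.init)`.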

The response is binary audio data with the appropriate Content-Type header:

  • audio/mpeg for MP3 formats
  • audio/pcm for PCM formats
  • audio/basic for μ-law/A-law formats

The X-Character-Count header contains the number of characters processed.
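
The two headers described above can be read off the response before the audio is written to disk. This is a rough sketch under stated assumptions: `describeAudioResponse` is a hypothetical helper, only the `audio/mpeg` and `audio/pcm` mappings are handled, and the fallback extension `bin` is an arbitrary choice.

```typescript
// Minimal shape this helper needs; a standard fetch Response satisfies it.
interface AudioResponseLike {
  headers: { get(name: string): string | null };
}

// Hypothetical helper: derives a file extension from Content-Type and reads
// the processed character count from X-Character-Count.
function describeAudioResponse(response: AudioResponseLike) {
  const contentType = (response.headers.get("Content-Type") ?? "")
    .split(";")[0]
    .trim()
    .toLowerCase();

  // Only the two unambiguous mappings are handled; anything else falls back to "bin".
  const extension =
    contentType === "audio/mpeg" ? "mp3" : contentType === "audio/pcm" ? "pcm" : "bin";

  const rawCount = response.headers.get("X-Character-Count");
  const parsed = rawCount === null ? NaN : Number.parseInt(rawCount, 10);
  const characterCount = Number.isNaN(parsed) ? null : parsed;

  return { contentType, extension, characterCount };
}
```

A caller might use the result to name the output file, for example `output.${info.extension}`, and to log `info.characterCount` for cost tracking.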

Endpoint: POST https://elevenlabs.services.sapiom.ai/v1/sound-generation

Generate sound effects from text descriptions.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | Description of the sound effect to generate |
| duration_seconds | number | No | Duration in seconds, 0.5-22.0 (default: 2.0) |
| prompt_influence | number | No | How literally to follow the prompt, 0.0-1.0 (default: 0.3) |

{
  "text": "Cinematic braam, horror atmosphere",
  "duration_seconds": 3.0,
  "prompt_influence": 0.5
}

The response is binary MP3 audio data with Content-Type: audio/mpeg.
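
The documented ranges and defaults for sound generation can be applied on the client before sending. The sketch below is illustrative only: `normalizeSoundEffectParams` is a hypothetical helper, and clamping out-of-range values (rather than rejecting them) is a design choice made here, not documented service behavior.

```typescript
interface SoundEffectParams {
  text: string;
  duration_seconds?: number;
  prompt_influence?: number;
}

const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Hypothetical helper: fills in the documented defaults and clamps values
// into the documented ranges (0.5-22.0 seconds, 0.0-1.0 influence).
function normalizeSoundEffectParams(params: SoundEffectParams): Required<SoundEffectParams> {
  if (params.text.trim().length === 0) {
    throw new Error("text must describe the sound effect");
  }
  return {
    text: params.text,
    duration_seconds: clamp(params.duration_seconds ?? 2.0, 0.5, 22.0),
    prompt_influence: clamp(params.prompt_influence ?? 0.3, 0.0, 1.0),
  };
}
```

The normalized object can be serialized with `JSON.stringify` and used as the request body for the sound-generation endpoint.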

Endpoints:

  • POST https://elevenlabs.services.sapiom.ai/v1/text-to-speech/{voiceId}/price
  • POST https://elevenlabs.services.sapiom.ai/v1/sound-generation/price

Get the estimated cost before making a request. Accepts the same parameters as the main endpoint.

{
  "price": "$0.012",
  "currency": "USD"
}
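
One way to use the price endpoints is as a budget check before the paid request. The following sketch assumes the price response always has the shape shown above, with a currency-prefixed string in `price`; `priceUrl`, `parsePrice`, and `isWithinBudget` are hypothetical helpers, and the fetch function is passed in so that a function created with `createFetch` can be supplied.

```typescript
interface PriceQuote {
  price: string; // e.g. "$0.012"
  currency: string;
}

// Minimal fetch-like signature; a standard fetch function satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Appends "/price" to a main endpoint URL.
function priceUrl(endpoint: string): string {
  return `${endpoint.replace(/\/+$/, "")}/price`;
}

// Strips the currency symbol and returns the numeric amount.
function parsePrice(quote: PriceQuote): number {
  const amount = Number.parseFloat(quote.price.replace(/[^0-9.]/g, ""));
  if (Number.isNaN(amount)) throw new Error(`unparseable price: ${quote.price}`);
  return amount;
}

// Asks the price endpoint for an estimate and compares it with a budget in USD.
async function isWithinBudget(
  fetchFn: FetchLike,
  endpoint: string,
  body: object,
  maxUsd: number,
): Promise<boolean> {
  const response = await fetchFn(priceUrl(endpoint), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!response.ok) throw new Error(`price lookup failed with status ${response.status}`);
  const quote = (await response.json()) as PriceQuote;
  return parsePrice(quote) <= maxUsd;
}
```

Because the price endpoints accept the same parameters as the main endpoints, the same `body` object can be reused for the real request once the check passes.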

Endpoint: GET https://elevenlabs.services.sapiom.ai/v2/voices

List all available ElevenLabs voices. This endpoint is free and requires no payment.

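// Uses the sapiomFetch function created with createFetch in the first example
const response = await sapiomFetch("https://elevenlabs.services.sapiom.ai/v2/voices");
const data = await response.json();
for (const voice of data.voices) {
  console.log(`${voice.name} (${voice.voice_id})`);
}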
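
Since the voice ID goes into the text-to-speech URL, a common follow-up is looking up an ID by voice name. This is a small sketch that relies only on the `voices`, `name`, and `voice_id` fields used in the listing example; `findVoiceId` is a hypothetical helper, and exact case-insensitive name matching is an assumption about how names appear in the response.

```typescript
interface Voice {
  voice_id: string;
  name: string;
}

interface VoicesResponse {
  voices: Voice[];
}

// Hypothetical helper: returns the ID of the first voice whose name matches
// exactly (ignoring case and surrounding whitespace), or undefined.
function findVoiceId(data: VoicesResponse, name: string): string | undefined {
  const wanted = name.trim().toLowerCase();
  return data.voices.find((voice) => voice.name.trim().toLowerCase() === wanted)?.voice_id;
}

// Sample data taken from the popular voice IDs listed in this document.
const sample: VoicesResponse = {
  voices: [
    { voice_id: "EXAVITQu4vr4xnSDxMaL", name: "Sarah" },
    { voice_id: "JBFqnCBsd6RMkjVDRZzb", name: "George" },
  ],
};

console.log(findVoiceId(sample, "george")); // JBFqnCBsd6RMkjVDRZzb
```

If the service returns longer display names, a prefix or substring match may be a better fit than exact matching.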

| Code | Description |
| --- | --- |
| 400 | Invalid request: check parameters |
| 402 | Payment required: ensure you're using the Sapiom SDK |
| 404 | Voice or model not found |
| 413 | Text or audio too large |
| 429 | Rate limit exceeded |
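
The status codes above can be turned into readable errors, with a retry for rate limiting. The sketch below is one possible approach, not documented SDK behavior: `fetchWithRetry` is a hypothetical wrapper, retrying only on 429 is a choice made here, and the attempt count and backoff delays are arbitrary defaults.

```typescript
// Descriptions copied from the error code table.
const ERROR_HINTS: Record<number, string> = {
  400: "Invalid request: check parameters",
  402: "Payment required: ensure you're using the Sapiom SDK",
  404: "Voice or model not found",
  413: "Text or audio too large",
  429: "Rate limit exceeded",
};

// Minimal response shape; a standard fetch Response satisfies it.
interface ResponseLike {
  ok: boolean;
  status: number;
}

// Hypothetical wrapper: runs the request, retries with exponential backoff on
// 429, and throws a descriptive error for any other non-success status.
async function fetchWithRetry<R extends ResponseLike>(
  request: () => Promise<R>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<R> {
  for (let attempt = 1; ; attempt++) {
    const response = await request();
    if (response.ok) return response;

    const retryable = response.status === 429 && attempt < maxAttempts;
    if (!retryable) {
      const hint = ERROR_HINTS[response.status] ?? "Unexpected error";
      throw new Error(`Request failed (${response.status}): ${hint}`);
    }
    // Wait baseDelayMs, then 2x, then 4x, and so on before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
  }
}
```

A call would wrap the request in a closure, for example `await fetchWithRetry(() => sapiomFetch(url, init))`, so that each retry issues a fresh request.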
import { createFetch } from "@sapiom/fetch";

const sapiomFetch = createFetch({
  apiKey: process.env.SAPIOM_API_KEY,
  agentName: "my-agent",
});

const baseUrl = "https://elevenlabs.services.sapiom.ai/v1";

async function createPodcastIntro(title: string, host: string) {
  // Generate podcast intro with TTS
  const script = `Welcome to ${title}. I'm your host, ${host}. Let's dive in.`;
  const response = await sapiomFetch(`${baseUrl}/text-to-speech/JBFqnCBsd6RMkjVDRZzb`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: script,
      model_id: "eleven_multilingual_v2",
    }),
  });
  return Buffer.from(await response.arrayBuffer());
}

async function generateTransitionSound() {
  // Create a custom sound effect
  const response = await sapiomFetch(`${baseUrl}/sound-generation`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: "Soft whoosh transition, podcast style",
      duration_seconds: 1.5,
    }),
  });
  return Buffer.from(await response.arrayBuffer());
}

// Usage
const introAudio = await createPodcastIntro("Tech Weekly", "Alex");
console.log("Intro audio size:", introAudio.byteLength, "bytes");

const transitionSfx = await generateTransitionSound();
console.log("Transition audio size:", transitionSfx.byteLength, "bytes");

| Operation | Price | Unit |
| --- | --- | --- |
| Text-to-Speech | $0.24 | per 1,000 characters |
| Sound Effects | $0.08 | flat per generation |

Minimums:

  • Text-to-Speech: $0.001 minimum per request

Example costs:

  • 500-character TTS: ~$0.12 (500 / 1,000 × $0.24)
  • Sound effect: $0.08
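
The pricing rules can be written as a small estimator. This is a sketch based only on the figures listed here: the function and constant names are hypothetical, and how the service rounds the final charge is not specified, so the results are estimates rather than exact bills.

```typescript
// Figures taken from the pricing table and the minimums list.
const TTS_PRICE_PER_1000_CHARS = 0.24;
const TTS_MINIMUM_PER_REQUEST = 0.001;
const SOUND_EFFECT_FLAT_PRICE = 0.08;

// Estimated USD cost of one text-to-speech request.
function estimateTtsCost(characterCount: number): number {
  const raw = (characterCount / 1000) * TTS_PRICE_PER_1000_CHARS;
  return Math.max(TTS_MINIMUM_PER_REQUEST, raw);
}

// Estimated USD cost of a batch of sound effect generations.
function estimateSoundEffectCost(generations: number): number {
  return generations * SOUND_EFFECT_FLAT_PRICE;
}

console.log(estimateTtsCost(500).toFixed(3)); // 0.120
console.log(estimateTtsCost(1).toFixed(3)); // 0.001 (the per-request minimum applies)
console.log(estimateSoundEffectCost(1).toFixed(2)); // 0.08
```

For an authoritative figure before sending a request, the price endpoints remain the better source, since they reflect the service's own calculation.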