
Generate documentation narration with cloned voice via ElevenLabs

pattern

Creating audio narration for documentation requires recording and re-recording manually whenever content changes

Tags: accessibility, elevenlabs, documentation, voice-cloning, text-to-speech, gospeak

Problem

Adding audio narration to documentation, wiki pages, or onboarding guides dramatically improves accessibility and engagement. But recording voice-overs requires a quiet room, a decent microphone, and editing time for every update. When docs change frequently, the audio falls out of sync within days and re-recording everything is not sustainable. Meanwhile, users increasingly expect a "listen" button on blogs and docs -- auto TTS with playback speed settings is becoming table stakes.

Solution

Step 1: Clone your voice with ElevenLabs

Record a 3-5 minute sample of yourself reading technical content in a natural tone. Upload it to create a voice clone.

# Upload voice sample via API
curl -X POST "https://api.elevenlabs.io/v1/voices/add" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" \
  -F "name=my-voice" \
  -F "description=Documentation narration voice" \
  -F "files=@voice-sample.mp3"

Step 2: Extract narration text from markdown

Strip frontmatter, code blocks, and formatting to produce clean narration text.

import re

def markdown_to_narration(md_content: str) -> str:
    # Remove YAML frontmatter
    text = re.sub(r"^---.*?---\s*", "", md_content, flags=re.DOTALL)
    # Remove fenced code blocks (describe them instead)
    text = re.sub(
        r"```\w*\n.*?```",
        "(see the code example in the documentation)",
        text,
        flags=re.DOTALL,
    )
    # Remove markdown formatting: headings, bold, italics, inline code, links
    text = re.sub(r"#{1,6}\s+", "", text)
    text = re.sub(r"\*\*(.*?)\*\*", r"\1", text)
    text = re.sub(r"\*(.*?)\*", r"\1", text)
    text = re.sub(r"`([^`]+)`", r"\1", text)
    text = re.sub(r"\[(.*?)\]\(.*?\)", r"\1", text)
    # Collapse runs of blank lines
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
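
ElevenLabs caps how much text a single request may carry (the exact limit varies by model and plan), so long pages may need splitting before Step 3. A minimal sketch that chunks at paragraph boundaries; the 2500-character default is an illustrative assumption, not a documented limit:

```python
def split_narration(text: str, max_chars: int = 2500) -> list:
    """Split narration text into chunks at paragraph boundaries.

    max_chars is illustrative; check your plan's actual per-request
    allowance. Assumes any single paragraph fits within the limit.
    """
    chunks = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = current + "\n\n" + paragraph if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent through the generation step separately and the resulting files served as a playlist or stitched together.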

Step 3: Generate audio from the narration text

import requests
import os

def generate_narration(text: str, voice_id: str, output_path: str) -> None:
    response = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        headers={
            "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
            "Content-Type": "application/json",
        },
        json={
            "text": text,
            "model_id": "eleven_multilingual_v2",
            "voice_settings": {
                "stability": 0.6,
                "similarity_boost": 0.8,
            },
        },
    )
    # Fail loudly on auth/quota errors instead of writing an error body to disk
    response.raise_for_status()
    with open(output_path, "wb") as f:
        f.write(response.content)

Step 4: Automate on doc changes

#!/bin/bash
# regenerate-narration.sh - Run in CI when docs change
set -euo pipefail
mkdir -p docs/audio
for md_file in docs/*.md; do
  slug=$(basename "$md_file" .md)
  audio_file="docs/audio/${slug}.mp3"
  # -nt is true when the markdown is newer or the audio doesn't exist yet
  if [ "$md_file" -nt "$audio_file" ]; then
    python generate_narration.py "$md_file" "$audio_file"
    echo "Regenerated: $audio_file"
  fi
done
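
The mtime check above regenerates audio whenever the markdown changes, even when only non-narrated content (a code block, a link URL) was edited. Since ElevenLabs bills per character, it can be worth comparing the extracted narration text itself before calling the API. A minimal sketch using a hypothetical sidecar `.sha256` file next to each audio file:

```python
import hashlib
from pathlib import Path


def narration_changed(narration_text: str, hash_path: Path) -> bool:
    """Compare the narration text's hash against a sidecar .sha256 file.

    Returns True (and updates the sidecar) when audio should be
    regenerated; returns False when the spoken content is unchanged.
    """
    digest = hashlib.sha256(narration_text.encode("utf-8")).hexdigest()
    if hash_path.exists() and hash_path.read_text().strip() == digest:
        return False  # spoken content unchanged; skip the paid API call
    hash_path.write_text(digest + "\n")
    return True
```

Calling this between Steps 2 and 3 in `generate_narration.py` means an edit that only touches a code block costs nothing.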

Alternative: use gospeak for self-hosted CLI generation

For teams that cannot send content to external APIs, gospeak provides a Go-based TTS tool that runs locally:

# Install gospeak
go install github.com/schappim/gospeak@latest

# Generate speech from text
gospeak --input docs/getting-started.txt --output docs/audio/getting-started.mp3
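
To batch the self-hosted alternative over the same extracted narration files, you can shell out to the CLI from the build script. A minimal sketch that mirrors the flags shown above and assumes the `gospeak` binary is on `PATH`:

```python
import subprocess
from pathlib import Path


def gospeak_command(input_txt: str, output_mp3: str) -> list:
    """Build the gospeak invocation (flags mirror the CLI example above)."""
    return ["gospeak", "--input", input_txt, "--output", output_mp3]


def narrate_locally(input_txt: str, output_mp3: str) -> None:
    """Run gospeak for one file; raises on a non-zero exit code."""
    Path(output_mp3).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(gospeak_command(input_txt, output_mp3), check=True)
```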

Why It Works

Voice cloning creates a consistent, recognizable voice that sounds like the actual author without requiring them to re-record anything. The ElevenLabs API produces natural-sounding speech that handles technical terminology well. By extracting narration text from the same markdown source, the audio stays synchronized with the written docs automatically. The CI-based regeneration means audio updates happen as a side effect of doc changes with zero manual effort. Stripping code blocks before narration keeps the audio listenable -- nobody wants to hear a JSON blob read aloud.

Context

  • A 3-5 minute voice sample is enough for a usable clone; longer samples improve quality
  • ElevenLabs pricing is per-character so monitor usage for large doc sets
  • The stability parameter controls how consistent the voice sounds; lower values sound more natural but vary more
  • For self-hosted alternatives, gospeak (github.com/schappim/gospeak) and Coqui TTS provide options without API costs
  • Add a small HTML5 audio player widget to your doc site with playback speed controls for the best reading experience
  • The eleven_multilingual_v2 model supports 29 languages, so this works for localized documentation too
  • Store generated audio in a separate directory or CDN to keep the docs repository lightweight
  • This pattern also works for generating podcast-style summaries of changelogs or release notes
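
For the player widget mentioned above, the docs build can emit the embed markup alongside each page. A minimal sketch of a hypothetical helper; the playback-speed control itself needs a line of site JavaScript (setting `audio.playbackRate`), not shown here:

```python
def audio_embed(audio_url: str, title: str = "Listen to this page") -> str:
    """Return HTML5 <audio> embed markup for a generated narration file."""
    return (
        '<figure class="narration">\n'
        f"  <figcaption>{title}</figcaption>\n"
        f'  <audio controls preload="none" src="{audio_url}"></audio>\n'
        "</figure>"
    )
```

`preload="none"` keeps the audio off the critical path so pages without listeners don't fetch the MP3.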
About this share
Contributor: mblode
Repository: mblode/shares
Created: Feb 10, 2026