Problem
Voice agents built with ElevenLabs, Play.ht, or custom pipelines are difficult to test systematically. Verifying accent handling, background noise robustness, and interruption behavior requires a person to manually call the agent and try each variation. This is slow, non-reproducible, and scales poorly. You cannot easily test 50 accent variations or simulate a noisy restaurant by hand.
Solution
Use Hamming.ai to automate voice agent testing with simulated callers, accent variations, and scenario playback.
Step 1: Define test scenarios
```yaml
# test-scenarios.yaml
scenarios:
  - name: "Basic order placement"
    caller_profile:
      accent: "american-midwest"
      background_noise: "quiet"
    script:
      - say: "Hi, I'd like to place an order for delivery"
      - wait_for_response: true
      - say: "A large pepperoni pizza and garlic bread"
      - wait_for_response: true
      - say: "123 Main Street, apartment 4B"
    expected_outcomes:
      - order_captured: true
      - address_captured: "123 Main Street, apartment 4B"

  - name: "Accent handling - Indian English"
    caller_profile:
      accent: "indian-english"
      background_noise: "moderate-office"
    script:
      - say: "I am wanting to make a reservation for four people"
      - wait_for_response: true
      - say: "Saturday evening, seven thirty PM"
    expected_outcomes:
      - reservation_party_size: 4
      - reservation_time: "19:30"

  - name: "Interruption handling"
    caller_profile:
      accent: "australian"
    script:
      - say: "I need to cancel my--"
      - interrupt_after_ms: 500
      - say: "Sorry, I need to cancel my appointment tomorrow"
    expected_outcomes:
      - intent_recognized: "cancel_appointment"
```
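Before a run, it can be worth sanity-checking the scenario file locally so a typo doesn't burn simulated calls. A minimal sketch of a shape check, assuming the YAML has already been parsed into an object (e.g. with js-yaml) — the `Scenario` type here mirrors the file layout above and is not the official Hamming schema:

```typescript
// Shape check for a parsed scenario file. The types below are assumptions
// inferred from the YAML layout above, not Hamming's documented schema.
interface Scenario {
  name: string;
  caller_profile: { accent: string; background_noise?: string };
  script: Array<Record<string, unknown>>;
  expected_outcomes: Array<Record<string, unknown>>;
}

function validateScenarios(doc: { scenarios?: Scenario[] }): string[] {
  const errors: string[] = [];
  for (const [i, s] of (doc.scenarios ?? []).entries()) {
    if (!s.name) errors.push(`scenario ${i}: missing name`);
    if (!s.caller_profile?.accent) errors.push(`scenario ${i}: missing accent`);
    if (!s.script?.length) errors.push(`scenario ${i}: empty script`);
    if (!s.expected_outcomes?.length) errors.push(`scenario ${i}: no expected outcomes`);
  }
  return errors;
}

// Example: a scenario that forgot its expected outcomes
const doc = {
  scenarios: [
    {
      name: "Basic order placement",
      caller_profile: { accent: "american-midwest", background_noise: "quiet" },
      script: [{ say: "Hi, I'd like to place an order" }],
      expected_outcomes: [],
    },
  ],
};
console.log(validateScenarios(doc)); // [ 'scenario 0: no expected outcomes' ]
```

An empty array means the file is structurally sound; anything else lists what to fix before dialing.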
Step 2: Run the test suite
```typescript
import { HammingClient } from "@hamming/sdk";

const hamming = new HammingClient({ apiKey: process.env.HAMMING_API_KEY });

const results = await hamming.runTestSuite({
  agentPhoneNumber: "+1-555-YOUR-AGENT",
  scenarioFile: "./test-scenarios.yaml",
  parallel: 5,
});

for (const result of results) {
  console.log(`${result.scenario}: ${result.passed ? "PASS" : "FAIL"}`);
  console.log(`  Latency: ${result.avgResponseMs}ms`);
  console.log(`  Accuracy: ${result.transcriptAccuracy}%`);
}
```
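For CI, it helps to roll the per-scenario results up into a summary and a non-zero exit code on failure. A sketch, assuming a result shape matching the fields used above (`scenario`, `passed`, `avgResponseMs`, `transcriptAccuracy` — an inference, not the SDK's documented return type):

```typescript
// Aggregate results for a CI gate. The TestResult shape is an assumption
// based on the fields accessed in the loop above.
interface TestResult {
  scenario: string;
  passed: boolean;
  avgResponseMs: number;
  transcriptAccuracy: number;
}

function summarize(results: TestResult[]) {
  const failed = results.filter((r) => !r.passed);
  const avgLatency =
    results.reduce((sum, r) => sum + r.avgResponseMs, 0) / results.length;
  return { total: results.length, failed: failed.map((r) => r.scenario), avgLatency };
}

// Sample data standing in for a real suite run
const sample: TestResult[] = [
  { scenario: "Basic order placement", passed: true, avgResponseMs: 820, transcriptAccuracy: 97 },
  { scenario: "Accent handling - Indian English", passed: false, avgResponseMs: 1140, transcriptAccuracy: 88 },
];

const summary = summarize(sample);
console.log(summary);
// { total: 2, failed: [ 'Accent handling - Indian English' ], avgLatency: 980 }
// In CI: process.exitCode = summary.failed.length > 0 ? 1 : 0;
```

Setting the exit code this way lets the same script block a merge when any scenario regresses.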
Why It Works
Hamming simulates realistic callers by combining text-to-speech with accent models and background noise injection. The simulated caller follows a script but adapts to the agent's responses, mimicking real conversation flow. By defining expected outcomes, you get automated pass/fail results without listening to every call. Running scenarios in parallel tests dozens of variations in minutes instead of hours of manual calling.
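The outcome comparison itself is straightforward: expected key/value pairs checked against structured data extracted from the call. Hamming performs this server-side; the local sketch below only illustrates the idea, using the reservation scenario from Step 1:

```typescript
// Illustration of outcome-based pass/fail: compare expected key/value pairs
// against data extracted from the call. Not Hamming's implementation.
type Outcome = Record<string, string | number | boolean>;

function checkOutcomes(
  expected: Outcome,
  actual: Outcome
): { pass: boolean; mismatches: string[] } {
  const mismatches: string[] = [];
  for (const [key, want] of Object.entries(expected)) {
    if (actual[key] !== want) {
      mismatches.push(
        `${key}: expected ${JSON.stringify(want)}, got ${JSON.stringify(actual[key])}`
      );
    }
  }
  return { pass: mismatches.length === 0, mismatches };
}

// Expected outcomes from the reservation scenario, against a call where the
// agent booked the wrong time
const expected = { reservation_party_size: 4, reservation_time: "19:30" };
const actual = { reservation_party_size: 4, reservation_time: "19:00" };
console.log(checkOutcomes(expected, actual));
// { pass: false, mismatches: [ 'reservation_time: expected "19:30", got "19:00"' ] }
```

The mismatch list is what makes failures actionable: it points at the exact field the agent got wrong instead of forcing you to replay the audio.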
Context
- Hamming.ai focuses specifically on voice agent QA, unlike general speech-to-text benchmarks
- Use it for regression testing after changing your voice agent's prompt, model, or speech provider
- Pairs well with ElevenLabs Conversational AI, Sesame, or any agent reachable via phone number
- The cost per simulated call is significantly lower than paying QA staff to place test calls manually
- For testing speech-to-text accuracy alone, consider dedicated STT benchmarks rather than full call simulation