Problem
Voice agents built with ElevenLabs, Play.ht, or custom pipelines are difficult to test systematically. Verifying accent handling, background noise robustness, and interruption behavior requires a person to manually call the agent and try each variation. This is slow, non-reproducible, and scales poorly. You cannot easily test 50 accent variations or simulate a noisy restaurant by hand.
Solution
Use Hamming.ai to automate voice agent testing with simulated callers, accent variations, and scenario playback.
Step 1: Define test scenarios
```yaml
# test-scenarios.yaml
scenarios:
  - name: "Basic order placement"
    caller_profile:
      accent: "american-midwest"
      background_noise: "quiet"
    script:
      - say: "Hi, I'd like to place an order for delivery"
      - wait_for_response: true
      - say: "A large pepperoni pizza and garlic bread"
      - wait_for_response: true
      - say: "123 Main Street, apartment 4B"
    expected_outcomes:
      - order_captured: true
      - address_captured: "123 Main Street, apartment 4B"

  - name: "Accent handling - Indian English"
    caller_profile:
      accent: "indian-english"
      background_noise: "moderate-office"
    script:
      - say: "I am wanting to make a reservation for four people"
      - wait_for_response: true
      - say: "Saturday evening, seven thirty PM"
    expected_outcomes:
      - reservation_party_size: 4
      - reservation_time: "19:30"

  - name: "Interruption handling"
    caller_profile:
      accent: "australian"
    script:
      - say: "I need to cancel my--"
      - interrupt_after_ms: 500
      - say: "Sorry, I need to cancel my appointment tomorrow"
    expected_outcomes:
      - intent_recognized: "cancel_appointment"
```
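Before a run, it can be worth sanity-checking the scenario file locally so a typo doesn't burn simulated calls. A minimal sketch of a shape check, assuming the YAML has already been parsed into an object (e.g. with js-yaml) — the `Scenario` type here mirrors the file layout above and is not the official Hamming schema:

```typescript
// Shape check for a parsed scenario file. The types below are assumptions
// inferred from the YAML layout above, not Hamming's documented schema.
interface Scenario {
  name: string;
  caller_profile: { accent: string; background_noise?: string };
  script: Array<Record<string, unknown>>;
  expected_outcomes: Array<Record<string, unknown>>;
}

function validateScenarios(doc: { scenarios?: Scenario[] }): string[] {
  const errors: string[] = [];
  for (const [i, s] of (doc.scenarios ?? []).entries()) {
    if (!s.name) errors.push(`scenario ${i}: missing name`);
    if (!s.caller_profile?.accent) errors.push(`scenario ${i}: missing accent`);
    if (!s.script?.length) errors.push(`scenario ${i}: empty script`);
    if (!s.expected_outcomes?.length) errors.push(`scenario ${i}: no expected outcomes`);
  }
  return errors;
}

// Example: a scenario that forgot its expected outcomes
const doc = {
  scenarios: [
    {
      name: "Basic order placement",
      caller_profile: { accent: "american-midwest", background_noise: "quiet" },
      script: [{ say: "Hi, I'd like to place an order" }],
      expected_outcomes: [],
    },
  ],
};
console.log(validateScenarios(doc)); // [ 'scenario 0: no expected outcomes' ]
```

An empty array means the file is structurally sound; anything else lists what to fix before dialing.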
Step 2: Run the test suite
```typescript
import { HammingClient } from "@hamming/sdk";

const hamming = new HammingClient({ apiKey: process.env.HAMMING_API_KEY });

const results = await hamming.runTestSuite({
  agentPhoneNumber: "+1-555-YOUR-AGENT",
  scenarioFile: "./test-scenarios.yaml",
  parallel: 5,
});

for (const result of results) {
  console.log(`${result.scenario}: ${result.passed ? "PASS" : "FAIL"}`);
  console.log(`  Latency: ${result.avgResponseMs}ms`);
  console.log(`  Accuracy: ${result.transcriptAccuracy}%`);
}
```
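For CI, it helps to roll the per-scenario results up into a summary and a non-zero exit code on failure. A sketch, assuming a result shape matching the fields used above (`scenario`, `passed`, `avgResponseMs`, `transcriptAccuracy` — an inference, not the SDK's documented return type):

```typescript
// Aggregate results for a CI gate. The TestResult shape is an assumption
// based on the fields accessed in the loop above.
interface TestResult {
  scenario: string;
  passed: boolean;
  avgResponseMs: number;
  transcriptAccuracy: number;
}

function summarize(results: TestResult[]) {
  const failed = results.filter((r) => !r.passed);
  const avgLatency =
    results.reduce((sum, r) => sum + r.avgResponseMs, 0) / results.length;
  return { total: results.length, failed: failed.map((r) => r.scenario), avgLatency };
}

// Sample data standing in for a real suite run
const sample: TestResult[] = [
  { scenario: "Basic order placement", passed: true, avgResponseMs: 820, transcriptAccuracy: 97 },
  { scenario: "Accent handling - Indian English", passed: false, avgResponseMs: 1140, transcriptAccuracy: 88 },
];

const summary = summarize(sample);
console.log(summary);
// { total: 2, failed: [ 'Accent handling - Indian English' ], avgLatency: 980 }
// In CI: process.exitCode = summary.failed.length > 0 ? 1 : 0;
```

Setting the exit code this way lets the same script block a merge when any scenario regresses.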
Why It Works
Hamming simulates realistic callers by combining text-to-speech with accent models and background noise injection. The simulated caller follows a script but adapts to the agent's responses, mimicking real conversation flow. By defining expected outcomes, you get automated pass/fail results without listening to every call. Running scenarios in parallel tests dozens of variations in minutes instead of hours of manual calling.
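The outcome comparison itself is straightforward: expected key/value pairs checked against structured data extracted from the call. Hamming performs this server-side; the local sketch below only illustrates the idea, using the reservation scenario from Step 1:

```typescript
// Illustration of outcome-based pass/fail: compare expected key/value pairs
// against data extracted from the call. Not Hamming's implementation.
type Outcome = Record<string, string | number | boolean>;

function checkOutcomes(
  expected: Outcome,
  actual: Outcome
): { pass: boolean; mismatches: string[] } {
  const mismatches: string[] = [];
  for (const [key, want] of Object.entries(expected)) {
    if (actual[key] !== want) {
      mismatches.push(
        `${key}: expected ${JSON.stringify(want)}, got ${JSON.stringify(actual[key])}`
      );
    }
  }
  return { pass: mismatches.length === 0, mismatches };
}

// Expected outcomes from the reservation scenario, against a call where the
// agent booked the wrong time
const expected = { reservation_party_size: 4, reservation_time: "19:30" };
const actual = { reservation_party_size: 4, reservation_time: "19:00" };
console.log(checkOutcomes(expected, actual));
// { pass: false, mismatches: [ 'reservation_time: expected "19:30", got "19:00"' ] }
```

The mismatch list is what makes failures actionable: it points at the exact field the agent got wrong instead of forcing you to replay the audio.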
Context
- Hamming.ai focuses specifically on voice agent QA, unlike general speech-to-text benchmarks
- Use it for regression testing after changing your voice agent's prompt, model, or speech provider
- Pairs well with ElevenLabs Conversational AI, Sesame, or any agent reachable via phone number
- The cost per simulated call is significantly lower than paying QA staff to place test calls manually
- For testing speech-to-text accuracy alone, consider dedicated STT benchmarks rather than full call simulation