A New Framework for Evaluating Voice Agents (EVA)
Introduction
Conversational voice agents present a distinct evaluation challenge: they must simultaneously satisfy two objectives, accuracy (completing the user's task correctly and faithfully) and conversational experience (doing so naturally, concisely, and in a way appropriate for spoken interaction). These objectives are deeply intertwined: mi...
The introduction of EVA is a step toward holistic evaluation of voice agents, addressing a gap in existing frameworks that treat accuracy and user experience as separate concerns. By simulating realistic conversational dynamics, EVA exposes tradeoffs and failure modes, such as the tension between task completion and natural interaction, that are invisible in isolated component tests. This is a constructive development, as it forces the field to confront the interconnected n...