Ecom-RLVE: Adaptive Verifiable Environments for E-Commerce Conversational Agents
TL;DR — We extend the RLVE framework from single-turn reasoning puzzles to multi-turn, tool-augmented e-commerce conversations. EcomRLVE-GYM provides 8 verifiable environments — product discovery, substitution, cart building, returns, order tracking, policy QA, bundle planning, and multi-intent journeys — each with pr...
This work narrows the gap between fluent language models and reliable task completion in e-commerce. Verifiable, algorithmic rewards address a known limitation of reinforcement learning for conversational agents: human or LLM-based evaluation is subjective and hard to scale. By focusing on multi-turn, tool-augmented interactions, the framework moves beyond static reasoning puzzles to dynamic, real-world workflows, a shift that align...
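
To make the idea of a verifiable, algorithmic reward concrete, here is a minimal sketch for a cart-building task, assuming the environment knows the target cart in advance. The function name `cart_reward`, the SKU identifiers, and the F1-style scoring are illustrative assumptions and not the EcomRLVE-GYM API or its actual reward definition.

```python
from collections import Counter
from typing import Iterable


def cart_reward(target: dict[str, int],
                final_cart: Iterable[tuple[str, int]]) -> float:
    """Score the agent's final cart against a known target cart.

    Hypothetical reward: an F1-style overlap in [0, 1], computed
    purely from item counts, with no human or LLM judge involved.
    """
    # Aggregate the agent's cart, merging repeated product ids.
    got = Counter()
    for product_id, qty in final_cart:
        got[product_id] += qty
    want = Counter(target)

    # Multiset intersection: units that are both wanted and present.
    overlap = sum((got & want).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(got.values())   # penalizes extra items
    recall = overlap / sum(want.values())     # penalizes missing items
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    target = {"sku-1": 2, "sku-2": 1}
    print(cart_reward(target, [("sku-1", 2), ("sku-2", 1)]))  # exact match
    print(cart_reward(target, [("sku-1", 1), ("sku-3", 1)]))  # partial match
```

Because the score depends only on the final cart contents, the same conversation always receives the same reward, which is the property that separates this kind of check from a judged evaluation.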