AgenticCache: Cache-Driven Asynchronous Planning for Embodied AI Agents
Abstract
Large language models (LLMs) have recently been integrated into embodied AI agents, yet their synchronous plan-act loop imposes severe latency and cost bottlenecks. We present AgenticCache, a cache-driven asynchronous planning framework that decouples LLM reasoning from real-time execution. Exploiting the strong plan-transition locality of embodied tasks, AgenticCache enables agents to reuse frequently occurring plan fragments and to update them asynchronously through a background LLM process. This design converts idle waiting time into productive action while preserving context-aware decision quality. Across four multi-agent embodied benchmarks, AgenticCache improves task success rates by 24.34%, reduces simulation latency by 75%, and lowers token usage by 65% on average. These results demonstrate that caching and asynchronous reasoning together offer a path toward real-time, low-cost, and cognitively inspired autonomy for LLM-based agents.
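The core mechanism the abstract describes can be sketched as a plan-fragment cache whose entries are filled by a background planner thread, so the agent acts on cached fragments (or a fallback) instead of blocking on the LLM. This is a minimal illustrative sketch, not the authors' implementation: the names `PlanCache` and `llm_plan`, and the string-based states and plan fragments, are assumptions introduced here for illustration.

```python
import queue
import threading
import time

def llm_plan(state):
    # Hypothetical stand-in for a slow, expensive LLM planning call.
    time.sleep(0.05)
    return [f"goto({state})", f"act({state})"]

class PlanCache:
    """Plan-fragment cache refreshed asynchronously by a background worker."""

    def __init__(self):
        self._plans = {}                 # state -> cached plan fragment
        self._lock = threading.Lock()
        self._pending = queue.Queue()    # states awaiting background planning
        worker = threading.Thread(target=self._refresh_loop, daemon=True)
        worker.start()

    def lookup(self, state, fallback):
        """Return (plan, hit). On a miss, enqueue async planning and act on fallback."""
        with self._lock:
            plan = self._plans.get(state)
        if plan is not None:
            return plan, True            # cache hit: act without waiting on the LLM
        self._pending.put(state)         # cache miss: plan in the background
        return fallback, False

    def _refresh_loop(self):
        while True:
            state = self._pending.get()
            plan = llm_plan(state)       # LLM call happens off the critical path
            with self._lock:
                self._plans[state] = plan

cache = PlanCache()
# First request misses: the agent keeps acting on the fallback fragment
# while the background thread computes the real plan.
plan, hit = cache.lookup("kitchen", fallback=["explore()"])
time.sleep(0.2)
# Later requests for the same state hit the cache with no LLM latency.
plan2, hit2 = cache.lookup("kitchen", fallback=["explore()"])
```

The lookup path never blocks on the planner: a miss costs only a queue push, which is what converts idle waiting time into productive action in the framework's terms.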