FlashAgents: Accelerating Multi-Agent LLM Systems via Streaming Prefill Overlap
Taosong Fang ⋅ Zhen Zheng ⋅ Zhengzhao Ma ⋅ Yaojie Lu ⋅ Hongyu Lin ⋅ Xianpei Han ⋅ Le Sun
Abstract
Large Language Models (LLMs) are increasingly deployed as collaborating agents in Multi-Agent Systems (MAS), where sequential agent interactions create significant latency bottlenecks. Traditional serving systems require each downstream agent to wait for complete upstream generation before starting prefill, leaving substantial idle time during inter-agent transitions. We present FlashAgents, a system that accelerates multi-agent workflows through token-level streaming and prefix-aware coordination. FlashAgents introduces inter-agent streaming with incremental prefill, which streams tokens between agents and incrementally prefills them so that downstream prefill overlaps with upstream decode, reducing inter-agent latency. For concurrent workloads, an intra-turn prefix cache built on radix trees detects and eliminates redundant prefill across requests sharing common instruction templates, avoiding recomputation of shared prefixes within the same processing turn. Implemented on SGLang, FlashAgents achieves up to 46% end-to-end latency reduction on real workflows and 3.5× speedup in controlled two-agent benchmarks, demonstrating consistent improvements across diverse models and interaction patterns.
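The core overlap idea from the abstract can be illustrated with a minimal sketch: an upstream agent streams tokens into a queue as it decodes, while the downstream consumer prefills those tokens in fixed-size chunks instead of waiting for the full output. All names here (`upstream_decode`, `downstream_incremental_prefill`, the chunk size) are illustrative assumptions, not FlashAgents' actual API; real prefill steps would run on the serving engine rather than appending to a list.

```python
import queue
import threading

def upstream_decode(tokens, out_q):
    """Simulate an upstream agent emitting tokens one at a time (autoregressive decode)."""
    for t in tokens:
        out_q.put(t)
    out_q.put(None)  # end-of-stream sentinel

def downstream_incremental_prefill(in_q, chunk_size=4):
    """Consume streamed tokens and prefill them chunk by chunk,
    overlapping downstream prefill with upstream decode.
    The list append is a stand-in for a real incremental prefill step."""
    chunks, buf = [], []
    while True:
        t = in_q.get()
        if t is None:
            break
        buf.append(t)
        if len(buf) >= chunk_size:
            chunks.append(list(buf))  # hypothetical prefill over this chunk
            buf.clear()
    if buf:
        chunks.append(list(buf))  # prefill the final partial chunk
    return chunks

q = queue.Queue()
producer = threading.Thread(target=upstream_decode, args=(list(range(10)), q))
producer.start()
result = downstream_incremental_prefill(q, chunk_size=4)
producer.join()
print(result)  # → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

In a traditional pipeline the consumer would receive all ten tokens only after decode finishes; here each four-token chunk is processed as soon as it arrives, which is the source of the inter-agent latency reduction the paper reports.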