

Session

Industry-Track Oral Presentation: I5: Agentic AI/MLSys

Grand Ballroom 1
Thu 21 May 8:30 a.m. PDT — 10 a.m. PDT


ADS: An Agentic Detection System for Enterprise Agentic AI Security

Chenning Li ⋅ Pan Hu ⋅ Justin Xu ⋅ Baris Ozbas ⋅ Olivia Liu ⋅ Caroline Van ⋅ Wei Zhou ⋅ Mohammad Alizadeh ⋅ Pengyu Zhang

We present ADR (Agentic AI Detection and Response), the first large-scale, production-proven enterprise framework for securing AI agents operating through the Model Context Protocol (MCP). We identify three persistent challenges in this domain: (1) limited observability, as existing telemetry fails to capture reasoning and tool-execution chains; (2) insufficient robustness, given vast, dynamic enterprise contexts and extreme class imbalance; and (3) high detection costs, as LLM-based inference is computationally expensive. ADR addresses these challenges via three components: the ADR Sensor for high-fidelity agentic telemetry, the ADR Explorer for continuous red teaming and hard-example generation, and the ADR Detector for scalable, two-tier online detection combining fast triage with context-aware reasoning. On ADR-Bench (302 tasks, 17 techniques, 133 MCP servers), ADR achieves zero false positives while detecting 67% of attacks—outperforming three state-of-the-art baselines (ALRPHFS, GuardAgent, LlamaFirewall) by 2–4×. On AgentDojo (public prompt injection benchmark), ADR detects all attacks with only three false alarms out of 93 tasks. Over ten months of telemetry, ADR sustained reliable detection in production, uncovering credential exposures and enabling a shift-left prevention layer with 97.2% precision. ADR’s source code and benchmark will be publicly available.
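The two-tier design above (fast triage plus context-aware reasoning) can be sketched as follows. This is an illustrative outline only, not ADR's implementation: the `AgentEvent` type, the marker list, and both tier functions are hypothetical stand-ins, and a real tier 2 would invoke an LLM over the full reasoning and tool-execution chain rather than block unconditionally.

```python
from dataclasses import dataclass

@dataclass
class AgentEvent:
    tool_name: str
    arguments: str

# Hypothetical tier-1 screen: a cheap lexical filter over MCP tool calls.
SUSPICIOUS_MARKERS = ("ignore previous instructions", "private key", "exfiltrate")

def fast_triage(event: AgentEvent) -> bool:
    """Tier 1: an inexpensive check applied to every event."""
    text = f"{event.tool_name} {event.arguments}".lower()
    return any(marker in text for marker in SUSPICIOUS_MARKERS)

def deep_review(event: AgentEvent, context: list[AgentEvent]) -> str:
    """Tier 2 stand-in: a real detector would send the full reasoning and
    tool-execution chain to an LLM; here we conservatively block."""
    return "block"

def detect(event: AgentEvent, context: list[AgentEvent]) -> str:
    if not fast_triage(event):
        return "allow"  # the common, cheap path for benign traffic
    return deep_review(event, context)
```

The point of the split is cost: with extreme class imbalance, almost all traffic takes the cheap path, and expensive LLM reasoning is reserved for the rare events that tier 1 escalates.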


Agentic Operator Generation for ML ASICs

Alec Hammond ⋅ Aram Markosyan ⋅ Aman Dontula ⋅ Zacharias Fisches ⋅ Dmitrii Pedchenko ⋅ Keyur Muzumdar ⋅ Mark Saroufim ⋅ Joe Isaacson ⋅ Warren Hunt ⋅ Gabriel Synnaeve ⋅ Jacob Kahn

We present TritorX, an agentic AI system designed to generate functionally correct Triton kernels for PyTorch ATen operators at scale on emerging accelerator platforms. TritorX integrates open-source large language models with a custom linter, JIT compilation, and a PyTorch OpInfo-based test harness. This pipeline operates both on deployed Meta Training and Inference Accelerator (MTIA) silicon and in hardware simulation environments for next-generation devices. In contrast to previous kernel-generation approaches that prioritize performance for a limited set of high-usage kernels, TritorX prioritizes coverage. Our system emphasizes correctness and generality across the entire operator set, including diverse data types, shapes, and argument patterns. In our experiments, TritorX successfully generated kernels and wrappers for 481 unique ATen operators that pass all corresponding PyTorch OpInfo tests (over 20,000 in total). TritorX paves the way for overnight generation of complete PyTorch ATen backends for new accelerator platforms.
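The generate-lint-compile-test pipeline described above can be sketched as a staged repair loop, where each failed stage's diagnostics are fed back to the generator. This is a minimal sketch under stated assumptions, not TritorX's actual code: `generate` stands in for the LLM call, and the stage checkers for the linter, Triton JIT compilation, and the OpInfo harness.

```python
def repair_loop(op_name, generate, stages, max_attempts=5):
    """Generate a kernel candidate, run it through staged checks
    (lint -> JIT compile -> OpInfo tests), and feed the first failure
    back to the generator until all stages pass or attempts run out."""
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(op_name, feedback)
        for stage_name, check in stages:
            ok, message = check(candidate)
            if not ok:
                feedback = f"{stage_name}: {message}"
                break  # stop at the first failing stage
        else:
            return candidate  # every stage passed
    return None  # give up; this operator is left uncovered
```

Ordering the stages from cheapest (lint) to most expensive (full test suite) keeps most failed attempts from ever reaching hardware or simulation.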

Software upgrades are critical to maintaining server reliability in datacenters. While job duration prediction and scheduling have been extensively studied, the unique challenges posed by software upgrades remain largely under-explored. This paper presents the first in-depth investigation into software upgrade scheduling at datacenter scale. We begin by characterizing various types of upgrades and then frame the scheduling task as a constrained optimization problem. To address this problem, we introduce Zephyr, a cost-aware duration prediction framework designed to improve upgrade scheduling efficiency and throughput while meeting service-level objectives (SLOs). Zephyr accounts for asymmetric misprediction costs, strategically selects the best predictive models, and mitigates straggler-induced overestimations. Evaluations on Meta's production datacenter systems demonstrate that Zephyr significantly outperforms the existing upgrade scheduler by improving upgrade window utilization by 1.25x, increasing the number of scheduled and completed upgrades by 33% and 41%, and reducing cancellation rates by 2.4x. The code and data sets will be released after paper acceptance.
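The asymmetric-misprediction-cost idea above has a standard formulation as quantile (pinball) loss, sketched below. This is an illustrative example of the general technique, not Zephyr's actual cost model; the function name and the choice of tau are hypothetical.

```python
def pinball_loss(actual_hours: float, predicted_hours: float, tau: float) -> float:
    """Quantile (pinball) loss. With tau > 0.5, underestimating an
    upgrade's duration (risking an overrun of the maintenance window and
    a cancellation) is penalized more heavily than overestimating it
    (which merely wastes some scheduling slack)."""
    diff = actual_hours - predicted_hours
    return max(tau * diff, (tau - 1.0) * diff)
```

For example, with tau = 0.9, underestimating by two hours costs 1.8 while overestimating by two hours costs only 0.2, which biases a predictor trained on this loss toward safe overestimates without the gross straggler-driven inflation a worst-case estimate would produce.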


PROMPTS: PeRformance Optimization via Multi-Agent Planning for LLM Training and Serving

Yuran Ding ⋅ Ruobing Han ⋅ Xiaofan Zhang ⋅ Xinwei Chen

Optimizing large-language model (LLM) training and serving on large-scale distributed systems is a significant challenge. This difficulty stems from the rapidly evolving LLM landscape, the requirement for deep domain expertise, and the need for workload-specific optimization strategies. Existing methods rely on either handcrafted optimization performed by human experts, which is tedious and time-consuming, or resource-intensive black-box searches, which lack the extensibility to keep pace with evolving models and hardware. To address this, we introduce \textbf{PROMPTS}, a novel multi-agent framework that complements traditional search methods with expert-informed reasoning to deliver system-level optimization in far fewer shots. Key components of the proposed framework include an \textit{Analyzer Agent} that diagnoses performance bottlenecks by synthesizing profiler data and a \textit{Proposal Agent} that leverages a knowledge base to generate optimized sharding configurations with detailed justifications through retrieval-augmented generation (RAG). Experimental results across eight real-world LLM workloads demonstrate that PROMPTS can provide valid reasoning and accurate recommendations by considering LLM workload characteristics and backend hardware features, delivering performance improvements of up to \textbf{434\%}. These workloads spanned Mixture-of-Experts (MoE) and dense LLMs, system configurations from 2 TPU chips to 512-chip systems with 2D/3D torus interconnects, and the full LLM lifecycle including pre-training, post-training, and serving. To validate our agent's system optimization proposals, we benchmarked them against production configurations that were previously optimized by experts, either through extensive manual analysis or automated black-box searches. In every case, our agent independently identified this expert-validated solution within its top three recommendations from a \textbf{single invocation}.
Furthermore, the agent's top-ranked recommendation matched the production solution in \textbf{87.5\%} of cases, demonstrating its ability to not only find optimized configurations but also to correctly prioritize the optimization candidates.
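The Analyzer/Proposal split described above can be sketched as two cooperating stages: one that reduces profiler data to a bottleneck diagnosis, and one that retrieves matching candidates from a knowledge base. This is a heavily simplified illustration under stated assumptions, not the PROMPTS implementation: both agents here are deterministic stand-ins for LLM-backed components, and the knowledge-base schema is hypothetical.

```python
def analyzer_agent(profile: dict) -> str:
    """Analyzer stand-in: treat the largest profiler cost bucket as the
    bottleneck (the real agent reasons over much richer traces)."""
    return max(profile, key=profile.get)

def proposal_agent(bottleneck: str, knowledge_base: list, top_k: int = 3) -> list:
    """Proposal stand-in: retrieve knowledge-base entries tagged with the
    diagnosed bottleneck and return up to top_k candidate configurations,
    standing in for RAG over an expert-curated knowledge base."""
    hits = [entry for entry in knowledge_base if bottleneck in entry["tags"]]
    return [entry["config"] for entry in hits[:top_k]]
```

Returning a ranked short list rather than a single answer mirrors the evaluation above, where the expert-validated configuration appeared within the top three recommendations.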


The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents

Xingyao Wang ⋅ Juan Michelini ⋅ Calvin Smith

Building production-ready software engineering agents requires balancing fast research iteration with operational stability, secure deployment, and reproducible execution across diverse environments. \textbf{OpenHands V0}—an open-source agent system with 64k+ GitHub stars—validated community demand but revealed four key tensions: rigid sandboxing, scattered mutable configuration, blurred core–application boundaries, and limited extensibility. We present the \textbf{OpenHands Software Agent SDK}—the core of \textbf{OpenHands V1}—a complete architectural redesign that \emph{separates agent core from downstream applications}. The SDK embodies four principles: (i) \emph{optional isolation} (local-first, sandbox-on-demand); (ii) \emph{stateless components} with immutable configuration and event-sourced state; (iii) \emph{strict separation of concerns} between core and applications; and (iv) \emph{two-layer composability} enabling modular deployment across four packages (SDK, Tools, Workspace, Server) and extensibility through typed, swappable components. Built on these foundations, the SDK delivers \emph{seamless local-to-remote execution portability}, integrated REST/WebSocket services, and visual workspaces (VS Code, VNC, browser) for human-agent collaboration. Compared with existing SDKs from OpenAI, Claude and Google, OpenHands uniquely integrates native sandboxed execution, lifecycle control, model-agnostic multi-LLM routing, and built-in QA and security analysis. Empirical results on SWE-Bench Verified and GAIA benchmarks demonstrate strong performance. By codifying lessons from V0, the OpenHands Agent SDK provides a practical foundation for prototyping, unlocking new classes of custom applications, \emph{and} reliably deploying agents at scale.
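The "stateless components with immutable configuration and event-sourced state" principle above can be illustrated with a minimal event-sourcing sketch: state is never mutated directly but rebuilt by replaying an append-only log. This is a generic illustration of the pattern, not the SDK's API; the `Event` fields and the state shape are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # events are immutable once recorded
class Event:
    kind: str      # e.g. "message", "tool_result", "finish"
    payload: str

def replay(events: list) -> dict:
    """Rebuild the agent's state purely from its event log. Because the
    log is the single source of truth, a component can be restarted
    locally or remotely and resume by replaying the same events."""
    state = {"messages": [], "done": False}
    for event in events:
        if event.kind == "message":
            state["messages"].append(event.payload)
        elif event.kind == "finish":
            state["done"] = True
    return state
```

This is what makes local-to-remote execution portability cheap: moving an agent between a local process and a sandboxed server reduces to shipping the event log, not serializing live mutable objects.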