Keynote Talk Fri, May 22, 2026 • 9:45 AM – 10:45 AM PDT Grand Ballroom 1

The Path to Inference Efficiency

Christos Kozyrakis

Abstract

Agentic AI is moving out of demos and into daily use, creating enormous demand for efficient inference: higher throughput, lower latency, and better efficiency in both dollars and joules. Meeting these targets requires rethinking the full inference stack, from the specialized silicon that runs the models, to the system software that compiles, schedules, and serves them at scale, to the model architectures that determine what must be computed in the first place. In this talk, we will examine these layers with an eye toward the next major advances in hardware architecture, and how systems and algorithms can be co-designed to fully exploit them. Large gains in inference efficiency will come not from isolated improvements, but from treating hardware, systems, and models as an integrated stack.

Speaker

Christos Kozyrakis

Christos Kozyrakis is a computer architecture researcher at NVIDIA and the Leonard Bosack and Sandy K Lerner Professor of Engineering at Stanford University. His research focuses on hardware and software infrastructure for AI, as well as the use of AI for hardware and software design. He holds a PhD degree from the University of California at Berkeley and a BS degree from the University of Crete. He is a fellow of the ACM and the IEEE. He has received the IEEE Harry H Goode award, the ACM SIGARCH Maurice Wilkes award, the NSF Career Award, the ISCA Influential Paper Award, the ASPLOS Influential Paper Award, the HPCA Test of Time award, the SoCC Test of Time award, the Okawa Foundation Research Grant, the Noyce Family Faculty Scholarship, the Willard R. and Inez Kerr Bell Faculty Scholarship, and faculty awards from IBM, Google, and Microsoft.
