

Invited Talk

An AI stack: from scaling AI workloads to evaluating LLMs

Ion Stoica

Mission City Ballroom
Tue 13 May 10:30 a.m. PDT — 11:30 a.m. PDT

Abstract:

Large language models (LLMs) have taken the world by storm—enabling new applications, intensifying GPU shortages, and raising concerns about the accuracy of their outputs. In this talk, I will present several projects I have worked on to address these challenges. Specifically, I will focus on: (i) Ray, a distributed framework for scaling AI workloads; (ii) vLLM and SGLang, two high-throughput inference engines for LLMs; and (iii) Chatbot Arena, a platform for accurate LLM benchmarking. I will conclude with key lessons learned and outline directions for future research.
