Invited Talk
An AI stack: from scaling AI workloads to evaluating LLMs
Ion Stoica
Mission City Ballroom
Abstract:
Large language models (LLMs) have taken the world by storm—enabling new applications, intensifying GPU shortages, and raising concerns about the accuracy of their outputs. In this talk, I will present several projects I have worked on to address these challenges. Specifically, I will focus on: (i) Ray, a distributed framework for scaling AI workloads; (ii) vLLM and SGLang, two high-throughput inference engines for LLMs; and (iii) Chatbot Arena, a platform for accurate LLM benchmarking. I will conclude with key lessons learned and outline directions for future research.
Chat is not available.