

Session

YPS Poster Session & YPS Reception

Evergreen Ballroom
Mon 18 May 5 p.m. PDT — 7 p.m. PDT


11
On the Diminishing Returns of Expert Load Balancing in MoE LLM Serving

Hanfei Yu ⋅ Jinru Duan ⋅ Jiabin Luo ⋅ Hao Wang


12
Neuro-Analog

Apuroop Mutyala


13
SAT-Eval: A Framework for Preference Drift in Multi-Turn LLM Conversations

Suryaprakash Vengadesan


2
HiSpec: Hierarchical Speculative Decoding for LLMs

Avinash Kumar ⋅ Sujay Sanghavi ⋅ Poulami Das


22
REMIX: Dynamic Partitioning for Fine-Grained Heterogeneous LLM Serving

Victoria Clerico ⋅ Corey Lammie ⋅ Garima Singh ⋅ Orhun Görkem ⋅ William Simon ⋅ Hsinyu Tsai ⋅ Jeronimo Castrillon ⋅ Abu Sebastian ⋅ Hadjer Benmeziane


23
Toward a Small ML Runtime Stack for Raspberry Pi 5 QPUs

Yiannis Hadjiyianni ⋅ Panos Michelakis ⋅ Dimitrios Stamoulis


27
HADIS: Hybrid Adaptive Diffusion Model Serving for Efficient Text-to-Image Generation

Qizheng Yang ⋅ Tung-I Chen ⋅ Siyu Zhao ⋅ Ramesh Sitaraman ⋅ Hui Guan


28
Leveraging ASIC AI Chips for Homomorphic Encryption

Jianming Tong ⋅ Tianhao Huang ⋅ Jingtian Dang ⋅ Leo de Castro ⋅ Anirudh Itagi ⋅ Anupam Golder ⋅ Asra Ali ⋅ Jevin Jiang ⋅ Jeremy Kun ⋅ Arvind Arvind ⋅ G. Edward Suh ⋅ Tushar Krishna


29
Shortcut-connected Expert Parallelism for Accelerating Mixture of Experts

Weilin Cai ⋅ Le Qin ⋅ Junwei Cui ⋅ Jiayi Huang


31
NeSyKV: Neuro-Symbolic Architecture-Specific KV-Cache Eviction for LLM Inference

Pratik Poudel ⋅ Jason Liu ⋅ Yanzhao Wu ⋅ Sumit Jha


34
Kascade: A Practical Sparse Attention Method for Long-Context LLM Inference

Dhruv Rajesh Deshmukh ⋅ Saurabh Goyal ⋅ Nipun Kwatra ⋅ Ramachandran Ramjee


36
AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization

Genghan Zhang ⋅ Shaowei Zhu ⋅ Anjiang Wei ⋅ Zhenyu Song ⋅ Allen Nie ⋅ Zhen Jia ⋅ Nandita Vijaykumar ⋅ Yida Wang ⋅ Kunle Olukotun


38
BLAZE: Bias-Driven Load-Aware Zero-Overhead Expert Routing

Yide Ran ⋅ DJ Matusz ⋅ Jianwen Xie ⋅ Chuan Li ⋅ Zhaozhuo Xu


4
ForeCache: Understanding Workloads and Optimizing KVCache Management for Efficiently Serving LLM Coding Agents

Shubham Tiwari ⋅ Tapan Chugh ⋅ Nash Rickert ⋅ Simon Peter ⋅ Ratul Mahajan ⋅ Haiying Shen


6
Flexo: A User-Controllable Distributed Training System

Megan Frisella ⋅ Shubham Tiwari ⋅ Parker Gustafson ⋅ Andy Ruan ⋅ Yi Pan ⋅ Mathew Jacob ⋅ Gilbert Bernstein ⋅ Stephanie Wang


9
Towards Efficient Systems for Long-Context Automatic Speech Recognition

Wei-Tzu Lee ⋅ Keisuke Kamahori ⋅ Baris Kasikci