Session
YPS Poster Session & YPS Reception
Evergreen Ballroom
Impact of Scheduling for Terminal Agent Workloads on Unified-Memory Workstations
Yuanli Wang ⋅ Vasiliki Kalavri
Tiered Autonomy Framework for Human–Agent Collaboration in Mission-Critical Cyber-Physical Systems
David Akokodaripon
On the Diminishing Returns of Expert Load Balancing in MoE LLM Serving
Hanfei Yu ⋅ Jinru Duan ⋅ Jiabin Luo ⋅ Hao Wang
SAT-Eval: A Framework for Preference Drift in Multi-Turn LLM Conversations
Suryaprakash Vengadesan
Adaptive Erasure Coding for Fault-Tolerant LLM Serving with Continuous Batching
Chinmay Dhanraj Nehate ⋅ Jun Wang
BioTriton: Portable Cross-Vendor GPU Kernels for High-Throughput Bioinformatics via OpenAI Triton
Manpreet Singh
REMIX: Dynamic Partitioning for Fine-Grained Heterogeneous LLM Serving
Victoria Clerico ⋅ Corey Lammie ⋅ Garima Singh ⋅ Orhun Görkem ⋅ William Simon ⋅ Hsinyu Tsai ⋅ Jeronimo Castrillon ⋅ Abu Sebastian ⋅ Hadjer Benmeziane
Toward a Small ML Runtime Stack for Raspberry Pi 5 QPUs
Yiannis Hadjiyianni ⋅ Panos Michelakis ⋅ Dimitrios Stamoulis
Communication-Efficient Distributed Inference for Transformer Models via Vector Quantized Context
Xiao Liu ⋅ Lijun Zhang ⋅ Deepak Ganesan ⋅ Hui Guan
LearnedCache: An eBPF-Integrated Perceptron-Based Eviction Policy for the Linux Page Cache
Zejia Qi
Accelerating LLM Inference: Self-Speculative Decoding via Learned Seed Injection
Anuradha Pandey
HADIS: Hybrid Adaptive Diffusion Model Serving for Efficient Text-to-Image Generation
Qizheng Yang ⋅ Tung-I Chen ⋅ Siyu Zhao ⋅ Ramesh Sitaraman ⋅ Hui Guan
Leveraging ASIC AI Chips for Homomorphic Encryption
Jianming Tong ⋅ Tianhao Huang ⋅ Jingtian Dang ⋅ Leo de Castro ⋅ Anirudh Itagi ⋅ Anupam Golder ⋅ Asra Ali ⋅ Jevin Jiang ⋅ Jeremy Kun ⋅ Arvind Arvind ⋅ G. Edward Suh ⋅ Tushar Krishna
Shortcut-connected Expert Parallelism for Accelerating Mixture of Experts
Weilin Cai ⋅ Le Qin ⋅ Junwei Cui ⋅ Jiayi Huang
Speciesism in the Assistant Axis: Probing Compassion Vectors in Post-Trained LLMs
Shubham Gupta ⋅ Jasmine Brazilek
ov_training_kit: Model Training and Inference on Local AI PCs to Strengthen the AI Ecosystem
Shivam Basia
NeSyKV: Neuro-Symbolic Architecture-Specific KV-Cache Eviction for LLM Inference
Pratik Poudel ⋅ Jason Liu ⋅ Yanzhao Wu ⋅ Sumit Jha
DriftBench: Measuring and Predicting Infrastructure Drift in LLM Serving Systems
Gianluigi Vitale
ViRuleEval: A Neuro-Symbolic System for Interpretable Evaluation of Text-to-Video Generation
Chufeng Jiang ⋅ Heng Li
Kascade: A Practical Sparse Attention Method for Long-Context LLM Inference
Dhruv Rajesh Deshmukh ⋅ Saurabh Goyal ⋅ Nipun Kwatra ⋅ Ramachandran Ramjee
AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization
Genghan Zhang ⋅ Shaowei Zhu ⋅ Anjiang Wei ⋅ Zhenyu Song ⋅ Allen Nie ⋅ Zhen Jia ⋅ Nandita Vijaykumar ⋅ Yida Wang ⋅ Kunle Olukotun
BLAZE: Bias-Driven Load-Aware Zero-Overhead Expert Routing
Yide Ran ⋅ DJ Matusz ⋅ Jianwen Xie ⋅ Chuan Li ⋅ Zhaozhuo Xu
ForeCache: Understanding Workloads and Optimizing KVCache Management for Efficiently Serving LLM Coding Agents
Shubham Tiwari ⋅ Tapan Chugh ⋅ Nash Rickert ⋅ Simon Peter ⋅ Ratul Mahajan ⋅ Haiying Shen
SD-HC: Heterogeneous Functional Pipelining for Speculative LLM Decoding on AI PCs
Xikai (Noah) Meng ⋅ Chao Li ⋅ Spandan Tiwari
Flexo: A User-Controllable Distributed Training System
Megan Frisella ⋅ Shubham Tiwari ⋅ Parker Gustafson ⋅ Andy Ruan ⋅ Yi Pan ⋅ Mathew Jacob ⋅ Gilbert Bernstein ⋅ Stephanie Wang
From 805 ms to 23 ms: Accelerating State-Space Models for Real-Time ICU Monitoring with Fused Triton Kernels
Manpreet Singh
A Framework for Evaluating Neural Network Deployability on Analog In-Memory Computing Hardware
Apuroop Mutyala
Towards Efficient Systems for Long-Context Automatic Speech Recognition
Wei-Tzu Lee ⋅ Keisuke Kamahori ⋅ Baris Kasikci