

Oral

MorphServe: Efficient and Workload-Aware LLM Serving via Runtime Quantized Layer Swapping and KV Cache Resizing

Zhaoyuan Su · Zeyu Zhang · Tingfeng Lan · Zirui Wang · · Juncheng Yang · Yue Cheng

Abstract
