Oral

Optimizing Deployment Configurations for LLM Inference

Sungmin Cho ⋅ Jaewon Lee ⋅ Chunqiang Tang ⋅ Yejin Lee ⋅ Geonhwa Jeong ⋅ Scott Batura ⋅ Sijia Chen ⋅ Bradley Davis ⋅ Summer Deng ⋅ Emad El-Haraty ⋅ Lu Fang ⋅ Lu Fang ⋅ Joshua Fromm ⋅ Liangpeng Guo ⋅ Jianyu Huang ⋅ Aya Ibrahim ⋅ Hongyi Jia ⋅ Changkyu Kim ⋅ Xiaozhu Meng ⋅ Vlad Tiberiu Mihailescu ⋅ Maxim Naumov ⋅ Michal Ostrowski ⋅ Sarunya Pumma ⋅ Jeremy Francis Reizenstein ⋅ Rajasi Saha ⋅ Ruan Silva ⋅ Jon Swenson ⋅ Chris Thi ⋅ Yunfan Wang ⋅ Pengchao Wang ⋅ Wenchen Wang ⋅ Bram Wasti ⋅ Jingyi Yang ⋅ Jing Zhang ⋅ Yi Zhen

Abstract
