Poster 24

Communication-Efficient Distributed Inference for Transformer Models via Vector Quantized Context

Xiao Liu ⋅ Lijun Zhang ⋅ Deepak Ganesan ⋅ Hui Guan
