Poster 24

Communication-Efficient Distributed Inference for Transformer Models via Vector Quantized Context

Xiao Liu ⋅ Lijun Zhang ⋅ Deepak Ganesan ⋅ Hui Guan