
Poster 24

Communication-Efficient Distributed Inference for Transformer Models via Vector Quantized Context

Xiao Liu ⋅ Lijun Zhang ⋅ Deepak Ganesan ⋅ Hui Guan
