Communication-Efficient Distributed Inference for Transformer Models via Vector Quantized Context
Xiao Liu ⋅ Lijun Zhang ⋅ Deepak Ganesan ⋅ Hui Guan