Poster

LeanAttention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers

Rya Sanovar ⋅ Srikant Bharadwaj ⋅ Renée St. Amant ⋅ Victor Ruehle ⋅ Saravan Rajmohan
