Poster

LeanAttention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers

Rya Sanovar · Srikant Bharadwaj · Renée St. Amant · Victor Ruehle · Saravan Rajmohan

Abstract
