
Poster

Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking

Marco Federici ⋅ Davide Belli ⋅ Mart van Baalen ⋅ Amir Jalalirad ⋅ Andrii Skliar ⋅ Bence Major ⋅ Markus Nagel ⋅ Paul Whatmough
