Talk
Designing Models from the Hardware Up
Simran Arora
Mission City Ballroom
Abstract:
This talk presents systems-level techniques for designing language models that are both high quality and highly efficient. I’ll introduce ThunderKittens, a GPU programming library that simplifies the development of hardware-friendly models, and show how it enabled BASED, an attention-free architecture built from simple, throughput-oriented components. These innovations made it possible to train state-of-the-art 8B–405B parameter attention-free models on academic resources and have influenced emerging approaches across research, industry, and open source.