

Talk

Designing Models from the Hardware Up

Simran Arora

Mission City Ballroom
Mon 12 May 11:25 a.m. PDT — 11:45 a.m. PDT

Abstract:

This talk presents systems-level techniques for designing language models that are both high quality and highly efficient. I'll introduce ThunderKittens, a GPU programming library that simplifies the development of hardware-friendly models, and show how it enabled BASED, an attention-free architecture built from simple, throughput-oriented components. These innovations made it possible to train state-of-the-art attention-free models with 8B to 405B parameters on academic compute resources, and they have influenced emerging approaches in research, industry, and open-source projects.
