Talk

Designing Models from the Hardware Up

Simran Arora

2025 Talk

Abstract

This talk presents systems-level techniques for designing language models that are both high quality and highly efficient. I’ll introduce ThunderKittens, a GPU programming library that simplifies the development of hardware-friendly models, and show how it enabled BASED, an attention-free architecture built from simple, throughput-oriented components. These innovations made it possible to train state-of-the-art attention-free models with 8B to 405B parameters on academic resources, and they have influenced emerging approaches across research, industry, and open source.
