
Invited Talks
Invited Talk
Alexander Rush

[ Ballroom C ]

Modern NLP runs on Transformers. Large language models are possible because of systems successes in making Transformers bigger, faster, and longer-range. However, five years after the advent of BERT and GPT, it remains an open question whether Self-Attention, the routing component at the heart of Transformers, is essential to their success in pretraining, or whether it is worth developing large-scale systems for alternative approaches. Inspired by an off-hand wager on this topic (https://www.isattentionallyouneed.com), this talk will give an overview of recent work exploring alternative approaches to routing in large-scale NLP architectures. After giving background on the best practices and context of modern NLP, I will describe alternative approaches, primarily focusing on static methods based on state-space models (SSMs) and long-range convolutions. I will conclude by discussing the current empirical results and theoretical properties of these models, as well as paths for their future systems development as competitive technologies.
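
The computational link between SSMs and long-range convolutions is that a linear state-space recurrence can be unrolled into an explicit convolution kernel and applied in O(L log L) time with an FFT. Below is a minimal NumPy sketch of that equivalence, assuming the standard discrete SSM form x_k = A x_{k-1} + B u_k, y_k = C x_k; it uses toy fixed matrices rather than learned parameters and is illustrative only, not the implementation of any particular model from the talk.

```python
import numpy as np

def ssm_convolution_kernel(A, B, C, length):
    """Unroll the SSM x_k = A x_{k-1} + B u_k, y_k = C x_k into the
    explicit convolution kernel K = (CB, CAB, CA^2B, ...)."""
    kernel = np.zeros(length)
    x = B  # state contribution of a unit input at step 0
    for k in range(length):
        kernel[k] = C @ x
        x = A @ x
    return kernel

def ssm_apply(kernel, u):
    """Apply the kernel as a causal long convolution via FFT, O(L log L)."""
    L = len(u)
    n = 2 * L  # zero-pad so the circular convolution becomes a linear one
    y = np.fft.irfft(np.fft.rfft(kernel, n) * np.fft.rfft(u, n), n)
    return y[:L]

# Toy example: 2-dimensional state, one scalar input/output channel.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])              # stable state transition
B = np.array([1.0, 0.5])
C = np.array([0.3, 1.0])
u = np.random.default_rng(0).standard_normal(64)  # input sequence
y = ssm_apply(ssm_convolution_kernel(A, B, C, len(u)), u)
```

The same model also admits the recurrent form directly, which is part of what makes these architectures attractive: training can use the parallel convolutional view over the whole sequence, while generation can advance the state one step at a time.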

Invited Talk
Matei Zaharia

[ Ballroom C ]

Large language models are fluent text generators, but they struggle to generate factual, correct content, even when paired with tools such as information retrieval and agent programming frameworks. In this talk, I’ll discuss Demonstrate-Search-Predict (DSP), a system we are developing at Stanford to let users build highly accurate applications using LLMs and external tools. DSP offers a declarative programming model: users write an application using control flow in Python together with calls to ML components such as an LLM or a neural information retrieval system. Given such an application and a small amount of data, DSP systematically improves it by tuning the ML components for high-quality results, automatically generating better prompts for each model involved, fine-tuning models, and so on. We show that even with a few tens of examples, DSP can match state-of-the-art solutions on multiple knowledge-intensive tasks, and that it can then systematically improve both task performance and computational efficiency without requiring manual tuning or prompt engineering from the developer. We also discuss DSP and compare it with other emerging approaches to turning LLMs into reliable software components.
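
As a rough illustration of the declarative style the abstract describes, here is a hypothetical multi-hop question-answering program written as ordinary Python control flow around two stand-in components. The names llm, retrieve, and multihop_qa are placeholders invented for this sketch, not DSP's actual API; in DSP's model, the framework would tune the prompts and models behind such components from a small number of examples.

```python
# Illustrative sketch only: llm() and retrieve() are stand-ins,
# not functions from the DSP framework.

def llm(prompt: str) -> str:
    """Stand-in for a call to a large language model."""
    return "<completion>"  # replace with a real LLM client

def retrieve(query: str, k: int = 3) -> list[str]:
    """Stand-in for a neural retriever returning the top-k passages."""
    return [f"<passage {i} for: {query}>" for i in range(k)]

def multihop_qa(question: str, hops: int = 2) -> str:
    """Multi-hop QA as plain Python control flow: alternate between
    LLM-generated search queries and retrieval, then predict an answer
    grounded in the gathered passages."""
    context: list[str] = []
    for _ in range(hops):
        query = llm(f"Question: {question}\nContext: {context}\n"
                    f"Write a search query for a missing fact:")
        context.extend(retrieve(query))
    return llm(f"Question: {question}\nContext: {context}\nAnswer:")
```

The point of the sketch is the division of labor: the developer specifies only the control flow, while the quality of each llm() and retrieve() call is something a system like DSP can improve automatically rather than through hand-written prompts.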