Large language models are fluent text generators, but they struggle to generate factual, correct content, even when paired with tools such as information retrieval and agent programming frameworks. In this talk, I’ll discuss Demonstrate-Search-Predict (DSP), a system we are developing at Stanford to let users build highly accurate applications using LLMs and external tools. DSP offers a declarative programming model in which users write an application using control flow in Python and calls to ML components such as an LLM or a neural information retrieval system. Given such an application and a small amount of data, DSP systematically improves the application by tuning its ML components for high-quality results, for example by automatically generating better prompts for each model involved or by fine-tuning models. We show that with even a few tens of examples, DSP can match state-of-the-art solutions on multiple knowledge-intensive tasks, and that it can then systematically improve both task performance and computational efficiency without requiring manual tuning or prompt engineering from a developer. I’ll also compare DSP with other emerging approaches for turning LLMs into reliable software components.
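
As a rough illustration of the programming model described above, the sketch below writes a small question-answering application as ordinary Python control flow over a retriever and a language model, then uses a handful of labeled examples to build prompt demonstrations automatically. It is not DSP's actual API: `retrieve`, `stub_lm`, `build_prompt`, `qa_program`, and `bootstrap_demos` are hypothetical names, the corpus and training examples are toy data, and the stub model stands in for a real LLM call.

```python
from typing import Callable, List, Tuple

Demo = Tuple[str, str, str]  # (question, context, answer)

# Toy corpus standing in for a neural retrieval index (hypothetical data).
CORPUS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Tokyo is the capital of Japan.",
]


def retrieve(query: str, k: int = 1) -> List[str]:
    """Stub retriever: rank passages by word overlap with the query."""
    query_words = set(query.lower().replace("?", "").split())

    def overlap(passage: str) -> int:
        return len(query_words & set(passage.lower().rstrip(".").split()))

    return sorted(CORPUS, key=lambda p: -overlap(p))[:k]


def stub_lm(prompt: str) -> str:
    """Stand-in for an LLM call: returns the first word of the last context line."""
    contexts = [line for line in prompt.splitlines() if line.startswith("Context: ")]
    return contexts[-1][len("Context: "):].split()[0]


def build_prompt(demos: List[Demo], question: str, context: str) -> str:
    """Format the demonstrations followed by the new question."""
    parts = [f"Context: {c}\nQuestion: {q}\nAnswer: {a}" for q, c, a in demos]
    parts.append(f"Context: {context}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)


def qa_program(
    question: str, demos: List[Demo], lm: Callable[[str], str] = stub_lm
) -> Tuple[str, str]:
    """The application itself: retrieve a passage, then ask the model."""
    context = retrieve(question, k=1)[0]
    answer = lm(build_prompt(demos, question, context))
    return answer, context


def bootstrap_demos(
    train: List[Tuple[str, str]], lm: Callable[[str], str] = stub_lm
) -> List[Demo]:
    """Keep only the training runs whose answer matches the label."""
    demos: List[Demo] = []
    for question, gold in train:
        answer, context = qa_program(question, demos=[], lm=lm)
        if answer == gold:
            demos.append((question, context, answer))
    return demos


TRAIN = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Germany?", "Berlin"),
    # The stub model gets this one wrong, so it is filtered out.
    ("Which country has Tokyo as its capital?", "Japan"),
]

demos = bootstrap_demos(TRAIN)
answer, context = qa_program("What is the capital of Japan?", demos)
print(answer, "|", context)
```

The point of the sketch is the separation of concerns: `qa_program` states what the application does, while the choice of demonstrations is left to a separate step driven by data. A real system would replace the filtering heuristic here with its own tuning procedure, which could also cover fine-tuning.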