Machine learning models are increasingly deployed in mission-critical settings such as vehicles, but unfortunately, these models can fail in complex ways. To prevent errors, ML engineering teams monitor and continuously improve these models. We propose a new abstraction, model assertions, that adapts the classical use of program assertions as a way to monitor and improve ML models. Model assertions are arbitrary functions over the model's input and output that indicate when errors may be occurring. For example, a developer may write an assertion that an object's class should stay the same across frames of video. Once written, these assertions can be used both for runtime monitoring and for improving a model at training time. In particular, we show that at runtime, model assertions can find high-confidence errors, where a model returns the wrong output with high confidence, which uncertainty-based monitoring techniques would not detect. We also propose two methods to use model assertions at training time. First, we propose a bandit-based active learning algorithm that can sample from data flagged by assertions and show that it can reduce labeling costs by up to 33% over traditional uncertainty-based methods. Second, we propose an API for generating "consistency assertions" (e.g., the class-change example above) and weak labels for inputs where the consistency assertions fail, and show that these weak labels can improve relative model quality by up to 46%. We evaluate both algorithms on four real-world tasks with video, LIDAR, and ECG data.
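To make the idea concrete, the following is a minimal sketch of a consistency assertion over an object detector's per-frame output, in the spirit of the class-change example above. The tuple format, function name, and example predictions are illustrative assumptions, not the paper's actual API.

```python
# Minimal sketch of a model assertion (illustrative, not the paper's API).
# Assumption: each frame's prediction is a list of (track_id, class_label,
# confidence) tuples produced by an object detector with tracking.

from typing import List, Tuple

Prediction = Tuple[int, str, float]  # (track_id, class_label, confidence)

def class_consistency_assertion(
    prev_preds: List[Prediction],
    curr_preds: List[Prediction],
) -> List[int]:
    """Flag track ids whose predicted class changes between consecutive frames.

    Returns the violating track ids; an empty list means the assertion
    passes for this pair of frames.
    """
    prev_classes = {tid: cls for tid, cls, _ in prev_preds}
    violations = []
    for tid, cls, _ in curr_preds:
        if tid in prev_classes and prev_classes[tid] != cls:
            violations.append(tid)
    return violations

# Example usage: the object tracked as id 7 flips from "car" to "person",
# even though both predictions are high confidence.
prev = [(7, "car", 0.95), (9, "person", 0.88)]
curr = [(7, "person", 0.97), (9, "person", 0.90)]
flagged = class_consistency_assertion(prev, curr)
# flagged == [7]; such inputs could be logged for runtime monitoring or
# queued for labeling / weak-label generation at training time.
```

Because the assertion depends only on the model's inputs and outputs, it can flag errors even when the model is confidently wrong, which is exactly the case uncertainty-based monitoring misses.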
Author Information
Daniel Kang (Stanford University)
Deepti Raghavan (Stanford University)
Peter Bailis (Stanford University)
Matei Zaharia (Stanford and Databricks)

Matei Zaharia is an Associate Professor of Computer Science at Stanford (moving to UC Berkeley later this year) and Chief Technologist and Cofounder of Databricks. His research has spanned distributed systems, databases, security, and machine learning, most recently focusing on systems for machine learning, natural language processing, and information retrieval. Matei started and contributed to multiple widely used open source projects including Apache Spark (his PhD project at UC Berkeley), MLflow, Dolly, Delta Lake, and ColBERT. His research was recognized through the 2014 ACM Doctoral Dissertation Award, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers (PECASE).
Related Events (a corresponding poster, oral, or spotlight)
- 2020 Oral: Model Assertions for Monitoring and Improving ML Models
  Mon. Mar 2nd through Tue Mar 3rd, Room: Ballroom A
More from the Same Authors
- 2023 Poster: MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
  Trevor Gale · Deepak Narayanan · Cliff Young · Matei Zaharia
- 2023 Invited Talk: Improving the Quality and Factuality of Large Language Model Applications
  Matei Zaharia
- 2020 Workshop: MLOps Systems
  Debo Dutta · Matei Zaharia · Ce Zhang
- 2020 Oral: MLPerf Training Benchmark
  Peter Mattson · Christine Cheng · Gregory Diamos · Cody Coleman · Paulius Micikevicius · David Patterson · Hanlin Tang · Gu-Yeon Wei · Peter Bailis · Victor Bittorf · David Brooks · Dehao Chen · Debo Dutta · Udit Gupta · Kim Hazelwood · Andy Hock · Xinyuan Huang · Daniel Kang · David Kanter · Naveen Kumar · Jeffery Liao · Deepak Narayanan · Tayo Oguntebi · Gennady Pekhimenko · Lillian Pentecost · Vijay Janapa Reddi · Taylor Robie · Tom St John · Carole-Jean Wu · Lingjie Xu · Cliff Young · Matei Zaharia
- 2020 Oral: Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc
  Zhihao Jia · Sina Lin · Mingyu Gao · Matei Zaharia · Alex Aiken
- 2020 Poster: Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference
  Peter Kraft · Daniel Kang · Deepak Narayanan · Shoumik Palkar · Peter Bailis · Matei Zaharia
- 2020 Poster: Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc
  Zhihao Jia · Sina Lin · Mingyu Gao · Matei Zaharia · Alex Aiken
- 2020 Poster: MLPerf Training Benchmark
  Peter Mattson · Christine Cheng · Gregory Diamos · Cody Coleman · Paulius Micikevicius · David Patterson · Hanlin Tang · Gu-Yeon Wei · Peter Bailis · Victor Bittorf · David Brooks · Dehao Chen · Debo Dutta · Udit Gupta · Kim Hazelwood · Andy Hock · Xinyuan Huang · Daniel Kang · David Kanter · Naveen Kumar · Jeffery Liao · Deepak Narayanan · Tayo Oguntebi · Gennady Pekhimenko · Lillian Pentecost · Vijay Janapa Reddi · Taylor Robie · Tom St John · Carole-Jean Wu · Lingjie Xu · Cliff Young · Matei Zaharia
- 2020 Oral: Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference
  Peter Kraft · Daniel Kang · Deepak Narayanan · Shoumik Palkar · Peter Bailis · Matei Zaharia