

Timezone: US/Pacific
Registration Desk
6:30 AM - 5:00 PM
Tutorial
8:00 AM - 10:15 AM

This tutorial introduces the audience to ML-based telemetry analytics for large-scale computing systems, with the goal of improving system performance, resilience, and power efficiency. Modern large-scale computing systems (data centers, High-Performance Computing clusters, and the like) are highly parallel machines that perform numerous complex operations concurrently, and they are critical to many societal and scientific applications. Their high degree of parallelism often leads to significant resource contention, and eventually to performance variability and loss of efficiency. One way to assess system performance and identify the root causes of problems is to gather and inspect telemetry data. Such telemetry (from hundreds or thousands of hardware and software sensors) and log data are readily available on any computer system today. Because this data amounts to billions of data points per day, manual analysis is impractical and of limited benefit. Given these limitations, ML is emerging as a promising approach to automating performance analytics. Computer-system telemetry analytics is also a challenging application area with many open problems, since labeled data is scarce while unlabeled data can reach terabytes per day.

The goal of this tutorial is twofold. First, it provides an overview of telemetry-based analytics and shows why ML-based approaches are more promising than existing methods at identifying which applications are running on compute nodes, detecting performance and other anomalies, and diagnosing their root causes. Second, participants will learn and experience these materials directly through hands-on activities using open-source analytics frameworks designed by the speakers' teams at Boston University and the University of Bologna. By the end of this tutorial, participants will have a better understanding of the challenges and opportunities in this area and will have gained the skills needed to employ ML-based frameworks to solve complex problems in computer systems.
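The anomaly-detection workflow sketched above can be illustrated in a few lines. This is not the speakers' framework; it is a minimal unsupervised baseline, assuming telemetry arrives as a timesteps-by-sensors array and that an anomaly-free baseline window is available (both assumptions, like the 4-sigma threshold and synthetic data, are illustrative):

```python
import numpy as np

def fit_baseline(telemetry):
    """Per-sensor mean/std from a window assumed to be anomaly-free."""
    return telemetry.mean(axis=0), telemetry.std(axis=0) + 1e-9

def flag_anomalies(telemetry, mean, std, z_thresh=4.0):
    """Flag timesteps where any sensor deviates by more than z_thresh sigmas."""
    z = np.abs((telemetry - mean) / std)
    return z.max(axis=1) > z_thresh

# Synthetic telemetry: 1000 timesteps x 8 sensors of "healthy" data.
rng = np.random.default_rng(0)
baseline = rng.normal(50.0, 5.0, size=(1000, 8))
mean, std = fit_baseline(baseline)

# New window with one injected spike on sensor 3 at timestep 42.
new = rng.normal(50.0, 5.0, size=(100, 8))
new[42, 3] += 60.0
flags = flag_anomalies(new, mean, std)
```

Real systems replace the threshold rule with learned models, but the pipeline shape (fit on unlabeled baseline data, score new windows) is the same one the tutorial's frameworks automate.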

Oral
8:45 AM - 10:15 AM
5 Events in this session
Samuel A. Stein · Betis Baheri · Daniel Chen · Ying Mao · Qiang Guan · Ang Li · Shuai Xu · Caiwen Ding
Andrew Or · Haoyu Zhang · Michael Freedman
Wasu Piriyakulkij · Cristina Menghini · Ross Briden · Nihal Vivekanand Nayak · Jeffrey Zhu · Elaheh Raisi · Stephen Bach
Carole-Jean Wu · Ramya Raghavendra · Udit Gupta · Bilge Acun · Newsha Ardalani · Kiwan Maeng · Gloria Chang · Fiona Aga · Jinshi Huang · Charles Bai · Michael Gschwind · Anurag Gupta · Myle Ott · Anastasia Melnikov · Salvatore Candido · David Brooks · Geeta Chauhan · Benjamin Lee · Hsien-Hsin Lee · Bugra Akyildiz · Maximilian Balandat · Joe Spisak · Ravi Jain · Mike Rabbat · Kim Hazelwood
Invited Talk
10:30 AM - 12:00 PM

Data is a key driver of the modern economy and of AI/machine learning. Much of this data is sensitive, however, and handling it has created unprecedented challenges for both individuals and businesses, challenges that will only grow more severe as we move forward in the digital era. In this talk, I will cover technologies needed for responsible data use, including secure computing, differential privacy, federated learning, and blockchain technologies for data rights. I will also discuss how to combine privacy-preserving computing technologies with blockchain to build a platform for a responsible data economy, one that enables the creation of a new type of asset (data assets), more responsible use of data, and fair distribution of the value created from data.
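As a concrete taste of one technology named in the abstract, differential privacy can be illustrated with the Laplace mechanism: a counting query has sensitivity 1, so adding Laplace noise with scale sensitivity/epsilon yields an epsilon-differentially-private release. The dataset, query, and parameter values below are hypothetical, chosen only for illustration:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value with Laplace noise calibrated to (sensitivity, epsilon)."""
    scale = sensitivity / epsilon
    return true_value + rng.laplace(0.0, scale)

rng = np.random.default_rng(0)
# Hypothetical sensitive dataset: 10,000 salaries.
salaries = rng.uniform(30_000, 200_000, size=10_000)

# Counting query "how many salaries exceed 100k?" has sensitivity 1:
# adding or removing one person changes the count by at most 1.
true_count = int((salaries > 100_000).sum())
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5, rng=rng)
```

With epsilon = 0.5 the noise scale is 2, so the released count is close to the truth while any single individual's presence is statistically masked.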

Speaker Bio
Dawn Song
Dawn Song is a Professor in the Department of Electrical Engineering and Computer Science at UC Berkeley. Her research interests lie in AI and deep learning, security, and privacy. She is the recipient of various awards, including the MacArthur Fellowship, the Guggenheim Fellowship, the NSF CAREER Award, the Alfred P. Sloan Research Fellowship, the MIT Technology Review TR-35 Award, the ACM SIGSAC Outstanding Innovation Award, and Test-of-Time and Best Paper Awards from top conferences in computer security and deep learning. She is an ACM Fellow and an IEEE Fellow, and is ranked the most-cited scholar in computer security (AMiner Award). She obtained her Ph.D. from UC Berkeley. She is also a serial entrepreneur: she is the Founder of Oasis Labs and has been named to Inc.'s Female Founder 100 List and the Wired25 List of Innovators.
Round Table Discussion
11:30 AM - 1:00 PM

We plan roundtable discussions on Tuesday to connect early-career professionals attending MLSys with senior MLSys conference attendees. When you sign up, we will group early-career professionals at a reserved lunch table, where you can meet new people alongside an assigned senior mentor. This informal group mentee-mentor event lowers the barrier for young professionals starting their careers in the MLSys community.

Sign up here

Oral
1:00 PM - 2:15 PM
4 Events in this session
Ankur Mallick · Kevin Hsieh · Behnaz Arzani · Gauri Joshi
Xinfeng Xie · Prakash Prabhu · Ulysse Beaugnon · Mangpo Phothilimthana · Sudip Roy · Azalia Mirhoseini · Eugene Brevdo · James Laudon · Yanqi Zhou
Junguk Cho · Diman Zad Tootaghaj · Lianjie Cao · Puneet Sharma
Yi Ding · Avinash Rao · Hyebin Song · Rebecca Willett · Henry (Hank) Hoffmann
Tutorial

Sparsity in ML: Understanding and Optimizing Sparsity in Neural Networks Running on Heterogeneous Systems

Mert Hidayetoglu · Jinjun Xiong · Wen-Mei Hwu · Rakesh Nagi · Vikram Sharma Mailthody · Jeff Pool · Sitao Huang
1:00 PM - 5:00 PM

A plethora of ML models are either sparsified to save memory footprint and FLOPs (such as deep neural networks) or inherently sparse due to their unstructured nature (such as graph neural networks). Nevertheless, even though sparsity is desirable in theory, it often hampers performance in practice because existing heterogeneous systems (such as GPUs and FPGAs) fall short on irregular computations. For example, because GPU architectures are optimized for regular, dense computations, only a tiny fraction of the theoretical GPU performance is realized when performing sparse computation. In this tutorial, we discuss the sources of sparsity in deep neural networks as well as key techniques for mapping sparse computation onto heterogeneous systems to support high-performance inference and training. We conclude with a discussion of future work on model parallelism for optimizing sparse communication in large-scale sparse ML models.
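To see why sparsity is hard on hardware built for dense, regular work, consider sparse matrix-vector multiplication in CSR format: each row performs a data-dependent gather from the input vector, the kind of irregular access pattern the abstract refers to. This is an illustrative pure-NumPy sketch, not material from the tutorial itself:

```python
import numpy as np

def dense_to_csr(A):
    """Compress a dense matrix into CSR: values, column indices, row pointers."""
    vals, cols, rowptr = [], [], [0]
    for row in A:
        nz = np.nonzero(row)[0]
        vals.extend(row[nz])
        cols.extend(int(c) for c in nz)
        rowptr.append(len(vals))
    return np.array(vals), np.array(cols, dtype=np.int64), np.array(rowptr)

def csr_spmv(vals, cols, rowptr, x):
    """Sparse matrix-vector product y = A @ x over nonzeros only."""
    y = np.zeros(len(rowptr) - 1)
    for i in range(len(y)):
        s, e = rowptr[i], rowptr[i + 1]
        # Data-dependent gather on x: the irregular access GPUs handle poorly.
        y[i] = vals[s:e] @ x[cols[s:e]]
    return y

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 64)) * (rng.random((64, 64)) < 0.05)  # ~95% sparse
x = rng.normal(size=64)
vals, cols, rowptr = dense_to_csr(A)
y = csr_spmv(vals, cols, rowptr, x)
```

The sparse kernel touches only ~5% of the entries, yet its gathers defeat the coalesced, predictable memory traffic dense hardware is optimized for, which is exactly the gap the tutorial's mapping techniques aim to close.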

Oral
2:15 PM - 3:30 PM
4 Events in this session
Kartikeya Bhardwaj · Milos Milosavljevic · Liam O'Neil · Dibakar Gope · Ramon Matas · Alex Chalfin · Naveen Suda · Lingchuan Meng · Danny Loh
Yanqi Zhou · Xuanyi Dong · Tianjian Meng · Mingxing Tan · Berkin Akin · Daiyi Peng · Amir Yazdanbakhsh · Da Huang · Ravi Narayanaswami · James Laudon
Saurabh Agarwal · Hongyi Wang · Shivaram Venkataraman · Dimitris Papailiopoulos
Seo Jin Park · Joshua Fried · Sunghyun Kim · Mohammad Alizadeh · Adam Belay
Oral
5 Events in this session
Pradeep Dogga · Karthik Narasimhan · Anirudh Sivaraman · Shiv Saini · George Varghese · Ravi Netravali
Donglin Zhuang · Xingyao Zhang · Shuaiwen Song · Sara Hooker
Ningning Xie · Tamara Norman · Dominik Grewe · Dimitrios Vytiniotis
Hanpeng Hu · Chenyu Jiang · Yuchen Zhong · Yanghua Peng · Chuan Wu · Yibo Zhu · Haibin Lin · Chuanxiong Guo
Wei Hao · Aahil Awatramani · Jiayang Hu · Chengzhi Mao · Pin-Chun Chen · Eyal Cidon · Asaf Cidon · Junfeng Yang