ML-based Computer System Telemetry Analytics
This tutorial introduces the audience to ML-based telemetry analytics for large-scale computing systems, with the goal of improving system performance, resilience, and power efficiency. Modern large-scale computing systems (e.g., data centers and High-Performance Computing clusters) are highly parallel systems that perform numerous complex operations concurrently, and they are critical for many societal and scientific applications. Their high degree of parallelism often leads to significant resource contention, and eventually to performance variability and loss of efficiency. One way to assess system performance and identify the root causes of problems is by gathering and inspecting telemetry data. Such telemetry (from hundreds or thousands of hardware and software sensors) and log data are readily available on any computer system today. Because this system data amounts to billions of data points per day, manual analysis is impractical and of limited benefit. Given these limitations, ML is emerging as a promising approach to automating performance analytics. Computer system telemetry analytics is also a challenging application area with many open problems: labeled data is scarce, whereas unlabeled data can reach the scale of terabytes per day.
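To make this concrete, below is a minimal sketch of the kind of unsupervised anomaly detection that fits this label-scarce setting. The sensor channels, the synthetic data, and the choice of scikit-learn's IsolationForest are illustrative assumptions, not the specific frameworks covered in the tutorial.

```python
# Minimal sketch: unsupervised anomaly detection on node telemetry.
# Feature names and data are illustrative; real deployments aggregate
# hundreds of hardware/software sensor channels per compute node.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic "healthy" telemetry: per-node means of CPU utilization,
# memory bandwidth, and power draw over a sampling window.
healthy = rng.normal(loc=[0.6, 0.5, 0.4], scale=0.05, size=(1000, 3))

# A handful of anomalous windows (e.g., memory-bandwidth contention).
anomalous = rng.normal(loc=[0.6, 0.95, 0.7], scale=0.05, size=(10, 3))

# Train on unlabeled data only -- no labels required, which matches
# the label-scarce setting described above.
model = IsolationForest(contamination=0.01, random_state=0)
model.fit(healthy)

# predict() returns +1 for inliers and -1 for likely anomalies.
print(model.predict(anomalous))  # mostly -1 (flagged as anomalous)
```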
The goal of this tutorial is twofold. First, it provides an overview of telemetry-based analytics and shows why ML-based approaches are more promising than existing methods for identifying which applications are running on compute nodes, detecting performance and other anomalies, and diagnosing their root causes. Second, participants will put these materials into practice during hands-on activities using open-source analytics frameworks designed by the speakers' teams at Boston University and the University of Bologna. By the end of this tutorial, participants will have a better understanding of the challenges and opportunities in this area and will have gained the skills needed to employ ML-based frameworks for solving complex problems in computer systems.
Data is a key driver of the modern economy and of AI/machine learning. However, much of this data is sensitive, and handling it has created unprecedented challenges for both individuals and businesses; these challenges will only grow more severe as we move further into the digital era. In this talk, I will discuss the technologies needed for responsible data use, including secure computing, differential privacy, federated learning, and blockchain technologies for data rights. I will also show how to combine privacy-preserving computing technologies with blockchain to build a platform for a responsible data economy, enabling the creation of a new type of asset, namely data assets, more responsible use of data, and a fair distribution of the value created from data.
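As a concrete illustration of one of the technologies mentioned above, the sketch below shows the Laplace mechanism, a textbook building block of differential privacy. The dataset, the query, and the epsilon value are illustrative assumptions, not components of the platform described in the talk.

```python
# Minimal sketch of the Laplace mechanism for differential privacy.
# Records and epsilon are illustrative only.
import numpy as np

rng = np.random.default_rng(42)

def private_count(data, predicate, epsilon):
    """Release a count with epsilon-differential privacy.

    A counting query has L1 sensitivity 1 (adding or removing one
    record changes the count by at most 1), so Laplace noise with
    scale 1/epsilon suffices.
    """
    true_count = sum(1 for x in data if predicate(x))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: release how many records exceed a threshold, privately.
records = [3, 8, 12, 5, 9, 15, 2]
print(private_count(records, lambda x: x > 7, epsilon=0.5))
```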
We plan roundtable discussions on Tuesday to connect early career professionals attending MLSys with senior MLSys conference attendees. When you sign up, we will seat early career professionals at a reserved lunch table, where you can meet new people and an assigned senior mentor at your table! This is an informal, group mentee-mentor event that lowers the barrier for young professionals starting their careers in the MLSys community.
Sign up here
Sparsity in ML: Understanding and Optimizing Sparsity in Neural Networks Running on Heterogeneous Systems
Many ML models are either sparsified to reduce memory footprint and FLOPs (e.g., pruned deep neural networks) or inherently sparse due to their unstructured inputs (e.g., graph neural networks). Nevertheless, even though sparsity is desirable in theory, it often hampers performance in practice because existing heterogeneous systems (such as GPUs and FPGAs) fall short on irregular computations. For example, because GPU architectures are optimized for regular, dense computations, only a tiny fraction of the theoretical GPU performance is realized when performing sparse computations. In this tutorial, we discuss the sources of sparsity in deep neural networks as well as key techniques for mapping sparse computations onto heterogeneous systems to support high-performance inference and training. We conclude the tutorial with a discussion of future work on model parallelism for optimizing sparse communication in large-scale sparse ML models.
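To illustrate the trade-off described above, the sketch below compares a sparse and a dense matrix product using SciPy. The matrix size and density are illustrative assumptions; at 1% density, CSR storage holds only the nonzeros plus index arrays, but the resulting indexed memory accesses are exactly the irregular pattern that dense-optimized hardware handles poorly.

```python
# Minimal sketch: why sparse storage saves memory and FLOPs, and why
# the computation becomes irregular. Shape and density are illustrative.
import numpy as np
from scipy.sparse import random as sparse_random

n = 4096
density = 0.01  # 1% nonzeros, typical of heavily pruned weight matrices

# CSR keeps only nonzero values plus row/column index arrays, roughly
# 1% of the dense memory here, but the indexed gathers make the
# multiply irregular, which GPUs optimized for dense GEMM handle poorly.
W = sparse_random(n, n, density=density, format="csr",
                  dtype=np.float32, random_state=0)
x = np.ones(n, dtype=np.float32)

y_sparse = W @ x           # sparse matrix-vector product (SpMV)
y_dense = W.toarray() @ x  # equivalent dense product, ~100x the FLOPs

print(np.allclose(y_sparse, y_dense, atol=1e-3))
print(f"nonzeros: {W.nnz} of {n * n} entries")
```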