

Timezone: US/Pacific

Registration Desk: Registration Check-in Desk Tue 30 Aug 06:30 a.m.  


Tutorial: Burak Aksar

ML-based Computer System Telemetry Analytics

This tutorial introduces the audience to ML-based telemetry analytics for large-scale computing systems, with the goal of improving system performance, resilience, and power efficiency. Modern large-scale computing systems (e.g., data centers and High-Performance Computing clusters) are highly parallel systems that perform numerous complex operations concurrently, and they are critical for many societal and scientific applications. The high degrees of parallelism these systems support often lead to significant resource contention and, eventually, to performance variability and loss of efficiency. One way to assess system performance and identify the root causes of problems is to gather and inspect telemetry data. Such telemetry (from hundreds or thousands of hardware and software sensors) and log data are readily available on any computer system today. Because this system data comprises billions of data points per day, manual analysis is impractical and of limited benefit. Given these limitations, ML is emerging as a promising approach to automating performance analytics. Computer system telemetry analytics is also a challenging application area with many open problems, since labeled data is scarce while unlabeled data can reach terabytes per day.

The goal of this tutorial is twofold. First, it provides an overview of telemetry-based analytics and shows why ML-based approaches are more promising than existing methods at identifying which applications are running on compute nodes, detecting performance and other anomalies, and diagnosing their root causes. Second, participants will learn and practice these techniques directly in hands-on activities using open-source analytics frameworks designed by the speakers' teams at Boston University and the University of Bologna. By the end of this tutorial, participants will have a better understanding of the challenges and opportunities in this area and will have gained the skills needed to employ ML-based frameworks for solving complex problems in computer systems.
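
The hands-on sessions use the speakers' open-source frameworks from Boston University and the University of Bologna. As a rough, self-contained illustration of the kind of unsupervised telemetry analysis involved (not the tutorial's actual code; the feature matrix and the Isolation Forest choice are assumptions), the sketch below flags anomalous node telemetry samples:

```python
# Minimal sketch: unsupervised anomaly detection over node telemetry features.
# Assumes `X` is a (num_samples, num_sensors) matrix of per-node statistics;
# not taken from the tutorial's frameworks.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(1000, 16))   # healthy nodes
anomalous = rng.normal(loc=4.0, scale=1.0, size=(10, 16))  # e.g., contended nodes
X = np.vstack([normal, anomalous])

detector = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = detector.predict(X)            # +1 = normal, -1 = anomaly
print("flagged samples:", np.where(labels == -1)[0])
```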

Burak Aksar is a Ph.D. student in the Department of Electrical and Computer Engineering at Boston University. He received his B.S. degree in Electronics Engineering from Sabanci University, Istanbul, Turkey. His research interests include applying machine learning and explainable AI techniques to improve the performance of large-scale computing systems. He has completed internships at IBM AI Research and Sandia National Labs.



Oral: Systems for ML 2 Tue 30 Aug 08:45 a.m.  

Samuel A. Stein · Betis Baheri · Daniel Chen · Ying Mao · Qiang Guan · Ang Li · Shuai Xu · Caiwen Ding

[ Exhibit Hall A ]

In the past decade, remarkable progress has been achieved in deep learning-related systems and applications. In the post-Moore's Law era, however, the limit of semiconductor fabrication technology along with the increasing data size has slowed down the development of learning algorithms. In parallel, the rapid development of quantum computing has pushed it into a new era. Google illustrated quantum supremacy by completing a specific task (a random sampling problem) in 200 seconds, which remains impracticable for the largest classical computers. Due to the exponential potential of quantum computing, quantum-based learning is an area of interest, in hopes that certain systems might offer a quantum speedup. In this work, we propose a novel architecture, QuClassi, a quantum neural network for both binary and multi-class classification. Powered by a quantum differentiation function along with a hybrid quantum-classical design, QuClassi encodes the data with a reduced number of qubits and generates the quantum circuit, iteratively pushing it to the quantum platform to search for the best states. We conduct intensive experiments both on quantum simulators and on IBM-Q's quantum platform, and also evaluate performance on IonQ. The evaluation results demonstrate that QuClassi is able to outperform the state-of-the-art quantum-based solutions, Tensorflow-Quantum and …
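
QuClassi's circuit design and quantum differentiation function are specific to the paper; the toy sketch below only illustrates the hybrid quantum-classical training loop such systems build on. It simulates a single-qubit variational classifier in NumPy and uses the standard parameter-shift rule for gradients; the circuit, data, and loss are illustrative assumptions, not QuClassi itself.

```python
# Toy hybrid quantum-classical loop (NOT QuClassi): a single-qubit variational
# classifier simulated classically. The state RY(theta) RY(x) |0> gives
# P(|1>) = sin^2((x + theta) / 2), which we use as the class-1 score.
import numpy as np

def p_one(x, theta):
    return np.sin((x + theta) / 2.0) ** 2

def parameter_shift_grad(x, theta):
    # Standard parameter-shift rule: the exact dP/dtheta from two extra "circuit" runs.
    return 0.5 * (p_one(x, theta + np.pi / 2) - p_one(x, theta - np.pi / 2))

rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(0.5, 0.2, 50), rng.normal(2.5, 0.2, 50)])
ys = np.concatenate([np.zeros(50), np.ones(50)])

theta = 0.1
for _ in range(200):                         # classical optimizer drives the circuit
    preds = p_one(xs, theta)
    grad = np.mean(2.0 * (preds - ys) * parameter_shift_grad(xs, theta))
    theta -= 0.5 * grad                      # gradient descent on squared error
print("training accuracy:", np.mean((p_one(xs, theta) > 0.5) == ys))
```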

Andrew Or · Haoyu Zhang · Michael Freedman

[ Exhibit Hall A ]

We propose VirtualFlow, a system leveraging a novel abstraction called virtual node processing to decouple the model from the hardware. In each step of training or inference, the batch of input data is split across virtual nodes instead of hardware accelerators (e.g., GPUs and TPUs). Mapping multiple virtual nodes to each accelerator and processing them sequentially effectively time slices the batch, thereby allowing users to reduce the memory requirements of their workloads and mimic large batch sizes on small clusters. Using this technique, VirtualFlow enables many new use cases, such as reproducing training results across different hardware, resource elasticity, and heterogeneous training. In our evaluation, our implementation of VirtualFlow for TensorFlow achieved strong convergence guarantees across different hardware with out-of-the-box hyperparameters, up to 48% lower job completion times with resource elasticity, and up to 42% higher throughput with heterogeneous training.
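
VirtualFlow is implemented inside TensorFlow; the PyTorch-style sketch below (the function name and setup are mine, not the paper's API) illustrates only the core time-slicing idea: map several virtual nodes to one accelerator by splitting the batch and processing the slices sequentially before a single weight update.

```python
# Minimal sketch of virtual-node time slicing on one accelerator
# (illustrative; not VirtualFlow's actual implementation or API).
import torch

def train_step_with_virtual_nodes(model, loss_fn, optimizer, batch, targets,
                                  virtual_nodes_per_device=4):
    """One training step that time-slices a large batch across virtual nodes."""
    optimizer.zero_grad()
    micro_batches = torch.chunk(batch, virtual_nodes_per_device)
    micro_targets = torch.chunk(targets, virtual_nodes_per_device)
    for xb, yb in zip(micro_batches, micro_targets):
        loss = loss_fn(model(xb), yb) / virtual_nodes_per_device  # average over slices
        loss.backward()           # gradients accumulate across virtual nodes
    optimizer.step()              # single update, as if the full batch fit in memory

model = torch.nn.Linear(64, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
train_step_with_virtual_nodes(model, torch.nn.functional.cross_entropy, opt,
                              torch.randn(128, 64), torch.randint(0, 10, (128,)))
```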

Wasu Piriyakulkij · Cristina Menghini · Ross Briden · Nihal Vivekanand Nayak · Jeffrey Zhu · Elaheh Raisi · Stephen Bach

[ Exhibit Hall A ]

Machine learning practitioners often have access to a spectrum of data: labeled data for the target task (which is often limited), unlabeled data, and auxiliary data (the many labeled datasets available for other tasks). We describe TAGLETS, a system built to study techniques for automatically exploiting all three types of data and creating high-quality, servable classifiers. The key components of TAGLETS are: (1) auxiliary data organized according to a knowledge graph, (2) modules encapsulating different methods for exploiting auxiliary and unlabeled data, and (3) a distillation stage in which the ensembled modules are combined into a servable model. We compare TAGLETS with state-of-the-art transfer learning and semi-supervised learning methods on four image classification tasks. Our study covers a range of settings, varying the amount of labeled data and the semantic relatedness of the auxiliary data to the target task. We find that the intelligent incorporation of auxiliary and unlabeled data into multiple learning techniques enables TAGLETS to match, and most often significantly surpass, these alternatives. TAGLETS is available as an open-source system at github.com/anonymous.
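
As a rough sketch of the distillation stage described above (the module and student models here are stand-in linear layers, and the temperature and loss are assumptions rather than TAGLETS' actual recipe), the ensembled modules' soft predictions can be distilled into a single servable student like this:

```python
# Illustrative distillation of ensembled "modules" into one servable student.
import torch
import torch.nn.functional as F

def distill_step(student, teacher_modules, optimizer, x, temperature=2.0):
    """One distillation step: ensemble frozen teacher modules and train a single
    servable student on their averaged soft predictions."""
    with torch.no_grad():
        teacher_probs = torch.stack(
            [F.softmax(m(x) / temperature, dim=-1) for m in teacher_modules]
        ).mean(dim=0)
    student_log_probs = F.log_softmax(student(x) / temperature, dim=-1)
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Tiny usage with stand-in models (real modules would exploit auxiliary/unlabeled data).
student = torch.nn.Linear(32, 10)
teachers = [torch.nn.Linear(32, 10) for _ in range(3)]
opt = torch.optim.SGD(student.parameters(), lr=0.1)
print(distill_step(student, teachers, opt, torch.randn(8, 32)))
```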

Zhiming Hu · Ning Ye · Iqbal Mohomed

[ Exhibit Hall A ]

We study the problem of natural language-based video retrieval, the task of finding relevant videos given natural language search queries. Most recent state-of-the-art (SOTA) approaches embed the video and query separately and map the video and query embeddings into a joint latent space to calculate a similarity score between them. To learn a video representation, existing solutions generally use all the frames or sample a subset of frames from the video using uniform sampling. The former can be computationally prohibitive, while the latter may inject noise from uninformative frames into the final video representation. To address this, we propose mmSampler, a learning-based sampler, to adaptively select salient frames to represent the videos for multimodal video retrieval. mmSampler can greatly reduce the computational overhead for video representation without affecting the retrieval performance. We learn a lightweight policy network to decide whether to further process or discard a frame. By adopting the Gumbel-Softmax trick, we train the sampler jointly with the video retrieval model end-to-end in an efficient manner. Experimental results on benchmark datasets such as ActivityNet, DiDeMo and MSRVTT demonstrate that mmSampler achieves improved retrieval performance while saving as much as 43% GFLOPs per video.
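
The sketch below illustrates the Gumbel-Softmax frame-gating idea in isolation (the policy network, feature shapes, and temperature are assumptions for illustration, not mmSampler's actual architecture): a lightweight policy scores each frame, and a hard Gumbel-Softmax sample makes a discrete keep/discard decision while remaining differentiable for end-to-end training.

```python
# Minimal Gumbel-Softmax frame gating sketch (illustrative, not mmSampler).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameGate(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.policy = nn.Linear(feat_dim, 2)  # per-frame logits: [discard, keep]

    def forward(self, frame_feats):  # frame_feats: (batch, num_frames, feat_dim)
        logits = self.policy(frame_feats)
        # hard=True gives a discrete 0/1 decision in the forward pass while the
        # backward pass uses the soft relaxation (straight-through estimator).
        decisions = F.gumbel_softmax(logits, tau=1.0, hard=True)
        keep_mask = decisions[..., 1:]              # (batch, num_frames, 1)
        return frame_feats * keep_mask, keep_mask

gate = FrameGate()
feats = torch.randn(2, 16, 512)                     # 16 candidate frames per video
gated_feats, mask = gate(feats)
print("kept frames per video:", mask.squeeze(-1).sum(dim=1))
```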

Carole-Jean Wu · Ramya Raghavendra · Udit Gupta · Bilge Acun · Newsha Ardalani · Kiwan Maeng · Gloria Chang · Fiona Aga · Jinshi Huang · Charles Bai · Michael Gschwind · Anurag Gupta · Myle Ott · Anastasia Melnikov · Salvatore Candido · David Brooks · Geeta Chauhan · Benjamin Lee · Hsien-Hsin Lee · Bugra Akyildiz · Maximilian Balandat · Joe Spisak · Ravi Jain · Mike Rabbat · Kim Hazelwood

[ Exhibit Hall A ]

This paper explores the environmental impact of the super-linear growth trends for AI from a holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the carbon footprint of AI computing by examining the model development cycle across industry-scale machine learning use cases and, at the same time, considering the life cycle of system hardware. Taking a step further, we capture the operational and manufacturing carbon footprint of AI computing and present an end-to-end analysis for what and how hardware-software design and at-scale optimization can help reduce the overall carbon footprint of AI. Based on the industry experience and lessons learned, we share the key challenges and chart out important development directions across the many dimensions of AI. We hope the key messages and insights presented in this paper can inspire the community to advance the field of AI in an environmentally-responsible manner.


Invited Talk: Dawn Song

Towards Building a Responsible Data Economy

Data is a key driver of the modern economy and of AI/machine learning. However, much of this data is sensitive, and handling it has created unprecedented challenges for both individuals and businesses, challenges that will only become more severe as we move forward in the digital era. In this talk, I will discuss the technologies needed for responsible data use, including secure computing, differential privacy, federated learning, and blockchain technologies for data rights, and how privacy-preserving computing technologies and blockchain can be combined to build a platform for a responsible data economy. Such a platform enables the creation of a new type of asset, data assets, along with more responsible use of data and a fair distribution of the value created from data.

Dawn Song

 

Dawn Song is a Professor in the Department of Electrical Engineering and Computer Science at UC Berkeley. Her research interests lie in AI and deep learning, security, and privacy. She is the recipient of various awards including the MacArthur Fellowship, the Guggenheim Fellowship, the NSF CAREER Award, the Alfred P. Sloan Research Fellowship, the MIT Technology Review TR-35 Award, the ACM SIGSAC Outstanding Innovation Award, and Test-of-Time and Best Paper Awards from top conferences in computer security and deep learning. She is an ACM Fellow and an IEEE Fellow. She is ranked the most cited scholar in computer security (AMiner Award). She obtained her Ph.D. degree from UC Berkeley. She is also a serial entrepreneur; she is the Founder of Oasis Labs and has been named to Inc.'s Female Founder 100 list and the Wired25 list of innovators.



Round Table Discussion Tue 30 Aug 11:30 a.m.  

We plan roundtable discussions on Tuesday to connect early-career professionals attending MLSys with senior MLSys conference attendees. When you sign up, we will seat early-career professionals at a reserved lunch table, where you can meet new people alongside a senior mentor assigned to your table. This is an informal group mentee-mentor event that lowers the barrier for young professionals starting their careers in the MLSys community.

Sign up here


Oral: ML for Systems Tue 30 Aug 01:00 p.m.  

Ankur Mallick · Kevin Hsieh · Behnaz Arzani · Gauri Joshi

[ Exhibit Hall A ]

Today's data centers rely increasingly on machine learning (ML) in their deployed systems. However, these systems are vulnerable to the data drift problem, that is, a mismatch between training and test data, which can lead to significant performance degradation and system inefficiencies. In this paper, we demonstrate the impact of data drift in production by studying two real-world deployments in a leading cloud provider. Our study shows that, despite frequent model retraining, these deployed models experience major accuracy drops (up to 40%) and high accuracy variation, which lead to a drastic increase in operational costs. None of the current solutions to the data drift problem are designed for large-scale deployments, which need to address real-world issues such as scale, ground truth latency, and mixed types of data drift. We propose Matchmaker, the first scalable, adaptive, and flexible solution to the data drift problem in large-scale production systems. Matchmaker finds the most similar training data batch and uses the corresponding ML model for inference on each test point. As part of Matchmaker, we introduce a novel similarity metric to address multiple types of data drift while only incurring limited overhead. Experiments on our two real-world ML deployments show that Matchmaker significantly improves …
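
Matchmaker's similarity metric and model management are described in the paper; the sketch below is only a rough illustration of the routing idea, with all details assumed: it keeps one model per training batch, summarizes each batch by its feature centroid, and serves each test point with the model whose batch centroid is nearest.

```python
# Illustrative sketch of similarity-based routing (not Matchmaker's metric):
# one model per training data batch; each test point is served by the model
# whose batch looks most similar to it (here: nearest feature centroid).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
batches = []
for shift in (0.0, 2.0, 4.0):               # three batches with drifted features
    X = rng.normal(shift, 1.0, size=(200, 5))
    y = (X.sum(axis=1) > 5 * shift).astype(int)
    batches.append((X, y))

models = [LogisticRegression(max_iter=1000).fit(X, y) for X, y in batches]
centroids = np.stack([X.mean(axis=0) for X, _ in batches])

def predict(x):
    # Route to the model trained on the most similar batch.
    batch_id = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
    return models[batch_id].predict(x.reshape(1, -1))[0]

x_test = rng.normal(2.0, 1.0, size=5)        # resembles the second batch
print("routed prediction:", predict(x_test))
```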

Xinfeng Xie · Prakash Prabhu · Ulysse Beaugnon · Mangpo Phothilimthana · Sudip Roy · Azalia Mirhoseini · Eugene Brevdo · James Laudon · Yanqi Zhou

[ Exhibit Hall A ]

Multi-Chip-Modules (MCMs) reduce the design and fabrication cost of machine learning (ML) accelerators while delivering performance and energy efficiency on par with a monolithic large chip. However, ML compilers targeting MCMs need to solve complex optimization problems optimally and efficiently to achieve this high performance. One such problem is the multi-chip partitioning problem, where compilers determine the optimal partitioning and placement of operations in tensor computation graphs on chiplets in MCMs. Partitioning ML graphs for MCMs is particularly hard as the search space grows exponentially with the number of chiplets available and the number of nodes in the neural network. Furthermore, the constraints imposed by the underlying hardware produce a search space where valid solutions are extremely sparse. In this paper, we present a strategy using a deep reinforcement learning (RL) framework to emit a possibly invalid candidate partition that is then corrected by a constraint solver. Using the constraint solver ensures that RL encounters valid solutions in the sparse space frequently enough to converge with fewer samples as compared to non-learned strategies. The graph neural network and sequential attention mechanism in our RL framework enable generalization across different ML graphs. Our evaluation of a production-scale model, BERT, on …

Junguk Cho · Diman Zad Tootaghaj · Lianjie Cao · Puneet Sharma

[ Exhibit Hall A ]

The current design of serverless computing frameworks assumes that all requests and the underlying compute hardware are homogeneous. This homogeneity assumption causes two challenges for running ML workloads, such as Deep Neural Network (DNN) inference services, on these frameworks: such workloads can have various request types and might require heterogeneous accelerators. First, existing serverless frameworks are threshold-based and use simple queries-per-second or CPU-utilization autoscaling rules, thus ignoring heterogeneous requests and accelerators and resulting in sub-optimal performance. Second, ignoring infrastructure heterogeneity in workload scheduling and inference request distribution can lead to further performance inefficiencies. To address these challenges, we propose the SLA-aware ML Inference Framework, a novel application- and hardware-aware serverless computing framework to manage ML (e.g., DNN) inference applications on heterogeneous infrastructure. Our framework designs an intelligent autoscaling strategy by leveraging rich, precise workload-specific metrics and heterogeneous GPU compute capability. We schedule functions on suitable GPU accelerators and proportionally distribute inference requests to the deployed functions based on the autoscaling decision. In addition, our framework enables multiple functions to efficiently share GPU accelerators, increasing resource efficiency with minimal overhead. Unlike prior work, we use application-specific SLA metrics to make scheduling and autoscaling decisions. We implement …

Yi Ding · Avinash Rao · Hyebin Song · Rebecca Willett · Henry (Hank) Hoffmann

[ Exhibit Hall A ]

Datacenters execute large computational jobs, which are composed of smaller tasks. A job completes when all its tasks finish, so stragglers (rare yet extremely slow tasks) are a major impediment to datacenter performance. Accurately predicting stragglers would enable proactive intervention, allowing datacenter operators to mitigate stragglers before they delay a job. While much prior work applies machine learning to predict computer system performance, these approaches rely on complete labels (i.e., sufficient examples of all possible behaviors, including straggling and non-straggling) or on strong assumptions about the underlying latency distributions (e.g., whether they are Gaussian or not). Within a running job, however, none of this information is available until stragglers have revealed themselves, by which point they have already delayed the job. To predict stragglers accurately and early without labeled positive examples or assumptions on latency distributions, this paper presents NURD, a novel Negative-Unlabeled learning approach with Reweighting and Distribution-compensation that only trains on negative and unlabeled streaming data. The key idea is to train a predictor using finished tasks of non-stragglers to predict latency for unlabeled running tasks, and then reweight each unlabeled task's prediction based on a weighting function of its feature space. We evaluate NURD on two production traces from Google and Alibaba, and find that compared to …
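
NURD's reweighting function is defined in the paper; the sketch below shows only the basic negative-unlabeled setup under assumed modeling choices: fit a latency predictor on finished (non-straggler) tasks, score still-running tasks, and apply a placeholder feature-space weighting for tasks that look unlike anything seen so far.

```python
# Sketch of the negative-unlabeled idea behind early straggler prediction
# (illustrative only; NURD's actual reweighting function differs).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
# Finished, non-straggler tasks (the "negatives"): features -> observed latency.
X_finished = rng.normal(0.0, 1.0, size=(500, 8))
latency = 10.0 + 2.0 * X_finished[:, 0] + rng.normal(0.0, 0.5, size=500)

predictor = GradientBoostingRegressor().fit(X_finished, latency)
density = KernelDensity().fit(X_finished)          # how "seen before" a task looks
ref_log_density = np.median(density.score_samples(X_finished))

def straggler_score(x_running):
    """Higher means more straggler-like. Combines the predicted latency with a
    placeholder penalty for tasks whose features lie far from any finished task."""
    x = x_running.reshape(1, -1)
    predicted_latency = predictor.predict(x)[0]
    novelty = max(0.0, ref_log_density - density.score_samples(x)[0])
    return predicted_latency * (1.0 + 0.1 * novelty)

x_running = rng.normal(3.0, 1.0, size=8)           # a running task with unusual features
print("straggler score:", straggler_score(x_running))
```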


Tutorial: Mert Hidayetoglu · Jinjun Xiong · Wen-Mei Hwu · Rakesh Nagi · Vikram Sharma Mailthody · Jeff Pool · Sitao Huang

Sparsity in ML: Understanding and Optimizing Sparsity in Neural Networks Running on Heterogeneous Systems

A plethora of ML models are either sparsified to save memory footprint and FLOPs (such as deep neural networks) or are inherently sparse due to their unstructured nature (such as graph neural networks). Nevertheless, even though sparsity is desirable in theory, it often hampers performance in practice because existing heterogeneous systems (such as GPUs and FPGAs) fall short on irregular computations. For example, because GPU architectures are optimized for regular, dense computations, only a tiny portion of the theoretical GPU performance is realized when performing sparse computation. In this tutorial, we discuss the sources of sparsity in deep neural networks as well as key techniques for mapping sparse computation onto heterogeneous systems to support high-performance inference and training. We will conclude the tutorial with a discussion of future work on model parallelism for optimizing sparse communication in large-scale sparse ML models.
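
As one concrete example of the structured sparsity discussed in the tutorial (the 2:4 pattern below is a common GPU-friendly scheme; the code is an illustration, not tutorial material), a weight matrix can be pruned so that every group of four weights keeps only its two largest-magnitude entries:

```python
# Sketch: prune a weight matrix to a 2:4 structured-sparsity pattern
# (keep the 2 largest-magnitude weights in every group of 4 along each row).
import torch

def prune_2_to_4(weight):
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "illustration assumes in_features divisible by 4"
    groups = weight.reshape(out_features, in_features // 4, 4)
    # Indices of the two smallest-magnitude weights in each group of four.
    _, drop_idx = groups.abs().topk(2, dim=-1, largest=False)
    mask = torch.ones_like(groups)
    mask.scatter_(-1, drop_idx, 0.0)
    return (groups * mask).reshape(out_features, in_features)

w = torch.randn(8, 16)
w_sparse = prune_2_to_4(w)
print("kept fraction:", (w_sparse != 0).float().mean().item())  # ~0.5
```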

Mert Hidayetoglu

 

Mert recently obtained his Ph.D. in electrical and computer engineering from the University of Illinois at Urbana-Champaign. His research is at the intersection of the theory and applications of large-scale high-performance computing and software systems for exascale computing. His work focuses on sparse and unstructured data accesses, communications, and computations on heterogeneous system architectures involving multi-GPU nodes. He was a Givens Fellow at Argonne National Laboratory and a member of the IBM-Illinois Center for Cognitive Computing Research. He is a recipient of the SC20 Best Paper Award, HPEC'20 Graph Challenge Champion awards, and the 2021 ACM-IEEE CS George Michael Memorial HPC Fellowship. He will join Stanford as a postdoctoral scholar. More about Mert's research is available at https://merthidayetoglu.github.io.
Jinjun Xiong

 

Dr. Jinjun Xiong is currently Empire Innovation Professor in the Department of Computer Science and Engineering at the University at Buffalo (UB). Prior to that, he was a Senior Researcher and Program Director for AI and Hybrid Cloud Systems at the IBM Thomas J. Watson Research Center. He co-founded and co-directed the IBM-Illinois Center for Cognitive Computing Systems Research (C3SR) from 2016 to 2021, the success of which led to a $200M, 10-year investment to establish the IBM-Illinois Discovery Accelerator Institute in 2021. His research interests span across-stack AI systems research, including AI applications, algorithms, tooling, and computer architectures.
Wen-Mei Hwu

 

Wen-mei Hwu is currently a Senior Distinguished Research Scientist with NVIDIA. Prior to NVIDIA, he was a Professor, Sanders-AMD Endowed Chair, Acting Department Head, and Chief Scientist of the Parallel Computing Institute at the University of Illinois at Urbana-Champaign. He and his Illinois team developed the superblock compiler scheduling and optimization framework that has been adopted by virtually all modern vendor and open-source compilers today. For his research contributions, he received the ACM SIGARCH Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the IEEE Computer Society Charles Babbage Award, the ISCA Influential Paper Award, the MICRO Test-of-Time Award, the IEEE Computer Society B. R. Rau Award, the CGO Test-of-Time Award, and the Distinguished Alumni Award in Computer Science from the University of California, Berkeley. He has also won numerous best paper awards at major conferences. He is a Fellow of the ACM and the IEEE.
Rakesh Nagi

 

Rakesh Nagi is the Donald Biggar Willett Professor of Engineering at the University of Illinois, Urbana-Champaign. He served as Department Head of Industrial and Enterprise Systems Engineering (2013-2019) and as Interim Director of the Illinois Applied Research Institute (2016-2018). He is an affiliate faculty member in Computer Science, Electrical and Computer Engineering, the Coordinated Science Laboratory, and Computational Science and Engineering. Previously, he served as Chair (2006-2012) and Professor of Industrial and Systems Engineering at the University at Buffalo (SUNY) (1993-2013). He is a recipient of the IISE David F. Baker Distinguished Research Award (2022); the INFORMS Koopman Award from the Military Applications Society (2021, 2018); DARPA Graph Challenge Champion (2020), Honorable Mention (2017, 2019), Finalist (2018), and Innovation Award (2018, 2019); the IIE Transactions on Design and Manufacturing best paper award for journal issues from July 2011 through June 2012 (2013); the IIE Fellow Award (2010); UB's “Sustained Achievement Award” in recognition of outstanding achievements in scholarly activity (2009); Business First of Buffalo's “40 under Forty” award (2004); SME's Milton C. Shaw Outstanding Young Manufacturing Engineer Award (1999); IIE's Outstanding Young Industrial Engineer Award in Academia (1999); and the National Science Foundation's CAREER Award (1996). Dr. Nagi's recent research interests are in GPU-accelerated algorithms for discrete optimization and graph analytics.
Vikram Sharma Mailthody

 

Vikram is an incoming Research Scientist at NVIDIA Research starting in Fall 2022. Vikram completed his Ph.D. at the Electrical and Computer Engineering Department of the University of Illinois, Urbana Champaign. Vikram is interested in solving fundamental systems-level problems and proposing optimizations for emerging applications. He has extensive experience in memory and storage system design, GPUs, and performance optimizations for emerging applications like NLP, recommender systems, GNNs, and graph and data analytics. Vikram is a recipient of the Bahl Fellowship for 2019–21, the Dan Vivoli Endowed Fellowship for 2021–22, and has won several competitions, including the Championship award for the HPEC’20 Graph Challenge and multiple student innovation awards in HPEC Graph Challenge competitions.
Jeff Pool

Jeff Pool is a Senior Architect at NVIDIA. Prior to joining NVIDIA, he completed his Ph.D. and M.S. in Computer Science at the University of North Carolina at Chapel Hill in 2012 and 2009, respectively, focusing on power-efficient graphics hardware. Since then, he has been improving the power and performance of GPUs, researching neural network sparsity, and designing and building accelerators that exploit network sparsity to achieve practical speedups.
Sitao Huang

 

Sitao Huang is an assistant professor in the Department of Electrical Engineering and Computer Science at the University of California, Irvine. He received his Ph.D. and M.S. degrees in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 2021 and 2017, respectively, and his B.Eng. degree in Electronics Engineering from Tsinghua University in 2014. Sitao's research interests include highly efficient hardware acceleration, programming languages and synthesis flows for hardware systems, and the optimization of heterogeneous systems. He is a recipient of the 2019 Sundaram Seshu International Student Fellowship and the 2018 Rambus Computer Engineering Fellowship. His research has won several awards, including the Best Paper Award at IDEAL 2021, a Best Paper Nomination at ASP-DAC 2021, the Student Innovation Award at the 2018 IEEE HPEC Graph Challenge, and first place in the DAC 2019 System Design Contest.



Oral: Hardware Efficient ML Tue 30 Aug 02:15 p.m.  

Kartikeya Bhardwaj · Milos Milosavljevic · Liam O'Neil · Dibakar Gope · Ramon Matas · Alex Chalfin · Naveen Suda · Lingchuan Meng · Danny Loh

[ Exhibit Hall A ]

With the advent of smart devices that support 4K and 8K resolution, Single Image Super Resolution (SISR) has become an important computer vision problem. However, most super resolution deep networks are computationally very expensive. In this paper, we propose Super-Efficient Super Resolution (SESR) networks that establish a new state-of-the-art for efficient super resolution. Our approach is based on linear overparameterization of CNNs and creates an efficient model architecture for SISR. With theoretical analysis, we uncover the limitations of existing overparameterization methods and show how the proposed method alleviates them. Detailed experiments across six benchmark datasets demonstrate that SESR achieves similar or better image quality than state-of-the-art models while requiring 2x to 330x fewer Multiply-Accumulate (MAC) operations. As a result, SESR can be used on constrained hardware to perform x2 (1080p to 4K) and x4 (1080p to 8K) SISR. Towards this, we estimate hardware performance numbers for a commercial Arm mobile-Neural Processing Unit (NPU) for 1080p to 4K (x2) and 1080p to 8K (x4) SISR. Our results highlight the challenges faced by super resolution on AI accelerators and demonstrate that SESR is significantly faster (e.g., 6x-8x higher FPS) than existing models on mobile-NPU. Finally, SESR outperforms prior models by 1.5x-2x in …
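
SESR's specific collapsible blocks are defined in the paper; the sketch below (shapes and names are assumptions) demonstrates only the general principle of linear overparameterization: two consecutive convolutions with no nonlinearity between them collapse exactly into a single convolution at inference time.

```python
# Sketch: collapsing a 1x1 conv followed by a 3x3 conv (no nonlinearity, no bias)
# into a single 3x3 conv. Illustrates linear overparameterization in general,
# not SESR's exact block structure.
import torch
import torch.nn.functional as F

in_ch, hidden_ch, out_ch = 3, 16, 3
w1 = torch.randn(hidden_ch, in_ch, 1, 1)    # expansion conv (train-time only)
w2 = torch.randn(out_ch, hidden_ch, 3, 3)   # 3x3 conv

x = torch.randn(1, in_ch, 32, 32)
y_expanded = F.conv2d(F.conv2d(x, w1), w2, padding=1)

# Collapse: K[o, i, :, :] = sum_c w2[o, c, :, :] * w1[c, i, 0, 0]
w_collapsed = torch.einsum("ochw,ci->oihw", w2, w1[:, :, 0, 0])
y_collapsed = F.conv2d(x, w_collapsed, padding=1)

# Difference should be float32 round-off only.
print("max abs difference:", (y_expanded - y_collapsed).abs().max().item())
```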

Yanqi Zhou · Xuanyi Dong · Tianjian Meng · Mingxing Tan · Berkin Akin · Daiyi Peng · Amir Yazdanbakhsh · Da Huang · Ravi Narayanaswami · James Laudon

[ Exhibit Hall A ]

Better neural architectures and new hardware accelerators are two driving forces for the progress in deep learning. Previous works typically focus on one aspect: they either design new neural architectures for fixed hardware like GPUs or customize hardware (often on FPGAs) for a fixed set of neural models like ResNets or Transformers. In this work, we aim to jointly optimize neural architecture and hardware configurations for Google's Edge TPUs. Through extensive studies, we observe that: 1) the neural architecture search space has to be customized to fully leverage the targeted hardware, 2) neural architecture and hardware accelerator should be jointly searched to achieve the best of both worlds, and 3) conventional metrics such as FLOPs and parameter size often do not well represent model efficiency in real accelerators. Our experiments show that our joint search approach, named NaaS, consistently outperforms previous state-of-the-art results, such as EfficientNet, on both image classification and segmentation tasks. Furthermore, our approach reduces energy consumption by up to 2x under the same accuracy on Edge TPUs.

Saurabh Agarwal · Hongyi Wang · Shivaram Venkataraman · Dimitris Papailiopoulos

[ Exhibit Hall A ]

A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training. To alleviate these bottlenecks, a long line of recent research proposes gradient and model compression methods. In this work, we evaluate the efficacy of gradient compression methods and compare their scalability with optimized implementations of synchronous data-parallel SGD across more than 200 realistic distributed setups. Surprisingly, we observe that in only 6 out of more than 200 cases do gradient compression methods provide a speedup over optimized synchronous data-parallel training in the typical data-center setting. We conduct an extensive investigation to identify the root causes of this phenomenon, and offer a performance model that can be used to identify the benefits of gradient compression for a variety of system setups. Based on our analysis, we propose a list of desirable properties that gradient compression methods should satisfy in order to provide meaningful utility.
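
Many of the evaluated methods are sparsification schemes; as a concrete, simplified example of the kind of method studied (this is generic top-k sparsification, not any specific system from the paper), only the k largest-magnitude gradient entries would be communicated:

```python
# Sketch: top-k gradient sparsification (simplified; no error feedback).
import torch

def topk_compress(grad, ratio=0.01):
    flat = grad.reshape(-1)
    k = max(1, int(flat.numel() * ratio))
    _, indices = flat.abs().topk(k)
    # Only (indices, values) would be communicated instead of the dense tensor.
    return indices, flat[indices], grad.shape

def topk_decompress(indices, values, shape):
    flat = torch.zeros(shape, dtype=values.dtype).reshape(-1)
    flat[indices] = values
    return flat.reshape(shape)

grad = torch.randn(256, 1024)
idx, vals, shape = topk_compress(grad, ratio=0.01)
approx = topk_decompress(idx, vals, shape)
print("kept entries:", idx.numel(), "of", grad.numel())
```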

Seo Jin Park · Joshua Fried · Sunghyun Kim · Mohammad Alizadeh · Adam Belay

[ Exhibit Hall A ]

As emerging deep neural network (DNN) models continue to grow in size, using large GPU clusters to train DNNs is becoming an essential requirement to achieving acceptable training times. In this paper, we consider the case where future increases in cluster size will cause the global batch size that can be used to train models to reach a fundamental limit: beyond a certain point, larger global batch sizes cause sample efficiency to degrade, increasing overall time to accuracy. As a result, to achieve further improvements in training performance, we must instead consider "strong scaling" strategies that hold the global batch size constant and allocate smaller batches to each GPU. Unfortunately, this makes it significantly more difficult to use cluster resources efficiently. We present DeepPool, a system that addresses this efficiency challenge through two key ideas. First, burst parallelism allocates large numbers of GPUs to foreground jobs in bursts to exploit the unevenness in parallelism across layers. Second, GPU multiplexing prioritizes throughput for foreground training jobs, while packing in background training jobs to reclaim underutilized GPU resources, thereby improving cluster-wide utilization. Together, these two ideas enable DeepPool to deliver a 1.2-2.3x improvement in total cluster throughput over standard data …


Oral: Testing, Debugging and Monitoring & Security Tue 30 Aug 04:00 p.m.  

Pradeep Dogga · Karthik Narasimhan · Anirudh Sivaraman · Shiv Saini · George Varghese · Ravi Netravali

[ Exhibit Hall A ]

A major difficulty in debugging distributed systems lies in manually determining which of the many available debugging tools to use and how to query that tool's logs. Our own study of a production debugging workflow confirms the magnitude of this burden. This paper explores whether a deep neural network trained on past bug reports and debugging logs can assist developers in distributed systems debugging. We present Revelio, a debugging assistant which takes user reports and system logs as input, and outputs debugging queries that developers can use to find a bug's root cause. The key challenges lie in (1) combining inputs of different types (e.g., natural language reports and quantitative logs) and (2) generalizing to unseen faults. Revelio addresses these by employing deep neural networks to uniformly embed diverse input sources and potential queries into a high-dimensional vector space. In addition, it exploits observations from production systems to factorize query generation into two computationally and statistically simpler learning tasks. To evaluate Revelio, we built a testbed with multiple distributed applications and debugging tools. By injecting faults and training on logs and reports from 800 Mechanical Turkers, we show that Revelio includes the most helpful query in its predicted list of …

Donglin Zhuang · Xingyao Zhang · Shuaiwen Song · Sara Hooker

[ Exhibit Hall A ]

The quest for determinism in machine learning has disproportionately focused on characterizing the impact of noise introduced by algorithmic design choices. In this work, we address a less well understood and studied question: how our choice of tooling introduces randomness into deep neural network training. We conduct large-scale experiments across different types of hardware, accelerators, state-of-the-art networks, and open-source datasets to characterize how tooling choices contribute to the level of non-determinism in a system, the impact of said non-determinism, and the cost of eliminating different sources of noise. Our findings suggest that the impact of non-determinism is nuanced. While top-line metrics such as top-1 accuracy are not noticeably impacted, model performance on certain parts of the data distribution is far more sensitive to the introduction of randomness. Our results suggest that deterministic tooling is critical for AI safety. However, we also find that the cost of ensuring determinism varies dramatically between neural network architectures and hardware types, e.g., with overhead up to 746% on a spectrum of widely used GPU accelerator architectures, relative to non-deterministic training.
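
The paper quantifies the cost of eliminating tooling noise; for reference, the sketch below shows the standard PyTorch switches that "deterministic tooling" typically involves in practice (generic PyTorch usage, not the paper's experimental setup).

```python
# Typical switches for reducing tooling non-determinism in PyTorch training
# (generic usage; not the paper's exact configuration).
import os
import random
import numpy as np
import torch

def make_deterministic(seed=0):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                      # also seeds CUDA RNGs
    torch.backends.cudnn.benchmark = False       # disable autotuned kernel choice
    torch.backends.cudnn.deterministic = True    # force deterministic cuDNN kernels
    # Error out on any op that has no deterministic implementation.
    torch.use_deterministic_algorithms(True)
    # Required by some cuBLAS routines when determinism is requested.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

make_deterministic(seed=42)
```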

Ningning Xie · Tamara Norman · Dominik Grewe · Dimitrios Vytiniotis

[ Exhibit Hall A ]

We present a novel characterization of the mapping of multiple parallelism forms (e.g., data and model parallelism) onto hierarchical accelerator systems that is hierarchy-aware and greatly reduces the space of software-to-hardware mapping. We experimentally verify the substantial effect of these mappings on all-reduce performance (up to 448x). We offer a novel syntax-guided program synthesis framework that is able to decompose reductions over one or more parallelism axes to sequences of collectives in a hierarchy- and mapping-aware way. For 69% of parallelism placements and user requested reductions, our framework synthesizes programs that outperform the default all-reduce implementation when evaluated on different GPU hierarchies (max 2.04x, average 1.27x). We complement our synthesis tool with a simulator exceeding 90% top-10 accuracy, which therefore reduces the need for massive evaluations of synthesis results to determine a small set of optimal programs and mappings.
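
The framework synthesizes such decompositions automatically; the hand-written NumPy example below (grid sizes and layout are assumed) merely illustrates the kind of hierarchy-aware decomposition being searched over: an all-reduce over a node-by-GPU grid computed as a within-node reduction followed by an across-node reduction.

```python
# Hand-written example of a hierarchy-aware all-reduce decomposition
# (illustration only; the paper synthesizes such programs automatically).
import numpy as np

nodes, gpus_per_node, vec_len = 4, 8, 1024
rng = np.random.default_rng(0)
# One gradient shard per device, laid out as [node, gpu, vector].
shards = rng.normal(size=(nodes, gpus_per_node, vec_len))

# Flat all-reduce: every device ends up with the sum over all 32 devices.
flat_result = shards.sum(axis=(0, 1))

# Hierarchical decomposition: reduce within each node first (fast local links),
# then reduce the per-node partial sums across nodes (slower network).
per_node = shards.sum(axis=1)          # within-node reduction -> [node, vector]
hierarchical_result = per_node.sum(axis=0)

print("decompositions agree:", np.allclose(flat_result, hierarchical_result))
```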

Hanpeng Hu · Chenyu Jiang · Yuchen Zhong · Yanghua Peng · Chuan Wu · Yibo Zhu · Haibin Lin · Chuanxiong Guo

[ Exhibit Hall A ]

Distributed training using multiple devices (i.e., GPU servers) has been widely adopted for learning DNN models over large datasets. However, the performance of large-scale distributed training tends to be far from linear speed-up in practice. Given the complexity of distributed systems, it is challenging to identify the root cause(s) of inefficiency and exercise effective performance optimizations when unexpectedly low training speed occurs. To date, there exists no software tool that diagnoses performance issues and helps expedite distributed DNN training across different machine learning frameworks. This paper proposes dPRO, a toolkit that includes: (1) an efficient profiler that collects runtime traces of distributed DNN training across multiple frameworks, especially fine-grained communication traces, and constructs global data flow graphs including detailed communication operations for accurate replay; (2) an optimizer that effectively identifies performance bottlenecks and explores optimization strategies (from computation, communication, and memory aspects) for training acceleration. We implement dPRO on multiple deep learning frameworks (PyTorch, TensorFlow, MXNet) and representative communication schemes (AllReduce and Parameter Server architecture). Extensive experiments show that dPRO predicts the performance of distributed training in various settings with <5% error in most cases and finds optimization strategies with up to 87.1% speed-up over the baselines.

Wei Hao · Aahil Awatramani · Jiayang Hu · Chengzhi Mao · Pin-Chun Chen · Eyal Cidon · Asaf Cidon · Junfeng Yang

[ Exhibit Hall A ]

Full-precision deep learning models are typically too large or costly to deploy on edge devices. To accommodate the limited hardware resources, models are adapted to the edge using various edge-adaptation techniques, such as quantization and pruning. While such techniques may have a negligible impact on top-line accuracy, the adapted models exhibit subtle differences in output compared to the original model from which they are derived. In this paper, we introduce a new evasive attack, DIVA, that exploits these differences in edge adaptation by adding adversarial noise to input data that maximizes the output difference between the original and adapted model. Such an attack is particularly dangerous because the malicious input will trick the adapted model running on the edge but will be virtually undetectable by the original model, which typically serves as the authoritative model version used for validation, debugging, and retraining. We compare DIVA to a state-of-the-art attack, PGD, and show that DIVA is only 1.7-3.6% worse at attacking the adapted model but 1.9-4.2 times more likely to go undetected by the original model under whitebox and semi-blackbox settings, compared to PGD.
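
DIVA's precise objective and threat model are specified in the paper; the PGD-style sketch below (the models, loss, and bound are illustrative assumptions) conveys only the basic idea of searching for a bounded perturbation that drives the adapted model's output away from the original model's output.

```python
# Illustrative PGD-style search for a perturbation that maximizes the output
# divergence between an original model and its edge-adapted copy
# (simplified; not DIVA's exact objective or threat model).
import torch
import torch.nn.functional as F

def divergence_attack(original, adapted, x, eps=8 / 255, alpha=2 / 255, steps=10):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        out_orig = original(x + delta)
        out_adap = adapted(x + delta)
        # Maximize disagreement between adapted and original predictions.
        loss = F.kl_div(F.log_softmax(out_adap, dim=-1),
                        F.softmax(out_orig, dim=-1), reduction="batchmean")
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()       # gradient ascent step
            delta.clamp_(-eps, eps)                  # stay within the L-inf ball
            delta.grad.zero_()
    return (x + delta).detach()

original = torch.nn.Linear(16, 4)                    # stand-ins for real models
adapted = torch.nn.Linear(16, 4)                     # e.g., a quantized/pruned copy
x_adv = divergence_attack(original, adapted, torch.rand(2, 16))
```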