

Tutorial

Time Series Anomaly Detection: Tools, Techniques & Tricks

Patel Dhaval
Aug 29, 1:00 PM - 5:00 PM Room 203

This tutorial presents the design and implementation of a scikit-compatible system for detecting anomalies in time series data, offering a broad range of algorithms to the end user with a special focus on unsupervised/semi-supervised learning. Given an input time series, we discuss how a data scientist can construct four categories of anomaly pipelines, followed by an enrichment module that helps label the detected anomalies. The tutorial provides hands-on experience with a system deployed on IBM API Hub for developer communities, which aims to support a wide range of execution engines to meet the diverse needs of anomaly workloads, such as serverless execution for CPU-intensive work and GPUs for deep-learning model training.
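The tutorial's own pipeline categories and enrichment module are specific to the deployed IBM system; as a rough, self-contained illustration of what a scikit-compatible anomaly pipeline over a time series can look like, here is a minimal sketch using a hypothetical windowing transformer and IsolationForest (both are illustrative choices, not the tutorial's components).

```python
# Minimal sketch of a scikit-style anomaly pipeline over a univariate time series.
# The windowing transformer and IsolationForest are illustrative choices only;
# the tutorial's deployed system exposes its own pipeline categories.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest

class SlidingWindow(BaseEstimator, TransformerMixin):
    """Turn a 1-D series into overlapping windows so a point-wise detector sees local context."""
    def __init__(self, width=16):
        self.width = width
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        x = np.asarray(X).ravel()
        return np.stack([x[i:i + self.width] for i in range(len(x) - self.width + 1)])

pipeline = Pipeline([
    ("window", SlidingWindow(width=16)),
    ("scale", StandardScaler()),
    ("detect", IsolationForest(contamination=0.01, random_state=0)),
])

series = np.sin(np.linspace(0, 20 * np.pi, 2000))
series[1200:1210] += 4.0                      # inject a short anomalous burst
labels = pipeline.fit_predict(series)          # -1 marks windows flagged as anomalous
print(np.where(labels == -1)[0][:5])
```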

Tutorial

Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration

Hasan Genc
Aug 29, 1:00 PM - 5:00 PM Room 204

Please register online at this link if you would like to attend: https://forms.gle/Pd9tviuBHBno7G7F7

We present a tutorial that teaches users how to perform full-system, full-stack DNN accelerator evaluation using the Gemmini platform. Gemmini allows users to evaluate how a DNN hardware accelerator interacts with external components, like the cache hierarchy or virtual address translation scheme, to affect performance across the hardware-software-system stack.

With Gemmini, users can generate a variety of different DNN hardware accelerators, with different underlying system, SoC, and programming stack components. Users can evaluate the performance of their hardware accelerators on end-to-end workloads in a real-world system context, exposing how different system components, like the cache hierarchy, virtual address translation scheme, or operating system, impact performance in subtle but noticeable ways. Gemmini also allows users to program their applications at different “levels” of the programming stack, from high-level model compilation to low-level direct machine configuration. Overall, Gemmini enables users to explore and evaluate a variety of different DNN accelerator and system configurations, exposing how these different parameters interact to impact end-to-end performance and efficiency.
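Gemmini itself is a Chisel-based generator with its own software stack, and nothing below is its API. Purely to illustrate the kind of configuration-versus-workload question a full-stack evaluation answers, here is a back-of-the-envelope model of how a systolic array's dimensions interact with a layer's shape to determine utilization; real measurements from Gemmini capture the system-level effects this toy model ignores.

```python
# Back-of-the-envelope utilization model for a weight-stationary systolic array.
# This is NOT Gemmini's cost model; it only illustrates the kind of
# configuration-vs-workload trade-off a full-stack simulator lets you measure.
import math

def matmul_cycles(M, K, N, rows, cols):
    """Rough cycle estimate for an (M,K) x (K,N) matmul on a rows x cols array."""
    tiles = math.ceil(M / rows) * math.ceil(N / cols)
    # Each tile streams K partial sums plus a fill/drain cost of rows + cols cycles.
    return tiles * (K + rows + cols)

def utilization(M, K, N, rows, cols):
    ideal = M * K * N / (rows * cols)           # cycles if the array were always full
    return ideal / matmul_cycles(M, K, N, rows, cols)

for dim in (8, 16, 32, 64):
    # A small fully-connected layer under-utilizes large arrays; a big matmul does not.
    print(dim, round(utilization(64, 256, 10, dim, dim), 3),
               round(utilization(1024, 1024, 1024, dim, dim), 3))
```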

Gemmini has been presented previously at DAC 2021, where it won the Best Paper award, as well as at an IISWC 2021 tutorial.

Tutorial

ML-based Computer System Telemetry Analytics

Burak Aksar
Aug 30, 8:00 AM - 10:15 AM Room 204

This tutorial introduces the audience to ML-based telemetry analytics for large-scale computing systems, with the goal of improving system performance, resilience, and power efficiency. Modern large-scale computing systems (e.g., data centers and High-Performance Computing clusters) are highly parallel systems that perform numerous complex operations concurrently, and they are critical for many societal and scientific applications. These systems support high degrees of parallelism, which often leads to significant resource contention and, eventually, to performance variability and loss of efficiency. One way to assess system performance and identify the root causes of problems is to gather and inspect telemetry data. Such telemetry (from hundreds or thousands of hardware and software sensors) and log data are readily available on any computer system today. Because this data amounts to billions of data points per day, manual analysis is impractical and of limited benefit. Given these limitations, ML is emerging as a promising approach to automating performance analytics. At the same time, computer system telemetry analytics is a challenging application area with many open problems, since labeled data is scarce whereas unlabeled data can reach the scale of terabytes per day.

The goal of this tutorial is twofold. First, it provides an overview of telemetry-based analytics and shows why ML-based approaches are more promising than existing methods at identifying which applications are running on compute nodes, detecting performance and other anomalies, and diagnosing their root causes. Second, participants will experience these materials directly through hands-on activities using open-source analytics frameworks designed by the speakers' teams at Boston University and the University of Bologna. By the end of the tutorial, participants will have a better understanding of the challenges and opportunities in this area and will have gained the skills needed to employ ML-based frameworks for solving complex problems in computer systems.
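The speakers' open-source frameworks define their own interfaces and models; the sketch below only illustrates the general pattern such tools automate, namely summarizing per-node telemetry windows into statistical features and fitting a classifier on the small labeled subset that is typically available (all data and feature choices here are synthetic placeholders).

```python
# Illustrative pattern only: summarize per-node telemetry windows into statistical
# features and fit a classifier on the few labeled examples that are available.
# The speakers' open-source frameworks provide their own, much richer pipelines.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def featurize(window):
    """Collapse a (timesteps, sensors) telemetry window into simple per-sensor statistics."""
    return np.concatenate([window.mean(0), window.std(0), window.max(0) - window.min(0)])

# Synthetic stand-in for labeled windows: label 1 marks a contention-like anomaly.
healthy = [rng.normal(0.0, 1.0, size=(60, 8)) for _ in range(200)]
anomalous = [rng.normal(0.0, 1.0, size=(60, 8)) + rng.uniform(1.5, 3.0) for _ in range(40)]
X = np.stack([featurize(w) for w in healthy + anomalous])
y = np.array([0] * len(healthy) + [1] * len(anomalous))

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```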

Tutorial

Sparsity in ML: Understanding and Optimizing Sparsity in Neural Networks Running on Heterogeneous Systems

Mert Hidayetoglu · Jinjun Xiong · Wen-Mei Hwu · Rakesh Nagi · Vikram Sharma Mailthody · Jeff Pool · Sitao Huang
Aug 30, 1:00 PM - 5:00 PM Room 204

A plethora of ML models are either sparsified (such as deep neural networks) to save memory footprint and FLOPs or are inherently sparse due to their unstructured nature (such as graph neural networks). Nevertheless, even though sparsity is desirable in theory, it often hampers performance in practice because existing heterogeneous systems (such as GPUs and FPGAs) fall short on irregular computations. For example, because GPU architectures are optimized for regular, dense computations, only a tiny portion of the theoretical GPU performance is realized when performing sparse computation. In this tutorial, we discuss the sources of sparsity in deep neural networks as well as key techniques for mapping sparse computation onto heterogeneous systems to support high-performance inference and training. We will conclude the tutorial with a discussion of future work on model parallelism for optimizing sparse communication in large-scale sparse ML models.
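As a tiny, framework-agnostic illustration of the gap the tutorial targets, the sketch below compares dense and CSR sparse matrix multiplication on the CPU with SciPy: pruning 90% of the weights removes 90% of the FLOPs, yet the measured speedup is usually far smaller and the sparse kernel can even be slower, which is exactly the irregular-computation problem the GPU/FPGA mapping techniques address. (The library, density, and sizes are illustrative choices.)

```python
# CPU-side illustration of the sparsity gap: 90% of the FLOPs disappear,
# but the measured speedup of the sparse kernel is typically far smaller than 10x,
# and the sparse kernel can even be slower, because irregular memory access
# erodes the theoretical gain.
import time
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n, density = 2048, 0.1

dense_w = rng.standard_normal((n, n)) * (rng.random((n, n)) < density)   # 90% zeros
sparse_w = sparse.csr_matrix(dense_w)
x = rng.standard_normal((n, 256))

t0 = time.perf_counter(); dense_out = dense_w @ x; t_dense = time.perf_counter() - t0
t0 = time.perf_counter(); sparse_out = sparse_w @ x; t_sparse = time.perf_counter() - t0

assert np.allclose(dense_out, sparse_out)
print(f"dense {t_dense:.3f}s  sparse {t_sparse:.3f}s  "
      f"speedup {t_dense / t_sparse:.2f}x vs. 10x fewer FLOPs")
```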

Tutorial

Online Experimentation for Cloud Applications

Mert Toslali
Aug 31, 8:00 AM - 10:15 AM Room 203

The need to deliver code changes to production systems to satisfy new requirements has fueled the adoption of an agile software development practice called online experimentation. Online experimentation provides insight into the value delivered by new application versions as they are exposed to users.

To solve the online experimentation problem for web and mobile applications, practitioners use A/B tests or more advanced methods such as multi-armed bandit algorithms. These approaches compare and assess application versions online to determine the best version based on business requirements such as user engagement. However, existing techniques and their formulations do not capture the unique complexities of cloud systems.

When assessing the outcomes of releases of microservices or machine learning (ML) models in the cloud, practitioners must consider application performance alongside business metrics. This difference arises because a cloud application’s behavior is inherently volatile, with an increased likelihood of performance bugs or variability that can degrade the desired business results. For example, Amazon reported that every 100 ms of latency costs them 1% in sales. As a result of these complexities, the deployment of cloud applications remains more art than science when contrasted with the approaches adopted in the web and mobile domains: practitioners lack rigorous solutions for code releases that would let them automatically learn and optimize for both business metrics and application performance with statistical guarantees.
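For readers who have not used bandit-style experimentation before, the sketch below shows a plain Thompson-sampling bandit over two application versions, with latency-SLO violations crudely folded into the reward. The reward shaping, metrics, and numbers are illustrative assumptions only; the tutorial presents a more principled treatment for cloud applications.

```python
# Plain Thompson-sampling A/B bandit over two versions, with a crude latency
# penalty bolted on. The reward shaping here is an illustrative assumption;
# the tutorial presents a more principled treatment for cloud applications.
import numpy as np

rng = np.random.default_rng(0)
true_conversion = [0.10, 0.12]       # version B converts better...
true_mean_latency = [180.0, 260.0]   # ...but is slower (ms), hurting business outcomes
latency_slo = 200.0

alpha, beta = np.ones(2), np.ones(2)                        # Beta posteriors per version
for _ in range(5000):
    arm = int(np.argmax(rng.beta(alpha, beta)))             # sample and pick a version
    converted = rng.random() < true_conversion[arm]
    latency = rng.gamma(shape=4.0, scale=true_mean_latency[arm] / 4.0)
    reward = converted and latency <= latency_slo            # penalize SLO violations
    alpha[arm] += reward
    beta[arm] += 1 - reward

print("posterior means:", alpha / (alpha + beta))             # which version wins overall
```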

This tutorial aims to provide a new perspective to rethink online experimentation in the cloud era. The tutorial will study the field of online experimentation and popular existing approaches, address their shortcomings in the cloud, and discuss key challenges and requirements for real-world solutions. Participants will get a chance to craft and run an online experiment on an open-source system designed for online experimentation of microservices and ML models deployed on the cloud.

Tutorial

ASTRA-sim: Enabling SW/HW Co-Design Exploration for Distributed Deep Learning Training Platforms

Tushar Krishna
Aug 31, 1:00 PM - 5:00 PM Room 203

Modern Deep Learning systems rely heavily on distributed training over customized high-performance accelerator-based hardware platforms (e.g., TPUs, GPUs) connected via high-performance interconnects (e.g., NVLink). Examples today include NVIDIA’s DGX-2, Google’s Cloud TPU, and Facebook’s Zion. Deep Neural Network (DNN) training involves a complex interplay between the DNN model architecture, parallelization strategy, scheduling strategy, collective communication algorithm, network topology, and the accelerator endpoint.

Collective communications (e.g., all-reduce, all-to-all, reduce-scatter, all-gather) are initiated at different phases for different parallelism approaches, and they play a crucial role in overall runtime if not hidden efficiently behind compute. This problem becomes paramount as recent models for NLP (such as GPT-3) and recommendation (such as DLRM) have billions to trillions of parameters and need to be scaled across tens, hundreds, or thousands of accelerator nodes. As innovation in AI/ML models continues at an accelerated rate, there is a need for a comprehensive methodology to understand and navigate this complex design space in order to (i) architect future platforms and (ii) develop novel parallelism schemes that support efficient training of future DNN models.
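To make the cost of these collectives concrete, the sketch below uses the standard alpha-beta estimate for ring all-reduce time. This closed-form model is a common first-order approximation, not ASTRA-sim's methodology; ASTRA-sim replaces it with cycle-level compute and network simulation. The gradient size, link bandwidth, and latency are illustrative numbers.

```python
# Standard alpha-beta estimate for ring all-reduce time: 2(p-1) latency hops plus
# 2(p-1)/p of the gradient volume over the link bandwidth. ASTRA-sim replaces this
# kind of closed-form estimate with cycle-level compute and network simulation.
def ring_allreduce_seconds(num_nodes, grad_bytes, link_gbps, link_latency_us=5.0):
    p = num_nodes
    alpha = link_latency_us * 1e-6               # per-hop latency in seconds
    beta = 1.0 / (link_gbps * 1e9 / 8)           # seconds per byte
    return 2 * (p - 1) * alpha + 2 * (p - 1) / p * grad_bytes * beta

grad_bytes = 1.3e9 * 2        # ~1.3B-parameter model in fp16 (illustrative size)
for nodes in (8, 64, 512):
    t = ring_allreduce_seconds(nodes, grad_bytes, link_gbps=300)
    print(f"{nodes:4d} nodes: {t:.3f} s per all-reduce")
```

Note how the bandwidth term approaches a constant as node count grows, which is why hiding these collectives behind compute, or choosing a different parallelism strategy, becomes the central design question.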

As an ongoing collaboration between Intel, Facebook, and Georgia Tech, we have been jointly developing a detailed cycle-accurate distributed training simulator called ASTRA-sim. ASTRA-sim models the co-design space described above and schedules the compute-communication interactions from distributed training over plug-and-play compute and network simulators. It enables a systematic study of bottlenecks at the software and hardware level for scaling training. It also enables end-to-end design-space exploration for running large DNN models over future training platforms. Papers detailing ASTRA-sim were presented at ISPASS 2020 and Hot Interconnects 2020. Currently, ASTRA-sim uses SCALE-sim (a Google-TPU-like simulator) as its compute model and provides a suite of network models (analytical network, Garnet from gem5, and NS3) to go from simple analytical to detailed cycle-accurate simulation of large-scale training platforms. To the best of our knowledge, ASTRA-sim is the first open-source simulator for modeling future distributed training platforms.

In this tutorial, we will educate the research community about the challenges in the emerging domain of distributed training, demonstrate the capabilities of ASTRA-sim with examples and discuss ongoing development efforts.

Tutorial

Training-Free Approaches for Edge AI: Challenges, Opportunities and Progress

Radu Marculescu · Ming Lin · Atlas Wang · Kartikeya Bhardwaj
Aug 31, 1:00 PM - 5:00 PM Room 204

With the explosion in Big Data, it is often forgotten that much of the data nowadays is generated at the edge. Specifically, a major source of data is users’ endpoint devices, such as phones and smart watches, that are connected to the internet, also known as the Internet of Things (IoT). Despite the huge success of deep learning (DL) in many areas (e.g., computer vision and natural language processing), the size and computational complexity of existing state-of-the-art deep models limit the deployment of DL on resource-constrained devices and its large-scale adoption in EdgeAI. Neural architecture search (NAS) (also called AutoML) techniques have been proposed to automatically design neural architectures with reduced model sizes. The networks obtained via NAS have higher prediction accuracy and significantly fewer parameters than hand-crafted networks. However, adapting existing NAS approaches to different hardware architectures is challenging due to their intensive computation and execution time requirements.

To address such issues, in this tutorial, we focus on the newest and perhaps the most promising breed of NAS for EdgeAI, namely approaches that are training-free and thus eminently suited for large-scale development. In particular, we plan to address a few relevant questions: What kind of system architectures can meet the AI algorithm requirements, while maximizing the prediction accuracy, inference speed as well as energy efficiency? Can we use network science or deep learning theory to understand what kind of network architectures can achieve good performance without training individual models? Can we develop efficient approaches that enable co-optimization of deep neural network accuracy and performance on real hardware platforms?
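One common flavor of training-free NAS scores candidate architectures at initialization with cheap proxies instead of training them. The sketch below ranks random MLP candidates by how well their ReLU activation patterns separate a single minibatch of inputs, a deliberately simplified stand-in for published zero-cost proxies such as NASWOT, not the presenters' specific method; the architectures, data, and scoring details are all illustrative.

```python
# Simplified training-free scoring of random MLP candidates: at random initialization,
# push one minibatch through each network and score it by how distinguishable the
# ReLU activation patterns of different inputs are (a simplified stand-in for
# published zero-cost proxies such as NASWOT; not the presenters' specific method).
import numpy as np

rng = np.random.default_rng(0)
batch = rng.standard_normal((64, 32))                     # one unlabeled minibatch

def activation_pattern_score(widths, x):
    """Higher score = inputs produce more dissimilar ReLU on/off patterns."""
    codes = []
    h = x
    for w in widths:
        weight = rng.standard_normal((h.shape[1], w)) / np.sqrt(h.shape[1])
        h = np.maximum(h @ weight, 0.0)                   # ReLU layer at init, no training
        codes.append(h > 0)
    c = np.concatenate(codes, axis=1).astype(float)       # binary activation code per input
    hamming = c @ (1 - c).T + (1 - c) @ c.T               # pairwise disagreement counts
    k = c.shape[1] - hamming                              # agreement (Gram) matrix
    sign, logdet = np.linalg.slogdet(k + 1e-3 * np.eye(len(x)))
    return logdet

candidates = [tuple(rng.integers(16, 256, size=rng.integers(2, 5))) for _ in range(10)]
ranked = sorted(candidates, key=lambda widths: activation_pattern_score(widths, batch),
                reverse=True)
print("best candidate widths:", ranked[0])
```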

Starting from these overarching ideas, in this tutorial, we will cover both algorithmic and hardware-aware aspects of training-free model design for EdgeAI, show state-of-the-art results for relevant edge applications, and illustrate potential implications on real edge devices.

PRESENTERS: Radu Marculescu (The University of Texas at Austin), Ming Lin (Amazon), Atlas Wang (The University of Texas at Austin), Kartikeya Bhardwaj (ARM).

