Online Experimentation for Cloud Applications
The need to deliver code changes to production systems to satisfy new requirements has fueled the adoption of an agile software development practice called online experimentation. Online experimentation provides insight into the value delivered by new application versions as they are exposed to users.
To solve the online experimentation problem for web and mobile applications, practitioners use A/B tests or more advanced methods such as multi-armed bandit algorithms. These approaches compare and assess application versions online to determine the best version based on business requirements such as user engagement. However, existing techniques and their formulations do not capture the unique complexities of cloud systems.
When assessing the outcomes of releases of microservices or machine learning (ML) models in the cloud, practitioners must simultaneously consider application performance as well as business metrics. This difference arises because a cloud application's behavior is inherently volatile due to an increased likelihood of performance bugs and variability, which can degrade desired business results. For example, Amazon reported that every 100 ms of added latency cost it 1% in sales. Because of these complexities, deploying cloud applications remains more art than science when contrasted with the approaches adopted in the web and mobile domains: practitioners lack rigorous solutions for code releases, making it difficult to automatically learn and optimize for both business metrics and application performance with statistical guarantees.
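As a concrete illustration of this joint business-and-performance view (a minimal sketch, not the tutorial's actual method), the snippet below runs an epsilon-greedy bandit over two hypothetical application versions and blends a conversion metric with a latency guardrail. The version names, traffic simulation, service-level objective, and penalty weight are all assumptions made for illustration.

```python
import random

# Hypothetical per-request outcomes for two application versions; in a real
# system these would come from live traffic and telemetry, not a simulator.
def observe(version):
    if version == "v1":
        converted = random.random() < 0.10      # assumed baseline conversion rate
        latency_ms = random.gauss(120, 15)      # assumed baseline latency
    else:
        converted = random.random() < 0.12      # slightly better conversion...
        latency_ms = random.gauss(350, 40)      # ...but much worse latency
    return converted, latency_ms

def reward(converted, latency_ms, slo_ms=200.0, penalty=0.05):
    # Blend the business metric with a performance guardrail; the SLO and
    # penalty weight are illustrative choices, not recommendations.
    return float(converted) - penalty * max(0.0, latency_ms - slo_ms) / slo_ms

versions = ["v1", "v2"]
counts = {v: 0 for v in versions}
total_reward = {v: 0.0 for v in versions}
epsilon = 0.1                                   # exploration rate

for _ in range(10000):
    if random.random() < epsilon or 0 in counts.values():
        choice = random.choice(versions)        # explore
    else:
        choice = max(versions, key=lambda x: total_reward[x] / counts[x])  # exploit
    converted, latency_ms = observe(choice)
    counts[choice] += 1
    total_reward[choice] += reward(converted, latency_ms)

print({v: round(total_reward[v] / counts[v], 4) for v in versions})
```

Under these assumed numbers the latency penalty outweighs the small conversion gain, so the bandit converges on the faster version; with performance ignored, it would have picked the slower one.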
This tutorial aims to provide a new perspective for rethinking online experimentation in the cloud era. It will survey the field of online experimentation and popular existing approaches, examine their shortcomings in the cloud, and discuss key challenges and requirements for real-world solutions. Participants will get a chance to craft and run an online experiment on an open-source system designed for online experimentation of microservices and ML models deployed on the cloud.
Advances in machine learning are having exciting impacts in a variety of domains, from neuroscience to chemistry to astronomy. Machine learning has been somewhat slower to interface with more traditional engineering domains such as mechanical, civil, and chemical engineering. Nevertheless, there is huge potential for ML to accelerate every stage of the engineering design cycle: CAD, simulation, fabrication, and control. In this talk I will discuss some of the exciting work starting to happen at this interface, from deep generative modeling to differentiable physical simulation, and take a look forward at what might be possible.
Ryan Adams is a machine learning researcher and Professor of Computer Science at Princeton University. Ryan completed his Ph.D. in physics under David MacKay at the University of Cambridge, where he was a Gates Cambridge Scholar and a member of St. John's College. Following his Ph.D., Ryan spent two years as a Junior Research Fellow at the University of Toronto as part of the Canadian Institute for Advanced Research. From 2011 to 2016, he was an Assistant Professor at Harvard University in the School of Engineering and Applied Sciences. In 2015, Ryan sold the company he co-founded, Whetlab, to Twitter, and he spent three years in industry at Twitter and Google before joining the faculty at Princeton in 2018. Ryan has won paper awards at ICML, UAI, and AISTATS, and has received the DARPA Young Faculty Award and the Alfred P. Sloan Fellowship. He also co-hosted the popular Talking Machines podcast.
ASTRA-sim: Enabling SW/HW Co-Design Exploration for Distributed Deep Learning Training Platforms
Modern deep learning systems rely heavily on distributed training over customized high-performance accelerator-based hardware platforms (e.g., TPUs, GPUs) connected via high-performance interconnects (e.g., NVLink). Examples today include NVIDIA's DGX-2, Google's Cloud TPU, and Facebook's Zion. Deep Neural Network (DNN) training involves a complex interplay between the DNN model architecture, parallelization strategy, scheduling strategy, collective communication algorithm, network topology, and the accelerator endpoint.
Collective communications (e.g., all-reduce, all-to-all, reduce-scatter, all-gather) are initiated at different phases for different parallelism approaches, and they play a crucial role in overall runtime if not hidden efficiently behind compute. This problem becomes paramount as recent models for NLP (e.g., GPT-3) and recommendation (e.g., DLRM) have billions to trillions of parameters and need to be scaled across tens to thousands of accelerator nodes. As innovation in AI/ML models continues at an accelerated rate, there is a need for a comprehensive methodology to understand and navigate this complex design space to (i) architect future platforms and (ii) develop novel parallelism schemes to support efficient training of future DNN models.
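To give a sense of why collective communication can dominate runtime, here is a rough back-of-the-envelope estimate (not part of ASTRA-sim) of ring all-reduce time using a simple alpha-beta cost model; the per-step latency, link bandwidth, and payload size are illustrative assumptions.

```python
def ring_allreduce_time(num_nodes, payload_bytes, link_bandwidth_Bps, step_latency_s=5e-6):
    """Classic alpha-beta estimate for ring all-reduce:
    2*(p-1) communication steps (reduce-scatter + all-gather),
    each moving payload/p bytes per node over one link."""
    p = num_nodes
    steps = 2 * (p - 1)
    bytes_per_step = payload_bytes / p
    return steps * (step_latency_s + bytes_per_step / link_bandwidth_Bps)

# Example with assumed numbers: all-reducing 1 GB of gradients across
# 64 accelerators over 100 GB/s links.
print(ring_allreduce_time(64, 1e9, 100e9))   # roughly 0.02 s, ignoring overlap with compute
```

Estimates like this ignore topology, congestion, and scheduling, which is exactly the fidelity gap that a detailed simulator is meant to close.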
As an ongoing collaboration between Intel, Facebook, and Georgia Tech, we have been jointly developing a detailed cycle-accurate distributed training simulator called ASTRA-sim. ASTRA-sim models the co-design space described above and schedules the compute-communication interactions from distributed training over plug-and-play compute and network simulators. It enables a systematic study of bottlenecks at the software and hardware levels for scaling training, as well as end-to-end design-space exploration for running large DNN models over future training platforms. Papers detailing ASTRA-sim were presented at ISPASS 2020 and Hot Interconnects 2020. Currently, ASTRA-sim uses SCALE-sim (a Google TPU-like simulator) as its compute model and provides a suite of network models (analytical network, Garnet from gem5, and NS3) to go from simple analytical to detailed cycle-accurate simulation of large-scale training platforms. To the best of our knowledge, ASTRA-sim is the first open-source simulator for modeling future distributed training platforms.
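To make concrete the kind of question such a simulator answers, here is a deliberately simplified first-order model of one data-parallel training iteration. This is not the ASTRA-sim interface; the overlap fraction and timings are assumptions, and ASTRA-sim replaces them with detailed topology, collective, and scheduling models.

```python
def iteration_time(compute_s, allreduce_s, overlap_fraction=0.7):
    """Toy first-order model of one data-parallel iteration: a fraction of
    the gradient all-reduce is assumed to be hidden behind backprop compute,
    and only the remainder is exposed on the critical path."""
    exposed_comm = (1.0 - overlap_fraction) * allreduce_s
    return compute_s + exposed_comm

# Example with assumed numbers: 50 ms compute, 20 ms all-reduce, 70% overlap.
print(iteration_time(0.050, 0.020))   # 0.056 s per iteration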
In this tutorial, we will educate the research community about the challenges in the emerging domain of distributed training, demonstrate the capabilities of ASTRA-sim with examples, and discuss ongoing development efforts.
Training-Free Approaches for Edge AI: Challenges, Opportunities and Progress
With the explosion in Big Data, it is often forgotten that much of today's data is generated at the edge. Specifically, a major source of data is users' endpoint devices, such as phones and smart watches, that are connected to the internet, also known as the Internet of Things (IoT). Despite the huge success of deep learning (DL) in many areas (e.g., computer vision, natural language processing), the size and computational complexity of existing state-of-the-art deep models limit the deployment of DL on resource-constrained devices and its large-scale adoption in EdgeAI. Neural architecture search (NAS) techniques (a form of AutoML) have been proposed to automatically design neural architectures with reduced model sizes. The networks obtained via NAS have higher prediction accuracy and significantly fewer parameters than hand-crafted networks. However, adapting existing NAS approaches to different hardware architectures is challenging due to their intensive computation and execution time requirements.
To address such issues, in this tutorial, we focus on the newest and perhaps most promising breed of NAS for EdgeAI, namely approaches that are training-free and thus eminently suited for large-scale development. In particular, we plan to address a few relevant questions: What kind of system architectures can meet the AI algorithm requirements while maximizing prediction accuracy, inference speed, and energy efficiency? Can we use network science or deep learning theory to understand what kind of network architectures can achieve good performance without training individual models? Can we develop efficient approaches that enable co-optimization of deep neural network accuracy and performance on real hardware platforms?
Starting from these overarching ideas, in this tutorial, we will cover both algorithmic and hardware-aware aspects of training-free model design for EdgeAI, show state-of-the-art results for relevant edge applications, and illustrate potential implications on real edge devices.
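For intuition only, the sketch below shows one family of training-free ("zero-cost") proxies: scoring an untrained ReLU network by how distinctly it activates on a minibatch, in the spirit of NAS-without-training proxies. The tiny random MLP, layer widths, and scoring details are assumptions for illustration and do not correspond to any specific tool covered in the tutorial.

```python
import numpy as np

def activation_pattern_score(layer_widths, batch, rng=np.random.default_rng(0)):
    """Score a randomly initialized ReLU MLP (no training) by how well it
    separates the inputs in a minibatch via their binary activation patterns.
    Higher log-determinant of the agreement kernel suggests more expressive
    behavior at initialization; this is a simplified sketch of the idea."""
    x = batch
    codes = []
    for w_in, w_out in zip(layer_widths[:-1], layer_widths[1:]):
        W = rng.standard_normal((w_in, w_out)) / np.sqrt(w_in)   # random init
        x = x @ W
        codes.append((x > 0).astype(float))                      # binary ReLU pattern
        x = np.maximum(x, 0.0)
    c = np.concatenate(codes, axis=1)                            # one code per input
    # Kernel counting activation agreements between every pair of inputs.
    k = c @ c.T + (1.0 - c) @ (1.0 - c).T
    sign, logdet = np.linalg.slogdet(k)
    return logdet

# Example: compare two candidate widths on the same random minibatch.
batch = np.random.default_rng(1).standard_normal((32, 16))       # 32 inputs, 16 features
print(activation_pattern_score([16, 64, 64, 32], batch))
print(activation_pattern_score([16, 16, 16, 32], batch))
```

Because the score needs only a single forward pass per candidate, thousands of architectures can be ranked in minutes, which is what makes this style of proxy attractive for hardware-aware search on edge platforms.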
PRESENTERS: Radu Marculescu (The University of Texas at Austin), Ming Lin (Amazon), Atlas Wang (The University of Texas at Austin), Kartikeya Bhardwaj (ARM).