MLSys 2021 Schedule

MON 5 APR

8:30 a.m.

Symposium:

Chips and Compilers Symposium

(ends 5:00 PM)

TUE 6 APR

8 a.m.

Remarks:

Opening Remarks

(ends 8:15 AM)

8:20 a.m.

Invited Talk:

Directions for Deep Learning Hardware

William Dally

(ends 9:10 AM)

9:30 a.m.

Session 1: Search and Devices [9:30-10:50]

Orals 9:30-10:50

[9:30] ModularNAS: Towards Modularized and Reusable Neural Architecture Search

[9:50] Fluid: Resource-aware Hyperparameter Tuning Engine

[10:10] MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers

[10:30] Characterizing and Taming Model Instability Across Edge Devices

(ends 10:50 AM)

11:10 a.m.

Session 2: Compilers [11:10-12:30]

Orals 11:10-12:30

[11:10] Cortex: A Compiler for Recursive Deep Learning Models

[11:30] A Deep Learning Based Cost Model for Automatic Code Optimization

[11:50] Learning Fitness Functions for Machine Programming

[12:10] CODE: Compiler-based Neuron-aware Ensemble training

(ends 12:30 PM)

1:30 p.m.

Session 3: Communication and Storage [1:30-2:50]

Orals 1:30-2:50

[1:30] Pufferfish: Communication-efficient Models At No Extra Cost

[1:50] In-network Aggregation for Shared Machine Learning Clusters

[2:10] Data Movement Is All You Need: A Case Study on Optimizing Transformers

[2:30] Learning on Distributed Traces for Data Center Storage Systems

(ends 2:50 PM)

3:20 p.m.

Session 4: Training (I) [3:20-5:00]

Orals 3:20-5:00

[3:20] TensorFlow Lite Micro: Embedded Machine Learning for TinyML Systems

[3:40] Scaling Distributed Training with Adaptive Summation

[4:00] PipeMare: Asynchronous Pipeline Parallel DNN Training

[4:20] Exploring the Limits of Concurrency in ML Training on Google TPUs

[4:40] TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models

(ends 5:00 PM)

5 p.m.

Poster Session 1 [5:00-]

Cortex: A Compiler for Recursive Deep Learning Models

TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models

MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers

In-network Aggregation for Shared Machine Learning Clusters

Exploring the Limits of Concurrency in ML Training on Google TPUs

Learning Fitness Functions for Machine Programming

A Deep Learning Based Cost Model for Automatic Code Optimization

PipeMare: Asynchronous Pipeline Parallel DNN Training

Scaling Distributed Training with Adaptive Summation

Learning on Distributed Traces for Data Center Storage Systems

Pufferfish: Communication-efficient Models At No Extra Cost

ModularNAS: Towards Modularized and Reusable Neural Architecture Search

Fluid: Resource-aware Hyperparameter Tuning Engine

Characterizing and Taming Model Instability Across Edge Devices

Data Movement Is All You Need: A Case Study on Optimizing Transformers

TensorFlow Lite Micro: Embedded Machine Learning for TinyML Systems

CODE: Compiler-based Neuron-aware Ensemble training

(ends 6:00 PM)

WED 7 APR

8 a.m.

Invited Talk:

Trustworthy AI

Jeannette Wing

(ends 8:50 AM)

9:10 a.m.

Session 5: Gradients and Precision [9:10-10:50]

Orals 9:10-10:50

[9:10] An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems

[9:30] Adaptive Gradient Communication via Critical Learning Regime Identification

[9:50] Don't Forget to Sign the Gradients!

[10:10] Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators

[10:30] Bit Error Robustness for Energy-Efficient DNN Accelerators

(ends 10:50 AM)

10:50 a.m.

Break - Visit the Sponsor Hall

11:10 a.m.

Session 6: Benchmarks, Cost models, and Profiling [11:10-12:30]

Orals 11:10-12:30

[11:10] RL-Scope: Cross-stack Profiling for Deep Reinforcement Learning Workloads

[11:30] A Learned Performance Model for Tensor Processing Units

[11:50] Accounting for Variance in Machine Learning Benchmarks

[12:10] Larq Compute Engine: Design, Benchmark and Deploy State-of-the-Art Binarized Neural Networks

(ends 12:30 PM)

12:30 p.m.

Lunch Break / Visit the Sponsor Hall

1:30 p.m.

Session 7: Systems [1:30-2:50]

Orals 1:30-2:50

[1:30] IOS: Inter-Operator Scheduler for CNN Acceleration

[1:50] Value Learning for Throughput Optimization of Deep Learning Workloads

[2:10] ByzShield: An Efficient and Robust System for Distributed Training

[2:30] FirePlace: Placing Firecracker Virtual Machines with Hindsight Imitation

(ends 2:50 PM)

2:50 p.m.

Break - Visit the Sponsor Hall

3:20 p.m.

Session 8: Inference [3:20-5:00]

Orals 3:20-5:00

[3:20] Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference

[3:40] MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

[4:00] VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference

[4:20] Accelerate Inference of CNNs for Video Analysis While Preserving Exactness Exploiting Activation Sparsity

[4:40] sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data

(ends 5:00 PM)

5 p.m.

Poster Session 2 [5:00-]

IOS: Inter-Operator Scheduler for CNN Acceleration

Don't Forget to Sign the Gradients!

Bit Error Robustness for Energy-Efficient DNN Accelerators

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference

Adaptive Gradient Communication via Critical Learning Regime Identification

Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference

Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators

An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems

Value Learning for Throughput Optimization of Deep Learning Workloads

A Learned Performance Model for Tensor Processing Units

FirePlace: Placing Firecracker Virtual Machines with Hindsight Imitation

sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data

Larq Compute Engine: Design, Benchmark and Deploy State-of-the-Art Binarized Neural Networks

Accounting for Variance in Machine Learning Benchmarks

RL-Scope: Cross-stack Profiling for Deep Reinforcement Learning Workloads

ByzShield: An Efficient and Robust System for Distributed Training

MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

Accelerate Inference of CNNs for Video Analysis While Preserving Exactness Exploiting Activation Sparsity

(ends 6:00 PM)

THU 8 APR

8 a.m.

Invited Talk:

Machine Learning in Science: Applications, Algorithms and Architectures

Kathy Yelick

(ends 8:50 AM)

8:50 a.m.

Break - Visit the Sponsor Hall

9:10 a.m.

Session 9: Hardware [9:10-10:50]

Orals 9:10-10:50

[9:10] Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick

[9:30] Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models

[9:50] A Distributed Graph-Theoretic Framework for Automatic Parallelization in Multi-core Systems

[10:10] Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More

[10:30] Scaling Polyhedral Neural Network Verification on GPUs

(ends 10:50 AM)

10:50 a.m.

Break - Visit the Sponsor Hall

11:10 a.m.

Session 10: Techniques, and more Techniques [11:10-12:30]

Orals 11:10-12:30

[11:10] SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection

[11:30] Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy

[11:50] Equality Saturation for Tensor Graph Superoptimization

[12:10] Doping: A technique for Extreme Compression of LSTM Models using Sparse Structured Additive Matrices

(ends 12:30 PM)

12:30 p.m.

Lunch Break / Visit the Sponsor Hall

1:30 p.m.

Session 11: Tools [1:30-2:50]

Orals 1:30-2:50

[1:30] Swift for TensorFlow: A portable, flexible platform for deep learning

[1:50] Amazon SageMaker Debugger: A System for Real-Time Insights into Machine Learning Model Training

[2:10] FLAML: A Fast and Lightweight AutoML Library

[2:30] To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks

(ends 2:50 PM)

2:50 p.m.

Break - Visit the Sponsor Hall

3:20 p.m.

Session 12: Training (II) [3:20-4:40]

Orals 3:20-4:40

[3:20] Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters

[3:40] Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery

[4:00] Wavelet: Efficient DNN Training with Tick-Tock Scheduling

[4:20] Pipelined Backpropagation at Scale: Training Large Models without Batches

(ends 4:40 PM)

4:40 p.m.

Remarks:

Closing Remarks

(ends 5:00 PM)

5 p.m.

Poster Session 3 [5:00-]

Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy

Pipelined Backpropagation at Scale: Training Large Models without Batches

Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery

Scaling Polyhedral Neural Network Verification on GPUs

Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick

To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks

Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More

Swift for TensorFlow: A portable, flexible platform for deep learning

Equality Saturation for Tensor Graph Superoptimization

Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters

FLAML: A Fast and Lightweight AutoML Library

SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection

Doping: A technique for Extreme Compression of LSTM Models using Sparse Structured Additive Matrices

A Distributed Graph-Theoretic Framework for Automatic Parallelization in Multi-core Systems

Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models

Wavelet: Efficient DNN Training with Tick-Tock Scheduling

Amazon SageMaker Debugger: A System for Real-Time Insights into Machine Learning Model Training

(ends 6:00 PM)

FRI 9 APR

6:15 a.m.

Workshop:

Personalized Recommendation Systems and Algorithms

(ends 3:00 PM)

7 a.m.

Workshop:

Workshop of Graph Neural Networks and Systems (GNNSys'21)

(ends 4:00 PM)

Workshop:

2nd On-Device Intelligence Workshop

(ends 3:00 PM)

7:45 a.m.

Workshop:

SysML4Health: Scalable Systems for ML-driven Analytics in Healthcare

(ends 4:00 PM)

8 a.m.

Workshop:

Journal of Opportunities, Unexpected limitations, Retrospectives, Negative results, and Experiences

(ends 3:00 PM)

Workshop:

Benchmarking Machine Learning Workloads on Emerging Hardware

(ends 5:00 PM)