Timezone: America/Los_Angeles
MON 5 APR
8:30 a.m.
Symposium:
(ends 5:00 PM)
TUE 6 APR
8 a.m.
Opening Remarks
(ends 8:15 AM)
8:20 a.m.
Invited Talk:
William Dally
(ends 9:10 AM)
9:30 a.m.
Orals 9:30-10:50
[9:30] ModularNAS: Towards Modularized and Reusable Neural Architecture Search
[9:50] Fluid: Resource-aware Hyperparameter Tuning Engine
[10:10] MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers
[10:30] Characterizing and Taming Model Instability Across Edge Devices
(ends 10:50 AM)
11:10 a.m.
Orals 11:10-12:30
[11:10] Cortex: A Compiler for Recursive Deep Learning Models
[11:30] A Deep Learning Based Cost Model for Automatic Code Optimization
[11:50] Learning Fitness Functions for Machine Programming
[12:10] CODE: Compiler-based Neuron-aware Ensemble training
(ends 12:30 PM)
1:30 p.m.
Orals 1:30-2:50
[1:30] Pufferfish: Communication-efficient Models At No Extra Cost
[1:50] In-network Aggregation for Shared Machine Learning Clusters
[2:10] Data Movement Is All You Need: A Case Study on Optimizing Transformers
[2:30] Learning on Distributed Traces for Data Center Storage Systems
(ends 2:50 PM)
3:20 p.m.
Orals 3:20-5:00
[3:20] TensorFlow Lite Micro: Embedded Machine Learning for TinyML Systems
[3:40] Scaling Distributed Training with Adaptive Summation
[4:00] PipeMare: Asynchronous Pipeline Parallel DNN Training
[4:20] Exploring the Limits of Concurrency in ML Training on Google TPUs
[4:40] TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models
(ends 5:00 PM)
WED 7 APR
8 a.m.
Invited Talk:
Jeannette Wing
(ends 8:50 AM)
9:10 a.m.
Orals 9:10-10:50
[9:10] An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems
[9:30] Adaptive Gradient Communication via Critical Learning Regime Identification
[9:50] Don't Forget to Sign the Gradients!
[10:10] Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators
[10:30] Bit Error Robustness for Energy-Efficient DNN Accelerators
(ends 10:50 AM)
10:50 a.m.
Break - Visit the Sponsor Hall
11:10 a.m.
Orals 11:10-12:30
[11:10] RL-Scope: Cross-stack Profiling for Deep Reinforcement Learning Workloads
[11:30] A Learned Performance Model for Tensor Processing Units
[11:50] Accounting for Variance in Machine Learning Benchmarks
[12:10] Larq Compute Engine: Design, Benchmark and Deploy State-of-the-Art Binarized Neural Networks
(ends 12:30 PM)
12:30 p.m.
Lunch Break / Visit the Sponsor Hall
1:30 p.m.
Orals 1:30-2:50
[1:30] IOS: Inter-Operator Scheduler for CNN Acceleration
[1:50] Value Learning for Throughput Optimization of Deep Learning Workloads
[2:10] ByzShield: An Efficient and Robust System for Distributed Training
[2:30] FirePlace: Placing Firecracker Virtual Machines with Hindsight Imitation
(ends 2:50 PM)
2:50 p.m.
Break - Visit the Sponsor Hall
3:20 p.m.
Orals 3:20-5:00
[3:20] Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference
[3:40] MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions
[4:00] VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
[4:20] Accelerate Inference of CNNs for Video Analysis While Preserving Exactness Exploiting Activation Sparsity
[4:40] sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data
(ends 5:00 PM)
THU 8 APR
8 a.m.
8:50 a.m.
Break - Visit the Sponsor Hall
9:10 a.m.
Orals 9:10-10:50
[9:10] Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick
[9:30] Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models
[9:50] A Distributed Graph-Theoretic Framework for Automatic Parallelization in Multi-core Systems
[10:10] Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More
[10:30] Scaling Polyhedral Neural Network Verification on GPUs
(ends 10:50 AM)
10:50 a.m.
Break - Visit the Sponsor Hall
11:10 a.m.
Orals 11:10-12:30
[11:10] SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection
[11:30] Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy
[11:50] Equality Saturation for Tensor Graph Superoptimization
[12:10] Doping: A technique for Extreme Compression of LSTM Models using Sparse Structured Additive Matrices
(ends 12:30 PM)
12:30 p.m.
Lunch Break / Visit the Sponsor Hall
1:30 p.m.
Orals 1:30-2:50
[1:30] Swift for TensorFlow: A portable, flexible platform for deep learning
[1:50] Amazon SageMaker Debugger: A System for Real-Time Insights into Machine Learning Model Training
[2:10] FLAML: A Fast and Lightweight AutoML Library
[2:30] To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks
(ends 2:50 PM)
2:50 p.m.
Break - Visit the Sponsor Hall
3:20 p.m.
Orals 3:20-4:40
[3:20] Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters
[3:40] Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery
[4:00] Wavelet: Efficient DNN Training with Tick-Tock Scheduling
[4:20] Pipelined Backpropagation at Scale: Training Large Models without Batches
(ends 4:40 PM)
4:40 p.m.
Closing Remarks
(ends 5:00 PM)