MLSys 2022 Papers

BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling

Gyro Dropout: Maximizing Ensemble Effect in Neural Network Training

Collapsible Linear Blocks for Super-Efficient Super Resolution

DietCode: Automatic Optimization for Dynamic Tensor Programs

QuadraLib: A Performant Quadratic Neural Network Library for Architecture Optimization and Design Exploration

Graphiler: Optimizing Graph Neural Networks with Message Passing Data Flow Graph

URSABench: A System for Comprehensive Benchmarking of Bayesian Deep Neural Network Models and Inference methods

Bit-serial Weight Pools: Compression and Arbitrary Precision Execution of Neural Networks on Resource Constrained Processors

A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules

MLPerf Mobile Inference Benchmark: An Industry-Standard Open-Source Machine Learning Benchmark for On-Device AI

Towards the Co-design of Neural Networks and Accelerators

Sustainable AI: Environmental Implications, Challenges and Opportunities

A Tale of Two Models: Constructing Evasive Attacks on Edge Models

Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining

SLA-Driven ML Inference Framework for Clouds with Heterogeneous Accelerators

torch.fx: Practical Program Capture and Transformation for Deep Learning in Python

Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning

Apollo: Automatic Partition-based Operator Fusion through Layer by Layer Optimization

REX: Revisiting Budgeted Training with an Improved Schedule

VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware

Learning Compressed Embeddings for On-Device Inference

Improving Model Training with Multi-fidelity Hyperparameter Evaluation

TyXe: Pyro-based Bayesian neural nets for Pytorch

HALOS: Hashing Large Output Space for Cheap Inference

ULPPACK: Fast Sub-8-bit Matrix Multiply on Commodity SIMD Hardware

LightSecAgg: a Lightweight and Versatile Design for Secure Aggregation in Federated Learning

Pathways: Asynchronous Distributed Dataflow for ML

On the Utility of Gradient Compression in Distributed Training Systems

NURD: Negative-Unlabeled Learning for Online Datacenter Straggler Prediction

Randomness in Neural Network Training: Characterizing the Impact of Tooling

Understanding GNN Computational Graph: A Coordinated Computation, IO, and Memory Perspective

FROTE: Feedback Rule-Driven Oversampling for Editing Models

Revelio: ML-Generated Debugging Queries for Finding Root Causes in Distributed Systems

GPU Semiring Primitives for Sparse Neighborhood Methods

The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding

Hydrozoa: Dynamic Hybrid-Parallel DNN Training on Serverless Containers

QuClassi: A Hybrid Deep Neural Network Architecture based on Quantum State Fidelity

SRIFTY: Swift and Thrifty Distributed Neural Network Training on the Cloud

Sequential Aggregation and Rematerialization: Distributed Full-batch Training of Graph Neural Networks on Large Graphs

TAGLETS: A System for Automatic Semi-Supervised Learning with Auxiliary Data

TorchSparse: Efficient Point Cloud Inference Engine

Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines

Random Offset Block Embedding (ROBE) for compressed embedding tables in deep learning recommendation systems

dPRO: A Generic Performance Diagnosis and Optimization Toolkit for Expediting Distributed DNN Training

Matchmaker: Data Drift Mitigation in Machine Learning for Large-Scale Systems

ML-EXray: Visibility into ML Deployment on the Edge

mmSampler: Efficient Frame Sampler for Multimodal Video Retrieval

Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance

PAPAYA: Practical, Private, and Scalable Federated Learning

Efficient Strong Scaling Through Burst Parallel Training

AccMPEG: Optimizing Video Encoding for Accurate Video Analytics