MLSys
2023
Papers
FedTree: A Federated Learning System For Trees
Efficiently Scaling Transformer Inference
Transcending Runtime-Memory Tradeoffs in Checkpointing by being Fusion Aware
Be Careful with PyPI Packages: You May Unconsciously Spread Backdoor Model Weights
X-RLflow: Graph Reinforcement Learning for Neural Network Subgraphs Transformation
FLINT: A Platform for Federated Learning Integration
Cupcake: A Compression Scheduler for Scalable Communication-Efficient Distributed Training
Renee: End-to-End Training of Extreme Classification Models
SIRIUS: Harvesting Whole-Program Optimization Opportunities for DNNs
SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency
ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs
Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training
RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network
Building Verified Neural Networks for Computer Systems with Ouroboros
ApproxCaliper: A Programmable Framework for Application-aware Neural Network Optimization
Reducing Activation Recomputation in Large Transformer Models
Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation
PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
Validating Large Language Models with ReLM
Hotline Profiler: Automatic Annotation and A Multi-Scale Timeline for Visualizing Time-Use in DNN Training
Practical Edge Kernels for Integer-Only Vision Transformers Under Post-training Quantization
Breadth-First Pipeline Parallelism
GiPH: Generalizable Placement Learning for Adaptive Heterogeneous Computing
On Noisy Evaluation in Federated Hyperparameter Tuning
Safe Optimized Static Memory Allocation for Parallel Deep Learning
PyTorch RPC: Distributed Deep Learning Built on Tensor-Optimized Remote Procedure Calls
Exploiting Hardware Utilization and Adaptive Dataflow for Efficient Sparse Convolution in 3D Point Clouds
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Unified Convolution Framework: A compiler-based approach to support sparse convolutions
Cuttlefish: Low-Rank Model Training without All the Tuning
GlueFL: Reconciling Client Sampling and Model Masking for Bandwidth Efficient Federated Learning
Edge Impulse: An MLOps Platform for Tiny Machine Learning
On Optimizing the Communication of Model Parallelism
Tutel: Adaptive Mixture-of-Experts at Scale
Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning
Subgraph Stationary Hardware-Software Inference Co-Design
XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse
AutoScratch: ML-Optimized Cache Management for Inference-Oriented GPUs
Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models
Sparsity-Aware Memory Interface Architecture using Stacked XORNet Compression for Accelerating Pruned-DNN Models
HyperGef: A Framework Enabling Efficient Fusion for Hypergraph Neural Network on GPUs
μ-TWO: 3× Faster Multi-Model Training with Orchestration and Memory Optimization
Communication-Efficient Graph Neural Networks with Probabilistic Neighborhood Expansion Analysis and Caching
Virtual Machine Allocation with Lifetime Predictions
Uniform Sparsity in Deep Neural Networks