(46 events)
Poster
Unified Convolution Framework: A compiler-based approach to support sparse convolutions
Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning
GiPH: Generalizable Placement Learning for Adaptive Heterogeneous Computing
Be Careful with PyPI Packages: You May Unconsciously Spread Backdoor Model Weights
HyperGef: A Framework Enabling Efficient Fusion for Hypergraph Neural Network on GPUs
Subgraph Stationary Hardware-Software Inference Co-Design
GlueFL: Reconciling Client Sampling and Model Masking for Bandwidth Efficient Federated Learning
Pre-trained Neural Cost Models for Efficient Embedding Table Sharding in Deep Learning Recommendation Models
Sirius: Harvesting Whole-Program Optimization Opportunities for DNNs
Distributed Deep Learning Built on Tensor-Optimized Remote Procedure Calls
XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse
FedTree: A Federated Learning System For Trees
Building Verified Neural Networks for Computer Systems with Ouroboros
Practical Edge Kernels for Integer-Only Vision Transformers Under Post-training Quantization
On Optimizing the Communication of Model Parallelism
Exploiting Hardware Utilization and Adaptive Dataflow for Efficient Sparse Convolution in 3D Point Clouds
AutoScratch: ML-Optimized GPU Cache Management
μ-TWO: Multi-Model Training with Orchestration and Memory Optimization
ALCOP: Automatic Load-Compute Pipelining in Deep Learning Compiler for AI-GPUs
Sparsity-Aware Memory Interface Architecture using Stacked XORNet Compression for Accelerating Pruned-DNN Models
Safe Optimized Static Memory Allocation for Parallel Deep Learning
Flex: Adaptive Mixture-of-Experts at Scale
Virtual Machine Allocation with Lifetime Predictions
Hotline Profiler: Automatic Annotation and A Multi-Scale Timeline for Visualizing Time-Use in DNN Training
Renee: End-to-End Training of Extreme Classification Models
RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure
Transcending Runtime-Memory Tradeoffs in Checkpointing by being Fusion Aware
OptNNS: Optimising Transformation of Neural Network Subgraphs
Cupcake: A Compression Scheduler for Scalable Communication-Efficient Distributed Training
Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation
Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
FLINT: A Platform for Federated Learning Integration
Edge Impulse: An MLOps Platform for Tiny Machine Learning
On Noisy Evaluation in Federated Hyperparameter Tuning
Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training
ApproxCaliper: A Programmable Framework for Application-aware Neural Network Optimization
SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency
Cuttlefish: Low-rank Model Training without All The Tuning
Communication-Efficient Graph Neural Networks with Probabilistic Neighborhood Expansion Analysis and Caching
Breadth-First Pipeline Parallelism
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network
Efficiently Scaling Transformer Inference
Uniform Sparsity in Deep Neural Networks
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Reducing Activation Recomputation in Large Transformer Models
Validating Large Language Models with ReLM