Research On Algorithms & Data Structures (ROADS) to Mega-AI Models

Workshop

Zhaozhuo Xu · Aditya Desai · Anshumali Shrivastava

Thu 8 Jun, 5:30 a.m. PDT

The current state-of-the-art on numerous machine learning (ML) benchmarks comes from training enormous neural
network models on expensive, specialized hardware with massive quantities of data. However, this route to success in
deep learning is unsustainable. Training a large transformer model in natural language processing, for instance, can
incur a higher carbon footprint than the total lifetime cost of five cars [1]. In addition, these state-of-the-art models require
immense memory and computing resources during deployment, which hinders their practical impact. To realize the
full promise and benefits of artificial intelligence, we must solve these scalability challenges prevalent in both training
and inference and design new algorithms with step-function improvements in efficiency.

This workshop aims to bring together both computer science researchers and practitioners focused on ML efficiency
to offer innovative solutions towards efficient modeling workflows grounded in principled algorithm design. We invite
papers that address the algorithmic efficiency and scalability of any component of the ML pipeline, including data
management, optimization algorithms, model training, and deployment. Topics of interest include, but are not limited
to, the following:
• Algorithms and data structures to improve the computational complexity of the forward and backward passes
within deep neural networks.
• Model compression approaches for training and inference, including pruning, quantization, parameter sharing,
etc.
• Data reduction (sketching, sampling, coresets, etc.) and active sampling approaches for faster training.
• Solutions to large-scale challenges in ML, such as large-output prediction, large-vocabulary input,
enabling longer sequence transformers, higher resolution images, wider hidden layers, etc.
• Algorithmic solutions to deployment challenges on resource-constrained devices such as edge and mobile devices.
• Data structures for accelerating model inference, reducing memory, or accelerating training.

Timezone: America/Los_Angeles

Schedule

Thu 5:30 a.m. - 5:40 a.m.	Opening Remarks
Thu 5:40 a.m. - 6:40 a.m.	Keynote #1: Prof. Michael Mitzenmacher (Harvard)
Thu 6:40 a.m. - 7:05 a.m.	Invited Talk 2: Hongyi Wang, Cuttlefish: Low-Rank Model Training without All the Tuning (Invited Talk)	Carole-Jean Wu
Thu 7:05 a.m. - 7:30 a.m.	Invited Talk: Daochen Zha, Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models (Presentation)
Thu 7:30 a.m. - 7:50 a.m.	Break (Poster Session)
Thu 7:50 a.m. - 8:30 a.m.	Keynote #2: Dr. Bilge Acun (Meta) (Presentation)
Thu 8:30 a.m. - 8:55 a.m.	Invited Talk: Trevor Gale, MegaBlocks: Efficient Sparse Training with Mixture-of-Experts (Invited Talk)
Thu 8:55 a.m. - 9:20 a.m.	Invited Talk: Daochen Zha, RSC: Accelerating Graph Neural Networks Training via Randomized Sparse Computations (Invited Talk)
Thu 9:20 a.m. - 10:20 a.m.	Lunch Break
Thu 10:20 a.m. - 11:00 a.m.	Keynote #3: Prof. Jonathan Frankle (Harvard) (Keynote)
Thu 11:00 a.m. - 11:30 a.m.	Keynote #4: Prof. Furong Huang (UMD) (Keynote)
Thu 11:30 a.m. - 12:10 p.m.	Keynote #5: Dr. Chen Luo (Amazon) (Keynote)
Thu 12:10 p.m. - 12:30 p.m.	Break
Thu 12:30 p.m. - 1:30 p.m.	Panel Discussion (Mitzenmacher, Frankle, Acun, Luo, Shrivastava)
Thu 1:30 p.m. - 2:00 p.m.	Social