Tom St John · Murali Emani · Wenqian Dong

[ Room 241 ]

With evolving system architectures, hardware and software stacks, diverse machine learning workloads,
and data, it is important to understand how these components interact with each other. Well-defined
benchmarking procedures help evaluate and reason about the performance gains of ML workloads when mapped to systems.
Key problems that we seek to address are: (i) which representative ML benchmarks cater to workloads
seen in industry, national labs, and interdisciplinary sciences; (ii) how to characterize the ML workloads
based on their interaction with hardware; (iii) what novel aspects of hardware, such as heterogeneity in
compute, memory, and bandwidth, will drive their adoption; (iv) performance modeling and projections
to next-generation hardware.
The workshop will invite experts in these research areas to present recent work and potential directions
to pursue. Accepted papers from a rigorous evaluation process will present state-of-the-art research
efforts. A panel discussion will provide an interactive platform for discussion between speakers and the audience.

Deniz Altınbüken · Lyric Doshi · Milad Hashemi · Martin Maas

[ Room 238 ]

Using ML to improve computer systems has seen a significant amount of work in both
academia and industry. However, deployed uses of such techniques remain rare. While many
published works in this space focus on solving the underlying learning problems, we observed
from an industry vantage point that some of the biggest challenges of deploying ML for Systems
in practice come from non-ML systems aspects, such as feature stability, reliability, availability,
ML integration into rollout processes, verification, safety guarantees, feedback loops introduced
by learning, debuggability, and explainability.
Building on the success of the first iteration of PACMI at MLSys ’22, the goal of this workshop is
to raise awareness of these problems and bring together practitioners (both on the production
systems and ML side) and academic researchers, to work towards a methodology of capturing
these problems in academic research. We believe that starting this conversation between the
academic and industrial research communities will facilitate the adoption of ML for Systems
research in production systems, and will provide the academic community with access to new
research problems that exist in real-world deployments but have seen less attention in the
academic community.
The workshop will uniquely facilitate this conversation by providing a …

Zhaozhuo Xu · Aditya Desai · Anshumali Shrivastava

[ Room 239 ]

The current state-of-the-art on numerous machine learning (ML) benchmarks comes from training enormous neural
network models on expensive, specialized hardware with massive quantities of data. However, this route to success in
deep learning is unsustainable. Training a large transformer model in natural language processing, for instance, can
incur a higher carbon footprint than the total lifetime cost of five cars [1]. In addition, these state-of-the-art models require
immense memory and computing resources during deployment, which hinders their practical impact. To realize the
full promise and benefits of artificial intelligence, we must solve these scalability challenges prevalent in both training
and inference and design new algorithms with step-function improvements in efficiency.
This workshop aims to bring together both computer science researchers and practitioners focused on ML efficiency
to offer innovative solutions towards efficient modeling workflows grounded in principled algorithm design. We invite
papers that address the algorithmic efficiency and scalability of any component of the ML pipeline, including data
management, optimization algorithms, model training, and deployment. Topics of interest include, but are not limited
to, the following:
• Algorithms and data structures to improve the computational complexity of the forward and backward passes
within deep neural networks.
• Model compression approaches …

Binhang Yuan · Beidi Chen · Virginia Smith · Ce Zhang · Christopher Re

[ Room 247 ]

Machine learning models, especially large language models such as GPT-3 and generative models for image
synthesis tasks such as Stable Diffusion, are today trained primarily in centralized data centers, with thousands of
GPUs running for weeks, if not months. Inference for these models is also not cheap: given their
staggering size, they are often served on expensive cutting-edge GPUs hosted in a centralized data
center. Such a centralized paradigm is not only expensive but also greatly limits accessibility for the rest of the
research community. Inspired by the great success of volunteer computing and federated learning projects such as
SETI@Home, Folding@Home, and FedML, making machine learning decentralized and collaborative can be a
promising alternative to this centralized paradigm. If we could exploit globally geo-distributed GPUs/edge devices
that are under-utilized, we would share one of the most powerful “supercomputers” in the world and potentially use
them for the next generation of open models!
In recent years, there has been significant progress in decentralized and collaborative learning. This includes new
theoretical and algorithmic developments (e.g., [1, 2, 3, 4]), and practical deployments including Training
Transformer Together [5] and Petals [6]. Together with recent advancements in …

Dimitris Stripelis · Chaoyang He · Hongyi Wang · Tian Li · Praneeth Vepakomma · Bo Li · Eric Xing

[ Room 242 ]

We organize this workshop to spur further research in the intersection of federated learning algorithmic optimization, model and data privacy & security, and federated learning systems efficiency and scalability.

Topics of interest include, but are not limited to:
• Challenges of FL systems deployment.
• Design and development of scalable FL systems.
• FL systems automation.
• FL systems in real-world, practical and production settings.
• FL systems with federated data management awareness.
• FL systems tailored for different learning applications, such as medical, finance, and manufacturing.
• FL systems tailored for different learning topologies, such as centralized, decentralized, and hierarchical.
• FL systems tailored for different data partitioning schemes, such as horizontal, vertical, and hybrid.
• FL systems with self-tuning capabilities.
• FL systems with failover capabilities.
• FL systems benchmark and evaluation.
• Data value and economics of data federations and FL systems.
• Auditable FL systems.
• Explainable FL systems.
• Interpretable FL systems.
• FL systems open challenges and vision perspectives.
• Incentives for forming large-scale federations across organizations.
• Operational challenges in FL systems.
• Resilient and robust FL systems.
• Standardization of FL systems.
• Trade-offs between FL systems privacy, security, and efficiency.
• …

Vijay Janapa Reddi · Paul Whatmough · Vikas Chandra · Pete Warden · Brian Plancher · Colby Banbury · Matthew Stewart

[ Room 248 ]

Ubiquitous on-device artificial intelligence (AI) is the next step in transforming the myriad of
mobile computing devices in our everyday lives into a new class of truly “smart” devices capable
of constantly observing, learning, and adapting to their environment. Through advances in AI
technology, these intelligent devices will provide proactive assistance, enable new
applications, and make our lives safer and the world around us more energy efficient.

Present-day AI features, such as voice-based user interfaces on smartphones, often rely on a
connection to the cloud. In contrast, on-device AI promises to increase the energy efficiency,
privacy, responsiveness, and autonomy of embedded and edge devices by severing their tether
to the cloud. The 3rd on-device intelligence workshop aims to advance the state-of-the-art by
bringing together researchers and practitioners to discuss the key problems, disseminate new
research results, and provide practical tutorial material. Due to the multidisciplinary nature of
on-device AI, collaboration across the traditional computing stack is crucial.

We aim to bring together experts to discuss solutions to the following key challenges:
(1) How do we design, train and optimize ML models tailored to fit a plethora of edge
devices with constrained compute, storage and energy budgets?
(2) …

Navid NaderiAlizadeh · M. Hadi Amini · Virginia Smith · Ahmed Alkhateeb · Ravikumar Balakrishnan · Arash Behboodi · Jakob Hoydis · Christoph Studer

[ Room 246 ]

This workshop seeks to bring ML and wireless networking experts together to identify interdisciplinary approaches to evolve ML algorithms for and over communication networks that operate under constrained resources, including time, labeling, and computational capacity constraints. The workshop will provide a unique opportunity to expose the MLSys community to the challenges and opportunities of integrating ML methods into resource-constrained communication networks. It will also highlight emerging trends in ML with limited resources and their implications for the design and operation of next-generation communication networks.

We are seeking original submissions in topics including, but not limited to:

- Learning in wireless networks with limited training data
- Multi-agent federated/distributed learning with low computational and communication resources
- Communicated data compression for network-wide task completion
- Online learning with wireless latency constraints
- Learning in wireless networks with privacy constraints
- Few-shot learning and adaptation in wireless environments
- Datasets and benchmarks for resource-constrained learning in wireless networks

Jason Yik · Brian Anderson · Charlotte Frenkel · Vijay Janapa Reddi · Zergham Ahmed

[ Room 240 ]

The workshop begins at 9:50am in Room 240. Please see our schedule on our website.

Deep learning methods have made great strides in machine intelligence over the past few
years, but they are now struggling to keep pace with growing volumes of data and demands for
compute. As traditional system architectures approach their physical limits, the problem of
compute scalability is worsening, which makes it hard to predict how far AI methods and
systems can go in the future. These issues raise the question: what are alternative directions for
the next generation of AI methods and the systems that will run them?

Processing domains like analog, asynchronous, event-based, probabilistic, neuromorphic,
photonic, and quantum computing have all shown promise for faster, more efficient AI with new
capabilities through a complete shift in the way AI systems work.

The goal of this workshop is to kick off discussions about next-generation systems and methods that will help AI move forward, specifically through a realistic assessment of how these exotic emerging approaches for next-generation AI are making progress toward practical relevance and in what timeframes.

We want to help both experts and non-experts, believers and doubters, by achieving the
following goals:
(1) Educate …