SysML4Health: Scalable Systems for ML-driven Analytics in Healthcare

Alexey Tumanov, Jimeng Sun, Tushar Krishna, Vivek Sarkar, Dawn Song


"This workshop focuses on the challenges involved in building integrated scalable distributed systems for the healthcare analytics domain. Healthcare analytics offers a unique opportunity to explore scalable system design since there has been a tectonic shift in the ability of medical institutions to capture and store unprecedented amounts of structured and unstructured medical data, including the new ability to stream unstructured medical data in real time. This shift has already contributed to an ecosystem of Machine Learning (ML) models being trained for a variety of clinical tasks. However, new approaches are required to build systems that can develop and deploy ML models based on distributed healthcare data that must necessarily be accessed with privacy-preserving constraints.

The goal of this workshop is to attract leading researchers to share and discuss their latest results involving approaches to building scalable platforms for privacy-aware collaborative learning and inference that can be applicable to the domain of healthcare analytics. The scope of the workshop includes (but is not limited to) the following challenges:
* Scalable and distributed learning
* Continuous federated learning with privacy constraints
* Enforcing soft real-time constraints for streaming data analytics
* Specialized heterogeneous hardware for learning and inference
* Scalable runtime and resource allocation systems
* Productive systems for developing scalable data analytics applications"



Fri 7:45 a.m. - 8:00 a.m.
Intro and Welcome (Talk)
Alexey Tumanov
Fri 8:00 a.m. - 8:30 a.m.

In the world of children's hospitals, data exist in silos. Kids go undiagnosed with brain cancer or die on the operating room table because MRI scans or echocardiograms are difficult to share. While AI, in particular deep learning, holds promise in diagnosing many conditions, there is little data to fuel training these algorithms. Silos of data were also a challenge in the world of computing until 1994, when the Internet connected 1,000,000 computing machines. As we know, that changed our consumer lives. Our moon shot mission is to connect all 1,000,000 healthcare machines in all the children's hospitals in the world, and enable software applications and data to change children's lives. This talk will highlight four key priorities in developing AI in medicine. I’ll show our scalable, high-performance global architecture, engineered for security and privacy, to develop and deploy ML models based on distributed healthcare data. Finally, I’ll discuss how the architecture addresses the four key areas.

Tim Chou
Fri 8:30 a.m. - 9:00 a.m.

ML systems continue to advance at a rapid pace and have the potential to revolutionize healthcare. Areas of medicine with large data sets have the greatest opportunity to benefit from this technology. Intensive care is one such area: it generates large, multi-modal data and may see direct impact from ML. Areas of clinical care that may be particularly suited to ML will be discussed, along with the pain points for clinicians that ML can solve. Specific conditions that can be predicted using ML systems will also be discussed.

Kevin Maher
Fri 9:00 a.m. - 9:30 a.m.

Dina Katabi is the Andrew & Erna Viterbi Professor of Electrical Engineering and Computer Science at MIT, and the Director of the MIT Center for Wireless Networks and Mobile Computing. Professor Katabi is a MacArthur Fellow and a Member of the National Academy of Engineering. She received her PhD and MS degrees from MIT in 2003 and 1999, and her Bachelor of Science from Damascus University in 1995. Her research interests span wireless and mobile systems, health IoT, and applied machine learning. She develops new technologies, algorithms, and systems that provide non-invasive health monitoring, enable smart homes, improve WiFi and cellular performance, and deliver new applications that are not feasible given today's technologies. She has received multiple prestigious awards including the ACM Prize in Computing, the ACM Grace Murray Hopper Award, two SIGCOMM Test of Time Awards, a Sloan Fellowship, the IEEE William R. Bennett prize, and multiple best paper awards. Several start-ups have been spun out of Katabi's lab.

Dina Katabi
Fri 9:30 a.m. - 10:00 a.m.
Brandon Westover, "Automated Diagnosis of Epilepsy: Challenges and Opportunities" (talk)
Brandon Westover
Fri 10:00 a.m. - 10:30 a.m.

Hallway track on gathertown! Equipment/zoom/connection testing for Systems session speakers

Fri 10:30 a.m. - 11:00 a.m.

In this talk, we address the problem of achieving "secure and resilient autonomy" in a cloud-backed distributed edge-AI environment. We call the latter paradigm "swarm-AI", with principles drawn from bio-inspired swarm intelligence. We give examples of diverse problem domains (including healthcare) where "swarm-AI" is meaningfully applicable. We point out the challenges in making such systems reliable and secure, while meeting targeted performance and energy efficiency metrics. We will present our current solution strategy and initial simulation/emulation based results. In conclusion, we will examine some of the key research challenges going forward in the topic area addressed in this talk. In particular, we will draw upon the insights gained from the evidence of "efficient resilience" manifested in robust, self-aware biological systems (in particular, human beings).

Pradip Bose
Fri 11:00 a.m. - 11:30 a.m.

This presentation will discuss the broad research and application of artificial intelligence at Oak Ridge National Laboratory. It will also discuss research for two specific applications, including scalable imaging using ORNL’s Summit supercomputer and the use of machine learning to develop robust digital twins for accelerator operations at ORNL’s Spallation Neutron Source.

David Womble
Fri 11:30 a.m. - 12:00 p.m.

Memory mapping is the standard technique for accessing external data objects as if they were in memory. With the increasing diversity of storage options, such as persistent memory, locally attached high-performance SSDs, and network storage, memory mapping presents a uniform interface for applications to access out-of-core data sets. In this talk, I will discuss two approaches to efficient memory-mapped search of genetic data in persistent memory/storage.

The UMap library provides a memory mapping interface to external data sets. As a user-level library, UMap can be easily adapted to application-specific access patterns and to storage characteristics. This flexibility is not possible with system-wide services like mmap, which are optimized for generality. UMap has been integrated into the Livermore Metagenomics Analysis Toolkit (LMAT) and improves performance by 15% over system mmap.

The second approach creates a hardware pipeline to efficiently search an in-memory key/value store; I will discuss its use to find k-mers. K-mer search is the first step in LMAT's metagenomic analysis, which collects taxonomy information associated with k-mers found in a metagenomic sample. We find that hardware acceleration can speed up k-mer lookup by 4X to 10X over software. Using an FPGA emulator, we can assess the performance impact of higher latency persistent memory on this important processing step.

Maya Gokhale
Fri 12:00 p.m. - 12:30 p.m.

Precision health can transform medicine over the next few decades. We can detect cancer several years earlier through simple blood tests, without invasive biopsies. We can tailor treatment plans based on mutations in a cancer cell. We can detect rare genetic disorders, assess disease risks, and intervene early. We can identify infectious pathogens early enough to prevent pandemics and avoid indiscriminate use of broad-spectrum antibiotics. Better drugs could be discovered by understanding the biological mechanisms of complex diseases such as Alzheimer’s. This talk will take a deeper look at computing applications that drive precision health and discuss systems challenges.

Reetu Das
Fri 12:30 p.m. - 1:00 p.m.

The AI transformation is spurring a virtuous cycle of compute which will impact not just how we do computing, but what computing can do for us. In this talk, I will discuss some of the emerging application opportunities enabled by AI, and system-level implications that lie at the heart of this intersection of traditional high-performance computing with emerging data-intensive computing.

Pradeep Dubey
Fri 1:00 p.m. - 1:30 p.m.

In this talk I will review several recent results that (I think) will help shape future development of practical MPC (secure multiparty computation) and ZK (zero-knowledge proofs). Namely, I will talk about Stacked Garbling, a new technique that allows garbled circuit (GC) conditionals to be evaluated at roughly the cost of a single branch, rather than of all branches as was previously widely believed. Combined with more efficient RAM, which I will also briefly review, this facilitates the transition of MPC and ZK from circuit to RAM-machine evaluation. I will also briefly discuss applications of MPC and ZK to ML and health.

Vlad Kolesnikov
Fri 1:30 p.m. - 1:45 p.m.
Wrap up (talk)
Alexey Tumanov
Fri 1:45 p.m. - 2:00 p.m.
Lightning poster talks (talk)
Alind Khare, Manas Sahni, Shreya Varshini, Luis E Pastrana Leones, Yanbo Xu
Fri 2:00 p.m. - 4:00 p.m.

Poster session on gathertown.