Moderator: Eiko Yoneki
Hyoukjun Kwon · Krishnakumar Nair · Jamin Seo · Jason Yik · Debabrata Mohapatra · Dongyuan Zhan · Jinook Song · Peter Capak · Peizhao Zhang · Peter Vajda · Colby Banbury · Mark Mazumder · Liangzhen Lai · Ashish Sirasao · Tushar Krishna · Harshit Khaitan · Vikas Chandra · Vijay Janapa Reddi
Real-time multi-task multi-model (MTMM) workloads, a new form of deep learning inference workload, are emerging in application areas such as extended reality (XR) to support metaverse use cases. These workloads combine user interactivity with computationally complex machine learning (ML) activities. Compared to standard ML applications, they present unique difficulties and constraints. Real-time MTMM workloads impose heterogeneity and concurrency requirements on future ML systems and devices, necessitating the development of new capabilities. This paper begins with a discussion of the various characteristics of these real-time MTMM ML workloads and presents an ontology for evaluating the performance of future ML hardware for XR systems. Next, we present XRBench, a collection of MTMM ML tasks, models, and usage scenarios that execute these models in three representative ways for XR use cases: cascaded, concurrent, and cascaded-concurrent. Finally, we emphasize the need for new metrics that properly capture these requirements. We hope that our work will stimulate research and lead to the development of a new generation of ML systems for XR use cases. XRBench is available as an open-source project: https://github.com/XRBench
Vidit Jain · Jatin Prakash · Deepak Saini · Jian Jiao · Ramachandran Ramjee · Manik Varma
The goal of Extreme Multi-label Classification (XC) is to learn representations that enable mapping input texts to the most relevant subset of labels selected from an extremely large label set, potentially numbering in the hundreds of millions. Given the extreme scale, conventional wisdom holds that it is infeasible to train an XC model in an end-to-end manner. Thus, for training efficiency, several modular and sampling-based approaches to XC training have been proposed in the literature. In this paper, we identify challenges in the end-to-end training of XC models and devise novel optimizations that improve training speed by over an order of magnitude, making end-to-end XC model training practical. Furthermore, we show that our end-to-end trained model, Renee, delivers state-of-the-art accuracy on a wide variety of XC benchmark datasets. Renee code will be released publicly.
Zhongming Yu · Guohao Dai · Shang Yang · Genghan Zhang · Hengrui Zhang · Feiwen Zhu · June Yang · Jishen Zhao · Yu Wang
Hypergraph Neural Networks (HyperGNNs) are an emerging type of Graph Neural Network (GNN) that can utilize hyperedges to model high-order relationships among vertices. Current GNN frameworks fail to fuse the two message-passing steps, from vertices to hyperedges and from hyperedges to vertices, leading to high latency and redundant memory consumption. The following challenges need to be solved for efficient fusion in HyperGNNs: (1) Inefficient partition: hardware-efficient and workload-balanced partitions are required for parallel workers to process two consecutive message-passing steps after fusion. (2) Workload-agnostic format: current data formats like Compressed Sparse Row (CSR) fail to represent a two-step computation workload. (3) Heavy writing conflicts: partitioning leads to heavy writing conflicts when updating the same vertex. To enable efficient fusion for HyperGNNs, we present HyperGef. HyperGef proposes an edge-split, workload-balanced partition scheme to achieve higher efficiency and better workload balancing. To represent the workload after fusion and partition, HyperGef introduces a novel fusion workload-aware format. HyperGef also introduces a shared memory-aware grouping scheme to reduce writing conflicts. Extensive experiments demonstrate that our fused kernel outperforms the NVIDIA cuSPARSE kernel by 3.31x. By enabling efficient fusion for HyperGNNs, HyperGef achieves 2.25x to 3.99x end-to-end speedup on various HyperGNN models compared with state-of-the-art frameworks like DGL and PyG.
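To make the two message-passing steps concrete, the following is a minimal pure-Python sketch and not HyperGef's GPU kernel. It assumes mean aggregation in both steps, non-empty hyperedges, and no learned weights; the function names `two_step` and `fused` are hypothetical. The unfused version materializes a feature vector for every hyperedge, while the fused version consumes each hyperedge's aggregate immediately. The scatter in the fused version is where several hyperedges may update the same vertex, which is the source of the writing conflicts a parallel implementation has to manage.

```python
def edge_mean(members, features, dim):
    """Average the feature vectors of one hyperedge's member vertices."""
    agg = [0.0] * dim
    for v in members:
        for d in range(dim):
            agg[d] += features[v][d]
    return [x / len(members) for x in agg]  # assumes a non-empty hyperedge


def normalize(out, degree):
    """Divide each vertex's accumulated sum by its number of hyperedges."""
    return [
        [x / degree[v] for x in row] if degree[v] else row
        for v, row in enumerate(out)
    ]


def two_step(hyperedges, features):
    """Unfused: store all hyperedge features, then pass them back to vertices."""
    dim = len(features[0])
    # Step 1 (vertices -> hyperedges): an intermediate array of size |E| x dim.
    edge_feats = [edge_mean(m, features, dim) for m in hyperedges]
    # Step 2 (hyperedges -> vertices): read the intermediate array back.
    out = [[0.0] * dim for _ in features]
    degree = [0] * len(features)
    for e, members in enumerate(hyperedges):
        for v in members:
            degree[v] += 1
            for d in range(dim):
                out[v][d] += edge_feats[e][d]
    return normalize(out, degree)


def fused(hyperedges, features):
    """Fused: each hyperedge aggregate is scattered right after it is computed."""
    dim = len(features[0])
    out = [[0.0] * dim for _ in features]
    degree = [0] * len(features)
    for members in hyperedges:
        agg = edge_mean(members, features, dim)  # lives only for this iteration
        for v in members:  # vertices shared between hyperedges are written repeatedly
            degree[v] += 1
            for d in range(dim):
                out[v][d] += agg[d]
    return normalize(out, degree)


# Toy hypergraph: 4 vertices, 2 hyperedges that share vertex 2.
hyperedges = [[0, 1, 2], [2, 3]]
features = [[1.0], [2.0], [3.0], [4.0]]
print(two_step(hyperedges, features))  # [[2.0], [2.0], [2.75], [3.5]]
print(fused(hyperedges, features))     # same values, no |E| x dim intermediate
```

In this toy example vertex 2 belongs to both hyperedges, so it receives two updates. In a parallel setting those updates would need atomic operations or a conflict-reducing grouping of the kind the abstract describes.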