G-HEMP: FAST MULTI-GPU PRIVATE INFERENCE FOR LARGE-SCALE GCNS WITH HOMOMORPHIC ENCRYPTION
Abstract
Homomorphic Encryption (HE) offers a promising solution for privacy-preserving Graph Convolutional Network (GCN) inference in untrusted cloud environments by enabling computation directly on encrypted data. This capability is particularly valuable in applications such as recommendation systems, financial analysis, and bioinformatics, where data is subject to strict privacy requirements. However, applying HE to large-scale GCN inference introduces substantial computational and memory overhead, severely limiting scalability and runtime performance. Although prior work has demonstrated promising results with CPU-based implementations, these approaches remain constrained in throughput and scalability due to redundant HE operations and high memory demands. In this work, we present G-HEMP, the first framework that leverages multi-GPU systems to accelerate large-scale private GCN inference. G-HEMP introduces two key innovations: (i) a block-diagonal parallel packing technique that eliminates redundant data replication for encrypted adjacency matrices, achieving up to 4.41× latency speedup over traditional feature-wise packing; and (ii) a multi-GPU workload partitioning strategy that reduces peak memory usage by 50% and improves inference latency by up to 1.98×. Together, these techniques significantly reduce the number of HE operations and allow the encrypted computation to be partitioned and efficiently distributed across multiple GPUs, maximizing throughput and hardware utilization. G-HEMP is model-agnostic and scales seamlessly to large GCN inference tasks. These contributions enable scalable and efficient privacy-preserving GCN inference, advancing the practicality of HE-based GCN analytics on modern heterogeneous hardware.
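The slot-packing contrast in the abstract can be illustrated in plaintext. The sketch below is only a NumPy model of ciphertext slot layouts, not real HE (an actual implementation would use a CKKS library such as OpenFHE or SEAL); all function and parameter names here are hypothetical. Feature-wise packing replicates the adjacency matrix once per feature column, while a block-diagonal layout packs several adjacency blocks into one slot vector, so fewer "ciphertexts" are needed:

```python
import numpy as np

def feature_wise_pack(adj, num_features):
    # Feature-wise packing: one copy of the adjacency matrix per feature
    # column, i.e. O(num_features) redundant replicas of the same data.
    return [adj.flatten() for _ in range(num_features)]

def block_diagonal_pack(adj, num_features, slots):
    # Block-diagonal-style packing (illustrative): concatenate several
    # adjacency blocks into a single slot vector, so each "ciphertext"
    # carries multiple blocks and no block is duplicated.
    block = adj.flatten()
    blocks_per_ct = max(1, slots // block.size)
    packed, buf = [], []
    for _ in range(num_features):
        buf.append(block)  # in the real scheme, distinct blocks go here
        if len(buf) == blocks_per_ct:
            packed.append(np.concatenate(buf))
            buf = []
    if buf:
        packed.append(np.concatenate(buf))
    return packed

adj = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 adjacency
fw = feature_wise_pack(adj, num_features=8)
bd = block_diagonal_pack(adj, num_features=8, slots=64)
print(len(fw), len(bd))  # prints "8 2": far fewer packed vectors
```

Since every HE operation applies to a whole ciphertext, shrinking the number of packed vectors in this way is what translates into fewer HE operations and lower latency.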