Session
Privacy and security
Mission B4 & B12
Accelerating ReLU for MPC-Based Private Inference with a Communication-Efficient Sign Estimation
Kiwan Maeng · G. Edward Suh
Secure multi-party computation (MPC) allows users to offload machine learning inference to untrusted servers without having to share their privacy-sensitive data. Despite its strong security properties, MPC-based private inference has not been widely adopted due to its high communication overhead, mostly incurred when evaluating non-linear layers. This paper presents HummingBird, an MPC framework that significantly reduces the ReLU communication overhead. HummingBird leverages the insight that determining whether a value is positive or negative usually does not require communicating all of its bits. With its theoretical analyses and an efficient search engine, HummingBird discards 66--72% of the bits during ReLU without altering the outcome, and discards 87--91% when some accuracy degradation is tolerable. On a realistic MPC setup, HummingBird achieves on average a 2.03--2.67$\times$ end-to-end speedup without introducing any errors, and up to 8.42$\times$ when some accuracy degradation is tolerated.
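The core idea lends itself to a short illustration. The sketch below is ours, not HummingBird's code: it additively secret-shares fixed-point values, has each party locally drop the low-order bits of its share, and measures how often the sign computed over the remaining bits matches the true sign. The ring size, value range, and number of dropped bits are assumptions chosen for the demo.

```python
# Illustrative sketch (ours, not HummingBird's code) of sign estimation from
# a subset of bits. Ring size, value range, and the number of dropped bits
# are assumptions chosen for the demo.
import numpy as np

L_BITS, DROP = 32, 16                     # assumed ring Z_{2^32}; drop 16 low bits
MOD, MOD_T = 1 << L_BITS, 1 << (L_BITS - DROP)

rng = np.random.default_rng(0)
# Pre-activation values as signed fixed-point, bounded well below 2^31.
x = rng.integers(-2**28, 2**28, size=100_000, dtype=np.int64)
x_enc = np.mod(x, MOD).astype(np.uint64)  # two's-complement encoding in the ring

# Additive secret sharing: x_enc = s0 + s1 (mod 2^32).
s0 = rng.integers(0, MOD, size=x.size, dtype=np.uint64)
s1 = np.mod(x_enc - s0, np.uint64(MOD))

# Each party locally discards the low DROP bits of its share, so the
# communication-heavy sign protocol only has to run on L_BITS - DROP bits.
est = np.mod((s0 >> DROP) + (s1 >> DROP), np.uint64(MOD_T))
est_sign = est >= MOD_T // 2              # MSB of the truncated ring
true_sign = x < 0

# Naive truncation occasionally misestimates the sign of values near zero
# (a dropped carry); HummingBird's analysis instead chooses which bits are
# safe to discard so the ReLU outcome is provably unaltered.
print(f"sign agreement after dropping {DROP}/{L_BITS} bits: "
      f"{np.mean(true_sign == est_sign):.4%}")
```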
Accurate Low-Degree Polynomial Approximation of Non-Polynomial Operators for Fast Private Inference in Homomorphic Encryption
Jingtian Dang · Jianming Tong · Anupam Golder · Cong "Callie" Hao · Arijit Raychowdhury · Tushar Krishna
As machine learning (ML) permeates fields like healthcare, facial recognition, and blockchain, the need to protect sensitive data intensifies. Fully Homomorphic Encryption (FHE) allows inference on encrypted data, preserving the privacy of both the data and the ML model. However, it is up to five orders of magnitude slower than non-secure inference, largely because non-polynomial operators (ReLU and MaxPooling) must be replaced with high-degree Polynomial Approximated Functions (PAFs). We propose SmartPAF, a framework that replaces non-polynomial operators with low-degree PAFs and then recovers the accuracy of the PAF-approximated model through four techniques: (1) Coefficient Tuning (CT) -- adjust PAF coefficients based on the input distribution before training; (2) Progressive Approximation (PA) -- progressively replace one non-polynomial operator at a time, each followed by fine-tuning; (3) Alternate Training (AT) -- alternate training between the PAFs and the other linear operators in a decoupled manner; and (4) Dynamic Scale (DS) / Static Scale (SS) -- dynamically scale PAF input values into [-1, 1] during training, then fix the scale to the running max value for FHE deployment. The synergistic effect of CT, PA, AT, and DS/SS enables SmartPAF to recover the accuracy of models approximated by PAFs of various low degrees across multiple datasets. For ResNet-18 on ImageNet-1k, the Pareto frontier identified by SmartPAF in the latency-accuracy tradeoff space achieves a 1.42--13.64$\times$ accuracy improvement and a 6.79--14.9$\times$ speedup over prior works. Further, SmartPAF enables a 14-degree PAF to achieve a 7.81$\times$ speedup over the 27-degree PAF obtained by minimax approximation at the same 69.4% post-replacement accuracy. Our code is available at https://anonymous.4open.science/r/SmartPAF-64E1
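As a rough illustration of the DS/SS idea, the sketch below is our construction, not SmartPAF's code: a plain least-squares polynomial stands in for the paper's tuned PAFs, with the degree borrowed from the abstract's 14-degree example. It scales inputs into [-1, 1] before evaluating the polynomial and rescales afterwards, exploiting the positive homogeneity of ReLU.

```python
# Hedged sketch of the DS/SS scaling idea (our construction, not SmartPAF's
# code): a least-squares polynomial fit stands in for the paper's tuned
# PAFs; the degree is borrowed from the abstract's 14-degree example.
import numpy as np

DEGREE = 14

# Fit a degree-14 polynomial to ReLU on [-1, 1].
grid = np.linspace(-1.0, 1.0, 4001)
coeffs = np.polyfit(grid, np.maximum(grid, 0.0), DEGREE)

def paf_relu(x: np.ndarray) -> np.ndarray:
    """Approximate ReLU via the PAF: scale inputs into [-1, 1], evaluate,
    and rescale. ReLU is positively homogeneous, so relu(x) = s*relu(x/s)."""
    scale = max(float(np.abs(x).max()), 1e-12)  # Dynamic Scale during training
    return scale * np.polyval(coeffs, x / scale)

x = np.random.default_rng(0).normal(size=10_000) * 3.0
err = np.abs(paf_relu(x) - np.maximum(x, 0.0))
print(f"max |error|: {err.max():.4f}, mean |error|: {err.mean():.4f}")
```

Per the abstract, the Static Scale variant freezes the data-dependent scale above to the running max observed during training, since a data-dependent max is expensive to compute under FHE.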
Proteus: Preserving Model Confidentiality during Graph Optimizations
Yubo Gao · Maryam Haghifam · Christina Giannoula · Renbo Tu · Gennady Pekhimenko · Nandita Vijaykumar
Deep learning (DL) models have revolutionized numerous domains, yet optimizing them for computational efficiency remains challenging. Developing a new DL model typically involves two parties: the model developers and the performance optimizers. The exchange between the parties often necessitates exposing the model architecture and computational graph to the optimizers. However, this exposure is undesirable since the model architecture is important intellectual property, and its innovations require significant investment and expertise. During the exchange, the model is also vulnerable to adversarial attacks via model stealing. This paper presents Proteus, a novel mechanism that enables model optimization by an independent party while preserving the confidentiality of the model architecture. Proteus obfuscates the protected model by partitioning its computational graph into subgraphs and concealing each subgraph within a large pool of generated realistic subgraphs that cannot easily be distinguished from the original. We evaluate Proteus on a range of DNNs, demonstrating its efficacy in preserving confidentiality without compromising performance optimization opportunities. Proteus effectively hides the model as one alternative among up to $10^{32}$ possible model architectures and is resilient against attacks by a learning-based adversary. We also demonstrate that heuristic-based and manual approaches are ineffective at identifying the protected model. To our knowledge, Proteus is the first work that tackles the challenge of model confidentiality during performance optimization. Proteus will be open-sourced for direct use and experimentation, with easy integration with compilers such as ONNXRuntime.
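The obfuscation scheme can be sketched in a few lines. The toy below is ours, not Proteus's implementation: the op names, subgraph size, pool size, and decoy generator are all illustrative assumptions, and a flat op sequence stands in for the full computational graph Proteus actually partitions. It shows why the adversary's search space grows multiplicatively with the number of partitions, mirroring the up-to-$10^{32}$ figure in the abstract.

```python
# Toy sketch of the obfuscation scheme (ours, not Proteus's implementation):
# op names, subgraph size, pool size, and the decoy generator are all
# illustrative assumptions; Proteus works on full computational graphs.
import math
import random

model = ["conv3x3", "bn", "relu", "conv3x3", "bn", "relu",
         "maxpool", "conv1x1", "bn", "relu", "gap", "fc"]

def partition(ops, size):
    """Cut the op sequence into fixed-size subgraphs."""
    return [ops[i:i + size] for i in range(0, len(ops), size)]

def decoys(subgraph, n, rng):
    """Generate n realistic-looking variants by swapping ops for plausible
    alternatives (a stand-in for Proteus's subgraph generator)."""
    alts = {"conv3x3": ["conv5x5", "dwconv3x3"], "relu": ["gelu", "silu"],
            "maxpool": ["avgpool"], "bn": ["groupnorm"]}
    return [[rng.choice([op] + alts.get(op, [])) for op in subgraph]
            for _ in range(n)]

rng = random.Random(0)
pools = []
for sub in partition(model, 3):
    pool = decoys(sub, 7, rng) + [sub]   # hide the real subgraph in its pool
    rng.shuffle(pool)
    pools.append(pool)

# The optimizer tunes every candidate; an adversary must guess the right
# subgraph in every pool, so the search space grows multiplicatively
# (pool_size ** num_partitions -- cf. the paper's 10^32 figure).
print("candidate architectures:", math.prod(len(p) for p in pools))
```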