PrivCore: Multiplication-Activation Co-Reduction for Efficient Private Inference

An Overview of the PrivCore Framework for Efficient Private Inference in Deep Neural Networks

Background

With the rapid development of deep learning, Deep Neural Networks (DNNs) are increasingly applied in fields such as image recognition, natural language processing, and medical diagnosis. As demands for data privacy and model protection grow, performing efficient model inference while protecting user privacy has become an important research topic. Traditional privacy-preserving approaches, such as Private Inference (PI) based on Secure Multi-Party Computation (MPC), offer strong privacy guarantees, but their heavy computational and communication overhead makes them difficult to adopt widely in practice.

In recent years, researchers have attempted to reduce this overhead by optimizing network architectures. However, most existing work focuses on reducing the cost of non-linear operations (e.g., ReLU activations) while neglecting linear operations (e.g., convolutions), even though convolutions account for the majority of communication overhead in private inference. Jointly optimizing linear and non-linear operations while maintaining inference accuracy therefore remains an urgent open problem.

Source of the Paper

The paper, titled “PrivCore: Multiplication-Activation Co-Reduction for Efficient Private Inference”, is co-authored by Zhi Pang, Lina Wang, Fangchao Yu, Kai Zhao, Bo Zeng, and Shuwang Xu from the School of Cyber Science and Engineering at Wuhan University. It was published in Neural Networks (Volume 187, Article 107307) in 2025. The study proposes a framework called PrivCore that significantly improves the efficiency of private inference by jointly optimizing linear and non-linear operations.

Research Process and Details

1. Research Objectives and Framework Overview

The core objective of PrivCore is to reduce the computational and communication overhead of private inference by jointly optimizing convolution and ReLU operations while maintaining inference accuracy. The framework consists of two main stages: a linear optimization stage and a non-linear optimization stage. In the linear stage, PrivCore reduces the number of multiplications in convolutions through Winograd convolution and structured pruning. In the non-linear stage, PrivCore automatically identifies redundant ReLU activations via sensitivity analysis and replaces them with polynomial approximations, reducing the cost of non-linear operations.

2. Linear Optimization Stage: Winograd Convolution and Structured Pruning

2.1 Winograd Convolution

Winograd convolution is a fast convolution algorithm that trades multiplications for additional additions. In MPC-based private inference, additions on secret-shared values are computed locally and are essentially free, while each multiplication requires interaction, so fewer multiplications translate directly into lower communication. PrivCore first converts standard convolutions into the Winograd domain to exploit this reduction. Concretely, Winograd convolution proceeds in four steps: input transformation, weight transformation, element-wise matrix multiplication, and output transformation.
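To make the four steps concrete, here is a minimal, self-contained sketch of a single F(2×2, 3×3) Winograd tile using the standard transformation matrices from the fast-convolution literature; it is a generic illustration of the algorithm rather than code from the paper. Note how the element-wise product uses 16 multiplications where direct convolution of the same 2×2 output tile would use 36.

```python
import numpy as np

# Standard F(2x2, 3x3) Winograd transformation matrices.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float64)
G = np.array([[1,    0,   0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0,    0,   1]], dtype=np.float64)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float64)

def winograd_f2x2_3x3(d, g):
    """One Winograd tile: d is a 4x4 input tile, g is a 3x3 filter.
    Returns the 2x2 output tile of a valid convolution (cross-correlation)."""
    U = G @ g @ G.T          # weight transformation   (4x4)
    V = B_T @ d @ B_T.T      # input transformation    (4x4)
    M = U * V                # element-wise product: 16 mults vs. 36 for direct conv
    return A_T @ M @ A_T.T   # output transformation   (2x2)

# Quick check against direct convolution on a random tile.
d = np.random.randn(4, 4)
g = np.random.randn(3, 3)
direct = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)] for i in range(2)])
assert np.allclose(winograd_f2x2_3x3(d, g), direct)
```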

2.2 Structured Pruning

To further reduce multiplications, PrivCore proposes two structured pruning methods: Winograd-Aware Filter Pruning (WAFP) and Winograd-Aware Vector Pruning (WAVP). WAFP prunes filters in the spatial domain while preserving structured sparsity in the Winograd domain; WAVP prunes weight vectors directly in the Winograd domain.

  • WAFP: WAFP calculates the Winograd-aware importance score of filters and selects the least important filters for pruning. This method not only considers the importance of filters in the spatial domain but also takes into account the weight sparsity in the Winograd domain, thereby maintaining high model accuracy after pruning.

  • WAVP: WAVP groups weight vectors in the Winograd domain and prunes them based on their L2 norms. It supports two pruning modes: Cross-Kernel Pruning (WAVP-CK) and Cross-Filter Pruning (WAVP-CF). Experiments show that cross-filter pruning performs better under aggressive pruning, so the WAVP-CF mode is adopted in subsequent experiments (a sketch of this pruning criterion follows below).
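As a rough illustration of the vector-pruning idea, the sketch below transforms 3×3 filters into the Winograd domain, groups the transformed weights at each Winograd position across all output filters (a cross-filter grouping in the spirit of WAVP-CF), scores each group by its L2 norm, and zeroes out the lowest-scoring groups. The grouping axis, threshold choice, and ratio are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Winograd weight-transformation matrix for F(2x2, 3x3).
G = np.array([[1,    0,   0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0,    0,   1]])

def wavp_cf_mask(weights, prune_ratio=0.5):
    """Sketch of cross-filter (WAVP-CF-style) vector pruning.

    weights: array of shape (out_ch, in_ch, 3, 3), spatial-domain filters.
    Returns a boolean keep-mask over the Winograd-domain weights
    of shape (out_ch, in_ch, 4, 4). Grouping and threshold are illustrative.
    """
    # Weight transformation U = G W G^T, applied filter-by-filter.
    U = np.einsum('ab,ocbd,ed->ocae', G, weights, G)   # (out_ch, in_ch, 4, 4)

    # Cross-filter grouping: one vector per (input channel, Winograd position),
    # spanning all output filters; score each group by its L2 norm.
    norms = np.linalg.norm(U, axis=0)                  # (in_ch, 4, 4)

    # Keep the highest-norm groups, prune the rest.
    threshold = np.quantile(norms, prune_ratio)
    keep = norms > threshold                           # (in_ch, 4, 4)

    # Broadcast the per-group decision to every output filter.
    return np.broadcast_to(keep, U.shape)

# Example: prune half of the Winograd-domain weight vectors of a random layer.
W = np.random.randn(16, 8, 3, 3)
mask = wavp_cf_mask(W, prune_ratio=0.5)
print(mask.mean())  # roughly 0.5 of the Winograd-domain multiplications remain
```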

3. Non-linear Optimization Stage: Sensitivity Analysis and Polynomial Approximation

In the non-linear optimization stage, PrivCore identifies redundant ReLU activations via sensitivity analysis and replaces them with polynomial approximations, cutting the cost of non-linear operations.

3.1 Sensitivity Analysis

PrivCore proposes a sensitivity-based method for evaluating the importance of ReLUs. Specifically, it parameterizes each ReLU as a trainable mixed function and measures the impact of each ReLU on model performance via a sensitivity score computed from the ReLU parameters. A high sensitivity score indicates that the ReLU strongly affects model performance, while a low score suggests that it is redundant and can be removed.
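One way to picture such a parameterization is a per-ReLU trainable gate that blends ReLU with a candidate polynomial replacement; the gate's gradient magnitude then acts as a sensitivity score. The mixing function, initialization, and scoring rule below are illustrative assumptions, not the paper's exact definitions.

```python
import torch
import torch.nn as nn

class MixedActivation(nn.Module):
    """ReLU blended with a quadratic replacement via a trainable gate alpha.
    alpha near 1 -> the layer leans on ReLU (likely high sensitivity);
    alpha near 0 -> the ReLU is largely redundant."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(1.0))                # mixing parameter
        self.poly = nn.Parameter(torch.tensor([0.25, 0.5, 0.0]))    # a, b, c of a*x^2 + b*x + c

    def forward(self, x):
        a, b, c = self.poly
        poly_out = a * x**2 + b * x + c
        return self.alpha * torch.relu(x) + (1 - self.alpha) * poly_out

def sensitivity_scores(model, loss):
    """Illustrative sensitivity: magnitude of d(loss)/d(alpha) for each mixed activation."""
    alphas = [m.alpha for m in model.modules() if isinstance(m, MixedActivation)]
    grads = torch.autograd.grad(loss, alphas, retain_graph=True)
    return [g.abs().item() for g in grads]
```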

3.2 Polynomial Approximation

For ReLUs marked as redundant, PrivCore substitutes a trainable second-order polynomial. The polynomial coefficients are optimized during training to minimize the difference between the intermediate representations of the original and optimized models, so that non-linear operations are reduced without sacrificing inference accuracy.
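A minimal sketch of how such a replacement could be fitted, assuming a simple feature-distillation objective: the coefficients of the trainable quadratic are updated so that the intermediate features of the ReLU-reduced model stay close to those of the frozen original model. The `forward_with_features` method, the pseudo-label term, and the loss weighting are hypothetical choices for illustration, not the paper's training recipe.

```python
import torch
import torch.nn as nn

class TrainablePolyReLU(nn.Module):
    """Trainable second-order polynomial a*x^2 + b*x + c used in place of a pruned ReLU."""
    def __init__(self):
        super().__init__()
        self.coeffs = nn.Parameter(torch.tensor([0.25, 0.5, 0.0]))  # init near x^2/4 + x/2

    def forward(self, x):
        a, b, c = self.coeffs
        return a * x**2 + b * x + c

def distillation_step(student, teacher, x, optimizer, beta=1.0):
    """One optimization step matching intermediate features of the original (teacher)
    and ReLU-reduced (student) models; both are assumed (hypothetically) to expose a
    `forward_with_features` method returning (logits, list_of_feature_maps)."""
    with torch.no_grad():
        t_logits, t_feats = teacher.forward_with_features(x)
    s_logits, s_feats = student.forward_with_features(x)
    feat_loss = sum(nn.functional.mse_loss(s, t) for s, t in zip(s_feats, t_feats))
    loss = nn.functional.cross_entropy(s_logits, t_logits.argmax(dim=1)) + beta * feat_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```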

4. Experimental Results and Conclusions

The authors validate PrivCore with extensive experiments across multiple models and datasets. On CIFAR-100, PrivCore achieves a 2.2× reduction in communication together with a 1.8% improvement in accuracy compared to SENet (ICLR 2023); on ImageNet, it achieves a 2.0× reduction in communication at the same accuracy as CoPriv (NeurIPS 2023).

  • Communication and Latency Optimization: PrivCore significantly reduces communication and latency overhead in private inference by jointly optimizing convolution and ReLU operations. Experiments show that PrivCore achieves the best trade-off between communication, latency, and accuracy on the CIFAR-100, Tiny-ImageNet, and ImageNet datasets.

  • Model Accuracy Preservation: Despite significantly reducing communication and latency overhead, PrivCore maintains inference accuracy comparable to existing methods on multiple datasets, and in some cases, even improves it.

Research Highlights and Significance

  1. Joint Optimization of Linear and Non-linear Operations: PrivCore is the first to propose reducing computational and communication overhead in private inference by jointly optimizing convolution and ReLU operations, filling a gap in existing research.

  2. Winograd-Aware Structured Pruning: The Winograd-aware pruning method proposed by PrivCore can maintain structured sparsity in the Winograd domain, thereby reducing convolution operations while preserving model accuracy.

  3. Sensitivity Analysis and Polynomial Approximation: PrivCore automatically selects redundant ReLUs through sensitivity analysis and replaces them with polynomial approximations, further reducing the overhead of non-linear operations.

Conclusion

The PrivCore framework significantly improves the efficiency of private inference by jointly optimizing linear and non-linear operations while maintaining model inference accuracy. This research not only provides a new solution for privacy-preserving inference but also offers new insights into the efficient optimization of deep neural networks. As the demand for data privacy and model protection continues to grow, the PrivCore framework is expected to play an important role in practical applications.