Learn the Global Prompt in the Low-Rank Tensor Space for Heterogeneous Federated Learning
Academic Background
With the increasing complexity of artificial intelligence (AI) models and the growing demand for data privacy protection, Federated Learning (FL), a distributed machine learning paradigm, has become a prominent research topic. FL allows multiple clients to collaboratively train a global model without sharing their local data, improving the model's generalization ability while preserving privacy. In practice, however, FL faces three major challenges: 1) a heavy communication burden caused by the large number of model parameters; 2) degradation of the global model on non-independent and identically distributed (Non-IID) data; and 3) the failure of traditional federated aggregation when client models are heterogeneous.
To address these challenges, this paper proposes FedGPT, a method that learns a global prompt in a low-rank tensor space. Specifically, FedGPT uses prompts rather than full model parameters as the carrier of local knowledge, sharply reducing communication volume. Tensor Singular Value Decomposition (T-SVD) is then applied to extract cross-client global information while eliminating the influence of client-specific information. Finally, because prompts are architecture-agnostic, FedGPT can handle model heterogeneity: local models with different architectures exchange knowledge through prompts, improving overall performance.
Source of the Paper
This paper is co-authored by Lele Fu, Sheng Huang, Yuecheng Li, Chuan Chen, Chuanfu Zhang, and Zibin Zheng, who are affiliated with the School of Systems Science and Engineering and the School of Computer Science and Engineering at Sun Yat-sen University. The paper was published in 2025 in the journal Neural Networks under the title Learn the Global Prompt in the Low-Rank Tensor Space for Heterogeneous Federated Learning.
Research Process
1. Research Background and Problem Definition
Federated learning aims to train a global model collaboratively across multiple clients, but in practice it faces three major challenges: communication burden, data heterogeneity, and model heterogeneity. This paper proposes FedGPT, which addresses these issues through prompt learning and low-rank tensor decomposition.
2. Combining Prompt Learning with Federated Learning
The core idea of FedGPT is to use prompts as the medium for information exchange between clients and the server. Prompts are learnable embeddings that require only a small number of parameters to adapt a pre-trained model to a new task. In the federated setting, each client receives the global prompt from the server and trains it on local data. After local training, the client uploads its local prompt to the server, which extracts global information via low-rank tensor decomposition and updates the global prompt, as sketched below.
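To make this interaction concrete, here is a minimal PyTorch sketch of one client's local step, assuming a frozen pre-trained backbone that consumes token embeddings with a learnable prompt prepended. The class name `PromptedModel`, the prompt length, and the optimizer settings are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class PromptedModel(nn.Module):
    """Frozen backbone adapted to the local task via a learnable prompt.

    Illustrative sketch: only the prompt is trained and later uploaded,
    never the backbone weights.
    """
    def __init__(self, backbone: nn.Module, embed_dim: int, prompt_len: int = 10):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False              # keep pre-trained weights fixed
        # The prompt is the only communicated quantity: (prompt_len, embed_dim).
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, embed_dim) token embeddings of the input.
        prompt = self.prompt.unsqueeze(0).expand(tokens.shape[0], -1, -1)
        return self.backbone(torch.cat([prompt, tokens], dim=1))

def local_update(model: PromptedModel, loader, epochs: int = 1, lr: float = 1e-2):
    """One client round: refine the received global prompt on local data."""
    opt = torch.optim.SGD([model.prompt], lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.prompt.detach()                 # uploaded instead of weights
```

Only `model.prompt` ever leaves the device, which is what keeps the per-round payload small.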
3. Low-Rank Tensor Decomposition
To address data heterogeneity, FedGPT stacks the prompts from different clients into a third-order tensor and performs Tensor Singular Value Decomposition (T-SVD) on it. T-SVD extracts the principal components of the tensor (i.e., the global information) while discarding redundant client-specific information. The specific steps are as follows: 1. Stack the client prompts into a third-order tensor. 2. Apply T-SVD to decompose the tensor into two orthogonal tensors and a tensor of singular values. 3. Retain the leading components of the singular-value tensor and discard the rest. 4. Generate the global prompt by weighted averaging. A sketch of this aggregation step is given below.
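The following NumPy sketch illustrates these steps under the common t-product formulation of T-SVD (slice-wise SVD in the Fourier domain along the third mode). The truncation rule and the use of per-client weights are assumptions for illustration and may differ from the paper's exact procedure.

```python
import numpy as np

def tsvd_aggregate(prompts, rank, weights=None):
    """Aggregate client prompts with truncated t-SVD (illustrative sketch).

    prompts: list of K arrays, each (L, D) -- one prompt per client.
    rank:    number of leading tensor singular components to keep.
    weights: per-client averaging weights (e.g., local data sizes).
    """
    P = np.stack(prompts, axis=2)                # (L, D, K) third-order tensor
    K = P.shape[2]
    # t-SVD works slice-wise in the Fourier domain along the third mode.
    Pf = np.fft.fft(P, axis=2)
    Lf = np.zeros_like(Pf)
    for k in range(K):                           # SVD of each frontal slice
        U, s, Vh = np.linalg.svd(Pf[:, :, k], full_matrices=False)
        s[rank:] = 0.0                           # keep principal components only
        Lf[:, :, k] = (U * s) @ Vh
    L = np.real(np.fft.ifft(Lf, axis=2))         # low-rank tensor, real domain
    if weights is None:
        weights = np.ones(K)
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    # Weighted average of the denoised slices yields the global prompt.
    return np.tensordot(L, weights, axes=([2], [0]))
```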
4. Handling Model Heterogeneity
FedGPT enables knowledge transfer between heterogeneous models through prompt learning. Because prompts contain only a small number of parameters and their shape does not depend on the backbone architecture, local models with different architectures can exchange information through prompts, overcoming the limitation that traditional federated aggregation requires identical model structures. A sketch of such a round follows.
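Reusing the `tsvd_aggregate` sketch above, a server round under model heterogeneity might look as follows. The client interface (`load_prompt`, `train_locally`, `num_samples`) is hypothetical, introduced only to show that aggregation never touches the heterogeneous backbone weights.

```python
def server_round(clients, global_prompt, rank):
    """One communication round; reuses tsvd_aggregate from the sketch above."""
    local_prompts, sizes = [], []
    for client in clients:
        client.load_prompt(global_prompt)            # backbones may all differ
        local_prompts.append(client.train_locally()) # returns an (L, D) array
        sizes.append(client.num_samples)
    # Aggregation sees only the (L, D) prompts, so it is agnostic to
    # whatever architecture produced them.
    return tsvd_aggregate(local_prompts, rank, weights=sizes)
```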
5. Experimental Design and Results
Experiments were conducted on three real-world datasets (CIFAR10, CIFAR100, and Flowers102) to validate the effectiveness of FedGPT. The results show that FedGPT performs well under both data heterogeneity and model heterogeneity, outperforming other state-of-the-art federated learning methods. Specifically: 1. Data heterogeneity: FedGPT is robust across different degrees of heterogeneity, outperforming methods such as FedAvg, FedProx, and SCAFFOLD. 2. Model heterogeneity: FedGPT effectively handles knowledge transfer between heterogeneous models, outperforming methods such as FedMD and FedProto. 3. Communication efficiency: FedGPT transmits only about 3% of FedAvg's communication volume, greatly reducing the communication burden.
Key Results
1. Data Heterogeneity Experiment Results
On the CIFAR10 dataset, when the heterogeneity parameter β is 0.3, FedGPT achieves a classification accuracy of 85.26%, significantly higher than FedAvg's 75.11%. As β increases (a larger β yields a more homogeneous partition), FedGPT's performance improves further, reaching 88.57% at β = 1. A sketch of the partitioning protocol that β controls is given below.
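The β here is the concentration parameter of the Dirichlet label-skew protocol that is standard in FL benchmarks; the paper's exact setup may differ in details. A minimal sketch of that protocol (the function name and seed handling are illustrative):

```python
import numpy as np

def dirichlet_partition(labels, num_clients, beta, seed=0):
    """Split sample indices across clients with label skew controlled by beta.

    Smaller beta -> more heterogeneous (each client sees fewer classes);
    very large beta approaches an IID split.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Proportion of class c assigned to each client ~ Dirichlet(beta).
        props = rng.dirichlet(np.full(num_clients, beta))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for cid, part in enumerate(np.split(idx, cuts)):
            client_idx[cid].extend(part.tolist())
    return client_idx
```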
2. Model Heterogeneity Experiment Results
On the CIFAR100 dataset, FedGPT achieves a classification accuracy of 66.51% in the heterogeneous model scenario, outperforming FedMD’s 64.54% and FedProto’s 62.33%.
3. Communication Efficiency
FedGPT's communication volume is only about 3% of FedAvg's, significantly reducing the communication burden. For example, on the CIFAR10 dataset, FedGPT transmits 0.31 MB while FedAvg transmits 11.46 MB; a quick check of this arithmetic follows.
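A sanity check of these figures, assuming float32 parameters; the 3M-parameter model size in the last line is a hypothetical value chosen only to illustrate the arithmetic, not a number from the paper.

```python
def payload_mib(num_params: int, bytes_per_param: int = 4) -> float:
    """Size in MiB of a parameter payload stored in float32."""
    return num_params * bytes_per_param / 2**20

# Ratio implied by the figures reported in the paper:
print(f"{0.31 / 11.46:.1%}")      # 2.7% -- consistent with 'about 3%'

# Hypothetical illustration: a 10 x 768 prompt vs. a 3M-parameter model.
print(payload_mib(10 * 768))      # ~0.03 MiB
print(payload_mib(3_000_000))     # ~11.4 MiB
```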
Conclusions and Significance
The proposed FedGPT method effectively addresses the three major challenges in federated learning—communication burden, data heterogeneity, and model heterogeneity—through prompt learning and low-rank tensor decomposition. Experimental results demonstrate that FedGPT performs exceptionally well on multiple datasets, outperforming other advanced federated learning methods. Additionally, FedGPT significantly improves communication efficiency, providing a feasible solution for practical applications.
The innovations of FedGPT are twofold: 1. It uses prompts as knowledge carriers and extracts global information through T-SVD, achieving efficient communication while mitigating the negative impact of data heterogeneity. 2. It explores prompt learning in model-heterogeneity scenarios, providing a new route for knowledge transfer between heterogeneous models.
Research Highlights
- Efficient Communication: by exchanging prompts instead of model weights, FedGPT reduces communication volume to about 3% of FedAvg's.
- Handling Data Heterogeneity: By extracting global information through low-rank tensor decomposition, FedGPT effectively addresses the challenges posed by data heterogeneity.
- Handling Model Heterogeneity: FedGPT enables knowledge transfer between heterogeneous models, providing a new solution for federated learning in model heterogeneity scenarios.
- Experimental Validation: Experimental results on multiple real-world datasets demonstrate that FedGPT performs exceptionally well in both data heterogeneity and model heterogeneity scenarios, outperforming other advanced federated learning methods.
Other Valuable Information
This paper also provides a detailed analysis of FedGPT's computational complexity and proves that it converges after a sufficient number of communication rounds. The limitations of FedGPT are also discussed: T-SVD incurs a high computational cost when processing large-scale images, which may reduce the algorithm's execution efficiency. Future research could explore how to improve the computational efficiency of T-SVD and how to better align the semantic information of different clients during prompt learning.
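As a rough reference for this complexity concern, a standard cost estimate for the t-SVD of an $n_1 \times n_2 \times n_3$ tensor (a textbook figure, not one quoted from the paper) is:

$$
\underbrace{\mathcal{O}\!\left(n_1 n_2 n_3 \log n_3\right)}_{\text{FFT along the third mode}} \;+\; \underbrace{\mathcal{O}\!\left(n_1 n_2 \min(n_1, n_2)\, n_3\right)}_{n_3 \text{ frontal-slice SVDs}}
$$

In FedGPT's setting the third mode indexes clients, so the cost grows only linearly with the number of clients but much faster with the prompt dimensions, which matches the limitation noted above for large-scale inputs.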