Federated Local Causal Structure Learning Algorithm
Intersection of Data Privacy and Causal Learning: Breakthrough in Federated Local Causal Structure Learning
With the rapid development of big data and artificial intelligence, analyzing and inferring causal relationships while ensuring data privacy in sensitive fields such as healthcare and finance has become a key challenge for academia and industry. The paper, “Federated Local Causal Structure Learning”, addresses this critical topic by introducing the FedLCS algorithm, designed to learn local causal structures in a federated learning environment. This research innovatively tackles the problem of causal inference while preserving data privacy, with broad practical applications in fields such as medicine and economics.
Research Background and Problem Definition
Causal Structure Learning (CSL) infers causal relationships between variables from observational data and typically represents them as a Directed Acyclic Graph (DAG). In many practical scenarios, researchers do not need to learn the complete causal network and instead focus on the causal relationships around a specific target variable, namely its direct causes and direct effects. This is referred to as Local Causal Structure Learning (LCS). Compared with constructing a global causal graph, LCS avoids the cost and complexity of building large graph models, making it especially effective in scenarios with limited data or high-dimensional variables.
However, traditional LCS methods usually require aggregating multiple datasets in a centralized location or sharing data directly between organizations. As data privacy concerns grow, this requirement has become unacceptable. For instance, hospitals can rarely share patients’ electronic medical records with one another, which limits medical data analysis across organizations. The paper addresses this dilemma by showing how local causal structures can be learned in a federated learning framework while preserving data privacy.
Paper Origin and Publication Information
This research paper, authored by Kui Yu, Chen Rong, and others, comes from the School of Computer and Information at Hefei University of Technology and the School of Computer and Information Technology at Shanxi University. The paper was submitted in October 2023 and was published online on January 16, 2025, in the journal Science China Information Sciences.
Research Methodology and Workflow
The proposed FedLCS algorithm includes three key subroutines: Federated Local Skeleton Learning (FLSKE), Federated Local Skeleton Orientation (FLSORI), and Federated Local Extension-and-Backtracking Orientation (FLEORI), forming a complete causal inference framework.
1. Federated Local Skeleton Learning (FLSKE)
The FLSKE subroutine employs an innovative layer-wise federated learning strategy to learn the local skeleton (an undirected graph depicting relationships) of a target variable while preserving privacy. The main steps are as follows:
Phase 1: Initial Learning on Clients
Each client independently performs skeleton learning on its local dataset. The initial skeleton comprises undirected edges between the target variable and all candidate variables.
Phase 2: Parameter Sharing and Aggregation
Clients send their learned local skeletons to a central server. The server aggregates these skeletons using a voting mechanism, retaining edges whose vote share exceeds a predefined threshold, and sends the aggregated result back to the clients.
Phase 3: Iterative Learning
The aggregated skeleton is used as the starting skeleton for the next layer of learning, and clients repeat the process. This cycle continues until the skeleton stabilizes or the number of candidate variables becomes smaller than the layer number.
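The three phases above can be sketched as a short loop. This is a minimal illustration, not the authors' implementation: the function names `aggregate_skeletons` and `federated_local_skeleton`, the `local_learn` callback, and the 0.5 threshold are all assumptions made for the example, and the client-side conditional independence testing is replaced by a stub.

```python
from collections import Counter


def aggregate_skeletons(client_skeletons, threshold=0.5):
    """Server-side vote: keep a neighbour if more than `threshold`
    of the clients reported it (the threshold value is an assumption)."""
    if not client_skeletons:
        return set()
    votes = Counter()
    for neighbours in client_skeletons:
        votes.update(set(neighbours))
    n_clients = len(client_skeletons)
    return {v for v, count in votes.items() if count / n_clients > threshold}


def federated_local_skeleton(clients, candidates, local_learn, threshold=0.5):
    """Layer-wise loop: clients refine the current skeleton, the server votes,
    and the result seeds the next layer.

    `local_learn(client, current, layer)` is a hypothetical callback that
    returns the neighbours a client keeps at the given layer.
    """
    current = set(candidates)
    layer = 0
    # Stop once fewer candidates remain than the current layer number.
    while len(current) >= layer:
        proposals = [local_learn(client, current, layer) for client in clients]
        # Intersect with `current` so the skeleton can only shrink.
        aggregated = aggregate_skeletons(proposals, threshold) & current
        if aggregated == current:  # skeleton has stabilized
            break
        current = aggregated
        layer += 1
    return current


# Stub: each "client" is simply the set of neighbours its local tests retain.
clients = [{"A", "B", "C"}, {"A", "B"}, {"A", "D"}]
skeleton = federated_local_skeleton(
    clients,
    candidates={"A", "B", "C", "D"},
    local_learn=lambda client, current, layer: client & current,
)
print(sorted(skeleton))  # ['A', 'B']
```

Only the neighbour sets cross the client boundary in this sketch; the raw samples never leave the clients, which is the property the voting scheme relies on.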
2. Federated Local Skeleton Orientation (FLSORI)
After learning the skeleton, FLSORI leverages V-structures and applies the Meek rules to orient the undirected edges. Two strategies address the main challenges of this step:
Extension of V-structure Information
FLSORI extends the skeleton by incorporating parent-child nodes of the candidate variables and relevant separation sets, enhancing the model’s ability to accurately identify more V-structures.
Consistent Separation Set Learning
Clients independently identify separation sets, and the server aggregates them, selecting the set with the highest p-value as the consistent separation set. This significantly improves the precision of V-structure recognition.
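The selection rule can be illustrated with a small sketch. The report format (a separation set paired with its p-value) and the names `select_consistent_sepset` and `is_v_structure` are assumptions made for this example; the collider check is the standard constraint-based rule, in which non-adjacent X and Y that share a neighbour T form X → T ← Y when T is absent from their separation set.

```python
def select_consistent_sepset(client_reports):
    """Pick the separation set reported with the highest p-value.

    `client_reports` is a list of (separation_set, p_value) pairs, one per
    client, all for the same pair of variables (assumed format).
    """
    best_sepset, best_p = max(client_reports, key=lambda report: report[1])
    return frozenset(best_sepset), best_p


def is_v_structure(middle, sepset):
    """Standard collider check for non-adjacent X, Y with common neighbour
    `middle`: orient X -> middle <- Y if `middle` is not in sepset(X, Y)."""
    return middle not in sepset


# Three clients report different separation sets for the pair (X, Y).
reports = [
    ({"Z"}, 0.12),
    ({"Z", "W"}, 0.40),
    (set(), 0.07),
]
sepset, p_value = select_consistent_sepset(reports)
print(sorted(sepset), p_value)      # ['W', 'Z'] 0.4
print(is_v_structure("T", sepset))  # True: T is not in the chosen set
```

Because every client then orients edges from the same set, two clients cannot reach opposite conclusions about the same triple.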
3. Federated Local Extension-and-Backtracking Orientation (FLEORI)
For edges that remain unoriented, FLEORI recursively extends the parent-child nodes layer by layer, identifying new V-structures and backtracking their directional information to the target variable. The process avoids the complexity of global causal graph learning, focusing only on the causal directions needed locally.
Datasets and Data Analysis
The experiments included six benchmark Bayesian Network datasets (e.g., Alarm and Gene) and six synthetic datasets, with a total sample size of 5000. In the federated environment, data is evenly distributed across clients, with each client maintaining a unique subset to avoid direct data sharing.
The analysis used two metrics to evaluate FedLCS’s performance: F1 score (structural correctness) and Structural Hamming Distance (SHD, representing structural errors). Experimental results indicate that FedLCS outperforms its competitors in most cases, with significantly higher F1 scores and lower SHD than the other methods.
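For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them for a local structure. It assumes every edge is directed and stored as a `(cause, effect)` tuple; the exact definitions used in the paper's experiments may differ, and the function names are illustrative.

```python
def f1_score(true_edges, pred_edges):
    """F1 over directed edges: harmonic mean of precision and recall."""
    true_positives = len(true_edges & pred_edges)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_edges)
    recall = true_positives / len(true_edges)
    return 2 * precision * recall / (precision + recall)


def shd(true_edges, pred_edges):
    """Structural Hamming Distance: missing + extra + reversed edges."""
    true_skeleton = {frozenset(edge) for edge in true_edges}
    pred_skeleton = {frozenset(edge) for edge in pred_edges}
    missing = len(true_skeleton - pred_skeleton)
    extra = len(pred_skeleton - true_skeleton)
    # Edge present in both skeletons but pointing the wrong way.
    reversed_count = sum(
        1
        for edge in pred_edges
        if frozenset(edge) in true_skeleton and edge not in true_edges
    )
    return missing + extra + reversed_count


# True structure around target T: A -> T, B -> T, T -> C.
true_edges = {("A", "T"), ("B", "T"), ("T", "C")}
# Prediction: one correct edge, one reversed edge, one spurious edge.
pred_edges = {("A", "T"), ("T", "B"), ("T", "D")}

print(round(f1_score(true_edges, pred_edges), 3))  # 0.333
print(shd(true_edges, pred_edges))                 # 3
```

A higher F1 and a lower SHD both indicate a learned structure closer to the ground truth.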
Research Results and Conclusions
Key Results:
- FedLCS significantly reduces the influence of noisy variables on causal skeleton learning through federated voting and aggregation strategies during the skeleton learning phase.
- The consistent separation set strategy markedly improves the accuracy of V-structure identification, because clients orient edges from a single server-selected separation set rather than from conflicting local ones.
- Compared with global causal graph algorithms (e.g., FedPC and NOTears-ADMM), FedLCS performs exceptionally well on high-dimensional data and offers significant time advantages.
Significance of Research:
This study achieves a breakthrough at the intersection of data privacy and causal inference, enabling high-precision local causal learning without data sharing. The method is widely applicable to fields such as healthcare and finance—for instance, identifying chronic disease factors through collaborative data analysis across hospitals, thereby informing public health policies.
Highlights and Innovations:
- The first framework for local causal structure learning based on federated learning.
- An innovative design of layer-wise voting for skeleton learning and a consistent separation set strategy, significantly improving learning efficiency and precision.
- A federated extension-and-backtracking subroutine that, for the first time, dynamically updates edge directions during the extension process.
Future Directions and Improvements
The authors note that the current decision-making mechanism does not account for variations in data quality across clients. Future research could explore strategies for assigning differentiated weights based on client data quality. Additionally, further optimization of the edge selection algorithms could enhance robustness.
FedLCS opens a new research direction for local causal learning and has profound implications for causal inference in the era of data privacy protection.