Curriculum-Guided Self-Supervised Representation Learning of Dynamic Heterogeneous Networks

Academic Background

In the real world, network data, such as social networks and citation networks, often contain multiple types of nodes and edges, and their structures evolve over time. To analyze such complex networks, researchers have proposed network embedding techniques, which represent nodes and edges as fixed-length vectors that can feed downstream tasks such as node classification and link prediction. However, traditional network embedding models face numerous challenges on dynamic heterogeneous networks, particularly in capturing both the dynamics and the heterogeneity of the network structure.

In recent years, Transformer models have achieved remarkable success in the field of Natural Language Processing (NLP), but their application in network embedding is still in its infancy. Transformer models, through their self-attention mechanisms, can capture complex relationships in sequential data, providing new insights for network embedding. However, most existing Transformer models are designed for static or homogeneous networks, lacking effective support for dynamic heterogeneous networks.

To address this issue, this study proposes a new Transformer model—DHG-BERT (Dynamic Heterogeneous Graph BERT)—which combines curriculum learning and self-supervised learning strategies to more efficiently learn representations of dynamic heterogeneous networks. By introducing curriculum learning, the model can gradually transition from simple to complex network structures, thereby improving training efficiency and representation quality.

Source of the Paper

This paper is co-authored by Namgyu Jung, David Camacho, Chang Choi, and O.-Joun Lee. Namgyu Jung and Chang Choi are from the Department of Computer Engineering at Gachon University in South Korea, David Camacho is from the Department of Computer Systems Engineering at the Universidad Politécnica de Madrid in Spain, and O.-Joun Lee is from the Department of Artificial Intelligence at The Catholic University of Korea. The paper was accepted on March 11, 2025, and published in the journal Cognitive Computation, with the DOI 10.1007/s12559-025-10441-1.

Research Process

1. Data Preprocessing and Network Construction

This study uses a bibliographic network as an example to construct a dynamic heterogeneous network. The network contains three types of nodes: authors, papers, and venues, as well as three types of relationships: authors writing papers, papers published in venues, and papers citing other papers. The network evolves over time, with each node and edge having a timestamp recording its first appearance.

To represent the complex structure of the network, the researchers used meta-paths as inputs. A meta-path is a sequence of nodes of specific types that captures relationships between different nodes in the network. For example, the meta-path “author-paper-author” indicates that two authors co-authored a paper. The researchers extracted 71 meta-paths from citation data spanning 2008 to 2018 and used them as inputs to the model.
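
The paper itself does not include code, but the idea of enumerating meta-path instances can be illustrated with a small sketch. The toy edge list, relation names, and helper function below are assumptions introduced purely for illustration; timestamps are omitted for brevity.

```python
# Minimal sketch (not the authors' code) of extracting instances of the
# "author-paper-author" meta-path from a typed edge list.
from collections import defaultdict

# typed edges of the bibliographic network: (source, relation, target)
edges = [
    ("a1", "writes", "p1"), ("a2", "writes", "p1"),
    ("a2", "writes", "p2"), ("a3", "writes", "p2"),
    ("p1", "published_in", "v1"), ("p2", "cites", "p1"),
]

# relation-indexed adjacency lists, with inverse relations for backward steps
adj = defaultdict(list)
for src, rel, dst in edges:
    adj[(src, rel)].append(dst)
    adj[(dst, "inv_" + rel)].append(src)

def apa_instances():
    """Enumerate author-paper-author (co-authorship) meta-path instances."""
    out = []
    for (node, rel), papers in list(adj.items()):
        if rel != "writes":
            continue
        for paper in papers:
            for coauthor in adj[(paper, "inv_writes")]:
                if coauthor != node:
                    out.append((node, paper, coauthor))
    return out

print(apa_instances())
# e.g. [('a1', 'p1', 'a2'), ('a2', 'p1', 'a1'), ('a2', 'p2', 'a3'), ('a3', 'p2', 'a2')]
```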

2. Model Structure

The DHG-BERT model is based on the ALBERT (A Lite BERT) architecture and has been modified for dynamic heterogeneous networks. The core idea is to capture the heterogeneity and dynamism of network structures through self-supervised learning. Specifically, two self-supervised learning tasks are proposed:

  • Masked Meta-path Recovery (MMR): Similar to the Masked Language Model (MLM) in BERT, the MMR task requires the model to predict masked nodes in meta-paths. Through this task, the model can learn the co-occurrence relationships and heterogeneity between nodes.

  • Temporal Order Prediction (TOP): This task requires the model to predict the temporal order of meta-paths generated by the same node at different time points. Through this task, the model can capture the dynamic changes in network structures (a sketch of how examples for both tasks could be built follows this list).
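
As referenced above, the following sketch illustrates how training examples for MMR and TOP could be constructed from meta-path node sequences. The [MASK] symbol, masking rate, and pairing scheme are assumptions in the spirit of BERT-style pre-training, not the paper's exact implementation.

```python
# Minimal sketch of example construction for the two self-supervised tasks.
import random

def make_mmr_example(metapath, mask_prob=0.15):
    """Masked Meta-path Recovery: hide some nodes, keep them as labels."""
    tokens, labels = [], []
    for node in metapath:
        if random.random() < mask_prob:
            tokens.append("[MASK]")
            labels.append(node)        # the model must recover this node
        else:
            tokens.append(node)
            labels.append(None)        # no loss at unmasked positions
    return tokens, labels

def make_top_example(path_t1, path_t2, swap_prob=0.5):
    """Temporal Order Prediction: is the pair in chronological order?"""
    if random.random() < swap_prob:
        return (path_t2, path_t1), 0   # label 0: reversed order
    return (path_t1, path_t2), 1       # label 1: correct order

apa_2010 = ["a1", "p1", "a2"]          # meta-path observed earlier
apa_2013 = ["a1", "p7", "a5"]          # later meta-path from the same author
print(make_mmr_example(apa_2010))
print(make_top_example(apa_2010, apa_2013))
```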

Additionally, the model introduces a curriculum learning strategy, gradually transitioning from simple to complex meta-paths to improve training efficiency.
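
As a rough illustration of such a schedule, the sketch below orders meta-paths by length and releases longer ones in later training phases; the specific length cutoffs are assumptions, not values reported in the paper.

```python
# Minimal sketch of a length-based curriculum over meta-paths.
def curriculum_phases(metapaths, lengths=(3, 5, 7)):
    """Yield cumulative training pools, from simple (short) to complex (long)."""
    for max_len in lengths:
        yield [mp for mp in metapaths if len(mp) <= max_len]

metapaths = [
    ["a1", "p1", "a2"],                          # author-paper-author
    ["a1", "p1", "v1", "p3", "a4"],              # author-paper-venue-paper-author
    ["a1", "p1", "p2", "v2", "p5", "p6", "a9"],  # a longer citation-based path
]
for phase, pool in enumerate(curriculum_phases(metapaths), start=1):
    print(f"phase {phase}: {len(pool)} meta-paths available")
```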

3. Training and Fine-tuning

The model’s training is divided into three stages: pre-training, post-training, and fine-tuning.

  • Pre-training: The model learns the general topological structure and dynamic changes of the network through the MMR and TOP tasks. Pre-training starts with shorter meta-paths and gradually transitions to longer ones to help the model progressively understand complex network structures.

  • Post-training: In the post-training stage, the model focuses on learning network structures related to the target task. For example, in the task of predicting author collaborations, the model emphasizes meta-paths related to authors (e.g., “author-paper-author”).

  • Fine-tuning: In the fine-tuning stage, the model adapts to a specific downstream task, such as link prediction, by adding a fully connected layer on top of the encoder (see the sketch after this list).
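
The sketch below shows one plausible form of such a fine-tuning head in PyTorch: a pooled sequence representation from a pre-trained encoder feeds a single fully connected layer that scores a candidate link (e.g., a future collaboration between two authors). The class name, pooling behavior, and dimensions are assumptions, not the released model.

```python
import torch
import torch.nn as nn

class LinkPredictionHead(nn.Module):
    """Fully connected layer on top of a pre-trained meta-path encoder."""
    def __init__(self, encoder: nn.Module, hidden_dim: int):
        super().__init__()
        self.encoder = encoder                  # pre-trained, frozen or jointly tuned
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # assume the encoder returns one pooled vector per sequence,
        # shape (batch, hidden_dim), e.g. its [CLS]-style position
        pooled = self.encoder(token_ids)
        return torch.sigmoid(self.classifier(pooled)).squeeze(-1)

# usage: probability that the author pair encoded in `token_ids` will collaborate
# head = LinkPredictionHead(pretrained_encoder, hidden_dim=256)
# prob = head(token_ids)
```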

4. Experiments and Evaluation

The researchers evaluated the model’s performance by predicting future author collaborations. The experiments used the ArnetMiner dataset, which contains citation data from 2008 to 2018. The data from 2008 to 2013 were used for training, and the data from 2014 to 2018 were used for testing. The results showed that DHG-BERT achieved an average accuracy of 0.94 in predicting author collaborations, significantly outperforming existing network embedding models.
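
The temporal evaluation protocol can be pictured with a tiny illustrative split; the records below are made up, and the real experiments use the ArnetMiner citation data.

```python
# Minimal sketch of the year-based train/test split described above.
records = [                                  # (author, author, first co-authorship year)
    ("a1", "a2", 2009), ("a2", "a3", 2012),
    ("a1", "a3", 2015), ("a4", "a5", 2017),
]
train = [r for r in records if r[2] <= 2013]  # 2008-2013 for training
test  = [r for r in records if r[2] >= 2014]  # 2014-2018 for testing
print(len(train), len(test))                  # 2 2
```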

Main Results

  1. Model Performance: DHG-BERT performed exceptionally well in the task of predicting author collaborations, with an average accuracy of 0.94, surpassing existing models by 0.13 to 0.35. The improvement was most pronounced in settings where dynamics matter, such as predicting future collaborations.

  2. Effectiveness of Self-supervised Learning Tasks: Through the MMR and TOP tasks, the model effectively captured the heterogeneity and dynamism of network structures. Experiments showed that the model combining both tasks significantly outperformed models using only one of the tasks.

  3. Effectiveness of Curriculum Learning Strategy: The curriculum learning strategy significantly improved the model’s training efficiency and representation quality. By gradually transitioning from simple to complex meta-paths, the model gained a better understanding of the global network structure.

Conclusion and Significance

This study proposes a new Transformer model—DHG-BERT—which effectively learns representations of dynamic heterogeneous networks by combining curriculum learning and self-supervised learning strategies. Experimental results demonstrate that DHG-BERT performs exceptionally well in tasks such as predicting author collaborations, significantly outperforming existing network embedding models.

The scientific value of this study lies in providing a new approach to representation learning for dynamic heterogeneous networks, particularly in combining Transformer models with curriculum learning strategies. Additionally, the model has broad potential in practical applications such as social network analysis and bibliographic network analysis.

Research Highlights

  1. Novel Transformer Model: DHG-BERT is the first Transformer model specifically designed for dynamic heterogeneous networks, effectively capturing the heterogeneity and dynamism of network structures.

  2. Self-supervised Learning Tasks: Through the MMR and TOP tasks, the model learns co-occurrence relationships and dynamic changes between nodes, improving representation quality.

  3. Curriculum Learning Strategy: The curriculum learning strategy significantly enhances the model’s training efficiency, enabling it to gradually transition from simple to complex network structures.

  4. Application Value: The model has broad potential in practical applications such as social network analysis and bibliographic network analysis.

Other Valuable Information

The limitations of this study include the inability to handle newly emerging nodes and edges and the lack of consideration for node and edge attributes. Future research will explore how to address these issues through inductive representation learning and multi-modal Transformer models.