Toward Optimal Disease Surveillance with Graph-Based Active Learning

Academic Background

With the acceleration of globalization, infectious diseases now spread faster and farther than before, making effective monitoring and control of transmission a critical public health issue. Traditional disease surveillance typically relies on large-scale testing and isolation. In resource-constrained settings, however, policymakers must decide how to allocate a limited number of tests so as to gain as much information as possible about the epidemic. In resource-poor regions in particular, an uneven distribution of testing resources can allow epidemics to continue spreading. A strategy that maximizes surveillance effectiveness under limited resources is therefore particularly important.

This study aims to optimize the allocation of testing resources in disease surveillance by combining graph-based methods with active learning. The researchers model disease transmission on an undirected, unweighted graph in which nodes represent geographic locations and edges represent the paths along which infection can pass between them. Using simulated epidemics on such graphs, they evaluate a range of node selection strategies and propose a new one, “Selection by Local Entropy (LE),” designed to maximize surveillance effectiveness under a limited testing budget.

Source of the Paper

This paper, titled “Toward Optimal Disease Surveillance with Graph-Based Active Learning,” was co-authored by Joseph L.-H. Tsui, Mengyan Zhang, Prathyush Sambaturu, Simon Busch-Moreno, Marc A. Suchard, Oliver G. Pybus, Seth Flaxman, Elizaveta Semenova, and Moritz U. G. Kraemer, from institutions including the University of Oxford, the University of California, Los Angeles, and Imperial College London. It was published in the Proceedings of the National Academy of Sciences (PNAS) on December 19, 2024.

Research Process

1. Disease Surveillance as a Node Classification Task

The researchers frame disease surveillance as a node classification problem. An undirected, unweighted graph represents the mobility network between geographic locations: nodes are locations, and edges are the paths along which infection can pass between them. Epidemic spread is assumed to follow a stochastic Susceptible-Infected (SI) model on this graph, so infection passes only between nodes joined by an edge and infected nodes never recover. Each simulated epidemic starts from a single, randomly selected node and continues until a predetermined proportion of nodes is infected.

After each simulation, every node receives a binary label: 1 if it is infected and 0 if it is not. The researchers further assume that the epidemic evolves on a much longer timescale than the one over which tests are deployed, so the infection pattern can be treated as static throughout the surveillance process.
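The simulation setup above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the adjacency-dict graph representation and the helper names `lattice_graph` and `simulate_si` are hypothetical, and the epidemic is advanced with the embedded jump chain of a continuous-time SI model (every susceptible-infected edge transmits at the same rate, so each new infection occurs along a uniformly chosen such edge), which is an assumption about how the stochastic SI dynamics are simulated.

```python
import random


def lattice_graph(rows, cols):
    """Aperiodic (non-wrapping) 2-D lattice as an adjacency dict: node -> set of neighbors."""
    adj = {(r, c): set() for r in range(rows) for c in range(cols)}
    for r in range(rows):
        for c in range(cols):
            for nr, nc in ((r + 1, c), (r, c + 1)):  # link down/right so each edge is added once
                if nr < rows and nc < cols:
                    adj[(r, c)].add((nr, nc))
                    adj[(nr, nc)].add((r, c))
    return adj


def simulate_si(adj, target_fraction, seed=None):
    """Stochastic SI epidemic from one random index node; returns node -> 0/1 label."""
    rng = random.Random(seed)
    nodes = list(adj)
    infected = {rng.choice(nodes)}  # single randomly selected index node
    target = max(1, round(target_fraction * len(nodes)))
    while len(infected) < target:
        # In continuous-time SI every susceptible-infected edge transmits at the same rate,
        # so the next infection happens along a uniformly chosen S-I edge.
        frontier = [(u, v) for u in infected for v in adj[u] if v not in infected]
        if not frontier:  # remaining nodes are unreachable from the infected cluster
            break
        _, v = rng.choice(frontier)
        infected.add(v)  # infected nodes never recover in an SI model
    return {v: int(v in infected) for v in nodes}  # 1 = infected, 0 = not infected
```

For example, `simulate_si(lattice_graph(10, 10), 0.3, seed=1)` labels exactly 30 of the 100 lattice nodes as infected, and because infection travels only along edges, the infected nodes always form a connected cluster around the index node.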

2. Test Allocation as an Active Learning Task

Under a limited testing budget, the researchers frame test allocation as an active learning task: at each step a strategy selects a node to test, and the result is used to update the estimated infection probabilities of the nodes that remain unobserved. They compare existing active learning strategies, such as node entropy (NE) and Bayesian active learning by disagreement (BALD), and propose a new strategy, “Selection by Local Entropy (LE),” which accounts not only for the prediction uncertainty of the candidate node itself but also for that of its surrounding nodes.
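The three uncertainty scores can be sketched as follows, assuming a model that outputs an infection probability for each unobserved node (and, for BALD, several posterior samples of those probabilities). NE and BALD follow their standard definitions; the exact form of LE in the paper is not stated here, so the sketch assumes one plausible reading, the summed entropy over the candidate and its untested neighbors. All function names are hypothetical.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli(p) infection prediction."""
    p = min(max(p, eps), 1 - eps)  # avoid log(0)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def node_entropy_scores(probs, candidates):
    """NE: uncertainty of each candidate's own prediction."""
    return {v: binary_entropy(probs[v]) for v in candidates}


def local_entropy_scores(adj, probs, candidates, observed):
    """LE (assumed form): entropy of the candidate plus that of its untested neighbors."""
    scores = {}
    for v in candidates:
        neighborhood = [v] + [u for u in adj[v] if u not in observed]
        scores[v] = sum(binary_entropy(probs[u]) for u in neighborhood)
    return scores


def bald_scores(prob_samples, candidates):
    """BALD: entropy of the mean prediction minus the mean entropy over posterior samples."""
    scores = {}
    for v in candidates:
        ps = [sample[v] for sample in prob_samples]
        mean_p = sum(ps) / len(ps)
        expected_h = sum(binary_entropy(p) for p in ps) / len(ps)
        scores[v] = binary_entropy(mean_p) - expected_h  # mutual information
    return scores


def pick(scores):
    """Test the highest-scoring candidate next."""
    return max(scores, key=scores.get)
```

With `probs = {0: 0.5, 1: 0.99, 2: 0.7, 3: 0.6, 4: 0.6, 5: 0.6}` on a graph where node 0 has a single confident neighbor (node 1) and node 2 has three uncertain neighbors (nodes 3 to 5), NE picks node 0 while the LE sketch picks node 2, which is the kind of trade-off LE is meant to capture. BALD, by contrast, scores a node highly only when posterior samples disagree about it.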

3. Policy Evaluation

The researchers evaluate the node selection strategies across different network structures and epidemic scenarios. The simulations use both synthetic networks (such as periodic lattice graphs and random graphs generated by the Barabási-Albert model) and networks derived from real human mobility data (such as province-level mobility in Italy and global air traffic). The strategies are assessed by comparing their predictive performance under limited testing budgets.
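A minimal evaluation harness in the spirit of this setup is sketched below. The predictor is a hypothetical stand-in (harmonic label propagation, which repeatedly sets each untested node's probability to the mean of its neighbors' while test results stay clamped), not the model used in the paper. Test results are assumed noise-free and immediate, strategies are plain functions passed in as `choose`, and performance is measured as classification accuracy on the untested nodes; all of these are assumptions.

```python
import random


def propagate(adj, observed, n_iter=50, prior=0.5):
    """Hypothetical predictor: harmonic label propagation with test results clamped.

    Untested nodes start at `prior`; each in-place (Gauss-Seidel) sweep replaces an
    untested node's probability with the mean over its neighbors.
    """
    p = {v: float(observed.get(v, prior)) for v in adj}
    for _ in range(n_iter):
        for v in adj:
            if v not in observed and adj[v]:
                p[v] = sum(p[u] for u in adj[v]) / len(adj[v])
    return p


def random_choice(adj, probs, candidates, rng):
    """Baseline: test a uniformly random untested node."""
    return rng.choice(candidates)


def entropy_choice(adj, probs, candidates, rng):
    """NE: the prediction closest to 0.5 has the highest binary entropy."""
    return min(candidates, key=lambda v: abs(probs[v] - 0.5))


def run_policy(adj, labels, budget, choose, seed=0):
    """Spend `budget` tests one at a time; return accuracy on the untested nodes."""
    if not 0 <= budget < len(adj):
        raise ValueError("budget must leave at least one node untested")
    rng = random.Random(seed)
    observed = {}
    for _ in range(budget):
        probs = propagate(adj, observed)  # refit the predictor after every test
        candidates = [v for v in adj if v not in observed]
        v = choose(adj, probs, candidates, rng)
        observed[v] = labels[v]  # noise-free, immediate test result
    probs = propagate(adj, observed)
    untested = [v for v in adj if v not in observed]
    correct = sum((probs[v] >= 0.5) == bool(labels[v]) for v in untested)
    return correct / len(untested)
```

On a five-node path whose first two nodes are infected, for instance, two NE-chosen tests are enough for this toy predictor to classify the remaining three nodes correctly. Repeating such runs with `random_choice` and other strategies across budgets and graphs mirrors, in miniature, the comparison described above.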

Main Results

1. Disease Surveillance on an Aperiodic Lattice Graph

The researchers first compare the strategies on an aperiodic lattice graph. Under a small testing budget, Selection by Local Entropy (LE) outperforms both Node Entropy (NE) and Bayesian Active Learning by Disagreement (BALD). As the budget grows, NE gradually overtakes LE, and at large budgets NE rapidly approaches perfect predictive performance.

2. Disease Surveillance on Synthetic Graphs

On the various synthetic graphs, all strategies except Bayesian Active Learning and Reactive-Infected outperform random selection in most epidemic scenarios. On random graphs generated by the Barabási-Albert model in particular, graph-based strategies (such as selection by degree centrality or PageRank centrality) outperform uncertainty-based strategies in the early and intermediate stages of the epidemic.
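The graph-based strategies can be sketched as static rankings. This sketch assumes a Barabási-Albert generator seeded with a complete graph on m + 1 nodes and a standard power-iteration PageRank with damping 0.85; the helper names are hypothetical, and as written here these rankings do not use test results, unlike the uncertainty-based strategies.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Preferential-attachment graph: each new node links to m distinct existing nodes."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(m + 1):  # seed graph: complete graph on m + 1 nodes
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # Each node appears in `stubs` once per incident edge, so uniform draws are degree-weighted.
    stubs = [v for v in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend((new, t))
    return adj


def pagerank(adj, damping=0.85, n_iter=100):
    """Power-iteration PageRank on an undirected graph (each edge used in both directions)."""
    n = len(adj)
    pr = {v: 1.0 / n for v in adj}
    for _ in range(n_iter):
        nxt = {v: (1 - damping) / n for v in adj}
        for u, nbrs in adj.items():
            if nbrs:
                share = damping * pr[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
            else:  # isolated node: spread its rank uniformly
                for v in adj:
                    nxt[v] += damping * pr[u] / n
        pr = nxt
    return pr


def top_k(scores, k):
    """Static allocation: test the k highest-scoring nodes."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, with `g = barabasi_albert(200, 2, seed=0)`, `top_k({v: len(nb) for v, nb in g.items()}, 5)` and `top_k(pagerank(g), 5)` give degree-based and PageRank-based test allocations; on such scale-free graphs both tend to concentrate tests on the hubs.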

3. Disease Surveillance on Empirical Human Mobility Networks

On networks built from real human mobility data, Selection by Local Entropy performs well under small testing budgets, but Node Entropy again gradually overtakes it as the budget increases. On the networks generated from global air traffic data in particular, graph-based strategies perform well in the early stages of the epidemic, but their performance declines in later stages.

Conclusion

This paper proposes a framework for optimizing the allocation of testing resources in disease surveillance using graph-based active learning. Under a limited testing budget, Selection by Local Entropy effectively improves surveillance performance, particularly in the early stages of an epidemic and on networks with a high degree of structural order. As the testing budget increases, however, Node Entropy gradually overtakes Local Entropy.

This study offers new insights for disease surveillance in resource-constrained settings, including the global coordination of surveillance strategies, and can help policymakers allocate testing resources more effectively and reduce uncertainty about how an epidemic is spreading.

Research Highlights

  1. Innovative Strategy: This paper proposes a new node selection strategy, Selection by Local Entropy (LE), which considers not only the prediction uncertainty of the candidate node itself but also that of its surrounding nodes, with the aim of maximizing surveillance effectiveness under a limited testing budget.
  2. Multi-Scenario Evaluation: The researchers evaluate the strategies across various network structures and epidemic scenarios, covering both synthetic networks and networks built from real human mobility data, which supports the broad applicability of the results.
  3. Practical Application Value: The findings can support disease surveillance in resource-constrained settings, including the global coordination of surveillance strategies, by helping policymakers allocate testing resources more effectively and reduce uncertainty about epidemic spread.

Other Valuable Information

The study also outlines directions for future research: more complex transmission models (such as SEIR models), more realistic mobility networks (such as directed and weighted graphs), and more realistic assumptions about test deployment (such as testing noise and delayed feedback). These extensions would further improve the practicality and applicability of the model.