Toward Optimal Disease Surveillance with Graph-Based Active Learning

2024-12-28 Sat
disease surveillance active learning network dynamics epidemiology public health
Academic Background
With the acceleration of globalization, infectious diseases can spread faster and farther than before, so monitoring and controlling their spread effectively has become a critical public health issue. Traditional disease surveillance typically relies on large-scale testing and isolation. When resources are constrained, however, policymakers face the challenge of allocating tests so as to maximize the information obtained. In resource-poor regions in particular, an uneven distribution of testing resources may allow epidemics to keep spreading. Developing a strategy that maximizes surveillance effectiveness under limited resources is therefore particularly important.
This study aims to optimize the allocation of testing resources for disease surveillance using graph-based active learning. The researchers model disease transmission on an undirected, unweighted graph in which nodes represent geographic locations and edges represent paths along which infection can be transmitted between them. Using simulated epidemics, they evaluate a range of node selection strategies and propose a new one, “Selection by Local Entropy (LE),” to maximize surveillance effectiveness under a limited testing budget.
Source of the Paper
This paper is co-authored by Joseph L.-H. Tsui, Mengyan Zhang, Prathyush Sambaturu, Simon Busch-Moreno, Marc A. Suchard, Oliver G. Pybus, Seth Flaxman, Elizaveta Semenova, and Moritz U. G. Kraemer, from institutions including the University of Oxford, the University of California, Los Angeles, and Imperial College London. Titled “Toward Optimal Disease Surveillance with Graph-Based Active Learning,” it was published on December 19, 2024, in the Proceedings of the National Academy of Sciences (PNAS).
Research Process
1. Disease Surveillance as a Node Classification Task
The researchers frame disease surveillance as a node classification problem. They represent the mobility network between geographic locations as an undirected, unweighted graph, in which nodes are locations and edges are paths along which infection can be transmitted between them. The epidemic is assumed to follow a stochastic Susceptible-Infected (SI) model: infected nodes never recover, and infection can pass only between nodes joined by an edge. Each simulated epidemic starts from a single randomly selected node and runs until a given proportion of nodes is infected.
After the simulation, each node receives a binary label for its infection status: 1 if infected, 0 if not. The researchers assume that the epidemic spreads on a much longer timescale than the one over which tests are deployed, so the distribution of infections can be treated as static throughout the surveillance process.
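To make the simulation step concrete, here is a minimal sketch of a discrete-time stochastic SI outbreak on an undirected graph that stops once a target fraction of nodes is infected and then returns the binary labels. The adjacency-dict input, the function name `simulate_si`, and the per-edge, per-step transmission probability `beta` are illustrative assumptions, not the paper's exact parameterization.

```python
import random


def simulate_si(adj, beta, target_fraction, seed=None):
    """Discrete-time stochastic SI epidemic (hypothetical helper).

    adj: dict mapping each node to the set of its neighbours (assumed format).
    beta: assumed probability that one infected neighbour transmits in one step.
    target_fraction: stop once at least this fraction of nodes is infected.
    Returns a dict node -> 1 (infected) or 0 (not infected).
    """
    if not 0 < beta <= 1:
        raise ValueError("beta must lie in (0, 1]")
    rng = random.Random(seed)
    nodes = list(adj)
    infected = {rng.choice(nodes)}  # outbreak starts from one random node
    target = target_fraction * len(nodes)
    while len(infected) < target:
        # Susceptible nodes with at least one infected neighbour.
        frontier = {v for u in infected for v in adj[u] if v not in infected}
        if not frontier:  # the seed's component is fully infected
            break
        # Each infected neighbour independently transmits with probability beta.
        newly = {
            v for v in frontier
            if any(rng.random() < beta for u in adj[v] if u in infected)
        }
        infected |= newly  # SI: infected nodes never recover
    return {n: int(n in infected) for n in nodes}


if __name__ == "__main__":
    n = 20
    ring = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    labels = simulate_si(ring, beta=0.3, target_fraction=0.25, seed=0)
    print(sum(labels.values()), "of", n, "nodes infected")
```

Because the labels are computed once and then frozen, the sketch reflects the static-epidemic assumption: later testing only reveals these labels and never changes them.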
2. Test Allocation as an Active Learning Task
Under a limited testing budget, the researchers treat the allocation of tests as an active learning task: a strategy selects a node to test, and the result is used to update the estimated infection probabilities of the nodes not yet observed, which in turn informs the next selection. They apply existing active learning strategies, such as Node Entropy (NE) and Bayesian Active Learning by Disagreement (BALD), and propose a new strategy, “Selection by Local Entropy (LE).” LE considers not only the prediction uncertainty of the candidate node itself but also that of its surrounding nodes, scoring each candidate by a distance-weighted average of the prediction entropy across its neighborhood.
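The selection loop described above can be sketched as follows. NE picks the untested node whose own prediction has maximal binary entropy H(p) = -p log p - (1 - p) log(1 - p); the LE-style score instead averages H over the candidate's neighborhood with weights that decay with hop distance. This is a minimal sketch under stated assumptions: the weights 1/(1 + d), the radius of 2 hops, and the label-propagation estimator `propagate` are hypothetical stand-ins chosen for illustration, not the paper's estimator or exact weighting.

```python
import math
from collections import deque


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) infection prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def propagate(adj, observed, n_iter=20, prior=0.5):
    """HYPOTHETICAL estimator: clamp tested nodes to their results and let each
    untested node repeatedly take the mean prediction of its neighbours."""
    p = {v: float(observed.get(v, prior)) for v in adj}
    for _ in range(n_iter):
        new_p = {}
        for v in adj:
            if v in observed:
                new_p[v] = float(observed[v])
            elif adj[v]:
                new_p[v] = sum(p[u] for u in adj[v]) / len(adj[v])
            else:
                new_p[v] = p[v]
        p = new_p
    return p


def hop_distances(adj, source, radius):
    """Breadth-first hop distances from source, truncated at radius."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def select_node_entropy(adj, p, observed):
    """NE: the untested node whose own prediction is most uncertain."""
    candidates = [v for v in adj if v not in observed]
    return max(candidates, key=lambda v: binary_entropy(p[v]))


def select_local_entropy(adj, p, observed, radius=2):
    """LE-style: distance-weighted mean entropy over the candidate's neighbourhood
    (the 1/(1 + d) weights and the radius are assumptions)."""
    def score(v):
        weights = {u: 1.0 / (1 + d) for u, d in hop_distances(adj, v, radius).items()}
        total = sum(weights.values())
        return sum(w * binary_entropy(p[u]) for u, w in weights.items()) / total

    candidates = [v for v in adj if v not in observed]
    return max(candidates, key=score)


def active_surveillance(adj, labels, budget, select, seed_node):
    """Test one node per step and feed each result back into the estimator."""
    budget = min(budget, len(adj))
    observed = {seed_node: labels[seed_node]}
    while len(observed) < budget:
        p = propagate(adj, observed)
        v = select(adj, p, observed)
        observed[v] = labels[v]  # the test reveals the node's true label
    return propagate(adj, observed), observed


if __name__ == "__main__":
    n = 12
    ring = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    truth = {i: int(i < 6) for i in range(n)}  # toy static outbreak
    for name, policy in [("NE", select_node_entropy), ("LE", select_local_entropy)]:
        p, obs = active_surveillance(ring, truth, budget=4, select=policy, seed_node=0)
        acc = sum((p[v] >= 0.5) == bool(truth[v]) for v in ring) / n
        print(name, sorted(obs), f"accuracy={acc:.2f}")
```

Tested nodes have predictions of exactly 0 or 1 and hence zero entropy, so the LE-style score naturally favors candidates whose whole neighborhood is still uncertain, which is one way to read the exploration component of the strategy.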
3. Policy Evaluation
The researchers evaluate the node selection strategies across different network structures and epidemic scenarios. The simulation experiments use both synthetic networks (such as periodic lattice graphs and random graphs generated by the Barabási-Albert model) and networks built from real human mobility data (such as province-level mobility in Italy and global air traffic). Each strategy's effectiveness is assessed by comparing its performance with the others under limited testing budgets.
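The two synthetic graph families named above can be generated in a few lines of plain Python. The sketch below builds a periodic (wrap-around, toroidal) square lattice and a Barabási-Albert preferential-attachment graph as adjacency dicts; the function names and the adjacency format are hypothetical, and the sizes used in the paper are not assumed here.

```python
import random


def periodic_lattice(rows, cols):
    """2D square lattice with wrap-around boundaries (a torus).
    With rows, cols >= 3 every node has exactly four neighbours."""
    adj = {(r, c): set() for r in range(rows) for c in range(cols)}
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (0, 1)):
                v = ((r + dr) % rows, (c + dc) % cols)
                adj[(r, c)].add(v)
                adj[v].add((r, c))
    return adj


def barabasi_albert(n, m, seed=None):
    """Preferential attachment: each new node links to m distinct existing
    nodes chosen with probability proportional to their degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    targets = list(range(m))  # the first new node attaches to m initial nodes
    repeated = []             # each node appears once per unit of its degree
    for new in range(m, n):
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
        repeated.extend(targets)
        repeated.extend([new] * m)
        chosen = set()
        while len(chosen) < m:  # sample m distinct degree-weighted targets
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return adj


if __name__ == "__main__":
    lattice = periodic_lattice(10, 10)
    ba = barabasi_albert(100, 2, seed=0)
    print("lattice max degree:", max(len(s) for s in lattice.values()))
    print("BA max degree:", max(len(s) for s in ba.values()))
```

The lattice is perfectly regular, whereas the Barabási-Albert graph develops a few high-degree hubs, which is why the two families are useful for contrasting structurally ordered and heterogeneous networks.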
Main Results
1. Disease Surveillance on an Aperiodic Lattice Graph
On an aperiodic lattice graph, the Selection by Local Entropy (LE) strategy outperforms both the Node Entropy (NE) and Bayesian Active Learning by Disagreement (BALD) strategies when the testing budget is small. As the budget grows, NE gradually overtakes LE, and with a large budget NE rapidly approaches perfect predictive performance.
2. Disease Surveillance on Synthetic Graphs
Across various synthetic graphs, all strategies except Bayesian Active Learning by Disagreement (BALD) and Reactive-Infected outperform random selection in most epidemic scenarios. On random graphs generated by the Barabási-Albert model in particular, graph-based strategies (such as selection by degree centrality or PageRank centrality) outperform uncertainty-based strategies in the early and intermediate stages of the epidemic.
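Unlike the uncertainty-based strategies, the graph-based ones need no predictions at all: they rank nodes once by a centrality measure. A minimal sketch, assuming that such a strategy simply tests the highest-ranked node not yet tested (the helper names are hypothetical), with PageRank computed by power iteration on the undirected graph:

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank; each undirected edge is used in both directions.
    Rank held by isolated (dangling) nodes is spread uniformly over all nodes."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            if adj[u]:
                share = damping * rank[u] / len(adj[u])
                for v in adj[u]:
                    new[v] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


def degree_order(adj):
    """Nodes sorted from highest to lowest degree (ties keep input order)."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)


def pagerank_order(adj):
    """Nodes sorted from highest to lowest PageRank."""
    pr = pagerank(adj)
    return sorted(adj, key=pr.get, reverse=True)


def select_by_centrality(order, observed):
    """Next untested node in a fixed ranking; test results are ignored."""
    return next(v for v in order if v not in observed)


if __name__ == "__main__":
    star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
    print(degree_order(star)[0], pagerank_order(star)[0])
```

Because the ranking never reacts to test feedback, such a policy front-loads tests onto hubs, which plausibly helps early in an outbreak on hub-dominated graphs but cannot adapt once those hubs have been tested.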
3. Disease Surveillance on Empirical Human Mobility Networks
On networks built from real human mobility data, the Selection by Local Entropy strategy performs well under a small testing budget, but the Node Entropy strategy gradually overtakes it as the budget grows. On the network built from global air traffic data in particular, graph-based strategies perform well early in the epidemic but lose ground in later stages.
Conclusion
This paper proposes a framework that combines graph-based modeling and active learning to optimize the allocation of testing resources in disease surveillance. The results show that, under a limited testing budget, the Selection by Local Entropy strategy can improve surveillance performance, especially early in an epidemic and in networks with a high degree of structural order. As the testing budget increases, however, the Node Entropy strategy gradually overtakes it.
This study offers new insights for disease surveillance in resource-constrained settings, including the global coordination of surveillance strategies, and can help policymakers allocate testing resources more effectively and reduce uncertainty about how an epidemic is spreading.
Research Highlights
Innovative Strategy: This paper proposes a new node selection strategy, Selection by Local Entropy (LE), which considers the prediction uncertainty of both the candidate node and its surrounding nodes, with the aim of maximizing surveillance effectiveness under a limited testing budget.
Multi-Scenario Evaluation: The strategies are evaluated across a range of network structures and epidemic scenarios, on both synthetic networks and networks built from real human mobility data, which supports the broad applicability of the results.
Practical Application Value: The findings offer strategic support for disease surveillance in resource-constrained settings, including globally coordinated surveillance, by guiding where a limited number of tests should be deployed.
Other Valuable Information
The study also outlines directions for future research: more complex transmission models (such as SEIR models), more realistic mobility networks (such as directed and weighted graphs), and more realistic assumptions about test deployment (such as noisy tests and delayed feedback). These extensions would further improve the model's practicality and applicability.