International Multi-Institutional External Validation of Preoperative Risk Scores for 30-Day In-Hospital Mortality in Paediatric Patients
Academic Background
Although the perioperative mortality rate in pediatric patients is low (<0.5%), it remains a significant issue in clinical practice. Traditional risk assessment tools, such as the American Society of Anesthesiologists (ASA) Physical Status (ASA PS) score, are concise but fail to comprehensively reflect the individualized risk of pediatric patients. To better predict perioperative mortality in pediatric patients, researchers have developed various risk prediction scoring models. These models incorporate multiple factors, including patient demographics, comorbidities, preoperative physiological data, medication use, and surgical characteristics, aiming to support clinical decision-making.
However, the application of these risk scoring models in clinical practice requires external validation to ensure their effectiveness across different times, locations, and populations. In previous studies, only a few pediatric perioperative mortality risk scoring models have undergone external validation. Therefore, the primary objective of this study was to externally validate two risk scoring models for 30-day in-hospital mortality in pediatric patients using data from the Multicenter Perioperative Outcomes Group (MPOG) registry and to recalibrate these models.
Source of the Paper
This paper was co-authored by researchers from multiple institutions, with the primary authors including Virginia E. Tangel (Erasmus University Medical Centre, Rotterdam, Netherlands; Weill Cornell Medicine, New York, NY, USA), Sanne E. Hoeks (Erasmus University Medical Centre, Rotterdam, Netherlands), Robert Jan Stolker (Erasmus University Medical Centre, Rotterdam, Netherlands), and others. The paper was published online on October 29, 2024, in the British Journal of Anaesthesia (BJA), with the DOI 10.1016/j.bja.2024.09.003.
Research Process
Data Source and Study Population
This study utilized data from the MPOG database, which includes anesthesia records from multiple hospitals in the United States and the Netherlands, covering patient comorbidities, medication use, vital signs, surgical types, and perioperative outcomes. The study included pediatric patients under 18 years of age from October 1, 2015, to December 31, 2020, excluding cardiac surgeries and diagnostic imaging procedures. Ultimately, the study included 606,488 cases from 56 hospitals.
External Validation of Risk Scoring Models
This study primarily validated two risk scoring models: the Pediatric Risk Assessment (PRAM) Score and the Intrinsic Surgical Risk Score. Both models were developed based on data from the American College of Surgeons National Surgical Quality Improvement Program-Pediatric (ACS NSQIP-P) and were externally validated using MPOG data in this study.
1. External Validation of the PRAM Score
The PRAM score model includes multiple predictive variables, such as urgent surgery, respiratory disease, congenital heart disease, and preoperative acute or chronic kidney disease. Due to the absence of certain variables (e.g., preoperative cardiopulmonary resuscitation) in the MPOG database, these variables were omitted during validation. The performance of the PRAM score in external validation was as follows:
- AUROC (Area Under the Receiver Operating Characteristic Curve): 0.856 (95% CI: 0.844-0.869)
- AUC-PR (Area Under the Precision-Recall Curve): 0.008
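The gap between a high AUROC and a very low AUC-PR is typical when the outcome is rare: a model with no skill has an AUC-PR roughly equal to the event prevalence, so AUC-PR stays small unless most flagged cases are true events. The sketch below illustrates this on synthetic data only. The cohort size, the prevalence of about 0.3%, and the Gaussian score distributions are assumptions made for illustration, not values from the study, and average precision is used as one common estimator of AUC-PR, which may differ from the estimator the authors used.

```python
import random


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney) formulation, averaging ranks over ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each true event (ties not grouped)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    true_positives = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            true_positives += 1
            total += true_positives / rank
    return total / n_pos


# Synthetic cohort (assumption): 20,000 cases, one death in every 333 cases.
random.seed(0)
n_cases = 20000
labels = [1 if i % 333 == 0 else 0 for i in range(n_cases)]
# Hypothetical risk scores: deaths score about two standard deviations higher.
scores = [random.gauss(2.0, 1.0) if y == 1 else random.gauss(0.0, 1.0) for y in labels]

prevalence = sum(labels) / n_cases
demo_auroc = auroc(labels, scores)
demo_ap = average_precision(labels, scores)
print(f"prevalence={prevalence:.4f}  AUROC={demo_auroc:.3f}  AUC-PR={demo_ap:.3f}")
```

Comparing the printed AUC-PR with the prevalence, rather than with 1.0, gives a fairer sense of how much a score adds for a rare outcome.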
Although the PRAM score showed good calibration at low mortality probabilities, its performance was poor at high mortality probabilities. Decision curve analysis indicated limited clinical utility for the PRAM score.
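A summary calibration metric can look favorable while the high-risk range is badly calibrated, because low-risk cases dominate the average. The sketch below shows this with a hypothetical cohort: the case counts, predicted probabilities, and bin edges are invented for illustration, and the Brier score stands in for whichever summary metrics the study actually reported.

```python
def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def calibration_by_bin(labels, probs, edges):
    """Return (lower, upper, n, mean_predicted, observed_rate) for each non-empty bin."""
    rows = []
    for lower, upper in zip(edges[:-1], edges[1:]):
        members = [(y, p) for y, p in zip(labels, probs) if lower <= p < upper]
        if not members:
            continue
        n = len(members)
        mean_predicted = sum(p for _, p in members) / n
        observed_rate = sum(y for y, _ in members) / n
        rows.append((lower, upper, n, mean_predicted, observed_rate))
    return rows


# Hypothetical cohort (assumption): 10,000 low-risk cases predicted at 0.1%
# with 10 deaths, and 20 high-risk cases predicted at 50% with 4 deaths.
labels = [1] * 10 + [0] * 9990 + [1] * 4 + [0] * 16
probs = [0.001] * 10000 + [0.5] * 20

overall_brier = brier_score(labels, probs)
table = calibration_by_bin(labels, probs, edges=[0.0, 0.01, 0.1, 1.01])

print(f"overall Brier score: {overall_brier:.5f}")
for lower, upper, n, mean_predicted, observed_rate in table:
    print(f"[{lower:.2f}, {upper:.2f}): n={n}, "
          f"predicted={mean_predicted:.3f}, observed={observed_rate:.3f}")
```

In this toy cohort the low-risk bin is calibrated exactly, the high-risk bin predicts 50% mortality against 20% observed, and the overall Brier score is still close to zero.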
2. External Validation of the Intrinsic Surgical Risk Score
The Intrinsic Surgical Risk Score model includes variables such as neonatal status, weight <5kg, and ASA PS score. Due to the inability to reconstruct the “intrinsic surgical risk” variable in the MPOG database, this variable was omitted during validation. The performance of the Intrinsic Surgical Risk Score in external validation was as follows:
- AUROC: 0.925 (95% CI: 0.914-0.936)
- AUC-PR: 0.085
Compared to the PRAM score, the Intrinsic Surgical Risk Score demonstrated better discrimination but still resulted in a large number of false positives. Decision curve analysis also indicated limited clinical utility for this score.
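Decision curve analysis weighs true positives against false positives at a chosen risk threshold, using the standard net benefit formula: net benefit = TP/n - (FP/n) x pt/(1 - pt), where pt is the threshold probability. The sketch below applies that formula to a hypothetical cohort. The counts, predicted probabilities, and thresholds are illustrative assumptions, not the study's data.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of flagging every case whose predicted risk is >= threshold."""
    n = len(labels)
    true_pos = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that flags every case."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical cohort (assumption): 10,000 cases with 30 deaths. The model
# assigns 10% risk to 500 cases (25 deaths, 475 survivors) and 0.1% risk to
# the remaining 9,500 cases (5 deaths, 9,495 survivors).
labels = [1] * 25 + [0] * 475 + [1] * 5 + [0] * 9495
probs = [0.10] * 500 + [0.001] * 9500

for threshold in (0.01, 0.02, 0.05):
    model_nb = net_benefit(labels, probs, threshold)
    all_nb = net_benefit_treat_all(labels, threshold)
    print(f"threshold={threshold:.2f}  model={model_nb:.5f}  "
          f"treat-all={all_nb:.5f}  treat-none=0.00000")
```

Net benefit can never exceed the event prevalence, so with a rare outcome and many false positives the model's curve sits close to the treat-none line, which is one way limited clinical utility shows up.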
Recalibration of the Models
To further improve the predictive ability of the models, the study recalibrated both the PRAM score and the Intrinsic Surgical Risk Score. After recalibration, the PRAM score’s AUROC rose from 0.856 to 0.873 (95% CI: 0.861-0.886), and its AUC-PR rose from 0.008 to 0.031. The Intrinsic Surgical Risk Score’s AUROC was unchanged at 0.925 (95% CI: 0.915-0.936), and its AUC-PR rose from 0.085 to 0.094. Recalibration therefore improved discrimination mainly for the PRAM score, and the overall performance of both models remained inferior to that in the original studies.
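This summary does not state which recalibration method the study used. One common approach, shown here purely as an assumption, is logistic recalibration: keep the original model's predictions and fit a new intercept a and slope b so that logit(p_new) = a + b x logit(p_old). The sketch below fits a and b by Newton-Raphson on a hypothetical cohort in which the original model overestimates risk; all counts and probabilities are invented for illustration.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def fit_logistic_recalibration(labels, probs, iterations=25):
    """Fit logit(p_new) = a + b * logit(p_old) by Newton-Raphson; return (a, b)."""
    xs = [logit(p) for p in probs]
    a, b = 0.0, 1.0  # start from the unmodified original model
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, labels):
            mu = sigmoid(a + b * x)
            w = mu * (1 - mu)
            g0 += y - mu          # gradient with respect to a
            g1 += (y - mu) * x    # gradient with respect to b
            h00 += w              # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Hypothetical cohort (assumption): (original predicted risk, cases, deaths).
groups = [(0.001, 5000, 3), (0.01, 2000, 8), (0.1, 500, 20), (0.5, 100, 20)]
labels, probs = [], []
for predicted, n_cases, n_deaths in groups:
    labels += [1] * n_deaths + [0] * (n_cases - n_deaths)
    probs += [predicted] * n_cases

a, b = fit_logistic_recalibration(labels, probs)
recalibrated = [sigmoid(a + b * logit(p)) for p in probs]

print(f"intercept a={a:.3f}, slope b={b:.3f}")
print(f"expected deaths before: {sum(probs):.1f}, "
      f"after: {sum(recalibrated):.1f}, observed: {sum(labels)}")
```

Because b is positive, this transformation preserves the ranking of cases, so on its own it changes calibration but not AUROC. An AUROC change after recalibration, as reported for the PRAM score, would imply a method that also re-estimates individual predictor weights.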
Main Results
PRAM Score: In external validation, the PRAM score had an AUROC of 0.856 and an AUC-PR of 0.008. After recalibration, the AUROC improved to 0.873, and the AUC-PR improved to 0.031. Despite the improvement in discrimination, the PRAM score exhibited poor calibration at high mortality probabilities, and decision curve analysis indicated limited clinical utility.
Intrinsic Surgical Risk Score: In external validation, the Intrinsic Surgical Risk Score had an AUROC of 0.925 and an AUC-PR of 0.085. After recalibration, the AUROC remained unchanged, and the AUC-PR improved to 0.094. Although this score outperformed the PRAM score in discrimination, it still resulted in a large number of false positives, and decision curve analysis indicated limited clinical utility.
Conclusion
This study externally validated and recalibrated the PRAM score and the Intrinsic Surgical Risk Score using MPOG data. Although the Intrinsic Surgical Risk Score outperformed the PRAM score in discrimination, both models exhibited inferior performance in external validation compared to the original studies. Calibration metrics appeared favorable due to the large number of low-mortality cases, but both models overestimated mortality at higher probabilities. Decision curve analysis indicated limited clinical utility for both scoring models.
Research Highlights
Importance of External Validation: This study underscores the necessity of external validation for risk scoring models before their clinical application. Although the PRAM score and the Intrinsic Surgical Risk Score performed well in the original studies, their performance declined in external validation, suggesting that the applicability of risk scoring models may vary across different datasets.
Value of Clinical Judgment: The superior performance of the Intrinsic Surgical Risk Score was largely driven by the ASA PS score, which is itself assigned by a clinician. This suggests that clinical judgment may be more effective than risk scoring models in predicting mortality in high-risk pediatric patients.
Issue of False Positives: Both scoring models resulted in a large number of false positives, which could lead to unnecessary resource allocation in clinical practice. However, false positives are less concerning than false negatives, as they do not adversely affect patient outcomes.
Research Significance
Through external validation in a large multicenter registry, this study revealed the limitations of the PRAM score and the Intrinsic Surgical Risk Score in clinical practice: both performed worse than in the original studies, which suggests that the applicability of risk scoring models varies across datasets. The findings also suggest that clinical judgment may be more effective than risk scoring models in predicting mortality in high-risk pediatric patients. Future research should explore ways to improve risk scoring models to enhance their clinical utility.
Other Valuable Information
The limitations of this study include the absence of certain key variables (e.g., preoperative cardiopulmonary resuscitation and intrinsic surgical risk) in the MPOG database, which may have contributed to the decline in the models’ performance during external validation. Furthermore, the MPOG database primarily consists of data from academic medical centers in the United States, which may not fully represent pediatric patient populations in other regions or resource-limited settings. Future research should validate these risk scoring models in broader populations to further assess their applicability.