Development and Validation of Machine Learning Algorithms Based on Electrocardiograms for Cardiovascular Diagnoses at the Population Level

Introduction

Cardiovascular diseases (CVDs) have long been a major contributor to the global burden of disease. Early diagnosis and intervention are crucial for reducing complications, healthcare utilization, and associated costs. The standard electrocardiogram (ECG) is a low-cost, convenient diagnostic tool that is widely used to detect cardiovascular disease. However, existing approaches to ECG interpretation, whether manual reading or conventional computer algorithms, are limited in their ability to capture high-order interactions among signal features and “hidden” clinically relevant patterns. Artificial intelligence (AI), and deep learning (DL) in particular, offers a way to uncover such hidden patterns in ECG signals and to evaluate several cardiovascular diagnoses, and the complex interactions among them, simultaneously. This study was motivated by that opportunity.

Source of Paper and Authors

This paper was published in the journal “npj Digital Medicine”. It is the product of a multi-institutional collaboration whose authors are Sunil Vasu Kalmady, Amir Salimi, Weijie Sun, Nariman Sepehrvand, Yousef Nademi, Kevin Bainey, Justin Ezekowitz, Abram Hindle, Finlay McAlister, Russel Greiner, Roopinder Sandhu, and Padma Kaul.

Research Process

Study Subjects and Data Collection

The study used 1,605,268 12-lead ECGs from 244,077 adult patients, collected at 84 emergency departments and hospitals in Alberta, Canada, between February 2007 and April 2020. The aim was to predict 15 common cardiovascular diagnoses simultaneously: atrial fibrillation, supraventricular tachycardia, ventricular tachycardia, cardiac arrest, atrioventricular block, unstable angina, ST-elevation myocardial infarction (STEMI), non-ST-elevation myocardial infarction (NSTEMI), pulmonary embolism, hypertrophic cardiomyopathy, aortic stenosis, mitral valve prolapse, mitral stenosis, pulmonary hypertension, and heart failure.
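Because a single ECG can be associated with several of these diagnoses at once, the prediction task can be framed as multi-label classification, in which each ECG receives a 15-element 0/1 target vector. The sketch below assumes the diagnoses linked to an ECG are already available as a set of names; the snake_case label names and the function encode_targets are hypothetical and do not reproduce the study's diagnosis-code definitions.

```python
# Hypothetical label names, in the order the diagnoses are listed above.
LABELS = [
    "atrial_fibrillation", "supraventricular_tachycardia", "ventricular_tachycardia",
    "cardiac_arrest", "atrioventricular_block", "unstable_angina", "stemi", "nstemi",
    "pulmonary_embolism", "hypertrophic_cardiomyopathy", "aortic_stenosis",
    "mitral_valve_prolapse", "mitral_stenosis", "pulmonary_hypertension", "heart_failure",
]


def encode_targets(diagnoses: set[str]) -> list[int]:
    """Map the set of diagnoses linked to one ECG to a 15-element 0/1 vector."""
    unknown = set(diagnoses) - set(LABELS)
    if unknown:
        # Fail loudly rather than silently dropping a misspelled diagnosis.
        raise ValueError(f"unknown diagnoses: {sorted(unknown)}")
    return [int(label in diagnoses) for label in LABELS]
```

For example, encode_targets({"stemi", "heart_failure"}) sets positions 6 and 14 to 1 and leaves the other 13 at 0.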

Model Development and Validation

The study compared two model types: a ResNet-based deep learning model trained on the ECG waveforms, and an extreme gradient boosting (XGB) model trained on ECG measurements. Both were used to predict the 15 diagnoses and were evaluated on a holdout set of 97,631 test patients.
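The summary does not give the ResNet's depth, kernel sizes, or channel counts. The pure-Python sketch below only illustrates the defining idea of a residual network on a single-channel signal: the block's input is added back to the output of two convolutions (an identity "skip" connection), which makes deep networks easier to train. Batch normalization, multiple channels, and downsampling, which practical ECG ResNets typically include, are omitted, and all function names are hypothetical.

```python
def conv1d_same(x: list[float], kernel: list[float]) -> list[float]:
    """Cross-correlate x with an odd-length kernel, zero-padding so len(output) == len(x)."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half  # input position aligned with kernel tap j
            if 0 <= idx < len(x):  # positions outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def relu(x: list[float]) -> list[float]:
    return [max(0.0, v) for v in x]


def residual_block(x: list[float], k1: list[float], k2: list[float]) -> list[float]:
    """y = relu(x + conv(relu(conv(x, k1)), k2)): two convolutions plus an identity shortcut."""
    h = relu(conv1d_same(x, k1))
    h = conv1d_same(h, k2)
    return relu([a + b for a, b in zip(x, h)])  # skip connection adds the input back
```

With all-zero kernels the block reduces to relu(x): the shortcut passes the signal through even when the convolutional path contributes nothing, which is the property that lets very deep residual stacks train.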

Detailed Description of Procedure

  1. Initial Data Processing: Extract ECGs from patients’ health records and link them to standardized administrative health databases.
  2. Model Training: Train the deep learning and XGB models on ECGs from 146,446 patients.
  3. Holdout Set Evaluation: Validate the models on a holdout set of 97,631 patients, assessing performance on the first ECG of each patient.
  4. Feature Importance Analysis: Use gradient-weighted class activation mapping (Grad-CAM) to visualize which parts of the ECG drive the deep learning model’s predictions, and information gain to rank feature importance in the XGB model.
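The training and holdout counts add up to the full cohort (146,446 + 97,631 = 244,077 patients), which is consistent with splitting at the patient level so that no patient contributes ECGs to both sets. The sketch below assumes that design, plus a "first ECG per patient" selection for evaluation. The EcgRecord fields, the ISO-8601 timestamp format, and the default holdout_fraction of 0.4 (roughly 97,631 / 244,077) are illustrative assumptions, not details taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class EcgRecord:  # hypothetical record layout
    patient_id: str
    acquired_at: str  # ISO-8601 in one fixed format/timezone, so string order is chronological
    ecg_id: str


def split_by_patient(records: list[EcgRecord], holdout_fraction: float = 0.4, seed: int = 0):
    """Assign whole patients (not individual ECGs) to train or holdout."""
    if not 0.0 <= holdout_fraction <= 1.0:
        raise ValueError("holdout_fraction must be in [0, 1]")
    patients = sorted({r.patient_id for r in records})  # sort for reproducibility
    random.Random(seed).shuffle(patients)
    holdout_ids = set(patients[: round(len(patients) * holdout_fraction)])
    train = [r for r in records if r.patient_id not in holdout_ids]
    holdout = [r for r in records if r.patient_id in holdout_ids]
    return train, holdout


def first_ecg_per_patient(records: list[EcgRecord]) -> list[EcgRecord]:
    """Keep only the earliest ECG of each patient."""
    first: dict[str, EcgRecord] = {}
    for r in records:
        current = first.get(r.patient_id)
        if current is None or r.acquired_at < current.acquired_at:
            first[r.patient_id] = r
    return list(first.values())
```

Splitting by patient rather than by ECG matters because a patient's repeated ECGs are highly similar; an ECG-level split would let the model be tested on near-copies of its training data and inflate the holdout AUROC.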

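Step 4 names two attribution methods. In the usual Grad-CAM recipe, each channel k of a convolutional layer's activations A_k(t) is weighted by the time-averaged gradient of the class score with respect to that channel; the weighted channels are summed, and a ReLU keeps only the regions that increase the score. Information gain measures how much a threshold split on one measurement reduces the entropy of a binary label. The sketch below implements both on plain lists. It assumes the activations and gradients have already been extracted from a trained network, omits the upsampling of the map back to the input length, and uses hypothetical function names. Note that XGBoost's built-in "gain" importance is computed from its own regularized, gradient-based split objective rather than from entropy, so this entropy version illustrates the concept rather than reproducing the paper's exact calculation.

```python
import math


def grad_cam_1d(activations: list[list[float]], gradients: list[list[float]]) -> list[float]:
    """Grad-CAM map over time for one class.

    activations[k][t]: channel k of a chosen conv layer at time step t.
    gradients[k][t]:   d(class score) / d(activations[k][t]), from backpropagation.
    """
    if len(activations) != len(gradients) or not activations:
        raise ValueError("need matching, non-empty activations and gradients")
    steps = len(activations[0])
    cam = [0.0] * steps
    for a, g in zip(activations, gradients):
        alpha = sum(g) / len(g)  # channel weight: gradient averaged over time
        for t in range(steps):
            cam[t] += alpha * a[t]
    return [max(0.0, v) for v in cam]  # keep only evidence that raises the score


def entropy(labels: list[int]) -> float:
    """Binary Shannon entropy, in bits, of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(values: list[float], labels: list[int], threshold: float) -> float:
    """Entropy reduction from splitting on values <= threshold versus > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted
```

A measurement whose threshold cleanly separates cases from non-cases yields a gain of 1 bit for a balanced label; one that separates nothing yields 0.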
Performance Evaluation

Performance on the 15 cardiovascular diseases was evaluated on the holdout set. Averaged across all diseases, the DL model's area under the receiver operating characteristic curve (AUROC) was approximately 5% higher than the XGB model's, with significant improvements for some diseases. The DL model performed best for STEMI (AUROC 95.5%) and worst for pulmonary embolism (AUROC 68.9%).
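AUROC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting as one half. The sketch below computes it from ranks (the Mann-Whitney U formulation) and averages it across diagnoses, which is one way to obtain a per-model average AUROC like the one compared above. It assumes binary 0/1 labels for each diagnosis and that every diagnosis has both positive and negative cases in the evaluation set; the function names are hypothetical.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """Rank-based AUROC; tied scores receive their average rank."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1  # extend the group of tied scores
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both positive and negative examples")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    # Mann-Whitney U for the positives, normalized by the number of pos/neg pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_auroc(per_label_scores: list[list[float]], per_label_labels: list[list[int]]) -> float:
    """Unweighted mean of per-diagnosis AUROCs."""
    values = [auroc(s, y) for s, y in zip(per_label_scores, per_label_labels)]
    return sum(values) / len(values)
```

Calling mean_auroc once with the DL model's scores and once with the XGB model's scores, against the same holdout labels, gives the two averages being compared.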

Gender and Pacemaker Analysis

The study also evaluated the DL model separately in male patients, female patients, and patients with pacemakers. Performance was consistent across these subgroups, with slightly better performance in males for certain diseases (e.g., ventricular tachycardia and STEMI). The presence of a pacemaker had minimal impact on performance.
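A subgroup analysis of this kind recomputes AUROC within each subgroup. The sketch below groups scores by an arbitrary key (for example "male"/"female" or "pacemaker"/"no pacemaker"). For brevity it uses the direct pairwise definition of AUROC, whose cost grows with positives times negatives and would be slow on a holdout set of this size, and it returns None for a subgroup that lacks either positive or negative cases. All names are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def pairwise_auroc(scores: list[float], labels: list[int]) -> Optional[float]:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined when a subgroup lacks positives or negatives
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_subgroup(scores: list[float], labels: list[int], groups: list[str]) -> dict:
    """Compute AUROC separately for each distinct value in groups."""
    buckets = defaultdict(lambda: ([], []))  # group -> (scores, labels)
    for s, y, g in zip(scores, labels, groups):
        buckets[g][0].append(s)
        buckets[g][1].append(y)
    return {g: pairwise_auroc(s, y) for g, (s, y) in buckets.items()}
```

Comparing the per-subgroup values (and, in practice, their confidence intervals) is what supports a statement that performance is "consistent" across sexes or pacemaker status.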

Main Results

  1. Model Effectiveness: The DL model achieved an AUROC above 80% for 12 of the 15 cardiovascular diseases, and above 90% for four of them (STEMI, mitral stenosis, hypertrophic cardiomyopathy, and atrioventricular block).
  2. Performance Improvement: The DL model outperformed the XGB model for most diseases, with particularly large improvements in detecting mitral stenosis and myocardial infarction.
  3. Model Robustness: Performance was consistent in both sexes and in patients with pacemakers, indicating that the algorithm is robust.

Conclusion

This study demonstrated the effectiveness and robustness of AI-driven ECG algorithms for diagnosing 15 cardiovascular diseases, with the DL model achieving higher diagnostic accuracy than the XGB model. By leveraging comprehensive administrative databases, the research highlights the substantial potential of machine learning for diagnosing common cardiovascular diseases and offers new tools for early diagnosis and risk stratification in clinical practice. Future work should examine how these models can be deployed in real-world clinical settings and how effective they are there.