Foundation Model of ECG Diagnosis: Diagnostics and Explanations
Research on ECG Diagnosis Foundation Model Based on Signal-Language Architecture
Academic Background
Cardiovascular disease (CVD) is the leading cause of death worldwide, and early identification of high-risk populations is crucial. The electrocardiogram (ECG), a non-invasive, cost-effective, and widely used diagnostic tool, is recorded over 300 million times annually and serves as a primary means for early diagnosis of CVD. However, even for experienced cardiologists, interpreting complex ECGs remains a time-consuming and error-prone task. In remote and underserved regions, providing accurate diagnoses is particularly challenging.
In recent years, artificial intelligence (AI) has shown considerable potential in ECG interpretation. Studies indicate that AI-based ECG diagnostics have already surpassed general cardiologists in diagnosing certain diseases. However, mainstream automatic ECG diagnosis systems are typically trained on closed datasets covering only a few diseases. Such models are difficult to apply directly to data from other centers, because data distributions differ between centers and multi-center datasets cover a much wider range of diseases. An automatic diagnostic system that works effectively after initial training, without needing further annotated data, would therefore be of practical importance for large-scale multi-center clinical environments, especially in remote and underserved areas.
Paper Source
This paper was co-authored by Yuanyuan Tian, Zhiyuan Li, Yanrui Jin, and others from the State Key Laboratory of Mechanical System and Vibration at Shanghai Jiao Tong University, the AI Institute, and the Department of Cardiology at Shanghai First People’s Hospital affiliated with Shanghai Jiao Tong University. The paper, titled Foundation Model of ECG Diagnosis: Diagnostics and Explanations of Any Form and Rhythm on ECG, was published on December 17, 2024, in the journal Cell Reports Medicine.
Research Process
1. Model Design and Training
The research team proposed a knowledge-enhanced ECG diagnosis foundation model (KED) based on a signal-language architecture. The model uses large language models (LLMs) to incorporate domain-specific knowledge about ECG signals. KED was trained on 800,000 ECGs from nearly 160,000 unique patients at a single center. Despite being trained on single-center data, the model demonstrated strong zero-shot diagnostic performance on external datasets from China, the United States, and other regions.
2. Datasets and Evaluation
The study used the MIMIC-IV-ECG clinical database for pre-training, which includes approximately 800,000 ECGs from the emergency department, inpatient (including ICU), and outpatient care centers of Beth Israel Deaconess Medical Center. To comprehensively evaluate the model’s diagnostic performance, the research team utilized five external datasets from China, the southeastern United States, and other regions, covering diverse ethnicities, age groups, and ECG acquisition devices.
3. Model Architecture
The KED framework consists of four main modules: an ECG signal encoder, a knowledge encoder, a label query network (LQN), and a classification head. For the training phase, the research team proposed a novel contrastive learning strategy, augmented signal-text-label contrastive learning (AugCL), which introduces a label dimension to construct independent contrastive spaces and thereby reduces noise in multi-label classification.
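As a rough illustration of how a label query network can connect the two encoders, the sketch below treats each label's text embedding as a query that attends over the ECG encoder's output tokens, and a shared classification head turns each label-specific feature into one logit. This is a minimal single-layer sketch under assumed shapes, not the KED implementation: the function names, the single attention layer, and the linear head `w_head` are all hypothetical, and random arrays stand in for real encoder outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def label_query_logits(signal_tokens, label_embeddings, w_head):
    """Hypothetical single-layer label query step.

    signal_tokens:    (T, D) features from an ECG signal encoder
    label_embeddings: (L, D) text embeddings from a knowledge encoder
    w_head:           (D,)   weights of an assumed linear classification head
    """
    dim = signal_tokens.shape[1]
    # Each label attends over all signal tokens (scaled dot-product attention).
    attention = softmax(label_embeddings @ signal_tokens.T / np.sqrt(dim))
    # One pooled feature vector per label.
    label_features = attention @ signal_tokens
    # One logit per label, scored independently (multi-label setting).
    logits = label_features @ w_head
    return logits, attention

# Random stand-ins for encoder outputs (assumed sizes).
rng = np.random.default_rng(0)
signal_tokens = rng.normal(size=(50, 16))
label_embeddings = rng.normal(size=(5, 16))
w_head = rng.normal(size=16)

logits, attention = label_query_logits(signal_tokens, label_embeddings, w_head)
probabilities = 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per label
```

Because every label is scored through its own query, adding a label at inference time only requires a new text embedding, which is one reason a design of this kind suits multi-label ECG diagnosis.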
4. Zero-Shot Diagnosis and Fine-Tuning
The research team evaluated the KED model’s performance in zero-shot diagnosis and few-shot fine-tuning. Zero-shot diagnosis refers to the model’s ability to diagnose samples from a new dataset without any additional training data, including for output classes that were not seen during training. The results showed that the KED model performed well on multiple external datasets covering diverse populations in China, the southeastern United States, and other regions, and could diagnose diseases not encountered during training.
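The general idea behind zero-shot diagnosis in a signal-language model can be sketched as follows: each candidate diagnosis is described in text, the description is embedded, and the ECG embedding is scored against every description. The example below is a simplified sketch that assumes cosine similarity followed by a sigmoid; the function name, the `temperature` parameter, and the toy embeddings are hypothetical and do not reproduce the scoring used by KED.

```python
import numpy as np

def zero_shot_scores(ecg_embedding, label_text_embeddings, temperature=0.1):
    """Score one ECG embedding against text embeddings of candidate labels.

    ecg_embedding:         (D,)   output of an assumed signal encoder
    label_text_embeddings: (L, D) embeddings of label descriptions
    Returns one score in (0, 1) per label; labels are scored independently.
    """
    ecg = ecg_embedding / np.linalg.norm(ecg_embedding)
    text = label_text_embeddings / np.linalg.norm(
        label_text_embeddings, axis=1, keepdims=True
    )
    similarities = text @ ecg  # cosine similarity per label
    return 1.0 / (1.0 + np.exp(-similarities / temperature))

# Toy 2-D embeddings: the first label points the same way as the ECG,
# the second is orthogonal to it, and the third points the opposite way.
ecg_embedding = np.array([1.0, 0.0])
label_text_embeddings = np.array([[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]])
scores = zero_shot_scores(ecg_embedding, label_text_embeddings)
```

Since the label set enters only through its text embeddings, a label never used in training can be scored simply by embedding its description.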
Key Results
1. Zero-Shot Diagnostic Performance in the Chinese Population
On ECG datasets from the Chinese population (CPSC2018 and Chapman), the KED model demonstrated strong performance. For example, on the CPSC2018 dataset, the model achieved a zero-shot diagnosis AUC of 0.900, sensitivity of 0.695, and specificity of 0.949 for abnormalities such as atrial fibrillation, premature beats, and conduction blocks. For ST segment depression (STD), which was not encountered during training, the model also showed diagnostic capability.
2. Zero-Shot Diagnostic Performance in the Southeastern United States Population
On the Georgia dataset, representing the southeastern United States population, the KED model achieved a zero-shot diagnosis AUC of 0.900, sensitivity of 0.696, and specificity of 0.925 for 20 types of ECG descriptions. For Q wave abnormalities (QAB) and T wave inversions (TINV), which were not encountered during training, the model also demonstrated diagnostic capability.
3. Zero-Shot Diagnostic Performance in Other Regional Populations
On the PTB-XL dataset, representing populations from other regions, the KED model achieved a zero-shot diagnosis AUC of 0.744, sensitivity of 0.623, and specificity of 0.768 for 46 ECG statements. For certain ischemic heart diseases not encountered during training, the model also showed diagnostic capability.
4. Comparison with Cardiologists
On the clinical dataset, the KED model’s zero-shot diagnostic performance was comparable to that of three cardiologists from top-tier hospitals in well-developed regions of China. For example, the model achieved an AUC of 0.994, sensitivity of 0.949, and specificity of 0.975 for atrial fibrillation, matching the diagnostic performance of the experts.
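The AUC, sensitivity, and specificity figures reported above follow standard definitions for binary classification, which can be sketched in a few lines. The example below uses made-up labels and scores purely for illustration, and the 0.5 decision threshold is an assumption; the paper's actual thresholds and per-class averaging are not described here.

```python
def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Made-up example: two positive and two negative recordings.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
sens, spec = sensitivity_specificity(y_true, scores)
area = auc(y_true, scores)
```

AUC is threshold-free, while sensitivity and specificity depend on the chosen cutoff, which is why a model can show a high AUC alongside a comparatively modest sensitivity.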
Conclusions and Significance
The KED model demonstrated exceptional zero-shot diagnostic capabilities and generalization performance, effectively diagnosing ECGs across different regions, ethnicities, and ECG acquisition devices, even for diseases not encountered during training. This capability holds significant application value in areas with limited medical resources, aiding in diagnosis and early rapid screening.
Furthermore, the KED model’s ability to adapt quickly to new ECG patterns through few-shot fine-tuning enhances its practical application potential. The signal-language architecture, knowledge enhancement methods, and contrastive learning strategies proposed by the research team provide new insights for the development of ECG diagnostic models.
Research Highlights
- Zero-Shot Diagnostic Capability: The KED model can diagnose diseases from categories not seen during training, without additional training data, demonstrating strong generalization ability.
- Applicability to Multi-Center Data: The model performed exceptionally well on data from different regions, ethnicities, and ECG acquisition devices, overcoming the limitations of traditional models.
- Few-Shot Fine-Tuning: With fine-tuning on a small amount of target center data, the model can quickly adapt to new ECG patterns, enhancing its flexibility in practical applications.
- Signal-Language Architecture: The signal-language architecture proposed by the research team uses text as a supervisory signal, leveraging the rich semantic knowledge of large language models to improve zero-shot transfer capabilities.
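One common way to realize the few-shot fine-tuning mentioned above is to keep the pretrained encoder frozen and fit only a small classifier on a handful of labeled recordings from the target center. The sketch below shows that idea as a logistic-regression probe trained by gradient descent. It is an assumption about how such adaptation could be done, not the procedure used for KED; the function names and toy embeddings are hypothetical.

```python
import numpy as np

def fit_linear_probe(embeddings, labels, lr=0.5, steps=500):
    """Fit logistic regression on frozen embeddings with gradient descent.

    embeddings: (N, D) features from an assumed frozen pretrained encoder
    labels:     (N,)   binary labels for one diagnosis
    """
    n_samples, dim = embeddings.shape
    weights = np.zeros(dim)
    bias = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(embeddings @ weights + bias)))
        error = probs - labels  # gradient of the log loss w.r.t. the logits
        weights -= lr * embeddings.T @ error / n_samples
        bias -= lr * error.mean()
    return weights, bias

def predict(embeddings, weights, bias):
    # Classify as positive when the logit is above zero (probability > 0.5).
    return (embeddings @ weights + bias > 0).astype(int)

# Four labeled "recordings" from a hypothetical target center.
few_shot_embeddings = np.array(
    [[2.0, 0.0], [1.5, 0.5], [-2.0, 0.0], [-1.5, -0.5]]
)
few_shot_labels = np.array([1.0, 1.0, 0.0, 0.0])

weights, bias = fit_linear_probe(few_shot_embeddings, few_shot_labels)
predictions = predict(few_shot_embeddings, weights, bias)
```

Training only the probe keeps the number of fitted parameters small, which helps avoid overfitting when just a few labeled examples are available.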
Other Valuable Information
The research team also discussed the limitations of the KED model, such as erroneous labels in training data and the hallucination risks of large language models. Future research will focus on collecting more manually annotated data and exploring more precise alignment methods between ECG and medical text knowledge to further expand the model’s application scope.
The KED model offers a new approach to ECG diagnosis with broad application prospects. It is particularly promising for areas with limited medical resources, where it could improve the efficiency and accuracy of ECG diagnosis.