An Explainable Transformer Model Integrating PET and Tabular Data for Histologic Grading and Prognosis of Follicular Lymphoma: A Multi-Institutional Digital Biopsy Study
Academic Background
Follicular Lymphoma (FL) is the most common indolent Non-Hodgkin Lymphoma in Western countries, accounting for approximately 30% of newly diagnosed cases. According to the World Health Organization (WHO) classification, FL is graded into three pathological levels (Grade 1–3) based on the number of centroblasts per high-power field (HPF). Grade 3 is further subdivided into Grade 3a and Grade 3b; Grade 3b exhibits more aggressive biological behavior and worse prognosis, and is treated similarly to Diffuse Large B-Cell Lymphoma (DLBCL). In contrast, Grade 1-2 disease often progresses slowly, and some asymptomatic, low-tumor-burden patients can even adopt a “watchful waiting” strategy. Grade 3a patients exhibit biological and clinical characteristics intermediate between Grade 1-2 and Grade 3b, and significant survival differences have been reported even within the same grade.
Currently, biopsy combined with immunohistochemistry remains the gold standard for determining the pathological grade of FL. However, this method is limited by sampling bias and the difficulty of obtaining specimens from deep anatomical locations. While ^18F-FDG PET/CT is widely utilized for staging FL, its role in grading has not been fully explored. Previous studies have attempted to leverage PET parameters like SUVmax (maximum standardized uptake value), TMTV (total metabolic tumor volume), and TLG (total lesion glycolysis) to distinguish between FL grades. However, these studies were constrained by small sample sizes and single-center data, raising concerns about their generalizability. Moreover, traditional metabolic parameters offer limited insights into tumor heterogeneity, making it difficult to fully capture biological variability.
In recent years, advances in Artificial Intelligence (AI) have facilitated the in-depth analysis of medical images, with deep learning models demonstrating remarkable potential for extracting granular pathological information from PET images. However, most AI models rely heavily on radiomic features or solely derive information from images, with limited integration of clinical data. Additionally, many existing AI models are “black boxes” that lack transparency, making it difficult to explain the rationale behind their predictions and challenging for clinicians to trust these techniques.
To address these challenges, this study aims to develop an explainable multimodal fusion Transformer model that integrates PET imaging and clinical data for accurate prediction of FL pathological grading and prognosis.
Source of the Study
The study was led by a research team from the Department of Nuclear Medicine at West China Hospital, Sichuan University, in collaboration with researchers from Nanjing Drum Tower Hospital, Shandong University Qilu Hospital, the First Affiliated Hospital of Nanjing Medical University, and the First Affiliated Hospital of Xiamen University. The paper, authored by Chong Jiang and Zekun Jiang, was published in the European Journal of Nuclear Medicine and Molecular Imaging and was made available online in January 2025.
Research Workflow
Data and Participants
The study included a cohort of 513 FL patients from five independent medical centers. Data were retrospectively analyzed and categorized into three groups by pathological grade: Grade 1-2, Grade 3a, and Grade 3b. The dataset was partitioned into training, internal validation, and external independent validation cohorts according to the geographical location of participating centers:
- Training set: 275 patients randomly drawn from West China Hospital, Nanjing Drum Tower Hospital, and Shandong University Qilu Hospital.
- Internal validation set: 69 patients.
- External validation set (independent): 169 patients from the First Affiliated Hospital of Nanjing Medical University and the First Affiliated Hospital of Xiamen University.
Data Preprocessing
- PET Image Processing: Lesions were semi-automatically delineated using a 41% SUVmax threshold to define each patient’s tumor volume of interest (VOI). Intensity normalization was applied to reduce variation across scanners, and VOIs were resized to a uniform dimension suitable for model input.
- Clinical Data Standardization: Nine clinical features, including age, LDH (lactate dehydrogenase), and B symptoms, were normalized for model training.
- Image Augmentation: To enhance model generalization and data diversity, PET images underwent augmentation techniques such as rotation and scaling.
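The thresholding and normalization steps above can be sketched in a few lines of NumPy. This is a minimal illustration only: the function names `segment_voi` and `normalize_intensity` are hypothetical, and the paper’s actual pipeline also includes resizing and augmentation, which are not shown.

```python
import numpy as np

def segment_voi(suv_volume: np.ndarray, threshold_frac: float = 0.41) -> np.ndarray:
    """Binary lesion mask: keep voxels at or above 41% of SUVmax (hypothetical helper)."""
    suvmax = suv_volume.max()
    return suv_volume >= threshold_frac * suvmax

def normalize_intensity(suv_volume: np.ndarray) -> np.ndarray:
    """Min-max normalization to [0, 1], reducing scanner-dependent intensity shifts."""
    lo, hi = suv_volume.min(), suv_volume.max()
    return (suv_volume - lo) / (hi - lo + 1e-8)
```

In practice the mask would be applied to crop the VOI before resizing it to the network’s fixed input shape.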
Model Design and Development
The proposed Transformer model comprises the following components:
1. Image Encoder: A 3D Swin Transformer architecture was employed for encoding features from PET images.
2. Tabular Data Encoder: A multi-layer perceptron (MLP) architecture encoded clinical tabular data.
3. Fusion Network: Cross-attention layers captured relationships between PET images and clinical data, while self-attention layers refined the fused features.
4. Classification Head: The final classification was performed through fully connected and non-linear layers, facilitating the three-grade prediction of FL.
Dropout regularization was applied to prevent overfitting. Additionally, dynamic learning rate adjustment was implemented via an AdamW optimizer and a cosine annealing scheduler.
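The cross-attention fusion step can be illustrated with a stripped-down sketch: queries come from the image tokens and keys/values from the clinical tokens, so each image feature is re-expressed as a weighted mixture of clinical features. The function `cross_attention` is hypothetical and omits the learned Q/K/V projections, multiple heads, and the Swin/MLP encoders of the actual model.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(img_tokens: np.ndarray, tab_tokens: np.ndarray, d_k: int):
    """Single-head, projection-free cross-attention sketch.

    img_tokens: (n_img, d) queries; tab_tokens: (n_tab, d) keys and values.
    Returns fused features (n_img, d) and the attention weights (n_img, n_tab).
    """
    scores = img_tokens @ tab_tokens.T / np.sqrt(d_k)  # scaled dot-product scores
    weights = softmax(scores, axis=-1)                 # rows sum to 1
    return weights @ tab_tokens, weights
```

The returned `weights` matrix is also what makes the later cross-attention contribution analysis possible, since each row shows how strongly an image token relied on each clinical feature.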
Model Explainability Mechanisms
To improve clinical interpretability, the model incorporated three explainability mechanisms:
1. Grad-CAM (Gradient-weighted Class Activation Mapping): Heatmaps were generated to visualize the regions in PET images most relevant to predictions.
2. SHAP (Shapley Additive Explanations) Analysis: Importance scores for clinical features were calculated to clarify their contributions to predictions.
3. Cross-attention Weight Analysis: Contribution ratios of PET images and tabular data were quantified within the decision-making process.
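SHAP proper requires the `shap` library and access to the trained model. As a lightweight stand-in that conveys the same idea of quantifying each clinical feature’s contribution, here is a permutation-importance sketch: shuffle one feature at a time and measure how much accuracy drops. The function `permutation_importance` is hypothetical and is not the method used in the paper.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Model-agnostic importance: mean accuracy drop when feature j is shuffled."""
    rng = np.random.default_rng(seed)
    base = (predict(X) == y).mean()          # accuracy on intact data
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # destroy feature j only
            drops.append(base - (predict(Xp) == y).mean())
        scores[j] = np.mean(drops)
    return scores
```

Features whose shuffling hurts accuracy most (e.g., age or SUVmax in this study’s SHAP results) receive the highest scores.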
Key Findings
Model Performance
The proposed fusion Transformer model demonstrated outstanding performance in pathological grade classification, achieving AUC values >0.9 in the training, internal validation, and external validation cohorts. In the external validation cohort:
- Grade 1-2 prediction: AUC = 0.936, accuracy = 86.4%
- Grade 3a prediction: AUC = 0.927, accuracy = 88.2%
- Grade 3b prediction: AUC = 0.994, accuracy = 97.0%
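For a three-class problem like this, per-grade AUCs are typically computed one-vs-rest from the predicted class scores. A minimal sketch using the Mann-Whitney formulation of AUC follows; the function `auc_ovr` is hypothetical, not the paper’s evaluation code.

```python
import numpy as np

def auc_ovr(scores: np.ndarray, labels: np.ndarray, positive_class) -> float:
    """One-vs-rest AUC via the Mann-Whitney U statistic (ties count 0.5).

    scores: predicted score for `positive_class` per sample; labels: true classes.
    """
    y = labels == positive_class
    pos, neg = scores[y], scores[~y]
    diff = pos[:, None] - neg[None, :]          # all (positive, negative) pairs
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())
```

Perfectly separated scores yield AUC = 1.0; a score column carrying no information about the class yields about 0.5.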
Ablation experiments confirmed that the fusion model significantly outperformed single-modality models based solely on PET or clinical data.
Clinical Interpretability and Prognostic Analysis
- Grad-CAM Heatmaps: The model primarily focused on tumor core regions while capturing some peri-tumoral information, consistent with clinical knowledge.
- SHAP Analysis: Age and SUVmax emerged as the most important predictors of FL grades, with age being strongly associated with the aggressiveness of FL.
- Prognostic Stratification: Kaplan-Meier survival curves based on model predictions revealed significant disparities in progression-free survival (PFS) across different grades, with Grade 1-2 patients having the best prognosis and Grade 3b the worst.
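The Kaplan-Meier estimate underlying such PFS curves can be sketched in a few lines. The function `kaplan_meier` below is a hypothetical minimal implementation; in practice a library such as lifelines would be used, along with a log-rank test for the between-grade comparison.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up times (e.g., months to progression or censoring);
    events: 1 if progression observed, 0 if censored.
    Returns a list of (time, survival_probability) points.
    """
    t = np.asarray(times, dtype=float)
    e = np.asarray(events, dtype=int)
    order = np.argsort(t)
    t, e = t[order], e[order]
    surv, curve = 1.0, []
    n_at_risk = len(t)
    for time in np.unique(t):
        d = int(((t == time) & (e == 1)).sum())  # events at this time
        if d > 0:
            surv *= 1.0 - d / n_at_risk          # KM product-limit step
        curve.append((float(time), surv))
        n_at_risk -= int((t == time).sum())      # drop events and censored
    return curve
```

Computing one curve per predicted grade and plotting them together reproduces the kind of stratification reported in the study.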
Research Significance and Highlights
- Innovative Model Architecture: This is one of the first studies to adopt Transformer-based architectures for integrating PET images and clinical data, overcoming limitations of single-modality analyses.
- Clinical Interpretability: By leveraging Grad-CAM and SHAP techniques, the study provides transparent insights into model decision-making, offering reliable guidance for clinicians.
- Generalizability: The multicenter validation supports the model’s applicability across diverse geographical regions and scanner environments.
- Practical Value: The model delivers accurate pathological grading, enabling non-invasive strategies for early prognostic interventions among high-risk patients.
Conclusion
This study successfully developed and validated an explainable multimodal fusion Transformer model for non-invasive grading and prognosis prediction of FL. The model not only exhibits superior performance but also significantly enhances clinical applicability through its explainability mechanisms. It represents a groundbreaking advancement in leveraging AI to support decision-making in precision medicine.