Holistic AI analysis of hybrid cardiac perfusion images for mortality prediction

Introduction

Myocardial perfusion scintigraphy is widely used for the evaluation of coronary artery disease (CAD), with over 15–20 million scans performed worldwide^1,2. During myocardial perfusion imaging (MPI), a low-dose non-contrast computed tomography attenuation correction (CTAC) scan is often used to correct for soft-tissue attenuation, leading to improved diagnostic accuracy^3,4. Attenuation correction by computed tomography (CT) is recommended by American Society of Nuclear Cardiology guidelines⁵. Although the myocardium is the structure of principal interest during SPECT/CT MPI, its CTAC scan provides a wealth of additional information about other visible organs. Incidental findings have been reported in up to 59.5% of SPECT/CT MPI studies, of which some are clinically important and necessitate further diagnosis and treatment^6,7.

However, due to limitations in the quality of CTAC images (low dose, no electrocardiographic gating), detection and characterization of abnormal findings on CTAC can be challenging⁸. Consequently, the additional information present in hybrid cardiac scans is often underutilized during clinical reporting. While some methods have been developed to derive information about coronary artery calcium (CAC) and epicardial adipose tissue (EAT) from CTAC scans^9,10, many other potentially clinically important features, like extracardiac structures, are present in these scans, yet to date their added value to MPI has not been systematically evaluated (Supplementary Table 1).

The aim of this study is to develop a holistic artificial intelligence (AI)-based approach for the prediction of all-cause mortality from SPECT/CT MPI utilizing all possible information contained in the hybrid images and to separately evaluate the value of CTAC images for this purpose, which have been previously underutilized.

Results

Patient Characteristics

In total 10,983 participants from 4 sites were enrolled in the REFINE SPECT registry, of which 500 CTAC scans from one site were used for EAT-model training and validation. Of the 10,483 remaining participants, 3 were excluded due to incomplete CTAC scans. The final study cohort consisted of 10,480 participants (Fig. 1, Supplementary Fig. 1).

Holistic AI analysis of hybrid cardiac perfusion images for mortality prediction — **Fig. 1: Central illustration.**

Table 1 represents baseline characteristics stratified by sex. Of all participants, 5745 (54.8%) were male, and median age was 65 with an interquartile range (IQR) of (57, 73) years. During the median 2.9-year (IQR 1.6–4.0) follow-up period, 651 (6.2%) patients died. Table 2 shows baseline characteristics stratified by ACM. Normal myocardial perfusion was present in 7329 (69.9%) patients, of whom 345 (4.7%) died. Patients with normal perfusion were significantly younger (p < 0.001), more often female, and less often diagnosed with hypertension (p < 0.001), diabetes (p < 0.001), and dyslipidemia (p < 0.001) (Supplementary Table 2).

Table 1 Baseline characteristics for all participants stratified by sex

Full size table

Table 2 Baseline characteristics for all participants stratified by all-cause mortality (ACM)

Full size table

Myocardial Imaging Perfusion Quantitative Image Analysis Parameters

In all patients, the median TPD was 2.6% (0.9–6.0) and was higher in male than female patients (2.7 vs. 2.5, respectively, p < 0.001) (Table 1). Significantly lower stress ejection fraction was observed in men compared with women (59% vs. 70%, respectively, p < 0.001). The median TPD in patients with abnormal perfusion was 8.9 (6.5, 14.2), whereas the median stress ejection fraction in this group was 57 (46, 67) (Supplementary Table 2).

Coronary artery calcium and epicardial adipose tissue

CAC was 0 in 3,753 (35.8%) patients, >0–100 in 1982 (18.9%), >100–400 in 1462 (14.0%), and >400 in 3283 (31.3%) subjects. The median EAT volume and density were 130 mL (90, 183) and −65 HU (−70, −61), respectively (Table 1).

In patients with normal perfusion, 2903 (39.6%) subjects had no CAC, 1515 (20.7%) had CAC > 0 and ≤100, 1029 (14.0%) had CAC > 100 and ≤400, and 1882 (25.7%) had CAC > 400. The median EAT volume and density in patients with normal perfusion were 129 mL (89, 181) and −65 HU (−70, −61), respectively (Supplementary Table 2).

Model performance

Figure 2 represents the model performance and feature importance for mortality in all patients, subjects with normal perfusion, and patients without calcified lesions in coronary arteries. The lungs were the top feature in all patients, in patients with normal perfusion as well as in subjects without coronary calcifications. Supplementary Fig. 2 shows feature importance plots stratified by different sites and image quality. For all AI models in all patients included in the study, AUCs with 95% confidence interval (CI) are shown in Supplementary Table 3. There was a better performance of the AI CTAC model (AUC 0.78, 95% CI 0.71–0.85) than the EAT model (AUC 0.56, 95% CI 0.49–0.63, p < 0.001), and coronary calcium (AUC 0.64, 95% CI 0.57–0.71, p < 0.001) alone. There was a small but statistically significant difference in the prediction performance of the AI hybrid model and the CTAC model (AUC 0.79 vs. 0.78, p < 0.001). Additionally, the AI CTAC model outperformed the AI SPECT model (AUC 0.78 vs 0.65, p < 0.001).

AUCs with 95% CI for all AI models in patients with normal myocardial perfusion are shown in Supplementary Table 4, whereas in subjects with no coronary calcium in Supplementary Table 5. In the group with normal perfusion, the performance of the AI CTAC model was significantly better compared to Perfusion (AUC 0.76 vs. 0.53, respectively, p < 0.001). The AI hybrid model incorporating CTAC and MPI features had similar prediction performance compared to the AI CTAC-only model (AUC 0.76 vs. 0.76, respectively, p = 0.384). Among the patients with no calcium, the AI CTAC model significantly outperformed Perfusion (AUC 0.71 vs. 0.59, respectively, p < 0.001). The AI hybrid model was significantly better than AI CTAC-only model (AUC 0.75 vs 0.71, respectively, p < 0.001). Models were also evaluated across different sites and acquisition protocols, as shown in Supplementary Table 6. The AI model demonstrated consistent performance regardless of the acquisition protocols. However, Columbia and Ottawa showed significantly lower performance compared to Yale (Supplementary Table 6).

A subgroup analysis was performed using the best model (All model) across the following categories: white race, black race, female, male, older ( ≥ 65 years), and younger ( < 65 years). Due to limited data for other racial groups, the race-based subgroup analysis was restricted to black and white individuals. Our findings indicate that the All model demonstrates comparable performance across both male and female groups (AUC: 0.77 vs 0.79, p = 0.08) in Supplementary Fig. 3. Furthermore, the model exhibited better performance in individuals aged <65 years compared to those aged ≥65 years (AUC: 0.79 vs 0.74, p = 0.16). The difference between for the Black group and the White group while numerically different (AUC: 0.70 vs. 0.84, p = 0.74) did not reach statistical significance, with few events in the Black population. Due to limited data available about patient race, only subset of the cohort could be studied with limited number of events and the study is likely underpowered for such comparison.

Association with outcomes and multivariable model

Kaplan-Meier Curves stratified by TPD (ischemia <10% and ≥10%), and a matched proportion of patients with high and low AI scores (AI threshold at 0.17, high risk in 4.13%) are shown in Fig. 3. AI score led to an improved risk reclassification of patients who experienced mortality (15.1%, 95% CI 11.4–18.8, p < 0.001) and patients who did not experience mortality (1.0%, 95% CI 0.5–1.5, p < 0.001), with an overall net reclassification improvement of 16.1% (95% CI 12.4–19.8, p < 0.001). The stability of the AI threshold was assessed by inspecting the hazard ratios (HR) of the AI threshold in high-risk categorization across different subgroups in Supplementary Table 7. Notably, the mean adjusted HRs in all subgroups are above 4.

**Fig. 3: Kaplan-Meier (KM) curves stratified by total perfusion deficit (TPD).**

Supplementary Fig. 4 illustrates findings of multivariable analyses. After adjusting for age, sex (male), hypertension, dyslipidemia, diabetes mellitus, peripheral vascular disease, past myocardial infarction, and family history of CAD, patients with abnormal perfusion were at higher risk of death compared to patients with normal myocardial perfusion (adjusted HR 1.71, 95% CI 1.46–2.01, p < 0.001). Moreover, CAC > 400 (adjusted HR 2.11, 95% CI 1.67–2.65, p < 0.001) was associated with an increased risk of death.

Structure specific risk evaluation

Examples of patients classified to be at a higher risk of death (with extracardiac structures, notably the lungs and aorta, contributing the most to mortality) are shown in Fig. 4, Supplementary Figs. 5 and 6.

Fig. 4: Example of a patient undergoing single-photon emission computed tomography/computed tomography (SPECT/CT) myocardial perfusion imaging with an extracardiac structure increasing the highest risk of all-cause mortality.

Discussion

In this study, we have demonstrated the potential value of holistic anatomic, functional, and clinical evaluation of CTAC scans for improving all-cause mortality prediction in patients undergoing hybrid perfusion MPI. We developed a fully automated AI model incorporating multi-structure segmentation and radiomic feature extraction in parallel to deep learning-based CAC and EAT quantification. This model improves mortality prediction from multimodality myocardial perfusion, with a combined model improving upon any feature set (SPECT, CTAC, or clinical) in isolation. Moreover, it provides physicians with guidance regarding portions of CTAC scans which require further scrutiny to identify potentially important underlying conditions indicating potentially significant incidental findings, despite coronary artery disease being the primary indication for the examination. This fully automated workflow could be leveraged by physicians to unlock the full potential of hybrid SPECT/CT imaging.

Several studies have proven the role of AI in predicting mortality and cardiovascular events from cardiac imaging (Supplementary Table 1), only few of these studies were utilizing hybrid MPI^11,12, and CTAC data^13,14. None of the studies of the cardiovascular data considered comprehensively all organs in the field-of-view for the analysis. Moreover, only a limited number of CTAC findings, like CAC¹³ or EAT¹⁰ were included in these previous analyses. More recently we demonstrated that deep learning cardiac chamber volumes (from CTAC) provided incremental and complementary value to CAC and SPECT variables¹⁵. Ashrafinia et al. used radiomic features from SPECT MPI to predict CAC score derived from CT scans¹⁶, whereas Amini et al. applied a quantitative image analysis approach not only to diagnose CAD, but also for risk classification¹⁷. The proposed AI approach integrates simultaneous assessment of multiple structures on CTAC by leveraging strengths of deep learning and quantitative image analyses. Importantly, the model incorporating SPECT, CTAC, and clinical data had the highest prediction performance suggesting that AI-derived information encrypted in CTAC is complementary to traditional methods for analysis.

By integrating functional imaging (SPECT) with anatomic characteristics (CT), hybrid imaging has not only enhanced nuclear medicine by improving diagnostic accuracy¹⁸, but also provides an enormous amount of data contained in CTACs. This improvement was also observed in the performance of our model — the model including only perfusion and functional features performed significantly lower than the hybrid model (incorporating CTAC and SPECT data) or even the AI CTAC model alone. Moreover, the integration of clinical and imaging information improved the performance of the model in predicting the risk of death, which reflects the need for a holistic approach in patients’ diagnosis and radiology reporting¹⁹. While the 2024 ESC Guidelines for the Management of Chronic Coronary Syndromes recommend CAC scoring from CTACs to improve the detection of nonobstructive and obstructive CAD²⁰, there is significantly more information in CTAC images beyond CAC that is not currently utilized. As demonstrated in this study, the highest feature importance score for predicting mortality was reported for the lungs. Although ischemic heart disease is the leading cause of mortality worldwide, the total number of lives lost due to respiratory diseases is still higher^21,22. Incidental findings are frequently detected also on CTACs^6,7, some of which may be clinically significant and require further diagnosis and treatment^23,24,25. This underlines the need for a scrutinized evaluation of exams in patients undergoing diagnostic imaging for various reasons. For example, some respiratory diseases, like lung cancer and chronic obstructive pulmonary disease, share the same risk factors as CAD²² and early detection of potentially significant incidental findings might be lifesaving. AI-systems like the one proposed in our study could aid clinicians in these tasks.

This study has some limitations. It was a retrospective study with non-uniform CTAC acquisition protocols from multiple sites, however, this highlights the generalizability of the approach. Some organs (like the pancreas) were only partially visible or not visualized on all scans, whereas organs like kidneys and thyroid were excluded from the analysis because of their high missingness ( > 20%) across the cohort. For a more holistic approach and more accurate mortality prediction, organs with missingness <20% were included, however, this could influence model accuracy since, in some cases, fewer features were included. This large, multicenter registry does not include information on the reported cause of death, limiting our ability to evaluate the associations between specific extracardiac organ features and cause-specific mortality. Additionally, while SHAP and XGBoost are widely used for model explainability, their results can be subtly influenced by feature correlations and training data quality, highlighting the need for careful interpretation and oversight by clinicians. Another limitation of this study is the limited racial data, restricting subgroup analysis to Black and White individuals. Finally, radiological evaluation of CTACs was performed only with radiomic features, and no information regarding reported incidental findings is available in this cohort.

We demonstrate a significant, yet underappreciated, role of CTAC in risk stratification with MPI SPECT/CT. Fully automated AI integration of quantitative features from multiple organs derived from CTAC, perfusion and clinical data images significantly improves mortality risk stratification in patients undergoing SPECT/CT MPI as compared to MPI only.

Material and methods

Study population

In this retrospective study we utilized CTAC scans of patients who underwent SPECT/CT MPI from 4 sites (University of Calgary, Yale University, Columbia University, University of Ottawa Heart Institute) participating in the Registry of Fast Myocardial Perfusion Imaging with Next generation SPECT (REFINE SPECT)²⁶. The cohort included consecutive patients at each center referred for SPECT imaging, with scans performed between 2009 and 2021. The study protocol complied with the Declaration of Helsinki and was approved by the institutional review boards (IRBs) at each participating institution, including the University of Calgary (Conjoint Health Research Ethics Board), Yale University School of Medicine (Human Research Protection Program, Institutional Review Boards), University of Ottawa Heart Institute (Ottawa Health Science Network Research Ethics Board), and Columbia University Irving Medical Center (Human Research Protection Program, Institutional Review Boards). The investigators ensured that the institutional ethics committee at each center evaluated and approved the study protocol before data collection and transfer. The overall study was approved by the institutional review board at Cedars-Sinai Medical Center (Office of Research Compliance and Quality Improvement). Sites either obtained written informed consent or waiver of consent for the use of the de-identified data. To the extent allowed by data sharing agreements and institutional review board protocols, the data and code from this manuscript will be shared upon written request. Baseline demographic and clinical characteristics were obtained from the REFINE SPECT registry²⁶. CTAC image acquisition at each participating site is shown in Supplementary Table 8. The outcome was all-cause mortality (referred to subsequently simply as “mortality”), which was determined using the national death index for sites in the United States and administrative databases in Canada.

Myocardial perfusion image analysis

Total perfusion deficit (TPD), end-diastolic stress shape index (ratio between the maximum left ventricular (LV) diameter in short axis and the length of the LV in end-diastole at stress), stress ejection fraction, and end-diastolic volume were quantified automatically from non-attenuation-corrected MPI scans at the core laboratory (Cedars-Sinai Medical Center, Los Angeles) with the use of dedicated software (Quantitative Perfusion SPECT [QPS] software, Cedars-Sinai Medical Center, Los Angeles)²⁷. Normal myocardial perfusion was defined as stress TPD < 5%²⁸, whereas moderate-to-severe ischemia was defined as TPD ≥ 10% of the myocardium²⁹.

Multi-structure deep learning feature extraction from CTAC

The study design is shown in Fig. 1. TotalSegmentator, a multi-structure segmentation deep learning (DL) model, was used to segment structures visible on CTAC³⁰. Out of all segmented structures, we selected thirty-three structures with a frequency of >80% on all scans (Supplementary Fig. 7). The automatic extraction of imaging features for all selected structures was performed with PyRadiomics package (version 3.0.1)³¹. In per-organ analysis, we included eleven first-order and four 3D features which are clinically relevant and have straightforward clinical interpretation (Supplementary Tables 9-10).

One primary goal of this study was to create a simple, explainable model with high predictive power. We selected 15 radiomic features (11 first-order statistical and 4 3D shape-based) defined by PyRadiomics for their strong signal specificity and clinical relevance^32,33,34,35. Grey-level features were excluded as they are deprecated in newer radiomics versions³⁶. Further, we conducted a comparison between the performance of the models created with all calculated 32 radiomic features and the subset of clinically interpretable 15 radiomic features (for the names of these selected features please see Supplementary Table 9). There were no statistically significant differences in performance between the models using all 32 radiomic features and those using 15 features for the All and AI CTAC models (p = 0.09 and p = 0.40, respectively, Supplementary Table 10). Additionally, for the AI hybrid model, the 15-feature subset performed significantly better than the full 32-feature set (see Supplementary Table 10 for AUCs, confidence intervals, and p-values). This supports our decision to use the clinically interpretable 15-feature subset, as it simplifies the model without compromising performance and, in some cases, enhances it.

Automated coronary artery calcium scoring

Our formerly validated deep learning model was used for CAC segmentation and scoring^37,38. To segment heart mask and CAC on CTAC images, two convolutional long short-term memory (convLSTM) networks were tested externally on data (10,480 CTAC scans) from 4 different sites. To automatically obtain CAC scores from the deep learning segmentation, established methods were used³⁹.

Automated epicardial adipose tissue scoring

A previously developed deep learning model was used to estimate EAT volume and density (−190 and −30 Hounsfield units [HU]) from CTAC scans¹⁰. For EAT model training and validation purposes, we used 500 CTAC scans from one site (Yale University). Patients who were used for EAT model training and validation were not included in this analysis.

Classification models

Extreme Gradient Boosting (XGBoost) models (version 1.7.3), a currently leading machine learning method, were used for mortality classification³³. These models generate all-cause mortality risk scores by applying 10-fold cross-validation regimen across the entire dataset. Within each fold, 90% of the data was first set aside for model training and validation. This 90% was further divided, with 80% used for training and 20% for validation. The remaining 10% of the data in each fold was used for testing and kept separate from training and validation to ensure each patient was tested exactly once across all folds. 10 separate models were built, and each was tested independently. Testing results were concatenated from all models for the overall performance evaluation. Hyper-parameter tuning to optimize the model parameters was conducted during training and validation, separately in each fold using the grid-search method.

Key benefits of employing 10-fold cross-validation include: 1) reducing variability of prediction errors for more accurate evaluation⁴⁰; 2) maximizing data utilization while minimizing overfitting and cross-contamination of information among data splits⁴¹; 3) ensuring each data point contributes to the test set exactly once, providing independent and non-overlapping predictions for robust performance evaluation⁴²; 4) meeting the DeLong test requirements for valid AUC comparisons by using independent predictions⁴³.

Models

Five models were used for the mortality endpoint: 1 – model incorporating DL-EAT (EAT), 2 – model combining quantitative CTAC image analysis of all segmented structures [radiomics], DL-EAT and DL-CAC (AI CTAC), 3 – model incorporating stress ejection fraction, stress end-diastolic volume, stress shape index end-diastolic, stress TPD, and other SPECT imaging features (in total 22 features) [see Supplementary Table 11] (AI SPECT), 4 – model incorporating all variables included in the AI CTAC model as well features included in the AI SPECT model (AI hybrid), 5 – model combining CTAC, MPI and clinical data (All), whereas Coronary calcium (DL-CAC score) and Perfusion (utilizing stress TPD) were univariate comparisons.

Clinical data include patient demographics such as age, sex, body mass index (BMI). Also included is past medical history: hypertension, diabetes, dyslipidemia, prior CAD (prior myocardial infarction, percutaneous coronary intervention [PCI], and coronary artery bypass graft [CABG]). Further, the clinical data encompass variables from stress test such as the type of test, peak stress heart rate, peak stress blood pressure, and ECG response to stress.

Model explainability

The predictive power of variables included in model training was evaluated using XGBoost feature importance, which quantifies the increase in accuracy resulting from the addition of each feature. SHapley Additive explanations (SHAP), a game-theoretic feature importance method, was used to explain how structures contributed to the overall risk in model inference for individual patients⁴⁴.

Thresholds for comparisons of machine learning

Patients were classified into low or high-risk groups based on AI-derived all-cause mortality risk score. This classification was achieved by setting a threshold that aligns with the proportion of patients identified by the established clinical criteria for ischemia ( ≥ 10%)^45,46.

Statistical analysis

Continuous variables with a normal distribution are presented as mean ± standard deviation (SD) and not normally distributed variables as medians with interquartile range (IQR) [IQ1-IQ3]. Categorical variables are expressed as count and relative frequencies (percentages). Differences between categorical variables were compared by the Pearson’s χ2 test whereas continuous variables were compared by Wilcoxon Mann-Whitney test, as appropriate. The performance of the models was evaluated using receiver-operating characteristics analysis, and area under the receiver-operating characteristic (AUC) analysis values were compared with the DeLong test⁴⁷. Kaplan-Meier survival curve, alongside univariate Cox proportional hazard models, were employed to evaluate the association with mortality. Log-rank test was used to ascertain the statistical significance. The improvement in model predictions was measured using the time-dependent net reclassification improvement score at 2 years⁴⁸. Confidence intervals were calculated by the percentile bootstrap method. A two-tailed p-value of <0.05 was considered statistically significant. All statistical analyses were performed with Pandas (version 2.1.1) and Numpy (version 1.24.3), Scipy (version 1.11.4), Lifelines (version 0.28.0) and Scikit-learn (version 1.3.0) in Python 3.11.5 (Python Software Foundation, Wilmington, DE, USA), as well as “nricens” package (version 1.6) in R version 4.3.2 (R Foundation for Statistical Computing, Vienna, Austria).