Establishment and validation of a ResNet-based radiomics model for predicting prognosis in cervical spinal cord injury patients

Introduction
Patients with cervical spinal cord injury (cSCI) present with varying degrees of paralysis1,2,3. Generally, patients with mild paralysis in the early stages tend to have better recovery potential4,5. However, even those with severe early paralysis can experience significant recovery, complicating prognostic predictions6. This prognostic uncertainty poses challenges for subsequent treatment decisions and rehabilitation choices4,5,6. To address this issue, the International Association of Neurorestoratology (IANR) introduced the Spinal Cord Injury Functional Rating Scale7 in 2019. This comprehensive scale assesses the daily living activities and quality of life of cSCI patients, providing a quantitative measure to help evaluate their prognosis.
Over the past decade, deep learning has made remarkable advancements and become a core technology in the field of artificial intelligence8,9,10,11. This progress is mainly attributed to enhanced computational power, the availability of large datasets, and improved algorithms10. Deep learning has achieved groundbreaking success in image recognition, natural language processing, and speech recognition8,9,10,11. Among these advancements, the Residual Network (ResNet) technology stands out as a deep learning network architecture that addresses the issues of vanishing and exploding gradients in deep network training by introducing “skip connections.” These skip connections allow information to jump across layers, effectively bypassing one or more layers, which facilitates the training of deeper networks12,13,14. ResNet has demonstrated outstanding performance in various image classification tasks and become the foundation for many modern deep learning models12,15,16,17,18.
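The skip connection described above can be illustrated with a minimal NumPy sketch. It is not the ResNet architecture itself: the function name `residual_block`, the two-layer fully connected branch, and the weight shapes are assumptions chosen for illustration, whereas real ResNet blocks use convolutions and batch normalization.

```python
import numpy as np

def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Return relu(x + F(x)), where F is a small two-layer transform.

    Hypothetical toy block: the skip connection adds the input x back
    onto the output of the residual branch F(x).
    """
    fx = relu(x @ w1) @ w2   # residual branch F(x)
    return relu(x + fx)      # skip connection: input bypasses the branch

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                  # 4 samples, 8 features
w1 = rng.normal(scale=0.1, size=(8, 8))
w2 = rng.normal(scale=0.1, size=(8, 8))
y = residual_block(x, w1, w2)

# With a zero residual branch the block reduces to relu(x), so the input
# signal still passes through unchanged apart from the final activation.
zeros = np.zeros((8, 8))
identity_out = residual_block(x, zeros, zeros)
```

Because the input is added back after the branch, a block whose weights contribute nothing still passes the signal forward, which is one intuition for why very deep stacks of such blocks remain trainable.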
Currently, there is a relative scarcity of radiomics research focused on cSCI. In one retrospective study, 43 features were extracted from MRI images and clinical data to predict the 6-month post-injury American Spinal Injury Association (ASIA) grade in cSCI patients. Using the XGBoost algorithm, the model achieved an accuracy of 81.1%19. However, the study had notable limitations: it relied on a limited set of predefined imaging features and did not include high-throughput radiomic features, potentially restricting its predictive performance. In contrast, another retrospective study by Okimatsu et al. employed deep learning to construct a Convolutional Neural Network (CNN) model for patients with cSCI within one month post-injury. The model was trained and validated on 294 sagittal T2-weighted MRI scans and used a random forest (RF) to predict neurological outcomes by incorporating patient age and initial ASIA grade. The model achieved accuracy, precision, recall, and F1 scores of 0.714, 0.590, 0.565, and 0.567, respectively20. While the results suggest the feasibility of using MRI images and machine learning for predicting neurological recovery in cSCI patients, this study had limitations as well. The images used for training were rectangular regions that encompassed not only the spinal cord but also surrounding structures, introducing significant noise and reducing the model’s overall performance. Moreover, the short follow-up period limited the assessment of long-term recovery.
We aimed to develop a composite model that integrates imaging and clinical features to predict the prognosis of cSCI patients six months post-injury and to evaluate the model’s application value. By combining radiomic features and deep learning techniques, we aspire to provide a more accurate and reliable prognostic tool for cSCI patients, thereby supporting clinical decision-making and optimizing rehabilitation strategies.
Methods
Study population
Patient baseline characteristics are shown in Table 1. The retrospective clinical cohort included 168 patients with cervical spinal cord injury (cSCI) who received treatment at Zhongda Hospital from January 1, 2018, to June 30, 2023, and was randomly divided into training and testing sets. The prospective clinical cohort included 43 patients with cSCI who received treatment at Zhongda Hospital from July 1, 2018, to November 30, 2023, and served as the validation set. Approval for the study was granted by the Zhongda Hospital Ethics Committee, with the ethical approval number 2023ZDSYLL137-P01.
The research was conducted in compliance with the Declaration of Helsinki. Informed consent was obtained from all participants, allowing the use of their data for research purposes. Patients with an IANR score ≥ 37 were assigned to the good prognosis group, and those with an IANR score < 37 were assigned to the poor prognosis group.
Inclusion criteria for the study were as follows: patients over 18 years of age, diagnosed with cSCI, who underwent standardized treatment, and proceeded with preoperative protocols. Exclusion criteria were incomplete MRI sequences, spinal cord injury lesions that were not visible on MRI, loss to follow-up, or incomplete clinical data. The retrospective dataset was randomly split into a training set and a testing set at an 8:2 ratio.
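The grouping rule and the 8:2 random split can be sketched as follows. This is a minimal illustration rather than the study's actual code: the helper names `label_prognosis` and `split_8_2`, the random seed, and the example scores are assumptions; only the threshold of 37, the 8:2 ratio, and the cohort size of 168 come from the text.

```python
import numpy as np

IANR_THRESHOLD = 37  # from the text: score >= 37 defines a good prognosis

def label_prognosis(scores):
    """Return 1 for good prognosis (score >= 37) and 0 for poor prognosis."""
    return (np.asarray(scores) >= IANR_THRESHOLD).astype(int)

def split_8_2(n_patients, seed=0):
    """Randomly split patient indices into training (80%) and testing (20%).

    The seed is a hypothetical choice made only for reproducibility.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_patients)
    n_train = int(n_patients * 0.8)  # rounds down, e.g. 134 of 168
    return order[:n_train], order[n_train:]

labels = label_prognosis([12, 37, 45, 36])   # made-up example scores
train_idx, test_idx = split_8_2(168)
```

With 168 patients this yields 134 training and 34 testing indices with no overlap. A stratified split that preserves the ratio of prognosis groups would be a reasonable alternative; the text does not say which was used.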
Radiomics model construction
MR images were obtained using two 3.0 T MRI scanners (Philips Ingenia 3.0 T; Siemens MAGNETOM Verio 3.0 T). All patients were routinely scanned with sagittal T1WI, axial T1WI, coronal T1WI, sagittal T2WI, axial T2WI and coronal T2WI. Supplementary Table 1 shows the parameters of the selected sequences for each MRI scanner.
We employed two methods to extract radiomic features. First, for handcrafted feature extraction, we delineated the injury lesion as the volume of interest (VOI). Second, for deep learning feature extraction, we defined the region of interest (ROI) by cropping a vertical rectangular area on sagittal images at the injury level; the rectangle encompassed the injury site and the anterior and posterior boundaries of the spinal canal and did not extend above or below the lesion. Manual segmentations were carried out using 3D Slicer software (https://www.slicer.org; version 5.0.3).
Radiologist A, with 7 years of experience in spine MRI diagnosis, performed delineations for all patients. To ensure robustness, radiologist B, who has 10 years of experience in spine MRI diagnosis, independently delineated a randomly selected subset of 50 patients. Both radiologists were blinded to the patients’ diagnoses. After a 2-month interval, radiologist A re-delineated the VOIs and ROIs for all patients. The intraclass correlation coefficient (ICC) was computed for each feature to evaluate inter-observer and intra-observer reliability, and features with an ICC below 0.75 were excluded.
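The ICC-based reliability filter can be sketched as below. The text does not state which ICC form was used, so this example assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1); the function names `icc_2_1` and `reliable_features` and the synthetic reader data are hypothetical.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1) for an (n_subjects, k_raters) array (assumed ICC form)."""
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    row_means = y.mean(axis=1, keepdims=True)   # per-subject means
    col_means = y.mean(axis=0, keepdims=True)   # per-rater means
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between subjects
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between raters
    mse = np.sum((y - row_means - col_means + grand) ** 2) / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def reliable_features(reader_a, reader_b, threshold=0.75):
    """Return indices of features whose ICC is at least `threshold`.

    reader_a, reader_b: arrays of shape (n_patients, n_features) holding
    the same features extracted from two delineations.
    """
    keep = []
    for j in range(reader_a.shape[1]):
        pair = np.column_stack([reader_a[:, j], reader_b[:, j]])
        if icc_2_1(pair) >= threshold:
            keep.append(j)
    return keep

# Synthetic example: features 0 and 1 agree closely between readers,
# feature 2 is unrelated noise for reader B and should be dropped.
rng = np.random.default_rng(0)
a = rng.normal(size=(50, 3))
b = a + 0.05 * rng.normal(size=(50, 3))
b[:, 2] = rng.normal(size=50)
kept = reliable_features(a, b)
```

The same function applies to the intra-observer comparison by passing the first and second delineations of radiologist A instead of the two readers.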
For handcrafted feature extraction, features were extracted from the VOIs using PyRadiomics (http://pyradiomics.readthedocs.io). For deep learning feature extraction, we used ResNet152, a deeper variant of the ResNet architecture21, which was introduced in 2015. With its deeper architecture, ResNet152 enhances the model’s ability to capture intricate features from medical images, particularly in complex cases like spinal cord injury. We applied transfer learning, initializing the model with pre-trained weights and then adapting it to our dataset. This pre-training allowed the model to leverage general imaging features and adapt more efficiently to our specific cSCI dataset, improving its ability to capture the relevant features of the injury. Transfer learning is particularly beneficial in medical imaging, where labeled data are often limited and specialized imaging patterns are crucial for accurate diagnosis and prognosis. We then compressed the deep learning features through dimensionality reduction. Both handcrafted and deep learning features were standardized using the Z-score method. The least absolute shrinkage and selection operator (LASSO) was then applied to the standardized fused features, and only features with non-zero coefficients were retained, yielding the optimal feature subset.
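The standardization and LASSO selection step can be sketched with scikit-learn. This is an illustration on synthetic data under stated assumptions, not the study's pipeline: the feature matrix, the outcome, the number of features, and the use of `LassoCV` with 10 folds to choose the penalty are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_patients, n_features = 134, 40          # synthetic sizes for illustration
X = rng.normal(size=(n_patients, n_features))

# Synthetic binary outcome driven mainly by features 0 and 1.
signal = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.1 * rng.normal(size=n_patients)
y = (signal > 0).astype(float)

# Z-score standardization: zero mean and unit variance per feature.
X_std = StandardScaler().fit_transform(X)

# LASSO with the penalty strength chosen by 10-fold cross-validation.
lasso = LassoCV(cv=10, random_state=0).fit(X_std, y)

# Keep only the features whose coefficients were not shrunk to zero.
selected = np.flatnonzero(lasso.coef_)
```

In practice the scaler and the LASSO model would be fitted on the training set only and then applied unchanged to the testing and validation sets, to avoid information leakage.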
Following LASSO feature screening, the final features were input into various machine learning models, including Logistic Regression (LR), Naive Bayes Classifier (NaiveBayes), Support Vector Machine (SVM), K Nearest Neighbors Classifier (KNN), Random Forest Classifier (RF), Extra Trees Classifier (ExtraTrees), eXtreme Gradient Boosting Classifier (XGBoost), Light Gradient Boosting Machine Classifier (LightGBM), Gradient Boosting Classifier (GradientBoosting), Adaptive Boosting Classifier (AdaBoosting), and Multi-Layer Perceptron Classifier (MLP) for predictive model construction. A 10-fold cross-validation was used to determine the final radiomics signature. Receiver operating characteristic (ROC) curves were plotted to assess the diagnostic performance of the predictive models, analyzing the area under the curve (AUC), diagnostic accuracy, sensitivity, and specificity.
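A comparison of candidate classifiers by cross-validated AUC might look like the following sketch. It uses synthetic data and only three of the listed classifiers; the hyperparameters, the random seeds, and the use of stratified folds are assumptions rather than details reported in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the selected features and prognosis labels.
X, y = make_classification(
    n_samples=134, n_features=31, n_informative=8, random_state=0
)

# A subset of the classifiers named in the text, with assumed settings.
candidates = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
mean_auc = {}
for name, clf in candidates.items():
    # Scaling sits inside the pipeline so each fold is standardized
    # using its own training portion only.
    model = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    mean_auc[name] = scores.mean()

best = max(mean_auc, key=mean_auc.get)  # classifier with the highest mean AUC
```

The remaining classifiers can be added to `candidates` in the same way; the best-performing one would then be refitted on the full training set and evaluated once on the held-out testing set.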
Clinical model construction
Age, smoking history, drinking history, hypertension, diabetes, cardiovascular disease, traumatic brain injury, injury site, and treatment (including anterior approach surgery, posterior approach surgery, anterior & posterior approach surgery and conservative treatment) were selected as clinical factors and analyzed for differences between groups. These clinical factors were fed into the LR model for clinical signature building.
Combined model construction
To enhance the predictive performance, we constructed a combined model by integrating both radiomic features and clinical factors. We tested the combined model using the receiver operating characteristic (ROC) curves, as well as the AUC, sensitivity, specificity, and accuracy. By combining radiomics and clinical data, we aimed to improve the overall prognostic accuracy, providing a more comprehensive model for predicting neurological recovery in cSCI patients.
Statistical analysis
Statistical analysis was conducted using SPSS (version 26.0), with significance set at p < 0.05. Continuous variables in the clinical data were assessed using independent t-tests or Mann-Whitney U tests, while categorical variables were analyzed using Fisher’s exact test or chi-square tests (Table 2).
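The analysis in this study was performed in SPSS; for readers who want to reproduce comparable tests in code, a rough SciPy equivalent is sketched below. The rules for choosing between tests (Shapiro-Wilk normality check, expected cell counts below 5) are common conventions assumed here, not criteria stated in the text, and the helper names and data are hypothetical.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # significance level from the text

def compare_continuous(a, b):
    """t-test if both groups look normal (assumed rule), else Mann-Whitney U."""
    normal = stats.shapiro(a).pvalue > ALPHA and stats.shapiro(b).pvalue > ALPHA
    if normal:
        return "t-test", stats.ttest_ind(a, b).pvalue
    return "Mann-Whitney U", stats.mannwhitneyu(a, b, alternative="two-sided").pvalue

def compare_categorical(table):
    """Fisher's exact test for sparse 2x2 tables (assumed rule), else chi-square."""
    table = np.asarray(table)
    result = stats.chi2_contingency(table)
    p_value, expected = result[1], result[3]
    if table.shape == (2, 2) and (expected < 5).any():
        return "Fisher", stats.fisher_exact(table)[1]
    return "chi-square", p_value

# Synthetic example data: ages in two prognosis groups and two 2x2 tables.
rng = np.random.default_rng(0)
age_good = rng.normal(45, 10, size=60)
age_poor = rng.normal(60, 10, size=60)
cont_test, cont_p = compare_continuous(age_good, age_poor)
cat_test, cat_p = compare_categorical([[30, 10], [12, 28]])
sparse_test, sparse_p = compare_categorical([[3, 1], [1, 4]])
```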
Results
A total of 168 patients were included in this study, randomly divided into a training set of 134 patients and an independent testing set of 34 patients at a ratio of 8:2. The patient flow through the study and the number of patients at each analysis stage are shown in Fig. 1. Patient characteristics for the cohort are detailed in Table 1. The definition of patient characteristics is shown in Supplementary Table 2.

Flow chart demonstrating the inclusion and exclusion criteria for the study participants with cSCI.
The study workflow is shown in Fig. 2. We extracted features using PyRadiomics and ResNet152. Figure 3 shows an example of class activation mapping visualization. Following feature extraction, the LASSO algorithm (Fig. 4A and B) was used to retain handcrafted and deep learning features with non-zero coefficients, ultimately reducing the feature set to 31 final features (Fig. 4C). These final features comprised 25 handcrafted radiomic features and 6 deep learning-based features (Supplementary Table 3). Each feature’s source sequence and extraction method are detailed in the table. These final features were input into various machine learning classifiers. The SVM classifier achieved the highest AUC, with 1.000 in the training set and 0.915 in the testing set (Fig. 5; Table 3). We therefore chose the SVM as the classifier for the radiomics model.

The workflow of model building.

Gradient-weighted Class Activation Mapping (Grad-CAM) visualizations of a patient example. (A) T1WI, (B) T2WI. The red area shows where the model pays the most attention.

LASSO regression. (A) Mean square error of 10-fold cross-validation. (B) LASSO path plot of the best-performing model in the training set. (C) Spearman correlation coefficients between features were calculated, and 31 features with correlations were retained.

Receiver operating characteristic curves of different classifiers in the training set (A) and testing set (B). The SVM model achieved the highest AUC value.
Analysis of the retrospective cohort revealed that age, diabetes, and treatment were independent clinical risk factors (Table 2). Smoking history, drinking history, hypertension, cardiovascular disease, traumatic brain injury, and injury site did not differ significantly between the prognosis groups (p > 0.05), whereas age, diabetes, and admission ASIA grade differed significantly (p < 0.05). A clinical signature comprising age, hypertension, and treatment was constructed to develop a clinical model (Table 3). LR was chosen for the clinical model due to its simplicity, interpretability, and suitability for analyzing linear relationships commonly observed in clinical features.
A combined model integrating radiomics and clinical features demonstrated excellent performance, with an AUC of 1.000 in the training set, 0.952 in the testing set and 0.815 in the validation set (Fig. 6A; Table 3). Diagnostic accuracy, sensitivity, and specificity for the three models are presented in Table 3. Calibration curves indicated that the combined model’s predicted prognosis closely matched actual outcomes in both datasets (Fig. 6B). Decision curve analysis (DCA) further highlighted the improvement in the combined model across both datasets (Fig. 6C), showing superior performance when the threshold probability ranged from 1% to 99%. We developed a nomogram to visualize the combined model (Fig. 7), allowing for the calculation of risk by summing the points for each variable along the corresponding axis.
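Decision curve analysis compares models by their net benefit at each threshold probability. The standard definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), can be sketched as follows; the function name `net_benefit` and the toy labels and probabilities are hypothetical and are not data from this study.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on predictions at a given threshold probability.

    A patient is classified as positive when the predicted probability is at
    least `threshold`; false positives are weighted by threshold / (1 - threshold).
    """
    y_true = np.asarray(y_true)
    pred_pos = np.asarray(y_prob) >= threshold
    n = y_true.size
    tp = np.sum(pred_pos & (y_true == 1))   # true positives
    fp = np.sum(pred_pos & (y_true == 0))   # false positives
    return tp / n - fp / n * threshold / (1 - threshold)

# Toy example: a model that separates the two classes perfectly.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.8, 0.2, 0.1]
nb_model = net_benefit(y_true, y_prob, threshold=0.5)

# "Treat all" reference strategy: every patient is classified as positive.
nb_treat_all = net_benefit(y_true, [1.0, 1.0, 1.0, 1.0], threshold=0.5)
```

Evaluating `net_benefit` over a grid of thresholds and plotting the result against the "treat all" and "treat none" (net benefit of zero) strategies gives a decision curve.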

Results of the three models. (A) Receiver operating characteristic curves of the three models for prediction in the training and testing datasets. (B) Calibration curves of the three models. (C) Decision curve analysis of the three models. Left: training dataset; middle: testing dataset; right: validation dataset.

The nomogram combining clinical and radiomics signatures for predicting the prognosis of cSCI.
Discussion
This study presents a combined radiomics and clinical model that can effectively predict the daily living functions and quality of life of cSCI patients six months post-injury. It also highlights the potential of radiomics and other artificial intelligence technologies in developing personalized treatment plans for cSCI patients.
In this study, selecting an appropriate quantitative metric is crucial for accurately predicting the prognosis of cSCI patients. Quantitative metrics provide objective, reproducible evaluation standards that simplify complex clinical information, facilitate statistical analysis and model construction, and enhance the generalizability and verifiability of research results. We chose the International Association of Neurorestoratology (IANR) score as the metric because it covers multiple dimensions, including motor function, sensation, and activities of daily living, comprehensively reflecting the patients’ rehabilitation status7. The sensitivity and specificity of the IANR score ensure accurate capture of functional changes, improving the precision of prognosis prediction. Its widespread use in numerous studies has demonstrated its reliability and validity, making our research results more credible22,23. Additionally, the intuitive numerical value of the IANR score helps doctors and patients better understand prognostic information, promoting rehabilitation and treatment work. Therefore, the use of the IANR score not only enhances the scientific rigor and accuracy of the model but also provides reliable guidance for clinical practice.
In terms of imaging feature extraction, we employed two methods. First, we delineated VOI of the injury lesion and used traditional radiomics techniques for feature extraction, as intramedullary pathology is closely related to patient recovery. Accurate extraction of these features is vital for prognosis prediction. Second, we applied deep learning techniques to extract features from the injury site and adjacent anterior and posterior regions of the spinal canal. In the deep learning process, we used the ResNet network and transfer learning techniques. The ResNet network, with its depth and efficient feature extraction capabilities, enabled us to identify and analyze imaging features more accurately. The introduction of transfer learning further improved the model’s performance, making it better suited to the specific conditions of different patients. Additionally, we fully considered the patients’ basic clinical conditions and integrated imaging features with clinical data to develop a comprehensive predictive model. This model relies not only on imaging features but also incorporates clinical information, thereby improving the accuracy and reliability of prognosis prediction. Ultimately, by combining clinical and imaging data, we constructed a holistic predictive model that provides robust support for clinical practice.
Previous studies have been limited by relying only on clinical data and empirical imaging features24,25 or on non-routine examinations26,27, which restricts their comprehensiveness. Moreover, some studies focused on local functions, such as walking ability28 or the recovery of specific muscles29. Based on clinical needs, our study aims to provide clinicians with an overall prognosis expectation for patients before treatment. Our research used MRI T1-weighted and T2-weighted sequences, which are widely available and therefore broadly applicable in future studies. Our model was effective across MRI scanners from different vendors, laying a solid foundation for future multicenter research.
However, this study has limitations. Conventional MRI struggles to reveal information below the macroscopic level, such as axonal and myelin preservation, which may be closely related to post-injury recovery. Thus, it is challenging to provide an interpretable biological mechanism for the model. Additionally, other MRI techniques or sequences, such as diffusion-weighted imaging (DWI), which enables tractography30, could better characterize the condition of the injured spinal cord. However, challenges such as partial volume effects in spinal cord imaging, metal artifacts, and the impracticality of requiring patients to remain precisely still during long high-resolution acquisitions mean that further technological advances are needed before widespread application. We also acknowledge the need for more external independent validation sets to ensure the generalizability of our findings, which is the next goal of our team.
In summary, this study demonstrates the potential of a combined imaging and clinical model in predicting the prognosis of cSCI patients. This model can provide stratified prognostic assessments for cSCI patients, assist clinicians in counseling patients and in guiding treatment and rehabilitation decisions, and potentially improve the design of future treatment plans.