Interpretable machine learning to evaluate relationships between DAO/DAOA (pLG72) protein data and features in clinical assessments, functional outcome, and cognitive function in schizophrenia patients

Introduction
Precision psychiatry is an emerging and integrative field within the realms of psychiatry and precision medicine1,2. Recent advances in artificial intelligence have opened new possibilities for precision psychiatry using machine learning techniques3,4,5,6,7,8. Illustrative investigations in precision psychiatry involve the prediction of prognoses, such as treatment outcomes9,10 and functional outcomes11,12, through the application of machine learning techniques.
The N-methyl-d-aspartate receptor (NMDAR) pathway has been a focal point of schizophrenia research. d-Serine is the main coagonist of NMDAR, and several studies have reported decreased blood levels of d-serine in patients with schizophrenia, supporting the NMDAR hypothesis of the disorder13,14,15. In addition, dysregulated d-serine modulation was observed in patients with treatment-resistant schizophrenia16. Moreover, the d-amino acid oxidase (DAO, or DAAO) protein (responsible for degrading d-serine) and its putative activator, the d-amino acid oxidase activator (DAOA, also known as pLG72 or G72) protein, represent two integral components within this pathway17. Of note, pLG72 has been regarded as either a DAO activator18 or a DAO repressor19.
Though it remains unclear how the two proteins’ levels in the blood relate to those in the central nervous system, the utilization of DAO and DAOA blood levels in the development of machine learning models for inferring schizophrenia disease status has been documented7,8. A prior study8 employed machine learning algorithms to develop predictive models utilizing the DAOA protein and genetic variants to differentiate between 89 patients with schizophrenia and 60 healthy individuals. A subsequent study7 developed an ensemble boosting framework utilizing DAO and DAOA protein levels to differentiate 355 schizophrenia patients from 86 healthy individuals.
Schizophrenia symptoms generally fall into two categories: positive and negative. Additionally, the associated cognitive impairment typically affects patients’ outcomes and quality of life20,21,22,23. Dysfunctional NMDAR-mediated neurotransmission contributes to clinical symptoms (principally negative symptoms) and cognitive impairment of schizophrenia24,25,26,27. Since DAO metabolizes d-serine and DAOA regulates DAO28,29,30, both proteins are involved in schizophrenia pathogenesis and its psychiatric and cognitive symptoms15,31,32,33,34,35. Consequently, along with d-amino acids like d-serine, both DAO and DAOA are regarded as promising therapeutic targets for schizophrenia27,36,37,38,39,40,41,42,43,44,45,46,47,48,49. Though DAOA modulators have not yet been investigated in clinical trials, DAO inhibitors, such as sodium benzoate and luvadaxistat, have been shown to improve the cognitive performance of patients with schizophrenia50,51,52. Therefore, the potential association of DAO and DAOA protein levels with demographic variables, clinical assessments, functional outcomes, and cognitive function in schizophrenia deserves investigation. We hypothesized that machine learning models could improve the prediction of relationships between these parameters and the two protein levels in patients with schizophrenia.
In a prior study applying structural equation modeling53, clinical symptoms acted as a mediator between cognitive functions and functional outcomes in 302 individuals with schizophrenia; in a subsequent study11, a bagging ensemble machine learning method54 was used to evaluate functional outcomes in the same patients using clinical symptoms and neurocognitive tests. In another study utilizing the bagging ensemble machine learning method55, a single neurocognitive domain, specifically speed of processing, was the most effective in estimating overall cognitive function in the 302 patients. Moreover, a recent study56 employed an automated machine learning method to predict social cognition in a cohort of 380 patients with schizophrenia using six neurocognitive domains and nine neurocognitive tests. In the current study, we utilized the same cohort of 380 schizophrenia patients56 to examine the relationships between DAO/DAOA levels and 27 parameters through the application of interpretable machine learning (IML) methods57,58,59. The 27 parameters included three demographic variables, four clinical assessments, two functional outcomes, and eighteen cognitive function variables. To our knowledge, no prior study has evaluated IML methods for this task. A common drawback of machine learning models is their “black box” nature58. We chose IML methods for their transparency, efficient feature selection, access to state-of-the-art machine learning techniques, and support for decision-making by healthcare professionals60,61,62.
Materials and methods
This study received approval from the institutional review board of the China Medical University Hospital in Taiwan and was conducted in compliance with the Declaration of Helsinki.
Study population
The enrollment criteria details were previously published53. In the current cohort, 380 schizophrenia patients (227 men and 153 women) were recruited from the China Medical University Hospital and affiliated Taichung Chin-Ho Hospital in Taiwan. Their ages ranged from 18 to 65 years (mean = 38.3; SD = 10.7), and they were physically healthy. On average, they had received 12.0 years of education (SD = 2.8). We enrolled two groups of patients: (1) 75 acutely exacerbated patients who had been drug-free for at least 2 weeks; and (2) 305 chronic medicated patients who had been stabilized with antipsychotics for at least 2 months.
Following a comprehensive explanation of the study, written informed consent was obtained from the participants in accordance with institutional review board guidelines.
Clinical assessments
Details of clinical assessments were published before53. In summary, we utilized four clinical assessment scales to evaluate positive, negative and depressive symptoms53, namely the Positive and Negative Syndrome Scale (PANSS)—Positive subscale63, the PANSS—Negative subscale63, Scale for the Assessment of Negative Symptoms 20-item (SANS20)64, and the 17-item Hamilton Rating Scale for Depression (HAMD17)65.
Functional outcomes
Details of functional outcomes were published before53. In brief, we evaluated functional outcomes using the quality of life scale (QLS)66 and the global assessment of functioning (GAF) scale from the DSM-IV67. The QLS assesses various aspects of functional outcomes in individuals with schizophrenia, such as anhedonia, aimless inactivity, empathy capacity, curiosity, emotional interaction, motivation, purpose sense, social activity, social initiatives, and social withdrawal66. The GAF is a tool for assessing the overall psychological, social, and occupational functioning in schizophrenia patients67.
Cognitive function
Details of cognitive function parameters were published before53. In summary, we assessed overall cognitive function using seven cognitive domains and ten cognitive tests. Seven cognitive domains included speed of processing68,69, sustained attention by CPT70, working memory71,72, verbal learning and memory72, visual learning and memory (visual reproduction subtest of the Wechsler Memory Scale-III)72, reasoning and problem solving73, and social cognition21,22. Each cognitive domain score was standardized to a T score with a mean of 50 and a standard deviation of 10 to ensure comparability across domains53,74. For cognitive domains with more than one test, a composite T score was calculated by standardizing the average of T scores from the individual cognitive tests53.
Moreover, ten cognitive tests included category fluency75, trail making A68, WAIS-III digit symbol-coding69, d-Prime of clear version70, verbal working memory71, nonverbal working memory72, verbal learning and memory72, visual learning and memory72, reasoning and problem solving73, and social cognition21,22.
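The T score standardization described above can be illustrated with a minimal sketch. It assumes that scores are standardized within the sample using the population standard deviation; the reference distribution actually used is described in the earlier publication53 and is not restated here. The function names and the raw scores are hypothetical.

```python
from statistics import mean, pstdev


def to_t_scores(raw_scores):
    """Standardize raw scores to T scores with mean 50 and SD 10."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)  # assumption: within-sample population SD
    if sigma == 0:
        raise ValueError("scores have zero variance; T scores are undefined")
    return [50 + 10 * (x - mu) / sigma for x in raw_scores]


def composite_t_scores(*test_t_scores):
    """Average the T scores of several tests per person, then re-standardize."""
    per_person_mean = [mean(person) for person in zip(*test_t_scores)]
    return to_t_scores(per_person_mean)


# Hypothetical raw scores for five people on two tests of one domain.
test_a = [12, 15, 9, 20, 14]
test_b = [30, 41, 28, 45, 36]
domain_t = composite_t_scores(to_t_scores(test_a), to_t_scores(test_b))
```

Re-standardizing the averaged T scores restores a mean of 50 and a standard deviation of 10 for the composite, which keeps multi-test domains comparable with single-test domains.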
Laboratory assessments
Serum DAO levels were quantified using commercially available enzyme-linked immunosorbent assay (ELISA) kits following the manufacturer’s recommended protocol (Cloud-Clone Corp., Houston, TX, United States), as detailed in a previous publication76.
The detailed method for measuring DAOA levels has also been previously documented77. In summary, plasma DAOA protein expression levels were assessed using western blotting.
IML models
We employed IML methods, including linear regression, the least absolute shrinkage and selection operator (Lasso) models, and generalized additive models (GAMs)57,58,59. The experiments were conducted on a Linux computer with an Intel(R) Xeon(R) CPU E5-2620, 120 GB RAM, and Ubuntu Server7.
First, we evaluated the linear relationship between 24 individual parameters (including four clinical assessments, two functional outcomes, and eighteen cognitive parameters) and DAO/DAOA protein levels in schizophrenia using a linear regression model, with age, gender, and education included as covariates78. The R software was utilized for the linear regression analysis. A significance criterion of P < 0.05 was applied to all tests.
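A covariate-adjusted linear regression of this kind can be sketched as follows. The study ran this analysis in R; the Python version below is only an illustration on simulated data, and it assumes that the protein level is the outcome and the clinical parameter is the predictor of interest. The function name, variable names, and simulated values are all hypothetical.

```python
import numpy as np
from scipy import stats


def adjusted_association(y, x, covariates):
    """Ordinary least squares of y on x plus covariates.

    Returns the coefficient of x and its two-sided p-value.
    """
    y = np.asarray(y, dtype=float)
    # Design matrix columns: intercept, predictor of interest, covariates.
    X = np.column_stack([np.ones(len(y)), x, covariates])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = residuals @ residuals / dof
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    t_stat = beta[1] / np.sqrt(cov_beta[1, 1])
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    return beta[1], p_value


# Simulated data (not study data): one clinical score and three covariates.
rng = np.random.default_rng(0)
n = 305
age = rng.uniform(18, 65, n)
gender = rng.integers(0, 2, n)
education = rng.normal(12, 2.8, n)
score = rng.normal(10, 5, n)
protein = 40 + 0.8 * score + 0.1 * age + rng.normal(scale=5, size=n)

covariates = np.column_stack([age, gender, education])
beta_score, p_score = adjusted_association(protein, score, covariates)
```

The coefficient of the clinical score is then judged against the P < 0.05 criterion after age, gender, and education have been accounted for.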
Second, we employed the Lasso model79 to perform feature selection, identifying the top parameters from the aforementioned 27 parameters. Additionally, we estimated the importance and weights of these features. Specifically, we implemented the Lasso model with the LassoCV class from the scikit-learn software package80 (https://scikit-learn.org; accessed on May 12, 2024). LassoCV selects the regularization strength (alpha) automatically using fivefold cross-validation.
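The Lasso step can be sketched with scikit-learn as below. The data are simulated, the feature names are placeholders, and standardizing the features before fitting is an assumption of this sketch rather than a detail reported in the study.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated data (not study data): 27 candidate features, two truly related.
rng = np.random.default_rng(0)
n_patients, n_features = 305, 27
feature_names = [f"feature_{i}" for i in range(n_features)]
X = rng.normal(size=(n_patients, n_features))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(scale=1.0, size=n_patients)

# Assumption: features are standardized so coefficient sizes are comparable.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X, y)
lasso = model.named_steps["lassocv"]

# Keep features with non-zero coefficients, ranked by absolute weight.
selected = sorted(
    ((name, coef) for name, coef in zip(feature_names, lasso.coef_) if coef != 0),
    key=lambda item: abs(item[1]),
    reverse=True,
)
chosen_alpha = lasso.alpha_  # regularization strength picked by fivefold CV
```

The ranked list in `selected` plays the role of the feature importance tables: features whose coefficients are shrunk to zero are dropped, and the remaining ones are ordered by the magnitude of their weights.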
Furthermore, we evaluated the non-linear relationship between the aforementioned 24 parameters and DAO/DAOA protein levels in schizophrenia using GAMs81, incorporating age, gender, and education as covariates. The R software was utilized for the analysis of GAMs, with a significance criterion of P < 0.05 applied to all tests. In short, we employed the gam package in R for the implementation of GAMs.
Results
The study comprised 380 Taiwanese schizophrenia patients, with mean and standard deviation values for overall cognitive function reported as 343.56 and 65.34, respectively. Detailed demographic characteristics and measures related to overall cognitive function in schizophrenia patients were provided before53.
There were no significant differences between the acutely ill and chronic patient groups in DAO levels (mean 42.3 ± 15.3 [SD] ng/mL in the acutely ill patients, and 43.5 ± 12.3 ng/mL in the chronic patients) or in DAOA levels (3.2 ± 1.2 ng/μL in the acutely ill patients, and 3.4 ± 2.1 ng/μL in the chronic patients).
Linear association using linear regression
Initially, we examined the linear association between 24 individual parameters and DAO/DAOA levels through a linear regression model, incorporating age, gender, and education as covariates. Table 1 presents the correlations between clinical assessments and DAO levels in chronic medicated patients (n = 305); HAMD17 exhibited a correlation with DAO (P = 0.0014). However, no significant associations were observed between DAO and clinical assessments in acutely ill patients (Supplementary Table S1).
Additionally, PANSS-negative and SANS20 demonstrated associations with DAOA levels in acutely ill patients (Table 2; P = 0.016 and 0.0409, respectively). On the other hand, no significant correlations were found between clinical assessments and DAOA in chronic patients (Supplementary Table S2).
Table 3 shows linear correlations between DAOA levels and both functional outcomes (QLS and GAF scores) in acutely ill patients (P = 0.0156 and 0.0203, respectively). However, no significant associations were observed between DAO and functional outcomes in acutely ill patients (Supplementary Table S3), between DAO and functional outcomes in chronic patients (Supplementary Table S4), as well as between DAOA and functional outcomes in chronic patients (Supplementary Table S5).
In addition, DAOA levels exhibited a linear relationship with cognitive function in category fluency in acutely ill patients (Supplementary Table S8, P = 0.0104). Nevertheless, no significant relationships were found between DAO and cognitive function in acutely ill patients (Supplementary Table S6), between DAO and cognitive function in chronic patients (Supplementary Table S7), and between DAOA and cognitive function in chronic patients (Supplementary Table S9).
Feature selection using Lasso
Table 4 presents a summary of the Lasso models used to assess the significance of selected features concerning DAO levels among chronic patients; four key features (HAMD17, age, working memory, and overall cognitive function [OCF]) were recognized with HAMD17 being the most crucial one.
Table 5 provides an overview of the Lasso models employed to evaluate the importance of selected features related to DAOA levels in acutely ill patients, identifying four features, including OCF, SANS20, QLS, and category fluency.
Non-linear association using GAMs
GAMs unveiled a nonlinear correlation between category fluency and DAO levels in chronic patients (Fig. 1; P = 0.0273) and identified a nonlinear association between QLS and DAOA in acutely ill patients (Fig. 2; P = 0.0311).

Figure 1. Generalized additive models were utilized to examine non-linear relationships between category fluency (raw score) and DAO protein data in the NMDAR pathway in chronic patients (n = 305) with schizophrenia. DAO, d-amino acid oxidase; NMDAR, N-methyl-d-aspartate receptor. s() is a spline function.

Figure 2. Generalized additive models were utilized to examine non-linear relationships between quality of life scale (QLS) and DAOA protein data in the NMDAR pathway in acutely ill patients (n = 75) with schizophrenia. DAOA, d-amino acid oxidase activator; NMDAR, N-methyl-d-aspartate receptor. s() is a spline function.
Discussion
To the best of our knowledge, this is the first study to construct an IML framework for analyzing the relationships between clinical parameters and DAO/DAOA levels in schizophrenia patients. In essence, we estimated the linear relationships between clinical features and DAO/DAOA levels through linear regression and captured the non-linear relationships using GAMs. Additionally, Lasso was employed to identify the essential clinical features associated with DAO/DAOA levels. While endeavors have been ongoing to improve the interpretability and justifications of machine learning models82,83, the IML framework utilized in this study serves as a proof of concept for predicting the clinical/laboratory associations in individuals with schizophrenia. Further research is necessary to enhance model interpretability and thereby support the successful integration of these frameworks in clinical settings.
In the current study, to reduce the risk of false positive findings arising from multiple comparisons in linear regression, any positive finding from linear regression had to be confirmed by at least one of the two other machine learning algorithms (Lasso and GAMs). For associations newly observed in Lasso or GAMs, only those revealed in both Lasso and GAMs were regarded as significant.
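This cross-method rule can be written out as a short sketch. The flags below are transcribed from the associations reported in the Results, with False meaning that the text does not report a significant result for that method; the function name and data layout are hypothetical.

```python
def is_confirmed(linear_regression, lasso, gam):
    """Apply the cross-method confirmation rule to one association."""
    if linear_regression:
        # A linear regression finding needs support from Lasso or GAMs.
        return lasso or gam
    # A finding absent from linear regression needs both Lasso and GAMs.
    return lasso and gam


# (parameter, protein, patient group): (linear regression, Lasso, GAMs)
reported = {
    ("HAMD17", "DAO", "chronic"): (True, True, False),
    ("category fluency", "DAO", "chronic"): (False, False, True),
    ("PANSS-negative", "DAOA", "acute"): (True, False, False),
    ("SANS20", "DAOA", "acute"): (True, True, False),
    ("QLS", "DAOA", "acute"): (True, True, True),
    ("GAF", "DAOA", "acute"): (True, False, False),
    ("category fluency", "DAOA", "acute"): (True, True, False),
}

confirmed = [key for key, flags in reported.items() if is_confirmed(*flags)]
```

Under this rule, one association is retained in the chronic group (HAMD17 with DAO) and three in the acutely ill group (SANS20, QLS, and category fluency with DAOA).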
Among the chronic medicated patients, the only noteworthy discovery was the correlation observed between HAMD17 scores and DAO levels, identified through linear regression (Table 1), with HAMD17 emerging as the most prominent feature associated with DAO levels using Lasso (Table 4). The significant correlation observed between HAMD17 and DAO highlights the potential involvement of NMDAR in depression. Accordingly, numerous treatments that target NMDAR, including ketamine, esketamine, arketamine, d-serine, sarcosine, and sodium benzoate, have been studied or used in clinical settings84,85,86,87,88. In addition, three NMDAR enhancers (d-serine, sarcosine, and sodium benzoate) have been shown to improve depressive symptoms in patients with schizophrenia36,37,50. Intriguingly, NMDAR antagonists, such as ketamine, and NMDAR enhancers, such as d-serine and sarcosine, exerted antidepressant-like effects via a common mechanism (activation of the AMPA receptor–mTOR signaling pathway) in rodent studies89,90.
Among the acutely exacerbated patients, there were three repeatable findings: the associations of DAOA levels with SANS20 (shown in linear regression [Table 2] and Lasso [Table 5]), with QLS (in Lasso [Table 5] and GAMs [Fig. 2]), and with category fluency (in linear regression [Table S8] and Lasso [Table 5]). As mentioned above, dysfunctional NMDAR-related neurotransmission is involved in clinical symptoms (especially negative symptoms) and cognitive deficits of schizophrenia24,25,26,27. Among cognitive tasks, category fluency (representing verbal fluency) is a sensitive indicator of cortical dysfunction, notably in schizophrenia91,92,93, and DAOA plays an important role in verbal fluency and cortical function28,94,95.
Interestingly, DAOA levels appeared to be associated with clinical measures in acutely ill patients, whereas DAO levels were associated with clinical measures in chronically stable patients. Mounting evidence indicates that high levels of DAOA trigger oxidative stress, mitochondrial fragmentation, and DAO activation, potentially contributing to the development of schizophrenia17,96,97,98. Accordingly, the current study’s finding that DAOA levels were linked to clinical measures in acutely ill patients seems explicable. On the other hand, the current study and a previous one99 suggest that DAO may contribute to depressive symptoms in schizophrenia patients after antipsychotic treatment. Consistently, DAO has been found to play a role in depression, and DAO inhibitors exert antidepressant activity100,101. Further, a DAO inhibitor improved depressive symptoms in medicated chronic schizophrenia patients in a clinical trial50. Therefore, the current study’s finding on DAO seems plausible too, because current evidence indicates distinct neurological mechanisms of depressive and negative symptoms in people with schizophrenia102,103, and depression in schizophrenia may persist even after treatment103,104,105. Longitudinal research is necessary to further elucidate the functions of DAO and DAOA in various stages of schizophrenia.
An advantage of Lasso is its ability to automate the feature selection process by considering all features simultaneously58,106. That is, Lasso performs automatic feature selection by shrinking the coefficients of less important variables to zero, effectively selecting the most relevant predictors and reducing model complexity79. Additionally, the Lasso method facilitates the retention of a concise set of features, promoting model simplicity and interpretability while effectively managing the complexity of the predictive model58,106. In other words, Lasso encourages sparsity in the model by producing sparse solutions with fewer non-zero coefficients, leading to a more parsimonious and interpretable model107.
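How the Lasso penalty drives small coefficients to exactly zero can be seen in a simplified special case. When predictors are standardized and mutually uncorrelated, the Lasso estimate of each coefficient is, under a suitable scaling of the penalty, the soft-thresholded least squares estimate. The sketch below illustrates only this special case; the coefficient values and the penalty are hypothetical.

```python
def soft_threshold(z, penalty):
    """Lasso shrinkage operator: pull z toward zero by `penalty`, stopping at zero."""
    if z > penalty:
        return z - penalty
    if z < -penalty:
        return z + penalty
    return 0.0


# Hypothetical least squares coefficients for three uncorrelated predictors.
ols_coefficients = {"strong": 2.5, "moderate": -0.9, "weak": 0.2}
penalty = 0.5

lasso_coefficients = {
    name: soft_threshold(value, penalty) for name, value in ols_coefficients.items()
}
retained = [name for name, value in lasso_coefficients.items() if value != 0.0]
```

The weak predictor is removed outright because its coefficient is smaller than the penalty, while the other two are retained with reduced magnitude. This is the mechanism behind the sparse feature sets described above.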
One advantage of GAMs is that they can capture non-linear relationships between variables81, allowing for more flexible modeling than Lasso and linear regression. That is, GAMs use smoothing functions (for example, spline functions) to model complex relationships, allowing for the detection of patterns that may be missed by linear models like Lasso and linear regression58,81. In addition, GAMs provide interpretable results by allowing for the visualization of the relationship between each predictor and the response variable, aiding in the understanding of complex relationships (for example, Figs. 1 and 2). However, GAMs do not perform feature selection like Lasso; instead, they focus on capturing the relationship between predictors and the response variable without excluding variables based on their coefficients81.
Continued investigations into IML are poised to elucidate the technological advancements and their implications within the medical sector62,108. With the swift evolution and application of machine learning in healthcare, it is crucial to assess the efficacy of IML frameworks in predicting medication responses61. These studies are pivotal in transforming healthcare by enhancing the precision and efficiency of medication response predictions through IML techniques109. IML frameworks, which utilize sophisticated algorithms and computational methods, have proven indispensable in refining interpretable models that incorporate a variety of data sources, including demographic, imaging, genetic, clinical, and electronic health records110,111,112,113. These IML frameworks facilitate the development of comprehensive interpretable models that capture complex relationships affecting medication responses and support the creation of personalized prediction models that consider individual patient characteristics and histories60,114. This tailored approach enables clinicians to make informed decisions about medication selection and dosage adjustments, significantly enhancing patient care and treatment outcomes115. In summary, IML frameworks represent a significant advancement in predicting medication responses, with their capacity to integrate diverse data, manage high-dimensional information, and generate personalized predictions, underscoring their transformative potential in medicine. This research not only enriches the existing body of knowledge but also provides pivotal insights for ongoing and future developments in this dynamic field.
This study had several limitations. First, this study did not measure blood d-serine levels in schizophrenia patients. Second, the relationship between blood and brain levels of DAO and DAOA remains uncertain in the participants. Third, the findings of this study were obtained from a cross-sectional design; future longitudinal studies are warranted. Fourth, the sample size, especially in the acutely ill group, was modest. Fifth, only Taiwanese patients were recruited in this study. Whether the findings can be extrapolated to other populations deserves investigation. Sixth, after applying the Bonferroni correction for the multiple comparisons in the study, only the correlation between HAMD17 scores and DAO levels in the 305 chronic medicated patients (P = 0.0014, shown in Table 1) remained significant. Since there were 16 comparisons (4 clinical assessments × 2 proteins × 2 clinical phases), the corrected significance threshold is P < 0.003125 (= 0.05/16). In contrast, among the 75 acutely exacerbated patients, the three repeatable findings (the associations of DAOA levels with SANS20, with QLS, and with category fluency) became statistically insignificant after the Bonferroni correction. A possible reason is the small sample size (N = 75). Consequently, future studies with larger samples are needed. Seventh, a model created with the current sample must be tested and shown to hold in an independent sample; future studies are therefore also needed. An alternative approach is to exclude part of the sample from model development and then test the model on the excluded part. However, given the current limited sample size, the statistical power would be inadequate, as shown by the marginal significance, after the Bonferroni correction, of even the only positive association (between HAMD17 and DAO) in the sample of 305 chronic medicated patients.
Eighth, another strategy is to include illness phase as a covariate rather than to analyze patients in different phases separately. However, the two groups of patients differed not only in illness phase (chronic vs. acutely exacerbated) but also in medication status (medicated vs. unmedicated). We therefore decided to analyze them separately. In the future, enrolling four groups of patients (chronic unmedicated, chronic medicated, acute unmedicated, and acute medicated) would be a better strategy to elucidate this issue.
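The Bonferroni threshold described in the sixth limitation can be checked with a few lines of arithmetic. The P values below are those reported in the Results; the dictionary labels are shorthand introduced for this sketch.

```python
ALPHA = 0.05
# 4 clinical assessments x 2 proteins x 2 clinical phases
N_COMPARISONS = 4 * 2 * 2
threshold = ALPHA / N_COMPARISONS  # 0.05 / 16

# P values reported in the Results for the nominally significant associations.
reported_p = {
    "HAMD17 ~ DAO (chronic)": 0.0014,
    "PANSS-negative ~ DAOA (acute)": 0.016,
    "SANS20 ~ DAOA (acute)": 0.0409,
    "QLS ~ DAOA (acute)": 0.0156,
    "GAF ~ DAOA (acute)": 0.0203,
    "category fluency ~ DAOA (acute)": 0.0104,
}

survivors = [name for name, p in reported_p.items() if p < threshold]
```

Only the HAMD17 and DAO association falls below the corrected threshold, which matches the statement that the findings in the acutely ill group did not survive correction.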
Conclusions
In conclusion, we developed an IML framework to forecast the linear and non-linear connections between clinical variables and DAO/DAOA protein levels in Taiwanese individuals with schizophrenia, using linear regression, Lasso, and GAMs. Our analysis suggests that this IML framework could serve as a viable, interpretable approach for forecasting associations and identifying essential features in schizophrenia patients. Furthermore, the findings of this study may have future implications for precision psychiatry, particularly in predicting clinical variables and functional outcomes in individuals with psychiatric disorders. Additionally, these findings could potentially be extended to support molecular prognostic and diagnostic investigations in the future. Given the experimental and pilot nature of this study, independent studies with replication datasets and further analyses are needed to validate the IML framework developed here.