Regulatory T cells-related gene in primary sclerosing cholangitis: evidence from Mendelian randomization and transcriptome data

Introduction

Primary sclerosing cholangitis (PSC) is a chronic, progressive liver disease due to a combination of immunological, inflammatory, and genetic causes, ultimately resulting in liver failure [1]. There is a global variance in the incidence and prevalence of PSC exhibit global variation, with rates ranging from 0.07 (Spain) to 1.3 (Norway) cases per 100,000 people per year for incidence, and from 0.2 (Spain) to 13.6 (US) cases per 100,000 persons for prevalence [2]. Roughly 70–80% of PSC patients also suffer from inflammatory bowel disease, which is a risk factor for cholangiocarcinoma and colorectal cancer [3]. The clinical manifestations and course of PSC are varied. After excluding other causes, the diagnosis of PSC relies mainly on bile duct imaging and liver histopathology. While PSC has a more indolent course in some, the diagnosis carries significant long-term health implications, with a median transplant-free survival of 13.2 years [4]. However, because of advancements in imaging technology, less invasive diagnostic methods utilizing magnetic resonance (MR) have replaced invasive procedures like endoscopic retrograde cholangiopancreatography and liver biopsies for the safe and accurate identification of PSC [5]. Further optimization of the diagnostic method may be achieved by the identification of cost-effective PSC-specific biological markers in the future. Additionally, there are currently no effective drugs to treat PSC, and liver transplantation is the only effective treatment. Hence, diagnosing PSC, determining individualized risk, discovering treatment targets, and improving our knowledge of its pathophysiology require the urgent identification of new biomarkers.

There are several theories about the etiology of PSC, but the most popular one is that it is an immune-mediated illness that is initiated by an environmental stimulus, which ultimately results in hepatocyte damage and inflammation in members of the population who are genetically susceptible to the condition [6]. A significant correlation was observed between immune cell infiltration and the progression of fibrosis and damage to cholangiocytes [7, 8]. Several different types of immune cells, such as neutrophils and macrophages, have been found close to the bile ducts of individuals who have PSC [9]. One of the most distinguishing features of PSC is the presence of T cell infiltration; nevertheless, the composition and roles of these infiltrating T cells are observed to differ [9, 10]. For instance, Schoknecht et al. found that CD4 + T cells from PSC patients exhibited decreased apoptosis sensitivity but not CD8 + T cells in peripheral blood [11]. In their study, Lampinen et al. found that patients with PSC exhibited a greater proportion of CD8 + T cells that were positive for CXCR3, but they had fewer CD25-positive CD4 + T cells [12]. There is a certainty that these few observational studies were predisposed to reverse causation and confounding bias. It is important to ascertain if such links indicate spurious correlations or causal relations as a result of bias. Randomized controlled trials (RCTs) are important in determining if targeting specific immune cells can lead to the discovery of novel avenues for treatment. There is an unmet need to further understand the immune cellular composition that affected PSC and how it underlies disease pathogenesis. In light of the lack of RCTs, Mendelian randomization (MR), conquers the bias because of two factors (reverse causation and confounding). Recently, MR has been a markedly acknowledged technique to assess more robust causal inferences between exposure and clinical outcomes by utilizing genetic variants like instrumental variables (IVs) [13].

Additionally, research on PSC is limited because of the following reasons: the cholangiocyte is challenging to obtain; there are relatively few of these cells in the liver; in vitro culture techniques are unstable; and samples are taken from patients with advanced illness. The availability of high-throughput RNA sequencing datasets provides an unprecedented opportunity to discover novel biomarkers. With the development of bioinformatics, massive machine-learning approaches have been developed, and become a routine tool for selecting feature variables and constructing a predictive model. At present, LASSO-Cox is the mainstream algorithm used for generating massive predictive signatures [14, 15]. The uniqueness and inappropriateness of certain modeling methodologies have resulted in some models with significant inadequacies, which restricts their use in clinical contexts. The discovery of novel biomarkers for the application of transcriptome data combined with advanced machine-learning (ML) algorithms for PSC is limited. An integrated approach based on a combination of various ML algorithms that can fit a consensus model for diagnosing PSC has not yet been exploited.

Here, firstly, we executed MR analyses to establish the causal relationships between the 731 immune cells and PSC, from the extensive genome-wide association studies (GWAS) to date. Replicating the findings in another PSC GWAS and then pooled with meta-analysis. Through a search of the Gene Expression Omnibus (GEO) database, the expression profiles of two PSC mRNAs (GSE119600, and GSE159676) were downloaded. IOBR (CIBERSORT, TIMER, xCell, MCPcounter, ESTIMATE, EPIC, Quantiseq) and ssGSEA were used to quantify immune cell infiltration levels. Weighted gene co-expression network analysis (WGCNA) was applied to identify significant Treg modules in two GEO cohorts. Then, we developed a novel machine learning framework that incorporated 12 machine learning algorithms and their 107 combinations to construct a consensus Tregs classifier and validate it in another cohort. The hub Tregs-related genes’ expression in the PSC mice model was confirmed utilizing qRT-PCR. Subsequently, the crucial immune cell infiltration and molecular pathways implicated in the initiation of PSC and their association with hub Tregs-related genes were explored. Lastly, the SHapley Additive exPlanations (SHAP) and XGBoost algorithm were used to identify optimal Tregs-related genes, and the Mantel test was executed to investigate the link between pivotal molecular pathways. We hypothesize that the developed Tregs classifier may effectively diagnose PSC. Moreover, the discovery of the hub Tregs-related gene, pivotal molecular pathways, and immune cells will provide insights into the pathogenesis mechanisms of PSC, uncovering druggable targets for its treatment.

Materials and methods

GWAS Data sources for PSC and immune cell traits

Three large-scale GWAS data on PSC (GWAS ID: ieu-a-1112, finn-b-K11_CHOLANGI_STRICT, finn-b-K11_CHOLANGI) were obtained from the IEU OpenGWAS project (https://gwas.mrcieu.ac.uk/). Two PSC GWAS datasets (GWAS ID: ieu-a-1112, and finn-b-K11_CHOLANGI_STRICT) were set as discovery cohorts, another PSC GWAS dataset (GWAS ID: finn-b-K11_CHOLANGI) as a validation cohort. Table 1 highlights basic data extracted from PSC GWAS datasets.

Table 1 Genome-wide association studies summary statistics datasets used for genetic analyses.
Full size table

GWAS summary statistics of immunological traits are available publicly from the GWAS Catalog (accession numbers GCST90001391 to GCST90002121) [16]. This dataset includes 731 distinct immune phenotypes, such as morphological parameters (MP) (n = 32), relative cell (RC) counts (n = 192), absolute cell (AC) counts (n = 118), and median fluorescence intensities (MFI) expressing surface antigen levels (n = 389) were incorporated. In particular, the RC, AC, and MFI features have B cells, myeloid cells, dendritic cells (DCs), monocytes, mature stages of T cells, TBNK (T cells, B cells, natural killer cells), and Treg panels, whereas MP have TBNK and DC panels. Table 1 highlights the basic data collected via GWAS on immune cells.

The STROBE-MR (Strengthening the Reporting of Observational Studies in Epidemiology using Mendelian Randomization) checklist was completed for this observational study (Supplementary material 1).

Selection of IVs and data harmonization

In MR, the IVs indicate genetic variations that are highly related to the exposure and are not confounded by other factors that impact the outcome, which may adequately suppress the effect of confounders. For an MR study to be valid, the IVs ought to fulfill three essential criteria: (1) SNPs markedly (P < 5 × 10−8 or P < 1 × 10−5) linked to exposures are utilized as IVs, to eliminate the impact of weak IV bias, we incorporated SNPs whose F-statistic was >10; (2) Independence Assumption: Significant confounding factors, such as those related to the exposure and the corresponding outcome, exhibit no link to SNPs (IVs); (3) Exclusivity Assumption: SNPs (IVs) influence outcome directly susceptibility via exposure and are not otherwise linked to outcome.

SNPs linked to immunological traits were determined at the P < 1 × 10−5, as used in previous MR studies (given that only a small number of SNPs attained a degree of genome-wide significance (P < 5 × 10−8)) [17]. Further, the examination of the linkage disequilibrium (LD) among SNPs relied on 1000 Genomes Project-obtained European ancestry reference information. We picked SNPs utilizing an LD coefficient (r2 < 0.001) and situated over 10 Mb, to identify SNPs with independent genetic effects.

Additionally, in cases where there were no common SNPs between the outcome and exposure, proxies SNPs in LD (r2 ≥ 0.8) were added [18]. To eliminate the impact of weak IV bias, we incorporated SNPs whose F-statistic was <10 (a measurement of these IVs’ strength).

Primary MR analysis

MR analysis with package TwoSampleMR (version 0.5.6) was applied to ascertain the causal relationships of immunological traits with PSC in discovery and validation cohorts. Regarding characteristics influenced by two or more SNPs, the random-effects inverse variance weighted (IVW) models, which have stable, as well as, balanced pleiotropic impacts, were utilized as the primary method to determine the causal effects. Additionally, the effect estimates for traits controlled by a single SNP were derived utilizing the Wald ratio. The MR estimates are presented as odds ratio (OR) with a 95% CI for the dichotomous data or beta value with standard error (SE) for the continuous variables. To improve the reliability and strength of the causal relationship, we take the intersection of the results of two discovery cohorts.

MR sensitivity and heterogeneity analysis

To ascertain if MR impact estimates are resilient to possible invalid genetic variants, we executed MR-Egger regression, weighted median (WM), simple mode, and weight mode as sensitivity analyses. In comparison to the IVW approach, which presumes that all the SNPs are valid IVs when the Instrument Strength must be Independent of the Direct Efect (InSIDE) assumption holds, the MR-Egger regression test could generate a consistent estimate even if all the genetic instruments are invalid. The WM model stands out as a robust method, capable of availing consistent estimate results when over half of the genetic instruments are deemed valid. For any possible heterogeneity, we applied Cochran Q statistic derived from the MR-Egger regression and IVW approaches. They were visualized using funnel plots and MR-Egger intercept was utilized to examine horizontal pleiotropy; a threshold of P < 0.05 for both was utilized. In addition, MR Steiger filtering was utilized to ascertain the direction of causation between exposure and outcome. Finally, we performed a leave-one-out (LOO) analysis to re-calibrate the overall effect size and explored whether the association can be affected by a single SNP, by eliminating one exposure-linked SNP at a time.

Meta-analysis pooled the results of MR

To further improve the reliability and strength of the causal relationship, we performed a meta-analysis to pool the significant results of MR in discovery and validation cohorts. OR with a 95% CI was reported to estimate the association between immunological traits and PSC risk. Heterogeneity was assessed through the utilization of three methods: visually inspecting the forest plots, assessing the diversity (D2) estimates, and computing the inconsistency statistics (I2). We employed a fixed effects model when I2 was equal to zero, and a combination of fixed and random effects models when I2 was greater than zero. We reported the most conservative estimate being the point estimate closest to no effect or the estimate with the widest CI.

Transcriptome dataset collection, sample selection, and preprocessing

The GEO (http://www.ncbi.nlm.nih.gov/geo) database was systematically searched to find PSC-relevant transcriptome datasets from February 2001 through February 2024. The transcriptomic profiling datasets needed to fulfill the following criteria: (a) organism: Homo sapiens; (b) expression profiling through high-throughput sequencing or array. Lastly, two GEO datasets (GSE119600: 47 healthy controls and 45 PSC samples; and GSE159676: 6 healthy tissues and 12 PSC samples) were included for quantitative and qualitative analyses. For GEO datasets, a robust multi-array average analysis was implemented, including quantile normalization, background correction, and summarization.

Evaluation of immune cell infiltration

We used eight different algorithms to infer PSC patients’ immune cell infiltration in two GEO cohorts. These algorithms including CIBERSORT, TIMER, xCell, MCPcounter, ESTIMATE, EPIC, and Quantiseq were implemented using the R package ‘IOBR’ [19]; while ssGSEA was performed to quantify the infiltrating immune cells [20]. Additionally, the relative abundance of Tregs and B cells was compared between healthy/control and PSC groups.

WGCNA

By constructing a weighted gene co-expression network in two GEO cohorts utilizing the R package “WGCNA” (v 1.71), it was feasible to identify functional gene modules that correlated strongly with the proportion of Tregs and were thus appropriate for additional screening [21]. Before establishing a scale-free topology, the soft thresholding power (β = 1–20) was ascertained. Following the generation of the weighted adjacency matrix, it was transformed into a topological overlap matrix (TOM). Furthermore, the dynamic tree-cutting method was implemented to examine different modules clustered according to gene similarity, after obtaining the dissTOM for hierarchical clustering. Subsequently, we related the identified modules to two traits (B cells and Tregs proportion). For further analysis, genes from the module that exhibited a significant correlation with Tregs were selected.

Identification of Tregs-related genes and functional enrichment analysis

Standardized data including GSE119600 and GSE159676 datasets between healthy and PSC samples, were subjected to a variance analysis utilizing the NetworkAnalyst online Gene Expression Table (https://www.networkanalyst.ca/). Then, we take the overlap of differentially expressed genes (DEGs) and significant Tregs module genes in two PSC cohorts, defined as Tregs-related genes, and visualized with the UpSetR package. GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment analyses were conducted on the Treg-related genes utilizing the clusterProfiler R package.

Identification and construction of Tregs classifier through multiple machine learning methods

To find robust hub Tregs-related genes, ML techniques with 5-fold cross-validation based on disease status were applied to two PSC cohorts. Among these methods were support vector machine (SVM) for feature variable selection and modified LASSO penalized regression. The hub Tregs-related genes were derived from the overlapping of the results obtained from SVM, and LASSO.

Additionally, the following procedure was carried out to develop a consensus diagnostic model for PSC: (1) We began by combining 12 classical algorithms: Stepglm, gradient boosting machine (GBM), Linear Discriminant Analysis (LDA), eXtreme Gradient Boosting (XGBoost), NaiveBayes and SVM, random forest (RF), glmBoost, LASSO, Partial Least Squares Regression for Generalized Linear Models (plsRglm), ridge regression, elastic network (Enet). Of these, LASSO, RF, Stepglm, and glmBoost have feature selection capabilities. In addition, 107 algorithm combinations were constructed as prediction models utilizing the leave-one-out cross-validation (LOOCV) framework (2). Next, for the training set, we utilized the GSE119600 in PSC and employed these 107 combinations to generate classifiers independently using hub Tregs-related genes. (3) Lastly, in the testing cohorts (GSE159676), the Tregs score was calculated for each cohort by employing the model that was acquired in the training cohort. The best consensus diagnostic model for PSC was ultimately determined by averaging the area under the curve (AUC) of the two cohorts.

Diagnostic value, and clinical usefulness of Tregs classifier

Receiver operating characteristic (ROC) analysis was executed to examine the diagnostic capability of the Tregs classifier in two PSC datasets. Principal component analysis (PCA) [22], utilizing hub Tregs-related gene expression levels, was executed on two PSC datasets. Ultimately, the clinical applicability of the Tregs classifier was determined via decision curve analysis (DCA).

PSC animals model

Hunan Slack Jingda Laboratory Animal Co. Ltd. supplied the BALB/c mice that were 6–8 weeks old. BALB/c mice were randomly assigned to one of the three groups: the control group (n = 6), the 3,5-diethoxycarbonyl-1,4-dihydrocollidine (DDC)-induced PSC group (n = 6). All animal experiments received approval from the Ethical Committee on Animal Experimentation of the Central Hospital of Hengyang City (approval number: 2023-10-40). Animal experiments were carried out under specific pathogen-free conditions. We chose the DDC diet to develop a PSC mice model, considering that it well mimics the human PSC, and reproduces the gradual progression of the disease [23]. During the experiments, the mice were anesthetized with 2% isofluorane in 95% oxygen. Upon completion of each study, the mice were anesthetized by with 2% isofluorane in 95% oxygen and died of spinal dislocation. All the methods were in accordance with relevant guidelines and regulations.

Tissue samples and quantitative real-time polymerase chain reaction (qRT-PCR)

Liver tissue samples were acquired from PSC and control groups, and preserved at −80 °C for subsequent in vitro experiments. Total RNA was isolated as per the provided guidelines using Trizol (Invitrogen). Reverse transcription of RNA was executed utilizing the RevertAid RT Reverse Transcription Kit (Thermo Scientific). Quantitative PCR was conducted utilizing PowerUp™ SYBR™ Green Master Mix (Thermo Scientific), with Gapdh serving as the internal standard. Quantitative reverse transcription-PCR was executed utilizing the ABI 7500 real-time PCR system (Applied Biosystems, Foster City, CA, USA). Subsequently, fold change in gene expression was examined using the 2-(DeltaDelta)Ct method. Gene-specific PCR primers are provided in Supplementary Material 2.

Expression levels, and correlation pattern of hub Tregs-related genes

The hub Tregs-related gene mRNA expression levels in PSC and control groups were compared and verified in two GEO datasets and the PSC mice model. Additionally, hub Tregs-related genes in two GEO datasets were subjected to correlation analysis.

Analyses of immune cell infiltration in PSC and its relationship with hub Tregs-related genes

Based on the results of the previous eight different algorithms which calculate the level of immune cell infiltration, we compared the scores of immune, stromal, estimate, and microenvironment in two PSC cohorts. In addition, Spearman correlation analyses were conducted to examine the association of hub Tregs-related genes with immune cells.

Discovery of potential mechanisms in the development and progression of PSC and its relationship with hub Tregs-related genes

ssGSEA analysis was performed on several representative gene sets (immune/inflammatory/cell death-related pathways) [20] in two PSC cohorts. Then, the gene set variation analysis (GSVA) [24] was carried out to discern the signaling pathways according to the KEGG gene sets (c2.cp.kegg.v7.4.symbols.gmt), GO gene sets (c5.all.v2023.2.Hs.symbols.gmt), and HALLMARK gene sets (h.all.v2023.2.Hs.symbols.gmt) obtained from the Molecular Signatures Database. In the analysis of signaling pathways, the differential enrichment score between the control and PSC groups was investigated. Following this, correlation analysis was executed to further elucidate the link between the Tregs-related genes and common key biological pathways.

XGBoost and SHAP

Recently, the XGBoost algorithm has gained popularity owing to its excellent performance in other machine learning (such as k-nearest neighbor algorithms, logistic regression, Lasso, RF, and SVM) competitions [25]. We applied the XGBoost algorithm to construct a diagnostic model for PSC based on Tregs-related genes. Subsequently, the impact of each feature was interpreted using SHAP values; this provided a clear picture in the form of summary and bar plots that illustrated the significance and ordered the features in order of importance. Clear insights into parameters determining the incidence of PSCs were provided by this method, which facilitated the integration of statistical analysis with advanced ML. Finally, the Mantel test was applied to investigate the correlation between optimal Tregs-related genes and pivotal biological pathways.

Statistical analysis

Statistical analysis was executed utilizing RevMan v5.3 (The Cochrane Collaboration, Copenhagen, Denmark), GraphPad Prism 9, SPSS 22, and R software (R v4.3.1). Continuous variables were compared utilizing Student’s t tests, Mann–Whitney, Wilcoxon, or Kruskal–Wallis tests with two-tailed analyses based on the distribution of the variables. The statistical significance criterion was set at P < 0.05 unless specific P values were provided.

Results

Genetic instruments for immune cells

By applying the indicated significance criterion (P < 1 × 10−5), harmonization, LD clumping, and F-statistics >10, genetic IVs for 731 immune cells were identified. Consequently, our study certainly exhibited negligible instrument bias.

Causal effects of immunological traits on PSC

When examining the causal effects of immunological traits on PSC, we screened 8604 and 19,792 SNPs as IVs from 731 immune cells for PSC ieu-a-1112 and finn-b-K11_CHOLANGI_STRICT, respectively (Supplementary material 3 and 4). A total of 40 immunophenotypes (ieu-a-1112) (Supplementary material 5 and Fig. S1) and 31 immunophenotypes (finn-b-K11_CHOLANGI_STRICT) (Supplementary material 6 and Fig. S2) yielded causal effects on PSC using the IVW method (P < 0.05) (Table 2 and Fig. S1). Sensitivity analysis showed that 36 immunophenotypes (ieu-a-1112) (Supplementary material 5 and Fig. S3) and 28 immunophenotypes (finn-b-K11_CHOLANGI_STRICT) (Supplementary material 6 and Fig. S4) using the IVW method had the same direction as the findings of MR-Egger regression, WM, simple model, and weight model. To further improve the reliability and strength of the causal interrelationship, we take the intersection of two PSC GWAS (ieu-a-1112 and finn-b-K11_CHOLANGI_STRICT) and consider the direction of action of the immunological traits. As a result, three immunophenotypes (CD39+ resting Treg % CD4 Treg cell, CD3 on CD39+ secreting Treg cell, BAFF-R on IgD- CD38dim B cell) were causally associated with PSC (Fig. 1). In the PSC ieu-a-1112 cohort, Host-genetic-driven increase in CD39+ resting Treg % CD4 Treg cell (OR = 1.05, 95% CI = 1.00–1.11, PIVW = 0.049), BAFF-R on IgD- CD38dim B cell (OR = 1.31, 95% CI = 1.01–1.72, PIVW = 0.039) significantly increases PSC risk, while genetically predicted higher relative abundances of CD3 on CD39+ secreting Treg cell (OR = 0.92, 95% CI = 0.85–0.99, PIVW = 0.041) exhibited significant protective effects on PSC (Table 2 and Fig. 2A). Similar results were also observed in the PSC finn-b-K11_CHOLANGI_STRICT cohort (Table 2 and Fig. 2B).

Table 2 Mendelian randomization estimating the effect of immune cells exposure on the risk of primary sclerosing cholangitis in two GWAS datasets.
Full size table
Fig. 1: Venn diagram displays the shared immune cells subtypes in two primary sclerosing cholangitis (PSC) GWAS datasets (GWAS ID: ieu-a-1112, and finn-b-K11_CHOLANGI_STRICT).
figure 1

+ represent positive correlation, – represent negative correlation, text with color shading represent the result direction of inverse variance weighted (IVW) method exists inconsistent with those of sensitivity analysis (MR-Egger regression, weighted median, simple model and weight model).

Full size image
Fig. 2: MR and meta-analysis evaluated the effects of 3 immunophenotypes (CD39+ resting Treg % CD4 Treg cell, CD3 on CD39+ secreting Treg cell, BAFF-R on IgD- CD38dim B cell) on PSC.
figure 2

A Forrest plot showing causal effect of genetically predicted immunological traits on PSC (GWAS ID: ieu-a-1112). B Forrest plot showing causal effect of genetically predicted immunological traits on PSC (GWAS ID: finn-b-K11_CHOLANGI_STRICT). *p < 0.05; **p < 0.01; ***p < 0.001. C Pooled odds ratio (OR) of BAFF-R on IgD- CD38dim B cell and PSC risk in three PSC GWAS datasets. D Pooled odds ratio (OR) of CD3 on CD39+ secreting Treg cell and PSC risk in three PSC GWAS datasets. E Pooled odds ratio (OR) of CD39+ resting Treg % CD4 Treg cell and PSC risk in three PSC GWAS datasets.

Full size image

There was no obvious heterogeneity in the Cochran Q test utilizing MR-Egger regression and IVW (all P > 0.05, Table 2, Supplementary material 7 and 8, Figs. S5 and S6). The absence of any deviation from zero in the MR-Egger regression intercepts indicates the lack of horizontal pleiotropy (P > 0.05 for all intercepts, Table 2, Supplementary material 9 and 10). Moreover, according to the results of the MR Steiger test, the SNP selection was valid, confirming the hypothesized causal direction of immunophenotypes’s impact on PSC (all P > 0.05). Lastly, the LOO analyses ascertained that none of the detected causal interrelationships were driven by any individual IV (Figs. S7 and S8).

Replicating the findings in another PSC GWAS and then pooling with meta-analysis

To further clarify the reliability of causality, we validate these findings in another PSC GWAS (finn-b-K11_CHOLANGI) (Supplementary material 11). As a result, the abundances of CD39+ resting Treg % CD4 Treg cell and BAFF-R on IgD- CD38dim B cell remain positively associated with PSC risk, and CD3 on CD39+ secreting Treg cell remains negatively correlated with PSC occurrence but did not render significant alteration (Supplementary material 12). Interestingly, we performed a meta-analysis to pool the results of three PSC GWAS that the Host-genetic-driven increase in BAFF-R on IgD- CD38dim B cell (OR = 1.27, 95% CI = 1.07–1.50, P = 0.005, I2 = 0), CD39+ resting Treg % CD4 Treg cell (OR = 1.06, 95% CI = 1.02–1.09, P = 0.001, I2 = 0) significantly increases PSC risk, while genetically predicted higher relative abundances of CD3 on CD39+ secreting Treg cell (OR = 0.91, 95% CI = 0.85–0.96, P = 0.002, I2 = 0) exhibited significant protective effects on PSC (Fig. 2C–E).

Analysis of the infiltration level of Tregs and B cells in PSC transcriptome data

In the GSE119600 and GSE159676 cohorts (Fig. 3A–H), Tregs were more abundant in the PSC group relative to the control group, according to CIBERSORT, ssGSEA, and Quantiseq algorithms. Considering that the xCell algorithm inferred that the abundance of Tregs was extremely low, no significant difference existed between the two groups. However, in the GSE159676 cohort through the CIBERSORT algorithm alone, infiltrating B cells were relatively richer in the PSC group than the control group, and no significant variations were found in other cohorts and methods (Fig. 3I–P). These underscored the pivotal role of Tregs proportion imbalance in the occurrence of PSC.

Fig. 3: Comparison of the relative abundance of Tregs and B cells in two PSC transcriptome data using four immune cell infiltration algorithms (CIBERSORT, ssGSEA, xCell, and Quantiseq).
figure 3

A Tregs in GSE119600 dataset using CIBERSORT. B Tregs in GSE119600 dataset using ssGSEA. C Tregs in GSE119600 dataset using xCell. D Tregs in GSE119600 dataset using Quantiseq. E Tregs in GSE159676 dataset using CIBERSORT. F Tregs in GSE159676 dataset using ssGSEA. G Tregs in GSE159676 dataset using xCell. H Tregs in GSE159676 dataset using Quantiseq. I B cells in GSE119600 dataset using CIBERSORT. J B cells in GSE119600 dataset using ssGSEA. K B cells in GSE119600 dataset using xCell. L B cells in GSE119600 dataset using Quantiseq. M B cells in GSE159676 dataset using CIBERSORT. N B cells in GSE159676 dataset using ssGSEA. O B cells in GSE159676 dataset using xCell. P B cells in GSE159676 dataset using Quantiseq. The asterisks indicate a significant statistical p value calculated using the Wilcoxon test (*p < 0.05; **p < 0.01; ***p < 0.001).

Full size image

WGCNA

To identify significant Treg module genes, we performed WGCNA in two PSC datasets. The first step was to calculate the pairwise gene correlation to create a similarity matrix. We designed a scale-free network after testing soft threshold power (β) from 1 to 20. Second, in the GSE119600 and GSE159676 cohorts, respectively, the optimal value β attained was 13 and 4 and applied in converting the similarity matrix into an adjacent matrix (Fig. 4A and S9A). Furthermore, a negative correlation (R2 = 0.69, slope = −1.14 in GSE119600; R2 = 0.73, slope = −1.41 in GSE159676) between log10(k) and log10 (p(k)) was shown in Fig. 4B and S9B, symbolizing the proximity of the transformed adjacent matrix to a scale-free network for further analysis. Thirdly, the adjacent matrix was utilized to create the TOM for dynamic tree-clustering on genes (Fig. 4C and S9C), enabling the identification of gene modules containing Tregs. Two characteristics—B cells and Tregs—were correlated with the identified gene modules. As demonstrated in Fig. 4D and S9D, the ME cyan, ME dark gray, ME dark turquoise, and ME light cyan modules were positively correlated with Tregs (R ≥ 0.69, P ≤ 0.002) in GSE119600 and GSE159676 cohorts, respectively. Furthermore, an intramodular analysis was conducted, which revealed statistically significant positive correlations between gene significance (GS) and module membership (MM) in the ME dark gray (R = 0.76, P < 0.001) (Fig. 4E), ME dark turquoise (R = 0.76, P < 0.001) (Fig. 4F), ME cyan (R = 0.73, P < 0.001) (Fig. 4G) and ME light cyan modules (R = 0.68, P < 0.001) (Fig. S9E). Lastly, the genes in the ME cyan module (n = 346), ME dark gray module (n = 1328), ME dark turquoise module (n = 1553) in the GSE119600 cohort, and ME light cyan module (n = 1304) in GSE159676 cohort were deemed to be significant Tregs module genes for later analysis.

Fig. 4: Weighted gene coexpression network analysis (WGCNA) for screening functional gene modules highly associated with Tregs proportion in GSE119600 dataset.
figure 4

A Analysis of network topology for various soft-thresholding powers. (weighting coefficient, β). The x-axis represents different soft-thresholding powers. The y-axis represents the correlation coefficient between log (k) and log [P(k)]. The red line indicates a correlation coefficient of 0.6, Average network connectivity under different weighting coefficients. B Distribution of nodes with the degree of connection, k and Correlation of log (k) and log [P(k)]. C Clustering dendrograms of all genes, with dissimilarity based on topological overlap, together with assigned module colors. Altogether, 14 co-expression modules were constructed and displayed in different color. D The heatmap showed the relationship between five WGCNA-identified modules and two traits (B cells and Tregs). Each cell includes the correlation coefficient and p value. Scatterplots of high correlations between gene significance (GS) versus module membership (MM) for Tregs, (E): ME darkgrey module and Tregs; (F): ME darkturquoise module and Tregs; (G): ME cyan module and Tregs. Correlation coefficient and p value were calculated by Spearman correlation analysis.

Full size image

Identification of Tregs-related genes and functional enrichment analysis

The following DEGs were detected utilizing the NetworkAnalyst online-Gene Expression Table: 1908 DEGs in GSE119600, and 1860 in GSE159676 datasets, respectively. Then, we take the intersection of the results of DEGs and significant Tregs module genes in two datasets, and a total of 65 genes were determined to be Tregs-related genes in PSC (Fig. 5A). Subsequently, the GO term enrichment analysis for Tregs-related genes demonstrates the top five clusters with significant enrichment in cellular components (CCs), molecular functions (MFs), and biological processes (BPs) (Fig. 5B), including the regulation of small GTPase mediated signal transduction, the extrinsic component of cytoplasmic side of plasma membrane, GTPase regulator activity and so on. As for the KEGG, they were primarily involved in endocytosis, Fc gamma R-mediated phagocytosis, etc (Fig. 5C).

Fig. 5: Identification of Tregs-related genes and functional enrichment analysis.
figure 5

A UpSet plot presents the intersection of the results of differentially expressed genes (DEGs) and significant Tregs module genes in two PSC datasets. B, C, Functional enrichment analysis of Tregs-related genes in PSC. B Gene ontology (GO) analysis on biological process (BP), cellular component (CC), and molecular function (MF). C Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. DI Identification of hub Tregs-related genes based on multiple machine learning algorithms(including Lasso, support vector machine (SVM). Modified Lasso was used to identify candidate Tregs-related genes with 5-fold cross-validation in GSE119600 (D) and GSE159676 (G) datasets. The Y-axis shows mean-square error and the X-axis is Log (λ), dotted vertical lines represent minimum and 1 standard error values of λ. The genes selected at minimum standard error values of λ were finally used for further analysis; (E, H): Lasso coefficient profiles of the Tregs-related genes. A vertical line is drawn at the optimal value by1 – s.e. criteria and results in 10 non-zero coefficients in GSE119600 (E) and GSE159676 (H) datasets; (F, I): SVM algorithm was applied to screen candidate Tregs-related genes. The green dots indicated the lowest error rate and the highest precision when genes are this number in GSE119600 (F) and GSE159676 (I) datasets. J UpSet plot presents the intersection of the results of Lasso and SVM in two PSC datasets.

Full size image

Identification and construction of Tregs classifier through multiple machine learning approaches

To identify hub Tregs-related genes during PSC in GSE119600 (Fig. 5D–F) and GSE159676 (Fig. 5G–I) cohorts, the expression data of 65 Tregs-related genes were input into LASSO and SVM machine-learning models. Notably, we identified seven hub Tregs-related genes that were shared by at least three results by analyzing the intersection of the aforementioned datasets (Fig. 5J). To create a consistent diagnostic model for PSC, we incorporated seven hub Tregs-related genes into our integration program (LOOCV framework). We designed prediction models employing 5-fold cross-validation and 107 algorithm combinations in the GSE119600 training cohort. For testing cohorts, the average AUC value was calculated for each algorithm. The final model was determined to be the combination of LASSO + GBM, which exhibited the highest average AUC (0.959), as illustrated in Fig. 6A. Ultimately, we constructed a diagnostic model, named the Tregs classifier, based on seven hub Tregs-related genes (AKAP10, BASP1, DENND3, PLXNC1, KLF13, SCAP, and TMCO3). Detailed feature selection for each model and risk score for each patient were provided in Supplementary material 13.

Fig. 6: Construction and testing of the artificial intelligence-derived diagnostic signature (Tregs classifier) for PSC, as well as diagnostic value and clinical usefulness of Tregs classifier.
figure 6

A The area under curve (AUC) of 107 machine-learning algorithm combinations in the training (GSE119600) and testing (GSE159676) cohorts. B, C Receiver operating characteristic (ROC) curves with AUC values to evaluate predictive efficacy of Tregs classifier in GSE119600 (B) and GSE159676 (C) datasets. D, E Principal component analysis for the expression profiles of seven hub Tregs-related genes to distinguish PSC patients from healthy control patients in GSE119600 (D) and GSE159676 (E) datasets. F, G Decision curve analysis was applied to evaluate the clinical usefulness of Tregs classifier in GSE119600 (F) and GSE159676 (G) datasets. The Y-axis represents the net benefit. The black line represents the hypothesis that no patient’s treatment. The X-axis represents the threshold probability. The threshold probability is where the expected benefit of treatment is equal to the expected benefit of avoiding treatment.

Full size image

Diagnostic value, and clinical usefulness of Tregs classifier

ROC curves revealed that the Tregs classifier exhibited a superior diagnostic capacity, with an AUC of 0.974 in the GSE119600 cohort (Fig. 6B) and 0.944 in the GSE159676 cohort (Fig. 6C). PCA demonstrated that the expression of seven hub Tregs-related genes could effectively differentiate between PSC patients and healthy control in GSE119600 (Fig. 6F) and GSE159676 cohorts (Fig. 6G). Notably, the DCA chart showed that within the range of the threshold probability from 0 to 1, the net benefit derived from employing the Tregs classifier (predictive model) exceeded that obtained by either not intervening in all patients or intervening in all patients in GSE119600 (Fig. 6F) and GSE159676 cohorts (Fig. 6G).

Expression levels and correlation pattern of hub Tregs-related genes

In GSE119600 (Fig. 7A) and GSE159676 cohorts (Fig. 7B), seven hub Treg-related genes exhibited differential expression. qRT-PCR verified that compared to the control, the mRNA expression levels of Akap10, Basp1, Dennd3, Plxnc1, and Tmco3 were significantly up-regulated in the PSC mice model yet the Klf13, and Scap expression levels were significantly reduced in PSC group relative to the controls (Fig. 7E–K). Additionally, as shown in Fig. 7C, the strongest correlation is TMCO3 and PLXNC1 (R = 0.85, P < 0.05), followed by PLXNC1 and PLXNC1 (R = 0.81, P < 0.05); AKAP10 exhibited a significant positive link to DENND3 (R = 0.67, P < 0.05); and KLF13 was correlated positively with SCAP (R = 0.64, P < 0.05). Consistently, the GSE159676 cohort confirmed the strong expression correlation patterns of seven hub Treg-related genes (Fig. 7D).

Fig. 7: Expression levels, and correlation pattern of hub seven Tregs-related genes.
figure 7

Expression levels of seven Tregs-related genes in PSC compared to the control group in GSE119600 (A) and GSE159676 (B) datasets. The asterisks indicate a significant statistical p value calculated using the Wilcoxon test (*p < 0.05; **p < 0.01; ***p < 0.001). Correlation heatmap of seven Tregs-related genes expressed in PSC in GSE119600 (C) and GSE159676 (D) datasets. Correlation coefficient and p value were calculated by Spearman correlation analysis. In PSC mice models, through quantitative real-time polymerase chain reaction (qRT-PCR), we compared the mRNA expression levels of seven hub Tregs-related genes, including Akap10 (E), Basp1 (F), Dennd3 (G), Plxnc1 (H), Klf13 (I), Scap (J), and Tmco3 (K). The asterisks indicate a significant statistical p value calculated using the Student’s t-tests (*p < 0.05; **p < 0.01; ***p < 0.001).

Full size image

Analyses of immune cell infiltration in PSC and its relationship with hub Tregs-related genes

The variation in immune profile composition between PSC patients and healthy control was examined, ESTIMATE and xCell algorithms uncovered that the immune score, ESTIMATE score, stromal score, and microenvironment score were substantially higher in the PSC group than in the healthy control group in the GSE119600 dataset (Fig. 8A–F). The findings (such as immune score and ESTIMATE score) from the GSE159676 dataset were similar (Fig. 8G–L). Moreover, higher expression of hub Tregs-related genes was generally positively correlated with the immune score, microenvironment score, neutrophils, and natural killer T cell (NKT), yet generally negatively correlated with M2 macrophages, megakaryocyte-erythroid progenitor (MEP), and vice versa. The evidence was derived from seven immune cell infiltration methods and two PSC cohorts (Fig. 9A, B).

Fig. 8: Analyzing the microenvironment (equal to immune + stromal) between PSC and healthy control groups in two PSC datasets using two immune cell infiltration algorithms (ESITMATE, and xCell).
figure 8

A immune score in GSE119600 dataset using ESITMATE. B stromal score in GSE119600 dataset using ESITMATE. C ESITMATE score in GSE119600 dataset using ESITMATE. D immune score in GSE119600 dataset using xCell. E stromal score in GSE119600 dataset using xCell. F microenvironmen score in GSE119600 dataset using xCell. G immune score in GSE159676 dataset using ESITMATE. H stromal score in GSE159676 dataset using ESITMATE. I ESITMATE score in GSE159676 dataset using ESITMATE. J immune score in GSE159676 dataset using xCell. K stromal score in GSE159676 dataset using xCell. L microenvironmen score in GSE159676 dataset using xCell. The asterisks indicate a significant statistical p value calculated using the Wilcoxon test (*p < 0.05; **p < 0.01; ***p < 0.001).

Full size image
Fig. 9: Correlation of seven hub Tregs-related genes with immune cells infiltration using seven methods (CIBERSORT, TIMER, xCell, MCPcounter, ESITMATE, EPIC, and Quantiseq).
figure 9

A GSE119600 dataset. B GSE159676 dataset. Correlation coefficient and p value were calculated by Spearman correlation analysis. The asterisks indicate a statistically significant p value calculated using the Spearman correlation analysis (*p < 0.05; **p < 0.01; ***p < 0.001).

Full size image

Discovery of potential mechanisms in the development and progression of PSC and its relationship with hub Tregs-related genes

To explore possible biological pathways during PSC, ssGSEA, and GSVA were executed to examine the score for the enrichment of pre-defined BP and GO/KEGG/HALLMARK gene sets, respectively. Regarding pre-defined BP in two PSC cohorts (Fig. 10A, B), it was observed that the PSC group exhibited complement and coagulation cascades, and IL6 JAK-STAT3 signaling that were distinct from those of the control group. Additionally, correlation analysis revealed significant positive associations between highly expressed hub Tregs-related genes and IL6 JAK-STAT3 signaling, complement, and coagulation cascades, and vice versa (Fig. 10C, D).

Fig. 10: Analyses of several representative gene sets (immune/inflammatory/cell death-related pathways) in PSC and its relationship with hub Tregs-related genes.
figure 10

Comparison of pre-defined biological processes between PSC and control groups in GSE119600 (A) and GSE159676 (B) datasets based on ssGSEA algorithm. p value was calculated using the Wilcoxon test. The heatmap plot depicted correlation between seven hub Tregs-related genes and pre-defined biological biological pathways in GSE119600 (C) and GSE159676 (D) datasets. Correlation coefficient and p value were calculated by Spearman correlation analysis. The asterisks indicate a statistically significant p value calculated using the Spearman correlation analysis (*p < 0.05; **p < 0.01; ***p < 0.001).

Full size image

In terms of GO/KEGG/HALLMARK gene sets the differential analysis revealed a total of 78 GO pathways, 9 KEGG pathways, and 14 HALLMARK pathways by taking the intersection of the two PSC cohorts (Figs. 11 and S10). Compared to the normal group, several GO pathways (mast cell activation, regulation of toll-like receptor (TLR) signaling pathway, autophagic cell death, positive regulation of granulocyte differentiation, Fc-epsilon receptor signaling pathway), KEGG pathways (leukocyte transendothelial migration, B cell receptor signaling pathway, epithelial cell signaling in helicobacter pylori infection, Fc gamma R-mediated phagocytosis) and HALLMARK pathways (complement, inflammatory response, and IL6 JAK-STAT3 signaling) were primarily involved in the PSC group, while cellular response to copper ion, response to zinc ion, metabolism-related signaling pathways (response to cholesterol, response to folic acid, histidine metabolism, steroid biosynthesis, response to sterol, bile acid metabolism, cholesterol homeostasis), mitochondrial function-related pathways (mitochondrial respiratory chain complex assembly, mitochondrial envelope, mitochondrial large ribosomal subunit) were mainly enriched in healthy control group. Additionally, correlation analysis uncovered that highly expressed hub Tregs-related genes were generally positively linked to upregulated signaling pathways in PSC, but were negatively correlated with down-regulate, and vice versa (Fig. 12).

Fig. 11: Comparison of GO/KEGG/HALLMARK gene sets from Molecular Signatures Database between PSC and control groups in GSE119600 cohort based on gene set variation analysis (GSVA).
figure 11

A GO gene sets. B KEGG gene sets. C HALLMARK gene sets. t value and p value were calculated using the limma R package.

Full size image
Fig. 12: Analyzing the correlation between shared differential GO/KEGG/HALLMARK biological pathways in two PSC cohorts and seven hub Tregs-related genes.
figure 12

A GSE119600 dataset. B GSE159676 dataset. Correlation coefficient and p value were calculated by Spearman correlation analysis. The asterisks indicate a statistically significant p value calculated using the Spearman correlation analysis (*p < 0.05; **p < 0.01; ***p < 0.001).

Full size image

XGBoost and SHAP

To identify unique genes in diagnosing PSC, we utilized SHAP and XGBoost to illuminate feature importance ordering. The summary plot (Fig. 13A, C) ranks feature according to their impact, highlighting AKAP10 and KLF13 in the GSE119600 cohort, and KLF13 and AKAP10 in the GSE159676 cohort, as crucial determinants. The impact of each component on the model’s predictions is visualized in Fig. 13B, D. Notably, AKAP10 and KLF13 were identified as the paramount factor for PSC diagnosis. Additionally, the Mantel test uncovered that AKAP10 was correlated positively with apoptosis, regulation of cellular response to macrophage colony-stimulating factor, complement, epithelial cell signaling in helicobacter pylori infection, IL6 JAK-STAT3 signaling, APC costimulation, Fc-epsilon receptor signaling pathway, inflammatory response in two PSC cohorts (Figs. 13E and S11); KLF13 were positively correlated with bile acid metabolism, DNA repair, biosynthesis of unsaturated fatty acids, steroid biosynthesis, response to cholesterol, response to sterol, mitochondrial function-related pathways (mitochondrial envelope, mitochondrial large/small ribosomal subunit) (Figs. 13F and S12).

Fig. 13: Identification of optimal Tregs-related gene for PSC using XGBoost algorithm and SHapley Additive exPlanations (SHAP), and investigation the relationship between optimal Tregs-related gene and pivotal molecular pathways using Mantel test.
figure 13

Importance ranking of features in GSE119600 (A) and GSE159676 (C) datasets. Visualization of SHAP variables, with the included features sorted by the average absolute value of SHAP from highest to lowest in GSE119600 (B) and GSE159676 (D) datasets. Yellow dots denoting a higher impact (increasing PSC) and purple dots a lower one (reducing PSC). E In GSE119600 cohort, correlation between AKAP10 and significant biological pathways based on Spearman correlation analysis. Correlation coefficient and p value were calculated by the Mantel test. F In GSE119600 cohort, correlation between KLF13 and significant biological pathways based on Spearman correlation analysis. Correlation coefficient and p value were calculated by the Mantel test.

Full size image

Discussion

Currently, PSC lacks effective biomarkers for early diagnosis and targeted therapy. Accumulating evidence supported an association between immune cell abundance and PSC [7,8,9], however, there are deficient direct pieces of evidence affirming a causal correlation. Considering that PSC comprises a heterogeneous group, each accompanied by individual pathophysiology and variations. MR analysis examined the causal relationship between 731 immune cell subtypes and two PSC GWAS datasets (GWAS ID: ieu-a-1112, and finn-b-K11_CHOLANGI_STRICT). Our genetic analyses found several potential causal associations between 3 immunophenotypes (CD39+ resting Treg % CD4 Treg cell, CD3 on CD39+ secreting Treg cell, BAFF-R on IgD- CD38dim B cell) and PSC, using IVW method had the same direction as the findings of MR-Egger regression, WM, simple model and weight model. Replicating the findings in another PSC GWAS and meta-analysis pooling the results of three PSC GWAS revealed that these results were generally consistent. Studies have discovered that peripheral blood mononuclear cells derived from PSCs had a noticeably greater total number of B lymphocytes when contrasted with healthy controls [26]. In addition, 10% of patients with PSC had elevated serum levels of IgG4 and a significant infiltration of IgG4 PCs [27, 28]. PSC tissues were found to have IgG4+ PC clumps and deposits; the degree to which they were immunostained was associated with disease progression and lymphocyte infiltration in PSCs [27, 29]. These findings align with our study, in which B cells are involved in the development of PSC. By preventing immune reactions to self-antigens, regulatory T cells (Tregs) serve an essential role in keeping the immune system in a steady state [30]. Treg dysfunction facilitates the onset and advancement of autoimmune liver diseases, including primary biliary cholangitis, autoimmune hepatitis, and PSC [30]. Limited research has been conducted on the function of Tregs in PSC, and only a handful of studies have confirmed Treg dysfunction. The laboratory of Christoph Schramm discovered a notable reduction in the frequencies of circulating and intrahepatic Tregs in PSC, along with diminished functionality (although the Treg dysfunction is limited) [31]. However, several studies have found there were higher frequencies of liver infiltrating Tregs in PSC [32] and elevated numbers of circulating Tregs were reported [33]. In our study, we found host-genetic-driven increase in CD39+ resting Treg % CD4 Treg cell significantly increases PSC risk, while genetically predicted higher relative abundances of CD3 on CD39+ secreting Treg cell exhibited significant protective effects on PSC. Based on previous literature and our results, different subtypes of Tregs play different roles in the development of PSC. To further elucidate and verify the relationship between B cells, Tregs, and PSC, we apply eight different algorithms (CIBERSORT, ssGSEA, TIMER, xCell, MCPcounter, ESTIMATE, EPIC, and Quantiseq) to infer PSC patients’ immune cell infiltration at the transcriptome level. As a result, we discovered that Tregs proportion was considerably changed, while B cells didn’t exert significance based on 4 different algorithms and two transcriptome data. These underscored the pivotal role of Tregs proportion imbalance in the occurrence of PSC.

Considering these results from MR analysis, meta-analysis, and immune cell infiltration analysis at the transcriptome level comprehensively, a thorough comprehension of the possible functions of Tregs-associated molecules is profoundly significant for advancing early PSC diagnosis, improvement of the understanding of pathogenesis, and the development of potentially applicable drugs for PSC patients. PSC continues to be a diagnostic conundrum, frequently being identified years after the initial reporting of symptoms or during the later stages of the disease. MRI/MRCP is the first-line diagnostic instrument for patients suspected of having PSC, yet their costs are substantial, and diagnostic performance for PSC is suboptimal in the early phases. There are several immune markers (such as immunoglobulin G, immunoglobulin M, anticardiolipin, and atypical perinuclear antineutrophil cytoplasmic antibodies) known to be correlated with the PSC, yet there is a lack of consensus surrounding its diagnostic application in patients with PSC and their inadequate specificity prevent them from having a substantial diagnostic utility [5]. This can be attributed to the intricate pathophysiology and heterogeneity of PSC. Pathophysiologically, thousands of genes are differentially expressed in response to the development and progression of PSC, and some genes can potentially provide valuable insights into diagnosis and prediction. In this study, we systematically reviewed available PSC bulk RNA-seq datasets (Homo sapiens) from inception to date and emphasized the effects of Tregs-related molecules in PSC. As a result, we applied WGCNA and differential analysis in two cohorts to identify 65 Tregs-related genes, which were chosen to be candidates for PSC. Also, previous research shows that individuals often choose modeling algorithms according to their preferences and the extent of their knowledge. We compiled a list of 12 machine-learning methods that may be employed to create biological diagnosis markers to address this limitation. We further merged them into 107 algorithm combinations, with the functions of variable screening and data dimensionality reduction performed by LASSO, RF, Stepglm, and glmBoost. The final model was determined to be the combination of LASSO and GBM that yielded the largest average AUC (0.959) among the two PSC cohorts. When developing biomedical models using artificial intelligence and machine learning, one of the most troublesome issues is over-fitting, which occurs when a model fits the training data well but fails to do so when tested with other external validation datasets [34]. A final seven-gene signature, the Tregs classifier, was formed using GBM after LASSO minimized redundant data. The Tregs classifier demonstrated outstanding predictive and diagnostic performance in the training cohort with an AUC of 0.974 and the test cohort with an AUC of 0.944. Moreover, it effectively differentiated between PSC and healthy control. Based on the DCA results, treatment decisions guided by the Tregs classifier yielded greater net benefits than those based on the approach of treating either all patients or none. This suggests that the Tregs classifier holds promise for diagnosing PSC and informing clinical decision-making.

Recent studies have shown that bile duct immune cells in PSC patients can drive hepatic inflammation and fibrosis by affecting the hepatic microenvironment [35, 36]. In our study, we revealed that compared with healthy control, the scores of the immune, microenvironment, and ESTIMATE were substantially higher in PSC patients in two PSC cohorts. Moreover, correlation analysis between hub Tregs-related genes and infiltrating immune cells indicated that neutrophils and NKT may promote the occurrence of PSC, while M2 macrophages, MEP may inhibit the occurrence of PSC, using seven immune cell infiltration methods and two PSC cohorts. Patient-specific biliary inflammation was marked by a higher presence of T cells and neutrophils than that of healthy controls, according to a biliary immune landscape map of PSC [9], which was consistent with our study. The presence of activated macrophages, predominantly M1 polarized, in the liver of PSC patients was discovered to be linked to an increase in Notch signaling and an improvement in HPC self-renewal, according to a previous study [37]. MEP gives rise to the cells that produce platelets and red blood cells, the anucleate cells that facilitate healing. In myelodysplastic neoplasms (MDS), maintenance of MEPs and lymphoid progenitors predicted favorable outcomes [38]. However, the impact of a decrease in M2 macrophages and MEP infiltration has not been studied in PSC, and elevation in neutrophils and NKT activation in PSC remains relatively understudied. Thus, further characterization of those cells will provide a novel perspective for the development and progress of PSC and will aid in finding potential therapy for PSC patients.

Subsequently, ssGSEA and GSVA functional enrichment analyses were performed to elucidate the fundamental biological mechanisms of PSC in two transcriptome data. As a result, we uncovered that immune-related pathways (mast cell activation, regulation of the TLR signaling pathway, positive regulation of granulocyte differentiation, B cell receptor signaling pathway), inflammatory-related pathways (complement, inflammatory response, and IL6 JAK-STAT3 signaling), cell death-related pathways (autophagic cell death) were primarily involved in the PSC development, while metabolism-related signaling pathways (bile acid metabolism, response to cholesterol, response to folic acid, histidine metabolism, steroid biosynthesis), mitochondrial function-related pathways (mitochondrial respiratory chain complex assembly, mitochondrial envelope, mitochondrial large ribosomal subunit) were significantly down-regulated in PSC. These suggested that PSC is characterized by the activation of immunity and inflammation, autophagic cell death increased, metabolic disturbance, and mitochondrial dysfunction. Additionally, we found that significant involvement of complement and coagulation cascades, Fc gamma R-mediated phagocytosis, Fc-epsilon receptor signaling pathway, and epithelial cell signaling in helicobacter pylori infection in the development of PSC. Current research on the mechanism of PSC mainly focuses on immune and inflammatory signaling pathways. PSC patients often develop liver dysfunction, including fat-soluble vitamin malabsorption, biological metabolism disorders, and bile acid metabolism disorders, for which several symptomatic treatment drugs (such as ursodeoxycholic acid, norursodoxycolic acid, and obeticholic acid) are used to treat PSC [39]. The above phenomenon can be explained by the disturbance of metabolism-related pathways in PSC. The findings in one study indicate that patients diagnosed with PSC are at risk for developing hypercoagulopathy [40], whereas another study indicated that even though the mean coagulation indices of PSC patients were normal, they were often lower in individuals with nonbiliary causes of cirrhosis [41]. Our findings indicate a strong correlation between PSC and complement, as well as the coagulation cascade and complement. Remarkably, the precise mechanisms underlying how several metabolism-related pathways (histidine metabolism, steroid biosynthesis), complement and coagulation cascades, epithelial cell signaling in helicobacter pylori infection, Fc-epsilon receptor signaling pathway, Fc gamma R-mediated phagocytosis, mitochondrial dysfunction led to PSC onset remain elusive. Further investigation in this direction is warranted to unveil new therapeutic avenues for treating PSC.

One prominent finding of this study was that we identified seven hub Tregs-related genes (AKAP10, BASP1, DENND3, PLXNC1, KLF13, SCAP, and TMCO3) that are anticipated to be new potential targets for PSC therapy. In two PSC cohorts, seven hub Tregs-related genes exhibited significantly differential expression. In the PSC mice model, qRT-PCR confirmed that compared to the control, the mRNA expression levels of Akap10, Basp1, Dennd3, Plxnc1, and Tmco3 were significantly up-regulated in PSC yet the Klf13, and Scap expression level was considerably reduced in PSC group relative to the control group. Moreover, the correlation pattern of hub Tregs-related genes uncovered that there may be some intermolecular interaction (TMCO3 and PLXNC1, PLXNC1 and PLXNC1, AKAP10, and DENND3, KLF13 and SCAP), which might be implicated in the occurrence of PSC. Importantly, to identify optimal genes in diagnosing PSC, we employed the XGBoost algorithm and SHAP to visualize the importance of feature variables. In two PSC cohorts, we identify AKAP10 and KLF13 as crucial determinants. AKAP10 potentially contributes to the occurrence of PSC while KLF13 potentially reduces the occurrence of PSC, which might mediate PSC onset and might potentially function as a novel therapeutic target for PSC. To further clarify the molecular function of AKAP10 and KLF13 during PSC, spearman correlation analyses and Mantel tests revealed that AKAP10 may promote the development of PSC through activating immune/inflammatory-related pathways, yet KLF13 may prevent PSC by maintaining normal liver metabolism and mitochondrial function. However, the potential effects of AKAP10 and KLF13 on PSC development have not been investigated and warrant further investigation in vitro and in vivo experiments.

Our work has certain differences from previous published studies in several aspects. (1) Through utilizing large-scale GWAS summary data (731 immune cell subtypes and three PSC GWAS datasets), we first performed MR and meta-analysis between immune subtypes and PSC, ascertaining a robust causal relationship CD39+ resting Treg % CD4 Treg cell, CD3 on CD39+ secreting Treg cell, BAFF-R on IgD- CD38dim B cell and PSC. (2) At the transcriptome level, we employed eight different algorithms (CIBERSORT, ssGSEA, TIMER, xCell, MCPcounter, ESITMATE, EPIC, and Quantiseq) to infer PSC patients’ immune cell infiltration, and uncovered the pivotal role of Tregs proportion imbalance in the occurrence of PSC. (3) We systematically collected PSC transcriptome data and developed our Tregs classifier using the machine-learning algorithms that had the highest average AUC across all cohorts, which improved the model’s stability and diagnostic power. (4) We merged 12 machine-learning algorithms into 107 different combinations and then chose the most accurate one to avoid inappropriate modeling methods owing to subjective preferences. (5) The study identified the M2 macrophages, MEP, metabolic disturbance (histidine metabolism, steroid biosynthesis), complement and coagulation cascades, coagulation and complement cascades, Fc-epsilon receptor signaling pathway, epithelial cell signaling in helicobacter pylori infection, Fc gamma R-mediated phagocytosis, mitochondrial dysfunction involved in PSC, offering new insights into the mechanisms underlying PSC pathogenesis. (6) The XGBoost algorithm and SHAP identified AKAP10 and KLF13 as optimal genes, which are anticipated to be new potential targets for PSC therapy. Our study was as thorough and meticulous as possible, but there are certain caveats to consider. First, as for MR analysis, our findings failed to withstand a stringent Bonferroni correction for multiple comparisons. However, as a hypothesis-driven approach, the MR study with some biological evidence was used to test epidemiologically established associations, irrespective of Bonferroni corrected P values. It is worth noting that our results were validated in three PSC GWAS datasets and two PSC transcriptome data. Second, our work relies on large-scale GWAS summary data from European descent, which makes it difficult to generalize to other populations; thus, more research with groups other than Europeans is necessary. Third, the small sample size in GEO cohorts suggests that the generalizability of the model needs to be verified in larger prospective cohorts. Fourth, we found seven hub Tregs-related genes participating in PSC development. Nonetheless, in vivo validation should be performed in future investigations to investigate the specific mechanisms underlying, especially AKAP10 and KLF13. Lastly, the comprehensive molecular mechanisms in vivo and in vitro ought to be further investigated in future research for the identification of the M2 macrophages, MEP, metabolic disturbance, complement and coagulation cascades, and mitochondrial dysfunction, potentially opening up new avenues for PSC treatment.

Conclusion

This work is the first of its kind, as far as we are aware, to integrate MR and transcriptome analysis to investigate the causal relationships between Tregs and PSC. We used 107 different combinations of 12 machine-learning algorithms to develop and verify a consensus diagnostic signature (Tregs classifier) for PSC based on seven Tregs-related genes. Furthermore, the identification of the M2 macrophages, MEP, metabolic disturbance, complement and coagulation cascades, mitochondrial dysfunction, AKAP10, and KLF13 could represent crucial targets for PSC treatment, offering insights into the underlying pathogenesis of the disease.

Related Articles

Targeting of TAMs: can we be more clever than cancer cells?

With increasing incidence and geography, cancer is one of the leading causes of death, reduced quality of life and disability worldwide. Principal progress in the development of new anticancer therapies, in improving the efficiency of immunotherapeutic tools, and in the personification of conventional therapies needs to consider cancer-specific and patient-specific programming of innate immunity. Intratumoral TAMs and their precursors, resident macrophages and monocytes, are principal regulators of tumor progression and therapy resistance. Our review summarizes the accumulated evidence for the subpopulations of TAMs and their increasing number of biomarkers, indicating their predictive value for the clinical parameters of carcinogenesis and therapy resistance, with a focus on solid cancers of non-infectious etiology. We present the state-of-the-art knowledge about the tumor-supporting functions of TAMs at all stages of tumor progression and highlight biomarkers, recently identified by single-cell and spatial analytical methods, that discriminate between tumor-promoting and tumor-inhibiting TAMs, where both subtypes express a combination of prototype M1 and M2 genes. Our review focuses on novel mechanisms involved in the crosstalk among epigenetic, signaling, transcriptional and metabolic pathways in TAMs. Particular attention has been given to the recently identified link between cancer cell metabolism and the epigenetic programming of TAMs by histone lactylation, which can be responsible for the unlimited protumoral programming of TAMs. Finally, we explain how TAMs interfere with currently used anticancer therapeutics and summarize the most advanced data from clinical trials, which we divide into four categories: inhibition of TAM survival and differentiation, inhibition of monocyte/TAM recruitment into tumors, functional reprogramming of TAMs, and genetic enhancement of macrophages.

Type 2 immunity in allergic diseases

Significant advancements have been made in understanding the cellular and molecular mechanisms of type 2 immunity in allergic diseases such as asthma, allergic rhinitis, chronic rhinosinusitis, eosinophilic esophagitis (EoE), food and drug allergies, and atopic dermatitis (AD). Type 2 immunity has evolved to protect against parasitic diseases and toxins, plays a role in the expulsion of parasites and larvae from inner tissues to the lumen and outside the body, maintains microbe-rich skin and mucosal epithelial barriers and counterbalances the type 1 immune response and its destructive effects. During the development of a type 2 immune response, an innate immune response initiates starting from epithelial cells and innate lymphoid cells (ILCs), including dendritic cells and macrophages, and translates to adaptive T and B-cell immunity, particularly IgE antibody production. Eosinophils, mast cells and basophils have effects on effector functions. Cytokines from ILC2s and CD4+ helper type 2 (Th2) cells, CD8 + T cells, and NK-T cells, along with myeloid cells, including IL-4, IL-5, IL-9, and IL-13, initiate and sustain allergic inflammation via T cell cells, eosinophils, and ILC2s; promote IgE class switching; and open the epithelial barrier. Epithelial cell activation, alarmin release and barrier dysfunction are key in the development of not only allergic diseases but also many other systemic diseases. Recent biologics targeting the pathways and effector functions of IL4/IL13, IL-5, and IgE have shown promising results for almost all ages, although some patients with severe allergic diseases do not respond to these therapies, highlighting the unmet need for a more detailed and personalized approach.

Enhancer reprogramming: critical roles in cancer and promising therapeutic strategies

Transcriptional dysregulation is a hallmark of cancer initiation and progression, driven by genetic and epigenetic alterations. Enhancer reprogramming has emerged as a pivotal driver of carcinogenesis, with cancer cells often relying on aberrant transcriptional programs. The advent of high-throughput sequencing technologies has provided critical insights into enhancer reprogramming events and their role in malignancy. While targeting enhancers presents a promising therapeutic strategy, significant challenges remain. These include the off-target effects of enhancer-targeting technologies, the complexity and redundancy of enhancer networks, and the dynamic nature of enhancer reprogramming, which may contribute to therapeutic resistance. This review comprehensively encapsulates the structural attributes of enhancers, delineates the mechanisms underlying their dysregulation in malignant transformation, and evaluates the therapeutic opportunities and limitations associated with targeting enhancers in cancer.

Iron homeostasis and ferroptosis in muscle diseases and disorders: mechanisms and therapeutic prospects

The muscular system plays a critical role in the human body by governing skeletal movement, cardiovascular function, and the activities of digestive organs. Additionally, muscle tissues serve an endocrine function by secreting myogenic cytokines, thereby regulating metabolism throughout the entire body. Maintaining muscle function requires iron homeostasis. Recent studies suggest that disruptions in iron metabolism and ferroptosis, a form of iron-dependent cell death, are essential contributors to the progression of a wide range of muscle diseases and disorders, including sarcopenia, cardiomyopathy, and amyotrophic lateral sclerosis. Thus, a comprehensive overview of the mechanisms regulating iron metabolism and ferroptosis in these conditions is crucial for identifying potential therapeutic targets and developing new strategies for disease treatment and/or prevention. This review aims to summarize recent advances in understanding the molecular mechanisms underlying ferroptosis in the context of muscle injury, as well as associated muscle diseases and disorders. Moreover, we discuss potential targets within the ferroptosis pathway and possible strategies for managing muscle disorders. Finally, we shed new light on current limitations and future prospects for therapeutic interventions targeting ferroptosis.

Reprogramming of fatty acid metabolism: a hidden force regulating the occurrence and progression of cholangiocarcinoma

Cholangiocarcinoma (CCA) is a malignant tumor that originates from the bile duct epithelium and with a poor outcome due to lack of effective early diagnostic methods. Surgical resection is the preferred method for cure, but treatment options are limited for advanced diseases, such as distant metastatic or locally progressive tumors. Therefore, it is urgent to explore other new treatment methods. As modern living standards rise, the acceptance of high-fat, high-protein, and high-carbohydrate diets is growing among the public, and the resulting metabolic abnormalities are intimately linked to the initiation and spread of tumors. Metabolic reprogramming is a key mechanism in the process of tumor development and progression and is closely related to cancer cell proliferation, metastasis and drug resistance. Fatty acid (FA) metabolism, an integral component of cancer cell metabolism, can provide an energy source for cancer cells and participate in cell signaling, the regulation of the immune response and the maintenance of homeostasis of the internal environment, which are closely linked to the development and progression of CCA. Therefore, a better understanding of FA metabolism may provide promising strategies for early diagnosis, prognostic assessment and targeted therapy for CCA patients. In this paper, we review the effects of FA metabolism on CCA development and progression, summarize related mechanisms and the existing clinical applications of targeted lipid metabolism in CCA, and explore new targets for CCA metabolic therapy.

Responses

Your email address will not be published. Required fields are marked *