Related Articles
A machine learning approach to leveraging electronic health records for enhanced omics analysis
Omics studies produce a large number of measurements, enabling the development, validation and interpretation of systems-level biological models. Large cohorts are required to power these complex models; yet, the cohort size remains limited due to clinical and budgetary constraints. We introduce clinical and omics multimodal analysis enhanced with transfer learning (COMET), a machine learning framework that incorporates large, observational electronic health record databases and transfer learning to improve the analysis of small datasets from omics studies. By pretraining on electronic health record data and adaptively blending both early and late fusion strategies, COMET overcomes the limitations of existing multimodal machine learning methods. Using two independent datasets, we showed that COMET improved the predictive modelling performance and biological discovery compared with the analysis of omics data with traditional methods. By incorporating electronic health record data into omics analyses, COMET enables more precise patient classifications, beyond the simplistic binary reduction to cases and controls. This framework can be broadly applied to the analysis of multimodal omics studies and reveals more powerful biological insights from limited cohort sizes.
Pancreatic organogenesis mapped through space and time
The spatial organization of cells within a tissue is dictated throughout dynamic developmental processes. We sought to understand whether cells geometrically coordinate with one another throughout development to achieve their organization. The pancreas is a complex cellular organ with a particular spatial organization. Signals from the mesenchyme, neurons, and endothelial cells instruct epithelial cell differentiation during pancreatic development. To understand the cellular diversity and spatial organization of the developing pancreatic niche, we mapped the spatial relationships between single cells over time. We found that four transcriptionally unique subtypes of mesenchyme in the developing pancreas spatially coordinate throughout development, with each subtype at fixed locations in space and time in relation to other cells, including beta cells, vasculature, and epithelial cells. Our work provides insight into the mechanisms of pancreatic development by showing that cells are organized in a spatially and temporally coordinated manner.
Integrated proteogenomic characterization of ampullary adenocarcinoma
Ampullary adenocarcinoma (AMPAC) is a rare and heterogeneous malignancy. Here we performed a comprehensive proteogenomic analysis of 198 samples from Chinese AMPAC patients and duodenum patients. Genomic data illustrate that 4q loss causes fatty acid accumulation and cell proliferation. Proteomic analysis reveals three distinct clusters (C-FAM, C-AD, C-CC), among which the most aggressive cluster, C-AD, is associated with the poorest prognosis and is characterized by focal adhesion. Immune clustering identifies three immune clusters and reveals that immune clusters M1 (macrophage infiltration cluster) and M3 (DC infiltration cluster), which exhibit a higher immune score than cluster M2 (CD4+ T-cell infiltration cluster), are associated with a poor prognosis due to the potential secretion of IL-6 by tumor cells and its consequential influence. This study provides a comprehensive proteogenomic analysis to support a better understanding and potential treatment of AMPAC.
Multi-omics insights into the molecular signature and prognosis of hypopharyngeal squamous cell carcinoma
Approximately two-thirds of hypopharyngeal squamous cell carcinoma (HPSCC) cases are diagnosed at advanced stages, with the worst prognosis among head and neck squamous cell carcinomas (HNSCCs). Identifying biomarkers for high-risk patients requiring aggressive treatment is crucial. We present mutational, transcriptomic, and proteomic studies of 103 Chinese HPSCC patients and observe a higher prevalence and poorer prognosis in males. Estrogen response pathways are up-regulated, and proteins phosphorylated by protein kinase C (PKC) and cyclin-dependent kinases (CDKs) are aberrantly regulated in HPSCC. We identify aberrant copy number regions including SOX2(3q26.33), FGFR(8p11.23), CCND1(11q13.3), CDKN2A/2B(9p21.3), and MYC(8q24.21). Human papillomavirus (HPV) status combined with highly mutated genes, such as SYNE1 in HPV(−) and MUC4 in HPV(+) patients, was assessed as a prognostic marker. A predictive model involving clinical factors and expression of six genes was established and cross-site validated. These findings open new opportunities for stratifying high-risk patients and identifying molecular targets for personalized therapeutic strategies.
Transcriptomic clustering of chronic lymphocytic leukemia: molecular subtypes based on Bruton’s tyrosine kinase expression levels
Historically, CLL prognostication relied on disease burden, reflected in clinical stage. Later, chromosome abnormalities and genomics suggested several CLL subtypes, which were aligned with response to therapy. Gene expression profiling data identified pathways associated with CLL progression. We hypothesized that the transcriptome and proteome may identify functional omics associated with CLL nosology. As a test cohort, we utilized publicly available treatment-naïve CLL transcriptomics data (n = 130) and performed consensus clustering, which identified BTK-expression-based clusters. The BTK-High and BTK-Low clusters were validated in public and our in-house databases (n > 550 CLL patients). To assess functional relevance, we took samples from 151 previously treated patients with CLL and analyzed them using RNA sequencing and reverse-phase protein array. Transcript levels were strongly correlated with BTK protein levels. The BTK-High subtype showed increased CCL3/CCL4 levels and disease burden, such as high WBC. The BTK-Low subtype showed down-regulated mRNA/proteins of the DNA-repair pathway and increased DNA-damage response, which may have contributed to enrichment of the inflammatory pathway. The BTK-Low subtype was rich in proapoptotic gene and protein expression and relied less on the BCR pathway. The BTK-High subtype was enriched in the replication/repair pathway and transcription machinery. In conclusion, profiling of 5 datasets of ~700 patients revealed unique BTK-associated expression clusters in CLL.
Responses