Related Articles
Genetic analysis of a Yayoi individual from the Doigahama site provides insights into the origins of immigrants to the Japanese Archipelago
Mainland Japanese have been recognized as having dual ancestry, originating from indigenous Jomon people and immigrants from continental East Eurasia. Although migration from the continent to the Japanese Archipelago continued from the Yayoi to the Kofun period, our understanding of these immigrants, particularly their origins, remains insufficient due to the lack of high-quality genome samples from the Yayoi period, complicating predictions about the admixture process. To address this, we sequenced the whole nuclear genome of a Yayoi individual from the Doigahama site in Yamaguchi prefecture, Japan. A comprehensive population genetic analysis of the Doigahama Yayoi individual, along with ancient and modern populations in East Asia and Northeastern Eurasia, revealed that the Doigahama Yayoi individual, similar to Kofun individuals and modern Mainland Japanese, had three distinct genetic ancestries: Jomon-related, East Asian-related, and Northeastern Siberian-related. Among non-Japanese populations, the Korean population, possessing both East Asian-related and Northeastern Siberian-related ancestries, exhibited the highest degree of genetic similarity to the Doigahama Yayoi individual. The analysis of admixture modeling for Yayoi individuals, Kofun individuals, and modern Japanese respectively supported a two-way admixture model assuming Jomon-related and Korean-related ancestries. These results suggest that between the Yayoi and Kofun periods, the majority of immigrants to the Japanese Archipelago originated primarily from the Korean Peninsula.
Comparative analysis of the Mexico City Prospective Study and the UK Biobank identifies ancestry-specific effects on clonal hematopoiesis
The impact of genetic ancestry on the development of clonal hematopoiesis (CH) remains largely unexplored. Here, we compared CH in 136,401 participants from the Mexico City Prospective Study (MCPS) to 416,118 individuals from the UK Biobank (UKB) and observed CH to be significantly less common in MCPS compared to UKB (adjusted odds ratio = 0.59, 95% confidence interval (CI) = [0.57, 0.61], P = 7.31 × 10^−185). Among MCPS participants, CH frequency was positively correlated with the percentage of European ancestry (adjusted beta = 0.84, 95% CI = [0.66, 1.03], P = 7.35 × 10^−19). Genome-wide and exome-wide association analyses in MCPS identified ancestry-specific variants in the TCL1B locus with opposing effects on DNMT3A-CH versus non-DNMT3A-CH. Meta-analysis of MCPS and UKB identified five novel loci associated with CH, including polymorphisms at PARP11/CCND2, MEIS1 and MYCN. Our CH study, the largest in a non-European population to date, demonstrates the power of cross-ancestry comparisons to derive novel insights into CH pathogenesis.
SVLearn: a dual-reference machine learning approach enables accurate cross-species genotyping of structural variants
Structural variations (SVs) are diverse forms of genetic alterations and drive a wide range of human diseases. Accurately genotyping SVs, particularly those occurring in repetitive genomic regions, from short-read sequencing data remains challenging. Here, we introduce SVLearn, a machine-learning approach for genotyping bi-allelic SVs. It exploits a dual-reference strategy to engineer a curated set of genomic, alignment, and genotyping features based on a reference genome in concert with an allele-based alternative genome. Using 38,613 human-derived SVs, we show that SVLearn significantly outperforms four state-of-the-art tools, with precision improvements of up to 15.61% for insertions and 13.75% for deletions in repetitive regions. On two additional sets of 121,435 cattle SVs and 113,042 sheep SVs, SVLearn generalizes well to cross-species SV genotyping, with a weighted genotype concordance score of up to 90%. Notably, SVLearn enables genotyping of SVs at low sequencing coverage with accuracy comparable to that at 30× coverage. Our studies suggest that SVLearn can accelerate the understanding of associations between genome-scale, high-quality genotyped SVs and diseases across multiple species.
High-coverage whole-genome sequencing of a Jakun individual from the “Orang Asli” Proto-Malay subtribe from Peninsular Malaysia
Jakun, a Proto-Malay subtribe from Peninsular Malaysia, is believed to have inhabited the Malay Archipelago during the period of agricultural expansion approximately 4 thousand years ago (kya). However, their genetic structure and population history remain inconclusive. In this study, we report the genome structure of a Jakun female, based on whole-genome sequencing, which yielded an average coverage of 35.97-fold. We identified approximately 3.6 million single-nucleotide variations (SNVs) and 517,784 small insertions/deletions (indels). Of these, 39,916 SNVs were novel (referencing dbSNP151), and 10,167 were nonsynonymous (nsSNVs), spanning 5674 genes. Principal Component Analysis (PCA) revealed that the Jakun genome sequence closely clustered with the genomes of the Cambodians (CAM) and the Metropolitan Malays from Singapore (SG_MAS). The ADMIXTURE analysis further revealed potential admixture from the East Asian (EA) and North Borneo populations, as corroborated by the results from the F3, F4, and TreeMix analyses. Mitochondrial DNA analysis revealed that the Jakun genome carried the N21a haplogroup (estimated to have occurred ~19 kya), which is commonly found among Malays from Malaysia and Indonesia. From the whole-genome sequence data, we identified 825 damaging and deleterious nonsynonymous single-nucleotide variants (nsSNVs) affecting 720 genes. Some of these variants are associated with age-related macular degeneration, atrial fibrillation, and HDL cholesterol level. Additionally, we located a total of 3310 variants on 32 core absorption, distribution, metabolism, and elimination (ADME) genes. Of these, 193 variants are listed in PharmGKB, and 21 are nsSNVs. In summary, the genetic structure identified in the Jakun individual could enhance the mapping of genetic variants for disease-based population studies and further our understanding of the human migration history in Southeast Asia.
Evaluation of polygenic scores for hypertrophic cardiomyopathy in the general population and across clinical settings
Hypertrophic cardiomyopathy (HCM) is an important cause of morbidity and mortality, with pathogenic variants found in about a third of cases. Large-scale genome-wide association studies (GWAS) demonstrate that common genetic variation contributes to HCM risk. Here we derive polygenic scores (PGS) from HCM GWAS and genetically correlated traits and test their performance in the UK Biobank, 100,000 Genomes Project, and clinical cohorts. We show that higher PGS significantly increases the risk of HCM in the general population, particularly among pathogenic variant carriers, where HCM penetrance differs 10-fold between those in the highest and lowest PGS quintiles. Among relatives of HCM probands, PGS stratifies risks of developing HCM and adverse outcomes. Finally, among HCM cases, PGS strongly predicts the risk of adverse outcomes and death. These findings support the broad utility of PGS across clinical settings, enabling tailored screening and surveillance and stratification of risk of adverse outcomes.