Related Articles

The genomic landscape of gene-level structural variations in Japanese and global soybean Glycine max cultivars

Japanese soybeans are traditionally bred to produce soy foods such as tofu, miso and boiled soybeans. Here, to investigate their distinctive genomic features, including genomic structural variations (SVs), we constructed 11 nanopore-based genome references for Japanese and other soybean lines. Our assembly-based comparative method, designated ‘Asm2sv’, identified gene-level SVs comprehensively, enabling pangenome analysis of 462 worldwide cultivars and varieties. Based on these, we identified selective sweeps between Japanese and US soybeans, one of which was the pod-shattering resistance gene PDH1. Genome-wide association studies further identified several quantitative trait loci that accounted for large-seed phenotypes of Japanese soybean lines, some of which were also close to regions of the selective sweeps, including PDH1. Notably, specific combinations of alleles, including SVs, were found to increase the seed size of some Japanese landraces. In addition to the differences in cultivation environments, distinct food processing usages might result in changes in Japanese soybean genomes.

The Marchantia polymorpha pangenome reveals ancient mechanisms of plant adaptation to the environment

Plant adaptation to terrestrial life started 450 million years ago and has played a major role in the evolution of life on Earth. The genetic mechanisms allowing this adaptation to a diversity of terrestrial constraints have been mostly studied by focusing on flowering plants. Here, we gathered a collection of 133 accessions of the model bryophyte Marchantia polymorpha and studied its intraspecific diversity using selection signature analyses, a genome–environment association study and a pangenome. We identified adaptive features, such as peroxidases or nucleotide-binding and leucine-rich repeats (NLRs), also observed in flowering plants, likely inherited from the first land plants. The M. polymorpha pangenome also harbors lineage-specific accessory genes absent from seed plants. We conclude that different land plant lineages still share many elements from the genetic toolkit evolved by their most recent common ancestor to adapt to the terrestrial habitat, refined by lineage-specific polymorphisms and gene family evolution.

High-coverage whole-genome sequencing of a Jakun individual from the “Orang Asli” Proto-Malay subtribe from Peninsular Malaysia

Jakun, a Proto-Malay subtribe from Peninsular Malaysia, is believed to have inhabited the Malay Archipelago during the period of agricultural expansion approximately 4 thousand years ago (kya). However, their genetic structure and population history remain inconclusive. In this study, we report the genome structure of a Jakun female, based on whole-genome sequencing, which yielded an average coverage of 35.97-fold. We identified approximately 3.6 million single-nucleotide variations (SNVs) and 517,784 small insertions/deletions (indels). Of these, 39,916 SNVs were novel (referencing dbSNP151), and 10,167 were nonsynonymous (nsSNVs), spanning 5674 genes. Principal component analysis (PCA) revealed that the Jakun genome sequence clustered closely with the genomes of the Cambodians (CAM) and the Metropolitan Malays from Singapore (SG_MAS). The ADMIXTURE analysis further revealed potential admixture from the East Asian (EA) and North Borneo populations, as corroborated by the results from the F3, F4, and TreeMix analyses. Mitochondrial DNA analysis revealed that the Jakun genome carried the N21a haplogroup (estimated to have arisen ~19 kya), which is commonly found among Malays from Malaysia and Indonesia. From the whole-genome sequence data, we identified 825 damaging and deleterious nsSNVs affecting 720 genes. Some of these variants are associated with age-related macular degeneration, atrial fibrillation, and HDL cholesterol level. Additionally, we located a total of 3310 variants on 32 core absorption, distribution, metabolism, and elimination (ADME) genes. Of these, 193 variants are listed in PharmGKB, and 21 are nsSNVs. In summary, the genetic structure identified in the Jakun individual could enhance the mapping of genetic variants for disease-based population studies and further our understanding of the human migration history in Southeast Asia.

EV DNA from pancreatic cancer patient-derived cells harbors molecular, coding, non-coding signatures and mutational hotspots

DNA packaged into cancer cell-derived extracellular vesicles (EVs) is not well appreciated. Here, we uncovered signatures of EV DNA secreted by pancreatic cancer cells. The cancer cells and their non-cancer counterparts exhibit distinct distributions of low vs. high molecular weight (LMW vs. HMW) EV DNA fragments, respectively. Genome sequencing and single-nucleotide variant (SNV) analysis revealed that 95% of reads and 94% of SNVs map to noncoding regions of the genome. Given that ~1% of the human genome represents coding regions, the 5% mapping rate to coding regions suggests a non-random enrichment of certain coding regions and mutations. The LMW DNA fragments not only set cancer cells apart, but also harbor cancer-specific enrichment of unique coding regions, the top nine being FAM135B, COL22A1, TSNARE1, KCNK9, ZFAT, JRK, MROH5, GSDMD, and MIR3667HG. Additionally, the cancer cells’ LMW DNA fragments exhibit dense centromeric mapping, most strikingly on chromosomes 3, 7, 9, 10, 11, 13, 17, and 20. Mutational profiling turned up close to 200 mutations specific to the cancer cells. Altogether, our analyses suggest that centromeric regions might hold clues to the EV DNA content of pancreatic cancer and its molecular and mutational signatures, and they rationalize the need for a new approach to DNA biomarker research.
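The enrichment argument in this abstract reduces to a simple ratio of observed to expected coding fractions. A minimal sketch, assuming only the abstract's rounded figures (95% of reads noncoding, ~1% of the genome coding); the function name `fold_enrichment` is hypothetical and does not come from the paper:

```python
def fold_enrichment(observed_fraction: float, expected_fraction: float) -> float:
    """Ratio of the observed coding-read fraction to the fraction expected by chance."""
    if expected_fraction <= 0:
        raise ValueError("expected_fraction must be positive")
    return observed_fraction / expected_fraction


# Rounded figures from the abstract (assumed exact for illustration):
# 95% of reads map to noncoding regions, so 5% map to coding regions;
# roughly 1% of the human genome is coding.
observed = 1.0 - 0.95
expected = 0.01

enrichment = fold_enrichment(observed, expected)
print(f"coding reads are about {enrichment:.1f}-fold over expectation")
```

A ratio well above 1 is what the abstract reads as non-random enrichment. A formal significance test would need the underlying read counts, which the abstract does not report.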

SVLearn: a dual-reference machine learning approach enables accurate cross-species genotyping of structural variants

Structural variations (SVs) are diverse forms of genetic alteration that drive a wide range of human diseases. Accurately genotyping SVs from short-read sequencing data, particularly those in repetitive genomic regions, remains challenging. Here, we introduce SVLearn, a machine-learning approach for genotyping bi-allelic SVs. It exploits a dual-reference strategy to engineer a curated set of genomic, alignment, and genotyping features based on a reference genome in concert with an allele-based alternative genome. Using 38,613 human-derived SVs, we show that SVLearn significantly outperforms four state-of-the-art tools, with precision improvements of up to 15.61% for insertions and 13.75% for deletions in repetitive regions. On two additional sets of 121,435 cattle SVs and 113,042 sheep SVs, SVLearn generalizes well to genotyping SVs across species, with a weighted genotype concordance score of up to 90%. Notably, SVLearn enables accurate genotyping of SVs at low sequencing coverage, with accuracy comparable to that at 30× coverage. Our studies suggest that SVLearn can accelerate the understanding of associations between genome-scale, high-quality genotyped SVs and diseases across multiple species.
