Chromosome-scale genomes of wild and cultivated Morinda officinalis


Background & Summary

Morinda officinalis [Gynochthodes officinalis, taxonomy ID: 266091], a perennial vine in the Rubiaceae family, is a renowned medicinal and edible plant native to the Lingnan region of southern China and northern Vietnam1. The species is important in traditional Chinese medicine2, where its dried roots, known as “bajitian,” are widely used to treat ailments including impotence, infertility, rheumatism, and arthralgia3,4,5. M. officinalis exists in both wild and cultivated forms, each with distinct characteristics. Compared with cultivated plants, wild plants have more trichome-dense leaves, a thinner vane, a thinner stem, and a poorer root system1 (Fig. 1). Conversely, cultivated plants have fewer trichomes on the leaves, a thicker vane, a thicker stem, and a more abundant root system. Phytochemical studies have shown that M. officinalis contains a diverse array of bioactive constituents, including anthraquinones, iridoids, flavonoids, polysaccharides, and volatile oils6. The roots of wild M. officinalis populations have historically served as the primary source of the herbal medicine2,4,6. However, a sharp increase in market demand has severely depleted the wild resources of M. officinalis, putting them at risk of extinction3. As a result, artificially cultivated M. officinalis has become the primary source of medicinal material in China.

Fig. 1

17-mer frequency distribution in the two Morinda officinalis genomes. (A) Cultivated M. officinalis. (B) Wild M. officinalis. The dashed line indicates the expected k-mer peak.


Compared to their wild counterparts, the cultivated varieties of M. officinalis, such as the “Gaoji 3” cultivar, exhibit several advantageous traits, including higher yield, improved quality, and enhanced disease resistance3. These cultivated varieties have been selectively bred and optimized for commercial production, often through the integration of modern agricultural practices and biotechnological approaches. Despite the importance of both wild and cultivated M. officinalis, the genomic resources available for this species have been limited. While the genome of the cultivated variety has been previously reported3, the lack of a comprehensive genomic characterization of the wild form has hindered our understanding of the genetic mechanisms underlying the production of medicinal compounds, adaptive traits, and the evolutionary trajectories of this species. The availability of robust genome data represents an invaluable asset for investigating the genetic underpinnings of crucial traits in medicinal plants7,8,9,10,11,12. This rich resource opens up avenues for comprehensive exploration and understanding of the genetic factors governing various characteristics essential for medicinal efficacy3,7.

This study aims to address this knowledge gap by presenting the chromosome-scale genome assemblies of both wild and cultivated M. officinalis. By leveraging the power of comparative genomics, we intend to unveil the unique genomic features, identify the key genes and pathways involved in the biosynthesis of bioactive compounds, and elucidate the evolutionary adaptations that have enabled this species to thrive in its native habitats. These high-quality genomic data will not only contribute to the fundamental understanding of M. officinalis but also provide valuable resources for the genetic improvement and sustainable utilization of this important medicinal plant.

Methods

Sample preparation and sequencing

Fresh leaves of cultivated Morinda officinalis were collected from Zhaoqing City, Guangdong, China (23°23′24″N, 112°20′19″E). The wild type was originally collected from Yunfu City, Guangdong, China (22°41′3″N, 112°8′20″E) and is maintained at the experimental farm of South China Agricultural University, Guangzhou, China (23°9′9″N, 113°22′44″E). Genomic DNA was extracted using the CTAB (cetyltrimethylammonium bromide) method13 and purified with the QIAGEN Genomic kit (Cat#13343, QIAGEN). DNA purity was assessed with a NanoDrop spectrophotometer (Thermo Fisher Scientific, USA); the samples showed OD260/280 ratios of 1.8–2.0 and OD260/230 ratios of 2.0–2.2, consistent with high-quality genomic DNA. DNA was quantified using a Qubit 4.0 fluorometer (Invitrogen, USA) to ensure accurate input for library preparation. For library preparation, DNA fragments were size-selected using the PippinHT system (Sage Science, USA), followed by end repair using the NEBNext Ultra II End Repair/dA-tailing Kit (Cat# E7546). Adapter ligation was performed using the SQK-LSK109 kit from Oxford Nanopore Technologies. Sequencing on the GridION X5 platform generated 12.9 Gb of raw long reads from wild M. officinalis and 43.91 Gb from cultivated M. officinalis. In addition, the extracted DNA was digested with MboI following the standard Hi-C library preparation protocol and sequenced on the BGI-DIPSEQ platform, generating 79 Gb and 72 Gb of data for cultivated and wild M. officinalis, respectively (Table S1).
For the RNA-seq experiment, total RNA was extracted from the roots using a TIANGEN kit. After quality control, libraries were constructed and sequenced on the BGI-DIPSEQ platform, generating 30.67 Gb and 66.33 Gb of raw data for wild and cultivated M. officinalis, respectively (Table S2).

Estimation of genome size

The short DNA reads were preprocessed to remove adapter sequences, duplicate reads, and low-quality reads using Trimmomatic (v3.0)14 with the following parameters: adapter:2:30:10:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:50. The resulting clean data were used for genome size estimation with kmerfreq_16bit (version 2.4) and GCE15 (Fig. 1 and Table S3). The estimated genome sizes were 383 Mb for wild M. officinalis and 312 Mb for cultivated M. officinalis (Table 1).
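For readers unfamiliar with k-mer based estimation, the basic relation is that genome size is approximately the total number of genomic k-mers divided by the depth of the main k-mer peak. The Python sketch below illustrates only this relation on a toy histogram. It is not the GCE model, which additionally accounts for sequencing errors, heterozygosity, and repeats; the function name and the valley-detection heuristic are assumptions made for illustration.

```python
def estimate_genome_size(hist):
    """Estimate genome size from a k-mer depth histogram.

    hist maps depth -> number of distinct k-mers observed at that depth.
    Illustrative heuristic only (not the GCE model):
      1. the first local minimum separates error k-mers from genomic ones,
      2. the main peak is the most frequent depth at or beyond that minimum,
      3. genome size ~= (total k-mer occurrences beyond the minimum) / peak depth.
    """
    depths = sorted(hist)
    valley = depths[0]
    for prev, cur in zip(depths, depths[1:]):
        if hist[cur] > hist[prev]:  # counts start rising again
            valley = prev
            break
    solid = [d for d in depths if d >= valley]
    peak = max(solid, key=lambda d: hist[d])
    total_kmers = sum(d * hist[d] for d in solid)
    return total_kmers / peak, peak


# Toy histogram (hypothetical values, not data from this study).
toy = {1: 1000, 2: 200, 3: 50, 4: 80, 5: 200, 6: 400, 7: 200, 8: 80, 9: 20}
size, peak = estimate_genome_size(toy)
print(f"peak depth = {peak}, estimated size = {size:.0f} bp")
```

In practice the histogram would come from the k-mer counting step, and the dedicated software should be preferred for heterozygous or repeat-rich genomes.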

Table 1 Genome assembly and assessment statistics.

De novo genome assembly and evaluation

The Nanopore long reads from wild and cultivated M. officinalis were assembled using NECAT16. The assemblies were then polished with short reads using NextPolish17. Finally, haplotypic duplications and contig overlaps were removed using purge_dups (v.1.2.3)18 with default parameters. This process yielded genome assemblies of 423 Mb and 425 Mb, with scaffold N50 lengths of 5.91 Mb and 10.99 Mb for wild and cultivated M. officinalis, respectively (Table 1 and Table S3).
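Because N50 is used throughout as the contiguity metric, a minimal sketch of its definition may be helpful: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name and the toy lengths below are illustrative assumptions, not values from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must contain at least one sequence")


# Toy example with hypothetical contig lengths (in bp).
print(n50([2, 3, 4, 5, 6]))
```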

We then used the Hi-C data to anchor the contig assemblies onto chromosomes. Juicer19 was used to extract uniquely mapped, non-PCR-duplicated Hi-C contact reads, and 3D-DNA20 was used to scaffold the assembled genome into a pseudochromosome-level assembly. The Hi-C assembly output was visualized using Juicebox and manually refined based on the Hi-C contact map. This process resulted in pseudochromosome-level assemblies totaling 412 Mb and 421 Mb in wild and cultivated M. officinalis, respectively, each anchored to 11 chromosomes, consistent with the reported chromosome number3, with scaffold N50 lengths of 38.12 Mb and 39.18 Mb, respectively. Over 96.5% and 97.4% of scaffolds were integrated into the pseudochromosomes of wild and cultivated M. officinalis, respectively (Figs. 2, 3, Table 1, Table S4).

Fig. 2

Circos plots of the two Morinda officinalis genomes. (A) Image showing the morphological attributes of cultivated and wild M. officinalis. (B) Circos plots of the cultivated and wild M. officinalis genomes. Concentric circles from outermost to innermost show (A) chromosomes and megabase values, (B) gene density, (C) GC content, (D) repeat density, (E) LTR density, (F) LTR Copia density, (G) LTR Gypsy density and (H) inter-chromosomal synteny (features B–G are calculated in non-overlapping 500 kb windows).

Fig. 3

Hi-C maps of cultivated and wild M. officinalis. The maps show genome-wide all-by-all interactions and resolve the individual chromosomes, which were scaffolded and assembled independently. Heat map colors ranging from light pink to dark red indicate the frequency of Hi-C interaction links from low to high (0–10).


Repeat annotation

We employed a combination of de novo and homology-based approaches to detect repeat elements within both genomes. For de novo prediction, LTR_FINDER21 and RepeatModeler22 were used to identify repeat elements, and the results were merged into a non-redundant library that was used for repeat identification with RepeatMasker23. For the homology-based approach, TRF was used to detect tandem repeats, and RepeatMasker was used to search for repeat elements against RepBase (v.21.12). Overall, 51% and 67.16% of the genome sequences were identified as repetitive in wild and cultivated M. officinalis, respectively (Table 2 and Table S5). Long terminal repeats (LTRs) made up the largest proportion, accounting for 34.29% and 32.77% in wild and cultivated M. officinalis, respectively.

Table 2 Genome annotation statistics.

Protein-coding gene prediction and Non-coding RNA annotation

Protein-coding genes were predicted using the BRAKER2 pipeline24, which identified 31,308 and 29,528 protein-coding genes in wild and cultivated M. officinalis, respectively (Table 2 and Table S6). BUSCO assessment of the predicted gene sets recovered more than 97.1% and 96.4% complete BUSCOs in the respective genomes (Table S8). All protein-coding genes were searched with BLAST against the NR, SwissProt, KOG, and KEGG databases using an E-value cutoff of 1e-05 (Figs. S1–S4).

Ribosomal RNA (rRNA) genes were identified by BLAST searches against the plant rRNA database. MicroRNAs (miRNAs) and small nuclear RNAs (snRNAs) were identified by searching against the Rfam 12.0 database, and tRNAs were detected using tRNAscan-SE25. In total, 2021 and 2291 non-coding RNAs (ncRNAs) were identified in wild and cultivated M. officinalis, respectively. Notably, the cultivated variety contained twice as many rRNAs as the wild type (Table S8).

Confirming the phylogenetic position of Morinda officinalis

To determine the phylogenetic positions of wild and cultivated Morinda officinalis, we analyzed 12 representative plant genomes: the two M. officinalis assemblies and the published genomes of Citrus grandis, Populus trichocarpa, Platycodon grandiflorus, Daucus carota, Solanum tuberosum, Ophiorrhiza pumila, Gardenia jasminoides, Coffea canephora, Oryza sativa, and Arabidopsis thaliana (Table S9). The sequences of each of the 317 single-copy orthologs were extracted and aligned using MAFFT (v 7.310)26, and gaps were filtered. Following alignment, the protein-coding sequences of each species were concatenated to form a supergene sequence. A phylogenetic tree was then constructed using IQ-Tree (v 1.6.1)27 with the parameters ‘-bb 1000 -alrt 1000’ (Fig. 4).
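The concatenation step can be sketched as follows. This is a minimal illustration in Python, assuming each ortholog alignment is already available as a mapping from species name to aligned sequence; the function name and the gap padding for absent species are assumptions for illustration (with strictly single-copy orthologs, every species is present in every alignment). It is not the script used in this study.

```python
def concatenate_alignments(alignments, gap="-"):
    """Concatenate per-ortholog alignments into one supergene per species.

    alignments: list of dicts mapping species name -> aligned sequence.
    All sequences within one alignment must have equal length.
    Species missing from an alignment are padded with gap characters.
    """
    species = sorted({name for aln in alignments for name in aln})
    parts = {name: [] for name in species}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences in an alignment must have equal length")
        width = widths.pop()
        for name in species:
            parts[name].append(aln.get(name, gap * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Toy alignments with hypothetical species labels and sequences.
toy = [
    {"sp_A": "MK-L", "sp_B": "MKAL"},
    {"sp_A": "GG", "sp_B": "G-"},
]
for name, seq in concatenate_alignments(toy).items():
    print(name, seq)
```

The resulting supermatrix would then be written to a FASTA or PHYLIP file and passed to the tree-building program.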

Fig. 4

Phylogenetic position of M. officinalis. The phylogenetic tree was constructed with IQ-Tree using ‘-b 100’ and 317 single-copy orthologues of two Morinda species and eight other representative plant species. The numbers below the middle of each branch represent the bootstrap values.


Data Records

All sequencing data are deposited in the Genome Sequence Archive (GSA) of the National Genomics Data Center (NGDC) under BioProject accession number PRJCA032387 (ref. 28). The chromosome-scale genome assemblies are deposited at NCBI under accession numbers GCA_046128155.1 (ref. 29) and GCA_048301565.1 (ref. 30) for cultivated and wild M. officinalis, respectively. All raw and assembled sequencing data, including the chromosome-scale genome assemblies, are also deposited in the CNGB Sequence Archive (CNSA) of the China National GeneBank DataBase (CNGBdb) under accession number CNP0004857 (ref. 31). The annotation files are available via Figshare32.

Technical Validation

The completeness of the genomes was evaluated using BUSCO (v3.0.2)33 with the Embryophyta odb10 dataset. The analysis showed that 97.3% and 97.5% of embryophyte BUSCOs were complete in the genome assemblies of wild and cultivated M. officinalis, respectively (Table 1 and Table S10). Furthermore, DNA short reads were mapped to the genomes using BWA (v.2.21), yielding a high mapping rate (>99%) (Table S11).

In addition, the contiguity of the genome assemblies was assessed using the LTR Assembly Index (LAI), which evaluates how completely LTR sequences are assembled. LTRharvest34 was first used to detect LTR sequences, and the results were combined with the output from LTR_FINDER. LTR_retriever (v.2.8)35 was then used with default settings to identify high-confidence LTR retrotransposons. The LAI scores calculated from these results were 13.44 and 12.33 for wild and cultivated M. officinalis, respectively (Table 1). The high quality, contiguity, and completeness of the assembled genomes were thus corroborated by multiple lines of evidence36.
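As background on how LAI values are read, the sketch below shows the raw index as we understand it from the LAI literature: the percentage of total LTR sequence length contributed by intact LTR retrotransposons. It also applies the commonly cited quality bands (below 10 draft, 10 to below 20 reference, 20 and above gold). The final LAI reported by LTR_retriever includes an additional adjustment for LTR identity that is omitted here; the function names and toy input values are assumptions for illustration.

```python
def raw_lai(intact_ltr_bp, total_ltr_bp):
    """Raw LAI: intact LTR-RT length as a percentage of total LTR length.

    The identity-based correction applied by LTR_retriever is not included.
    """
    if total_ltr_bp <= 0:
        raise ValueError("total LTR length must be positive")
    return 100.0 * intact_ltr_bp / total_ltr_bp


def lai_quality(lai):
    """Map an LAI score to the commonly cited assembly quality band."""
    if lai < 10:
        return "draft"
    if lai < 20:
        return "reference"
    return "gold"


# Hypothetical lengths for the raw index; the scores are those reported above.
print(raw_lai(12_000_000, 100_000_000))
for score in (13.44, 12.33):
    print(score, lai_quality(score))
```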
