A Chromosome-level genome assembly of the American bullfrog (Aquarana catesbeiana)

A Chromosome-level genome assembly of the American bullfrog (Aquarana catesbeiana)

Background & summary

Aquarana catesbeiana, commonly known as bullfrog, belongs to the class Amphibia, order Anura and family Ranidae. Native to North America, this species has undergone significant evolutionary adaptations, allowing it to thrive across diverse landforms and climatic conditions1. It has become a highly successful invasive species, expanding to over, 40 countries across four continents and contributing to the decline of indigenous species across multiple taxonomic categories2. Furthermore, bullfrog meat is considered a delicacy in many regions, driving increased global consumption3,4,5,6,7.

As a representative species of the order Anuran, the American bullfrog plays important roles in various fields of study, from developmental biology and physiology to ecology and evolution8,9. Previous studies utilised anuran species (frogs) to exhibit a wide range of sex chromosome differentiation stages as they exhibit diverse sex-determination systems and varying stages of sex chromosome differentiation, both between and within species10,11,12,13. Therefore, they serve as suitable candidates for investigating sex chromosome evolution and diversity13. Moreover, frogs display significant variation in genome size14,15, with sequenced species ranging from Platyplectrum ornatum (1.06 Gb)16 to Bombina bombina (9.8GB)17, making them ideal models for studying genome size evolution15.

To facilitate ecological, evolutionary and functional genomic studies, high-quality genome assemblies are essential. In 2017, a draft genome of A. catesbeiana was published, featuring a scaffold N50 of 51.6 Kb and 45.3% complete BUSCOs8. While these data provided valuable insights, the poor quality of the assembly limited its broader applicability. In the present study, we generate a chromosome-level genome assembly for the American bullfrog using integrated datasets from Pacific Biosciences (PacBio) HiFi reads, MGI short reads and Hi-C reads. This high-quality assembly will significantly advance research on the unique characteristics of frogs.

Methods

Sample collection

Tissue samples were collected from a female American bullfrog (Fig. 1) bred at the Qingyuan Yangshan breeding base, Zhongyang Group, Guangdong Province. Approximately 10 g of muscle tissue was extracted from this individual for whole-genome sequencing, including short-read, long-read and Hi-C sequencing. Additionally, multiple tissues were collected for transcriptome sequencing. Samples included intestines, lungs, vas deferens, liver, muscles, stomach and eyes, with approximately 1 g of each tissue collected. All tissue samples were fragmented, rapidly cooled with liquid nitrogen and stored at −80 °C for future use. The sample collection protocol was approved by the Animal Ethics Committee of Zhongkai University of Agriculture and Engineering (Guangzhou, China).

Fig. 1
A Chromosome-level genome assembly of the American bullfrog (Aquarana catesbeiana)

The full-body appearance of the American bullfrog. (AC) Female. (DF) Male. (A,D) Lateral anterior view. (B,E) Dorsal view. (C,F) Ventral supine view.

Full size image

DNA extraction and genome sequencing

Genomic DNA (gDNA) was extracted from muscle tissue using the Qiagen Blood & Cell Culture DNA Kit (Qiagen, USA). The gDNA was randomly fragmented, and 1.5 µg was used to construct a 350-bp insert-size library with the MGIEasy Universal DNA Library Prep Set (MGI, China). Sequencing was performed on the MGISEQ2000 platform (MGI, China), generating 322.27 Gb of paired-end raw reads (150 bp). After filtering the raw reads using SOAPnuke (v2.1.0)18 to remove adaptors and low-quality sequences, 324.64 Gb of clean reads were obtained and used for genome size estimation and assembly.

For long-read sequencing, 2 µg of gDNA was used to create libraries following PacBio’s HiFi sequencing protocol (Pacific Biosciences, USA) with the SMRTbell Express Template Prep Kit 2.0. Sequencing was carried out on the PacBio Sequel II System, generating approximately 10.32 million consensus reads (156.21 Gb) with an average read length of 15.128 kb, processed using CCS software (SMRT Link v9.0)19 (-min-passes 1–min-rq 0.99–min-length 100).

Hi-C sequencing was performed on muscle tissue (~1 g) from the same female individual. Libraries were constructed using the GrandOmics Hi-C Kit (GrandOmics, China) with DpnII as the restriction enzyme. Sequencing was carried out on the Illumina NovaSeq platform (paired-end reads, 150 bp), yielding 711.68 Gb of raw reads. After quality filtration using fastp20, eliminating low-quality reads (quality scores <20) and those shorter than 30 bp, 697.99 Gb of high-quality reads (98.76% of the initial reads) were obtained for constructing pseudo-chromosomes.

RNA extraction and transcriptome sequencing

RNA was extracted from tissue samples using the standard Trizol protocol, followed by purification with the Qiagen RNeasy Mini Kit (Qiagen, USA). cDNA libraries were constructed following Illumina’s guidelines, and sequencing on the HiSeq X Ten platform (Illumina, USA). Finally, approximately 52 Gb of transcriptome data were generated. These data were used to support genome assembly and annotation.

Genome size estimation

To genome size of the American bullfrog was estimated using a K-mer-based21 analysis method. The frequency of 17-mers was calculated using the GCE software22 (v1.0.2). The genome size (G) was determined by the formula G = K_num/K_depth, where K_depth represents K-mer depth and K_num is the total number of 17-mers. A BLASTN (v2.7.1)23 alignment against the NT database confirmed the absence of significant exogenous contamination in the library data. The estimated genome size was approximately 5.81 Gb, with an estimated genome heterozygosity of 0.51% (Fig. 2a).

Fig. 2
figure 2

Genome assessment of the American bullfrog. (a) A GenomeScope K-mer plot. (b) GC-depth plot. The x- and y-axis represent GC content and sequencing depth, respectively, and the corresponding histograms are displayed along the top and right axes.

Full size image

De novo genome assembly

After obtaining subreads, we used the assembly software hifiasm (v0.16.1)24 to perform reference-free de novo assembly of long reads from the pacbio platform’s SMRT sequencing, using default parameters. Error correction was performed using racon (v1.4.3)25, gcpp (v2.0.2) (https://github.com/PacificBiosciences/gcpp) and pilon (v1.22)26, which utilised MGI short reads. The initial genome assembly size was 6.37 Gb, aligning with the previously estimated genome size. Various evaluation methods were applied to assess the quality of the assembly.

Pseudo-chromosome construction

Hi-C technology was used to construct pseudo-chromosomes based on high-quality genome assembly. Trimmomatic software (LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:15)27 was employed to remove sequencing low-quality fragments and adapters. We aligned the retained data to the draft assembly using Juicer (baw: -SP5M)28, filtering out low-quality and redundant reads. Interaction maps were created using 3D-DNA software (Fig. 3a), and error correction was performed using JuiceBox software (v1.11.08)29. The final genome assembly reached 6.32 Gb, consisting of 13 pseudo-chromosomes (See details in Table 1), which covered 99.21% of the original genome. The overall assembly metrics included a scaffold N50 of 691.8 Mb.

Fig. 3
figure 3

(a) A total of 13 distinct blocks were visualised in the Hi-C contact matrixes. Chromosomes 1–6 are designated as macrochromosomes, while chromosomes 7–13 are classified as microchromosomes. (b) Pseudo-chromosome and Karyotype Correlation Analysis. The x-axis represents chromosome length measured based on existing karyotype data, assuming that each μm of chromosome contains 37.0 Mb. The y-axis represents the pseudo-chromosome cumulative length. (c) A Circos plot summarising the genome features. From outside to inside: (I) the lengths of 13 pseudo-chromosomes, (II) gene density, (III) repeat coverage, (IV) non-coding RNA content, (V) GC content and (VI) internal syntenic blocks.

Full size image
Table 1 Statistics of the genome assembly for the American Bullfrog.
Full size table

Based on the existing karyotype data on American bullfrog30,31, we conducted a correlation analysis between the relative lengths obtained from the karyotype data and the cumulative length of pseudo-chromosomes. The analysis yielded an R-squared value above 0.99, confirming consistency with prior research. Using the karyotype image, we calculated physical chromosome lengths and determined a compression ratio of 37.0 Mb per μm (Table 1). The correlation analysis between the pseudo-chromosome lengths and karyotype measurements produced an R-squared value greater than 0.98 (Fig. 3b), demonstrating the accuracy of the pseudo-chromosome construction.

Gene annotation and functional assignment

Repetitive sequences were predicted using a Homolog-based approach, employing RepeatMasker and RepeatProteinMask software (v4.1.2)32 with the RepBase database. Additionally, an ab initio approach was employed, utilising RepeatModeler (v2.0.2a)32 and LTR-FINDER (v1.0.5)33 software to construct an ab initio repetitive sequence library, followed by prediction using RepeatMasker. Simultaneously, the TRF software (v4.09)34 was employed to identify Tandem Repeats within the genome. Finally, a total of 5.07 Gb of repetitive sequences were detected, accounting for 79.51% of the assembled genome (See details in Table 2).

Table 2 Repetitive Sequence Classification Results Statistics.
Full size table

Gene structure prediction involved three approaches: transcriptome-based, homology-based and de novo prediction. Exonerate (v2.4)35 was used to predict gene structures in homologous species, including Engystomops pustulosus, Bufo bufo, Rana temporaria and Xenopus laevis. De novo assembly of transcriptome reads was performed using Trinity (v2.13.2)36, followed by de novo prediction using PASA37 and Augustus (v3.3)38. A total of 32,382 genes were annotated, with an average gene length of 58,248.47 bp, an average CDS length of 1,264.88 bp and an average of 6.55 exons per gene (See details in Table 3).

Table 3 Gene structure and function annotation.
Full size table

Functional gene annotation was conducted through comparisons across various databases such as SwissProt39, NCBI NR, PFAM40, GO41, KEGG42, InterPro43 and TrEMBL44. Among these predicted genes, 98.96% were annotated in protein databases (See details in Table 3). Syntenic blocks were constructed using MCscan45, and a Circos46 plot (Fig. 3c) illustrated key features, including: (I) the lengths of 13 pseudo-chromosomes, (II) gene density, (III) repeat coverage, (IV) non-coding RNA content, (V) GC content and (VI) internal syntenic blocks.

Data Records

The genome assembly of the American bullfrog have been deposited in the GenBank database under the accession number GCA_042186555.147. The original genomic and transcriptomic data has been deposited in the China National GeneBank DataBase (CNGBdb) under project ID CNP0004806. Genome annotation files can be accessed publicly via the FigShare48.

Technical Validation

Quality assessment of the genome assembly

To evaluate the quality of the assembled genome, gDNA quality was assessed using both a Nanodrop spectrophotometer (Thermo Fisher Scientific, USA) and agarose gel electrophoresis. The OD260/280 values ranged between 1.8 and 2.0, while OD260/230 values were between 2.0 and 2.2, confirming sample quality. RNA extracted from the samples was analysed using a 2100 Bioanalyzer (Agilent Technologies, USA), yielding a 28S/18S ratio exceeding 1.0 and an RNA Integrity Number (RIN) greater than 7.0, further meeting quality requirements.

The minimap2 software (v2.12)49 was employed to map long reads to assembly data, using a 1000 bp sliding window, which confirmed that the genome assembly was free from contamination (Fig. 2b). Additionally, alignment of both long and short reads to the assembled genome indicated coverage of 99.13% and 99.96%, respectively, with average sequencing depth of 50.39x and 35.42x. The integrity of the genome assembly was further assessed using Benchmarking Universal Single Copy Orthologs (BUSCO, v5.2.2)50 with the vertebrata_odb10 database (Table 4). The results showed that the assembly and annotation contained 92.2% and 89.5% complete BUSCO genes, respectively, with single copy BUSCOs accounting for 90.1% and 86.9%, and duplicated BUSCOs representing 2.1% and 2.7%. Merqury (v1.3)51 analysis revealed a QV score and error rate of 38.924 and 1.28e-04, respectively. Gfastats (v1.3.1)52 was used to generate assembly summary statistics (Table 4).

Table 4 Statistics of the assembled genome and gene annotation for the American bullfrog.
Full size table

Collinearity analysis

The JCVI package53 was used to perform whole-genome synteny analysis by aligning the chromosome-level genome data between the American bullfrog (this study) and its relative common frog (R. temporaria). Our results demonstrated strong one-to-one chromosomal correspondences between the two species (Fig. 4), highlighting the high-quality of the American Bullfrog genome assembly.

Fig. 4
figure 4

Chromosomal Synteny Analysis of Aquarana catesbeiana and its relative Rata temporaria.

Full size image

Related Articles

Energy metabolism in health and diseases

Energy metabolism is indispensable for sustaining physiological functions in living organisms and assumes a pivotal role across physiological and pathological conditions. This review provides an extensive overview of advancements in energy metabolism research, elucidating critical pathways such as glycolysis, oxidative phosphorylation, fatty acid metabolism, and amino acid metabolism, along with their intricate regulatory mechanisms. The homeostatic balance of these processes is crucial; however, in pathological states such as neurodegenerative diseases, autoimmune disorders, and cancer, extensive metabolic reprogramming occurs, resulting in impaired glucose metabolism and mitochondrial dysfunction, which accelerate disease progression. Recent investigations into key regulatory pathways, including mechanistic target of rapamycin, sirtuins, and adenosine monophosphate-activated protein kinase, have considerably deepened our understanding of metabolic dysregulation and opened new avenues for therapeutic innovation. Emerging technologies, such as fluorescent probes, nano-biomaterials, and metabolomic analyses, promise substantial improvements in diagnostic precision. This review critically examines recent advancements and ongoing challenges in metabolism research, emphasizing its potential for precision diagnostics and personalized therapeutic interventions. Future studies should prioritize unraveling the regulatory mechanisms of energy metabolism and the dynamics of intercellular energy interactions. Integrating cutting-edge gene-editing technologies and multi-omics approaches, the development of multi-target pharmaceuticals in synergy with existing therapies such as immunotherapy and dietary interventions could enhance therapeutic efficacy. Personalized metabolic analysis is indispensable for crafting tailored treatment protocols, ultimately providing more accurate medical solutions for patients. This review aims to deepen the understanding and improve the application of energy metabolism to drive innovative diagnostic and therapeutic strategies.

Tissue macrophages: origin, heterogenity, biological functions, diseases and therapeutic targets

Macrophages are immune cells belonging to the mononuclear phagocyte system. They play crucial roles in immune defense, surveillance, and homeostasis. This review systematically discusses the types of hematopoietic progenitors that give rise to macrophages, including primitive hematopoietic progenitors, erythro-myeloid progenitors, and hematopoietic stem cells. These progenitors have distinct genetic backgrounds and developmental processes. Accordingly, macrophages exhibit complex and diverse functions in the body, including phagocytosis and clearance of cellular debris, antigen presentation, and immune response, regulation of inflammation and cytokine production, tissue remodeling and repair, and multi-level regulatory signaling pathways/crosstalk involved in homeostasis and physiology. Besides, tumor-associated macrophages are a key component of the TME, exhibiting both anti-tumor and pro-tumor properties. Furthermore, the functional status of macrophages is closely linked to the development of various diseases, including cancer, autoimmune disorders, cardiovascular disease, neurodegenerative diseases, metabolic conditions, and trauma. Targeting macrophages has emerged as a promising therapeutic strategy in these contexts. Clinical trials of macrophage-based targeted drugs, macrophage-based immunotherapies, and nanoparticle-based therapy were comprehensively summarized. Potential challenges and future directions in targeting macrophages have also been discussed. Overall, our review highlights the significance of this versatile immune cell in human health and disease, which is expected to inform future research and clinical practice.

Advance in peptide-based drug development: delivery platforms, therapeutics and vaccines

The successful approval of peptide-based drugs can be attributed to a collaborative effort across multiple disciplines. The integration of novel drug design and synthesis techniques, display library technology, delivery systems, bioengineering advancements, and artificial intelligence have significantly expedited the development of groundbreaking peptide-based drugs, effectively addressing the obstacles associated with their character, such as the rapid clearance and degradation, necessitating subcutaneous injection leading to increasing patient discomfort, and ultimately advancing translational research efforts. Peptides are presently employed in the management and diagnosis of a diverse array of medical conditions, such as diabetes mellitus, weight loss, oncology, and rare diseases, and are additionally garnering interest in facilitating targeted drug delivery platforms and the advancement of peptide-based vaccines. This paper provides an overview of the present market and clinical trial progress of peptide-based therapeutics, delivery platforms, and vaccines. It examines the key areas of research in peptide-based drug development through a literature analysis and emphasizes the structural modification principles of peptide-based drugs, as well as the recent advancements in screening, design, and delivery technologies. The accelerated advancement in the development of novel peptide-based therapeutics, including peptide-drug complexes, new peptide-based vaccines, and innovative peptide-based diagnostic reagents, has the potential to promote the era of precise customization of disease therapeutic schedule.

Advances in the structures, mechanisms and targeting of molecular chaperones

Molecular chaperones, a class of complex client regulatory systems, play significant roles in the prevention of protein misfolding and abnormal aggregation, the modulation of protein homeostasis, and the protection of cells from damage under constantly changing environmental conditions. As the understanding of the biological mechanisms of molecular chaperones has increased, their link with the occurrence and progression of disease has suggested that these proteins are promising targets for therapeutic intervention, drawing intensive interest. Here, we review recent advances in determining the structures of molecular chaperones and heat shock protein 90 (HSP90) chaperone system complexes. We also describe the features of molecular chaperones and shed light on the complicated regulatory mechanism that operates through interactions with various co-chaperones in molecular chaperone cycles. In addition, how molecular chaperones affect diseases by regulating pathogenic proteins has been thoroughly analyzed. Furthermore, we focus on molecular chaperones to systematically discuss recent clinical advances and various drug design strategies in the preclinical stage. Recent studies have identified a variety of novel regulatory strategies targeting molecular chaperone systems with compounds that act through different mechanisms from those of traditional inhibitors. Therefore, as more novel design strategies are developed, targeting molecular chaperones will significantly contribute to the discovery of new potential drugs.

Iron homeostasis and ferroptosis in muscle diseases and disorders: mechanisms and therapeutic prospects

The muscular system plays a critical role in the human body by governing skeletal movement, cardiovascular function, and the activities of digestive organs. Additionally, muscle tissues serve an endocrine function by secreting myogenic cytokines, thereby regulating metabolism throughout the entire body. Maintaining muscle function requires iron homeostasis. Recent studies suggest that disruptions in iron metabolism and ferroptosis, a form of iron-dependent cell death, are essential contributors to the progression of a wide range of muscle diseases and disorders, including sarcopenia, cardiomyopathy, and amyotrophic lateral sclerosis. Thus, a comprehensive overview of the mechanisms regulating iron metabolism and ferroptosis in these conditions is crucial for identifying potential therapeutic targets and developing new strategies for disease treatment and/or prevention. This review aims to summarize recent advances in understanding the molecular mechanisms underlying ferroptosis in the context of muscle injury, as well as associated muscle diseases and disorders. Moreover, we discuss potential targets within the ferroptosis pathway and possible strategies for managing muscle disorders. Finally, we shed new light on current limitations and future prospects for therapeutic interventions targeting ferroptosis.

Responses

Your email address will not be published. Required fields are marked *