Newly identified species from the dog dental plaque microbiome highlight little overlap with humans
Methods
Subject recruitment and sample collection
This study involved 23 pet dogs (avg. age 4.3 years) undergoing a trial testing a pet toothbrush based on electrolytic cleaning (product of VisHealth d.o.o., Belgrade, Serbia) vs. placebo. The dogs remained in their respective households, with owners instructed not to alter their feeding practices during the study. None of the dogs had received antibiotic treatment in the month preceding sampling. Plaque samples were collected by an experienced veterinarian at a single veterinary clinic by swabbing supragingival plaque (both maxillary and mandibular side) of all teeth at time point zero, after 28 days, and finally after 56 days. Sample collection was performed in accordance with the ethical standards of The Veterinary Directorate, Ministry of Agriculture, Forestry and Water Management (No. 323-07-02291/2023-05) and the Animal Research Committee of Faculty of Veterinary Medicine, University of Belgrade (No. 01-02/2023). Participation in the study was voluntary, and owners were advised to withdraw if their dogs exhibited any signs of discomfort related to the sampling procedure.
DNA extraction and shotgun metagenomic sequencing
DNA extraction and isolation were conducted immediately after sample collection following a modified version of the procedure described by ref. 36. The tops of the cotton swabs were transferred to 2 mL tubes containing TE buffer (0.01 M Tris-Cl pH 8.0, 0.001 M EDTA pH 8.0, 0.1 M NaCl). To maximize DNA yield, extraction from the cotton swabs was performed using a Digital Cell Disruptor Genie (Scientific Industries) at 2,850 rpm for two 15-minute intervals to mechanically lyse microbial cells. The solution was then centrifuged (10 minutes, 13,000 rpm), and the pellet was washed in TE buffer, following the standard DNA isolation protocol36. DNA integrity was assessed using 1% agarose gel electrophoresis and quantified using the Qubit 2.0 fluorometer (Thermo Fisher Scientific). Sequencing libraries were prepared using the Nextera DNA Library Preparation Kit (Illumina) and paired-end sequenced (2 × 150 bp) on the Illumina NovaSeq 6000 platform at the Department of Cellular, Computational and Integrative Biology, University of Trento, following the manufacturer’s protocols.
Preprocessing and metagenomic assembly
Raw reads were preprocessed through a validated pipeline (available at https://github.com/SegataLab/preprocessing) for the removal of poor-quality and contaminant reads. Low-quality reads (average quality <20, length <75 bp, or >2 ambiguous nucleotides) were discarded using Trim Galore (v0.6.6, https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). The remaining reads were then mapped with Bowtie2 (v2.2.9)37 against the human (hg19) and dog (canFam6) reference genomes, as well as the bacteriophage phiX174 genome (Illumina spike-in), and reads aligning to any of these references were discarded. One dog metagenome with fewer than 100,000 quality-controlled reads was discarded.
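The three read-level criteria above can be illustrated with a minimal Python sketch. It is not the code of the preprocessing pipeline or of Trim Galore; the `Read` class and the `passes_qc` function are hypothetical names introduced only to make the thresholds explicit.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """A single sequencing read: bases plus per-base Phred quality scores."""
    sequence: str
    qualities: list[int]


def passes_qc(read: Read, min_avg_quality: float = 20.0,
              min_length: int = 75, max_ambiguous: int = 2) -> bool:
    """Return True if the read survives the three filters stated in the text.

    A read is discarded when it is shorter than min_length, contains more
    than max_ambiguous 'N' bases, or has a mean quality below min_avg_quality.
    """
    if len(read.sequence) < min_length:
        return False
    if read.sequence.upper().count("N") > max_ambiguous:
        return False
    avg_quality = sum(read.qualities) / len(read.qualities)
    return avg_quality >= min_avg_quality
```

A read of 100 bases with uniform quality 30 and no ambiguous bases passes, whereas the same read truncated to 40 bases does not.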
Metagenome-assembled genomes (MAGs) were generated through a previously validated metagenomic assembly pipeline8. First, contigs were assembled with MEGAHIT (v1.1.1)38, and their coverage was calculated with Bowtie2 (v2.2.9)37. Next, contigs were binned with MetaBAT2 (v2.12.1)39, and the resulting bins were quality-checked with CheckM (v1.1.3)40. MAGs were classified as low- (completeness <50% or contamination ≥5%), medium- (completeness ≥50% and contamination <5%), or high-quality (completeness >90% and contamination <5%), as previously proposed41. As CheckM is a tool tailored to prokaryotic genomes, the possibility that eukaryotic MAGs were among the low-quality bins was evaluated by running BUSCO (v5.6.1)42, but no eukaryotic MAGs were identified. Low-quality MAGs were then discarded, and medium- and high-quality MAGs were dereplicated at <0.01% Mash distance43.
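The quality tiers can be expressed as a small decision function. The sketch below is illustrative only: `classify_mag` is a hypothetical helper, and the 90% completeness cutoff for the high-quality tier is an assumption based on commonly used MAG quality standards, exposed as a parameter so that it can be changed.

```python
def classify_mag(completeness: float, contamination: float,
                 hq_completeness: float = 90.0) -> str:
    """Assign a quality tier from CheckM-style completeness/contamination (%).

    hq_completeness is an assumed cutoff for the high-quality tier.
    """
    # Low quality: too incomplete or too contaminated.
    if completeness < 50.0 or contamination >= 5.0:
        return "low"
    # From here on contamination is below 5% and completeness is at least 50%.
    if completeness > hq_completeness:
        return "high"
    return "medium"
```

Under these assumptions, a bin with 95% completeness and 1% contamination is labeled high quality, while the same bin with 5% contamination is labeled low quality and would be discarded.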
Expansion of the MetaPhlAn database with dog plaque microbial species
The collection of MAGs and isolate genomes available in our microbial genomic database MetaRefSGB was expanded by including the quality-checked non-redundant MAGs from dog plaque. Genomes having an average nucleotide identity (ANI) >95% (as computed via Mash43) were clustered into the same species-level genome bin (SGB), an operational genome-based definition of a prokaryotic species. SGBs containing only genomes reconstructed from metagenomes (i.e., MAGs) are considered unknown SGBs (uSGBs), while SGBs that also contain isolate genomes are considered known (kSGBs) and receive the species-level taxonomic label of their isolate genomes (following a majority rule when there are isolates with conflicting taxonomies), as previously proposed8. Taxonomies of uSGBs are, by definition, unknown at the species level, but uSGBs inherit higher-level taxonomic labels from the higher-level genome clusters they belong to, such as their genus-level genome bin (clusters of >85% ANI) or family-level genome bin (clusters of >70% ANI), whose taxonomy is defined as described for SGBs. The phylogenetic tree illustrating the phylogenetic diversity of SGBs reconstructed from dog plaque was built with PhyloPhlAn (v3.1)44 using the 400 universal markers database and default parameters. For each SGB, the genome maximizing the score ‘completeness – 3 × contamination’ was chosen as the SGB representative to be placed in the tree. Representatives lacking a sufficient number of universal markers were discarded; therefore, 35 reconstructed SGBs are not shown in the tree. The resulting tree was rerooted on the longest internal branch (a reconstructed member of the Archaea domain: unknown Euryarchaeota SGB146976) and annotated and visualized with GraPhlAn (v1.1.3)45.
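The known/unknown labeling rule and the choice of a representative genome can be sketched as follows. This is a simplified illustration, not the MetaRefSGB implementation: the `Genome` class and both functions are hypothetical, and the handling of ties in the majority rule (first label encountered wins) is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Genome:
    name: str
    completeness: float      # percent
    contamination: float     # percent
    is_isolate: bool
    species: Optional[str] = None  # taxonomic label, isolates only


def representative(genomes: list[Genome]) -> Genome:
    """Pick the genome maximizing completeness - 3 x contamination."""
    return max(genomes, key=lambda g: g.completeness - 3 * g.contamination)


def sgb_label(genomes: list[Genome]) -> tuple[str, Optional[str]]:
    """Return ('uSGB', None) for MAG-only bins, else ('kSGB', majority label)."""
    if not any(g.is_isolate for g in genomes):
        return ("uSGB", None)
    labels = [g.species for g in genomes if g.is_isolate and g.species]
    if not labels:
        return ("kSGB", None)
    # Majority rule over isolate labels; ties resolve to the first label seen.
    return ("kSGB", Counter(labels).most_common(1)[0][0])
```

For example, a bin holding two MAGs scoring 86 and 90.5 and one isolate scoring 96 would be a kSGB represented by the isolate, whereas the two MAGs alone would form a uSGB represented by the MAG scoring 90.5.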
This expanded MetaRefSGB resource (labeled vJan24) was used to build a database of SGB-specific marker genes through the ChocoPhlAn pipeline4. Only SGBs having either at least one reference genome (i.e., a genome sequenced from an isolate), at least one HQ MAG in MetaRefSGB, or at least one newly assembled dog plaque MAG were kept, resulting in an expanded version (vJan24) of the ChocoPhlAn database used along with MetaPhlAn 4 for metagenomic taxonomic profiling4.
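The retention rule for SGBs entering the marker database is a simple disjunction of three conditions. A minimal sketch, with a hypothetical `SGBSummary` record standing in for whatever per-SGB metadata the actual pipeline uses:

```python
from dataclasses import dataclass


@dataclass
class SGBSummary:
    sgb_id: str
    n_reference_genomes: int   # genomes sequenced from isolates
    n_hq_mags: int             # high-quality MAGs in MetaRefSGB
    n_dog_plaque_mags: int     # MAGs newly assembled in this study


def keep_for_markers(sgb: SGBSummary) -> bool:
    """Keep an SGB if it satisfies at least one of the three conditions."""
    return (sgb.n_reference_genomes >= 1
            or sgb.n_hq_mags >= 1
            or sgb.n_dog_plaque_mags >= 1)
```

Under this rule, an SGB supported only by medium-quality MAGs from other studies is excluded, while an SGB supported by a single medium-quality dog plaque MAG is retained.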
Survey of public human plaque metagenomes and taxonomic profiling
The literature was searched for publicly available human plaque metagenomic datasets. Only datasets that included healthy individuals and provided publicly available metadata identifying those individuals as healthy were considered. Overall, four datasets including 154 plaque metagenomes from healthy humans were identified and downloaded: HMP_2012 (n = 88)29, EspinozaJ_2018 (n = 30)30, ShaiberA_2020 (n = 7)31, and MinK_2024 (n = 29)32. The aforementioned preprocessing pipeline was applied to the downloaded data. Quality-checked dog and human metagenomes (n = 218) were profiled at SGB-level resolution with MetaPhlAn 4 (v4.1.0) using the expanded markers database (vJan24) described above4.
Strain-level analysis
Strain-level characterization of SGBs found to be prevalent in both hosts was performed with StrainPhlAn 4 (v4.1.0)4. Resulting phylogenetic trees were rerooted at midpoint and annotated and visualized with GraPhlAn (v1.1.3, available at https://github.com/biobakery/graphlan). Association between SGB strain-level phylogenies and host species was assessed using the anpan package (available at https://github.com/biobakery/anpan), which allows incorporation of phylogenetic information into generalized linear mixed models to reveal phylogenetic structures associated with covariates of interest. When the absolute difference in terms of expected log point-wise predictive density (ELPD) between the phylogenetic model and the base model (i.e., without phylogenetic information) was greater than two, the association was considered significant.
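The significance criterion can be written as a one-line decision rule. The sketch below does not call anpan (an R package); it only shows how two ELPD estimates, assumed to have been obtained beforehand from the phylogenetic and base models, would be compared. The function name and argument names are hypothetical.

```python
def phylogeny_associated(elpd_phylo: float, elpd_base: float,
                         threshold: float = 2.0) -> bool:
    """Apply the rule from the text: |ELPD difference| strictly greater than 2."""
    return abs(elpd_phylo - elpd_base) > threshold
```

For instance, ELPD values of -100 for the phylogenetic model and -105 for the base model differ by 5, so the association would be considered significant, while a difference of exactly 2 would not meet the criterion.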