Understanding rare variant contributions to autism: lessons from dystrophin-deficient model

Introduction
The dystrophinopathies Duchenne (DMD) and Becker Muscular Dystrophy (BMD) are allelic genetic disorders characterized by progressive muscle weakness and degeneration, affecting about 1:5000–6000 males1. They are caused by pathogenic variants in DMD, which encode dystrophin, a protein that plays a crucial role in maintaining fiber stability in skeletal muscles1. Neurodevelopmental changes in dystrophinopathies are well-established, encompassing delays in developmental milestones and the co-occurrence of neurodevelopmental disorders (NDDs), including intellectual disability (ID; ~30%) and autism spectrum disorders (ASD; ~20%)2,3. The low penetrance of these conditions in dystrophinopathies suggests the presence of additional factors to their manifestation. The role of DMD shorter isoforms Dp140 and Dp71 in increasing the risk of NDD is well recognized. However, it is important to note that their absence alone cannot account for all cases of NDDs observed in individuals with dystrophinopathies4,5.
Genomic studies in individuals with dystrophinopathies have primarily focused on identifying modifier genes influencing the muscle phenotype6,7. To date, despite a few case reports in the literature of individuals with dystrophinopathy carrying additional genetic variants8,9,10, there is no systematic genomic analysis of additional genetic contributions to the manifestations of NDDs in this group of individuals. This gap exists despite the recognized relevance of the genetic background to the manifestation of complex neurodevelopmental phenotypes11,12.
ASD encompasses a heterogeneous group of neurodevelopmental alterations characterized by persistent deficiencies in social interaction and communication, as well as repetitive behaviors and restricted interests (DSM-V)13. ASD occurs in ~1% of the global population, with a prevalence four times higher in boys than in girls. Despite the known contribution of environmental factors to its etiology, ASD presents one of the highest heritability among NDDs (38–90%14,15). It is remarkable that, although hundreds of genes have been associated with ASD, only up to 30% of ASD cases carry a known pathogenic, highly penetrant variant. The majority of these cases represent monogenic forms of ASD. The remaining cases are believed to be associated with more complex patterns of inheritance, such as the oligogenic and polygenic/multifactorial models. In these models, both inherited or de novo rare and common variants act as risk variants, and can contribute to the phenotypic manifestation.
In general, rare variants confer higher risk compared to common variants because they can lead to more significant functional damage to the protein. Despite the advances in understanding the contribution of both common and rare variants to ASD16,17, several questions remain largely unanswered, such as the minimum required number of variants and which combinations of such variants are sufficient to cause the phenotype at the individual level. In this sense, the analysis of the genetic background of ASD individuals carrying a known risk variant may be helpful in understanding these more complex patterns of inheritance.
Through the analysis of the genetic background of 83 individuals with dystrophinopathies, this study main aims were: to characterize ASD in males with dystrophinopathies; to determine the relevance of additional de novo (DNV) and inherited rare risk variants (RRVs) for their neurodevelopmental phenotype; and to clinically and functionally characterize the identified risk variants.
Results
Participants, clinical characterization, and CNV analysis
In total, 83 individuals (36 DMD-ASD, 12 DMD-ID, 35 DMD-Control), from 77 families, were kept for further analysis. No significant difference in motor development was observed among DMD-subgroups (Kruskal–Wallis test, p = 0.2320). Speech impairment was common (62.5%) and significantly higher in the DMD-ASD group, compared to DMD-Control (Fisher’s exact test, adjusted p value < 0.0002). Additional details on DMD variants and clinical data are provided in Table 1 and Supplementary Table 1.
Four individuals (P15, P21, P61, P91) were diagnosed with BMD, while the remaining individuals exhibited typical DMD progression. DMD-ASD (61.1%) and DMD-ID (75.0%) had a large proportion of variants affecting shorter isoforms (Dp140 and Dp71), as compared to those affecting only the largest isoform, though not statistically significant (Fisher’s exact test, adjusted p value > 0.999 and 0.5382, respectively; Supplementary Fig. 2). No differences occurred in DMD-Control (51.4%). PCA analysis showed no ancestry differences between groups (Supplementary Fig. 3).
Finally, one pathogenic CNV (copy number variation) was identified in the individual P41: a 1.5 Mb microdeletion at 1q21.1q21.2 (minimum deletion region chr1:146877659–148483252; Supplementary Fig. 4), overlapping the 1q21.1 recurrent region (BP3-BP4, distal) (includes GJA5) from ClinGen. This 1q21.1 microdeletion confers an increased risk of global developmental delay, ID, and other neurodevelopmental conditions such as ASD, among other clinical signs, with variable expressivity and reduced penetrance (OMIM #612474 CHROMOSOME 1q21.1 DELETION SYNDROME, 1.35-MB).
De novo variant analysis in DMD-ASD and DMD-control groups
In total, 39 DNVs were identified (Supplementary Table 3, Supplementary Fig. 6), leading to a DNV rate of 1.1 variant/exome/individual. DMD-ASD group presented a large average number of DNVs than the DMD-Control group (26/18 [1.44] vs. 13/17 [0.76] variants/individuals; one-tailed Fisher’s exact test, adjusted p value = 0.2732, alpha = 0.025), although not statistically significant. However, when considering risk DNVs, DMD-ASD group was significantly enriched (7/18 vs. 0/17 variants/individuals; one-tailed Fisher’s exact test, adjusted p value = 0.0356, alpha = 0.025; Fig. 1a).

a Bar plot showing the distribution of total and risk DNVs in DMD-ASD and DMD-Control groups. Fishers’ exact test was used to test whether there was a difference between groups regarding total and risk DNVs. b Boxplot showing the distribution of paternal age between DMD-ASD and DMD-Control groups. Welch’s t test was applied to determine whether there was a difference in the mean paternal age. c Boxplot showing the distribution of rare variants for each group. Mann–Whitney test was performed to verify whether there was a difference in the average number of RRVs and synonymous variants between cases (DMD-ASD and DMD-ID) and the DMD-Control group. The DMD-ASD group presented, on average, a higher number of RRVs than DMD-Controls, after Bonferroni correction. ns = non-significant; *p value < 0.05.
Paternal age was positively associated with the number of DNVs (mean (SD) = 32.51 (±8.02) years; Poisson regression analysis, beta = 0.05708, p value = 0.0133, Supplementary Fig. 5), while maternal age showed no significant association (mean (SD) = 29.66 (±5.42) years; beta = 0.04463, p value = 0.133). Moreover, DMD-ASD fathers were, on average, older than DMD-Control fathers (DMD-ASD: 34.83 (±9.43) years, DMD-Control: 29.94 (±5.31) years; one-tailed Welch’s t test, p value = 0.0347; Fig. 1b). Parental origin of three out of 39 variants (7.7%) was determined, being all of them paternal, Supplementary Table 3.
Enrichment of RRVs analysis
A total of 221 RRVs (in 209 genes), and 379 synonymous variants (in 338 genes), were selected. DMD-ASD individuals showed a higher average of RRVs compared to DMD-Controls (3.25 [117/36] vs. 2.23 [78/35] variants/individual; Mann–Whitney test, adjusted p value = 0.0285, alpha = 0.025). No difference was observed for synonymous variants (DMD-ASD: 4.69 [169/36] vs. 4.69 [164/35] variants/individual; Mann–Whitney test, adjusted p value > 0.999, alpha = 0.025), Fig. 1c.
Comparing DMD-ID with DMD-Control, no significant differences were found for RRVs or synonymous variants (RRVs: 2.17 [26/12] variants/individual, Mann–Whitney test, adjusted p value > 0.999, alpha = 0.025; synonymous: 3.83 [46/12] variants/individual, Mann–Whitney test, adjusted p value = 0.7348, alpha = 0.025).
Considering the parental WES data for 38 individuals from our sample (18 DMD-ASD, 17 DMD-Control, and 3 DMD-ID individuals), we were able to infer the origin of 60 out of 117 RRVs in the DMD-ASD group, 8/26 in the DMD-ID group, and 45/78 in the DMD-Control group. DMD-ASD presented 27 (45.0%) maternally inherited and 29 (48.33%) paternally inherited variants (Fishers’ exact test, p value = 0.8549); DMD-ID presented four (50.0%) maternally inherited and three (37.5%) paternally inherited variants (Fishers’ exact test, p value > 0.9999); and DMD-Control presented 21 (46.7%) maternally inherited and 26 (53.3%) paternally inherited variants (Fishers’ exact test, p value = 0.6735). In addition, five variants were classified as de novo, and all de novo risk variants were found in the DMD-ASD (four variants) and DMD-ID (one variant) groups.
Frequency of dual diagnosis in the DMD-ASD and DMD-ID groups
Two pathogenic and one likely pathogenic variants associated with autosomal dominant neurodevelopmental conditions (EBF3:NM_001005463.1:c.616 C > T [p.Arg206Ter], #MIM617330; SCAF4: NM_020706.2:c.1276 C > T [p.Arg426Ter], #MIM620511; and UBE3A:NM_000462.3:c.29delA [p.Lys10fs], #MIM105830) were identified in DMD-ASD individuals (P89, P86 and P85, respectively). The pathogenic variants in EBF3 and SCAF4 were DNVs, while the likely pathogenic variant in UBE3A was an RRV, maternally inherited. The rate of dual diagnosis in the DMD-ASD group was 8.33% (3/36 individuals) and 0.0% (0/12 individuals) in the DMD-ID group. Of note, a likely pathogenic DNV was identified in individual P69 (HEPACAM: NM_152722.5:c.382 G > A [p.Asp128Asn], DMD-ID group), which is associated with ID, although with low penetrance (around 40%, #MIM613926).
Gene enrichment analysis
Joint analysis of RRVs from DMD-ASD and DMD-ID groups (n = 138 genes) revealed enrichment for molecular functions related to extracellular matrix (ECM) constituents and ECM structural constituent conferring tensile strength (Supplementary Table 4). It was also enriched for the following cellular components: collagen trimmer fibrillar, collagen trimmer, and banded collagen fibril. Similarly, pathway analysis showed significant pathway enrichment associated with integrin and non-integrin cell surface interactions, collagen trimerization, collagens, and syndecan interactions. Finally, genes were enriched for Ehlers-Danlos Syndrome (EDS), Supplementary Table 4. No enrichment terms or pathways were observed for the DMD-Control group (n = 78 genes).
Discussion
DMD pathogenic variants are established risk factors for NDDs18. In this study, we explored the impact of RRVs on the neurodevelopmental phenotype of 83 individuals with dystrophinopathies, particularly ASD. The relevance of the oligogenic model for ASD has been recognized12. However, due to the low populational frequency of RRVs, detecting causative combinations with statistical significance often requires larger sample sizes than those available in well-established autism cohorts19. In this scenario, homogenizing the sample for the known primary risk variant allowed the identification of genetic patterns and genotype-phenotype associations with a significantly reduced sample size.
In this study, we have proposed the analysis of a cohort with a known primary ASD risk variant (DMD gene) to verify if it allows the identification of genetic patterns and genotype-phenotype associations with a significantly reduced sample size. We have taken advantage of the current knowledge of the genetic architecture in ASD for rare variants, which include de novo and inherited variants together with one of the best-known environmental risk factors for ASD, parental age20.
All the included individuals had a confirmed molecular diagnosis of dystrophinopathy. Speech impairment was prevalent (62.5%) and significantly higher in DMD-ASD individuals than in DMD-Controls, suggesting an association between speech development and ASD in dystrophinopathies. In contrast, motor development delay was observed across all DMD-subgroups, as commonly described in boys with dystrophinopathies regardless of their neurodevelopment21. No differences were observed in the proportion of individuals carrying variants affecting the DMD shorter isoforms Dp140 and Dp71 for any DMD-subgroup. While a higher frequency of cognitive impairment is associated with pathogenic variants affecting these isoforms3,22, their role in other neurodevelopmental alterations is unclear. Indeed, the highest frequency of variants affecting these isoforms was observed in our DMD-ID group. The lack of statistical difference may be due to a limited statistical power due to the sample size.
The analysis of DNVs revealed that DMD-ASD exhibited an enrichment of risk DNVs compared to DMD-Controls. The findings on the higher DNV rate for risk genes and increased paternal age in the DMD-ASD group suggest that ASD in dystrophinopathies is being partly driven by paternal age, similarly to the results observed in ASD large cohorts23.
DMD-ASD cases also showed a higher average of RRVs than DMD-Controls, supporting an oligogenic model in ASD among dystrophinopathies. No bias in the parental origin of RRVs was observed in the three analyzed groups. In addition, the rate of dual diagnosis for the DMD-ASD group was 8.33%. The three affected genes are associated with monogenic forms of ASD and/or ID, with a high (> 90%) penetrance (EBF3, UBE3A, and SCAF4). Notably, in all cases, the primary clinical concern reported in these individuals was their behavioral alterations and/or delayed development. While these variants are causative of NDDs, how they interact with the DMD variant to determine the final clinical outcome is unclear. Nevertheless, given its relevance for genetic counseling, exome testing should be considered in dystrophinopathies cases with a severe neurodevelopmental phenotype.
Moreover, one individual from the DMD-ASD group (P41) and one from the DMD-ID group (P69) presented an additional clinically relevant variant (a pathogenic 1q21.1 microdeletion and a likely pathogenic missense variant in HEPACAM, respectively), both associated with neurodevelopmental alterations with reduced penetrance24,25. The presence of these additional known RRVs further supports an oligogenic model in the manifestation of NDDs among individuals with dystrophinopathies. Additionally, although our analysis mainly focused on rare single nucleotide variants (SNVs), we cannot rule out contributions from other types of variants, including common and regulatory ones26.
Enrichment analysis based on the RRVs identified in DMD-ASD and DMD-ID groups converged to ECM components, most particularly collagen genes. Moreover, disease analysis revealed enrichment for EDS, a group of genetic disorders that affect connective tissues caused by pathogenic variants in collagens or collagen processing proteins27. ECM genes have been shown to contribute to ASD28, and individuals with EDS and hypermobility syndrome are at increased risk for NDDs, including ASD29,30, although the molecular mechanisms are still unclear. In addition, dystrophin is essential in the DGC (dystrophin-glycoprotein complex), which links the ECM with intracellular components in different cell types1,31. Our findings thus suggest that insults in the DGC, through dystrophin pathogenic variants, in conjunction with insults in ECM components, elevate the risk of ASD in dystrophinopathies. However, despite their functional proximity, it remains unclear how alterations in one affect the other, and how they jointly contribute to neurodevelopmental changes.
In summary, the observation of enrichment of rare DNVs and RRVs in ASD-DMD-affected individuals suggests that ASD in DMD may be dependent on several variants in addition to dystrophin pathogenic variants, favoring an oligogenic model of inheritance. Paternal age seems also to contribute to ASD risk in this DMD cohort. Finally, this study shed light on the biological functions and pathways altered in this subgroup of ASD-affected individuals. We thus support that the study of genetically homogenized cohorts represents a strategy to explore complex genetic models.
Despite our significant results, this study presents some limitations, such as the relatively low sample size, particularly for the DMD-ID and DMD-ASD-ID groups, and the unavailability of clinical data for all individuals, limiting a more comprehensive genotype-phenotype correlation. We expect that in the near future, with the analysis of larger DMD-ASD, DMD-ID, and DMD-ASD-nonID cohorts with more in-depth phenotyping, we will be able to investigate if the genetic architecture differs among these subgroups, which represent a relevant challenge to be addressed. Furthermore, it is significant to evaluate regulatory or common variants besides rare variants in their contribution to the final outcome.
Methods
Subjects, DNA samples, and clinical characterization
We collected DNA samples from 92 individuals (from 81 families; Supplementary Fig. 1). Families were recruited at Centro de Estudos do Genoma Humano e Células-Tronco (CEGH-CEL), Universidade de São Paulo, or through the foundation Aliança Distrofia Brasil (ADB). DNA was extracted from whole blood or saliva at CEGH-CEL. The main criteria of inclusion were male individuals confirmed to carry a known pathogenic DMD variant. Exclusion criteria, applied after neuropsychological characterization (described below), are detailed in the “DMD-subgroups definition” section. Clinical data was provided by parents or retrieved from medical records. This study was performed in accordance with the ethical standards of the Declaration of Helsinki. The Ethics Committee from Instituto de Biociências, Universidade de São Paulo, CEP/USP, fully approved the study (CAEE 56459522.0.0000.5464). Written informed consent was obtained from parents or legal guardians, whenever possible. As this study involved the use of stored DNA of families previously seen by our group, our Ethics Committee allowed us to include individuals whom we were not able to contact after at least five attempts. All these families had previously signed a consent term allowing the use of biological and clinical data for research purposes, and all clinical data at the individual level was de-identified.
Neuropsychological characterization was performed using CARS (Childhood Autism Rating Scale, n = 83) or previous ASD diagnosis (n = 9). Individuals with CARS score ≥30 were classified as at risk for autism. Moreover, 55 individuals were also evaluated for cognitive impairment using the Raven’s Progressive Matrices test or the WISC-IV (Wechsler Intelligence Scale for Children, 3rd or 4th edition). Three groups were defined: DMD-ASD: individuals with CARS ≥ 30 or previous ASD diagnosis; DMD-ID: individuals with CARS < 30 and RAVEN/WISC indicating cognitive impairment; DMD-Control: individuals with CARS < 30 and Raven/WISC (when available) negative for cognitive impairment, and individuals with CARS < 30 without cognitive evaluation, but without a history of speech delay nor reported learning difficulties.
DMD-subgroups definition
For the ten families with more than one individual with dystrophinopathy in our sample, at most two individuals were kept in the sample and only if they were ASD/ID discordant. In cases in which more than one individual from the same family was in the same DMD group, the following criteria were adopted for individual’s selection: (1) clinical evaluation–the presence of previous ASD or ID diagnosis over only CARS/cognitive evaluation (for DMD-ASD and DMD-ID groups) or best performance on cognitive evaluation (for DMD-Control group), (2) in cases for which both individuals presented the same clinical performance, the younger individuals was kept in the sample, (3) in case of dizygotic twins, WES quality was considered, keeping the individual with higher mean coverage. Detailed information on clinical data, DNA extraction, sequencing metrics, and group definition is provided in Supplementary Tables 1 and 2.
Based on cognitive and ASD evaluations, the 92 individuals were firstly distributed into three groups as following: DMD-ASD: 38 individuals (29 with CARS ≥ 30 and nine individuals with previous ASD diagnosis); DMD-ID: 12 individuals (11 based on RAVEN and one based on WISC-subtest results); DMD-Control: 42 individuals (including 28 individuals with normal cognitive evaluation). Two individuals from the DMD-ASD group (P29 and P82) were excluded since they have another affected brother in the same group. In addition, seven individuals were excluded from the DMD-Control group: four individuals (P18, P53, P63, and P73) without cognitive evaluation that presented both speech impairment and learning difficulties and three individuals with an affected relative individual in the same group (P5, P11, and P22).
Parents’ WES was available for 38/83 individuals (DMD-ASD group: 18 individuals, DMD-ID group: three individuals, and DMD-Control group: 17 individuals; Supplementary Table 2).
DMD pathogenic variants
All DMD variants were identified through either MLPA (Multiplex ligation-dependent probe amplification, kits P034 and P035 – MRC-Holland BV, Amsterdam, the Netherlands) or NGS Next Generation Sequencing (NGS)analysis: target sequencing of the DMD gene, a panel for myopathy and dystrophy associated genes, containing 266 genes as target, or WES, and were confirmed through Sanger (for NGS results), or exome sequencing (WES; for MLPA results). Individuals were categorized based on variant type and affected isoforms, distinguishing between those impacting the Dp427 isoforms (exons 1–44) and those affecting the brain isoforms Dp140 and Dp71 (exons 45–79), DMD accession number NM_004006.3.
Exome sequencing, variant selection, validation, and copy number variation (CNV) analysis
DNA samples were prepared using one of the following kits: SureSelect QXT Target Enrichment for Illumina Multiplexed Sequencing – V6 (Agilent Technologies, Inc., Santa Clara, CA – USA), IDT – xGen Exome Research Panel V1.0 or IDT – xGen Exome Research Panel V2.0 (Integrated DNA Technologies [IDT], Inc., Iowa, USA), following the manufacturer’s recommendations. The libraries were sequenced with Illumina HiSeq 2500 sequencer or Illumina NovaSeq 6000 sequencer, in paired-end reads of ~100 bp (Supplementary Tables 1 and 2). Sequence alignments to the human genome reference (UCSC hg38) were performed with Burrows-Wheeler Aligner, data processing and variant calling was performed with Picard (version 2.18.732) and Genome Analysis Toolkit package (GATK, version 4.0.9.033). SNVs and small indels located in exons or within 50 bp of the splicing junction were called in individual samples using GATK v.4.0.9.0 HaplotypeCaller, and were jointly genotyped using GATK GenotypeGVCFs. The mean coverage of exome data from the 92 individuals was 126.3, with 97% of the exome being covered by at least 10 reads, and 96% by at least 20 reads, on average.
After variant quality score recalibration, called variants were annotated using ANNOVAR v.2016Feb0134 and Ensembl Variant Effect Predictor (VEP35), including allele frequency in the general population (GnomAD, ABraOM and 1000 G databases36,37,38), predictive tools for deleteriousness of the alternate allele (CADD score, SpliceAI39,40), gene probability of being loss-of-function intolerant (pLI) or missense intolerant (Z score) scores (GnomAD database) and gene-disease association databases (OMIM, ClinVar, SFARI41,42,43). For all analyses performed, only variants that reached the minimum quality criteria defined as variants with at least 10 reads (and alternate allele number of reads ≥5) and allele balance between 0.3 and 0.7 (heterozygous) or ≥0.9 (homozygous), were selected. Next, variants were visually inspected using the Integrative Genomic Viewer (IGV44) software and, for individuals for which parents were sequenced, variant segregation was also inspected using IGV. All variants were verified using the variant description validation software VariantValidator45.
For each analysis performed, a set of variants with poor quality on IGV’s visual inspection were selected for Sanger sequencing, which was performed using BigDye Terminator v3.1 Cycle Sequencing Kit, in the ABI 3730 DNA Analyzer (Thermo Fisher Scientific, Massachusetts, USA).
The aim of CNV analysis in this dataset was to identify additional pathogenic or likely pathogenic CNVs mapped to 48 chromosomal regions with recurrent CNVs46. eXome Hidden Markov Model (XHMM)47 was used in conjunction with at least one additional software (CNVkit48, panelcn.MOPS49, or NextGene (https://softgenetics.com/products/nextgene/cnv-analysis/), following the recommended parameters, to investigate CNVs from exome sequencing data in individuals from the DMD-ASD and DMD-ID groups (detailed information on software used for each individual is available in Supplementary Table 1). After the CNV call, the variants were annotated with AnnotSV50.
The following prioritization criteria were used to select relevant CNVs of high confidence: only CNVs identified by XHMM and one additional software were considered (considering an overlap of, at least, 10% with the minimum chromosomal region detected by XHMM). Rare (frequency ≤1% according to DGV (Database of Genomic Variants51) deletions and duplications with at least 200 kb in size and SQ ≥ 30, unique in our sample (or in a family, in cases where more than one individual was evaluated) and mapped to each of the 48 target chromosomal regions from CLinGen recurrent regions. CNVs automatically classified as ACMG (American College of Medical Genetics and Genomics52) score 3 (Variant of Uncertain Significance, VUS), 4 (Likely pathogenic) or 5 (Pathogenic) by AnnotSV were further evaluated. Additional CNV classification criteria included (a) the exclusion of CNVs highly recurrent on DGV, after manual curation; characterization according to (b) size, (c) type (deletion or duplication), (d) gene content, (e) literature and clinical information about disease-associated genes, and (f) inheritance (whenever possible), (g) Franklin by Genoox (https://franklin.genoox.com/) automated CNV classification (based on the ACMG guidelines). For all prioritized CNVs for manual inspection, the size was considered as the minimum CNV size considering the overlap between the software calls.
Kinship and ancestry analysis
All trios, quartets, and related individuals were tested for kinship. In summary, in search for rare variants that are most likely inherited by descent, we determine the frequency of shared rare (frequency <1% in public databases) variants in all loci with high quality (properly genotyped in at least 80% of the cohort). Individuals with at least 2% of shared rare variants were considered related to some degree (Supplementary Table 1). We also performed ancestry characterization based on PCA analysis aiming to determine whether our groups potentially present differences in their genetic background due to ancestry (Supplementary Fig. 3).
De novo and RRvs analysis
High-confidence de novo variants (DNVs) were annotated using DeNovoGear53 and PossibleDeNovo54. DMD variants were excluded from analysis due to the study’s inclusion criteria. Selected DNVs, identified by at least one software, included rare (frequency ≤1%), missense, splicing, synonymous, and loss-of-function variants (LoF; frameshift insertion, frameshift deletion, stopgain, and stop loss variants). The DMD-ASD and DMD-Control groups were compared for total and risk DNVs (defined as LoF, missense (CADD ≥ 25), and splicing (spliceAI ≥ 0.8) variants in brain-expressed genes). Correlations between the number of DNVs in the offspring and parental age were also tested. Finally, to obtain parental phasing information for the DNVs, DeNovoGear phaser53 analysis was performed.
To test whether DMD-ASD and the DMD-ID individuals present a higher average of RRVs in their genetic background, compared to DMD-Controls, we selected rare (frequency ≤1%) variants in a subset of genes relevant to neurodevelopment, here pre-defined as a merge of the following gene sets: (a) genes expressed in brain (HPA list) with pLI ≥ 0.5 or Z score ≥3.0 (4292 genes); (b) genes associated with neurodevelopmental conditions with autosomal or X-linked recessive conditions (based on the Decipher55 list of genes; 545 genes) and (3) genes with pLI < 0.5 previously associated with ASD (based on SFARI database genes score 1 and S, https://gene.sfari.org/, retrieved in March, 23th 2023; 393 genes), in addition to a set of 154 genes retrieved from different articles on ASD candidate genes identification56,57,58, totalizing 4934 unique genes, here referred as risk genes (Supplementary Data 1). Regarding the variant type, only LoF, missense (CADD ≥ 30), and splicing (spliceAI ≥ 0.8), variants were analyzed. For the Z-score set of brain-expressed genes (with pLI < 0.5 and not listed in the other set of genes), the variant was kept in the analysis only if it was classified as missense with CADD ≥ 30. The subset of such variants identified in our DMD groups are defined as “RRVs”. As a form of control for the false discovery rate, we also tested if the groups differed regarding the synonymous ultra-rare (absent in the populational databases) variants in the same set of 4934 risk genes.
Frequency of dual diagnosis
To determine the number of individuals carrying additional pathogenic or likely pathogenic variants associated with monogenic NDDs, rare risk DNVs and RRVs from DMD-ASD and DMD-ID groups were clinically characterized. Genes were characterized according to disease association (OMIM, Decipher, and SFARI databases) and disease inheritance patterns. Variants in genes associated with monogenic neurodevelopmental conditions with autosomal dominant or X-linked inheritance, in addition to variants affecting known ASD risk genes, were classified according to the ACMG guidelines59, using the Franklin by Genoox (https://franklin.genoox.com/) automatized classification, followed by manual curation. The frequency of individuals with dual diagnosis (DMD diagnosis plus a monogenic, highly penetrant NDD) was then determined for each group.
Gene enrichment and statistical analysis
Toppgene portal (http://toppgene.cchmc.org) was used to perform gene ontology (GO) and pathway enrichment analysis with the RRVs identified in the DMD-subgroups (biological processes, cellular components, molecular functions, biological pathways or disease). All terms enriched after Bonferroni correction for multiple tests were returned.
Statistical analysis
All statistical analyses were performed using either GraphPad Prism 9.0 (version 9.5.1) or RStudio (version 4.2.1). Statistical descriptive analyses were performed for clinical data, total number and frequency of individuals in different groups. Shapiro–Wilk and Kolmogorov–Smirnov tests were used to test data normality, and then nonparametric and parametric tests were used to compare clinical and molecular data, such as motor development among the three groups. Fishers’ exact test was used to determine whether there was a difference in frequencies of speech impairment, learning difficulties, and DMD-affected isoforms among groups. It was also used to determine the differences between the number of total and risk DNVs between DMD-ASD and DMD-Control groups, and to determine the differences between the frequency of paternal and maternal risk variants in all groups. Poisson regression analysis was used to determine the association between parental age (number of variants ~ parental age) and de novo mutations, and Welch’s t test was applied to determine whether there was a difference in the mean paternal age. Mann–Whitney Test (Wilcoxon Rank Sum Test) was used to compare the average of RRVs and synonymous variants between DMD-ASD/ID group and DMD-Control. It was also used to compare the average correlation coefficient between genes from the DMD-ASD/ID group and the total number of genes affected by RRVs. Bonferroni (BFC) or Benjamini-Hochberg False Discovery Rate (BH-FDR) correction was applied whenever multiple tests were performed.
Responses