Copy number variant analysis improves diagnostic yield in a diverse pediatric exome sequencing cohort

Introduction
Clinical exome sequencing (CES) is a routine diagnostic tool for rare Mendelian genetic disorders. The reported rate for identification of the causative single nucleotide variant (SNV) or insertion/deletion (INDEL) variant across all phenotypes using ES analysis ranges from 20 to 41%1,2,3,4,5. Diagnostic yield varies depending on the clinical phenotype, cohort, and clinical workflow, with many cases remaining unsolved, i.e., without the identification of a pathogenic or likely pathogenic (P/LP) variant that explains the patient’s phenotype. Copy number variants (CNVs), ranging from intragenic deletions and duplications to whole chromosomal gains and losses, cause a significant proportion of genetic disorders6,7. Typically, CES is not validated for gene-level CNV detection or identification of regions of homozygosity (ROHs). Historically, array-based techniques have been used to assess copy number variation8.
Chromosomal microarray (CMA) is generally used to detect CNVs that are typically >50 kb in size. In practice, institutions, laboratories, and clinicians may have different approaches to the clinical workflow for first-line genomic diagnostic test selection, which can complicate review of the literature for diagnostic yield of ES data and CNV analysis9. This variability results from clinical suspicion of a candidate genomic abnormality, clinical indication, access to specific genetic testing, or merely from differences in provider ordering habits. In many cases, ES is ordered without an upfront CMA when the suspected causative variant is an SNV or INDEL, or CMA is ordered first if a copy number abnormality is suspected. However, the classical practice of using CMA to detect CNVs and ES to detect sequence variants, based on clinical suspicion or ordering habits, leads to a higher false negative rate. For example, the causative CNV could be below the resolution of CMA, or establishing a diagnosis for an autosomal recessive condition may require detecting a CNV in trans with an SNV affecting the same gene.
For these reasons, recent studies have leveraged CES data to detect CNVs in a wide variety of clinical scenarios, ranging from prenatal diagnostics to rare hereditary syndromes10,11,12,13. The diagnostic yield from CNV analysis in exome studies is reported to range from 1% to approximately 30% in the postnatal setting14,15. This variation is affected by the degree of upfront workup (for example, whether CMA was performed prior to CES); the specific patient cohort, as some clinical phenotypes are known to be more frequently caused by CNVs than others; the stringency of variant assessment; and the bioinformatic methodologies employed. The use of multiple CNV analysis tools often improves detection of CNVs16. Several studies have explored the additional yield provided by CNV analysis of CES data when CMA was negative, and it was reported to be 1–2%14,17.
In this study, we investigated the diagnostic yield from CNV analysis of CES data from a large cohort of pediatric patients (1538 cases), with a wide variety of testing indications, clinical referral sources, and patient phenotypes. The results demonstrate that CNV analysis of CES data can be incorporated into the standard clinical workflow and increase the diagnostic yield, particularly when CMA and/or genome sequencing (GS) testing are not performed.
Results
Study cohort
The study cohort included 407 CES cases and 1131 focused exome sequencing (FES) cases, 206 of which were also analyzed by CMA (Fig. 1A). Cases analyzed solely by CMA, including those with a significant CNV finding, were excluded from the study. Patient age ranged from newborn to >18 years, with the majority (597) between 3 and 12 years. There were slightly more males (836) than females (702). The most frequent HPO terms included global developmental delay, seizures, microcephaly, generalized hypotonia, delayed speech and language development, and immunodeficiency (Fig. 1B, Supplementary Data 2). Interrogation of genomic ancestry showed the highest number of individuals tested to be in the admixed American ancestry group (n = 914), consistent with the majority of families being of Hispanic origin. The next most commonly represented ancestry was European (n = 360) (Fig. 1C).

A The current clinical workflow regarding test selection. B Demographics of the cohort, including age, gender, test type, and the top 10 most frequent Human Phenotype Ontology (HPO) terms in our cohort. C The predicted genomic ancestry of our cohort. AFR African, AMR Admixed American, EAS East Asian, EUR European, SAS South Asian, CMA chromosomal microarray, CES clinical exome sequencing, FES focused exome sequencing. *There is significant variability in first-line genomic diagnostic test selection; this can relate to the clinical suspicion of the culprit genomic abnormality, clinical indication, or merely to differences in physician ordering habits.
Diagnostic yield
The diagnostic yield of our patient cohort was assessed for differential distribution by multiple factors such as clinical referral source, ancestry, and test type (Fig. 2, Supplementary Data 1). Our cohort showed differential distributions of diagnostic yields among the referral sources for CNV (p = 2.6 × 10⁻¹², chi-squared = 102.26, df = 22) and SNV findings (p = 2.2 × 10⁻¹⁶, chi-squared = 155.48, df = 22) (Fig. 2A). The clinical indications for exome analysis were varied, with referral from epileptologists being the most frequent (n = 611). The other common referral sources included genetics (n = 301), followed by pediatrics (n = 270), neonatology (n = 69), immunology (n = 63), and hematology (n = 62). Patients referred for genetic testing from hematology showed the highest percentage (11.3%) of P/LP CNVs out of all the referral sources. Other clinical sources of referrals with a high contribution of P/LP CNVs to their overall diagnostic yield included neonatology (10.1%), dermatology (9.1%), and gastroenterology (8.3%). The lowest contribution of CNVs to the diagnostic yield came from cases referred by pulmonology, with a rate of 0%.

A The overall diagnostic yield of CES including CNV analysis by the referring department. Note the CNV component in bright red. B Total number of cases with P/LP and VUS SNV/INDEL diagnoses for each predicted ancestry. C Total number of cases with P/LP and VUS CNV diagnoses for each predicted ancestry. D Overall diagnostic yield and inheritance pattern by predicted ancestry. E The results of CES with SNV/INDEL and CNV analysis. F The results of FES with SNV/INDEL and CNV analysis. AFR African, AMR Admixed American, EAS East Asian, EUR European, SAS South Asian.
In addition, diagnostic yields were similarly distributed among predicted ancestries within the patient cohort (p = 0.89, chi-squared = 3.6, df = 8) (Fig. 2B–D, Supplementary Data 1). Admixed Americans (AMR) were our most prevalent ancestry (n = 914), with P/LP CNVs identified in 4.5% and P/LP SNVs in 24.1% of cases. The South Asian population (SAS) was our least prevalent ancestry group (n = 35), with variants of uncertain significance (VUS) reported in 40% of cases (n = 14 out of 35). We tested for differences in diagnostic yields between individuals of European ancestry and all others (EUR vs. non-EUR) and found no significant differences in CNV (p = 0.96, chi-squared = 0.08, df = 2) or SNV findings (p = 0.44, chi-squared = 1.63, df = 2). We also assessed the mode of inheritance of SNVs and found no significant differences among the predicted ancestries of our cohort (p = 0.13, chi-squared = 12.58, df = 8). The Admixed American (AD = 28%, AR = 58%, XL = 13%) and African American (AD = 42%, AR = 37%, XL = 21%) populations of our cohort displayed a varied distribution of SNV inheritance patterns.
For the CES cases, SNVs were classified as P/LP in 108 (26.0%) cases, VUS in 100 (24.8%) cases, and negative in 199 (42.8%) cases (Fig. 2E). Similarly for the FES cases, SNVs were classified as P/LP in 262 (22.6%) cases, VUS in 354 (31.1%) cases, and negative in 515 (42.7%) cases (Fig. 2F). CNV (p = 4.99 × 10⁻⁵, chi-squared = 19.81, df = 2) and SNV (p = 3.45 × 10⁻², chi-squared = 6.73, df = 2) diagnostic yields were differentially distributed by test type. The overall diagnostic yield for the cohort was 28.0% (430/1538), with additional exome-based CNV analysis yielding relevant (P/LP) results in 4.6% (70/1538) of patients (selected cases shown in Table 1). CMA was not clinically ordered for 54.3% (38/70) of these cases with positive CNV findings. For the remaining 45.7% (32/70) of these cases, CMA results were concordant with CNV findings from CES for 27 cases but discordant for 5 cases. Among the discordant cases, three (Case #4, involving a single exon deletion in EPM2A, and Cases #10 and #17, involving intragenic deletions of CACNA1A and ZMIZ1, respectively) were not detected by the ChAS software tool used for CMA analysis with CytoScan HD. This limitation occurred because the software requires a minimum of 25 consecutive markers to call a deletion or duplication. The single exon deletion in EPM2A was confirmed by targeted reanalysis of the previous CMA data, and the intragenic deletions in CACNA1A (Supplementary Fig. 1) and ZMIZ1 were confirmed using the CytoScan XON array. In the remaining two discordant cases (Cases #5 and #19), the deletions were detected by the ChAS software but were not clinically reported due to their sizes falling below the established clinical reporting criteria and/or the lack of established disease associations at the time of analysis (e.g., the JAK1 deletion case). For the remaining 174 cases where CES CNV analysis did not detect clinically significant variants, CMA results were concordant in 173 cases. These included 161 negative cases and 12 cases with a CNV of uncertain significance. The one discordant case had a 40 kb copy number loss of uncertain significance within intron 3 of the TCF12 gene (NM_207036.2) in the 15q21.3 region, which does not affect any coding regions. This deletion was not detected by CES CNV analysis due to the lack of coverage in the intronic regions.
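The percentages in the paragraph above follow directly from the reported counts. The short sketch below simply recomputes them; the variable names are illustrative and not taken from the study's own analysis code.

```python
# Counts as reported in the text; names are illustrative only.
total_cases = 1538
diagnosed = 430            # cases counted in the overall diagnostic yield
cnv_positive = 70          # cases with a P/LP CNV from exome CNV analysis
cnv_positive_no_cma = 38   # CNV-positive cases for which CMA was not ordered
cma_tested = 206           # cases also analyzed by CMA

def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# CNV-positive cases that also had CMA, and CMA-tested cases without a P/LP CNV.
cnv_positive_with_cma = cnv_positive - cnv_positive_no_cma
cnv_negative_with_cma = cma_tested - cnv_positive_with_cma

print("overall yield %:", pct(diagnosed, total_cases))
print("CNV yield %:", pct(cnv_positive, total_cases))
print("CNV-positive without CMA %:", pct(cnv_positive_no_cma, cnv_positive))
print("CNV-positive with CMA %:", pct(cnv_positive_with_cma, cnv_positive))
print("CMA-tested, no P/LP CNV on exome:", cnv_negative_with_cma)
```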
CNV subcategories
Deletions and duplications that were categorized as P/LP ranged in size from single exon to whole chromosome or whole genome abnormalities. The positive cases were further subdivided into the following categories: small exonic and gene-level deletions (n = 24; 34.2%), deletions ≤10 Mb (n = 21; 30.0%), duplications ≤10 Mb (n = 3; 4.3%), large (>10 Mb) and complex CNVs (n = 9; 12.9%), aneuploidies (n = 10; 14.3%), and regions of homozygosity (ROH) and mosaic cases (n = 3; 4.3%) (Table 1, Fig. 3A). Of these, 4 (7%) represented a combination of sequence variants and CNVs in autosomal recessive conditions, with CNVs in all 4 cases being intragenic deletions. Recurrent P/LP CNV diagnoses identified in our cohort included trisomy 21 (n = 7), 22q11.2 deletions (n = 3), 1q24.3q25.2 deletions (n = 2), 16q23.3q24.1 deletions (n = 2), 8p23.1 deletions and duplications (n = 3), and 2 cases of XXY. Low-copy repeat mediated recurrent CNVs were detected in 8p23.1, 15q11.2, 16p11.2, and 22q11.21 (n = 9). The size distribution of CNVs by category is shown in Fig. 3B, with sizes ranging from 0.2 kb to the whole genome.

A Subcategories of pathogenic and likely pathogenic copy number variants detected in our cohort. B Size comparison of pathogenic and likely pathogenic copy number variants.
Exonic or gene-level deletions
Exonic and gene-level deletions were the most common P/LP category of CNVs identified through CES analysis. The exonic deletions ranged from 0.2 to 285 kb, affecting a number of genes and exons (Table 1).
An intragenic deletion of exons 3–7 of COL17A1 was identified in a 5-week-old (Case#7, Table 1) who presented with epidermolysis bullosa. Interestingly, this deletion was not initially detected by the NxClinical software, but was evident on manual visual inspection, which was performed for comprehensive evaluation of variants involving this gene because a heterozygous frameshift COL17A1 variant had been identified by CES. The CNV calling algorithm was later modified to visualize the exonic COL17A1 deletion (Fig. 4A), and the presence of the exonic deletion was subsequently confirmed by XON array.

A An intragenic deletion of exons 3–7 of COL17A1 was identified on manual inspection of a sequence variant in the same gene (red circle). This combination of a sequence variant and CNV accounts for the patient’s phenotype. B An example of a 3.3 Mb microduplication at 15q22.31q23 that was confirmed by chromosomal microarray. C 18p and 18q terminal losses, likely resulting in formation of a ring chromosome. D De novo 22q11.2 deletion.
In another case of a neonatal patient with immune dysregulation and multiple congenital anomalies (Case#5, Table 1), CES showed an apparently homozygous JAK1 variant in exon 11 (NM_002227.3:c.1613dupT, p.Met539Hisfs*21) (Supplementary Fig. 2A). CNV analysis demonstrated that the variant was not homozygous; rather, there was an overlapping intragenic 55.6 kb deletion (exons 9–25) in JAK1 (Supplementary Fig. 2B). Subsequent targeted parental studies revealed that one parent carried the heterozygous JAK1 deletion while the other parent harbored the JAK1 SNV. This deletion was detected by CMA but not reported because its size did not meet the reporting criteria and the JAK1-disease association was limited at the time of CMA analysis.
Deletions (≤10 Mb) and Duplications (≤10 Mb)
The deletions ranged from 959.3 kb to 7.6 Mb and were recurrently present in 22q11.2 (n = 3), 16q23.3q24.1 (n = 2), and 1q24.3q25.2 (n = 2). Additionally, we identified 3 deletions in chromosome 8: a de novo 5.4 Mb deletion involving 8p23.1 associated with the 8p23.1 deletion syndrome18, a 1.1 Mb deletion involving 8q12.1q12.2 including the CHD7 gene in a patient with CHARGE syndrome (MIM# 214800), and a de novo 7.6 Mb deletion in 8q21.3q22.1 in a patient with multiple congenital anomalies. We also identified two microdeletions affecting chromosome 11: a 2.0 Mb deletion involving 11p11.2 that includes the haploinsufficient PHF21A gene and a 4.5 Mb de novo deletion involving 11p14.3p14.1. Additional examples of deletions identified in this category included a 5 Mb pathogenic deletion involving the 15q11.2 Prader-Willi Syndrome/Angelman Syndrome (PWS/AS) region, and a 2q24.3q31.1 deletion including sodium channel genes SCN3A, SCN2A, SCN1A, SCN9A, and SCN7A in a patient with seizures.
Supplementary Fig. 3A shows an example of 1p32.1p31.3 deletion syndrome. The 9-month-old patient presented with macrocephaly and feeding difficulty (Case#23, Table 1). CES was negative and CMA had not been ordered. Copy number analysis of the exome data demonstrated the deletion, which explained the patient’s phenotype (Supplementary Fig. 3A).
The duplications ranged from 832 kb to 3.3 Mb. A diagnostic 3.3 Mb copy number gain in 15q22.31q23 was found in a 4-year-old patient with epilepsy, microcephaly, developmental delay, and spine abnormalities (Case#40, Table 1; Fig. 4B). Chromosome analysis showed the presence of a supernumerary marker chromosome, likely derived from this extra copy of the chromosome 15q22.31q23 region. Two patients with congenital heart defects showed duplications involving 8p23.1 that were 832 kb and 2.4 Mb in size, respectively. The former was found to be de novo by follow-up targeted FISH studies and the latter was found to be paternally inherited by trio CES. Duplications of 8p23.1 are associated with the 8p23.1 duplication syndrome which is characterized by variable phenotypes including developmental delay, intellectual disability, and obsessive-compulsive behavior. Congenital heart disease has also been described in these patients, with the GATA4 gene suggested as a candidate gene19,20. Both cases involved 8p23.1, where olfactory receptor and defensin repeats are located21.
Large (>10 Mb) and Complex CNVs
Large, unbalanced rearrangements were seen in 9 of our P/LP cases. Examples of larger deletions included a 15.2 Mb deletion in 18p11.32p11.21 for which karyotype analysis showed a derivative chromosome 15 comprising the long arms of chromosomes 15 and 18; a 13.7 Mb deletion in 18p11.32p11.21; and terminal losses of 18p and 18q, likely resulting in formation of a ring chromosome (Fig. 4C, Case#47, Table 1). Additional cases of larger or complex CNV findings included a 0.6 Mb terminal deletion in 11p15.5 with a 27.5 Mb terminal duplication in 11q22.3p12.1 likely arising as an abnormal recombinant from a parental pericentric inversion; a 21.6 Mb deletion in 13q31.3q34; and a 12.1 Mb deletion in 10p15.3p13 with a 1.7 Mb duplication in 13q24, likely inherited from a parent who is a carrier of a balanced translocation involving these chromosomes.
Aneuploidies
Multiple patients with suspected or known Down syndrome were confirmed to have trisomy 21 (n = 7). Two patients in whom there was no clinical suspicion of Klinefelter syndrome were found to have an additional X chromosome (47,XXY). This included a patient with suspicion for a connective tissue disorder for whom the diagnosis of Klinefelter syndrome, though unexpected, likely explained the clinical symptoms. One female patient with clinically suspected Turner syndrome was found to have loss of one X chromosome.
Regions of homozygosity and somatic CNVs
There were two instances where ROH findings were clinically significant. In one case, the patient presented with an overgrowth phenotype and was found to have whole genome uniparental disomy (Supplementary Fig. 3B). The other case demonstrated somatic loss of heterozygosity in 17p involving TP53 in a patient with a clinical history of acute myeloid leukemia (AML), who was in clinical remission (Case# 62, Table 1; discussed in detail in Supplementary Materials).
Copy number variant parent-of-origin analysis
When trio CES is performed, NxClinical can demonstrate whether a CNV is de novo and can also be used to predict whether a de novo CNV arose on the maternal or paternal chromosome when informative SNP markers are present in the CNV region. Fig. 4D demonstrates the de novo origin of a 22q11.2 deletion in a patient with failure to thrive, congenital heart disease, and metabolic acidosis. Case#28 (Table 1) highlights the additional parent-of-origin prediction feature of NxClinical for de novo CNVs. A de novo 16q23.3q24.1 deletion including the FOXF1 gene was detected in this patient with respiratory failure (Supplementary Fig. 3C). Additional parent-of-origin analysis in NxClinical predicted that the deletion arose on the maternal allele, based on the presence of 8 informative paternal probes and the absence of maternal probes in the deleted region. This finding is causative of the disorder alveolar capillary dysplasia with misalignment of pulmonary veins (ACDMPV) (MIM: 265380).
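The parent-of-origin reasoning described above can be illustrated with a minimal sketch. This is not NxClinical's implementation, which is not described in the text; it only assumes that, inside a heterozygous deletion, the proband shows a single remaining allele at each SNP, and that a site is informative when that allele is present in exactly one parent. The function name and the genotype encoding (two-letter strings) are hypothetical.

```python
def predict_deleted_allele_origin(sites):
    """Predict which parental chromosome carries a de novo deletion.

    sites: list of (proband_allele, mother_genotype, father_genotype) tuples
    for SNPs inside the deleted region, e.g. ("G", "AA", "GG").
    Returns (origin, paternal_retained, maternal_retained).
    """
    paternal_retained = 0   # remaining allele can only have come from the father
    maternal_retained = 0   # remaining allele can only have come from the mother
    for proband_allele, mother, father in sites:
        in_mother = proband_allele in mother
        in_father = proband_allele in father
        if in_father and not in_mother:
            paternal_retained += 1
        elif in_mother and not in_father:
            maternal_retained += 1
        # sites where both or neither parent carry the allele are uninformative

    if paternal_retained and not maternal_retained:
        origin = "maternal"       # only paternal alleles remain, so the maternal copy is deleted
    elif maternal_retained and not paternal_retained:
        origin = "paternal"
    else:
        origin = "undetermined"
    return origin, paternal_retained, maternal_retained

# Hypothetical data: 8 informative paternal sites plus one uninformative site.
example_sites = [("G", "AA", "GG")] * 8 + [("A", "AG", "AA")]
print(predict_deleted_allele_origin(example_sites))
```

In a real analysis, genotype quality and the possibility of sequencing error at individual sites would also need to be taken into account before drawing a conclusion.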
CNVs beyond initial diagnosis
Incidental CNVs were identified in 4 cases. These comprised 2 exonic deletions and 2 microdeletions. The exonic deletions were in RAD51C (exons 4–9) and BRCA1 (exons 8–12). The microdeletions were in chromosomal regions 15q11.2 (NIPA1/NIPA2) and 16p11.2, both of which are associated with reduced penetrance and variable expressivity. Additionally, a dual diagnosis was identified in a 2-year-old patient who presented with multiple congenital anomalies and hypoglycemia. CES revealed an NF1 pathogenic sequence variant, and CNV analysis demonstrated an 11p terminal deletion and 11q terminal gain via a pericentric inversion.
Discussion
We have demonstrated that incorporating CNV analysis in a standard CES and FES workflow improves the diagnostic yield for patients with a wide variety of clinical presentations referred by multiple different providers. Our additional yield of 4.6% is above the 2.6% recently reported in a retrospective undiagnosed rare disease cohort12. Additionally, we report a similar distribution of CNVs across the subcategories, albeit with a slightly greater number of unbalanced rearrangements and aneuploidies12,22. The additional diagnostic yield of CNV analysis is highest in patients referred from hematology, dermatology and neonatology. As such, cases referred from these clinics would benefit from additional CNV analysis.
We studied a large clinical cohort of pediatric patients with diverse ancestries, which reflects our patient population in Los Angeles. However, we did not identify significant differences in diagnostic yield, or rates of negative or uncertain diagnoses, between ancestries. The literature is mixed in this regard, with some studies reporting similar rates, and others reporting fewer uncertain diagnoses in individuals of European ancestry23,24,25,26. Uniquely, Admixed Americans were our most prevalent ancestry, and this population did not demonstrate any significant difference in diagnostic yield from the other, more historically genomically well-represented ancestral populations.
CMA is typically performed as a proband-only analysis, with targeted testing of the parents performed if needed. Trio-exome CNV analysis allows for the detection of inheritance patterns simultaneously for both sequence variants and CNVs, which is particularly helpful for interpretation and for facilitating family counseling. It is assumed that if a CNV is detected in an affected proband but is absent from both parents (and thus originated de novo), it is more likely to be pathogenic27. Classification of CES results incorporates inheritance patterns, including de novo status, making trio sequencing useful and even necessary in some cases. The utility of trio testing in CNV analysis goes beyond variant classification and extends to diagnosis and prognostication in certain conditions, such as 16q24.1 microdeletions, in which de novo status and knowledge of the parent-of-origin of the deletion impact diagnosis28. Thus, within the current workflow, in which trio sequencing is common practice in CES but not CMA, the additional data augment diagnostic ability, as more relevant data are incorporated into the analysis. The benefit of jointly analyzing SNVs and CNVs in AR conditions also speaks to the power of such analyses, as is evidenced by our cases with such diagnoses.
We identified multiple recurrent diagnoses through CNV analysis of CES data. Exonic or gene-level deletions were the most frequent diagnostic category captured by CNV analysis. Additionally, our experience stresses the importance of reanalysis of both SNVs and CNVs in unsolved cases, as novel disease discovery after the initial analysis was helpful in some cases, such as the case of DNAJC21-associated bone marrow failure (Supplementary Fig. 4).
We present cases in which potentially deleterious CNVs were identified that are unrelated to the testing indication. These cases were primarily related to cancer predisposition (CNVs identified in RAD51C and BRCA1). These findings are important, as these patients require follow-up with genetic counseling and further discussion. There is also a common tendency to halt additional genetic workup once a sequence variant is identified that explains the patient’s phenotype. However, in our case series, we identified cases where a dual diagnosis was made, with one being a sequence variant, and the other being copy number variants (NF1 P/LP SNV and 11p terminal deletion and 11q terminal gain via pericentric inversion).
Although CNV analysis is currently not routinely performed for CES, it is standard practice for WGS. WGS diagnostic yields for CNV detection range from 4.7 to 75% depending on the specific clinical context, and WGS has been shown to be as sensitive as CMA in detecting pathogenic CNVs1,7,29,30. While acknowledging that CNV analysis of CES data is not the gold standard for the detection of CNVs, particularly due to the lower sensitivity in detection of small intragenic alterations, this study demonstrates the significant value of performing such analyses.
In our study, deletions at the exonic and gene levels were the most common P/LP category of CNVs detected through CES CNV analysis. It is known that small CNVs are more difficult to detect accurately due to limitations in the resolution and coverage of standard assays, and clinical laboratories should acknowledge these inherent assay limitations. A key consideration when interpreting such small CNVs is determining whether each is a true variant or a false positive, which requires careful evaluation of the coverage of the region of interest and comparison with similar variants in internal and external databases. It is essential to confirm findings using orthogonal methods, as discrepancies in the detected CNV size can arise when comparing results across different assays. When the XON array confirms the presence of a CNV detected by exome CNV analysis, the size of the CNV is expected to differ slightly between the two assays. The assay with higher resolution and coverage in the region, including the breakpoints, is used for reporting the size and genomic content. Additionally, clinical interpretation should take into account other factors, such as disruption of the reading frame (particularly for intragenic CNVs), the role of the affected region in protein function, its potential association with clinical phenotypes, and family history.
Our data suggest that CES with CNV analysis is a feasible and worthwhile first-line testing strategy for clinical cases, in contrast to the current testing algorithm that employs CES alone or CMA as the first-line testing option. When comparing negative CMA results to those of CNV analysis of CES data, the “false negative” CNVs were not detected or not reported by CMA for the following reasons: the CNV was below the resolution of CMA, particularly for small exonic deletions; the CNV was below the laboratory reporting threshold for CNVs; the CNV was heterozygous in a gene associated with a recessive condition; or a well-established gene-disease association was lacking at the time of analysis. Our data underscore the importance of knowing the limitations of the different testing strategies. For example, when analyzing CES data, compound heterozygosity for a deletion and a pathogenic SNV may be misinterpreted as homozygosity for the sequence variant (e.g., the JAK1 case). Additionally, our data highlight the importance of continued quality control exercises guided by institutional protocols and/or clinical judgment, even in the face of more sensitive methods. As exemplified by the case with an exonic deletion in COL17A1 (Fig. 4A), having the SNV data to guide manual inspection for copy number changes helped identify the causative deletion and make the correct diagnosis. Lastly, although we present data on a large and diverse patient population, comprising a wide variety of clinical phenotypes and referral sources, our data are only representative of a single institution’s experience.
In summary, CNV analysis of CES data in the clinical setting adds significant value and is feasible. Bioinformatic and confirmatory wet-lab approaches have been incorporated into our prospective CES-based testing to enable CNV detection. As additional causative CNVs are discovered, the overall diagnostic yield of CES analyses will continue to improve.
Methods
Case selection and exome sequencing
A total of 1538 patients with variable clinical phenotypes were referred for clinical exome sequencing (CES) or FES between the dates 01/01/2016 and 12/31/2022. In many cases, trio CES was performed. FES, often referred to as “Slice Testing,” restricts variant review and interpretation to a predefined list (or “slice”) of clinically relevant genes, utilizing an exome backbone31. Exome sequencing, SNV and INDEL variant calling and reporting were performed as previously described3. Genes related to patients’ phenotypes were selected using Human Phenotype Ontology (HPO) terms and were prioritized for analysis. This study was reviewed and approved by the Institutional Review Board (IRB) of the Children’s Hospital Los Angeles and the University of Southern California (CHLA-23-00117); patients’ informed consent/assent/permission for the use of coded samples and extracted clinical information was waived. This study complies with all relevant ethical regulations, including the Declaration of Helsinki.
CNV detection using exome sequencing data
CNV detection from CES data was performed using the Multi-Scale Reference (MSR) algorithm built into NxClinical version 6.1 software (Bionano, San Diego, CA). NxClinical is a commercially available software tool that has been clinically validated in our laboratory for whole genome sequencing CNV analysis (validation data not shown in this manuscript) and supports SNV analysis simultaneously with CNV analysis. The MSR algorithm consists of two major steps. First, a reference model is created using 40 “normal” samples (20 males and 20 females) with a minimum bin size of 75 bp, a minimum of 140 reads per bin, and an average read length of 100 bp. The software divides the genome into equal bins and counts the nucleotides in each read overlapping the bin. Bins meeting the target nucleotide count are used as-is, while those below the target are grouped together incrementally until they reach the target or a boundary, e.g., the end of the chromosome. Bins not reaching at least 80% of the target are discarded. This process generates bins of various sizes, with smaller bins in captured exonic regions and larger bins in off-target regions. The software also accounts for systematic biases, such as GC content, which are corrected later. The next step involves CNV and ROH calling for each sample. Using the reference file, the software counts the nucleotides falling into each bin and calculates a log ratio between the sample and reference. Systematic correction is applied to remove biases. For each known SNP position, the percentage of alleles is measured to create a B-Allele Frequency (BAF) value. The Log R and BAF data are processed by the FASST2 algorithm, which uses a Hidden Markov Model (HMM) to segment the genome into various copy number states, allelic imbalance, and ROH.
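The binning and log-ratio steps described above can be sketched in a few lines. This is an illustrative reading of the description, not the vendor's MSR implementation: the function names, the choice to treat a bin that meets the target on its own as a boundary for any pending group, the absence of depth normalization and GC correction, and the definition of BAF as the alternate-allele read fraction are all assumptions made for the example.

```python
import math

def build_bins(counts, target, min_fraction=0.8):
    """Group fixed-width bins on one chromosome into variable-size bins.

    counts: nucleotide counts per fixed-width bin, in genomic order.
    Returns (start_index, end_index_exclusive, nucleotide_count) tuples.
    """
    bins = []
    start, running = None, 0

    def flush(end):
        # Close the pending group; keep it only if it reached min_fraction of the target.
        nonlocal start, running
        if start is not None and running >= min_fraction * target:
            bins.append((start, end, running))
        start, running = None, 0

    for i, count in enumerate(counts):
        if count >= target:
            flush(i)                       # assumption: a full bin acts as a boundary
            bins.append((i, i + 1, count))  # bins meeting the target are used as-is
            continue
        if start is None:
            start = i
        running += count
        if running >= target:
            flush(i + 1)
    flush(len(counts))                     # the chromosome end is always a boundary
    return bins

def log2_ratio(sample_count, reference_count):
    """Log ratio of sample to reference for one bin (counts assumed depth-normalized)."""
    return math.log2(sample_count / reference_count)

def b_allele_frequency(ref_reads, alt_reads):
    """Fraction of reads supporting the alternate (B) allele at a SNP position."""
    return alt_reads / (ref_reads + alt_reads)

example_bins = build_bins([150, 40, 60, 50, 10, 200, 30], target=100)
print(example_bins)
print(log2_ratio(50, 100))        # half the reference depth, as for a one-copy loss
print(b_allele_frequency(12, 12))
```

In the toy input, the group of bins holding 50 and 10 nucleotides and the trailing bin holding 30 are discarded because neither reaches 80% of the target before hitting a boundary.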
CNV analysis and results reporting
The whole genome view generated in NxClinical was used to identify copy number alterations of whole chromosomes or chromosomal arms. Copy number calls with an internal database frequency greater than 1% were filtered out32,33. Our internal database includes historical clinical exome cases, consisting of 3817 probands (excluding parents and family members from exome trios, duos, and quads). The remaining copy number variants were ranked by their frequencies in the internal database, from lowest to highest. Segments involving OMIM disease-associated genes relevant to the patient’s referral phenotypes were prioritized for analysis and review34. The identified copy number variants were classified according to ACMG and ClinGen variant classification guidelines27. P/LP CNVs as well as variants of uncertain clinical significance (VUS) were reported. Benign and likely benign variants, as well as ROH less than 10 Mb that are commonly observed in the general population, were not reported. Deletions/duplications and ROH, including suspected uniparental disomy (UPD), were confirmed by CytoScan HD (Thermo Fisher), and intragenic and small gene-level CNVs that were considered disease-causing were confirmed using the CytoScan XON array (Thermo Fisher). The array coverage for the CNV regions detected by exome CNV analysis was verified before running these arrays for confirmation. The assay with higher resolution and coverage in the region, including the breakpoints, was used for reporting the size and genomic content.
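The frequency filter and ranking step can be pictured as follows. This is a minimal sketch, assuming each call carries a count of internal-database probands with a matching call; the field names (`id`, `db_count`) and the example calls are hypothetical, and the phenotype-based gene prioritization and ACMG/ClinGen classification described above are not modeled.

```python
def prioritize_calls(calls, n_probands, max_frequency=0.01):
    """Drop calls seen in more than max_frequency of probands, then rank rarest first."""
    def frequency(call):
        return call["db_count"] / n_probands

    kept = [call for call in calls if frequency(call) <= max_frequency]
    return sorted(kept, key=frequency)

# Hypothetical calls; db_count is the number of database probands with the same call.
calls = [
    {"id": "del_A", "db_count": 38},   # 38/3817 is just under 1%, so it is kept
    {"id": "dup_B", "db_count": 0},
    {"id": "del_C", "db_count": 40},   # 40/3817 is just over 1%, so it is filtered out
    {"id": "del_D", "db_count": 3},
]
ranked = prioritize_calls(calls, n_probands=3817)
print([call["id"] for call in ranked])
```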
Ancestry analysis
A previously published tool, Somalier, was used to estimate genetic ancestry35. In brief, Somalier uses previously reported genetic ancestry from the 1000 Genomes Project to predict ancestry for a set of query samples35. The Somalier tool was run according to the default instructions provided on GitHub (https://github.com/brentp/somalier).
Data analysis and visualization
Statistical analysis was conducted in the R programming environment (v4.2.1) using the stats R package (v4.2.1). Pearson’s chi-squared tests were conducted to determine whether the distribution of P/LP variants likely explaining the patients’ primary clinical concerns, VUS, and negative cases differed among referral source, patient ancestry, mode of inheritance, and test type. Pairwise comparisons were conducted to determine differences among groups. P-values from multiple comparison tests were adjusted using the Benjamini–Hochberg procedure. Fisher’s exact tests were conducted to assess differences in diagnostic yields and VUS rate among European and non-European individuals. Odds ratios with 95% confidence intervals were calculated for Fisher’s tests. Statistical results are reported in Supplementary Data 1. Data visualization was performed using the BPG package (v6.1.0) and ggplot2 (v3.4.4).
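For readers who want to see the mechanics of these tests, the sketch below reimplements the Pearson chi-squared statistic and the Benjamini–Hochberg adjustment in plain Python. It is not the study's R code, and the contingency table and p-values in the example are made up for illustration. The p-value helper covers only the df = 2 case, where the chi-squared survival function reduces to exp(-x/2); plugging in the chi-squared value of 19.81 reported for CNV yield by test type gives approximately 4.99 × 10⁻⁵, in line with the reported p-value.

```python
import math

def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def p_value_df2(stat):
    """Upper-tail p-value of a chi-squared statistic, valid for df = 2 only."""
    return math.exp(-stat / 2)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up 2 x 3 table (rows: two test types; columns: P/LP, VUS, negative).
stat, df = chi_squared([[10, 20, 30], [20, 20, 20]])
print(stat, df, p_value_df2(stat))
print(p_value_df2(19.81))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```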