Identification of deep intronic variants in junctional epidermolysis bullosa using genome sequencing and splicing assays

Introduction
Junctional epidermolysis bullosa (JEB) is a group of inherited, trauma-induced blistering diseases of the skin and mucous membranes1. JEB is associated with recessive variants in several genes, including COL17A1, which encodes collagen XVII; ITGA6 and ITGB4, which encode integrin α6β4; and LAMA3, LAMB3, and LAMC2, which encode laminin 332. Severe JEB is mainly caused by premature termination codon (PTC) variants in LAMA3, LAMB3, or LAMC2. Variants in the COL17A1 gene are associated with localized or intermediate JEB2.
In our previous study, approximately 4.8% of recessive EB cases and 10.7% of recessive JEB cases remained genetically unsolved by exome sequencing (ES), as only a single pathogenic allele was identified in the targeted EB genes2,3. In previous reports, 0%-5.9% of JEB cases with COL17A1 variants could not be genetically diagnosed by ES4,5,6. Julia et al. reported that only a single heterozygous pathogenic COL17A1 variant was identified in four out of 68 cases4. In contrast to these reports4,5,6, only one pathogenic variant was found in 31.6% of the cases caused by COL17A1 variants in our cohort2.
ES focuses on protein-coding regions; hence, variants in non-coding regions, such as promoters, regulatory untranslated regions, or deep intronic sequences, may go unidentified7,8,9. A pre-mRNA molecule can be alternatively spliced through exon skipping, alternative splice site selection, and intron retention7,10,11. Deleterious DNA variants located 100–150 bp away from the exon–intron junctions most commonly lead to pseudo-exon inclusion by activating non-canonical splice sites or altering splicing regulatory elements10.
We hypothesized that the second pathogenic allele in these unsolved cases could be a deep intronic variant. We therefore applied genome sequencing (GS) to these JEB cases to find the second variant that would explain the patients’ phenotypes and to evaluate its consequences on splicing.
Methods
Subjects and clinical evaluation
We recruited 69 cases of recessive JEB from the dermatology clinic and the Debra Shanghai center (Table 1). In nine cases, ES analysis failed to identify the second pathogenic allele in the targeted gene (Table 1). Clinical diagnoses were established by standard clinical dermatologic examinations, which included, but were not limited to, imaging and skin biopsy.
All patients and family members were recruited in accordance with the principles of the Declaration of Helsinki. This study was approved by the Ethics Committee of the Children’s Hospital of Fudan University, National Children’s Medical Center. Informed consent was obtained from all subjects or their guardians. Specifically, Pat.13, Pat.53, and Pat.57, and the guardians of Pat.52 and Pat.65, provided written consent for the publication of the photographs, clinical details, and molecular data.
Exome sequencing (ES) and Sanger sequencing
ES analysis and subsequent Sanger analysis were performed as described previously3,12. Genomic DNA was extracted from peripheral blood samples of the 69 recessive JEB cases (from 68 families) using commercial kits (Qiagen, 56304). Sequencing was carried out by Shanghai WeHealth Biomedical Technology Co., Ltd using an ABI PRISM 3730XL automated sequencer. Variants were identified by comparing the sequences to reference cDNA sequences (COL17A1 NM_000494.4, LAMB3 NM_000228.3) from GenBank.
ES analysis was conducted on all 69 probands. Targeted exons were enriched and sequenced using the Illumina HiSeq2000 platform (Illumina). Reads with a mapping quality below 20 and potential duplicates were removed. We obtained an average of 500 Mb of mappable sequences with a mean coverage of 250×; 99% of the targeted bases were covered sufficiently for variant calling (>10× coverage). The data were compared to the hg19 human genome reference to remove repeated sequences and identify genetic variants. Candidate variants were required to be absent from, or present at very low frequency in, Asian populations in all the databases used, such as ExAC and gnomAD. The Human Gene Mutation Database (HGMD) and NCBI ClinVar were consulted to assess the reported impact of variants. The putative effects of all the identified variants were explored using a variety of prediction algorithms, including PolyPhen2, SIFT, MutationTaster, and SpliceAI. We used the ACMG/AMP classification system to classify the candidate variants.
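The filtering criteria described above can be illustrated with a minimal sketch. The record layout, the field names, and the 0.001 frequency cutoff are assumptions made for illustration; the text specifies only a mapping quality of at least 20, coverage above 10×, and a population frequency that is zero or very low. Mapping quality is treated here as a per-variant summary, which simplifies the read-level filtering actually described.

```python
from dataclasses import dataclass
from typing import List, Optional

# MAPQ and depth thresholds follow the text; the frequency cutoff is an
# assumed stand-in for "zero or very low".
MIN_MAPQ = 20
MIN_DEPTH_EXCLUSIVE = 10
MAX_POP_FREQ = 0.001  # hypothetical value


@dataclass
class Variant:
    gene: str
    hgvs_c: str
    mapq: int                  # summary mapping quality of supporting reads
    depth: int                 # read depth at the site
    pop_freq: Optional[float]  # None means absent from population databases


def passes_filters(v: Variant) -> bool:
    """Return True if the variant survives the quality and frequency filters."""
    if v.mapq < MIN_MAPQ:
        return False
    if v.depth <= MIN_DEPTH_EXCLUSIVE:
        return False
    freq = 0.0 if v.pop_freq is None else v.pop_freq
    return freq <= MAX_POP_FREQ


def filter_variants(variants: List[Variant]) -> List[Variant]:
    """Keep only variants that pass every filter, preserving input order."""
    return [v for v in variants if passes_filters(v)]
```

In practice, the cutoff would be chosen to match the disease prevalence and inheritance model rather than fixed at the value assumed here.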
Sanger sequencing
Sanger sequencing was performed on the 69 probands and their available relatives to verify likely pathogenic variants. Primers for JEB genes were designed using the Primer Premier 5.0 software. DNA samples were amplified using polymerase chain reaction (PCR), and the PCR products were purified and directly sequenced using an ABI PRISM 3730 automated sequencer. The resulting sequences were compared with the reference sequence and known polymorphisms.
Genome sequencing (GS)
GS was performed by Shanghai WeHealth Biomedical Technology Co., Ltd. on six affected individuals (one with LAMB3 variants and five with COL17A1 variants) with informed consent. Informed consent for GS analysis was not obtained from Pat.45, Pat.48, and Pat.68, so GS was not performed on these three patients. A focused search for coding and non-coding variants in the LAMB3 gene for Pat.13 and in the COL17A1 gene for Pat.52, Pat.53, Pat.57, Pat.61, and Pat.65 was conducted to find potential deep intronic variants. Sanger sequencing was conducted to validate these variants in the six probands and their available relatives (Supplementary Table 1).
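A focused search of this kind amounts to locating each candidate position relative to the exons of the gene of interest. The sketch below is illustrative only: the function name, the exon coordinates used in any call, and the 100 bp cutoff for calling a variant "deep intronic" are assumptions, not the pipeline used in this study.

```python
from bisect import bisect_right
from typing import List, Tuple

DEEP_INTRONIC_MIN_DISTANCE = 100  # assumed cutoff in bp


def classify_position(pos: int, exons: List[Tuple[int, int]]) -> Tuple[str, int]:
    """Classify a genomic position relative to a non-empty exon list.

    exons: (start, end) tuples, 1-based inclusive, sorted and non-overlapping.
    Returns (label, distance to the nearest exon boundary in bp).
    """
    starts = [s for s, _ in exons]
    i = bisect_right(starts, pos) - 1  # last exon starting at or before pos
    if i >= 0 and exons[i][0] <= pos <= exons[i][1]:
        return ("exonic", 0)
    distances = []
    if i >= 0:
        distances.append(pos - exons[i][1])      # bp past the upstream exon
    if i + 1 < len(exons):
        distances.append(exons[i + 1][0] - pos)  # bp before the downstream exon
    d = min(distances)
    if i < 0 or i + 1 >= len(exons):
        return ("outside_gene_exons", d)         # not flanked by two exons
    if d >= DEEP_INTRONIC_MIN_DISTANCE:
        return ("deep_intronic", d)
    return ("near_splice_site", d)
```

With toy exons at 1000–1100 and 2000–2150, a position 117 bp past the first exon would be labelled deep intronic, whereas a position 2 bp past it would be labelled near the splice site.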
Splicing analysis using minigene assays
To investigate the in vitro splicing effects of the four deep intronic variants (c.4156+117 G > A, c.2039-104 G > A and c.1267+237dupC in the COL17A1 gene and c.-38 + 2 T > C in the LAMB3 gene), we constructed minigenes using the pEGFP-C3 vector (Invitrogen). For each of the four variants, a fragment of the human LAMB3 or COL17A1 gene, including the full-length targeted exons and introns, was amplified from the patient’s genomic DNA (Supplementary Table 2).
These sequence variants, along with their flanking exons and introns, were cloned in their natural context into the pEGFP-C3 vector using BamHI/XhoI restriction endonucleases. The forward primer was CCCGACAACCACTACCTGAG and the reverse primer was ACCTCTACAAATGTGGTATGGC. Mutant and wildtype minigene constructs were thus prepared.
HaCaT cells and HEK293 cells12 (Cell Bank of Type Culture Collection Committee of Chinese Academy of Sciences) were seeded in six-well plates with 2.5 mL of Dulbecco’s modified Eagle’s medium (DMEM) containing 10% fetal bovine serum (Invitrogen, Switzerland), 1% penicillin, and 1% streptomycin. Cells were incubated overnight at 37 °C and 5% CO2, then transfected with the pEGFP-C3 vector harboring either the wildtype or the mutant allele using Lipofectamine 3000 Transfection Reagent (Thermo Fisher). Transfection efficiency was verified by assessing green fluorescence intensity after 48 h.
Total RNA was extracted from peripheral blood or transfected cells using Trizol Reagent (TIANGEN DP443). Reverse transcriptase polymerase chain reaction (RT-PCR) was performed as described previously12. PCR products were gel-purified from 2% agarose gels and confirmed by Sanger sequencing. Because the deep intronic variants (c.4156+117 G > A, c.2039-104 G > A and c.1267+237dupC in the COL17A1 gene) produced more than one splicing product, clonal sequencing was used to isolate single products for further testing. Splicing analysis of the three COL17A1 variants was performed by TA cloning and sequencing with the M13+ primer (AGGGTTTTCCCAGTCACG).
Immunohistochemistry
Formalin-fixed, paraffin-embedded sections (6 μm thick) were deparaffinized, autoclaved and incubated overnight with the primary antibody, rabbit anti-Collagen XVII (COL17A1) antibody (NBP1-91800, 1:100). After washing, the sections were incubated with biotinylated secondary antibodies for 1 h and with avidin–biotin complex for another hour. The bound antibody complexes were visualized by incubating the sections in a solution containing 100 mM Tris-HCl, pH 7.6, 0.1% Triton X-100, 1.4 mM diaminobenzidine, 10 mM imidazole and 8.8 mM H2O2. Afterward, the tissue sections were dehydrated, mounting medium was added to the slides, and coverslips were applied. Images were captured using a brightfield microscope and processed with ImageJ (National Institutes of Health), Adobe Photoshop CC (Adobe), and Adobe Illustrator 2019 (Adobe) software.
Results
Molecular presentation of the JEB patients
A total of 69 patients with clinically suspected recessive JEB were recruited for deep intronic variant analyses. All 69 patients underwent initial ES, which attributed 3 cases to LAMA3 variants, 24 to LAMB3 variants, 9 to LAMC2 variants, 14 to ITGB4 variants, and 19 to COL17A1 variants. However, the genetic diagnosis of 9 cases (1 with LAMB3, 2 with ITGB4, and 6 with COL17A1 variants) remained inconclusive (Table 1).
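The per-gene counts above are enough to recompute the proportion of cases left unsolved by ES. The short sketch below does so using only the numbers reported in this paragraph; the variable and function names are illustrative.

```python
# Case counts per gene after initial ES, as reported in the text.
ES_CASES = {"LAMA3": 3, "LAMB3": 24, "LAMC2": 9, "ITGB4": 14, "COL17A1": 19}

# Cases in which ES found only one pathogenic allele.
UNSOLVED = {"LAMB3": 1, "ITGB4": 2, "COL17A1": 6}


def unsolved_rate(gene: str) -> float:
    """Percentage of cases for a gene that remained unsolved after ES."""
    return 100.0 * UNSOLVED.get(gene, 0) / ES_CASES[gene]


total_cases = sum(ES_CASES.values())
total_unsolved = sum(UNSOLVED.values())
overall_rate = 100.0 * total_unsolved / total_cases

for gene in ES_CASES:
    print(f"{gene}: {unsolved_rate(gene):.1f}% unsolved")
print(f"Overall: {total_unsolved}/{total_cases} = {overall_rate:.1f}%")
```

The COL17A1 figure (6 of 19) corresponds to the 31.6% cited elsewhere in the text.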
Pat.13 was clinically diagnosed with recessive JEB, but ES only identified a heterozygous LAMB3 c.28+1 G > A variant (Table 1). The phenotypes of Pat.52, Pat.53, Pat.57, Pat.61, Pat.65 and Pat.68 were consistent with localized to intermediate recessive JEB caused by variants in the COL17A1 gene. However, in each of these patients, only one heterozygous variant was found: c.1507del in Pat.52, c.3482_3483del in Pat.53, c.4088del in Pat.57, c.1612del in Pat.61, c.3340_3349del in Pat.65, and c.3281del in Pat.68 (Table 1). Pat.45 and Pat.48 were clinically diagnosed with recessive JEB, but ES identified only heterozygous ITGB4 variants (c.3931 G > T in Pat.45 and c.1805A>T in Pat.48).
Clinical presentation of the JEB patients
Pat.13 presented with erythema, scars, scattered haemorrhagic blisters, depigmentation, dyspigmentation and dystrophic nails (Fig. 1a). Patients 52, 53, 57, 61, 65, and 68 exhibited alopecia, nail dystrophy, enamel dental problems, oral erosions, disseminated tense blisters on the erythema and skin healing with dyspigmentation (Figs. 2–3, Supplementary Fig. 2a). Pat.45 and Pat.48 presented with scattered blisters and depigmentation. All patients displayed skin fragility at an early age and none reported a family history of the condition.

Fig. 1. a Pat.13 with LAMB3 variants presented with erythema, scars, scattered haemorrhagic blisters, depigmentation, dyspigmentation and dystrophic nails. b Schematic representation of the 5′ untranslated region (UTR) and the coding sequence (CDS) of LAMB3. The CDS of the LAMB3 gene starts in exon 2 (start codon, ATG), while exon 1 and the region upstream of the start codon are non-coding (5′ UTR). Purple boxes: wild-type exon 1 (ex 1) and exon 2 (ex 2). The point variant (indicated by arrow) is located in the 5′ UTR of LAMB3. Sanger sequencing revealed that Pat.13 carried the LAMB3 c.-38 + 2 T > C variant (arrow). c Reverse transcriptase polymerase chain reaction (RT-PCR) revealed one product each in HEK293 cells transfected with plasmid constructs harboring the wildtype allele (lane 2) and the c.-38 + 2 T > C allele (lane 3). Sanger sequencing of RT-PCR products revealed that c.-38 + 2 T > C is a splicing variant that causes in-frame retention of 120 bp of intronic sequence. d RT-PCR of lymphocytes derived from Pat.13. No band was detected in Pat.13. Ctrl 1–3, cDNA from the lymphocytes of three unrelated unaffected individuals.

Fig. 2. a Pat.57 with COL17A1 variants presented with mild alopecia, nail dystrophy, disseminated tense blisters and erosions on the erythema and skin healing with dyspigmentation. b Family pedigree of Pat.57. c Sanger sequencing showed that Pat.57 carried the COL17A1 c.4156+117 G > A variant (arrow) inherited from her father. d In addition to the normal-sized fragment seen with the wildtype allele (lane 2), two smaller-than-normal fragments (M1 and M2) were observed with the c.4156+117 G > A allele (lane 3). Sanger sequencing of subcloned RT-PCR products revealed that the 849 bp band (M1) resulted from skipping of 103 bp at the 5′ end of exon 53. The relatively weak 562 bp band (M2) lacked exon 52 entirely. e Immunohistochemistry staining of the skin demonstrated decreased collagen XVII (COL17A1) in Pat.57, in contrast to the linear pattern at the dermal–epidermal junction in the control skin (upper panel). Transmission electron microscopy (TEM) examination revealed that the hemidesmosomes (red arrow) were poorly developed (lower panel). Note the junctional level of skin cleavage (green arrow) in Pat.57.

Fig. 3. a Pat.65 with intermediate JEB presented with nearly complete alopecia, enamel dental problems, oral erosions, nail dystrophy, disseminated tense blisters on the erythema and skin healing with dyspigmentation. b Family pedigree and Sanger sequencing showed that Pat.65 carried the COL17A1 c.4156+117 G > A variant inherited from her father. c Pat.52 with localized JEB presented with scattered blisters, skin healing with dyspigmentation and mild nail dystrophy. d Family pedigree and Sanger sequencing showed that Pat.52 carried the COL17A1 c.4156+117 G > A variant inherited from her father.
We then applied GS to Pat.13, Pat.52, Pat.53, Pat.57, Pat.61, and Pat.65 to establish a genetic diagnosis.
Novel c.-38 + 2 T > C variant in the LAMB3 gene
Screening of the LAMB3 gene in Pat.13 disclosed a novel c.-38 + 2 T > C variant within the 5′ untranslated region (UTR), which was approximately 1400 bp away from the coding region (Fig. 1b). This variant was absent from the population database gnomAD and predicted to cause an aberrant transcript.
Minigene assays demonstrated that the c.-38 + 2 T > C allele yielded a larger band than the wildtype in both HEK293 cells and HaCaT cells (Fig. 1c, Supplementary Fig. 1). Sanger sequencing of RT-PCR products revealed that the c.-38 + 2 T > C variant led to in-frame retention of 120 bp of intronic sequence: r.-38_-37ins[gc:-38 + 3_-38 + 120] (Fig. 1c). To verify its effect, cDNA covering exons 1–6 was amplified from Pat.13’s blood and analyzed, showing that the c.-38 + 2 T > C and c.28+1 G > A variants together resulted in the loss of LAMB3 expression (Fig. 1d).
The deep intronic variants identified in COL17A1 gene
We detected three novel deep intronic variants in the COL17A1 gene through GS: the c.4156+117 G > A variant in Pat.52, Pat.57 and Pat.65, the c.2039-104 G > A variant in Pat.53 and the c.1267+237dupC variant in Pat.61 (Table 2). All variants were either absent from the population database gnomAD or present at a very low minor allele frequency.
The COL17A1 c.4156+117 G > A deep intronic variant
Pat.57 carried a paternally inherited c.4156+117 G > A variant in the COL17A1 gene (Fig. 2b, c). The variant is located in intron 52. It produced two aberrant transcripts13: an 849 bp transcript (M1: r.4157_4259del) and a 562 bp transcript (M2: r.3767_4156del), in addition to the wildtype transcript (952 bp) (Fig. 2d). Sanger sequencing of subcloned RT-PCR products confirmed that the 849 bp band resulted from skipping of 103 bp at the 5′ end of exon 53, leading to a frameshift; this was the predominant product in the mutant group. The relatively weak 562 bp band lacked exon 52 entirely, resulting in an in-frame transcript (Fig. 2d).
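The product sizes and reading-frame consequences reported above follow directly from the RNA-level descriptions. The sketch below recomputes them under stated assumptions: it parses only simple `r.START_ENDdel` descriptions with positive coordinates, and the function names are illustrative rather than part of any HGVS library.

```python
import re

WILDTYPE_PRODUCT_BP = 952  # wildtype RT-PCR product size reported in the text


def deletion_length(r_notation: str) -> int:
    """Length in nucleotides of a simple 'r.START_ENDdel' deletion."""
    m = re.fullmatch(r"r\.(\d+)_(\d+)del", r_notation)
    if m is None:
        raise ValueError(f"unsupported description: {r_notation}")
    start, end = int(m.group(1)), int(m.group(2))
    if end < start:
        raise ValueError("end position precedes start position")
    return end - start + 1


def describe_deletion(r_notation: str, wildtype_bp: int = WILDTYPE_PRODUCT_BP) -> dict:
    """Deleted length, expected product size, and whether the frame is kept."""
    n = deletion_length(r_notation)
    return {
        "deleted_bp": n,
        "product_bp": wildtype_bp - n,
        "in_frame": n % 3 == 0,  # a multiple of 3 preserves the reading frame
    }


for name, desc in [("M1", "r.4157_4259del"), ("M2", "r.3767_4156del")]:
    print(name, describe_deletion(desc))
```

M1 removes 103 nucleotides (not a multiple of three, hence a frameshift) and M2 removes 390 nucleotides (a multiple of three), which matches the 849 bp and 562 bp bands.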
Loss of collagen XVII impairs hemidesmosomes structure leading to mucocutaneous fragility
Decreased levels of collagen XVII were observed in skin biopsies from Pat.57 compared to control levels (Fig. 2e upper panel). Transmission electron microscopy (TEM) examination revealed poorly developed hemidesmosomes in Pat.57 (Fig. 2e lower panel). These findings support the pathogenicity of the c.4156+117 G > A variant in Pat.57.
The COL17A1 c.4156+117 G > A common deep intronic variant
Pat.52 and Pat.65 also carried the c.4156+117 G > A variant (Fig. 3a–d). This variant was observed in 3 of the 5 patients with COL17A1 variants who underwent GS in this study, suggesting that it may be a common deep intronic variant in the COL17A1 gene.
The COL17A1 c.2039-104 G > A and c.1267+237dupC deep intronic variants
Pat.53 carried a COL17A1 c.2039-104 G > A variant (Fig. 4a, Supplementary Fig. 2a), located in intron 25. This variant resulted in a larger aberrant transcript corresponding to an in-frame pseudo-exon insertion of 171 bp (r.2038_2039ins[2039-277_2039-107]), along with a smaller product corresponding to the correctly processed transcript (Fig. 4b, c).
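The pseudo-exon length can be checked from the intronic coordinates in the RNA-level description. The sketch below is a simplified parser written for illustration: it handles only positions of the form anchor plus or minus offset (such as `2039-277`), requires both ends to share the same anchor nucleotide, and uses hypothetical function names.

```python
import re
from typing import Tuple

# An anchor nucleotide (possibly negative, for the 5' UTR) followed by a
# signed intronic offset, e.g. "2039-277" or "-38+3".
_INTRONIC = re.compile(r"(-?\d+)([+-]\d+)")


def parse_intronic(pos: str) -> Tuple[int, int]:
    """Split an intronic position into (anchor, signed offset)."""
    m = _INTRONIC.fullmatch(pos.replace(" ", ""))
    if m is None:
        raise ValueError(f"not an intronic position: {pos}")
    return int(m.group(1)), int(m.group(2))


def intronic_span_length(start: str, end: str) -> int:
    """Number of nucleotides from start to end, both inclusive."""
    anchor_s, off_s = parse_intronic(start)
    anchor_e, off_e = parse_intronic(end)
    if anchor_s != anchor_e:
        raise ValueError("both ends must be numbered from the same anchor")
    return abs(off_e - off_s) + 1


pseudo_exon_bp = intronic_span_length("2039-277", "2039-107")
print(pseudo_exon_bp, "bp; in frame:", pseudo_exon_bp % 3 == 0)
```

The same helper applied to the LAMB3 description gives 118 nucleotides for `-38+3` to `-38+120`, which together with the two leading nucleotides `gc` accounts for the 120 bp insertion reported for c.-38 + 2 T > C.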

Fig. 4. a Family pedigree and Sanger sequencing showed that Pat.53 carried the COL17A1 deep intronic variant c.2039-104 G > A (arrow), which was transmitted to her son. b, c Gel loading is as follows: RT-PCR products derived from HaCaT cells transfected with plasmid constructs harboring the wildtype allele (lane 2) or the c.2039-104 G > A mutant allele (lane 3). RT-PCR revealed one product in HaCaT cells transfected with the wildtype construct and two products in cells transfected with the c.2039-104 G > A allele. The larger product comprised a pseudo-exon of 171 bp spliced between exons 25 and 26, while the minor product represented correct splicing of exon 25 to exon 26. d Family pedigree and Sanger sequencing showed that Pat.61 carried the COL17A1 c.1267+237dupC variant. e, f The c.1267+237dupC variant induced skipping of exons 16 and 17, yielding a product 243 bp smaller than that of the wildtype allele in HaCaT cells. A weak product corresponding to the correctly processed transcript was also observed for the mutant allele. A weak product corresponding to a 519 bp transcript was observed for the wildtype allele.
The COL17A1 c.1267+237dupC variant identified in Pat.61 resulted in skipping of exons 16 and 17 (r.1223_1465del), yielding a product 243 bp smaller than that of the wildtype allele in HaCaT cells (Fig. 4d–f).
Pat.13 presented with intermediate JEB due to compound heterozygous variants in LAMB3. Pat.52, Pat.53, Pat.57, and Pat.61 presented with localized or intermediate JEB caused by biallelic COL17A1 PTC variants.
Notably, HEK293 cells did not show a splicing effect for any of the COL17A1 deep intronic variants (Supplementary Fig. 2b–d).
Discussion
Cases caused by variants in LAMA3, LAMB3, or LAMC2 are associated with intermediate or severe forms of JEB, often resulting in mortality within the first two years of life14. JEB cases with COL17A1 variants are primarily caused by biallelic PTC variants, which diminish the adhesive capacity of the hemidesmosomes, leading to widespread skin blistering, extensive wounds, enamel hypoplasia, nail dystrophy and irreversible alopecia2,4. In contrast to previous studies4,5,6, 31.6% of the JEB cases caused by COL17A1 variants could not be genetically diagnosed by ES in our cohort12. This suggests that the spectrum of causative variants differs substantially among populations and that further research is needed to characterize these differences. Deep intronic variants that lead to splicing alterations are likely underestimated, as they may not be detected by routine ES approaches or even by RNA analyses. GS is more powerful than ES for detecting pathogenic variants as it covers more exonic and intronic regions of the genome9,15.
HEK293 cells transfected with the wildtype and mutant alleles of the c.4156+117 G > A, c.2039-104 G > A and c.1267+237dupC variants did not exhibit any splicing defects; only the wildtype band was observed in the PCR product gel (Supplementary Fig. 2b–d). This does not mean that these changes are benign, but rather that the specific minigene constructs and HEK293 cells used cannot adequately demonstrate the effects on mRNA processing. Splicing alterations caused by intronic variants can be tissue specific16. HEK293 cells are human embryonic kidney epithelial cells, while HaCaT cells are immortalized human keratinocytes. We therefore re-validated these COL17A1 plasmids in HaCaT cells and observed splicing defects associated with the c.4156+117 G > A, c.2039-104 G > A and c.1267+237dupC variants. This suggests that HEK293 cells may lack the skin-specific splicing proteins (i.e., trans-acting splicing factors) required for recognizing the new splice sites created by these deep intronic COL17A1 variants.
In contrast to the c.-38 + 2 T > C variant in LAMB3, the splicing defects associated with the three COL17A1 variants are not fully penetrant, at least in HaCaT cells. Upon transfection with the mutant alleles, these cells also produced correctly spliced transcripts, but the variants nonetheless led to mRNA degradation and decreased COL17A1 protein levels17,18. The same allele was found in three of the five cases with COL17A1 variants, suggesting that it may represent a common deep intronic variant for JEB patients in China, and potentially for those in East Asia.
Novel approaches, such as GS, have facilitated the scanning of entire genes and uncovered deep intronic splice variants7. To carry out molecular diagnoses in EB patients who remain genetically unsolved after routine analysis, we recommend targeted deep intronic sequencing of specific genes, with particular attention to the c.4156+117 G > A variant in COL17A1. Additionally, employing selected minigene constructs, using skin cell lines and modifying standard protocols for mRNA analysis can enhance the diagnostic process19.
In our cohort, ES failed to identify the second variant in the targeted recessive gene in nine out of 69 recessive JEB cases. Six of these cases then underwent GS, and the second variant was identified in all six. GS can serve as a complementary approach to identify variants that are not detectable by ES in cases with a clear clinical diagnosis but no genetic confirmation via ES, especially when one pathogenic variant, likely pathogenic variant, or variant of uncertain significance in the target gene has been detected by ES.