EV DNA from pancreatic cancer patient-derived cells harbors molecular, coding, non-coding signatures and mutational hotspots
Introduction
Pancreatic cancer is a deadly disease whose etiology remains unknown. The mechanisms of pancreatic cancer progression, including metastasis, are not fully understood. Emerging evidence points to exosome cargo as a driver of pancreatic cancer metastasis and as a prognostic indicator for metastasis1,2. Exosomes are a class of extracellular vesicles (EVs) that were initially dismissed as merely a means of cellular waste disposal3. That idea was quickly set aside by studies showing that EVs are reservoirs of biomolecules with pleiotropic effects on cellular functions in both normal physiology and disease states. EVs such as exosomes have even been proposed to harbor potential diagnostic markers for cancer4, but reaching such a diagnostic milestone requires a clear understanding of EV molecular content. One component of EV cargo is double-stranded DNA (dsDNA), an intriguing molecule well suited to non-invasive disease diagnosis. DNA is a relatively stable biomolecule, and it has been known for some time that most cell-free DNA is enclosed in exosomes5. The stability of DNA is desirable not only for preserving and accurately transmitting genetic information, but also for a potential diagnostic marker of disease. EVs from tumor cells carry genomic dsDNA, with segments spanning the entire genome6. Moreover, DNA molecules have been detected in exosomes from a spectrum of cell/system types6,7. These and several similar findings indicate the potential for leveraging the DNA composition of exosome-enriched EVs to differentiate, at a basic level, cancer from non-cancer cells.
Previous studies on EV DNA were not all carried out in the same cell/system types. The approaches also differed, for example, targeted mutation detection versus whole genome sequencing, or exosomes versus other classes of EV. Some of the data also appear to conflict. For example, while it has been known for some time that exosomes carry DNA5,6,8,9, a recent study suggests that exosomes might not contain DNA10. However, that study did not provide evidence for where the DNA resides in EVs if not in exosomes. Moreover, a recent review article11 pointed to some of the limitations of that study. The variation in the published data could also reflect the complexity of different cell/system models, whereby differences between cell types might result in significant differences in the pool of molecules packaged into the respective exosomes. For cells that do package DNA in exosomes, the genomic origin of these DNA molecules has not been well established. Likewise, cell-specific exosomal DNA signatures have yet to be characterized, although they are critical for distinguishing cell types. Here, we profile EV DNA using high-depth next-generation sequencing (NGS) of dsDNA from pancreatic cancer patient-derived cells. Our analyses suggest that DNA packaged in EVs is not random.
Results
Characterization of the vesicles by nanoparticle tracking analysis, transmission electron microscopy, and antibody array
We isolated EVs from pancreatic cancer patient-derived cells and non-cancer counterparts, all grown in cell culture media containing exosome-depleted FBS. We first assessed the size distribution of the EVs using NanoSight (nanoparticle tracking analysis, NTA) (Fig. 1A and Supplementary Fig. 1). Next, we used transmission electron microscopy (TEM) to visualize the morphology of the vesicles, which revealed a cup-shaped morphology (Fig. 1B). Lastly, we characterized the vesicles at the molecular level with an antibody array that simultaneously evaluates the expression of 8 EV protein markers and a marker of cellular contamination. These protein markers are cluster of differentiation 63 (CD63)5,12, cluster of differentiation 81 (CD81)12, tumor susceptibility gene 101 (TSG101)12,13, ALG-2-interacting protein X (ALIX)12, intercellular adhesion molecule 1 (ICAM1)12, annexin A5 (ANXA5)12, epithelial cell adhesion molecule (EpCAM)12, and flotillin 1 (FLOT1)12. The antibody array included two positive controls for HRP detection and a negative control for cellular contamination of the EVs. Importantly, the array confirmed the expression of the EV protein markers (CD63, CD81, ALIX, FLOT1, ICAM1, EpCAM, ANXA5, and TSG101) to varying degrees (Fig. 1C).

A EVs from H6C7 (i), Panc1 (ii), and HPAF-II (iii) cells isolated by the indicated methods were analyzed for size distribution by NanoSight (NTA) or B visualized by transmission electron microscopy (TEM); scale bar is 500 nm. The bottom panels show a zoom-in on the boxed area in the image. C An exosome antibody array was carried out on the EVs from the cells above, isolated by the indicated methods, showing expression of various EV protein markers.
Distinct DNA fragments packaged in EVs released from pancreatic cancer and non-cancer cells
Next, we extracted double-stranded (ds) DNA molecules from the vesicles and first validated the molecules using the Qubit Fluorometer, which employs a dsDNA-sensitive dye for dsDNA detection and measurement (Fig. 2A; Supplementary Fig. 2A). DNA fragment size was assessed using the Agilent 4200 TapeStation. A prominent DNA fragment distribution in the pancreatic ductal epithelial non-cancer cells (H6C7) appeared at about 15,000 base-pairs (bp) in length, suggesting that most of the EV DNA in those cells is high molecular weight (HMW) (Fig. 2B; Supplementary Fig. 2B). Importantly, such patterns of DNA fragments were mostly absent in DNA samples from pancreatic cancer cell-derived vesicles (Fig. 2B; Supplementary Fig. 2B). Instead, these cells produced consistent DNA banding patterns at 200, 350, and 600 bp with varying intensity, and low levels of HMW DNA compared to the non-cancer cell counterpart (Fig. 2B; Supplementary Fig. 2B). The electropherograms (Fig. 2C; Supplementary Fig. 2C) summarize these observations and show two salient patterns: HMW DNA fragments mostly from the non-cancer cells (H6C7) and low molecular weight (LMW) DNA fragments from the cancer cells. To better visualize the LMW DNA fragments, we used a different bioanalytical protocol that focuses on smaller fragment sizes and provides better band resolution. This approach showed clearer bands, of varying intensity, representing smaller DNA fragments from the cancer cell-derived vesicles but not from the non-cancer cell-derived vesicles (Fig. 2D; Supplementary Fig. 2D). This DNA banding pattern was prominent in the EVs but absent in DNA isolated directly from the parental cells secreting the EVs (Supplementary Fig. 2E, F). Lastly, we re-evaluated the results above after first treating the vesicular preparation with DNAse I before proceeding to DNA extraction. As before, dsDNA molecules were detected by the Qubit Fluorometer (Fig. 2E).
Control naked DNA molecules treated with the DNAse were degraded (Fig. 2E, F), suggesting that the other DNA molecules being analyzed (Fig. 2E) are protected from DNAse I treatment. Importantly, the fragment analysis showed a pattern (Fig. 2G, H) similar to that described previously (Fig. 2B, D). To further assess that result, we treated the EVs with a membrane-solubilizing detergent to lyse them (Supplementary Fig. 2G). DNAse treatment of the pre-lysed EV preparations before DNA extraction led to a loss of DNA recovery (Supplementary Fig. 2H).

A Following DNA extraction from the EVs prepared by the kit-based method, a dsDNA-specific detection method was used to validate the presence of DNA molecules in the samples; representative of 3 experiments. Equal amounts of dsDNA (2 ng) were loaded on the Agilent 4200 TapeStation system using the genomic (B) or the D5000 (D) screen tape protocol; experiments were done at least three times. C Electropherograms showing distinct peaks for the control and cancer cells; 3 representatives from the 10 cell lines. E EV preparations were treated with DNAse I before DNA was extracted. A dsDNA-specific detection method was used to confirm the presence of DNA molecules as in (A). F–H The DNA fragments in samples from (E) were analyzed as described in (B, D). The activity of the DNAse I was confirmed by analyzing EV DNA from H6C7 (#1) or MiaPaCa2 (#2) digested with the DNAse I.
Whole genome sequencing analysis demonstrated a large proportion of EV DNA molecules from non-coding regions of the genome
To further investigate the DNA banding features, the EV DNA was fractionated using magnetic bead size selection into HMW and LMW DNA sub-populations (Fig. 3A). The HMW DNA was sheared by sonication to ~500 bp insert sizes, while shearing was not necessary for the LMW DNA. Each cell line had two libraries, one HMW and one LMW, each receiving its own barcode for sequencing. Samples were sequenced on the Illumina NextSeq (150 bp paired-end sequencing) to an average depth of 48,152,253 reads per sample, with 98.1% of reads mapping to the GRCh38 human genome build (Supplementary Table 1); only ~0.01% of the reads mapped to the mitochondrial genome (Supplementary Table 2). The consensus from the Human Genome Project is that only about 1–2% of the human genome consists of coding genes, and the remaining ~98–99% is non-coding. We reasoned that if the packaging of DNA in EVs is random and uniformly distributed across the genome, as previous studies report6,14, then a similar ratio of coding to non-coding read mapping should be observed in EV DNA. Thus, the proportion of reads mapping to coding and non-coding regions was determined from the DNA fragments. Irrespective of the molecular weight fraction of the DNA sample (HMW or LMW), roughly 5% of reads mapped to coding regions, whereas about 95% of the DNA fragments fell into non-coding regions of the genome (Fig. 3B–D). This suggests that the DNA molecules derive mostly from non-coding regions of the genome, with a small fraction containing coding regions. However, the frequency of reads mapping to coding regions (~5%) is elevated compared to the 1–2% that would be expected if the DNA were packaged uniformly across the entire genome.

A Schematic diagram of the size selection. Following the sequencing of the DNA from the EVs (isolated by the kit-based method), the reads were re-aligned against the human genome, the percentage of reads from the high (B) or low (C) molecular weight fraction mapping to the coding or non-coding regions was determined bioinformatically, and graphed. D Average of the reads mapping to coding vs. non-coding regions from all ten cell lines.
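The comparison between the observed coding fraction (~5%) and the fraction expected under uniform packaging (1–2%) can be made explicit as a fold enrichment. The following is a minimal sketch of that arithmetic only; the function name `fold_enrichment` is hypothetical and is not part of the study's pipeline, and the input values are the rounded percentages quoted in the text.

```python
def fold_enrichment(observed_fraction, expected_fraction):
    """Ratio of the observed to the expected fraction of reads in a region class."""
    if not 0 < expected_fraction <= 1:
        raise ValueError("expected_fraction must be in (0, 1]")
    if not 0 <= observed_fraction <= 1:
        raise ValueError("observed_fraction must be in [0, 1]")
    return observed_fraction / expected_fraction


# Rounded values quoted in the text: ~5% of EV DNA reads map to coding regions,
# against an expected 1-2% if packaging were uniform across the genome.
observed_coding = 0.05
for expected_coding in (0.01, 0.02):
    enrichment = fold_enrichment(observed_coding, expected_coding)
    print(f"expected {expected_coding:.0%} coding: {enrichment:.1f}-fold enrichment")
```

Under these rounded inputs the coding fraction is enriched between 2.5-fold and 5-fold, depending on which end of the 1–2% range is taken as the expectation.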
The low molecular weight DNA fragments distinguish cancer cell from non-cancer cell counterpart
To assess how similar the read profiles of the fractions were to one another, Pearson correlation analysis of the mapped read features was performed for all pairwise comparisons among the LMW DNA fractions, and separately among the HMW DNA fractions, from all cell lines. When the LMW fragments from the non-cancer cells (H6C7) are compared to the LMW fragments from the cancer cells, the correlation coefficient is farther from 1 (Fig. 4A) than when the HMW fragments from the non-cancer cells are compared to the HMW fragments from the cancer cells, in which case the coefficient is closer to 1 (Fig. 4B). This indicates that the LMW DNA fragments in the cancer cells’ EV DNA samples are distinct from those of the non-cancer counterpart, whereas the HMW DNA fragments are more similar between the non-cancer and cancer cells. To further assess that inference, we evaluated the principal components of the mapped read profiles for each DNA fraction (LMW and HMW). Again, most of the cancer cells’ LMW EV DNA samples cluster together and away from the non-cancer cells’ samples (Fig. 4C), whereas that clustering pattern is much less evident for the HMW fragments (Fig. 4D). The analysis indicates that the LMW DNA fragments might carry more information for distinguishing the cancer cells from their non-cancer counterpart.

A, B Pearson correlation coefficient showing that the LMW fragments (A) are more distinctive between the cancer cells and the non-cancer cells. C, D Principal components analysis showing that the LMW fragments (C) from the cancer cells cluster away from the non-cancer cells.
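The two comparisons described in this section, pairwise Pearson correlation and principal components analysis of read profiles, can be illustrated with a small sketch. It assumes each sample is summarized as a vector of read counts per genomic bin; the toy matrix, the sample labels, and the helper names `pearson_matrix` and `pca_scores` are all hypothetical, and the software actually used in the study is not specified here.

```python
import numpy as np


def pearson_matrix(profiles):
    """Pairwise Pearson correlation between rows (one row per sample)."""
    return np.corrcoef(profiles)


def pca_scores(profiles, n_components=2):
    """Project samples onto the leading principal components using an SVD."""
    centered = profiles - profiles.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy read counts per genomic bin (illustrative values, not study data).
# Row 0 mimics a control with even coverage; rows 1-3 mimic cancer samples
# with coverage concentrated in two neighboring bins.
labels = ["control", "cancer_A", "cancer_B", "cancer_C"]
profiles = np.array(
    [
        [10, 11, 9, 10, 11, 9],
        [1, 2, 40, 38, 2, 1],
        [2, 1, 42, 35, 1, 2],
        [1, 1, 39, 41, 2, 1],
    ],
    dtype=float,
)

corr = pearson_matrix(profiles)
scores = pca_scores(profiles)

for label, row, score in zip(labels, corr, scores):
    print(label, np.round(row, 2), "PC1 =", round(float(score[0]), 1))
```

In this toy example the cancer-like rows correlate strongly with one another and weakly with the control row, and the control sits on the opposite side of the first principal component, which is the qualitative pattern reported for the LMW fractions.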
Unique coding DNA signatures for the LMW fragments
DNA mapping signatures among the LMW DNA fragments could further set the cancer cells apart from their non-cancer cell counterpart. To investigate this, the LMW reads that mapped to coding regions were extracted, the read counts per gene were normalized to approximate abundance, and differential abundance analysis was performed to determine which genes had differential read mapping. In total, 44 coding regions had significantly higher read abundance in the cancer cells than in the non-cancer cells (Table 1 and Fig. 5).

Volcano plot showing the differential abundance of the DNA fragments mapping to their corresponding coding genes.
The top 10 genes with elevated read mapping were Family with sequence similarity 135 member B (FAM135B), Collagen type XXII alpha 1 chain (COL22A1), t-SNARE domain containing 1 (TSNARE1), Potassium two pore domain channel subfamily K member 9 (KCNK9), Zinc finger and AT-hook domain containing (ZFAT), Jrk helix-turn-helix protein (JRK), Maestro heat like repeat family member 5 (gene/pseudogene) (MROH5), Gasdermin D (GSDMD), Trafficking protein particle complex subunit 9 (TRAPPC9), and MIR3667 host gene (MIR3667HG) (Table 1 and Fig. 5). The analysis identified the ATPase phospholipid transporting 11C (ATP11C) gene as a top gene with elevated read mapping in the non-cancer cells (Table 1 and Fig. 5). Further studies are needed to evaluate any correlation of those EV DNA fragments with patient outcomes.
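The normalization and comparison steps behind a differential abundance analysis can be sketched as follows. This is an illustration under stated assumptions, not the study's method: it uses counts-per-million scaling and a log2 fold change with a pseudocount, the gene names and counts are placeholders, and a real analysis would add replicate-aware statistics and multiple-testing correction to obtain the significance values plotted in a volcano plot.

```python
import math


def counts_per_million(counts):
    """Scale raw per-gene read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


def log2_fold_change(cpm_case, cpm_control, pseudocount=1.0):
    """log2 ratio of case to control abundance; the pseudocount avoids log(0)."""
    return {
        gene: math.log2(
            (cpm_case[gene] + pseudocount) / (cpm_control.get(gene, 0.0) + pseudocount)
        )
        for gene in cpm_case
    }


# Placeholder gene names and counts (hypothetical, not taken from Table 1).
cancer_counts = {"GENE_A": 300, "GENE_B": 100, "GENE_C": 100}
control_counts = {"GENE_A": 100, "GENE_B": 100, "GENE_C": 300}

lfc = log2_fold_change(
    counts_per_million(cancer_counts), counts_per_million(control_counts)
)
for gene, value in sorted(lfc.items()):
    print(f"{gene}: log2FC = {value:+.2f}")
```

A positive value marks a gene with more reads in the cancer-like sample and a negative value marks a gene with more reads in the control-like sample, which corresponds to the two sides of the volcano plot.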
The low molecular weight DNA fragments demonstrate dense centromeric mapping for cancer cell
Even though the majority of the human genome is non-coding, we now know that those regions are full of elements that play a crucial role in regulating gene expression15. In genome-wide association studies, genetic markers associated with disease risk or phenotype often fall within uncharacterized regions of the genome16. The non-coding portions of the genome are important and merit further investigation but remain difficult to interpret. To this end, we focused on regions of the genome with elevated EV DNA read coverage (>50×) that might distinguish the cancer cells from the non-cancer cells. We observed above that the HMW DNA from the cancer cells and the non-cancer counterpart was similar (Fig. 4B), with reads spanning the entire genome at very low coverage. However, centromeric regions (Fig. 6; Supplementary Figs. 4–25) and mitochondrial DNA (Supplementary Fig. 3) exhibited high read coverage, exceeding 50×. In the LMW fragments, the read distribution across the entire genome was higher in the pancreatic non-cancer cells, while the cancer cells’ LMW DNA showed sparse whole genome coverage with peaks of high coverage (>50×) in distinct areas near the centromeres (Fig. 6), in mitochondrial DNA (Supplementary Fig. 3), and in certain chromosome-specific regions (Supplementary Figs. 4–25).

A A mapping of the reads showing a dense centromeric mapping of all cancer cell lines but a sparse mapping for the control non-cancer cells. B A zoom-in of the centromeric region demonstrating clear distinct mapping signatures for the cancer vs. control cells.
Additionally, our analysis reveals that the LMW fragments of the non-cancer cells’ EV DNA mapped uniformly throughout the genome whereas, remarkably, the LMW fragments of the cancer cells mapped densely to very similar regions, especially near the centromeres of most chromosomes (Fig. 6A; Supplementary Figs. 5–25). The HMW fractions (Supplementary Fig. 4) all exhibited similar mapping across the genome, again with distinct peaks of high read coverage, but here the non-cancer cells’ HMW fraction was more similar to the cancer HMW fractions (Supplementary Fig. 4) than was the case for the LMW profiles (Fig. 6A). The observations above were consistent across most of the chromosomes (Supplementary Figs. 5–25) but were more striking on chromosomes 7 (Fig. 6A), 3, 9, 10, 11, 13, 17, and 20 (Supplementary Figs. 7, 12, 13, 14, 16, 20, and 23, respectively). Chromosome 17 shows an even larger accumulation of the cancer cells’ LMW reads at the centromere (Supplementary Fig. 20), whereas chromosomes 9 and 11 show centromeric mapping for the cancer cells’ LMW fraction only (Supplementary Figs. 12 and 14). A zoom-in on the centromeric region revealed clear mapping signatures that distinguish all the cancer cells from their non-cancer counterpart (Fig. 6B).
Mutational analysis reveals cancer cell-specific mutations in low-molecular-weight DNA fragments
The role of mutations in cancer remains an active topic of investigation. Our observations are that (i) a large proportion (95%) of cell-derived EV DNA molecules are non-coding (Fig. 3) and that (ii) the LMW DNA fragments distinguish the cancer cells from their non-cancer counterpart (Fig. 4A, C) and show a distinct centromeric mapping signature (Fig. 6; Supplementary Figs. 7, 12, 13, 14, 16, 20, and 23). Accordingly, we tested the hypothesis that the LMW DNA fragments from cancer cells’ EVs might harbor unique, distinguishing mutational marks. Our mutational analysis demonstrates that a large proportion (>94%) of the mutations harbored in the DNA are in intergenic regions (Fig. 7A, B), a trend that aligns with our observation that a similar proportion of EV DNA fragments (~95%) originates from non-coding regions of the genome (Fig. 3B–D). Further analyses showed that A > G/T > C and C > T/G > A are the most frequent mutation types (Fig. 7C). Likewise, insertions or deletions of 1 bp and 2–4 bp accounted for the highest proportions of indels (Fig. 7D). Lastly, we identified close to 200 mutations unique to the LMW DNA samples from the cancer cells. These mutations, again, were largely intergenic, with precise genomic locations (Table 2).

Number (A) or percentage (B) of variants that fall into coding (intragenic) or non-coding (intergenic) regions of the genome. Types of mutations, single nucleotide variant and frequency (C) or insertion, deletion, and proportion (D).
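The paired labels used above, such as A > G/T > C, reflect that a substitution and its reverse-complement substitution describe the same event seen from opposite strands. A minimal sketch of how variants might be tallied into such strand-symmetric classes and into indel size bins follows; the helper names, the toy variant list, and the "5+ bp" bin for longer indels are assumptions made for illustration, not details of the study's variant-calling pipeline.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def snv_class(ref, alt):
    """Return a strand-symmetric label such as 'A>G/T>C' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    forward = f"{ref}>{alt}"
    reverse = f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    return "/".join(sorted((forward, reverse)))


def indel_bin(ref, alt):
    """Bin an insertion or deletion by the number of bases gained or lost."""
    size = abs(len(ref) - len(alt))
    if size == 0:
        raise ValueError("not an insertion or deletion")
    if size == 1:
        return "1 bp"
    if size <= 4:
        return "2-4 bp"
    return "5+ bp"  # hypothetical catch-all bin for longer events


# Toy (ref, alt) pairs, purely illustrative.
variants = [("A", "G"), ("T", "C"), ("C", "T"), ("G", "A"), ("A", "T"),
            ("AT", "A"), ("G", "GCCA"), ("ACGTACG", "A")]

snv_counts = Counter(snv_class(r, a) for r, a in variants if len(r) == len(a) == 1)
indel_counts = Counter(indel_bin(r, a) for r, a in variants if len(r) != len(a))
print(snv_counts)
print(indel_counts)
```

Collapsing the twelve possible single-base changes in this way yields six classes, so counts from either strand contribute to the same category.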
Discussion
The molecular, coding, non-coding, and mutational profiles of EV DNA from different cell types have not previously been fully appreciated. In this study, we demonstrated that pancreatic cancer patient-derived cells can be differentiated from pancreatic non-cancer cells based on their EV DNA signatures. Agilent 4200 TapeStation-based DNA electrophoretic analysis reveals that pancreatic cancer patient-derived cells and their pancreatic non-cancer cell counterpart secrete EV DNA fragments with distinct signatures: HMW DNA fragments characterize the non-cancer cells, whereas the cancer cells feature mostly LMW DNA fragments (Fig. 2B–D). The fact that these patterns were absent in genomic DNA taken directly from the parental cells secreting the EVs (Supplementary Fig. 2E, F) indicates that the fragments observed might be selectively packaged into EVs by the respective cell lines. NGS analysis revealed that EV dsDNA samples from pancreatic cancer cell lines cluster together and are distinguishable from those of non-cancer cells based on their LMW read mapping profiles as well as their dsDNA fragment size distribution (Fig. 4A, C). We further uncovered specific DNA fragments unique to the cancer cells (Table 1; Fig. 5). The biology of these DNA molecules, in the context of pancreatic cancer, remains to be elucidated, especially given the emerging functions of DNA aptamers in cell biology, such as their growth factor-like activities17,18,19. However, an overview of the literature suggests potential biomarker-like applicability for these molecules. For instance, Family with sequence similarity 135 member B (FAM135B) (Table 1; Fig. 5) has been found to be mutated in, and to promote, esophageal squamous cell carcinoma20. Likewise, Collagen type XXII alpha 1 chain (COL22A1) is commonly mutated in lung adenocarcinoma and is associated with increased tumor mutations and poorer prognosis in that cancer21.
Potassium two pore domain channel subfamily K member 9 (KCNK9) is frequently amplified in triple-negative breast cancer and is associated with poor patient outcome22. Conversely, ATPase phospholipid transporting family proteins have tumor-suppressor-like biology23. In our study, a DNA fragment corresponding to ATP11C was differentially abundant in the non-cancer cells only (Table 1; Fig. 5). Further investigations are necessary to evaluate any correlation of these DNA fragments with pancreatic cancer outcome.
Our analysis further revealed that EV DNA originates largely from non-coding regions of the genome. A large proportion of the DNA fragments (~95%) across all cell lines used originate from non-coding regions of the genome (Fig. 3), broadly following the trend reported by the Human Genome Project. Those reports estimate that nearly all of the human genome (~99%) is non-coding, while only a very small fraction (~1%) consists of coding genes. Nonetheless, our result that 5% of EV DNA reads mapped to coding regions of the genome suggests that the EV DNA distribution (coding vs. non-coding) follows a different pattern from the expected 1% coding vs. 99% non-coding distribution reported by the Human Genome Project. This indicates that the coding DNA packaged in EVs might be enriched from 1% to 5%. Despite the similar distribution of the EV DNA (coding vs. non-coding) across all cell lines assayed (Fig. 3), we observed a distinctive mapping pattern for the LMW DNA fragments: those from cancer cells mapped densely to the centromeric regions of the genome, while non-cancer cell-derived EV DNA mapped sparsely throughout the genome (Fig. 6; Supplementary Figs. 5–25).
Some chromosomes showed the trend above more clearly than others, such as chromosomes 7 (Fig. 6A), 3, 9, 10, 11, 13, 17, and 20 (Supplementary Figs. 7, 12, 13, 14, 16, 20, and 23, respectively). These chromosomes revealed distinct mapping signatures, setting the cancer cells in this study apart from their control non-cancer cells, or vice versa. Why the cancer cell-derived EV DNA shows this centromere-dense mapping pattern is an interesting question. We speculate that the chromosome-dependent centromeric mapping signature of EV DNA from cancer cells represents unique genomic events at those chromosomes. The fact that these DNA signatures are mostly from non-coding regions, not coding regions, further hints at higher genomic activity at these non-coding regions. The 0.01% of the sequencing reads mapping to the mitochondrial genome (Supplementary Table 2) suggests a negligible amount of mitochondrial DNA (mtDNA) relative to the rest of the reads in the dataset; however, this was enough to achieve high coverage of the mitochondrial genome (Supplementary Fig. 3), potentially because the mitochondrial genome (~16,569 bp) is very small compared to the whole human genome (~3,200,000,000 bp), so far fewer reads are needed to cover each of its bases many times. The significance of this small fraction of mtDNA fragments remains to be elucidated as well. Previous reports indicate that an association might exist between mtDNA and chemoresistance24 or tumor growth25. This merits further investigation.
Ultimately, we demonstrated that EV dsDNA fragments and the mutations therein originate largely from non-coding centromeric regions of the genome. The 5% fraction of the DNA molecules mapping to coding regions hints at a potentially elevated packaging of coding DNA fragments in EV DNA relative to what would have been expected based on the 1% coding region of the genome. The LMW DNA fragments (i) hold clues as to the distinction between pancreatic cancer patient-derived cells and their non-cancer counterpart, (ii) harbor unique coding DNA fragments and mutation hotspots, and (iii) demonstrate dense centromeric mapping for the cancer cells. Overall, the study rationalizes the need for a new approach to clinically relevant DNA biomarker discoveries and applications in cancer.
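The point that a tiny share of reads can still give high mitochondrial coverage follows from the usual estimate of mean depth as reads multiplied by read length, divided by target size. The sketch below applies that estimate to the figures quoted in this paper (average reads per sample, 150 bp reads, ~0.01% mitochondrial reads, and the two genome sizes). It assumes every read contributes its full length, ignores the HMW/LMW split and unmapped reads, and treats 0.01% as exact, so the results are order-of-magnitude illustrations rather than measured coverage.

```python
def mean_depth(n_reads, read_length_bp, target_size_bp):
    """Estimated mean depth: total sequenced bases divided by target size."""
    if target_size_bp <= 0:
        raise ValueError("target_size_bp must be positive")
    return n_reads * read_length_bp / target_size_bp


# Figures quoted in the text; treating them as exact is a simplifying assumption.
TOTAL_READS = 48_152_253        # average reads per sample
READ_LENGTH_BP = 150            # read length
MITO_FRACTION = 0.0001          # ~0.01% of reads map to the mitochondrial genome
MITO_SIZE_BP = 16_569           # mitochondrial genome size
GENOME_SIZE_BP = 3_200_000_000  # whole human genome size

mito_depth = mean_depth(TOTAL_READS * MITO_FRACTION, READ_LENGTH_BP, MITO_SIZE_BP)
nuclear_depth = mean_depth(
    TOTAL_READS * (1 - MITO_FRACTION), READ_LENGTH_BP, GENOME_SIZE_BP
)

print(f"estimated mitochondrial depth: {mito_depth:.1f}x")
print(f"estimated genome-wide depth:   {nuclear_depth:.2f}x")
print(f"ratio: {mito_depth / nuclear_depth:.1f}")
```

With these inputs the estimated mitochondrial depth is on the order of 40× while the genome-wide average is only about 2×, which illustrates why the small mitochondrial genome can reach high coverage from a very small share of the reads.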
As noted in the introduction, conflicting data currently exist in the literature regarding DNA in EVs such as exosomes. On one hand, exosomes harbor DNA5,6,8,9. On the other hand, exosomes might not contain DNA10. As we highlighted in the introduction, differences in the cell/system types used might explain the seemingly contradictory findings of those studies. Our study aligns with the studies that found that exosomes contain DNA5,6,8,9. As in our study, pancreatic cancer cells were used6,9, unlike the other study, which focused on other cancer types10. In that study, DNA might be contained in a fraction of particles that the authors stated “in some ways … resemble … exomeres”10. However, a recent review article pointed to some of the limitations of that study, such as the limited number of cell lines used and the lack of detail about how many exosomes were used11. Importantly, the study that first discovered and characterized exomeres9 reported DNA not just in exomeres but also in exosomes9, indicating that DNA might be differentially distributed among EVs such as exosomes and exomeres. It is even likely that exosomes have more DNA than exomeres, depending on the cell line9. Of note, the methods used to purify the exomeres in those two studies also differ: one used a conventional method10 while the other9 employed an advanced extraction technique. Regardless, exomeres are said to be non-membranous particles9. If so, they would lack defined structural boundaries within which molecules such as DNA could be contained in the first place, let alone “exclusively”. Exosomes, on the other hand, are membranous particles, making them more likely to harbor molecules such as DNA.
On the question of DNA in exosomes, it is most likely that the DNA is mostly membrane-enclosed. If it were not, the remaining possibility would be that it is tethered or attached to the exosomes. However, the biophysical properties of exosomes make DNA attachment unlikely. Namely, exosomes are said to have a negatively charged surface9. DNA molecules are also negatively charged, making direct exosome-DNA association unlikely, since particles or molecules with like charges are less likely to interact. Still, we acknowledge that biology is not fully predictable. In a more complex environment, such as in vivo, the dynamics might differ: interactions among countless molecules and particles might alter charges in a way that enables association between molecules or particles that would otherwise not associate with each other. Yet, so far, even in that in vivo environment, Fernando et al. found that most serum/plasma-derived DNA is inside, not outside, exosomes5.
Like any study, ours has its limitations. First, while the use of nine pancreatic cancer-derived cell lines adds rigor to the study, there was only one control cell line, derived from the pancreatic duct of a non-cancer individual, which offers the best comparison to pancreatic ductal cancer cells. This limitation stems from the lack of a large pool of pancreatic non-cancer cell lines for use in pancreatic cancer research. Currently, only two such cell lines are widely used, the Human Pancreatic Ductal Epithelial (HPDE) and the Human Pancreatic Nestin-expressing (HPNE) cell lines. The H6C7 control cell line is derived from the HPDE cell line26. The high price of the HPNE cell line (over $6K/vial at ATCC and over $7K/vial at Fisher) prevented its use as a second control cell line. Second, the study lacked data from pancreatic cancer patients. We actively sought to produce such data but were limited by access to the needed patient plasma/serum samples.
Methods
Cell lines and culture
H6C7, a normal HPDE cell line27,28,29, was purchased from Kerafast (Shirley, MA, USA; cat#: ECA001-FP) and maintained in keratinocyte serum-free media [dermal cell basal medium (ATCC PCS-200-030) supplemented with keratinocyte growth kit (PCS-200-040)] from American Type Culture Collection (ATCC; Manassas, VA, USA), with streptomycin and gentamicin. Of note, the H6C7 cell line is derived from the normal primary HPDE cell line by immortalization with the human papillomavirus-16 E6E7 gene26; it is morphologically similar to the parental HPDE cell line and demonstrates a near-normal genotype and phenotype, per the vendor (Kerafast). Pancreatic cancer cells (Panc1, Panc10.05, CFPAC-1, Capan-2, MiaPaCa2, BxPC-3, HPAF-II, AsPC1, and SW1990) were purchased from ATCC (cat#: TCP-1026; CRL-1469) and grown in DMEM (cat#: ATCC 30-2002; ATCC) supplemented with 10% FBS (cat#: 16000044; Thermo Scientific), streptomycin, and gentamicin. All cell lines were used immediately after purchase.
EV collection
Cells, at about 60% confluency, were switched from their growth media (with 10% FBS) to media supplemented with 10% exosome-free FBS (Thermo Scientific; cat#: A2720803). The culture media was collected 48 h later for EV isolation, and the cells were harvested for genomic DNA extraction (below). The media was centrifuged at 2,000 × g for 30 min to remove cells and debris and then at 10,000 × g for 30 min to remove macromolecules/microparticles. EVs were collected from the resulting cell/debris-free supernatant by two methods:
i) Ultracentrifugation, as described30 and used in other studies1,31,32,33. Briefly, the cell/debris-free media above was centrifuged at 120,000 × g for 4 h. The EV pellet was washed in 1X PBS followed by re-centrifugation as above. The resulting pellet was resuspended in PBS and used immediately or stored at −80 °C.
ii) PEG (polyethylene glycol)-based precipitation using the Thermo Fisher reagent for collecting total exosomes from culture media (cat#: 4478359), as previously reported in other studies34,35,36,37. The cell/debris-free media above was mixed with the reagent at a 1:0.5 ratio and incubated overnight at 4 °C. The next day, the mixture was centrifuged at 10,000 × g for 1 h at 4 °C, the supernatant was removed, and the exosomal pellet was resuspended in 1X PBS.
Extraction of EV DNA, cellular total DNA, DNA detection and quantification by Qubit Fluorometer
EV DNA extraction was carried out using the Qiagen circulating nucleic acid kit (cat#: 55114) used in ref. 5. In brief, 1 ml of EV preparation was mixed with 100 µl of Proteinase K and 800 µl of lysis buffer (ACL), vortexed (30 s), and digested for 30 min at 60 °C, followed by incubation with binding buffer (ACB). The nucleic acids were loaded onto a column, washed, and eluted. Total cellular DNA (including genomic and mitochondrial) was extracted from the cell pellet using the Zymo Research genomic DNA isolation kit (cat#: D4068) following the vendor’s instructions. Briefly, cell pellets resuspended in 200 µl of PBS were digested with 200 µl of BioFluid & Cell Buffer plus 20 µl of Proteinase K at 55 °C for 10 min, followed by addition of genomic binding buffer, loading onto the column, washing, and elution.
DNA detection and quantification were performed by adding 2 µl of sample to 198 µl of dsDNA detection buffer containing 1 µl of reagent (dye) followed by reading in the Qubit Fluorometer.
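The 2 µl sample in 198 µl of detection buffer corresponds to a 100-fold dilution in the assay tube, and the resulting stock concentration determines the volume needed to load a fixed mass (for example, the 2 ng used for fragment analysis). The sketch below shows only this arithmetic; the function names and the example reading of 50 ng/ml are hypothetical, and in practice the instrument may report the stock concentration directly.

```python
def stock_concentration_ng_per_ul(tube_conc_ng_per_ml, sample_ul=2.0, buffer_ul=198.0):
    """Back-calculate the sample concentration from the assay-tube concentration."""
    if sample_ul <= 0:
        raise ValueError("sample_ul must be positive")
    dilution_factor = (sample_ul + buffer_ul) / sample_ul  # 200 / 2 = 100-fold
    return tube_conc_ng_per_ml * dilution_factor / 1000.0  # ng/ml -> ng/ul


def volume_for_mass_ul(target_ng, conc_ng_per_ul):
    """Volume of sample that contains the requested mass of DNA."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ng / conc_ng_per_ul


# Hypothetical assay-tube reading of 50 ng/ml.
stock = stock_concentration_ng_per_ul(50.0)
load_volume = volume_for_mass_ul(2.0, stock)
print(f"stock: {stock:.1f} ng/ul; volume for 2 ng: {load_volume:.2f} ul")
```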
Transmission electron microscopy (TEM) and nanoparticle tracking analysis (NTA)
TEM and NTA were performed at the University of Nebraska Medical Center. During the last revision, additional TEM images were taken at the University of Nebraska, Lincoln, and appear in Supplementary Fig. 2.
Exosome markers array
Protein quantification was run on EV samples using the Pierce BCA protein assay (cat#: 23227; Thermo Fisher) and exosome marker proteins were assayed using the Exo-Check Exosome Array antibodies (cat#: EXORAY200B-4; System Biosciences, CA, USA) according to vendor’s manual. Briefly, 50 μg of EV were lysed in lysis buffer, incubated with labeling reagent for 30 min at room temperature (RT), followed by blocking and incubation with the membrane overnight at 4 °C. Next, the membrane was washed, incubated with detection buffer for 30 min at RT and developed on the Azure 600 imager using Clarity Western ECL detection substrate (cat #: 1705060, Bio-Rad, USA).
Exosomal DNA analysis by Agilent 4200 TapeStation system
The D5000 and Genomic TapeStation Screen Tapes were run according to the manufacturer’s protocol. 2 ng of DNA was mixed with 10 μl of sample buffer (containing SYBR green) and run on the TapeStation instrument per manufacturer’s instructions.
DNase I treatment
Exosomal preparations in PBS or dsDNA in TE buffer were treated with DNase I (Qiagen) at 0.2 units per μl of EV preparation in reaction buffer containing MgCl2, incubated at 37 °C for 30 min, and then heated at 65 °C for 10 min to inactivate the DNase.
Size selection
DNA fragments in the EVs exhibited unique banding patterns on TapeStation screening. This observation led us to split NGS samples into two fractions using ~1 kb as the dividing threshold: HMW (>1 kb) and LMW (<1 kb). Samples were split using a modified AMPure XP magnetic bead size selection (Beckman Coulter, PN A63881). First, a 0.5X volume of AMPure XP beads was added to the exosomal DNA extraction from each cell line, mixed thoroughly, and incubated at RT for 5 min. Samples were then placed on a magnetic stand and allowed to clear (~2 min). The supernatant containing LMW DNA was transferred to a new tube, while the HMW DNA remained bound to the beads. HMW samples were washed with two rounds of 200 μL of 80% ethanol. Following the second wash, all remaining ethanol was removed, the pellet was allowed to dry (~2 min), and HMW samples were eluted in 15 μL of 0.1X TE buffer. HMW DNA was then sheared to ~500–700 bp using a Covaris M220 focused-ultrasonicator (PN 500295) in a microTUBE AFA Fiber Snap-Cap (PN 520045) with peak incident power = 50, duty factor = 10, cycles per burst = 200, and a treatment time of 50 s. LMW DNA fractions did not need to be sheared because they already fell within the desired library insert size range. LMW and sheared HMW DNA fractions were quantified on a Thermo Fisher Qubit 4.0 fluorometer (cat# Q33238) using Invitrogen’s Broad Range dsDNA assay (cat# Q33265). DNA insert size was verified on an Agilent TapeStation 4150 (PN G2992AA) using a D1000 assay (PN 5067-5582, 5067-5583).
DNA library preparation and NGS
HMW and LMW NGS libraries were prepared using New England Biolabs’ NEBNext Ultra II Library Prep Kit for Illumina (PN E7645S) according to the manufacturer’s instructions. A 1:10 dilution of the adaptor at step 2.1 was used for all samples, and cleanup of adaptor-ligated DNA was performed without size selection (protocol section 3B). PCR amplification of barcoded libraries at step 4.1.3 was performed using 7 cycles. The resulting libraries were cleaned up using 0.65X AMPure XP beads. LMW and HMW libraries were then sequenced on an Illumina NextSeq 550DX High Output flowcell (PN 20028871), with an average read depth of 48 million and an average of 98% of reads mapping to the human genome reference (HG38).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Bioinformatics, statistical analysis and reproducibility
Read mapping, variant calling, genome coverage, and visualization
Read processing, mapping, and variant calling were performed using Sentieon’s DNAscope pipeline (Sentieon Inc., San Jose, California, USA)38. Reads from all samples were mapped to the GRCh38 build of the human reference genome using the BWA-MEM algorithm, creating BAM files for the HMW and LMW samples of each cell line. Duplicate reads were removed with Sentieon’s Dedup algorithm, and variant calling was performed with the DNAscope algorithm. Genome coverage for each sample was calculated using the genomeCoverageBed function in the BEDTools software suite39. Regions of the genome with an average coverage of 50X or greater were recorded in BED files using the covtobed command in the BamToCov toolkit40. The Integrative Genomics Viewer (IGV) genome browser was used to visualize genomic regions with high read coverage (>50X) from the sequenced EV DNA41.
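The idea of recording high-coverage regions as BED-style intervals can be sketched as follows. This is an illustration only and not the covtobed implementation: it applies the threshold to per-base depth rather than to an average over a region, and the function name and input values are hypothetical.

```python
def high_coverage_intervals(depths, min_depth=50):
    """Return half-open, 0-based (start, end) intervals, as in BED files,
    covering consecutive positions whose depth is >= min_depth.

    `depths` is a list of per-base read depths for one chromosome.
    Simplification: the threshold is applied per base, not to an average.
    """
    intervals = []
    start = None  # start of the interval currently being extended
    for pos, depth in enumerate(depths):
        if depth >= min_depth:
            if start is None:
                start = pos  # open a new interval
        elif start is not None:
            intervals.append((start, pos))  # close the open interval
            start = None
    if start is not None:
        # The last interval runs to the end of the sequence.
        intervals.append((start, len(depths)))
    return intervals


# Hypothetical per-base depths for a 7-bp stretch.
regions = high_coverage_intervals([10, 60, 75, 50, 49, 0, 120])
```

For the hypothetical depths above, positions 1 to 3 and position 6 meet the threshold, which gives the intervals (1, 4) and (6, 7).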
Variant analysis
Variant analysis was performed using VarSeq™ v2.5.0 (Golden Helix, Inc., Bozeman, MT; www.goldenhelix.com). Overall, 10,066,263 variants were detected in the dataset. Variants were quality-filtered to retain those with a read depth of at least 100, no LowQual flag, and a variant allele fraction ≥0.1. Additionally, variants with allele frequencies ≤0.01 or ≥0.99 in the gnomAD and 1 kG Phase3 databases were kept42,43. Variants remaining after filtration were summarized and visualized using ggplot2 in R.
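The filtering criteria can be expressed as a short predicate. The sketch below is not VarSeq code; the record layout and names are hypothetical. It reads the population-frequency criterion as "at most 0.01 or at least 0.99", and it assumes that variants absent from the population databases are retained, which the text does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    """Hypothetical minimal variant record used only for this illustration."""
    read_depth: int                 # total reads covering the site
    alt_reads: int                  # reads supporting the variant allele
    low_qual: bool                  # True if the caller set a LowQual flag
    population_af: Optional[float]  # database allele frequency, None if absent


def passes_filters(v, min_depth=100, min_vaf=0.1, rare_af=0.01, common_af=0.99):
    """Apply the depth, quality-flag, allele-fraction and population filters."""
    if v.read_depth < min_depth or v.low_qual:
        return False
    vaf = v.alt_reads / v.read_depth  # variant allele fraction
    if vaf < min_vaf:
        return False
    if v.population_af is None:
        return True  # assumption: unobserved variants are kept
    return v.population_af <= rare_af or v.population_af >= common_af


# Hypothetical records: the first passes, the second fails on depth.
kept = [v for v in (Variant(200, 40, False, 0.005), Variant(80, 40, False, 0.005))
        if passes_filters(v)]
```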
Investigation of reads mapped to exon regions
Reads mapped to coding regions of the genome were counted using the featureCounts() function in the Rsubread package in R. The percentages of reads that mapped to coding versus non-coding regions of the genome were summarized and graphed. Differential abundance analysis of gene feature counts between cancer and control cell lines was performed with the DESeq2 package in R and visualized using ggplot2 in R.
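The coding versus non-coding percentages follow from simple arithmetic on the per-gene counts. The sketch below is a Python illustration rather than the R code used in the study; it assumes that each counted read is assigned to at most one gene, and the gene labels and numbers are hypothetical.

```python
def percent_coding(feature_counts, total_mapped_reads):
    """Return (percent coding, percent non-coding) for one sample.

    `feature_counts` maps gene identifiers to the number of reads assigned
    to coding features; `total_mapped_reads` is the number of mapped reads.
    Assumption: each counted read is assigned to at most one gene.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    coding_reads = sum(feature_counts.values())
    if coding_reads > total_mapped_reads:
        raise ValueError("assigned reads exceed total mapped reads")
    pct_coding = 100.0 * coding_reads / total_mapped_reads
    return pct_coding, 100.0 - pct_coding


# Hypothetical counts for two genes in a sample with 2000 mapped reads.
coding_pct, noncoding_pct = percent_coding({"geneA": 300, "geneB": 200}, 2000)
```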
BAM comparisons
Comparison of BAM alignment profiles was performed using the command line interface of deepTools244. The multiBamSummary command was used to summarize all reads with mapping quality scores over 30. The deepTools2 plotPCA function was used to perform a Principal Component Analysis (PCA) of the read count profiles generated in the multi-BAM summaries. Similarly, Pearson correlation coefficient matrices and heatmaps were generated from the multi-BAM summaries with the deepTools2 plotCorrelation function.
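To make the correlation step concrete, the sketch below computes a Pearson correlation matrix from per-sample read count profiles, where each profile is a list of read counts over the same genomic bins. It is a plain-Python illustration, not the deepTools2 implementation; the sample names and counts are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length count vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(profiles):
    """Return a nested dict of pairwise Pearson coefficients between samples."""
    names = list(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}


# Hypothetical read counts over four genomic bins for three samples.
matrix = correlation_matrix({
    "sample1": [1, 2, 3, 4],
    "sample2": [2, 4, 6, 8],
    "sample3": [4, 3, 2, 1],
})
```

In this toy input, sample2 is an exact multiple of sample1 and sample3 is its reverse, so the coefficients are 1 and −1 respectively, up to floating-point rounding.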
Responses