Quantifying metabolites using structure-switching aptamers coupled to DNA sequencing

Main

Metabolites are critical biomarkers that report on our health, and errors in the regulation of metabolite levels cause diseases such as diabetes^1,2 and phenylketonuria³. It is still challenging to quantify metabolites because they are biochemically highly diverse and cannot be trivially amplified with techniques analogous to PCR. The challenge for metabolomics is to quantify this huge range of different classes of molecule, in samples ranging from tissues to plasma to single cells, in a rapid and efficient manner.

Many experimental assays from drug screens⁴ to protein–protein interaction mapping^5,6 have been tweaked to use DNA sequencing as an output. This dramatically increases both depth and speed because it allows the massively parallel multiplexing of assays and harnesses the huge power and accessibility of DNA sequencing. Here, we present a method for reading metabolite or drug levels with DNA sequencing, bringing the power of DNA sequencing to metabolomics. We term the technique smol-seq for ‘small-molecule sequencing’.

At the core of the smol-seq approach are structure-switching aptamers (SSAs). Aptamers are short oligonucleotides that can bind specific targets with high affinity⁷. Aptamers have been identified for a wide range of targets including metabolites, fluorescent dyes, drugs and toxins. SSAs are a specific class of aptamer that undergo a major conformational change on binding their cognate ligand. This conformational change can drive a readout of target binding, typically through fluorescence or a change in conductance^8,9. SSAs can, thus, be used as sensors.

We adapted a basic SSA design¹⁰ shown in Fig. 1a. Each SSA comprises two oligonucleotides: a longer ligand-binding oligo (LBO), which is base-paired to a short release oligo (SRO) through a short stem. When the cognate ligand binds to the ligand-binding region of the LBO, this drives a conformational change to a stable stem-loop structure, which displaces and releases the SRO. By immobilizing the LBO to a solid matrix, ligand binding acts as a switch, driving release of the SRO from the matrix, where it is initially sequestered by base-pairing to the LBO. Crucially, each SSA recognizes a unique target and each SSA has an SRO with a unique DNA barcode sequence that corresponds to its target; for example, a glucose SSA has an LBO that binds glucose and releases an SRO with a glucose barcode, whereas an SSA that recognizes phenylalanine releases an SRO with a different, phenylalanine-specific barcode. Sequencing the released barcoded SROs should, thus, read out the levels of each metabolite in a complex mixture. Our goal in this study was to carry out key proof-of-concept experiments to establish whether smol-seq works. Specifically, we tested whether barcode SSAs (bSSAs) can detect a wide range of targets, whether they can achieve high specificity and whether they can be multiplexed to detect many targets in parallel.

**Fig. 1: smol-seq design and use for quantifying target levels.**

To test whether bSSAs can detect a range of different targets, we adapted existing SSAs, adapted existing aptamers or acquired commercial SSAs selected by Naloomar using their proprietary DeepSeq selection pipeline; in each case, we tested whether barcode release can quantify target levels. In all cases tested, the more ligand is present, the more sensors bind their target and the more of their barcode SRO is released; levels of released barcode, thus, report on levels of the target (for example, adenosine triphosphate (ATP) in Fig. 1b; other examples in Extended Data Fig. 1a and details of barcode detection in Extended Data Fig. 3). We next assessed bSSA specificity because each bSSA must only respond to its correct target and not to other molecules for smol-seq to work. We found that bSSAs are highly specific; for example, an ATP bSSA detects ATP but not other nucleoside triphosphates (NTPs) (Fig. 1b), a glucose bSSA discriminates between glucose and galactose (Fig. 1c) and an ampicillin bSSA¹¹ detects ampicillin but not carbenicillin, a highly related penicillin (Fig. 1d). Remarkably, bSSAs can even distinguish between stereoisomers; for example, bSSAs can distinguish among stereoisomers of glucose (Fig. 1c), phenylalanine and lactate (Extended Data Fig. 1b). Distinguishing between stereoisomers is challenging for metabolomics and this appears to be a strength of aptamer-based sensors. We, thus, conclude that bSSAs can achieve very high specificity.

In real-world settings, bSSAs must accurately report on the abundances of individual targets in complex mixtures of metabolites such as in cytoplasm or plasma. This requires high specificity of the LBO for ligand and for the SSA barcode readout to be functional in these complex settings. We tested a bSSA that detects the antimalarial piperaquine (PQ)¹² and found that this can respond specifically to PQ levels in complex settings such as cell lysate or in Luria–Bertani (LB) bacterial growth medium (Fig. 2a). We also exposed Caenorhabditis elegans worms to PQ and found that the PQ bSSAs could distinguish between lysates of worms exposed to PQ from controls (Extended Data Fig. 2a). We found that a cortisol bSSA¹³ quantifies exogenously added cortisol in yeast extract, showing that this cortisol bSSA only detects exogenous cortisol and not the highly related endogenous yeast ergosterol (Fig. 2b and Extended Data Fig. 2b). Lastly, we used a bSSA that detects ATP¹⁴ to measure endogenous ATP levels in the parental Escherichia coli strain and a mutant strain that contains a deletion for the gene cyoA, which is known to have reduced ATP levels¹⁵. We found significantly reduced ATP levels in the ΔcyoA mutant and confirmed this reduction with an independent luciferase ATP assay¹⁶ (Fig. 2c). These data together show that bSSAs can be used to measure exogenous or endogenous targets in complex molecular settings such as cell lysates.

**Fig. 2: bSSAs can detect targets in complex mixtures and can be multiplexed.**

Lastly, for smol-seq to be used for deep targeted metabolomics, it must be able to detect many targets simultaneously. To test this, we multiplexed multiple SSAs in the same assay; each had an SRO with a distinct barcode, whereby the activity of each sensor could be measured by sequencing the released barcodes (Fig. 2d). We found that each bSSA works as an independent sensor and we could read multiple sensors in parallel using barcode sequencing. bSSAs can, thus, be multiplexed.

A key limitation for smol-seq is that any specific bSSA sensor does not have infinite linear range. Therefore, if the target is at too high a concentration in the sample, it cannot be accurately measured. While this can be readily solved for a specific target by diluting the analyte such that it falls in the correct range, if the goal is to measure hundreds of targets at once, this is impractical. Conceptually, a solution is to have many sensors that all detect the same target but with different dose responses and read each out independently with a different barcode. To do this efficiently, we developed a generic way to tune an initial parent sensor through subtle changes to the stem region of the bSSA, generating a suite of derived sensors that have the same target specificity but with altered dynamic range. We illustrate this with sensors for PQ (Fig. 2e), vancomycin, glucose and cortisol (Extended Data Fig. 2c). In each case, the derived sensors had different dynamic ranges and combining these derived sensors greatly expanded the detection range for any target.
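The idea of combining derived sensors can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each sensor follows a simple Hill-type dose response; the sensor names, EC50 values and the usable response window are hypothetical and are not taken from the measurements reported here.

```python
def hill_release(conc, ec50, hill=1.0):
    """Fractional release (0-1) under an assumed Hill-type dose response."""
    return conc**hill / (ec50**hill + conc**hill)


def invert_hill(release, ec50, hill=1.0):
    """Concentration that would give `release` under the same assumed model."""
    return ec50 * (release / (1.0 - release)) ** (1.0 / hill)


def estimate_conc(readings, ec50s, window=(0.1, 0.9)):
    """Average the estimates from sensors whose release lies inside `window`.

    Sensors that are nearly saturated or nearly silent are ignored, because
    their readings carry little information about the concentration.
    """
    lo, hi = window
    estimates = [
        invert_hill(r, ec50s[name])
        for name, r in readings.items()
        if lo <= r <= hi
    ]
    return sum(estimates) / len(estimates) if estimates else None


# Hypothetical suite of three derived sensors (EC50 in arbitrary units).
ec50s = {"tight": 1.0, "parent": 10.0, "weak": 100.0}
true_conc = 50.0
readings = {name: hill_release(true_conc, e) for name, e in ec50s.items()}
estimate = estimate_conc(readings, ec50s)
```

In this toy example the "tight" sensor is close to saturation and is excluded, whereas the "parent" and "weak" sensors both fall within the window and recover the input concentration.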

In summary, smol-seq is a method that uses SSAs to detect and quantify metabolites and read out their levels in the form of DNA sequence. This converts the complex and challenging problem of metabolomics into a simple DNA sequencing problem. Because the output of smol-seq is DNA barcodes, it is easy to integrate with other sequence-based workflows such as RNA sequencing, allowing metabolomics to be added to other multiomics readouts. In addition, the barcodes can be PCR-amplified, which opens up the possibility of spatial or single-cell metabolomics where sample volumes and target abundances are highly limited. The next key goal is to build up large collections of sensors through a combination of classic systematic evolution of ligands by exponential enrichment (SELEX) and data-driven artificial intelligence approaches. This will harness the power and ease of DNA sequencing for the quantification of metabolites and drugs.

Methods

Reagents

All oligonucleotides were synthesized by Integrated DNA Technologies and dissolved in nuclease-free water at a concentration of 100 µM. The oligonucleotide sequences used were adapted from Yang et al.^10,13, Coonahan et al.¹², Song et al.¹¹, Warner et al.¹⁷, Huizenga and Szostak¹⁴, Nakatsuka et al.¹⁸, Huang and Liu¹⁹ or Dauphin-Ducharme et al.²⁰ or selected by Naloomar using their proprietary DeepSeq platform. Oligonucleotide sequences used for SSAs are listed in Supplementary Data 1. Stock solutions were made in nuclease-free water with the exception of the stock solutions of d-glucose, d-galactose, d-fructose and l-glucose (1 M each), which were made directly in SELEX buffer (20 mM HEPES (pH 7.5), 1 M NaCl, 10 mM MgCl₂ and 5 mM KCl). Stock solutions of PQ tetraphosphate (1 mg ml⁻¹) and mefloquine (MQ) hydrochloride (500 μg ml⁻¹) were made in 5% methanol. Stock solutions of cortisol were made in 100% ethanol with a final working concentration of 5% ethanol. The stock solution of DFHBI-1T (20 mM) was made in DMSO.

Ligand-binding assay

For each sample, 5 µl of Dynabeads MyOne Streptavidin C1 magnetic beads (Invitrogen) were washed three times in bind and wash buffer as per the manufacturer’s protocol and finally resuspended in 10 μl of binding buffer. Then, 25 pmol of LBO and 125 pmol of SRO (10 μl in total, diluted in binding buffer) were heated at 95 °C for 5 min and slowly cooled to 25 °C. The oligos were then added to the resuspended beads and incubated on a rotator for 30 min at room temperature. The beads were then washed two times with 100 μl of binding buffer and resuspended in 20 μl of the same buffer and incubated for another 45 min. The beads were then washed once and resuspended with various concentrations of ligand. The samples were incubated for 45 min on a rotator at room temperature and the supernatant was collected (‘supernatant’ sample). The beads were then resuspended in 20 μl of strand separation buffer without magnesium and heated at 95 °C for 4 min; the supernatant sample was collected (‘remainder’ sample). The percentage release was then calculated as \(\frac{\mathrm{Supernatant}\;\mathrm{RFU}}{\mathrm{Supernatant}\;\mathrm{RFU}+\mathrm{Remainder}\;\mathrm{RFU}}\times 100\), where RFU is the relative fluorescence units. For dose–response curves, the minimum and maximum release were additionally normalized to 0 and 100, respectively.
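The percentage release calculation and the 0 to 100 rescaling used for dose–response curves can be sketched in Python as follows. This is a minimal illustration rather than the analysis code used in this study; the function names and the RFU values are hypothetical.

```python
def percent_release(supernatant_rfu, remainder_rfu):
    """Percentage of SRO released, from the supernatant and remainder readings."""
    total = supernatant_rfu + remainder_rfu
    if total <= 0:
        raise ValueError("total RFU must be positive")
    return 100.0 * supernatant_rfu / total


def normalize_dose_response(releases):
    """Rescale a series so that its minimum maps to 0 and its maximum to 100."""
    lo, hi = min(releases), max(releases)
    if hi == lo:
        raise ValueError("a flat series cannot be rescaled")
    return [100.0 * (r - lo) / (hi - lo) for r in releases]


# Hypothetical (supernatant RFU, remainder RFU) pairs at rising ligand doses.
readings = [(100, 900), (300, 700), (600, 400)]
raw = [percent_release(s, r) for s, r in readings]
scaled = normalize_dose_response(raw)
```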

For experiments with l-phenylalanine and d-phenylalanine, the amino acids were first complexed with Cp*Rh(III) at a final concentration of 100 μM [Cp*RhCl₂]₂ and varying concentrations of amino acids as described by Yang et al.¹⁰ and incubated together at room temperature for >45 min.

For experiments with glucose, phenylalanine, lactate, ampicillin and carbenicillin, SELEX buffer was used as the binding buffer and a mixture of 20 mM HEPES pH 7.5 and 300 mM NaCl was used as the strand separation buffer. For all other experiments, PBS pH 7.4 (Gibco) supplemented with 10 mM MgCl₂ was used as the binding buffer and PBS without magnesium was used as the strand separation buffer.

For the LB experiments, the LB broth (BioShop) was diluted 1:2 in PBS + 10 mM MgCl₂ and spiked with various amounts of PQ. SSAs were then incubated in these mixtures. LB medium with no PQ added was used as the negative control. For lysate experiments, lysates were derived from N2 C. elegans worms lysed in 3× pellet volume of extraction solvent (detailed below), diluted 1:5 in PBS + 10 mM MgCl₂ and spiked with various amounts of PQ. Lysate with no PQ added was used as the negative control. MgCl₂ was added to all diluted samples to a final added concentration of 10 mM. Dose responses for spike-in experiments were plotted after subtracting the percentage release in lysate or LB alone from the percentage release at each ligand concentration.

For yeast extract experiments, a 30% stock solution of autolyzed yeast extract (BioShop) was made in distilled H₂O and filter-sterilized. The stock was then diluted 1:3 in PBS + 10 mM MgCl₂ for a final concentration of 10% yeast extract. Because the yeast extract alone affected fluorescence readings in the supernatant samples, the normalized percentage release values presented were taken from the inverse of the remainder samples.

SRO readout

In this paper, SRO levels were detected through release of an SRO with a T7 promoter sequence or an SRO with a fluorophore attached (Extended Data Fig. 3a,d). For T7 barcode SROs, we used SROs containing a barcode corresponding to the sense strand of the T7 promoter (Extended Data Fig. 3a). When this is released, it can hybridize to a complementary reporter oligo that contains an antisense T7 promoter and a region corresponding to the Baby Spinach RNA aptamer²¹ (Extended Data Fig. 3b). When the T7 barcode:reporter duplex is formed, this can direct transcription of the Baby Spinach RNA, which can be detected using fluorescence in the presence of DFHBI or its derivative DFHBI-1T; this gives a linear measurement of barcode SRO release across three orders of magnitude (Extended Data Fig. 3c). For samples with a T7 SRO, 1.5 µl of the 20-µl supernatant sample was added to 5 µl of nuclease-free water and 2 µl of 10 µM Baby Spinach transcription template. Oligos were mixed and heated at 95 °C for 5 min and slowly cooled to 25 °C. Reaction buffer, NTPs (2 mM each) and T7 RNA polymerase were then added to the sample for a total volume of 20 µl and incubated at 37 °C for 2 h. After transcription, the sample was then heated at 90 °C for 2 min before incubation on ice for >3 min. Then, 30 µl of water and 5 µl of 100 µM DFHBI-1T (diluted in Tris-HCl buffer: 40 mM Tris-HCl pH 8.0, 5 mM MgCl₂ and 125 mM KCl) were added to 15 µl of the sample, which was then heated to 65 °C for 5 min and slowly cooled to room temperature (as per Okuda et al.²¹). Each sample was then transferred to a 96-well plate and the fluorescence intensity for each sample was measured using a FLUOstar Omega microplate reader (excitation: 485 nm, emission: 520 nm). For samples with a 6-FAM SRO, 20 µl of the sample was added to 35 µl of water and fluorescence was directly measured using the microplate reader (excitation: 485 nm, emission: 520 nm).
Data were then exported and analyzed in Excel and plotted in Python using standard scientific (SciPy, NumPy and Pandas) and plotting (Matplotlib and Seaborn) packages.

Multiplexing and sequencing of bSSAs

bSSAs that recognize ATP, PQ or MQ were mixed together; each SSA had a distinct barcode and we used three different 8-nt barcodes for each SSA. Each barcode SRO had the following design, comprising the regular SRO stem sequence, an 8-nt barcode, a PCR handle complementary to the Illumina P5 adaptor sequence and a random N_5–7 sequence to help with sequencing these low-diversity sequences:

5′ Stem-specific sequence–8-nt barcode–N_5–7–AGATCGGAAGAGCGTCGTGTAG 3′
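As a sketch of how a sequence with this layout could be parsed, the following Python function extracts the 8-nt barcode and the random spacer. It assumes that the sequence is already oriented 5′ to 3′ as written above (sequencing reads may instead be the reverse complement); the stem sequence and the function name are hypothetical.

```python
import re

# Handle sequence taken from the SRO design above.
HANDLE = "AGATCGGAAGAGCGTCGTGTAG"


def parse_sro(seq, stem):
    """Return (barcode, spacer) if `seq` matches the SRO layout, else None.

    Layout: stem, 8-nt barcode, 5-7 random nucleotides, then the handle.
    """
    pattern = re.compile(
        "^" + re.escape(stem) + "([ACGT]{8})([ACGT]{5,7})" + re.escape(HANDLE)
    )
    match = pattern.match(seq)
    return (match.group(1), match.group(2)) if match else None


# Hypothetical stem sequence and read, for illustration only.
stem = "GTCGTCCC"
read = stem + "ACGTACGT" + "TTGCA" + HANDLE
parsed = parse_sro(read, stem)
```

Anchoring the match on both the stem and the handle means that sequences with a truncated barcode or a spacer outside the 5 to 7 nt range are rejected rather than misassigned.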

The pool of sensors was added to either a solution of 125 μM ATP, 30 μM MQ, 25 μM PQ or PBS buffer for 45 min and the released barcode SROs were collected (‘released’ samples). To account for the variation in the amount of each sensor present after the bind and wash steps, the remaining bound barcode SROs were also boiled off and collected, allowing us to measure the percentage release similar to our fluorescence assays (‘boiled’ samples). Samples from three independent replicates (identifiable through a short identifier sequence) were then pooled together before library preparation. To account for differences in the amount of amplification in each sample through each PCR and sequencing step, spike-in controls were added after this step to both the released and the boiled samples to allow normalization of counts after sequencing. DNA from each collected sample was then purified using the Monarch PCR and DNA cleanup kit (New England Biolabs), using the oligonucleotide cleanup protocol to recover the short single-stranded DNA.

A first round of PCR was then carried out to add on partial P5 and P7 adaptor sequences onto the barcode SRO sequences. PCR was performed using a minimal number of cycles to avoid PCR saturation (14 cycles for the released samples and 11 cycles for the boiled samples). PCR products were then cleaned up using the Monarch PCR and DNA cleanup kit (New England Biolabs) (PCR cleanup protocol) and eluted in 20 μl of elution buffer. This was then diluted 1:5 with water and 5 μl of the DNA was used for the second round of PCR (100-μl reaction; eight cycles) to add on i5/i7 dual indices for each library and the complete P5/P7 adaptor sequences. PCR products were then cleaned up using the Monarch PCR and DNA cleanup kit (New England Biolabs). An additional size-selection step was then performed using the NucleoMag NGS cleanup and size select kit (Macherey-Nagel) at a 1.2 bead-to-sample ratio. The samples were then resuspended in 10 mM Tris-HCl pH 8.0 and sent for MiSeq sequencing (~4 million 50-bp single-end reads) at the Center for Applied Genomics.

Counts for each barcode SRO were collated from the parsed FASTQ files. Spike-in controls (a series of twofold titrations for a total of 10 sequences) were used to normalize read counts between the buffer-only released and boiled control samples to generate the background percentage release rates for each sensor. We also used the spike-in controls to confirm linear amplification throughout the process up to the sequencing output. Because we noticed that the presence of the different ligands affected the DNA yield after the DNA purification steps, we used the numbers in the control sample to adjust read counts accordingly, applying a size factor to each sample to standardize read counts, following the assumption that background release values should be consistent across all samples. The percentage release values were then calculated for each barcode SRO using the normalized read counts in the released and boiled samples. For each replicate, we noticed very similar ‘percentage release’ values for all three barcodes corresponding to a specific ligand and averaged these three values to generate the data plotted. All raw counts and percentage release values for each barcode and replicate are in Supplementary Data 2.
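The core of this calculation can be sketched in Python as follows. This is a simplified illustration, not the pipeline used in this study: it scales each sample by its total spike-in reads and omits the additional per-sample size factor described above; all counts, barcode names and function names are hypothetical.

```python
from statistics import mean


def spike_normalize(counts, spike_counts):
    """Divide each barcode count by the total spike-in reads of the sample."""
    total_spike = sum(spike_counts)
    if total_spike <= 0:
        raise ValueError("no spike-in reads in this sample")
    return {bc: n / total_spike for bc, n in counts.items()}


def percent_release(released, boiled):
    """Per-barcode percentage release from normalized released and boiled counts."""
    return {
        bc: 100.0 * released[bc] / (released[bc] + boiled[bc])
        for bc in released
    }


# Hypothetical raw counts for the three barcodes of one sensor.
released_counts = {"bc1": 300, "bc2": 330, "bc3": 270}
boiled_counts = {"bc1": 1400, "bc2": 1540, "bc3": 1260}
# Hypothetical spike-in read counts recovered in each sample.
released_spikes = [500, 250, 125, 125]
boiled_spikes = [1000, 500, 250, 250]

rel = spike_normalize(released_counts, released_spikes)
boi = spike_normalize(boiled_counts, boiled_spikes)
per_barcode = percent_release(rel, boi)
sensor_release = mean(per_barcode.values())
```

Normalizing to the spike-ins before taking the ratio matters because the released and boiled samples were amplified with different numbers of PCR cycles, so raw counts from the two samples are not directly comparable.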

Drug treatment and worm lysis

C. elegans worms (wild-type N2 strain and bus-5(br19), which shows increased drug permeability²²) were grown and maintained on nematode growth medium (NGM) agar plates seeded with OP50 bacteria. Strains were provided by the Caenorhabditis Genetics Center (CGC), which is funded by the National Institutes of Health (NIH) Office of Research Infrastructure Programs (P40 OD010440).

For drug treatments, OP50 cultures were heat-killed at 65 °C for 30 min, spun down and then concentrated twofold in NGM. Next, 1,600 μl of the suspended cultures were added to 200 μl of mixed-stage bus-5(br19) worms in M9, along with 200 μl of tenfold concentrated drug. The worms were then incubated with the drugs for 7 h in a 20 °C shaker, with PQ added at a final concentration of 100 μg ml⁻¹. Worms were treated with 5% methanol in lieu of drugs as the negative control. After the 7-h drug treatment, the worms were washed three times in M9 followed by once in PBS. The worm pellet was then flash frozen and stored at −80 °C.

To lyse the worms, the frozen pellets were ground with a pestle until the pellet defrosted. Extraction solvent (8:1:1 ratio of methanol, chloroform and water) was then added to the tubes at three times the volume of the pellet. Samples were then vortexed and subjected to three freeze–thaw cycles. The tubes were then centrifuged at 16,000g for 15 min and the supernatants were collected and stored at −80 °C until needed. The worm lysate used for the PQ spike-in experiment was collected from N2 worms that were not treated with any drugs.

Quantification of ATP levels in E. coli strains

The parental (BW25113) and ΔcyoA strains used were taken from the E. coli Keio knockout collection²³. Then, 2 ml of bacteria were inoculated and grown in LB medium at 30 °C for 24 h and collected at the stationary phase. Samples were spun down at 3,500g for 5 min at 4 °C and the pellet was quenched in 75 μl of 80% cold methanol and water. Samples were then stored at −80 °C for passive cold extraction and subjected to multiple freeze–thaw cycles. To normalize for differences in growth rates and/or bacterial density in the collected samples, the volume of extraction solvent used was then normalized on the basis of optical density measurements of the quenched samples at 600 nm. Samples were then spun down at 16,000g for 10 min at 4 °C and the supernatant was collected and used for ATP measurements. For luciferase-based ATP assays, the extracted sample was diluted 1:10 in PBS buffer and added to an equal volume of BacTiter-Glo reagent (Promega). Luminescence was then measured after 5 min on the FLUOstar Omega microplate reader. For ATP sensor assays, the extracted sample was diluted 1:20 in PBS + 10 mM MgCl₂ and the assay was run using a 6-FAM SRO as described above. Fluorescence was then measured after 40 min of ligand–sensor incubation (excitation: 485 nm, emission: 520 nm).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.