Plasma proteome variation and its genetic determinants in children and adolescents

Main

The global prevalence of pediatric obesity has increased markedly over the past four decades, with affected children and adolescents facing elevated risks of prediabetes, metabolic syndrome, asthma and fatty liver disease1. Studying childhood obesity is vital for understanding its health consequences and formulating effective prevention and treatment strategies2.

The concentration of some proteins in the blood varies with growth, especially during puberty, as a result of hormonal and metabolic changes, immune maturation and tissue development. Examples include insulin-like growth factor 1 (IGF1), leptin, growth hormone, sex hormones, insulin and C-reactive protein3.

The levels of specific proteins in human blood are the most commonly used indicators of potential health issues4. Understanding the genetic and other determinants of the plasma proteome can support biomarker research and drug development5,6,7,8. Factors such as genetics, age, sex, body mass index (BMI), growth and development including puberty affect circulating protein levels9,10,11,12, and this is important to investigate at a large scale in children and adolescents, covering pre-pubertal and post-pubertal stages.

Affinity-based proteomics can infer the relationship between blood protein levels and these factors at a large scale8,13,14,15,16. In comparison, mass spectrometry (MS)-based proteomics provides much higher specificity in identifying and quantifying proteins17,18,19, as MS measures the mass of peptides and their fragments down to several parts per million. However, MS-based proteomics has been limited by lower throughput and fewer quantified proteins because of constraints of MS sensitivity and other factors20,21,22,23. In bottom-up MS-based proteomics, peptide signals are measured, quantified and assembled into proteins using a reference proteome database. Depending on the number of peptides mapping to each protein, this generally yields several data points on the identity and quantitative changes of the proteins in the plasma proteomes of the study participants. These can be used for better protein-level estimates or the removal of outliers. By contrast, binder-based methods typically probe a single epitope of plasma proteins or two closely spaced ones, as in the Olink proximity extension assay. Here, we aimed to alleviate the limitations of MS-based proteomics, enabling us to investigate larger cohorts with greater proteome depth than previously possible, including replication cohorts.

Results

Discovery and replication cohorts

Our discovery cohort included 2,147 children and adolescents aged 5–20 years from the HOLBAEK Study, 45% of whom were from the general population and 55% from the Children’s Obesity Clinic in Holbæk, Denmark (Fig. 1a). For replication of the genetic effects on plasma protein levels, we included 1,000 additional children and adolescents from the HOLBAEK Study and 558 adults with alcohol-related liver disease aged 19–82 years, matched with healthy controls recruited from the Region of Southern Denmark5. Baseline characteristics for all cohorts are provided in Tables 1 and 2, with the discovery cohort further stratified by sex in Supplementary Note 1. We conducted single-nucleotide polymorphism (SNP)-based genotyping and MS-based plasma proteome profiling (Fig. 1b and Methods). We acquired plasma proteome profiles of all participants (n = 2,147) with a data-independent acquisition (DIA) strategy and a single-run workflow, using a liquid chromatography system designed for robust clinical use24 coupled online to a recently released Orbitrap Astral mass spectrometer25 (Fig. 1c). We performed sample preparation and liquid chromatography–MS analysis using equal plasma volumes, with precursor-level MS signal normalization across samples using Spectronaut’s cross-run normalization method (Methods and Supplementary Note 2)26. After stringent quality control, 1,216 proteins remained with 91% data completeness. Analyzing 94 quality assessment samples over a 6 week measurement period revealed a 33% median coefficient of variation for the entire workflow (Supplementary Fig. 1a–d). We estimated the effect of age, sex, BMI-standard deviation score (BMI-SDS) adjusted for age and sex27 and 5.2 million SNPs on plasma protein levels (Fig. 1d).

Table 1 Participant characteristics in the discovery cohort
Full size table
Table 2 Participant characteristics in the replication cohorts
Full size table
Fig. 1: Study overview and proteomics workflow.
figure 1

a, Discovery and replication cohorts used in this study. b, MS-based plasma proteome profiling and SNP-based genotyping were performed on the discovery and replication cohorts. c, Proteome profiling workflow and the computational tools used for processing of proteomics data, including (1) sample organization, (2) sample preparation, (3) data acquisition and (4) informatics. d, Schematic representation of (1 and 2) the association analysis with (3) a quality control step to eliminate artefactual pQTLs as described in the main text and (4) prediction of age, BMI and genotype based on plasma protein levels. ALD, alcohol-related liver disease; m/z, mass to charge ratio; QA, quality assessment; QC, quality control.

Full size image

Impact of demographic and health factors on plasma proteome

To the best of our knowledge, this is the first MS-based proteomics study on neat plasma with substantial depth (>1,000 proteins), largely owing to the use of the Orbitrap Astral mass spectrometer25 for this purpose. We investigated the biological processes represented by the quantified proteins. The most prevalent processes included complement and coagulation cascades, metabolism and inflammatory response, reflecting the key roles of plasma proteins in immunity, blood clotting and transport (Fig. 2a).

Fig. 2: Age-associated, sex-associated and BMI-SDS-associated plasma proteins.
figure 2

a, Biological processes represented by all identified proteins after quality control. b, Schematic representation of linear modeling of protein levels using various factors. c, Number of proteins associated with age, sex, BMI-SDS and the interaction term between obesity status and BMI-SDS; n = 1,601 biologically independent samples. df, Volcano plots showing proteins associated with age (d), sex (e) and BMI-SDS (f), highlighting strongly associated proteins. For cf, multiple linear regression was used to test for association, with beta coefficients estimated using ordinary least squares regression. Two-sided P values were approximated using a t-distribution with significance set at Benjamini–Hochberg-corrected P < 0.05. g, Schematic representation of linear modeling of age and BMI using plasma proteome. h,i, Prediction of age (h) and BMI (i) in the test set. Pearson’s correlation coefficients between predicted and real values are indicated; n = 639 biologically independent samples.

Source data

Full size image

To assess the impact of age, sex and BMI-SDS27 on the plasma proteome, we performed multiple linear regression analysis (Fig. 2b and Methods). In total, 58% of the quantified plasma proteins were associated with at least one of these factors (40% with age, 32% with sex and 22% with BMI-SDS), with 8% linked to all three based on the current dataset and our modeling (Fig. 2c). Proteins most strongly associated with age included known age-related proteins such as F9 (ref. 28), RBP4 (ref. 29) and COL1A1 (ref. 9) as well as others not previously linked to age, such as GPLD1, APCS and IGFALS (Fig. 2d and Supplementary Table 1). The age association of IGFALS aligns with evidence that defects or low expression of IGFALS can cause pubertal delay in children30. In addition to individual proteins, our dataset reveals insights into three critical biological processes that occur during pediatric development. Firstly, we observed an age-related increase in IGF1 receptor signaling, with IGF1 levels peaking at ages 12–13 years in female adolescents and 14–15 years in male adolescents and then declining (Extended Data Fig. 1a,b), consistent with the literature31,32. IGFBP5 mirrored IGF1, although the levels of IGFBP1 and IGFBP2 declined from childhood through adolescence (Extended Data Fig. 1c–e). IGFBP1 and IGFBP2 were consistently reduced in children with obesity, along with adiponectin (Extended Data Fig. 1d–f). Similarly, SHBG, A2M and CRP all showed obesity-dependent levels (Extended Data Fig. 1g–i). Secondly, we captured post-pubertal declines in essential bone development proteins (ACAN, COL1A1, COL1A2, THBS4, COMP and POSTN), reflecting sex-specific differences in growth plate closure during skeletal maturation (Extended Data Fig. 1j–m). Aggrecan (ACAN) is a major proteoglycan in cartilage, and mutation in this protein causes early growth cessation, resulting in a severely reduced adult stature31. Our data documents a more than tenfold decline in ACAN levels starting at age 12–13 years for female adolescents and 13–14 years for male adolescents, offering the potential for early growth disorder diagnostics. Thirdly, proteins involved in angiogenesis and cellular adhesion (ANGPTL3, CDH5, ITGB1, ICAM1, VCAM1 and ACE) showed an age-related decline (Extended Data Fig. 1n–q). This analysis identified proteins not previously associated with childhood obesity, to the best of our knowledge, including A2M, PON3, ADAMTSL4, HSPG2 and MAGEB6B, all of which showed decreased levels in children with obesity.

Similarly, we recapitulated known sex differences in protein levels, such as PZP and BCHE, and identified proteins without previously reported sex-specific differences, including CD5L (Fig. 2e and Supplementary Table 1). Notably, CD5L and IGHM emerged as the top proteins associated with sex, aligning with the observation that females have higher IgM levels than males33. This finding is further supported by a recent report showing that CD5L is an obligate member of circulating IgM34.

We included an interaction term between BMI-SDS and obesity status to assess whether protein association with BMI-SDS differed between subgroups. Of the 240 BMI-SDS-associated proteins, 163 were specific to the obesity group and 32 showed differing effect sizes between groups (Fig. 2c). This suggests that a general population of similar size would have yielded fewer BMI-SDS-associated proteins. Inflammatory proteins showed the strongest associations with BMI-SDS, including CRP, complement system proteins (C3, CFH, CFI) and acute phase proteins (A2M, APCS, SAA1, LBP)35 (Fig. 2f). Most of these were also statistically significant in the normal weight group, albeit with smaller effect sizes, indicating that elevated inflammatory protein levels with increasing BMI-SDS are not exclusive to obesity (Supplementary Table 1). Among the BMI-SDS-associated proteins is ANGPTL3, which has shown variable associations with BMI and obesity in previous studies (Supplementary Note 3). Notably, we observed that PRG4 decreased with weight loss35, consistent with its positive association with BMI-SDS in this dataset. Interestingly, PRG4 deficiency protects mice against glucose intolerance and fatty liver disease, suggesting the therapeutic potential of the proteins identified here36.

Further exploring age, sex and BMI-SDS interactions revealed 24 proteins with interaction effects between age and BMI-SDS, 18 between sex and BMI-SDS and 149 between age and sex, including PZP, AGT, SHBG and the above-mentioned bone development proteins (Supplementary Table 2).

Plasma protein levels can serve as a ‘biological clock’ in adults9. We extended this concept to children and adolescents, accurately estimating age within ±1.2 years using the 50 most predictive proteins (Pearson’s r = 0.85 between predicted and actual age) in a subset of 639 individuals not included in model training (Fig. 2g,h, Supplementary Table 3 and Methods). Likewise, a panel of 50 proteins consistently indicated BMI (Fig. 2i). Age-predictive proteins primarily regulated IGF1 receptor signaling, cartilage and skeletal development, fibroblast growth factor response and cell–cell adhesion. Notably, the top five to ten proteins alone predicted age nearly as well (mean absolute error, 1.5–1.6 vs 1.2 years for all 50). BMI-predictive proteins included established obesity markers and obesity-related proteins, including adiponectin, CRP, IGFBP1, IGFBP2, PRG4, SHBG, apolipoproteins (APOA4, APOF) and inflammatory response proteins (A2M, APCS, LBP, HSPG2, HP, AOC3, ITGB1, VNN1).

Effect of SNPs on the plasma proteome

We tested 5.2 million SNPs for association with plasma levels of 1,216 proteins in 1,909 individuals (Methods). We defined the primary protein QTL (pQTL) of a protein as the most significant variant in linkage disequilibrium (r2 > 0.2) within ±1 Mb of the protein-coding gene (Supplementary Note 4)37,38.

In addition, we performed genome-wide association analysis on nearly 10,000 peptides after quality control. Protein-altering variants can lead to alterations in amino acid sequences and modify the binding surfaces in the case of affinity-based technologies, potentially introducing biases in proteomics studies17,22,39. To address this source of potential bias, we developed a framework leveraging peptide-level data to exclude artefactual pQTLs and categorize pQTLs into confidence tiers based on peptide-level evidence (Methods and Extended Data Fig. 2a–d).

Applying a study-wide significance level of P < 4.1 × 1011 (5 × 10−8 adjusted for 1,216 proteins tested) yielded 1,252 primary pQTLs for 327 proteins (Supplementary Table 4). For downstream analysis, we adopted the conventional genome-wide association study (GWAS) significance threshold of P < 5 × 10−8, identifying 1,947 primary pQTLs for 443 proteins (Fig. 3a,b). Approximate conditional analysis revealed 733 conditionally independent pQTLs for 443 proteins (Methods and Supplementary Table 5). Genomic inflation was well controlled (median lambdaGC = 1.002, s.d. = 0.004; see Supplementary Note 5 for quantile–quantile plots). Comparison with our previous dataset, which was generated using an older generation of MS40, reveals that our methodology produces highly consistent protein quantification across time points and instrumentations (Extended Data Fig. 3a–h, Supplementary Table 6 and Supplementary Note 6). This confirms the robustness of our findings and suggests that our approach can reliably detect biological variation despite technological advancements or delays in sample analysis.

Fig. 3: Characterization of pQTLs.
figure 3

a, Primary pQTLs across the genome (two-sided Wald test in a linear mixed model with genome-wide significance P < 5 × 10−8). b, Primary pQTLs against the locations of the transcription start site of the gene coding the protein target. c, Variant annotation. d, Classification of pQTLs based on peptide-level evidence. e, Number of cis-pQTLs and trans-pQTLs. f, Number of proteins that are associated with cis only, trans only and both cis and trans-pQTLs. g, Distribution of the number of associated proteins per SNP. h, Distribution of the number of associated SNPs per protein. ik, Proportion of proteins with genetic associations when stratifying proteins into buckets based on technical variation (i), median abundance (j) and number of identified peptides after quality control per protein (k). TF, transcription factor; UTR, untranslated region; CV, coefficient of variation.

Source data

Full size image

These pQTLs are primarily located in the non-coding regions, with only 3% representing missense and 1% synonymous variants (Fig. 3c, Supplementary Table 4 and Methods). The dominance of non-coding variants among the identified pQTLs is consistent with previous studies reporting 86% and 98% of them37,41.

Genetic variations normally affect the expression level of the entire protein. Therefore, all peptides identifying the same protein should generally show the same fold change between genotypes. Our peptide-level analysis showed that 77% of reported pQTLs had at least two supporting peptides (Fig. 3d and Supplementary Table 4). Notably, in 94% of these cases, all peptides exhibited the same direction of effect, indicating highly consistent quantitative information at the peptide level (Extended Data Fig. 4a–c). The peptide-level data also helps quantify protein variants affected by amino acid substitutions, a limitation in affinity-based proteomics as demonstrated by the influence of rs9898 on histidine-rich glycoprotein abundance42. We identified the association between rs9898 and circulating histidine-rich glycoprotein levels, which was successfully replicated in both the pediatric and adult cohorts, with a 62% protein sequence coverage and 26 supporting peptides (Extended Data Fig. 5a–d). Importantly, protein quantification was unaffected by the missense mutation (Pro204Ser), which was not identified, probably because it would only produce a four-amino-acid sequence (NCPR) but would have been an outlier if it had. This example illustrates an important advantage of MS-based proteomics in navigating the complexities of protein quantification across variants.

Fig. 4: Variance in plasma protein levels explained by various factors.
figure 4

a, Proportion of variance explained by conditionally independent pQTLs, age, sex, obesity and BMI-SDS (summed variance from BMI-SDS and its interaction with obesity status). Proteins are ordered by decreasing variance attributable to independent pQTLs. bd, Pairwise comparisons of variance explained by independent pQTLs across three age groups: 10-14 years vs. 5-9 years (b), 15-20 years vs. 5-9 years (c) and 15-20 years vs. 10-14 years (d); Pearson correlation coefficients are also shown.

Source data

Full size image

Remarkably, 62% of discovered pQTLs were in cis and 60% of the proteins had at least one cis-pQTL associated, implying pervasive local regulation (Fig. 3e,f). The MS-based proteomics data recapitulated that the same genomic locus can regulate multiple proteins and that one protein can be regulated by multiple genomic loci. Specifically, 25% of the pQTLs were associated with more than one protein, while 64% of the proteins had multiple pQTLs associated with them and 26% of these had pQTLs located on different chromosomes (Fig. 3g,h).

We reasoned that higher plasma protein abundance would manifest in a higher number and signal of identifying peptides, increasing the likelihood of finding genetic associations. Indeed, as abundance increased, so did the proportion of proteins with genetic associations, a pattern also seen with increased technical reproducibility and peptide count per protein after quality control (Fig. 3i–k).

Decomposition of variance in plasma protein levels

Having established the quality of the pQTLs, we performed variance decomposition to understand the relative contribution of genetic variation and demographic factors to plasma protein levels (Methods). This revealed that independent pQTLs accounted for 1% to 66% of the variance in protein levels (average, 11%), and for 63% of proteins, pQTLs contributed more variance than age, sex, BMI-SDS and obesity combined (Fig. 4a and Supplementary Table 7). However, some proteins were primarily affected by other factors: SHBG by age and obesity; PRG4 by obesity; and IGF1 and RBP4 by age.

To address the important question of how stable the genetic influences are across pediatric development, we segmented the cohort into 5–9, 10–14 and 15–20-year-olds. This revealed a remarkable stability in genetic influences, with Pearson correlations between 0.95 and 0.98 (Fig. 4b–d). Our results establish that MS-based proteomics can provide unique insights into factors determining protein-level variance throughout pediatric development.

Characterization of pQTL effect sizes

Next, we investigated the effect size using beta statistics derived from the association tests and allelic fold change calculated on data before normalization43 (Methods). Although they were mostly mild, 143 pQTLs for 71 proteins exceeded twofold differences in protein levels (absolute log2(fold change) of >1) between homozygous reference (0/0) and alternative (1/1) genotypes (Supplementary Fig. 2a,b and Supplementary Table 4). These large genetic effect sizes led us to reason that protein levels could predict genotype, which was confirmed for eight proteins with a balanced accuracy of ≥0.8 (Supplementary Table 8 and Methods). Local (cis-pQTL) regulation generally had the largest effect sizes (Supplementary Fig. 2c,d). Furthermore, missense mutations, variants in the 5′ and 3′ untranslated regions and transcription factor binding sites had greater average effect sizes than intronic and intergenic variants (Supplementary Fig. 2e), supporting an important role of these regions in transcriptional regulation. Large effect sizes for proteins like MST1, PROCR, BST1, IL1RAP, APOE and LPA may have important implications for clinical and biomarker research (Fig. 5a–f). Notably, the rs2232613-T missense mutation reduced levels of LBP fourfold (Supplementary Table 4). Given the important role of this protein in innate immunity, we speculate that individuals with the mutant form have compromised immunity, which has indeed been reported44. The observed substantial genetic effects emphasize the importance of considering pQTL information when interpreting findings from clinical and biomarker research, particularly for proteins under strong influence of genetic variation.

Fig. 5: Effect sizes and integration of pQTLs with known variant–trait associations.
figure 5

af, Distribution of log2 intensity values of the top six proteins with the highest absolute beta value in genome-wide association analysis. The gray line in the middle of the box is the median, the top and bottom of the box represent the upper and lower quartile values of the data and the whiskers represent the upper and lower limits for consideration of outliers (Q3 + 1.5 × IQR, Q1 – 1.5 × IQR); IQR, interquartile range (Q3 – Q1); MAF, minor allele frequency. For genotype 0/0:0/1:1/1, the numbers of biological replicates are n = 1278:328:6, 1708:191:0, 1490:399:23, 1225:410:35, 1637:260:15 and 1227:606:79, respectively. Only non-imputed values are shown. g, Venn diagram showing the number of protein–outcome pairs that are significant in colocalization analysis (HyPrColoc method) and two-sample Mendelian randomization (MR), using a two-sided Wald ratio test implemented in the twoSampleMR package with significance defined as P < 2.5 × 10−6 (correcting for the number of protein-coding genes). h, Protein–trait pairs that are colocalized and with supporting evidence for causation from MR. CAD, coronary artery disease; ALT, alanine aminotransferase; AST, aspartate aminotransferase; CRP, C-reactive protein; ALP, alkaline phosphatase; GGT, gamma-glutamyl transferase; HbA1c, hemoglobin A1c; LDL, low-density lipoprotein; eGFR, estimated glomerular filtration rate; SBP, systolic blood pressure; DBP, diastolic blood pressure; WHRadjBMI, waist-to-hip ratio adjusted for BMI; ASCVD, atherosclerotic cardiovascular disease; MASH, metabolic dysfunction-associated steatohepatitis; CKD, chronic kidney disease.

Source data

Full size image

Integrating pQTLs with variant–trait associations

To investigate whether our MS-based study had discovered previously unreported pQTLs, we compared it to 35 published studies (Extended Data Fig. 6a, Supplementary Table 9 and Methods). This revealed 643 such pQTLs regulating 213 proteins, of which 140 proteins had no previously reported genetic regulation (Supplementary Table 4). Of the remaining pQTLs, 55% were replicated in at least five studies, with cis-pQTLs showing higher replication rates than trans-pQTLs, indicating stronger and more consistent evidence for cis-pQTLs across studies (Extended Data Fig. 6b,c).

Fig. 6: Replication of pQTLs in children and adults.
figure 6

a, Correlation of beta coefficient for replicated pQTLs in the children replication cohort. b, Correlation of beta coefficient for replicated pQTLs in the adult cohort. c, Distribution of absolute beta coefficient of replicated and non-replicated pQTLs. d, Manhattan plot of association between SNPs and plasma levels of TGFBI in the discovery cohort (upper panel) and adult replication cohort (lower panel) with the lead variant annotated. e, Manhattan plot of association between SNPs and plasma levels of LBP in the discovery cohort (upper panel) and adult replication cohort (lower panel) with the lead variant annotated. f, Distribution of plasma levels of TGFBI stratified by the genotype of its lead-associated variant and fibrosis stage in the adult replication cohort. For genotype 0/0:0/1:1/1, n = 96:181:90 and 55:93:43 for fibrosis stage F0–F1 and F2–F4, respectively. g, Distribution of plasma levels of LBP stratified by the genotype of its lead-associated variant and steatosis stage in the adult replication cohort. For genotype 0/0:0/1:1/1, n = 318:52:3 and 157:25:3 for steatosis <5% and ≥ 5%, respectively. GALA–ALD, gut and liver axis–alcohol-related liver disease.

Source data

Full size image

GWAS provide increasingly detailed associations between genomic loci and phenotypes but often lack mechanistic insights into the proteins mediating the effects43,45,46. To close this gap, we investigated whether our novel pQTLs were associated with phenotypic traits. Mapping our protein-associated variants to the GWAS Catalog revealed 162 such cases (Supplementary Table 10 and Methods). Many connections were immediately biologically plausible; for example, rs73001065 has been associated with a decreased risk for non-alcoholic fatty liver disease, which our data indicates may happen through reduced APOB levels. Individuals with the bone mineral density-lowering allele rs13469-T in POLDIP2 had reduced levels of SPP2, a protein crucial for bone metabolism healing47. Similarly, children with the type 2 diabetes risk allele rs529565-C in ABO had higher MASP1 levels in our data, consistent with elevated MASP1 levels in patients with type 2 diabetes48,49. Colocalization analysis supported all these associations (Supplementary Table 11).

Our analysis identified 24 proteins associated with missense variants in various genes, including 11 linked to APOE, a key gene in Alzheimer’s and cardiovascular disease. Specifically, the APOE-ε4 allele (rs7412-C and rs429358-C), a major Alzheimer’s disease risk factor, was associated with lower plasma levels of SPTBN4, a brain-expressed protein implicated in neurodevelopmental disorder50. Therefore, we speculated that there is a downregulation of SPTBN4 in Alzheimer’s disease, which is indeed supported by previous reports and its epigenetic silencing in patients with Alzheimer’s disease51,52. Additionally, our data suggests that APOE-ε4 allele’s link to depression53 may involve elevated HTR1D levels, a target of approved treatments for depression and anxiety. These observations offer additional hypotheses for understanding disease mechanisms and potentially alternative therapeutic strategies.

In addition, we performed systematic two-sample Mendelian randomization using the top cis-pQTLs from 267 proteins on 47 cardiometabolic GWAS outcomes including obesity, diabetes, atherosclerotic cardiovascular disease, metabolic dysfunction-associated steatohepatitis, Alzheimer’s disease and chronic kidney disease (Methods and Supplementary Note 7). This analysis reported 345 causal relationships between 106 proteins and 36 traits related to these six highly prevalent diseases (P < 2.3 × 10−6) (Extended Data Fig. 7a and Supplementary Table 12). Of these, 101 (29%) causal relationships between 41 genes and 33 traits were further validated by colocalization (posterior probability above 70%) (Fig. 5g,h, Supplementary Table 11 and Methods). Interestingly, our data causally connect SHH to height, which may be connected to its crucial role in embryonic development. These results illustrate how integrating high-confidence pQTLs with GWAS results helps to understand the molecular mechanisms between variant–disease and variant–trait associations.

Highly replicated pQTLs in children and adults

We assessed the replication rate of the pQTLs in 1,000 children and adolescents and 558 adults, respectively. Around 90% of the pQTLs were eligible for replication owing to independent quality control on proteomics and genomics data (Supplementary Note 8). Of these, we successfully replicated 97% of pQTLs in children (99% of the cis, 92% of the trans and 92% of the novel) and 91% in adults (92% of the cis, 88% of the trans and 90% of the novel) with nominal significance (P < 0.05) (Supplementary Fig. 3a,b and Supplementary Tables 13 and 14). The high replication rate in adults suggests that the vast majority of detected pQTLs are not life-stage-specific. To detect such potential pQTLs would probably require larger cohorts with similar sizes and health status. Furthermore, the direction and magnitude of effects aligned well between the discovery and replication cohorts (Pearson’s r = 0.97 and 0.98; Fig. 6a,b), with larger effects observed in the replicated pQTLs (Fig. 6c).

We next asked whether pQTL information could improve biomarker performance, building on our previously reported biomarkers for liver fibrosis, inflammation and steatosis5. For half of these biomarkers, including TGFBI and LBP, which showed the largest genetic effect sizes, we identified and replicated pQTLs (Fig. 6d,e). Our data revealed that TGFBI protein levels shifted depending on its corresponding pQTL in both disease and control groups (Fig. 6f). Incorporating genotype information further improved the accuracy of classifying patients with fibrosis stages F0–F1 versus F2–F4 using TGFBI (Extended Data Fig. 8a). Similarly, we observed a marked decrease in LBP levels associated with the rs2232613 variant (Fig. 6g and Extended Data Fig. 8b). These findings suggest that pQTLs should be integrated into biomarker research, especially for proteins with strong genetic effects, although their impact on classification performance should be evaluated individually.

Discussion

In this large MS-based plasma proteomics study, we analyzed the plasma proteomes from over 3,000 children and adolescents, mapping highly specific and quantitative protein trajectories throughout childhood and adolescence, including puberty-related differences between male and female adolescents. Prior studies have focused on infancy11, childhood10 or adulthood9; our study fills a gap by providing insights into the pre-pubertal and post-pubertal stages.

Variance decomposition revealed varying contributions from genetic variants, demographic and health factors to pediatric plasma proteome variation. Notably, about 70% of quantified proteins were significantly regulated, with over one-third influenced by SNPs. These associations enabled accurate predictions of age, BMI or the genotype for a few SNPs from the plasma proteome. Our study provides a valuable resource for pediatric research and biomarker studies, and it is available for further exploration at proteomevariation.org.

We identified proteins that to the best of our knowledge were not previously linked to childhood obesity, including decreased levels of brain-enriched OLFM1, heart muscle-enriched NEBL, intestine and pancreas-enriched ANPEP and skeletal muscle-enriched FHL3. UMOD, a kidney-specific protein typically abundant in urine, was also reduced in children with obesity. Although its physiological role in serum is unclear, UMOD may have anti-inflammatory effects54 and is inversely associated with metabolic syndrome and type 2 diabetes in the older population55,56. We speculate that lower UMOD levels may contribute to the chronic low-grade inflammation observed in obesity. These findings facilitate hypothesis generation and provide insights into early metabolic dysregulation involving extrahepatic organs.

Prior studies have demonstrated the importance of large sample sizes in pQTL studies15. As far as we know, ours is the largest of its kind, although it remains moderate compared to affinity-based proteomics studies. Nevertheless, we identified pQTLs for over one-third of the quantified plasma proteome with robust peptide-level evidence, including hundreds of novel pQTLs. Our findings suggest that technological improvements in MS-based plasma proteomics will probably enhance the genetic associations detected, even within current studies. Thus, expanding proteome depth without compromising throughput or accuracy is crucial, which may be achievable with emerging workflows combining depletion, multiplexing and further improved MS acquisition schemes.

Focusing on the Danish population, this study minimized confounding effects from population stratification, genetic admixture and environmental variation, potentially enhancing the identification and replication of pQTLs. It will be interesting to extend this work to other populations.

We replicated known pleiotropy of the ABO and APOE locus, identifying both known and novel trans-pQTLs for 13 and 10 proteins, respectively. Variants in the ABO locus regulated proteins critical for homeostasis, including VWF for platelet adhesion, ACE for blood pressure regulation, adhesion proteins like PTPRM, ICAM2, CDH1, CDH5 and CDH13 as well as MBL2 and MASP1 involving innate immunity. We also identified previously unreported pQTLs in the ABO locus for PTPRM, CDH13 and MASP1 along with another potential pleiotropic locus in AKNA, which regulated proteins involved in inflammation and immune response, including ORM1, ORM2, S100A8 and S100A and TRAV17.

According to Medawar’s mutation accumulation theory, genetic regulation is more robust in early life owing to stronger selective pressures57,58. This aligns with our observation that the proportions of variance explained by pQTLs remained relatively stable across the three age groups. The high replication rate in adults further suggests that pQTLs are largely stable between children and adults, despite influences like disease. However, it would be valuable to investigate how aging affects pQTL detection in the older population as environmental factors and age-related processes become more prominent. Prior studies have shown a 4.7% decline in detected expression QTLs in the blood of patients aged 70–80 years57 and that aging impacts the predictive power of expression QTLs differently across tissues59. This indicates that pQTL stability observed in younger populations may be altered as individuals age.

Our findings emphasize the need to tailor biomarker reference levels according to their associated pQTLs. We found that incorporating pQTL data improved the performance of diagnostic biomarkers, a benefit that could extend to prognostic and predictive biomarkers.

One of the limitations of our study is that it uses a cross-sectional design. A longitudinal approach would better capture age-dependent trajectories of plasma protein levels, reducing inter-individual variability. In addition, the participants’ ages follow a normal distribution with a mean of 12 years, and there is limited representation (n < 30) at the younger (age 5 years) and older (age 20 years) extremes. This may affect the accuracy of the age-dependent protein abundance trajectories at these age extremes. Furthermore, stringent quality control excluded nearly 90% of SNPs owing to low minor allele frequency or imputation quality, potentially omitting low-frequency pQTLs. A larger sample size could help lower the minor allele frequency threshold.

Existing large-scale pQTL studies have predominantly used affinity-based proteomics platforms optimized for body fluids. Although these platforms report the quantification of thousands of proteins, results often lack consistency across studies, and large-scale validation of binding reagent specificity, ideally by orthogonal methods such as MS17,18,19, is still needed. Furthermore, up to one-third of affinity proteomics pQTLs may be affected by epitope effects60. Although such artefactual pQTLs can be estimated through conditional and colocalization analysis, they cannot be directly assessed8,14.

By contrast, MS-based proteomics is highly specific and agnostic to sample type and species, allowing for direct elimination of artefactual pQTLs using peptide-level data. For instance, our peptide-based framework excluded artefactual pQTLs for four proteins: IL6ST (rs2228043), FLT4 (rs34221241), LYZ (rs1800973) and OSMR (rs34675408). The association between FLT4 and missense variant rs34221241 arose from a single peptide, flanking the variant amino acid Asn149Asp. Similarly, LYZ’s association with rs1800973 was caused by a peptide flanking Thr88Asn. These data helped flag artefactual pQTLs identified in affinity-based studies; for example, rs11739016 for IL6ST and rs34221241 for FLT4 (ref. 15).

Additionally, our data resolved discrepancies in pQTL effect direction, such as for APOE. The T-allele of rs7412 was negatively associated with APOE levels in an aptamer-based study of the Icelandic population61 but positively associated in an MS-based study of Han Chinese23, which aligns with our data in the Danish population. These discrepancies underscore the value of MS-based proteomics in validating and extending pQTL discoveries.

MS-based proteomics is not necessarily immune to the ‘epitope effect’17. A prior study using exome sequencing and MS found that nearly half of the associations were technical artefacts when using both protein and peptide data for discovery22. Our approach addresses this issue by using protein-level data for discovery and peptide-level data for verification, classifying pQTLs into confidence tiers. Alternatively, modifying the protein sequence database upfront can also reduce artefactual pQTLs62, although it may miss variants in linkage disequilibrium with untyped protein-altering variants. Our method complements these approaches by providing a comprehensive framework for pQTL validation.

To date, affinity-based studies have identified tens of thousands of pQTLs for thousands of circulating proteins, but the search remains incomplete, especially at the tissue and cell-type levels. Many proteins only leak into plasma and primarily function in tissues, where MS-based proteomics can quantify nearly complete proteomes. A few pioneering MS-based proteomics studies with a few hundred samples have already identified pQTLs in the brain and liver63,64,65. We propose expanding these efforts by developing deep and high-throughput workflows and ensuring access to genotyped samples. Analyzing specific cell types in tissues could also be envisioned66. Future research should also explore the validation of affinity-based pQTLs using MS-based methods. These efforts will enable fruitful downstream applications like colocalization and Mendelian randomization, aiding in causal inference and drug candidate identification67,68.

Methods

Ethical approval

The study protocol was approved by the ethics committee for the Region Zealand (protocol no. SJ-104) and is registered at the Danish Data Protection Agency (REG-043-2013). The HOLBAEK Study, including the obesity clinic cohort and the population-based cohort, is also registered at ClinicalTrials.gov (NCT00928473). The study was conducted according to the principles of the Declaration of Helsinki, and oral and written informed consent was obtained from all participants. An informed oral assent was given by the participant if the participant was younger than 18, and the parents gave informed written consent.

Study participants

We included 2,147 children and adolescents (55% females, 45% males, with 17 missing values because of survey failure) aged 5–20 years from The HOLBAEK Study for pQTL discovery, and an additional 1,000 matched by age, sex and obesity status for replication (58% females, 42% males). The participants were recruited from two groups: the Children’s Obesity Clinic, Centre of Obesity Management offering the multidisciplinary childhood obesity management program at Copenhagen University Hospital Holbæk69, and a population-based cohort from schools across 11 municipalities in Zealand, Denmark70, as part of a cross-sectional study conducted between January 2009 and April 2019. Eligibility criteria for the obesity clinic group included individuals aged 5–20 years with a BMI above the 90th percentile (BMI-SDS ≥ 1.28) according to Danish reference values27. Exclusion criteria for both groups were diagnosed type 1 or type 2 diabetes, treatment with medications including insulin, liraglutide and/or metformin or meeting type 2 diabetes criteria71 based on the blood sample taken for this study (fasting plasma glucose of >7.0 mmol l−1 and/or hemoglobin A1c (HbA1c) > 48 mmol mol−1). Tanner stage72,73 was evaluated by a pediatrician for individuals recruited at the obesity clinic and self-evaluated using a questionnaire with picture pattern recognition for individuals in the population-based group. The Danish population in this study was defined based on residency in Denmark.

Plasma proteomics

We prepared 2,241 plasma samples in the discovery cohort (including 94 quality assessment samples (pooled plasma samples)) across six batches, and 1,043 plasma samples in the replication cohort (including 43 quality assessment samples) across four batches. Samples were randomized before proteomics sample preparation on an automated liquid handling system (Agilent Bravo) in a 96-well plate format (Supplementary Note 2)5,74. Peptides were partially eluted from Evotips with <35% acetonitrile and analyzed using the Evosep One liquid chromatography system (Evosep Biosystems) coupled online to an Orbitrap Astral mass spectrometer (Thermo Fisher Scientific)25,75. Eluted peptides were separated on an 8 cm-long PepSep column (150 µm inner diameter packed with 1.5 μm of Reprosil-Pur C18 beads (Dr. Maisch)) using a 21-min gradient and electrosprayed with a stainless emitter (30 µm inner diameter) at 1.9 kV. Data were acquired in DIA mode with full scan and tandem mass spectrum acquired in parallel by the Orbitrap and Astral mass analyzer, respectively. Full scans (380–980 m/z) were acquired at 240,000 resolution with a normalized automatic gain control target of 500% and 3 ms maximum injection time. Tandem mass spectra (150–2,000 m/z) were acquired at 80,000 resolution with automatic gain control of 500% and a maximum injection time of 5 ms. We used 200 3-Th isolation-window scanning from 380 to 980 m/z and fragmented ions using high-energy collision dissociation with 25% normalized collision energy.

Protein quantification was performed at the MS2 level using the Quant 2.0 algorithm in Spectronaut (v.18), with the ‘proteotypicity’ filter set to ‘only protein group specific’26. Default settings were used unless otherwise noted. Data filtering was set to ‘Qvalue’. ‘Cross-run normalization’ was enabled with the strategy of ‘local normalization’ based on rows with ‘Qvalue complete’. False discovery rate was set to 1% at both protein and peptide precursor levels. A spectral library generated using direct DIA from a subset of the samples in this study was used in the targeted analysis of DIA data against the human reference canonical proteome database (2023.05 release). We excluded samples with a protein count lower than the median minus three times the standard deviation across all runs (n = 2). The plasma proteomics dataset was filtered for 60% valid values across all samples (proteins with >40% missing values were excluded from downstream statistical analysis), log2 transformed with the remaining missing values imputed by drawing random samples from a normal distribution with downshifted mean by 1.8 and scaled standard deviation (0.3) relative to that of abundance distribution of all proteins in one sample. Specifically, a total of 2,498 proteins were quantified, filtering for 60% valid values across all samples, resulting in a dataset of 1,216 proteins with a data completeness of 91%. The resulting dataset was then corrected for sample preparation batches using ComBat (v.0.3.2)76.

Tissue specificity annotation was according to the Human Protein Atlas classification, which is based on transcriptomics data77. Biological processes and functions represented by the quantified proteins were mapped using Gseapy (v.1.1.1).

Association of plasma proteome with age, sex and BMI-SDS

Plasma protein levels were normalized by rank-based inverse normal transformation with a Python implementation (https://github.com/edm1/rank-based-INT). The default Blom offset of c = 3/8 was adopted. We used the following equation:

$${Y}_{i}^{t}={Phi }^{-1}left(frac{{r}_{i}-c}{N-2c+1}right)$$

where ri is the rank of the ith observation among the total number of N, and Φ−1 denotes the quantile function (or percent point function) implemented in SciPy (v.1.7.1; https://scipy.org/citing-scipy). We used multiple linear regression implemented in Pingouin78 (v.0.4.0) to estimate the effect of age, sex, BMI-SDS27, obesity and the interaction of obesity with BMI-SDS on the protein level. Associations were considered significant if the Benjamini–Hochberg-corrected P values were below 0.05. We observed that the largest variation in the plasma proteome is driven by platelet markers, and further investigation revealed unequal distribution of platelet marker levels across the sample collection period (Supplementary Fig. 1e–g), suggesting a systematic bias related to storage duration, although sampling-related technical factors cannot be ruled out. We included pubertal status and time to analysis (plasma sample storage) as covariates in the regression model. We further adjusted for principal component 1 to eliminate potential confounding. Pubertal status was dichotomized according to Tanner stage (Tanner 1, pre-pubertal; Tanner 2–5, pubertal or post-pubertal). The normality of the residuals was tested using the Shapiro–Wilk test implemented in the Python package Pingouin (v.0.5.4), which showed that residuals were normally distributed for 90% of the proteins after rank-based inverse normalized transformation. Linearity between protein levels and predictors was assumed for all proteins as a blanket approach but was not formally tested. Variance inflation factor analysis implemented in the Python package Statsmodels (v.0.13.0) showed no serious multicollinearity among predictors of primary interest (age, sex and BMI-SDS), with all variance inflation factors below ten and those for age and sex below five. Samples with missing values of variables in the regression analysis were excluded (n = 1,601 remained including 646 males and 955 females with a median age of 12 years).

Prediction of age and BMI using plasma proteins

We used the linear regression model from scikit-learn79 (v.1.0) to predict age and BMI (here we used the BMI without adjusting for age and sex). We split the dataset into training (70%) and test sets (30%), with the training set further divided into training and validation sets (at a ratio of 70:30). Features were ranked by their absolute correlation coefficients with the outcome. We trained models with an increasing number of features (1–200) and evaluated them on the validation set to determine the optimal number of features based on an incremental decrease in mean squared error. The final model with the selected features was then evaluated on the held-out test set. Mean absolute error and Pearson’s r between the predicted values and real values in the test set were calculated to indicate prediction accuracy.

Hierarchical cluster analysis of protein trajectories

Unsupervised hierarchical clustering of age-associated proteins was performed using the Python seaborn package (v.0.12.2). Proteins that passed the Benjamini–Hochberg-corrected P value with an absolute coefficient above 0.06 were included. Row clustering was based on median log2-intensity after Z-score normalization across ages.

Genotyping and imputation in the discovery cohort

Participants in this study were genotyped in three batches on the Infinium HumancoreExome12 (v.1.0) and HumancoreExome24 (v.1.1) Beadchips (Illumina). Genotypes were called using the Genotype module of the GenomeStudio (Illumina). Before imputation, datasets from the three different batches were merged after quality control (only variants present on both chip versions were kept), and monomorphic variants and batch-associated variants were removed (Fisher’s exact test, P < 1 × 10−7). We used the Sanger imputation server to phase the genotype data using EAGLE2 (v.2.0.5) and impute it using PBWT with the HRC1.1 panel (GRCh37). We excluded individuals with more than 5% missing genotypes, with heterozygosity that was too high or too low (inbreeding coefficient abs(F) > 0.2), duplicated measurements (keeping the one with higher quality) as well as individuals of non-European descent as determined using principal component analysis based on ancestry informative markers. All study samples whose Euclidean distance from the center fell outside a radius of >1.5× the maximum Euclidean distance of the European reference samples (the 1000 Genomes dataset) were considered non-European. We excluded SNPs with a call rate of <95% and actionable variants. We conducted additional quality control steps for genetic association analysis according to the guidelines80 using PLINK (v.1.90b6.24) and custom R scripts. In brief, we checked for sex discrepancy based on the X chromosome inbreeding coefficient and none had mismatches. We removed SNPs with an imputation INFO score of <0.7, minor allele frequency of <0.05 and SNPs that deviated from Hardy–Weinberg equilibrium (P < 1 × 10−6) as well as SNPs on the sex chromosomes. We removed individuals with high or low heterozygosity rates (individuals who deviate ±3 s.d. from the sample’s heterozygosity rate mean), resulting in a final dataset of 5,242,958 SNPs and 1,924 individuals (846 males, 1,078 females).

Genome-wide association analysis

For each protein, we adjusted rank-based inverse normal transformed levels for age, sex, BMI-SDS, binary obesity status, obesity × BMI-SDS interaction, PC1 and plasma sample storage time. We standardized the residuals again using rank-based inverse normal transformation and used the standardized values as phenotypes and genotyping arrays as covariates for genome-wide association testing using a univariate linear mixed model implemented in GEMMA81 (v.0.98.5). We calculated the centered relatedness matrix to control for cryptic relatedness and population stratification. In total, data from 1,909 individuals passed the genotype data quality control and included proteomics and covariate data. We used the Wald test to compute all P values. The genomic inflation factor from GWAS results was calculated for each protein using the open-source Python script compute_lambda.py (v.2.0).

Annotation of SNPs

SNP effects were annotated using the Ensemble Variant Effect Predictor with the RefSeq transcript database. The nearest gene of the SNPs is annotated using GeneLocator (v.1.1.2), a Python package that returns a list of genes or overlapping genes in which the SNP is included, or the gene whose start or end is closest to the specified coordinates in the case a SNP does not fall within any genes.

Eliminating artefactual pQTLs

Protein-altering variants can create artefactual pQTLs in both affinity-based and bottom-up MS-based proteomics. In bottom-up MS-based proteomics, this occurs because the reference proteome database lacks variant protein versions17. Consequently, peptides containing variant amino acids are undetected, and only the reference version is quantified in individuals with the reference allele, creating artificial differences in observed peptide abundances, potentially biasing protein quantification if the reference peptide is used.

Bottom-up MS-based proteomics provides multiple measurements at the tryptic peptide level per protein, offering greater granularity for protein quantification. We leveraged this by assessing pQTLs using peptide-level data and categorizing genuine pQTLs into confidence tiers based on the number of supporting peptides. We performed genome-wide association analysis on approximately 10,000 quality-controlled and protein-group-specific tryptic peptides; that is, peptides unique to a protein group in the context of the canonical human protein sequence database. A peptide is considered significantly associated with a variant if the P value is below the threshold after adjusting for the number of primary variant–protein associations (2.4 × 10−5).

  • Tier 1: a variant–protein association with at least two supporting peptides showing directionally concordant associations;

  • Tier 2: a variant–protein association with only one supporting peptide, the SNP is non-synonymous and does not reside within the protein-coding gene;

  • Tier 3: a variant–protein association with only one supporting peptide, the SNP is non-synonymous and resides within the protein-coding gene. However, the SNP-encoded single amino acid variant should not be part of the supporting peptide;

  • Tier 4: a variant–protein association with only one supporting peptide, and the SNP is synonymous.

For tiers 2–4, the total number of peptides available for testing must not exceed three, as the SNP–protein associations in these tiers are supported by only one peptide.

Decomposition of variance in pediatric plasma protein levels

We estimated the variance explained by various predictors using a stepwise addition approach in a multiple regression model. Predictors were added in this order: independent pQTLs, sex, age, obesity status, BMI-SDS and the interaction between BMI-SDS and obesity status. Incremental sums of adjusted squares were calculated at each step to determine the additional variance explained.

Effect size of pQTLs

In addition to beta statistics derived from the association test, we determined pQTL effect sizes by calculating the ratio of mean protein abundance in heterozygotes (1/0) and homozygotes (1/1) relative to wild type (0/0), without data transformation.

Prediction of genotype based on protein levels

We used QLattice82, a symbolic regression method implemented in the Feyn Python module (v.3.0.1) to predict the presence of variant alleles based on protein levels in the derivation cohort. The model was allowed to include age and sex as predictors. The derivation cohort was randomly split into a training set (70%) and a validation set (30%), stratified by genotype. Classification performance was evaluated on the validation set, which was excluded from training.

Comparison with previous pQTL studies

To assess the novelty of the identified pQTLs, we compared our results at genome-wide significance level to 35 previously published pQTL studies, encompassing over 100,000 pQTLs for 5,000 distinct genes from various proteomics platforms, primarily SomaScan and Olink panels. For all studies, we retained the pQTLs at the reported significance levels. We defined novelty if no variants within ±1 Mb of our primary pQTLs had been previously reported for the corresponding protein. Replication was defined otherwise. Linkage disequilibrium was not considered in this analysis. A comparison was done at the protein level by matching the reported gene name from each study. Gene names are mapped based on Uniprot IDs through Uniport ID mapping in case of missing values. Genome coordinates based on GRCh38 were converted to GRCh37 using pyliftover (v.0.4).

Mapping pQTLs to GWAS catalog results

We mapped the primary pQTLs to GWAS Catalog results (v.1.0.2) through SNP IDs to identify any associations with disease traits. Linkage disequilibrium was not considered in this analysis.

Two-sample Mendelian randomization

We performed systematic Mendelian randomization on 47 cardiometabolic GWAS outcomes spanning obesity, diabetes, atherosclerotic cardiovascular disease, metabolic dysfunction-associated steatohepatitis, Alzheimer’s disease and chronic kidney disease. Mendelian randomization was calculated with the Wald ratio method (TwoSampleMR v.0.5.6)83 to test whether the top cis-pQTLs (variant with the lowest P value) were causally linked to any GWAS outcomes, with significance defined as Wald ratio Mendelian randomization P < 2.5 × 10−6 (correcting for the number of protein-coding genes).

Colocalization analysis

We performed pairwise colocalization analysis using HyPrColoc (R package hyprcoloc v.1.0.0)84. The prior probability of a SNP being causal to one trait in a region was set to 1 × 10−4 by default. A colocalization signal was defined by a regional probability of ≥0.6 and a posterior probability of ≥0.7 with non-uniform priors.

Statistics and reproducibility

Sample size was not predetermined statistically, as this depends on the proteomics workflow throughput, but our sample sizes are similar to those reported in previous publications38,85,86. Samples failing proteomics quality control (n = 2; Methods) were excluded to minimize potential analytical biases. Additionally, we excluded participants with genomic data not meeting quality control standards and non-European ancestry (Methods) for population homogeneity. Plasma samples were randomized, and experimenters were blinded to participants’ genotype data but not age, BMI or sex.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Related Articles

Type 2 immunity in allergic diseases

Significant advancements have been made in understanding the cellular and molecular mechanisms of type 2 immunity in allergic diseases such as asthma, allergic rhinitis, chronic rhinosinusitis, eosinophilic esophagitis (EoE), food and drug allergies, and atopic dermatitis (AD). Type 2 immunity has evolved to protect against parasitic diseases and toxins, plays a role in the expulsion of parasites and larvae from inner tissues to the lumen and outside the body, maintains microbe-rich skin and mucosal epithelial barriers and counterbalances the type 1 immune response and its destructive effects. During the development of a type 2 immune response, an innate immune response initiates starting from epithelial cells and innate lymphoid cells (ILCs), including dendritic cells and macrophages, and translates to adaptive T and B-cell immunity, particularly IgE antibody production. Eosinophils, mast cells and basophils have effects on effector functions. Cytokines from ILC2s and CD4+ helper type 2 (Th2) cells, CD8 + T cells, and NK-T cells, along with myeloid cells, including IL-4, IL-5, IL-9, and IL-13, initiate and sustain allergic inflammation via T cell cells, eosinophils, and ILC2s; promote IgE class switching; and open the epithelial barrier. Epithelial cell activation, alarmin release and barrier dysfunction are key in the development of not only allergic diseases but also many other systemic diseases. Recent biologics targeting the pathways and effector functions of IL4/IL13, IL-5, and IgE have shown promising results for almost all ages, although some patients with severe allergic diseases do not respond to these therapies, highlighting the unmet need for a more detailed and personalized approach.

Iron homeostasis and ferroptosis in muscle diseases and disorders: mechanisms and therapeutic prospects

The muscular system plays a critical role in the human body by governing skeletal movement, cardiovascular function, and the activities of digestive organs. Additionally, muscle tissues serve an endocrine function by secreting myogenic cytokines, thereby regulating metabolism throughout the entire body. Maintaining muscle function requires iron homeostasis. Recent studies suggest that disruptions in iron metabolism and ferroptosis, a form of iron-dependent cell death, are essential contributors to the progression of a wide range of muscle diseases and disorders, including sarcopenia, cardiomyopathy, and amyotrophic lateral sclerosis. Thus, a comprehensive overview of the mechanisms regulating iron metabolism and ferroptosis in these conditions is crucial for identifying potential therapeutic targets and developing new strategies for disease treatment and/or prevention. This review aims to summarize recent advances in understanding the molecular mechanisms underlying ferroptosis in the context of muscle injury, as well as associated muscle diseases and disorders. Moreover, we discuss potential targets within the ferroptosis pathway and possible strategies for managing muscle disorders. Finally, we shed new light on current limitations and future prospects for therapeutic interventions targeting ferroptosis.

Integrated proteogenomic characterization of ampullary adenocarcinoma

Ampullary adenocarcinoma (AMPAC) is a rare and heterogeneous malignancy. Here we performed a comprehensive proteogenomic analysis of 198 samples from Chinese AMPAC patients and duodenum patients. Genomic data illustrate that 4q loss causes fatty acid accumulation and cell proliferation. Proteomic analysis has revealed three distinct clusters (C-FAM, C-AD, C-CC), among which the most aggressive cluster, C-AD, is associated with the poorest prognosis and is characterized by focal adhesion. Immune clustering identifies three immune clusters and reveals that immune cluster M1 (macrophage infiltration cluster) and M3 (DC cell infiltration cluster), which exhibit a higher immune score compared to cluster M2 (CD4+ T-cell infiltration cluster), are associated with a poor prognosis due to the potential secretion of IL-6 by tumor cells and its consequential influence. This study provides a comprehensive proteogenomic analysis for seeking for better understanding and potential treatment of AMPAC.

Targeting of TAMs: can we be more clever than cancer cells?

With increasing incidence and geography, cancer is one of the leading causes of death, reduced quality of life and disability worldwide. Principal progress in the development of new anticancer therapies, in improving the efficiency of immunotherapeutic tools, and in the personification of conventional therapies needs to consider cancer-specific and patient-specific programming of innate immunity. Intratumoral TAMs and their precursors, resident macrophages and monocytes, are principal regulators of tumor progression and therapy resistance. Our review summarizes the accumulated evidence for the subpopulations of TAMs and their increasing number of biomarkers, indicating their predictive value for the clinical parameters of carcinogenesis and therapy resistance, with a focus on solid cancers of non-infectious etiology. We present the state-of-the-art knowledge about the tumor-supporting functions of TAMs at all stages of tumor progression and highlight biomarkers, recently identified by single-cell and spatial analytical methods, that discriminate between tumor-promoting and tumor-inhibiting TAMs, where both subtypes express a combination of prototype M1 and M2 genes. Our review focuses on novel mechanisms involved in the crosstalk among epigenetic, signaling, transcriptional and metabolic pathways in TAMs. Particular attention has been given to the recently identified link between cancer cell metabolism and the epigenetic programming of TAMs by histone lactylation, which can be responsible for the unlimited protumoral programming of TAMs. Finally, we explain how TAMs interfere with currently used anticancer therapeutics and summarize the most advanced data from clinical trials, which we divide into four categories: inhibition of TAM survival and differentiation, inhibition of monocyte/TAM recruitment into tumors, functional reprogramming of TAMs, and genetic enhancement of macrophages.

Stromal architecture and fibroblast subpopulations with opposing effects on outcomes in hepatocellular carcinoma

Dissecting the spatial heterogeneity of cancer-associated fibroblasts (CAFs) is vital for understanding tumor biology and therapeutic design. By combining pathological image analysis with spatial proteomics, we revealed two stromal archetypes in hepatocellular carcinoma (HCC) with different biological functions and extracellular matrix compositions. Using paired single-cell RNA and epigenomic sequencing with Stereo-seq, we revealed two fibroblast subsets CAF-FAP and CAF-C7, whose spatial enrichment strongly correlated with the two stromal archetypes and opposing patient prognosis. We discovered two functional units, one is the intratumor inflammatory hub featured by CAF-FAP plus CD8_PDCD1 proximity and the other is the marginal wound-healing hub with CAF-C7 plus Macrophage_SPP1 co-localization. Inhibiting CAF-FAP combined with anti-PD-1 in orthotopic HCC models led to improved tumor regression than either monotherapy. Collectively, our findings suggest stroma-targeted strategies for HCC based on defined stromal archetypes, raising the concept that CAFs change their transcriptional program and intercellular crosstalk according to the spatial context.

Responses

Your email address will not be published. Required fields are marked *