Associations between common genetic variants and income provide insights about the socio-economic health gradient

Main

Income is a crucial determinant of individuals’ access to resources and overall quality of life. Extensive evidence shows that income is positively correlated with subjective well-being, overall health and life expectancy1,2,3,4,5. For instance, the gap in life expectancy between the richest and poorest 1% of individuals in the USA has been estimated to be 14.6 years for men (95% confidence interval (CI), 14.4 to 14.8 years) and 10.1 years for women (95% CI, 9.9 to 10.3 years)6. Notably, higher income is associated with increased longevity and well-being across the entire income distribution, highlighting its broad relevance in current society3,6,7.

Income is a complex phenotype influenced by many factors, including environmental conditions and education8,9. Parents’ socio-economic status (SES) shapes a child’s developmental trajectory, including their skills, behaviours, educational attainment (EA), career prospects and eventual adult income10,11. Moreover, certain heritable individual characteristics, such as cognitive ability and personality traits12,13,14, are well-known predictors of income within contemporary societies in Europe, North America and Australia. Twin studies have estimated income heritability in these societies to be around 40–50% (refs. 15,16,17). However, the heritabilities of income and its associated genes are not fixed; rather, they reflect social realities shaped by technological, institutional and cultural factors18. These factors are malleable and vary across different regions and historical epochs, which can lead to fluctuations in heritability estimates for SES over time19,20 and imperfect genetic correlations across samples21.

The results from statistically well-powered genome-wide association studies (GWASs) of SES present numerous opportunities to shed light on these social realities. For example, they allow investigating questions about sex differences in labour market processes, cross-country comparisons of the genetic architecture of income and investigating the processes contributing to intergenerational social mobility22. They also facilitate studies investigating the interaction effects between genetic and environmental factors. Furthermore, they enable the exploration of genetic correlations between income and health outcomes, potentially unveiling new insights into the positive relationship between socio-economic status and health outcomes (the socio-economic health gradient).

Two previous GWASs have been conducted on household income23,24. The first was in a sample of 96,900 participants from the initial release of the UK Biobank (UKB)25 and found two loci. The second was carried out in the full release of the UKB with 286,301 individuals and found 30 approximately uncorrelated loci. A meta-analysis of these results with the genetically correlated trait EA increased the effective sample size to 505,541 individuals and identified 144 loci. A recent GWAS on occupational status in the UKB data identified cognitive skills, scholastic motivation, occupational aspiration, personality traits and behavioural disinhibition (proxied by attention deficit hyperactivity disorder) as potential mediating factors linking genetics to occupational status26.

Building on these earlier contributions, we conducted a GWAS leveraging multiple income measures. We ran sex-stratified analyses and meta-analysed results from 32 cohorts across 12 countries (Australia, Croatia, Denmark, Estonia, Finland, Germany, the Netherlands, Norway, Sweden, Switzerland, the United Kingdom and the USA) and three continents, yielding the largest GWAS on income to date with an effective sample size of 668,288 (Table 1). Due to data availability and statistical power considerations, our analyses and conclusions are restricted to individuals carrying genotypes most similar to the EUR panel of the 1000 Genomes dataset, as compared with individuals sampled elsewhere in the world (1KG-EUR-like individuals).

Table 1 GWAS summary
Full size table

The greater statistical power of our GWAS enabled us to conduct a series of follow-up analyses that investigate the socio-economic health gradient from a genetic perspective. In particular, we leveraged the data to compare the GWAS results for income and EA to disentangle their unique genetic correlates with health. Furthermore, our multi-sample approach and sex-specific GWAS results allowed us to test for possible differences in the genetic architecture of income across samples and sexes.

For a less technical description of the paper and how it should—and should not—be interpreted, see the Frequently Asked Questions (FAQ) section in the Supplementary Information and Box 1.

Multivariate GWAS of income

GWAS of four different measures of income

We used four measures of income (individual, occupational, household and parental income) and conducted a GWAS meta-analysis of their shared genetic basis (Table 1). Supplementary Information Section 2.1 discusses the differences between these measures and their relative advantages and disadvantages as proxies for individual income. Dropping parental income from the meta-analysis leads to a slight statistical power decrease but does not qualitatively change our results.

A sex-stratified GWAS was carried out on each available income measure in each cohort, using at least the first 15 genomic principal components (PCs) to control for population stratification. Inflation, business cycle, age effects and other potential confounds were controlled for at the cohort level by using dummy variables (see the preregistered analysis plan, section 6, at https://osf.io/rg8sh/). We restricted our analyses to 1KG-EUR-like individuals who were not currently enrolled in an educational programme or who were aged above 30 if their current enrolment status was unknown. The natural log transformation was applied to the income measures. We applied standardized quality control procedures to each cohort-level result (see Supplementary Information Section 2.4 for details). For each sex and each income measure, we performed a sample-size-weighted meta-analysis with METAL27. We then meta-analysed the male and female results for each income measure using the multi-trait analysis of genome-wide association summary statistics method (MTAG)28, which accounts for any potential genetic relatedness between the male and female samples.

The four income measures’ pairwise genetic correlation (rg) estimates demonstrated substantial shared genetic variance, with all pairwise rg values at least 0.8 (Fig. 1).

Fig. 1: Genetic correlations between income measures.
figure 1

LDSC estimates of pairwise genetic correlations between the four input income measures, the meta-analysed income (Income Factor) and EA. The diagonal elements report SNP heritabilities from LDSC. The standard errors are reported in parentheses. Some of the results were out-of-bound estimates (exceeding 1.2).

Full size image

The Income Factor

Next, we meta-analysed the association results across the four income measures using MTAG (see Supplementary Information Section 2.5 for details). We observed that the MTAG procedure yields nearly identical results to the common factor function in genomic structural equation modelling29. Thus, we hereafter refer to the meta-analysed income as ‘the Income Factor’. Since MTAG already applies a bias correction with the intercept from linkage disequilibrium score regression (LDSC)30, we did not apply further adjustments for cryptic relatedness and population stratification.

The Income Factor GWAS was estimated to have an effective sample size (Neff) of 668,288, on the basis of occupational income’s heritability scale (Neff = 1,198,347 on the basis of individual income). The genetic correlation between individual income and the Income Factor is indistinguishable from 1 (Fig. 1).

Identification of genetic loci

Across the four GWASs on different income measures, we identified 86 non-overlapping loci in the genome (see Supplementary Information Section 2.6 for the definition of loci and lead single nucleotide polymorphisms (SNPs), and Extended Data Fig. 1a for the distribution of associated loci across the four income traits). Table 1 summarizes the results. Occupational and household income showed the most genetic associations (59 and 41 loci, respectively), as expected on the basis of sample sizes and SNP-based heritability estimates based on LDSC (occupational: h2 = 0.08, s.e. = 0.003; household: h2 = 0.06, s.e. = 0.003). Gene-based analysis was performed on the genes that overlapped with each locus using multi-marker analysis of genomic annotation (MAGMA), where 102 attained genome-wide significance, with 63 being unique to occupational income, 24 unique to household income and 55 shared between the two. No other genes attained statistical significance (Extended Data Fig. 1b).

The meta-analysis across the income measures led to a substantial increase in power, which allowed us to identify 162 loci tagged by 207 lead SNPs (Fig. 2). Of these loci, 88 were newly identified compared with the previously published GWAS household income result conducted in the UKB24. The genetic correlation of the previous household income GWAS result was 0.92 (s.e. = 0.008) with the Income Factor and 0.94 (s.e. = 0.006) when we restricted our analysis to only our household income measure.

Fig. 2: Multivariate GWAS of income.
figure 2

Manhattan plot presenting the GWAS results for the Income Factor. Unadjusted two-sided Z-test. P values are plotted on the −log10 scale. The red crosses indicate the lead SNPs found from FUMA (r2 < 0.1). The horizontal dashed line indicates genome-wide significance (P < 5 × 108).

Full size image

Furthermore, we conducted conditional and joint association analysis using the 207 lead SNPs associated with the Income Factor31, revealing 57 secondary lead SNPs (P ≤ 5 × 10−8). Of these secondary lead SNPs, 55 were located within the original primary genomic loci (Supplementary Table 30 and Supplementary Information Section 2.6).

Effect sizes

The effect sizes of the lead SNPs were small across all analyses. For example, when we adjusted for the statistical winner’s curse in the Income Factor results, one additional count in the effect allele of the median lead SNP was associated with an increase in income of 0.30%. These effect size calculations require an assumption about the standard deviation of the dependent variable because MTAG yields standardized effect size estimates; we used the standard deviation estimate of log hourly occupational wage from the UKB, which is 0.35. The estimated effects at the 5th and 95th percentiles of the SNP effect size distribution were 0.18% and 0.60%, respectively (Supplementary Information Section 2.7). To put these estimates into perspective, the median annual earnings of full-time workers in the USA was U$56,473 in 202132. A 0.3% increase would equal an additional annual income of US$169 (95% CI, US$102 to US$339). In terms of the variance explained (R2), all of the lead SNPs had R2 lower than 0.011% after adjustment for the statistical winner’s curse (Supplementary Fig. 1).

Cross-sex and cross-country heterogeneity

The heritability of income and its genetic associations may vary across different social environments or different groups within an environment. To investigate the potential heterogeneity of genetic associations with income, we examined cross-cohort genetic correlations. We found that the inverse-variance-weighted mean genetic correlations across pairs of cohorts were 0.45 (s.e. = 0.22) for individual income, 0.52 (s.e. = 0.13) for household income and 0.90 (s.e. = 0.24) for occupational income (Supplementary Table 28a–c).

Next, we meta-analysed cohorts from the same country with the same income measure available and estimated the genetic correlations across these countries (Estonia, the Netherlands, Norway, the United Kingdom and the USA; Extended Data Fig. 2a). For most country pairs, the genetic correlation of the same income measure is >0.8. While meta-analysis increases statistical power and yields more precise estimates of the average effect size, it also tends to mask non-random heterogeneity in effect size estimates across samples. Despite this latter point, we found that occupational income in Norway displayed lower genetic correlations with occupational or household income in other countries, ranging from 0.43 (s.e. = 0.23) to 0.82 (s.e. = 0.10). Similarly, occupational income’s genetic correlation with EA was also lower in Norway (rg = 0.69, s.e. = 0.08) than in the other countries. These findings align with phenotypic evidence that ranks Norway the lowest among Organisation for Economic Co-operation and Development countries in terms of financial returns for obtaining a college degree33.

We then investigated whether the large number of samples from the United Kingdom in our meta-analysis could have skewed our results. To address this, we conducted a separate meta-analysis procedure for the UK and non-UK cohorts, comprising participants from ten countries. We obtained two distinct GWAS results for the Income Factor and found a perfect genetic correlation of 1.001 (s.e. = 0.03) between them. Thus, the average effect sizes of SNPs associated with the Income Factor are almost identical in UK and non-UK cohorts.

We observed slight between-sex heterogeneity in the genetic associations of income, as supported by the evidence presented in Extended Data Fig. 2b. The estimated between-sex genetic correlations based on meta-analysed GWAS results for individual, occupational and household income were 1.06 (s.e. = 0.32), 0.91 (s.e. = 0.03) and 0.95 (s.e. = 0.03), respectively. Notably, the latter two estimates were statistically distinguishable from unity but remained above 0.9. Most cohort-specific cross-sex genetic correlations for income are too noisy to be interpreted (Supplementary Table 17b–d). One exception is the UKB sample, which shows a non-perfect genetic correlation between men and women for occupational income (rg = 0.91, s.e. = 0.03). Another exception is the Danish iPsych cohort, where we estimated a genetic correlation of 0.76 (s.e. = 0.10) between maternal and paternal income. These findings are consistent with the hypothesis that men and women face non-identical labour market conditions. The lower genetic correlation between maternal and paternal income suggests that differences in labour market conditions were more pronounced in previous generations.

We also conducted the Income Factor GWAS for the male and female results separately and found that their genetic correlation was statistically indistinguishable from 1 (rg = 0.98, s.e. = 0.02).

Comparison with EA

Genetic correlation with EA

To compare the GWAS results for the Income Factor with those for EA, we first conducted an auxiliary GWAS on EA to obtain the most-powered GWAS result for EA with the summary statistics currently available to us. We first carried out a GWAS of EA in the UKB, on the basis of the protocol of the latest EA GWAS (EA4)34. We then meta-analysed these GWAS results with the EA3 summary statistics21 that did not include the UKB, using the meta-analysis version of MTAG. While previous GWASs on income found somewhat inconsistent results on the genetic correlation between EA21,34 and income (rg = 0.90, s.e. = 0.04 (ref. 23) and rg = 0.77, s.e. = 0.02 (ref. 24)), with much greater precision, we found a high genetic correlation that is very close to the first reported estimate (rg = 0.917, s.e. = 0.006). Among the input income measures, the genetic correlation with EA was higher for occupational and parental income (rg = 0.95 and 0.92; s.e. = 0.01 and 0.05, respectively) and lower for individual and household income (rg = 0.81 and 0.82; s.e. = 0.07 and 0.01, respectively). Furthermore, 138 of 161 loci for the Income Factor overlapped with those for EA.

The rg estimate of 0.917 between the Income Factor and EA implies that only 1 − 0.9172 = ~16% of the genetic variance of the Income Factor would remain once the genetic covariance with EA was statistically removed.

GWAS-by-subtraction

We employed the GWAS-by-subtraction approach using genomic structural equation modelling29 to identify this residual genetic signal (referred to as ‘NonEA-Income’). We identified one locus of genome-wide significance for NonEA-Income, marked by the lead SNP rs34177108 on chromosome 16 (Extended Data Fig. 3c). This locus was previously found to be associated with vitamin D levels and cancer, as well as hair- and skin-related traits such as colour and sun exposure, possibly picking up on uncontrolled population stratification (Supplementary Tables 38–41).

Polygenic prediction

We conducted polygenic index (PGI) analyses with individuals of European ancestry in the Swedish Twin Registry (STR), which was not included in our meta-analysis. We chose the STR as the main prediction cohort because it has twins and administrative data on individual, occupational and household income. We also used the UKB siblings (UKB-sib) and the Health and Retirement Study (HRS) from the USA as prediction cohorts. For the UKB-sib, occupational and household income measures were available. For the HRS, a self-reported individual income measure was available. In the STR and the UKB-sib cohorts, except when examining within-family prediction, we randomly selected only one individual from each family.

After generating hold-out versions of GWASs on the Income Factor and EA to remove the sample overlap with each prediction sample, we constructed PGIs for the Income Factor and EA using LDpred2 (ref. 35). Before conducting prediction analyses, we residualized the log of income on demographic covariates, including a third-degree polynomial of age, the year of observation and interactions with sex. We measured the prediction accuracy as the incremental R2 from adding the PGI to a regression of the phenotype on a set of baseline covariates, which were the top 20 genetic PCs and genotype batch indicators.

A cohort-specific upper bound for the theoretically possible predictive accuracy of PGIs on income can be obtained by the GREML36 estimate of the SNP-based heritability of income, which is close to 10% for the available income measures in the STR and UKB-sib samples (Supplementary Table 13). The actual predictive accuracy of PGIs for income is lower than the theoretical maximum, primarily due to finite GWAS sample size but also due to imperfect genetic correlations across meta-analysed cohorts and differences in measurement accuracy of income across samples37.

In the STR (Fig. 3), the Income Factor PGI predicted ΔR2 = 1.3% (95% CI, 1.0–1.6%) for individual income, 3.7% (95% CI, 3.1–4.2%) for occupational income and 1.0% (95% CI, 0.6–1.4%) for household income. The EA PGI had predictive accuracy results in a similar range for individual and household income, but not for occupational income, for which the accuracy was larger: ΔR2 = 4.7% (95% CI, 4.0–5.4%). Supplementary Fig. 2 shows average income levels per PGI quintile in the STR sample. The expected income of individuals increases monotonically for higher PGI quintiles. Predictive accuracy is the highest for individual income, the most accurate measure of income (derived from Swedish registry data). The difference in average income for individuals in the lowest and highest quintiles of the PGI distribution is ~0.2 standard deviations.

Fig. 3: Polygenic prediction of income measures.
figure 3

Polygenic prediction results in the STR, the UKB-sib and the HRS with PGIs for Income Factor and EA. Prior to fitting the regressions, each phenotype was residualized for demographic covariates (sex, a third-degree polynomial of age and interactions with sex) within each wave, and the mean of the residuals was obtained across the waves for each individual (only a single wave for the UKB-sib). Incremental R2 is the difference between the R2 from regressing the residualized outcome on the PGI and the controls (20 genetic PCs and genotyping batch indicators) and the R2 from a regression only on the controls. Only individuals of European ancestry were included, and one sibling from each family was randomly chosen: N = 24,946 (individual), 19,245 (occupational) and 15,655 (household) for the STR; 15,556 (occupational) and 18,303 (household) for the UKB-sib; and 6,171 (individual) for the HRS. The error bars indicate 95% CIs obtained by bootstrapping the sample 1,000 times.

Full size image

In the UKB-sib, the predictive accuracy of the Income Factor PGI was ΔR2 = 4.7% (95% CI, 4.3–5.2%) for occupational income and 3.9% (95% CI, 3.5–4.3%) for household income. The EA PGI achieved a better predictive accuracy for occupational income (ΔR2 = 6.9%; 95% CI, 6.3–7.4%) but only slightly a higher one for household income (ΔR2 = 4.4%; 95% CI, 3.9–4.8%). In terms of the coefficient estimates in the UKB-sib, a one-standard-deviation increase in the Income Factor PGI was associated with a 7.2% increase in occupational income (95% CI, 6.7–7.7%) and a 12.3% increase in household income (95% CI, 11.4–13.2%). These estimates are comparable to the effect of one additional year of schooling on income, whose estimates tend to range from 5% to 15% (refs. 8,9,38).

In the HRS, the Income Factor PGI had ΔR2 = 2.7% (95% CI, 2.1–3.3%) for predicting individual income, which was close to the EA PGI’s result (ΔR2 = 3.1%; 95% CI, 2.4–3.8%).

The predictive power of the Income Factor PGI approached zero once EA or the EA PGI was controlled for. In the UKB-sib, ΔR2 decreased below 1% for occupational and household income, while the estimates were still statistically different from zero (Extended Data Fig. 4 and Supplementary Table 21).

Although the income PGI is useful for population-level analyses, its predictive accuracy is far too low to make forecasts about the income of any specific individual (Supplementary Information FAQ Section 3.2). Furthermore, the predictive accuracy of our income PGI is substantially reduced from 4–5% among European-ancestry samples to 0–2% among African, Caribbean, Indian, East Asian and South Asian samples in the UKB (Supplementary Fig. 3 and Supplementary Information Section 5.3).

Direct versus indirect genetic effects

We estimated the share of the direct genetic effect in the overall population effect captured by the Income Factor PGI, following the recent approach that imputes parental genotypes from first-degree relatives34,39. Using the UKB-sib sample, we isolated the direct effect of the PGI from the population effect on occupational and household income by controlling for parental PGIs. We found that the ratio of direct effect to population effect estimates is 0.51 (s.e. = 0.05) for occupational income and 0.49 (s.e. = 0.05) for household income (Supplementary Table 22). These results imply that only 24.0% or 25.7% of the Income Factor PGI’s predictive power was due to direct genetic effects, which was very close to the result for the EA PGI estimated elsewhere (25.5%)39.

Income and health

Genetic correlations with psychiatric and health traits

We next explored the genetic correlations of the Income Factor, EA and NonEA-Income with phenotypes related to behaviours, psychiatric disorders and physical health (Fig. 4). LDSC estimates revealed that the genetic correlations of EA and the Income Factor largely align. However, noticeable differences emerged for traits in the psychiatric and psychological domains. Specifically, NonEA-Income is associated with a reduced risk for certain psychiatric disorders previously reported to correlate positively with EA40,41,42. These discrepancies were observed for schizophrenia (rg = −0.29, s.e. = 0.04), autism spectrum (rg = −0.27, s.e. = 0.06) and obsessive-compulsive disorder (rg = −0.22, s.e. = 0.08). One possible interpretation of these findings is that these psychiatric disorders have more severe negative effects on individual performance in the labour market than in the educational system.

Fig. 4: Genetic correlation estimates with health outcomes.
figure 4

Genetic correlation estimates of Income Factor, NonEA-Income and EA with health outcomes. Point estimates were obtained from LDSC and are displayed as dots. The whiskers show 95% CIs. The black asterisks indicate statistical significance of NonEA-Income at the FDR of 5%. The red asterisks indicate that the estimate is also significantly different from the estimate for EA at the FDR of 5%. The standard error for the difference was computed from jackknife estimates. Detailed results for all traits, including the sample size for each of the traits, is presented in Supplementary Table 23. ADHD, attention deficit hyperactivity disorder.

Full size image

Intriguingly, NonEA-Income exhibits a near-zero genetic correlation with cognitive performance (rg = 0.03, s.e. = 0.03). At the same time, both EA and the general Income Factor display strong positive genetic correlations with this factor (rg = 0.66, s.e. = 0.01 and rg = 0.63, s.e. = 0.01, respectively). This may suggest that high cognitive performance primarily influences income through education. Furthermore, this result is consistent with high income being attainable through social connections, inherited wealth, entrepreneurial activities or well-paying jobs that do not require high cognitive performance.

While EA and the general Income Factor have substantial negative genetic correlations with health-related behaviours such as age of smoking initiation, smoking persistence, cigarettes per day and alcohol dependence, we found that NonEA-Income has near-zero genetic correlations with these traits (although the latter have substantially larger error margins of the point estimates).

NonEA-Income also displayed genetic correlations with other phenotypes that are similar to those of EA. Specifically, NonEA-Income had negative genetic correlations with major depressive disorder (rg = −0.15, s.e. = 0.04), anxiety disorder (rg = −0.19, s.e. = 0.05) and the related trait of neuroticism (rg = −0.14, s.e. = 0.03), but positive genetic correlations with subjective well-being (rg = 0.32, s.e. = 0.06), general risk tolerance (rg = 0.13, s.e. = 0.04) and height (rg = 0.11, s.e. = 0.03). The differences in correlations for neuroticism, subjective well-being and risk tolerance were substantial when comparing EA and NonEA-Income, with NonEA-Income showing stronger positive correlations with well-being and risk tolerance and a less negative correlation with neuroticism (Supplementary Table 23).

Phenome-wide association study on electronic health records

Next, we conducted a phenome-wide association study of the Income Factor PGI on the basis of electronic health records from the UKB-sib’s holdout sample. We tested 115 diseases with sex-specific sample prevalence no lower than 1%. In total, 50 diseases from different categories were associated with the Income Factor PGI after Bonferroni correction and 14 after controlling for parental PGI (Fig. 5, Extended Data Fig. 5 and Supplementary Table 27a,b). In all cases, a higher Income Factor PGI value was associated with reduced disease risk, including reduced risk for hypertension, gastroesophageal reflux disease, type 2 diabetes, obesity, osteoarthritis, back pain and depression. The strongest association of a higher Income Factor PGI and a disease was found for essential hypertension.

Fig. 5: Phenome-wide association study of the Income Factor PGI (without parental PGI controls) in electronic health records for the UKB-sib sample.
figure 5

The genetic association of Income Factor PGI with 115 diseases from 15 categories without controlling for parental PGIs. The yellow boxes, with arrows pointing to the observations and −log10(P) values reported after the phenotypes, highlight diseases that are strongly associated with the Income Factor PGI (−log10(P) > 10). The P values were obtained via unadjusted two-sided Z-tests. The black and red dashed lines represent the threshold for statistical significance at P < 0.05. GERD, gastroesophageal reflux disease.

Full size image

Biological annotation

We used functional mapping and annotation of genetic associations (FUMA)43 to find genes implicated in the Income Factor GWAS. FUMA uses four mapping approaches: positional, chromatin interaction, expression quantitative trait locus mapping and MAGMA gene-based association tests. In total, 2,385 protein-coding genes were implicated by at least one of the methods, of which 225 genes were implicated by all four methods (Extended Data Fig. 6a). Only three of these commonly implicated genes were unique for the Income Factor, compared with the genes implicated in EA GWASs by at least one of the four methods or previously prioritized for EA21.

We then performed tissue-specific enrichment analyses using LDSC-SEG44 and MAGMA gene-property analyses45 (Supplementary Information Section 7). Both methods indicated dominant enrichment for tissues of the central nervous system (Extended Data Fig. 6b), consistent with the previous results for household income and EA21,24.

Next, we compared the genes identified with MAGMA for the Income Factor with those identified for EA and household income. We found that of the 368 genes associated with the Income Factor, 98 had not yet been discovered for EA or household income (Extended Data Fig. 7a and Supplementary Tables 32–34). We further examined the biological processes of genes associated with the Income Factor, EA and household income with FUMA GENE2FUNC. Using a test of overrepresentation, we found three biological processes at a false discovery rate (FDR) of <0.05 that are unique to the Income Factor: neuronal migration (FDR = 0.012), bone formation in early development (FDR = 0.036) and the formation of axons (FDR = 0.047). The overlap among biological processes detected for each trait at FDR < 0.05 is shown in Extended Data Fig. 7b (Supplementary Tables 35–37).

Discussion

We conducted the largest GWAS on income to date, incorporating individual, household, occupational and parental income measures. Our study design provided increased statistical power, identifying more genetic variants and improving the predictive power of the PGI compared with previous income GWASs. It also allowed for comprehensive additional analyses. Furthermore, we found a strong genetic correlation between income and EA.

Our analyses highlight numerous associations between better health and higher income that are influenced by genetic differences among individuals. These better health outcomes include lower body mass index, blood pressure, type 2 diabetes, depression and stress-related disorders. We note that the genetic overlap between income and health could be driven by different causal mechanisms, including pleiotropic effects of genes, limited income opportunities for individuals with health problems or health advantages for individuals with higher income. Investigating these causal mechanisms is outside the scope of this study.

Previous work examining the relationship between different measures of SES have found that household income, EA, occupational prestige and social deprivation all draw on similar underlying heritable traits46. Despite this general genetic factor of SES, our study demonstrates that trait-specific loci are also evident, indicating that income and EA capture heritable traits unique to each of them. Specifically, we estimate that 16% of the genetic variance in income is not shared with EA. The relevance of these income-specific genetic effects is underscored by several diverging relationships with health outcomes between EA and the genetic components of income not shared with EA (NonEA-Income). For example, the genetic correlation with schizophrenia differs between income and EA (income and schizophrenia: rg = −0.04, s.e. = 0.02; EA and schizophrenia: rg = 0.06, s.e. = 0.02; Supplementary Table 23). This divergence is even stronger when NonEA-Income is considered (schizophrenia and NonEA-Income: rg = −0.23, s.e. = 0.04). Furthermore, we found negative genetic correlations of NonEA-Income with bipolar disorder, autism and obsessive-compulsive disorder, while EA exhibits positive genetic correlations with these psychiatric outcomes. This may indicate that the educational system is more accommodating to individuals with these disorders than the labour market and/or that talents associated with these genetic risks (for example, higher IQ with autism47 or creativity with bipolar disorder and schizophrenia48) are more advantageous in school than in the labour market.

More generally, the genetic components of the NonEA-Income factor showed weaker associations with physical health and health-related behaviour, such as drinking and smoking, than those of EA. One possible interpretation of this finding is that better health outcomes of higher SES in wealthy countries could be due more to their association with education than with income or wealth, consistent with findings from quasi-experimental studies47,48,49.

While our GWAS results contribute to constructing an income-specific PGI with improved predictive accuracy, the EA PGI remains a comparable or even better predictor of income and SES. This is due to even larger sample sizes in recent GWASs on EA (N ≈ 3,000,000), lower measurement error in EA than in measures of income and the high genetic correlation between income and EA.

It is important to point out that the results of our study reflect the specific social realities of the analysed samples and are not universal or unchangeable. This is exemplified by the substantial heterogeneity in the genetic architecture of income that we found across our cohorts of European descent, as well as the non-perfect genetic correlation between sexes. This heterogeneity is consistent with previous findings where the polygenic signal for other measures of SES (such as EA) varies by culture20 and by country50. This genetic heterogeneity is indicative of phenotypic heterogeneity between cultures, where the heritable traits linked to income may not be universal but rather vary and reflect the differences between societies in which heritable traits are facilitative of income differences.

We emphasize that our results are limited to individuals whose genotypes are genetically most similar to the EUR panel of the 1000 Genomes reference panel compared with people sampled in other parts of the world. Our results have limited generalizability and do not warrant meaningful comparisons across different groups or predictions of income for specific individuals (see FAQ in the Supplementary Information). To increase the representation of individuals from diverse backgrounds, cohort and longitudinal studies should seek to sample more diverse and representative samples of the global population.

Our results contribute to the understanding of genetic and environmental factors that influence income. Future research could focus on disentangling these relationships further by integrating genomic data with longitudinal assessments of environmental exposures and behavioural traits. Such approaches could help elucidate the pathways through which genetic predispositions interact with socio-economic contexts, life experiences and individual behaviours to shape income-related outcomes. This line of research may ultimately contribute to a deeper understanding of the mechanisms underlying social mobility and economic inequality.

Studies of genetic analyses of behavioural phenotypes have been prone to misinterpretation, such as characterizing identified associated variants as ‘genes for income’. Our study illustrates that such characterization is incorrect for many reasons. The effect of each individual SNP on income is minimal, capturing less than 0.01% of the overall variance in income. Furthermore, the genetic loci we identified correlate with many other traits, including education and a wide range of health outcomes. Finally, the finding that only one quarter of the genetic associations we identified are due to direct genetic effects suggests the potential importance of family-specific factors (including potential resemblance between parents) and environmental factors as important drivers of income inequality.

Methods

This section provides an overall summary of the analysis methods. Further details are available in the Supplementary Information.

GWAS meta-analysis

We preregistered our analysis plan for the main income GWAS meta-analysis on 30 August 2018 (https://osf.io/rg8sh/). We used four measures of income (individual, occupational, household and parental income) and conducted a multivariate GWAS to combine these different measures. In total, we recruited 32 cohorts. Some of these cohorts contributed to multiple income measures. Supplementary Tables 1 and 2 summarize the income measures used for each cohort. Supplementary Information Section 2.1 provides details on the phenotype definition. The study was limited to 1KG-EUR-like individuals who were not enrolled in an educational programme at the time of survey or who were above the age of 30 if their current enrolment status was unknown.

Each cohort conducted the additive association analysis as follows. The log-transformed income measure was regressed on the count of effect-coded alleles of the given SNP, controlling for any sources of variation in income that do not reflect individual earning potential according to the data availability of each cohort. This included hours worked (with square and cubic terms), year of survey, indicators of employment status (such as retired or unemployed), self-employment and pension benefits (Supplementary Table 4). In addition, the covariates included at least the top 15 genetic PCs and cohort-specific technical covariates related to genotyping (genotyping batches and platforms). This analysis was performed for male and female samples separately.

We applied a stringent quality-control protocol based on the EasyQC software package51 to the GWAS results from each cohort (see Supplementary Information Section 2.4 for more detail). To combine multiple GWAS results on different income measures collected from multiple cohorts, we performed the meta-analysis in several steps. First, for each income measure and each sex, we meta-analysed the cohort-level GWAS results with METAL27 using sample-size weighting. Then, for each income measure, we meta-analysed the male and female results by using the meta-analysis version of MTAG28. To extract the common genetic factor from the four GWAS results with different income measures, we again leveraged MTAG, allowing for different heritabilities among the input traits.

Independent loci were identified using FUMA42. First, independent significant SNPs were defined using a cut-off of P < 5 × 10−8 and as independent from any other SNP (r2 < 0.6) within a 1-Mb window. Next, lead SNPs were identified as significant SNPs independent from each other at r2 < 0.1. Finally, independent genomic loci were formed from all independent signals that were in physical proximity to each other by merging independent significant SNPs closer than 250 kb into a single locus using the 1000 Genomes EUR reference panel to ensure that the accuracy of the loci borders was not influenced by missing data in our GWAS. The distance between two loci defined by FUMA is thus between the SNPs in linkage disequilibrium with the independent significant SNPs rather than between the independent significant SNPs themselves.

Cross-sex and cross-country heterogeneity

We investigated the potential environmental heterogeneity in the GWAS of income by estimating the cross-cohort genetic correlations by sex or by country with LDSC39. Sex-specific meta-analysis results for each income measure were available as intermediary outputs from the meta-analysis procedure. In addition, we conducted an Income Factor GWAS on the sex-specific results, which yielded an effective sample size of 360,197 for men and 353,429 for women.

To derive country-specific GWAS meta-analyses, we used only occupational and household income, for which we were able to obtain a sufficiently large sample size for multiple countries. We obtained the household income GWAS for the USA (Neff = 30,855), the UK (Neff = 387,579) and the Netherlands (Neff = 40,533); and the occupational income GWAS for Estonia (Neff = 75,682), Norway (Neff = 42,204), the UK (Neff = 279,883) and the Netherlands (Neff = 24,425).

Comparative analysis with EA

We compared our Income Factor GWAS results with the GWAS of EA by examining genetic correlation with LDSC and using the GWAS-by-subtraction approach52. Here we used a version of EA summary statistics slightly different from publicly available ones. The latest EA GWAS revised the coding of the years of schooling in the UKB33 to better reflect the educational qualifications of the participants. On the basis of the new coding, we conducted a GWAS of EA in the UKB. Then, by using MTAG with the meta-analysis option, we meta-analysed the UKB result with EA3 summary statistics21 that did not include the UKB.

We then statistically decomposed the estimated genetic association of the Income Factor into the indirect effect due to EA and the direct effect unexplained by EA (NonEA-Income), leveraging the GWAS-by-subtraction approach in genomic structural equation modelling34,52. We implemented this method in the form of a mediation model.

PGI analysis

We conducted three sets of analyses based on the PGI: (1) prediction analysis, (2) direct genetic effect estimation and (3) a phenome-wide association study of common diseases.

For the PGI prediction analysis, we used the STR53, the UKB-sib and the HRS54. We constructed PGIs using the meta-analysis results for income excluding one prediction cohort at a time, as well as a PGI based on the EA GWAS summary statistics constructed in the same way for comparison. PGIs were created only with HapMap 3 SNPs55, as these SNPs have good imputation quality and provide good coverage for 1KG-EUR-like individuals. We derived PGIs on the basis of a Bayesian approach implemented in the software LDpred2 (ref. 29).

We measured the prediction accuracy on the basis of incremental R2, which is the difference between the R2 from a regression of the phenotype on the PGI and the baseline covariates and the R2 from a regression on the baseline covariates only. Because income typically contains substantial demographic variation, we pre-residualized the log of income for demographic covariates. Then, as baseline covariates, we included only the top 20 genetic PCs and genotype batch indicators. Because income data were available for multiple years for the STR and the HRS, we residualized the log of income for age, age2, age3, sex, and interactions between sex and the age terms within each year and obtained the mean of residuals for each individual. For the UKB-sib, which had only cross-sectional data, we residualized the log of income for age, age2, age3, sex, dummies for survey year, and interactions between sex and the rest. For the EA measure (years of education), we applied the same procedure with birth-year dummies. We constructed CIs for the incremental R2 by bootstrapping the sample 1,000 times.

To estimate the direct genetic effect of the Income Factor PGI, we used snipar38 to impute missing parental genotypes from sibling and parent–offspring pairs. Parental PGIs were then created with the imputed SNPs. We estimated the direct genetic effect of the PGI by controlling for the parental PGI. This analysis was conducted only with the UKB-sib sample. See Supplementary Information Section 5.2 for further details.

To explore the clinical relevance of the Income Factor PGI for common diseases, we carried out a phenome-wide association study, using the in-patient electronic health records for 115 diseases with sex-specific sample prevalence no lower than 1% in the UKB-sib sample. We derived case–control status according to the phecode scheme by mapping the UKB’s ICD-9/10 records to phecodes v.1.2 (ref. 56). We fitted a linear regression of case–control status on the Income Factor PGI while controlling for the parental PGIs to capture the direct genetic effects of income PGI. As covariates, we also included the year of birth, its square term and its interactions with sex, genotype batch dummies and 20 genetic PCs. Standard errors were clustered by family.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Related Articles

Skill dependencies uncover nested human capital

Modern economies require increasingly diverse and specialized skills, many of which depend on the acquisition of other skills first. Here we analyse US survey data to reveal a nested structure within skill portfolios, where the direction of dependency is inferred from asymmetrical conditional probabilities—occupations require one skill conditional on another. This directional nature suggests that advanced, specific skills and knowledge are often built upon broader, fundamental ones. We examine 70 million job transitions to show that human capital development and career progression follow this structured pathway in which skills more aligned with the nested structure command higher wage premiums, require longer education and are less likely to be automated. These disparities are evident across genders and racial/ethnic groups, explaining long-term wage penalties. Finally, we find that this nested structure has become even more pronounced over the past two decades, indicating increased barriers to upward job mobility.

Does earning money empower women? Evidence from India

This study investigates the relationship between women’s earnings and empowerment in India, focusing on married women aged 15-49. Utilizing data from the Indian DHS (2019–21), the research employs Bayesian statistical analyses to explore how earning money influences women’s empowerment in areas like financial and bodily autonomy, marital rights, attitudes towards intimate partner violence, and peer recognition. The study reveals that while earning money enhances women’s financial and bodily autonomy, decision-making capacity, and societal recognition, it simultaneously restricts their marital rights, degrades intimate partner relations, and increases the risk of domestic violence. The findings contribute to the discourse on gender equality, offering empirical insights into the complexities of women’s empowerment in India. This research has practical implications for policymakers, NGOs, and stakeholders working towards women’s empowerment, providing a deeper understanding of the varied impact of women’s economic contributions in India’s socio-cultural context.

Iron homeostasis and ferroptosis in muscle diseases and disorders: mechanisms and therapeutic prospects

The muscular system plays a critical role in the human body by governing skeletal movement, cardiovascular function, and the activities of digestive organs. Additionally, muscle tissues serve an endocrine function by secreting myogenic cytokines, thereby regulating metabolism throughout the entire body. Maintaining muscle function requires iron homeostasis. Recent studies suggest that disruptions in iron metabolism and ferroptosis, a form of iron-dependent cell death, are essential contributors to the progression of a wide range of muscle diseases and disorders, including sarcopenia, cardiomyopathy, and amyotrophic lateral sclerosis. Thus, a comprehensive overview of the mechanisms regulating iron metabolism and ferroptosis in these conditions is crucial for identifying potential therapeutic targets and developing new strategies for disease treatment and/or prevention. This review aims to summarize recent advances in understanding the molecular mechanisms underlying ferroptosis in the context of muscle injury, as well as associated muscle diseases and disorders. Moreover, we discuss potential targets within the ferroptosis pathway and possible strategies for managing muscle disorders. Finally, we shed new light on current limitations and future prospects for therapeutic interventions targeting ferroptosis.

Using twin-pairs to assess potential bias in polygenic prediction of externalising behaviours across development

Prediction from polygenic scores may be confounded by sources of passive gene-environment correlation (rGE; e.g. population stratification, assortative mating, and environmentally mediated effects of parental genotype on child phenotype). Using genomic data from 10 000 twin pairs, we asked whether polygenic scores from the most recent externalising genome-wide association study predict conduct problems, ADHD symptomology and callous-unemotional traits, and whether these predictions are biased by rGE. We ran regression models including within-family and between-family polygenic scores, to separate the direct genetic influence on a trait from environmental influences that correlate with genes (indirect genetic effects). Findings suggested that this externalising polygenic score is a good index of direct genetic influence on conduct and ADHD-related symptoms across development, with minimal bias from rGE, although the polygenic score predicted less variance in CU traits. Post-hoc analyses showed some indirect genetic effects acting on a common factor indexing stability of conduct problems across time and contexts.

SETD1B-mediated broad H3K4me3 controls proper temporal patterns of gene expression critical for spermatid development

Epigenetic programming governs cell fate determination during development through intricately controlling sequential gene activation and repression. Although H3K4me3 is widely recognized as a hallmark of gene activation, its role in modulating transcription output and timing within a continuously developing system remains poorly understood. In this study, we provide a detailed characterization of the epigenomic landscapes in developing male germ cells. We identified thousands of spermatid-specific broad H3K4me3 domains regulated by the SETD1B-RFX2 axis, representing a previously underappreciated form of H3K4me3. These domains, overlapping with H3K27ac-marked enhancers and promoters, play critical roles in orchestrating robust transcription and accurate temporal control of gene expression. Mechanistically, these broad H3K4me3 compete effectively with regular H3K4me3 for transcriptional machinery, thereby ensuring robust levels and precise timing of master gene expression in mouse spermiogenesis. Disruption of this mechanism compromises the accuracy of transcription dosage and timing, ultimately impairing spermiogenesis. Additionally, we unveil remarkable changes in the distribution of heterochromatin marks, including H3K27me3 and H3K9me2, during the mitosis-to-meiosis transition and completion of meiotic recombination, which closely correlates with gene silencing. This work underscores the highly orchestrated epigenetic regulation in spermatogenesis, highlighting the previously unrecognized role of Setd1b in the formation of broad H3K4me3 domains and transcriptional control, and provides an invaluable resource for future studies toward the elucidation of spermatogenesis.

Responses

Your email address will not be published. Required fields are marked *