Genetic variants associated with longevity in long-living Indians

Introduction
Longevity and healthy ageing are complex phenomena influenced by various factors, like socioeconomic status, nutrition, lifestyle, physical activity, gender and genetics. Among them, genetics plays a major role, and its influence is estimated to be around 25–40%1. Studies indicate that offspring of long-lived individuals tend to lead a healthy and longer life compared to general population2. This could be due to inheritance of genetic variants associated with healthy lipid profiles, enhanced insulin sensitivity, slower cognitive decline and lower incidence of age-related diseases, like Alzheimer’s and cardiovascular diseases3,4,5,6,7. There are also a certain category of variants that are identified to be prevalent in long-living individuals through multiple genome-wide association studies (GWAS), candidate studies and meta-analyses related to longevity. They are the variants from Forkhead Box O3A (FOXO3A) and Apolipoprotein E (APOE) genes that are part of pathways associated with cell apoptosis, metabolism and oxidative stress8,9,10. Variants that decrease the length of telomeres also influence an individual’s lifespan11,12.
Genetic associations observed in one population might not reflect in other populations due to variable frequencies of effect alleles across populations. In developed countries or high-income countries, reaching 100 years of age is easier compared to low-income countries, and hence selection pressure is less, leading to relatively lesser frequencies of effect alleles13. Also, age thresholds to define “cases” or “long-living individuals” change as per the population under study. Studies done on isolated populations (Blue Zones) show that the environmental factors also work in synergy with genetics in contributing to exceptional longevity in those populations14. Considering the above factors, for any study done on genetics of longevity, it is necessary to understand longevity of the individuals in that population, define the age thresholds and take cases and controls from the same environment.
According to the latest estimates of Longitudinal Aging Study in India (LASI – 2019), India has world’s second-largest number of older people, aged 60 and above15. It is further estimated that the number will double by 2050. With improved access to vaccines, antibiotics and clean water, and increased awareness on healthy diets and physical activity, life expectancy at birth has improved from 62.1 years in 2000 to 70.8 years in 2019 (https://data.who.int/countries/356). Despite this, only 0.4% of the population reaches the age of 85 years. Understanding the genetics of these long living individuals (LLIs), of age 85+, in India is still a challenge due to lack of genetic research on the population. It is necessary to conduct genome-wide studies by selecting cases and controls from the individuals living in India. Identifying genetic variants associated with longevity, and in turn inferring factors that influence longevity helps in recommending strategies for healthy aging at population-wide and an individual level.
In the present study, we have identified variants or single nucleotide polymorphisms (SNPs) that have significantly higher frequency in long-living Indians, compared to younger controls, using GenomegaDB of Mapmygenome (https://mapmygenome.in/). Through literature mining on the identified variants we have inferred the phenotypes or factors that increase the longevity. To our knowledge this is the first genetic study to investigate variants associated with longevity in Indian population.
Methods
Study Data
Data for the current study was taken from GenomegaDB of Mapmygenome, which is a genotype-phenotype database of Indians living in India. Cases were defined as individuals with self-reported age of greater than 85 years at the time of sample collection. Controls were the individuals with 18–49 years of age at the time of sample collection. Genotype data were generated using Illumina Infinium iSelect HTS Custom Genotyping BeadChip-24 [Catalog ID: WG-405-1014] developed by Mapmygenome. The chip has 10,133 probes corresponding to 8768 unique variants associated with various phenotypes, like cancers, cardiovascular, neurological, gastrointestinal, metabolic and autoimmune disorders. Details on custom chip development are given in Supplementary Information. Written informed consent, including the consent to use data for research, was taken from each individual. Sample processing was carried out in compliance with the Helsinki declaration and the procedures had been approved by internal bio-safety committee at Mapmygenome.
Data pre-processing
Samples with genotype call rate <98%, samples with mismatch between genetic and reported sex, and outlier samples, defined as being more than three standard deviations away from the mean of top five genetic principal components, were excluded from the analysis. In the case of relatives, with kinship coefficient >0.0885, indicating second-degree or closer relationship, only one individual was retained in the analysis. Pre-processing on variants included—removing variants with genotype call rate <90% and minor allele frequency (MAF) < 5%, and removing variants that are not in Hardy-Weinberg Equilibrium (HWE) in controls (PHWE < = 0.0001). Additional filtration was done to remove variants that are in linkage disequilibrium with r2 > 0.9. To confirm the ethnicity, study samples were plotted on principal components along with 2504 samples of known ethnicity from 1000 Genomes project. Genetic affinity between cases and controls was evaluated through visual inspection in the space of top three principal components (Supplementary Information, Figure S1). R 4.316, PLINK 1.9 and PLINK 2.017 were used for the above data pre-processing.
Association analysis
The association of each variant with longevity was assessed using additive logistic regression available in PLINK1.9 software. Sex and top three genetic principal components were added as covariates in the model to correct for population stratification. Considering the small sample size of study data, empirical p-values were evaluated instead of False Discovery Rate (FDR). Adaptive permutation approach present in PLINK1.9 was used to obtain the minimum number of permutations required for crossing the observed test statistic. Variants that required at least 100,000 permutations to cross the test statistic were observed to have a p-value of <= 5 × 10−4 in logistic regression analysis. Hence p-value of 5 × 10−4 was considered as a threshold to identify significantly associated variants with longevity. Manhattan plot and Q-Q (quantile-quantile) plot were generated using ggplot and CMplot packages of R v4.3. Gene annotations were obtained through closestBed option available in BEDOPS v2.4.41 suite of tools18. Transcription regions of the genes in GRCh38 coordinates were obtained from refGene.txt.gz file downloaded from UCSC, and variants falling within 20KB distance from transcription regions were annotated with the corresponding gene.
Pathway enrichment analysis
Genes corresponding to the significant SNPs were mapped to KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway maps to understand the possible functional processes associated with longevity. DAVID Functional Annotation Tool v2024q119,20 was used to perform pathway enrichment analysis. In addition to that, ClueGO v2.5.1021 was used to draw interacting network of pathways grouped along with Longevity regulating pathway.
Results
A total of 1360 samples (141 cases and 1219 controls) were considered in the present study to investigate variants associated with longevity in Indian population. After excluding samples with low genotype call rate, samples with sex mismatch, related samples with Kinship coefficient >0.0885 and outlier samples, 1267 samples (133 cases and 1134 controls) were retained in the analysis. Characteristics of samples included in the analysis are given in Table 1. Mean age of cases was found to be 93.4 years, and mean age of controls was found to be 36.4 years. There was equal representation of samples from both the genders. Out of 8768 variants, 5336 autosomal variants passed quality control filters, and were finally considered in the analysis.
Association analysis using additive logistic regression along with sex and top three genetic principal components resulted in 9 variants to be significantly associated with longevity, at a p-value threshold of 5 × 10−4. Table 2 gives information on minor allele, major allele, minor allele frequency (MAF), p-value and odds ratio obtained for these variants. Empirical p-values and the number of permutations required to cross the observed test-statistic are also given in Table 2. Figure 1 shows Manhattan plot of associations. 388 variants that have suggestive association at a nominal p-value of 0.05 are given in Supplementary Table 1. Q-Q plot of association is given in Figure S2 of Supplementary Information.

X-axis shows the chromosomal locations. Y-axis shows the statistical significance of association as measured by the negative logarithm of p‐value. Larger values on y-axis correspond to smaller p-values. Red horizontal dashed line indicates the p-value threshold of 5 × 10−4. Significant variants are highlighted in red, with their gene annotation in rectangular green boxes.
Literature mining on SNPs that are significantly associated with longevity revealed that they are implicated in various diseases, indirectly affecting the lifespan. As can be seen in Table 1, G allele of rs3903239 decreases the chances of surviving to older age by 0.504 times (95% CI: 0.347 – 0.732). Its frequency is very low in LLIs compared to younger controls. Research indicates that the G allele is associated with atrial fibrillation22,23. Similarly, significant association with longevity was also observed for rs1339227, an intergenic SNP between RIMS1 and KCNQ5 genes. Several studies indicate that the C allele of this SNP is associated with schizophrenia24,25. Carriers of the other allele, T allele, at this locus have 1.64 times higher chances of surviving up to the age of 85 (95% CI: 1.263–2.129), and the frequency of T allele is higher in LLIs compared to controls. The rs2982570, from ESR1 gene that is involved in growth and metabolism, is known to be associated with the risk of osteoporosis and bone fractures26. The non-effect allele, T allele, at this locus has an odds ratio of 1.69 (95% CI: 1.262–2.261) for longevity compared to the risk allele. The rs10970985, from ACO1 gene that is involved in controlling the iron levels in the cells, is found to have lower frequency of C allele in LLIs. The C allele at this locus decreases the survival to older age by 0.561 times (95% CI: 0.422–0.746). Another variant, T allele of rs391957 from HSPA5 gene, which is associated with decreased anxiety and neuroticism27,28, was found to have high frequency in LLIs, with an odds ratio of 1.636 (95% CI: 1.26–2.125) for longer survival. Significant association with longevity was also observed for rs365990 from MYH6 gene that is involved in cardiac muscle contraction. G allele at this locus is known to be associated with prolonged PR interval and slower heart rate29,30,31. Carriers of G allele have 1.656 times higher chances of surviving to older age (95% CI: 1.264–2.17) than the carriers of A allele, and the frequency of G allele is higher in LLIs. The additional three SNPs in Table 1, rs1877455 [T] associated with autism as part of a haplotype32, rs9363918 [T] associated with pancreatic cancer33, and rs2762051 [T] associated with celiac disease34, have limited literature support for their associations, and further research studies are required to understand their role in increasing the lifespan.
The significant SNPs were further checked if there are any expression quantitative trait loci (eQTL). The Genotype-Tissue Expression (GTEx) database (https://gtexportal.org/home/) was queried for each SNP using GTEx Locus Browser (Variant-centric). Four SNPs were found to be eQTLs that effect the gene expressions in various tissues. The SNP rs1877455 was found to be associated with altered expression of TSPAN2 gene in heart-atrial appendage tissue (sample size = 372) with NES (Normalized Effect Size) of 0.344 and p-value of 2.88e-05. The expression levels of PRRX1 gene were found to be altered by rs3903239 in heart-atrial appendage and heart-left ventricle tissues with NES of -0.272 and -0.241, respectively. Upstream variant of ACO1 gene, rs10970985, was found to alter the expression of ACO1 gene in brain cerebellum and testis tissues. Studies related to brain cerebellum showed negative NES (Normalized Effect Size) indicating decreased expression of the gene, and studies related to testis showed positive NES indicating increased gene expression. The SNP rs365990 was found to be an eQTL for CMTM5 gene in heart-atrial appendage with NES of 0.179 and p-value of 1.73e-05. As limited studies are available for the above mentioned SNP-Gene associations, further research is required to confirm these associations.
Previous studies showed multiple SNPs from FOXO3A gene to be associated with longer lifespan. In particular, the G allele of rs2802292 showed significant presence in Japanese, German and French centenarians8,35,36. In our study, this SNP reached nominal significance with a p-value of 0.032 and an OR of 1.33. Other SNPs from FOXO3A (rs9400239, rs2153960, rs4946935, and rs4946936) did not show statistical significance, but the direction of association is consistent with other studies. Figure 2 shows the results of association analysis on SNPs from FOXO3A and APOE genes. APOE is also one of the most studied genes on longevity. The A allele of rs769449 from APOE gene that is associated with cognitive decline and aging related verbal memory9 has low frequency in LLIs compared to controls. The other allele, G allele, is in high frequency in LLIs though the association did not reach the required statistical significance. Other SNPs from APOE, rs769450 and rs429358, also did not show statistical significance. Further research studies are required to understand their significance in the population.

A FOXO3A, B APOE Bars indicate the allelic frequencies of SNPs in Cases and Controls. P-values given above the bars are the result of logistic regression analysis done using sex and principal components as covariates. Only significant p-values, below the threshold of 0.05, are shown.
Pathway enrichment analysis was performed on 388 variants that have suggestive association with longevity (p-value < 0.05). 47 variants located in intergenic regions did not map to any genes, and were excluded from the analysis. Significantly enriched KEGG pathways identified through DAVID Functional Analysis tool, with p-value threshold of 0.05, are given in Supplementary Table 2. Figure 3 shows the most significant pathways identified through ClueGO (P-value < 5 × 10−4). Significant pathways identified by both the tools were longevity regulating pathway, FoxO signaling pathway, adipocytokine signaling pathway and non-alcoholic fatty liver disease pathway. Some of the other pathways identified by the tools included insulin resistance, cholesterol metabolism, inflammatory bowel disease and PI3K-Akt signaling pathway. Table 3 shows the pathways associated with the significant SNPs.

A Bargraph showing the % of genes mapped to each pathway term. Colors indicate the grouping of pathways done based on Kappa score. B Network of pathways grouped along with Longevity regulating pathway.
Discussion
In the present study, we analyzed genotype data of long-living Indians, aged 85+, to identify genetic variants associated with longevity in the population. By mapping the identified variants to genes, pathways and their functions, and by doing thorough literature search we inferred the factors that influence longevity. Alleles that increase the risk of atrial fibrillation, biliary disorders, schizophrenia, anxiety and neuroticism were in low frequency in long-living individuals compared to younger controls. Alleles associated with slower heart rate, lower risk of bone fractures, short body height were in high frequency.
Atrial fibrillation is one of the risk factors that reduce the life expectancy. It contributes to complications such as stroke and heart failure. Several genetic research studies indicated the association of rs3903239, an SNP in the upstream of PRRX1 gene, with atrial fibrillation. PRRX1 (Paired related homeobox 1) regulates muscle creatine kinase and controls the growth and development of mesodermal muscles such as the heart. SNPs in the upstream of PRRX1 may reduce the expression of the gene, and shorten the atrial action potential duration37. Ellinor et al.22 conducted a GWAS meta-analysis in individuals of European ancestry, and found G allele of rs3903239 to be significantly associated with atrial fibrillation. The association was also observed in Japanese and Korean population23. In the present study we observed a lower frequency of G allele in LLIs, indicating the lower risk of cardiovascular disorders, and increased longevity.
The T allele of rs1339227, which decreases the risk of schizophrenia, is present in higher frequency in LLIs. Schizophrenia Working Group of the Psychiatric Genomics Consortium identified significant association of rs1339227 with schizophrenia24. The T allele was found to decrease the risk of disease with an odds ratio of 0.942. Yao et al.25 found a decreased effect of T allele in the association of schizophrenia. Higher frequency of T allele in LLIs suggests protective association of this SNP in increasing the lifespan.
Estrogen receptor 1 gene (ESR1) encodes an estrogen receptor (ER-alpha) that plays a major role in bone metabolism and maintenance of the skeletal system. Several genome-wide association studies have found variations in this gene to be associated with osteoporosis and risk of fractures via low bone mineral density (BMD). Estrogen-replacement therapy in post-menopausal women is known to prevent bone loss and osteoporosis. Trajanoska et al.38 found the C allele of rs2982570 from ESR1 to be associated with increased risk of fractures and decreased femoral neck BMD and lumbar spine BMD. In the present study, we observed low frequency of C allele in LLIs, indicating lower risk of skeletal and bone related problems. The other allele (T allele), which is also found to be associated with decreased body height39, is in high frequency in LLIs. Multiple research studies found a negative correlation between height and longevity. Shorter people tend to have higher resistance to chronic diseases, especially in middle age40.
Anemia is also one of the major health concerns in older adults. It is associated with cognitive deficits and reduced physical performance. Several genes and genetic variants are known to affect iron levels, hemoglobin levels and anemia. ACO1 is one such gene that encodes a bifunctional protein which controls the levels of iron within the cells. We observed that the variant in this gene, rs10970985, has lower frequency of C allele in LLIs compared to controls. Further research studies are required to understand the role of this variant.
HSPA5 gene (heat shock protein family A (Hsp70) member 5) encodes immunoglobulin protein that is an essential component of folding/unfolding processes and regulation of Ca2+ homeostasis in the endoplasmic reticulum (ER)41. Disturbed homeostasis leads to ER stress, causing inflammation, metabolic disorders and neurodegenrative diseases. It is also associated with anxiety disorders such as depression and post-traumatic stress disorder (PTSD)42. Research studies found T allele of rs391957 from HSPA5 gene to be a protective factor for anxiety and neuroticism27,28. In the present investigation we observed the frequency of T allele to be higher in LLIs compared to younger controls, indicating that lesser anxiety leads to increased lifespan.
ABCC2 (ATP binding cassette subfamily C member 2) gene encodes a protein called multidrug resistance protein 2 (MRP2) which is involved in metabolism and clearance of certain drugs from organs and tissues. It is also known to play a major role in biliary transport. Variations in ABCC2 gene were found to be associated with cholestasis and nonalcoholic fatty liver disease43,44. In our study, we observed lower frequency of T allele at rs2002042 in LLIs compared to controls (Supplementary Table 1), indicating undisturbed biliary transport, good enterohepatic circulation of bile acids and healthy gut leading to longer lifespan.
Carriers of G allele at rs365990 have 1.725 times higher chances of surviving to older age. Previous studies indicate that G allele of rs365990 is associated with decrease in heart rate29,30. Low resting heart rate results in increased lifespan due to reduced stress on arterial cells, reduced stiffening of aorta and reduced cardiovascular abnormalities45,46. Faster heart rate is related to high metabolic rate, development of free radicals, oxidative stress and faster aging. Studies also indicate that heart rate lowering through medication or selective sinus node inhibition increases the lifespan.
Our pathway enrichment analysis confirmed that the pathways or molecular processes associated with longevity are significantly different in LLIs. Pathways with the genes (through SNPs) involved in oxidative stress, apoptosis, DNA damage repair, glucose metabolism and energy metabolism are differentially regulated in LLIs. IRS2 and PRKAA1 genes that are related to regulation of insulin secretion and glucose levels are common across most of the significant pathways. Several previous studies indicated the role of PI3K-Akt signaling pathway in ageing47,48. In addition to that, our study revealed pathways related to non-alcoholic fatty liver disease (NFLD), inflammatory bowel disease (IBD) and inflammation to be associated with ageing. Screening the variants associated with these disorders early on in life helps in recommending preventive measures, and in promoting healthy ageing.
Major limitation in our study is the sample size. Increasing the sample size will increase the statistical power of the study, and may identify additional variants that increase our understanding of factors affecting longevity. Also, gender-specific analysis is not feasible due to small sample size. But our results can provide some indication on variants and factors associated with longevity in the population, and help in further follow-up studies with larger sample sizes. Additional information on clinical and lifestyle variables, such as BMI, cholesterol levels, diet patterns, physical activity, smoking and alcohol consumption help in identifying the variants associated with interaction of those variables with longevity.
Allele frequencies of the longevity-associated SNPs were observed to vary among populations studied under 1000 Genomes project (Supplementary Table 3). Major alleles in one population were observed to be minor alleles in other populations. For example, G allele of rs3903239 associated with atrial fibrillation is a minor allele in South Asians (SAS), but major allele in East Asians (EAS). Similarly, alleles associated with longevity from the SNPs of FOXO3A and APOE genes have in general higher frequency in Africans (AFR) and South Asians (SAS). This indicates the need for replication studies in different populations to confirm the associations in each population.
In conclusion, our study identified genetic variants associated with longevity in Indian population that is underrepresented in genetic research. Clinical and lifestyle factors inferred through the identified variants can guide in recommending strategies for healthy aging and longevity at population-wide and an individual level.
Responses