Related Articles
Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data
Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection in repetitive regions, especially for small- to intermediate-sized SVs, was significantly lower with short-read-based algorithms than with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
Commentary: Why is genetic testing underutilized worldwide? The case for hereditary breast cancer
It is thirty years since the BRCA1 and BRCA2 genes were discovered and genetic testing for BRCA1 and BRCA2 was introduced. Despite increasing awareness of the genetic basis of cancer and our evolving knowledge of effective means of prevention, screening, and treatment for hereditary breast and ovarian cancers, genetic testing is underutilized, and most mutation carriers remain unidentified. In this commentary, we explore possible reasons why this might be so. Our focus is on factors that may influence or deter a patient from pursuing testing, rather than on the implications of receiving a positive test result. Issues of concern include an inadequate number of genetic counselors, restrictive (and conflicting) eligibility criteria for testing, the cost of the test, health insurance coverage, fear of future insurance discrimination, privacy issues, lack of familiarity with the testing process in primary care, and gaps in both patient and provider knowledge about the impact and the value of testing. We discuss how these factors may lead to the underutilization of genetic testing in North America and throughout the world, and we consider alternative models of genetic healthcare delivery. We have invited leaders in cancer genetics from around the world to tell us what they think are the barriers to testing in their home countries.
Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data
Machine learning offers great promise for fast and accurate binding affinity predictions. However, current models lack robust evaluation and fail on tasks encountered in (hit-to-) lead optimisation, such as ranking the binding affinity of a congeneric series of ligands, thereby limiting their application in drug discovery. Here, we address these issues by first introducing a novel attention-based graph neural network model called AEV-PLIG (atomic environment vector–protein ligand interaction graph). Second, we introduce a new and more realistic out-of-distribution test set called the OOD Test. We benchmark our model on this set, CASF-2016, and a test set used for free energy perturbation (FEP) calculations, which not only highlights the competitive performance of AEV-PLIG but also provides a realistic assessment of machine learning models against rigorous physics-based approaches. Moreover, we demonstrate how leveraging augmented data (generated using template-based modelling or molecular docking) can significantly improve binding affinity prediction correlation and ranking on the FEP benchmark (weighted mean PCC and Kendall’s τ increase from 0.41 and 0.26 to 0.59 and 0.42). These strategies together are closing the performance gap with FEP calculations (FEP+ achieves weighted mean PCC and Kendall’s τ of 0.68 and 0.49 on the FEP benchmark) while being ~400,000 times faster.
Predictive learning as the basis of the testing effect
A prominent learning phenomenon is the testing effect, meaning that testing enhances retention more than studying. Emergent frameworks propose fundamental (Hebbian and predictive) learning principles as its basis. Predictive learning posits that learning is driven by the contrast between a prediction and the feedback on that prediction (the prediction error). Here, we propose that in testing (but not studying) scenarios, participants predict potential answers, and the contrast between these predictions and the subsequent feedback yields a prediction error, which facilitates testing-based learning. To investigate this, we developed an associative memory network incorporating Hebbian and/or predictive learning, together with an experimental design in which human participants studied or were tested on English-Swahili word pairs, followed by a recognition test. Three behavioral experiments (N = 80, 81, 62) showed robust testing effects when feedback was provided. Model fitting (of 10 different models) suggested that only models incorporating predictive learning can account for the breadth of data associated with the testing effect. Our data and model suggest that predictive learning underlies the testing effect.
Diurnal timing of physical activity in relation to obesity and diabetes in the German National Cohort (NAKO)
Physical activity supports weight regulation and metabolic health, but its timing in relation to obesity and diabetes remains unclear. We aimed to assess the diurnal timing of physical activity and its association with obesity and diabetes.