Related Articles

Narrowing the gap between machine learning scoring functions and free energy perturbation using augmented data

Machine learning offers great promise for fast and accurate binding affinity prediction. However, current models lack robust evaluation and fail on tasks encountered in (hit-to-)lead optimisation, such as ranking the binding affinities of a congeneric series of ligands, which limits their application in drug discovery. Here, we address these issues by first introducing a novel attention-based graph neural network model called AEV-PLIG (atomic environment vector–protein ligand interaction graph). Second, we introduce a new and more realistic out-of-distribution test set called the OOD Test. We benchmark our model on this set, on CASF-2016, and on a test set used for free energy perturbation (FEP) calculations. The results not only highlight the competitive performance of AEV-PLIG but also provide a realistic comparison of machine learning models against rigorous physics-based approaches. Moreover, we demonstrate how leveraging augmented data (generated using template-based modelling or molecular docking) can significantly improve binding affinity prediction correlation and ranking on the FEP benchmark (weighted mean PCC and Kendall’s τ increase from 0.41 and 0.26 to 0.59 and 0.42, respectively). Together, these strategies are closing the performance gap with FEP calculations (FEP+ achieves a weighted mean PCC and Kendall’s τ of 0.68 and 0.49 on the FEP benchmark) while being ~400,000 times faster.
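
The benchmark numbers above are per-target metrics averaged over congeneric series. The abstract does not state the weighting scheme, so the sketch below assumes each series is weighted by its number of ligands; the target names and affinity values are purely hypothetical. Pearson's r measures linear agreement, while Kendall's τ (here the tie-free τ-a variant) counts concordant minus discordant ligand pairs and so directly reflects ranking quality.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; ties count as 0."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 if the pair is ordered the same way in x and y, -1 if opposite
            s += ((x[i] > x[j]) - (x[i] < x[j])) * ((y[i] > y[j]) - (y[i] < y[j]))
    return s / (n * (n - 1) / 2)

def weighted_mean_metric(series, metric):
    """Average a per-series metric, weighting each series by its ligand count (assumed scheme)."""
    total = sum(len(exp) for exp, _ in series.values())
    return sum(len(exp) * metric(exp, pred) for exp, pred in series.values()) / total

# Hypothetical data: {target: (experimental affinities, predicted affinities)}
series = {
    "target_A": ([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]),  # ranking fully correct
    "target_B": ([5.0, 6.0, 7.0], [6.5, 5.5, 7.2]),            # one pair swapped
}

print(round(weighted_mean_metric(series, kendall_tau), 3))  # 0.714
print(round(weighted_mean_metric(series, pearson), 3))
```

Weighting by series size means that large congeneric series dominate the average; an unweighted mean would instead treat every target equally.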

A data-driven generative strategy to avoid reward hacking in multi-objective molecular design

Molecular design using data-driven generative models has emerged as a promising technology, impacting fields such as drug discovery and the development of functional materials. However, this approach is often susceptible to optimization failure due to reward hacking, in which prediction models fail to extrapolate, i.e., fail to accurately predict properties for designed molecules that deviate considerably from the training data. Methods for estimating prediction reliability, such as the applicability domain (AD), have been used to mitigate reward hacking, but applying them to multi-objective optimization is challenging. The difficulty arises from the need to determine in advance whether the ADs of the multiple property predictors, each defined at some reliability level, overlap in chemical space, and to adjust the reliability level of each property prediction appropriately. Herein, we propose a reliable design framework that performs multi-objective optimization using generative models while preventing reward hacking. To demonstrate its effectiveness, we designed candidate anticancer drugs as a typical example of multi-objective optimization. We successfully designed molecules with high predicted values and reliabilities, including an approved drug. In addition, the reliability levels can be adjusted automatically according to the property prioritization specified by the user, without any detailed settings.
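
To make the "overlapping ADs" problem concrete, here is a minimal sketch using one common AD definition: a molecule lies inside a model's AD if its maximum Tanimoto similarity to that model's training set reaches a reliability threshold ρ. This similarity-based AD, the set-of-bits fingerprints, and all thresholds and molecules below are assumptions for illustration, not necessarily what the framework uses. A candidate is usable only if it lies inside every property model's AD at once, and tightening a single ρ can empty that intersection.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def in_domain(fp, training_fps, rho):
    """Assumed AD rule: max similarity to any training molecule must be >= rho."""
    return max(tanimoto(fp, t) for t in training_fps) >= rho

def in_all_domains(fp, training_sets, rhos):
    """Multi-objective case: the molecule must lie inside every model's AD simultaneously."""
    return all(in_domain(fp, train, rho) for train, rho in zip(training_sets, rhos))

# Hypothetical fingerprints for two property models' training sets
train_potency = [{1, 2, 3, 4}, {2, 3, 5, 6}]
train_toxicity = [{3, 4, 7, 8}, {4, 7, 8, 9}]
candidate = {2, 3, 4, 7}

models = [train_potency, train_toxicity]
print(in_all_domains(candidate, models, [0.5, 0.5]))  # True: both max similarities are 0.6
print(in_all_domains(candidate, models, [0.7, 0.5]))  # False: potency AD now excludes it
```

Because each ρ trades reliability against the size of the allowed region, the thresholds interact: raising one can leave no molecule that satisfies all models, which is why the levels need to be adjusted jointly rather than fixed independently.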

Dynamic optimizers for complex industrial systems via direct data-driven synthesis

The chemical process industry (CPI) faces significant challenges in improving sustainability and efficiency while maintaining conservative principles for managing cost, complexity, and uncertainty. This work introduces a data-driven approach to dynamic real-time optimization (D-RTO) that addresses these concerns by extracting process optimization policies directly from historical plant data. Our method constructs a value function to evaluate trajectory quality and employs weighted regression to derive improved policies. When applied to a plant-wide industrial process control problem, the proposed optimizer demonstrates superior performance in adapting to disturbances while maintaining stability and product quality. These results challenge conventional assumptions about the potential of data-driven optimization in the CPI. Although limitations exist due to the black-box nature of neural networks, this study presents a promising avenue for enhancing operational efficiency in industrial settings. The proposed approach offers a practical solution for process optimization, as it leverages readily available historical data and does not require extensive modeling efforts. By demonstrating significant efficiency improvements on a realistic industrial benchmark problem, this work paves the way for the adoption of data-driven optimization techniques in real-world CPI applications.
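
The "value function plus weighted regression" recipe resembles advantage-weighted regression from offline reinforcement learning. The sketch below is a deliberately simplified, hypothetical one-step version, not the authors' implementation (which uses neural networks on multi-step plant trajectories): it assumes a scalar state, a linear policy, a quadratic value-function basis, and a toy objective whose best action is a = 2s. Historical actions are reweighted by exp(A/β), where A = G − V(s) is the advantage, so actions that beat the value baseline pull the refitted policy toward them.

```python
import numpy as np

def fit_value_function(states, returns):
    """Least-squares V(s) on quadratic features [1, s, s^2] (assumed basis)."""
    phi = np.column_stack([np.ones_like(states), states, states ** 2])
    coef, *_ = np.linalg.lstsq(phi, returns, rcond=None)
    return phi @ coef

def weighted_policy_regression(states, actions, advantages, beta=1.0, w_max=20.0):
    """Fit a linear policy a = k*s + c by exponentially advantage-weighted least squares."""
    weights = np.minimum(np.exp(advantages / beta), w_max)  # clip to cap extreme weights
    X = np.column_stack([states, np.ones_like(states)])
    sw = np.sqrt(weights)  # weighted LS = ordinary LS on rows scaled by sqrt(weight)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], actions * sw, rcond=None)
    return coef  # [k, c]

rng = np.random.default_rng(0)
n = 5000
states = rng.uniform(-1.0, 1.0, n)
actions = 1.0 * states + rng.normal(0.0, 1.5, n)  # hypothetical historical policy: a ~ s
returns = -(actions - 2.0 * states) ** 2          # toy objective, best action is a = 2s

values = fit_value_function(states, returns)
advantages = returns - values
k_new, c_new = weighted_policy_regression(states, actions, advantages)
k_old = np.polyfit(states, actions, 1)[0]  # slope of the historical policy
print(f"historical slope ~ {k_old:.2f}, improved slope ~ {k_new:.2f}")
```

The improved slope moves from about 1 toward the optimum of 2 without ever querying the process, using only logged data; the temperature β controls how aggressively the new policy departs from the historical one, which keeps it close to operating regions already seen in the plant.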

γδ T-cell autoresponses to ectopic membrane proteins: a new type of pattern recognition

T-cell receptor (TCR) γδ-expressing cells are conserved lymphocytes of innate immunity involved in first-line defense and immune surveillance. TCRγδ recognizes protein and nonprotein ligands without the help of the major histocompatibility complex (MHC), notably through direct binding to protein ligands, which depends primarily on the δ-chain complementarity-determining region 3 (CDR3δ). However, the mechanism of protein-antigen recognition by human γδ TCRs remains poorly defined. We hypothesize that γδ TCRs recognize self-proteins that are derived from intracellular components under stress and expressed ectopically on the cell membrane. Here, using a huProteinChip covering 21,000 proteins, we mapped 16 intracellular self-proteins as putative ligands for Vδ1/Vδ2 TCRs: 13 for Vδ1 TCRs and 3 for Vδ2 TCRs. Functional tests confirmed that ectopic nucleolin (NCL) is a ligand for the Vδ1 TCR, whereas protein-glutamine γ-glutamyltransferase K (TGM1) is a ligand for the Vδ2 TCR. In the context of radiation exposure, ectopic expression of intracellular proteins on the tumor cell surface is associated with increased antitumor cytotoxicity of γδ T cells both in vitro and in vivo. In conclusion, the recognition by human γδ TCRs of intracellular proteins ectopically expressed on somatic cells is a basic interaction mechanism that enables new types of immune pattern recognition and offers a basis for a novel γδ TCR-ligand-based strategy for tumor immunotherapy.

Chemogenomics for steroid hormone receptors (NR3)

The nine human NR3 nuclear receptors translate steroid hormone signals into transcriptomic responses and regulate many highly important processes, ranging from development and reproductive tissue function to inflammatory and metabolic homeostasis. Although several NR3 ligands, such as glucocorticoids, are invaluable drugs, this family remains only partially explored, for example in autoimmune diseases and neurodegeneration, and may hold therapeutic potential in new areas. Here we report a chemogenomics (CG) library to reveal elusive effects of NR3 receptor modulation in phenotypic settings. Thirty-four highly annotated and chemically diverse ligands covering all NR3 receptors were selected with regard to complementary modes of action, activity, selectivity and lack of toxicity. In a proof-of-concept application, the endoplasmic reticulum stress-resolving effects of NR3 CG subsets validated the suitability of the set for connecting phenotypic outcomes with targets and for exploring NR3 receptors from a translational perspective.
