MGNN: Moment Graph Neural Network for Universal Molecular Potentials

Introduction

The computational determination of molecular properties has been significantly accelerated by machine learning. The crux of this advancement lies in the development of machine learning potentials with remarkable efficiency and fidelity. Although the usage of local descriptors in most machine learning interatomic potentials have improved parallel computation and scalability^1,2,3,4,5,6, yet they fall short in accuracy compared with atom-centered message-passing neural network (MPNNs) interaction potentials within the concept of graph neural network (GNN)^{7,8,9,10,11,12}, where the molecular systems are treated as 3D graphs, with atoms as nodes linked by edges within a defined cutoff radius. These networks have achieved groundbreaking results, often surpassing traditional models with minimal manual feature engineering. They encode atoms into a high-dimensional space and model interatomic interactions through message passing, effectively serving as surrogates for quantum mechanical simulations^{7,8,9,10,11,12,13,14,15,16}. Machine-learning interatomic potentials have provided promising solutions to bridge the gap between expensive electronic structure methods and efficient classical interatomic potentials^17,18,19,20. Most recently, GNN-based machine-learning interatomic potentials trained on the periodic table (for example, M3GNet²¹ and CHGNet¹⁷) have demonstrated the possibility of universal interatomic potentials that may not require chemistry-specific training for each new application^17,21,22,23.

In principle, to accurately predict scalar quantities like potential energy, vectorial properties such as forces and dipole moments, and even tensorial properties like polarizabilities^24,25,26,27, a GNN which fully accommodates the transformation properties of scalar or tensor is required. For instance, while the translation and rotation of a small molecule do not alter its formation energy, the forces acting on its atoms rotate with the molecule. Most available GNN machine learning potentials address above points by empirically constructing neural network using geometrical quantities like interatomic-distances, bond angles^28,29,30,31.

To learn molecular properties satisfying all the needed symmetries, systematical and rigorous routines to treat symmetries are highly desirable. Currently, many machine learning potentials use equivariant message-passing frameworks to achieve such goals, which typically use spherical harmonics as angular basis functions, followed by Clebsch-Gordan contraction to maintain rotational symmetry^{11,12,32,33,34,35,36,37}. However, these calculations are usually expensively. To address such issue, there are emerging efforts on finding efficient computational methods. For instance, a mathematically equivalent and simple message-passing framework that circumvents spherical harmonics is recently proposed, which avoids the usage of high-dimensional Clebsch-Gordan coupling matrix^26,38,39. On the other hand, there are a series of mathematical theorems which establishes that the Moment Tensor Potentials⁴ can provably approximate any regular function satisfying all the needed symmetries, while simultaneously being computationally efficient.

Aiming to develop simple and efficient GNN to learn molecular properties satisfying symmetry requirement, as inspired by Moment Tensor Potential⁴, we introduce the Moment Graph Neural Network (MGNN), which innovatively propagates information between molecular systems using moments—quantities that encapsulate the spatial relationships between atoms. Our framework defines molecular moments⁴ to convey relative spatial relationships between atoms within a molecule, utilizing Chebyshev polynomials to encode interatomic distance information instead of more commonly used functions like Bessel and Gaussian radial basis functions^28,40. Impressively, MGNN demonstrates multiple state-of-the-art results on benchmark datasets including QM9^41,42, revised MD17⁴³, and MD17-ethanol⁴⁴. Its generalizability and efficiency are also tested in additional systems including 3BPA⁴⁵ and 25-element high-entropy alloys⁴⁶. Additionally, we demonstrate the high consistency of our framework in predicting molecular dynamics simulations of amorphous electrolytes. Moreover, MGNN’s predictive capabilities extend to the calculation of dipole moments and polarizabilities for organic molecules, such as ethanol, enabling the rapid simulation of molecular spectra in vacuum conditions.

Results

The MGNN architecture

Overview

The Hamiltonian of a molecular system is uniquely determined by the external potential, depending on the charge quantities ({{Z}_{i}}) and atomic positions ({{mathop{r}limits^{ rightharpoonup }}_{i}}). Formally, a molecule in a specific conformation can be represented as a point cloud with (n) atoms denoted as (M=left(Z,Rright)), where (Z=({Z}_{1},ldots ,{Z}_{n})in {{mathbb{Z}}}^{n}) represents the atom type vector and (R=({mathop{r}limits^{ rightharpoonup }}_{1},ldots ,{mathop{r}limits^{ rightharpoonup }}_{n})in {{mathbb{R}}}^{ntimes 3}) represents the atom coordinate matrix.

MGNNs treat molecules as graphs, with atoms as nodes, and edges representing connections between atoms, either based on predefined molecular graphs or atoms connected within a cutoff distance ({r}_{{cut}}). MGNNs represent each atom through atomic embeddings ({h}_{i}in {{mathbb{R}}}^{{H}_{a}}) where ({H}_{a}) is the number of atom features and the edges between nodes (i) and (j) through edge embeddings ({e}_{{ij}}in {{mathbb{R}}}^{{H}_{e}}) where ({H}_{e}) is the number of edge features. These embeddings are updated in each layer through information propagation. In general, the Message Passing Neural Networks (MPNN)¹⁴ framework achieves message passing by ({m}_{v}^{t+1}={sum }_{win {mathscr{N}}left(vright)}{M}_{t}left({h}_{v}^{t},{h}_{w}^{t},{e}_{{vw}}right)) and updates by ({h}_{v}^{t+1}={U}_{t}left({h}_{v}^{t},{m}_{v}^{t+1}right)), where ({M}_{t}) represents the message function, ({U}_{t}) represents the vertex update function, and ({mathscr{N}}left(vright)={{w|||}{mathop{r}limits^{ rightharpoonup }}_{{vw}}||le {r}_{{cut}}}) denotes the neighbors of node (v) in graph (G). Our MGNN framework utilizes moment interaction to construct ({M}_{t}) and ({U}_{t}).

In this work, a molecular graph is represented as (G=(A,{E},{T})), where (A) denotes the atoms (nodes), (E) the edges within cutoff ({r}_{{cut}}), and (T) the triplets. Each atom (i) forms an edge with every neighboring atom within its cutoff distance, as demonstrated in Fig. 1a, constructing edges ({ij}), ({ik}), ({im}), and corresponding triplets ({ijk}), ({ijm}), ({ikm}) to formulate the moments. Within a triplet composed of nodes (i,j,k) and edges ({ij}), ({ik}), the message of triplets are used to update edge representation features, which are further used to update node features. As shown in Fig. 1b, for a triplet ({ijk}) whose center is atom (i) and edges are ({ij}) and ({ik}), the information within this triplet first passes to edge ({ij}) (symmetrically to ({ik})), and then the message of edge ({ij}) passes to the central node (i).

**Fig. 1: The moment message passing scheme and the architecture of MGNN.**

The MGNN framework comprises an embedding block, several sequential interaction layers, and an output block, as shown in Fig. 1c. The molecular charge quantities (labeled as (Z)) first pass through the embedding block to obtain the initial node representation vectors (labeled as (X)). These initial vectors, along with the positions of atoms in Cartesian space containing geometry information (labeled as (R)), are then input into an interaction layer, which uses an interaction block that generates a correction ((Delta X)) to be added to the input ((X)) to generate the current layer output (i.e., (X+Delta X)), which serves as the input for the next interaction layer or output layer. The interaction block (detailed in Fig. 1d) captures how atoms interact with each other, by processing the message from the neighbors of atom (i), during which the interaction between edges from atom (i) (i.e., triplet moment interaction as detailed in Fig. 1e) is involved. There are (N) sequential interaction layers in total. The output after the last interaction layer is labeled as ({X}^{L}), which is then fed into target-dependent output block (detailed in Fig. 2). By aggregating the message of triplets, we construct a comprehensive representation of the molecular structure within the graph, which is then propagated through the interaction layers to update atomic representations. The resulting feature vectors for each atom are fed into the output block, tailored to predict specific properties, culminating in the model’s final predictions. The output module of the MGNN model is adept at generating scalar, vector, and tensor outputs, demonstrating its versatility in capturing and predicting a wide range of molecular properties.

The definition of MLP in current work (as shown in Figs. 1 and 2) is ({MLP}left(xright)={W}^{{out}}left(sigma left({W}^{{in}}x+{b}^{{in}}right)right)+{b}^{{out}}), where ({W}^{{in}}) and ({W}^{{out}}) are learnable weights and ({b}^{{in}}) and ({b}^{{out}}) are learnable biases, and (sigma left(xright)) denotes the element-wise nonlinear activation function. In our experiments, we utilize the SiLU^47,48 and Mish⁴⁹ activation functions. The action of “Concat”, “Reshape”, “Split” in current work (as shown in Figs. 1 and 2) are used to manipulate data dimensions. We use torch.split for “Split” operation. Given a tensor A, suppose the last dimension of A has size 2 F. By performing a split along the last dimension, we can divide A into two chunks, each with a last dimension of size F, while keeping the sizes of other dimensions unchanged. Of course, it’s also possible to split the tensor into other proportions, such as 0.5 F and 1.5 F. for more details, refer to the PyTorch⁵⁰ documentation. All the notations “(W)” and “(b)” with or without superscript are learnable parameters and we will hint their shape when necessary. Next, we will provide a detailed introduction to each block. The notions employed in this paper are summarized in Supporting Information S1.

Embedding

MGNN works on embedded feature, which is initialized by the embedding block shown in Fig. 1c. Throughout interaction layers, we represent the atoms using a tuple of features ({X}^{l}=left({x}_{1}^{l},ldots ,{x}_{n}^{l}right)), with ({x}_{i}^{l}in {{mathbb{R}}}^{F}) with the number of feature maps (F), the number of atoms (n) and the current interaction layer (l). We maintain a constant number of feature maps at (F=512) throughout the network, if not stated otherwise. The representation of atom (i) is initialized using an embedding depend on the atom type ({Z}_{i},{mathbb{in}},{mathbb{N}}):

$${x}_{i}^{0}={Embedding}left({Z}_{i}right)$$

(1)

The atom type embeddings are learned during training.

Interaction layer

Here we explain the interaction layer shown in Fig. 1c by implementing Eqs. (2)-(13). Within the interaction layer, an interaction block (as depicted in Fig. 1d) produces a correction (Delta X={Delta {x}_{i}}) that is used to update the input embedded feature (X=left{{x}_{i}right}). Formally, the update uses a residual connection inspired by ResNet⁵¹, where (l) is the layer index, and ({mathscr{N}}left(iright)) is the neighbor list of atom (i) that determines which edges and angles are involved in message passing.

$${x}_{i}^{l+1}={x}_{i}^{l}+varDelta {x}_{i}^{l}left({x}_{i}^{l},{mathscr{N}}(i)right)$$

(2)

The residual ({Delta x}_{i}^{l}) is computed through integrating information from edges and angles associated with ({mathscr{N}}left(iright)). For example, for the ({mathscr{N}}left(iright)) shown in Fig. 1a, one angular message is obtained from two edge vectors ({mathop{r}limits^{rightharpoonup}}_{ij}) and ({mathop{r}limits^{ rightharpoonup }}_{{ik}}) originating from node (i) to node (j) and (k) within one triplet ({ijk}), another two angular messages are from triplets ({ijm}) and ({imk}). Since each interaction layer has same structure, for clarity, we no longer indicate the layer index (l) unless necessary.

In one interaction block (as depicted in Fig. 1d), the atomistic representations obtained from the preceding layer (embedding block or previous interaction layer) are passed through a learnable MLP layer:

$${h}_{i}={MLP}left({x}_{i}right)$$

(3)

After updating the atomistic representations using Eq. (3), we concatenate the feature maps of the source and destination nodes of each edge (labeled as ({h}_{i}{||}{h}_{j}) where the symbol “(parallel)” denotes the concatenation of two vectors) to obtain vectors of dimension 2 F. Subsequently, we pass them through a linear layer to convert the feature size of each edge to dimension (beta F), and then reshape them into a matrix ({Phi }_{{ij}}in {{mathbb{R}}}^{Ftimes beta }) (Eq. (4)) and next perform matrix multiplication (Eq. (6)) with a vector ({q}_{{ij}}in {{mathbb{R}}}^{beta }) (Eq. (5)) which encodes the information of ({r}_{{ij}}) using Chebyshev polynomials. The outcome is an edge feature vector ({e}_{{ij}}in {{mathbb{R}}}^{F}) (Eq. (6)):

$${Phi }_{{ij}}={Reshape}left(Wleft({h}_{i}{rm{||}}{h}_{j}right)+bright)in {{mathbb{R}}}^{Ftimes beta }$$

(4)

$${q}_{{ij}}={psi }^{left(beta right)}left({r}_{{ij}}right)in {{mathbb{R}}}^{beta }$$

(5)

$${e}_{{ij}}={Phi }_{{ij}}{q}_{{ij}}in {{mathbb{R}}}^{F}$$

(6)

where (Win {{mathbb{R}}}^{beta Ftimes 2F}) (bin {{mathbb{R}}}^{beta F}).

The ({psi }^{left(beta right)}) in Eq. (5) is the continuous cutoff and Chebyshev polynomials-based radial basis function (“({CCCP}(beta ))” in Fig. 1d), denoted as

$${psi }^{(beta )}({r}_{ij})=left{begin{array}{ll}{C}^{(beta )}({r}_{ij}){({r}_{ij}-{r}_{cut})}^{2}, & {r}_{ij}, <, {r}_{cut}\ 0, & {r}_{ij}ge {r}_{cut}end{array}right.$$

(7)

Here ({C}^{left(beta right)}) are Chebyshev polynomials on the interval (left[{r}_{min },,{r}_{{cut}}right]), where ({r}_{min }) is the minimal distance between atoms and is usually set to zero, ({r}_{{cut}}) is the cutoff radius which is introduced to ensure a smooth behavior of MGNN when atoms leave or enter the interaction neighborhood. An illustration of the radial basis functions is given in Fig. S1 (Supporting Information S2).

Once the feature maps ({e}_{{ij}}) on the edge are obtained, we split its information (Eq. (8)) to prepare the update on the features of atoms and edges.

$${Delta}_{{ij}},,{f}_{{!ij}}={Split}left(W{e}_{{ij}}+bright)$$

(8)

where (Win {{mathbb{R}}}^{3Ftimes F}), (bin {{mathbb{R}}}^{3F}), ({Delta}_{{ij}},in {{mathbb{R}}}^{F}), and ({f}_{{ij}}in {{mathbb{R}}}^{2F}). The split information is used to compute the incremental node features as in Eq. (9).

$$varDelta {x}_{i}=Wleft(sum _{jin {mathscr{N}}(i)}sigma left({varDelta }_{ij}+{xi }_{ij}left({f}_{ij},{mathop{r}limits^{rightharpoonup }}_{ij},{mathscr{N}}(ij)right)right)right)+b$$

(9)

where (Win {{mathbb{R}}}^{Ftimes F}), (bin {{mathbb{R}}}^{F}), and ({xi }_{{ij}}) represents the moment interaction function, as in Eq. (10).

$${xi }_{{ij}}={W}^{{out}}left(mathop{sum}limits_{{ik}in {mathscr{N}}left({ij}right)}sigma left({W}^{{in}}left({m}_{{jik}}^{leftlangle 1rightrangle }parallel {m}_{{jik}}^{leftlangle 2rightrangle }right)+{b}^{{in}}right)right)+{b}^{{out}}$$

(10)

where ({W}^{{in}}in {{mathbb{R}}}^{Ftimes 2F}), ({b}^{{in}}in {{mathbb{R}}}^{F}), ({W}^{{out}}in {{mathbb{R}}}^{Ftimes F}), ({b}^{{out}}in {{mathbb{R}}}^{F}), the symbol “(parallel)” denotes the concatenation of two vectors. ({mathscr{N}}left({ij}right)) represents the set of neighboring edges of the edge ({ij}) that share (i) as a common vertex for all triplets, as shown in the example in Fig. 1b, where ({mathscr{N}}left({ij}right)={{ik},{im}}). Note that applying the non-linearity activation function before aggregation improve performance⁵².

In Eq. (10), the calculation of ({m}_{{jik}}^{leftlangle nu rightrangle }) (i.e., the message passed by triplet ({jik})) involves the moments that are the core part of our algorithm, as shown in Fig. 1b, e, which are calculated as

$${m}_{{jik}}^{leftlangle nu rightrangle }=sigma left(Wleft({f}_{{!ij}}^{leftlangle nu rightrangle }odot {f}_{{!ik}}^{leftlangle nu rightrangle }odot {M}_{{theta }_{{jik}}}^{leftlangle nu rightrangle }right)+bright),nu =1,2$$

(11)

where (Win {{mathbb{R}}}^{Ftimes F}), (bin {{mathbb{R}}}^{F}), and “(odot)” denotes element-wise product.

The ({M}_{{theta }_{{jik}}}^{leftlangle nu rightrangle }) in Eq. (11) is the rank (nu) moment of the triplet ({jik}), and is calculated as

$${M}_{{theta }_{{jik}}}^{leftlangle nu rightrangle }=leftlangle {hat{r}}_{{ij}}^{otimes nu },{hat{r}}_{{ik}}^{otimes nu }rightrangle ,nu =1,2$$

(12)

where ({hat{r}}_{{ij}}=frac{{mathop{r}limits^{ rightharpoonup }}_{{!ij}}}{{r}_{{ij}}}) is the unit vector of ({mathop{r}limits^{rightharpoonup}}_{{ij}}), ({hat{r}}_{{ij}}^{otimes nu }=underbrace{{hat{r}}_{{ij}}otimes cdots otimes {hat{r}}_{{ij}}}_{{nu ; times}}) is the Kronecker product of (nu) copies of the vector ({hat{r}}_{{ij}}in {{mathbb{R}}}^{3}) and is a tensor of rank (nu). Notation “({{langle }}cdot ,cdot {{rangle }})” signifies the contraction (product) of tensors, with (nu =1) corresponding to the vector dot product and (nu =2) corresponding to the Frobenius product of matrices. In other words, ({M}_{{theta }_{{jik}}}^{langle 1rangle }=langle {hat{r}}_{{ij}},{hat{r}}_{{ik}}rangle =Vert {hat{r}}_{{ij}}Vert times Vert {hat{r}}_{{ik}}Vert times cos (angle {hat{r}}_{{ij}},{hat{r}}_{{ik}})=cos (angle {hat{r}}_{{ij}},{hat{r}}_{{ik}})), and ({M}_{{theta }_{{jik}}}^{langle 2rangle }=langle {hat{r}}_{{ij}}^{otimes 2},{hat{r}}_{{ik}}^{otimes 2}rangle ={hat{r}}_{{ij}}^{otimes 2}:{hat{r}}_{{ik}}^{otimes 2}={cos }^{2}(angle {hat{r}}_{{ij}},{hat{r}}_{{ik}})), reflecting the angular relationship within the triplet ({jik}).

The ({f}_{{ij}}^{leftlangle nu rightrangle }) in Eq. (11) is calculated as

$${f}_{{ij}}^{leftlangle 1rightrangle },{f}_{{ij}}^{leftlangle 2rightrangle }={Split}left(sigma left({MLP}left({f}_{{ij}}right)right)right)$$

(13)

where ({f}_{{ij}}) is calculated using Eq. (8), and ({f}_{{ij}}^{leftlangle nu rightrangle }in {{mathbb{R}}}^{F},,nu =mathrm{1,2}). They serve as the coefficients of moments of edge ({ij}), which will be introduced in next subsection.

The rationale of moment interaction

We get motivation from the moment tensor descriptor⁴ for atom (i), which is defined as a summation from edge moments as ({sum }_{jin {mathscr{N}}left(iright)}{f}_{{ij}}^{leftlangle nu rightrangle }({r}_{{ij}}){mathop{r}limits^{ rightharpoonup }}_{{ij}}^{otimes nu }), which encapsulated radial part ({f}_{{ij}}^{leftlangle nu rightrangle }({r}_{{ij}})) and angular part ({mathop{r}limits^{ rightharpoonup }}_{{ij}}^{otimes nu }) containing angular information about the neighborhood ({mathscr{N}}left(iright)) and is tensor of rank (nu) ⁴. Our MGNN defines the moment of an edge ({ij}) as ({f}_{{ij}}^{leftlangle nu rightrangle }({r}_{{ij}},{z}_{i},{z}_{j}){hat{r}}_{{ij}}^{otimes nu }) to account for both ({r}_{{ij}}) and the charges ({z}_{i}) and ({z}_{j}), thereby enriching the function’s capacity to encapsulate the physicochemical context. Note that while ({f}_{{ij}}^{leftlangle nu rightrangle }({r}_{{ij}})) is originally a scalar function of ({r}_{{ij}}) in ref. ⁴, in MGNN ({f}_{{ij}}^{leftlangle nu rightrangle }({r}_{{ij}},{z}_{i},{z}_{j})) is approximated using neural network.

As the edge moments are contracted to formulate triplet information, edge features (i.e., ({f}_{{ij}}) in Eqs. (8), (9), (11), and (13)) play the role of coefficients to multiply with triplet moment (i.e.,({M}_{{theta }_{{jik}}}^{leftlangle nu rightrangle }) in Eq. (12)) to construct the message passed by triplet (i.e., ({m}_{{jik}}^{leftlangle nu rightrangle }) in Eq. (11)). Within a triplet (i.e., Fig. 1b) composed of nodes (i,j,k) and edges ({ij}), ({ik}), for (nu =1), ({hat{r}}_{{ij}}^{otimes nu }={hat{r}}_{{ij}}), so that ({M}_{{theta }_{{jik}}}^{leftlangle 1rightrangle }=langle {hat{r}}_{{ij}},{hat{r}}_{{ik}}rangle =cos (angle {hat{r}}_{{ij}},{hat{r}}_{{ik}})), and for (nu =2), ({hat{r}}_{{ij}}^{otimes nu }={hat{r}}_{{ij}}otimes {hat{r}}_{{ij}}), so that ({M}_{{theta }_{{jik}}}^{leftlangle 2rightrangle }=langle {hat{r}}_{{ij}}^{otimes 2},{hat{r}}_{{ik}}^{otimes 2}rangle ={cos }^{2}(angle {hat{r}}_{{ij}},{hat{r}}_{{ik}})) (i.e., the Frobenius product). The coefficients ({f}_{{ij}}^{leftlangle nu rightrangle }) and ({f}_{{ik}}^{leftlangle nu rightrangle }) for edges ({ij}) and ({ik}) are derived from Eq. (13).

As can be seen from Eq. (12), all the initial moment tensors are contracted to moment scalars (i.e., zero-order tensors) before the flow of message passing through triplets to edges to nodes, consuming less physical memory in computers. The outputs from message passing are also scalars. When the network is used to predict vectors or tensors, such as dipole moments or polarizabilities, these scalars and geometric information are then used as input formation in the output head to reconstruct vectors or tensors. These are fundamentally different from many existing equivariant models (i.e., GRACE⁵³, and CACE³⁹), where usually the higher-order tensor information is used throughout the entire information flow process.

Output head

When a scalar is needed, the last feedforward network transforms feature on each node into a scalar (Eq. (14)), as shown in Fig. 2a. We perform a sum aggregation over all nodes to predict scalar quantities like energy (Eq. (15)), as shown in Fig. 2b.

$${o}_{i}={MLP}left({x}_{i}^{L}right)$$

(14)

$$E=sum _{i}{o}_{i}$$

(15)

The force acting on atoms (i), ({mathop{F}limits^{ rightharpoonup }}_{i}), is computed using auto-differentiation according to its definition as the negative gradient of the total energy with respect to the position of atom (i):

$${mathop{F}limits^{ rightharpoonup }}_{i}left({Z}_{1},ldots ,{Z}_{n},{mathop{r}limits^{ rightharpoonup }}_{i},ldots ,{mathop{r}limits^{ rightharpoonup }}_{n}right)=-frac{partial E}{partial {mathop{r}limits^{ rightharpoonup }}_{i}}left({Z}_{1},ldots ,{Z}_{n},{mathop{r}limits^{ rightharpoonup }}_{i},ldots ,{mathop{r}limits^{ rightharpoonup }}_{n}right)$$

(16)

which gives an energy-conserving force field (Eq. (16)), as shown in Fig. 2b⁴⁴.

The molecular dipole moment (vec{mu }) is the response of the molecular energy to an electric field ({nabla }_{vec{F}}E) and, at the same time, the first moment of the electron density²⁶. It is often predicted using latent atomic charges, and in our model, we predict the dipole moment using Eq. (17) and the electronic spatial extent by Eq. (18), as shown in Fig. 2c, d respectively, consistent with PaiNN²⁶.

$$mathop{mu }limits^{ rightharpoonup }=sum _{i}{o}_{i}left({mathop{r}limits^{ rightharpoonup }}_{i}-mathop{c}limits^{ rightharpoonup }right)$$

(17)

$$leftlangle {R}^{2}rightrangle =sum _{i}{o}_{i}{left({||}{mathop{r}limits^{ rightharpoonup }}_{i}-mathop{c}limits^{ rightharpoonup }{||}right)}^{2}$$

(18)

This assumes the center of mass at (mathop{c}limits^{ rightharpoonup }) for brevity.

Specifically, when it is necessary to predict properties related to vectors or tensors, a gated equivariant block (for more information about rotational invariance and equivariance, please refer to Supporting Information S5)^26,54, as shown in Fig. 2f, is needed. If one needs to calculate the infrared and Raman spectra of a molecular system using deep learning potential functions, it is also necessary to compute the system’s dipole moment and polarizability. In practical experiments, we utilize Eq. (17) to predict the system’s dipole moment. However, to predict the molecular polarizability tensor, we require a vector feature to construct the tensor form. Therefore, we first initialize the node’s vector feature in the output head:

$${mathop{x}limits^{ rightharpoonup }}_{i}^{0}={sum }_{jin {mathscr{N}}left(iright)}sigma left(W{x}_{j}^{L}right)otimes {mathop{r}limits^{ rightharpoonup }}_{{ij}}$$

(19)

where (Win {{mathbb{R}}}^{Ftimes F}) is the learnable weight used to transform destination node feature to control how much the ({mathop{r}limits^{ rightharpoonup }}_{{ij}}) contributes to the initial node vectorial features. ({x}_{j}^{L}in {{mathbb{R}}}^{F}) is the destination node representation of the edge ({ij}), and the superscript (L) denotes it is the output of the last interaction layer. ({mathop{r}limits^{ rightharpoonup }}_{{ij}}in {{mathbb{R}}}^{3}) is the vector in Cartesian coordinates from source node to destination node. Then through Eq. (19) vectorial features ({mathop{x}limits^{ rightharpoonup }}_{i}^{0}in {{mathbb{R}}}^{Ftimes 3}) of atom (i) can be initialized, in which the superscript (0) denotes the initial vectorial features, as shown in Fig. 2e.

Considering the prediction of polarizability tensor (i.e., the response of the molecular dipole to an electric field²⁶), we construct polarizability tensors using

$$boldsymbol{alpha} =mathop{sum }limits_{i=1}^{N}{alpha }_{0}({x}_{i}^{L}){I}_{3}+mathop{nu}limits^{ rightharpoonup }left({mathop{x}limits^{ rightharpoonup }}_{i}right)otimes ({mathop{r}limits^{ rightharpoonup }}_{i}-mathop{c}limits^{ rightharpoonup })+({mathop{r}limits^{ rightharpoonup }}_{i}-mathop{c}limits^{ rightharpoonup})otimes mathop{nu }limits^{ rightharpoonup }left({mathop{x}limits^{ rightharpoonup }}_{i}right)$$

(20)

where ({I}_{3}) is the identical matrix with shape (3times 3), and (mathop{nu }limits^{ rightharpoonup }left({mathop{x}limits^{ rightharpoonup }}_{i}right)) transforms vector features (({mathop{x}limits^{ rightharpoonup }}_{i})) initialized by Eq. (19) and passing them through two gated equivariant blocks^26,54, each yielding atom-wise scalars and vectors. The first term describes isotropic, atom-wise polarizabilities. The other two terms are the anisotropic components of the polarizability tensor. The atom positions subtracting the center of mass ({mathop{r}limits^{ rightharpoonup }}_{i}-mathop{c}limits^{ rightharpoonup }) are used here to incorporate the global structure of the molecule²⁶. The calculation of polarizability tensor ({boldsymbol{alpha }}) is illustrated in Fig. 2f and the gated equivariant block in Fig. 2g.

We utilize the open-source deep learning frameworks PyTorch 2.0.1⁵⁰, SchNetPack 2.0⁵⁵, PyTorch Geometric 2.3.1⁵⁶, and PyTorch Lightning 2.1.3 for constructing and training MGNN (see more details in Supporting Information S3). Considering both moment orders (nu =1) and (nu =2), has better performance than considering (nu =1) alone (details in Supporting Information S4).

Benchmark

We show that our MGNN architecture can achieve many state-of-the-art results on QM9 dataset^41,42 and some state-of-the-art results on revised MD17 dataset⁴³, and MD17-ethanol⁴⁴. Furthermore, we validate the stability of MGNN on MD17-ethanol⁴⁴ and generalizability on 3BPA dataset⁴⁵ and computational efficiency on dataset 25-element high-entropy alloys⁵⁷. We refer the reader to the Supporting Information S3 for further training, data set and experimental details.

QM9

The QM9 dataset serves as a widely recognized benchmark for predicting various molecular properties at equilibrium^41,42. It comprises 130831 optimized structures with chemical elements (C, H, O, N, F) containing up to 9 heavy atoms (C, O, N, F). In our benchmarking of MGNN, we target 12 properties. To maintain consistency with prior research, MGNN is trained on 110000 examples, while 10000 molecules are reserved for validation purposes to regulate learning rate decay and enable early stopping. The remaining data is allocated as the test set, and results are averaged over three random splits. Table 1 presents a comparison with several state-of-the-art methods that learn the properties described above through a direct mapping from atomic coordinates and species. SchNet²⁸, Physnet³¹, Dimenet + +⁴⁰, Equiformer³⁶, PaiNN²⁶, Allegro¹², SphereNet⁵⁸, TorchMD-Net⁵⁹, and TensorNet³⁸ are included in Table 1 for reference. MGNN attains state-of-the-art performance in seven target properties and delivers comparable results in the remaining targets.

Table 1 MAE results on QM9 testing set for various chemical properties

Full size table

Revised MD17

In our evaluation on the revised MD17 dataset⁴³, which contains 100,000 recomputed structures of 10 molecules from original MD17 dataset^44,60,61, we assess MGNN’s capability to accurately learn the energies and forces of small molecules. This dataset comprises ten small, organic molecules, and we aim to predict both energy and forces on systems with fixed chemical composition exhibiting conformational changes. For each molecule, we allocate 950 configurations for training and 50 for validation, uniformly sampled from the entire dataset. Test error is evaluated on all remaining configurations in the dataset. A distinct model is trained for each trajectory. Table 2 presents a comparison with GAP², ACE⁵, GetNet-(T/D)²⁹, NequlP(l = 3)¹¹, Allegro¹², which are reported in ref. ¹², and TensorNet 3L³⁸, BOTNet⁶², and MACE⁶³ which are reported in ref. ³⁸. MGNN achieves state-of-the-art results for the molecules Ethanol, Benzene, and Uracil, and notably sets a new state-of-the-art for the energy MAE of Ethanol. Furthermore, in other molecular systems, MGNN surpasses previous descriptor-based machine learning interatomic potentials (GAP, ACE), invariant GNNs(GemNet-(T/Q)) or achieves comparable results to other equivariant GNNs(NequlP(l = 3), Allegro, TensorNet 3 L, BOTNet, MACE).

Table 2 MAE results on revised MD17 dataset for energy and force predictions in units of [meV] and [meV Å⁻¹], respectively

Full size table

MD17-ethanol

We also benchmark our MGNN in MD17-Ethanol Dataset⁴⁴, which is frequently used for stability test^39,64. Stability refers to the system’s ability to run sufficiently long trajectories while maintaining physically reasonable states. We followed the approach in ref. ³⁹ and used the ethanol molecules in MD17 dataset for training and validation. The models compared include Schnet²⁸, DimeNet³⁰, sGDML⁶⁵, PaiNN²⁶, SphereNet⁵⁸, NewtonNet³³, GemNet-T²⁹, NequIP¹¹, and CACE³⁹. MGNN-X means that X samples are used to train MGNN. Except for CACE and MGNN-1000, all models were trained using 10,000 configurations. The stability (S) in Table 3 measures how long MD simulations at 500 K remain physically plausible (without pathological behavior or entering physically prohibitive states) in MD runs within 300 ps limit⁶⁴. The stability values are sourced from the CACE³⁹ literature. We find that MGNN potential can remain stable for the full 300 ps period, demonstrating its superior stability.

Table 3 MAE of energy (E) and force (F) for ethanol in the MD-17 dataset (best in bold), in units of [meV] and [meV Å⁻¹], respectively

Full size table

We also produce state-of-the-art results on force and energy prediction. Compared with CACE, MGNN-1000 achieves similar energy errors, but with smaller force errors. Compared with NequIP, MGNN-1000 achieves similar energy and force errors, yet MGNN-10000 achieves even smaller energy and force errors.

25-element high-entropy alloys

The HEA25 dataset from ref. ⁴⁶ consists of 25,630 distorted crystalline structures containing 36 or 48 atoms on bcc or fcc lattices. We select a test set containing 1,000 configurations and use the remaining 24,630 data points for training and validation. The mean absolute errors (MAE) of energy and forces from MGNN for these tests are presented in Table 4. MGNN and PET⁵⁷ generalize significantly better than all other approaches. The learning curves from literature models³⁹ with respect to number of training samples are shown in Fig. 3, where MGNN is compared with other models, including a 3B model that has an atomic energy baseline and three-body terms, another 3B + NN model that further incorporates a full set of pair potentials and a non-linear term built on top of contracted power spectrum features⁴⁶, and CACE. Overall, CACE and MGNN show superior performance. When the training data size is larger than 2,000, MGNN performs better than CACE.

Table 4 Mean absolute error (MAE) of energy (E) and force (F) for the HEA25 dataset (best in bold), reported in units of [meV] and [meV Å⁻¹]

Full size table

**Fig. 3: Learning curve for different models.**

3BPA

The 3BPA dataset was introduced in ref. ⁴⁵ to evaluate the extrapolation capabilities towards other temperatures using machine-learned interatomic potentials. The training set consists of 500 configurations of the flexible drug-like molecule 3-(benzyloxy)pyridin-2-amine obtained from ab initio molecular dynamics at 300 K, while the test set includes a series of molecular dynamics simulations at 300 K, 600 K, and 1200 K. The root mean squared errors (RMSE) of energy and forces from MGNN for these tests are presented in Table 5, along with results from other state-of-the-art models. ACE^(g)⁵³ performs exceptionally well, achieving a significantly superior performance compared to other models. Models like MACE⁶³, NequIP¹¹, BOTNet⁶², and Allegro¹² yield similar results. CACE and MGNN exhibit comparable performance, with MGNN outperforming CACE at 300 K and 600 K, but underperforming at 1200 K. We attribute MGNN’s poorer performance at 1200 K to small training data size (e.g., Fig. 3).

Table 5 Energy and Force RMSE for the 3BPA temperature transferability dataset (best in bold), reported in units of [meV] and [meV Å⁻¹]

Full size table

Computational cost

In Table 6, we select models from the Matbench-discovery library⁶⁶ (e.g., Equiformer v2³⁷, GemNet-T²⁹, PaiNN²⁶, SchNet²⁸, SCN⁶⁷) for comparison with our MGNN. The “F” following MGNN represents the number of scalar features, and “L” represents the number of layers. We compare training times and inference times using the HEA25 dataset. With the exception of MGNN, these models use the default parameters from the Matbench-discovery library⁶⁶. We select the first 1000 data points from HEA25 without any shuffling for training and inferencing for speed test. The chosen batch size is 2 to ensure all selected models could run smoothly on our NVIDIA GeForce RTX 3090 (24 GB) without errors.

Table 6 Training time (ms) and inference time (ms) for energy and forces for single batch step for HEA25 dataset on an NVIDIA GeForce RTX 3090(24G)

Full size table

As shown in Table 6, for models with the similar number of parameters, MGNN training and inference speeds are slightly slower than those of GemNet. This is due to the additional processing on angle information for preparing moments, and parameter-learning during the triplet information propagation, both of which slightly reduce computational efficiency. However, MGNN clearly outperforms Equiformer v2 in terms of speed on the same device. This is likely because Equiformer v2 requires the calculation of spherical harmonics by Clebsch-Gordan coefficients, which involves a larger computational load. MGNN’s tensor computations are only performed in the first layer to produce contracted scalars, leading to higher computational efficiency.

Applications

We demonstrate MGNN’s advantage as molecular potential for predicting molecular dynamics simulations of amorphous electrolytes, and rapid simulation of molecular spectra in vacuum conditions for organic molecules. We refer the reader to the Supporting Information S3 for further training, data set, experimental details, and molecular dynamics simulation settings.

Li-ion diffusion in a phosphate electrolyte

Our investigation into the MGNN’s capabilities extends to the kinetic properties of Li-ion diffusion in Li₃PO₄ solid electrolyte, a material whose conductivity is intricately linked to its crystallinity. The Li₃PO₄ structure consists of 192 atoms. The simulations are conducted using a time step of 2 fs in the NVT ensemble, employing Nosé-Hoover thermostat⁶⁸. The dataset includes a 50 ps ab-initio molecular dynamic (AIMD) simulation in the molten liquid state as T = 3000 K, followed by a 50 ps AIMD simulation in the quenched state at T = 600 K. These trajectories are combined, and both the training set of 10,000 structures and the validation set of 1000 are randomly sampled from the combined dataset of 50,000 structures.

Note that the reference results for Allegro in Table 7 are sourced from their paper¹². The Ag model is trained and validated using 1000 structures, consist with ref. ¹², resulting in a final training performance with energy mean absolute error (MAE) of 0.168 meV atom⁻¹ and force component MAE of 9.5 meV Å⁻¹ on a test set comprising 159 structures. For both the Li₃PO₄ and Ag datasets, we have achieved competitive results comparable to Allegro.

Table 7 Energy and Force MAE for the Li₃PO₄ and Ag datasets (best in bold), reported in units of [meV atom⁻¹] and [meV Å⁻¹]

Full size table

When evaluating the Li₃PO₄ test set for the quenched amorphous state, which the simulation is conducted on, we obtained a MAE in the energies of 0.2 meV atom⁻¹, along with a MAE in the force components of 22.92 meV Å⁻¹. Subsequently, we conduct a series of ten MD simulations starting from the initial structure of the quenched AIMD simulation, each lasting 50 ps at T = 600 K in the quenched state, to scrutinize how well MGNN captures the structure and kinetics compares to AIMD. To assess the quality of the structure after the phase change, we compare the all-atom radial distribution functions (RDF) and the angular distribution functions (ADF) of the tetrahedral angle O-P-O (P central atom). As depicted in Fig. 4a, b, MGNN adeptly reproduces both distribution functions. Concerning ion transport kinetics, we evaluate how accurately MGNN models the Li mean-square-displacement (MSD) in the quenched state. Once more, we observe excellent agreement with AIMD, as shown in Fig. 4c. The structure of Li₃PO₄ is depicted in Fig. 4d. Reference data for the Li₃PO₄ is obtained from ref. ¹².

**Fig. 4: Structure properties of Li₃PO₄ and Li dynamic in Li₃PO₄.**

Molecular dynamics simulations are executed in LAMMPS⁶⁹ using the MGNN pair style, implemented in the SchNetPack 2.0⁵⁵ interface. Other details refer to Supporting Information S3.4. The RDF and ADF for Li₃PO₄ are computed with a maximum distance of 6 Å (RDF) and 2.5 Å (ADF). The simulation commences from the initial frame of the AIMD quenched simulation. RDF and ADF for MGNN are averaged over ten runs with different initial velocities, with the first 10 ps of the 50 ps simulation discarded in the RDF/ADF analysis to ensure equilibration is adequately accounted for.

Molecular spectra

MGNN is also employed to efficiently compute the infrared and Raman spectra of ethanol. Traditional methods, which might rely on the harmonic oscillator approximation from a singular molecular structure, often neglect critical phenomena such as the diversity of molecular conformations. To secure spectra of superior quality, it is imperative to undertake molecular dynamics simulations, which demand the prediction of forces at each timestep and the calculation of dipole moments and polarizabilities throughout the trajectory. The acquisition of infrared and Raman spectra through the Fourier transformation of time autocorrelation functions of dipole moments and polarizabilities respectively alleviates the complexity and computational intensity of using electronic structure methods. Given the necessity to account for nuclear quantum effects (NQE) for high-fidelity spectra, ring-polymer molecular dynamics (RPMD) simulations are employed, managing multiple replicas of the molecule simultaneously, thereby significantly increasing the computational burden²⁶.

A joint model has been trained on a dataset comprising 8000 conformations of ethanol, supplemented by an additional 1000 molecules each for validation and testing purposes. This model is tasked with the prediction of energies, forces, dipole moments, and polarizability tensors. Dipole moments and polarizability tensors are determined as described in Eqs. (17), (20), respectively. The unified model demonstrates precise predictions for energy, forces, dipole moments, and polarizabilities, as demonstrated in Table 8. Reference data for the ethanol molecule is sourced from ref. ²⁷, and the experimental spectra recorded in the gas phase are obtained from refs. ^70,71.

Table 8 Ethanol in vacuum results (best in bold)

Full size table

Results from Table 8 reveal that MGNN can acquire expressive atomic tensor embeddings, enabling simultaneous prediction of multiple molecular properties. Particularly noteworthy is MGNN’s superior performance, with energy and force errors approximately a factor of two and three smaller, respectively, compared to FieldSchNet²⁷ and PaiNN²⁶. Additionally, MGNN achieves smaller MAE in force component prediction compared to the equivariant neural network TensorNet³⁸, while attaining results comparable to TensorNet for dipole moments and polarizabilities.

The MGNN potential is then used for classical MD and RPMD⁷². While both methods can calculate the spectra, they focus on different aspects. The peak positions and intensities of the ethanol infrared spectra (Fig. 5a) computed using classical MD and MGNN closely align with the static electronic structure reference, particularly evident in the C-H and O-H stretching regions at 3000 cm⁻¹ and 3900 cm⁻¹. This indicates MGNN’s faithful reproduction of the original electronic structure method. However, in comparison to experimental data, both spectra exhibit shifts towards higher frequencies. In contrast, the MGNN RPMD spectrum demonstrates excellent agreement with experimental results, highlighting the significance of NQEs. Similar trends are observed for the ethanol Raman spectrum (Fig. 5b), where the MGNN RPMD spectrum once again provides the most faithful reproduction of experimental observations.

**Fig. 5: Infrared and Raman spectra of ethanol.**

More simulation details are shown in Supporting Information S3.6.

Discussion

In this work, we have introduced MGNN, a Graph Neural Network framework specifically designed for the rapid and accurate prediction of molecular properties. MGNN achieves multiple state-of-the-art results on the QM9 dataset and demonstrates comparable errors to the most recent state-of-the-art models on the revised MD17 dataset. On small molecule MD17-ethanol dataset for stability tests, MGNN shows lower energy errors, lower force errors, and highest stability value. On the 3BPA dataset, the generalization performance of MGNN is validated. On the HEA25 dataset, the learning and computational efficiency of MGNN is demonstrated. Finally, MGNN delivers remarkable consistency with ab-initio calculations, particularly in modeling the migration of lithium ions in electrolytes and in the calculation of infrared and Raman spectra of ethanol.

MGNN’s output module is adept at interfacing with both vector and tensor modules, which enables the model to tailor its output representation to the property being predicted. While our approach does not incorporate vector computations within the molecular representation learning process, the inclusion of these computations in the final output module significantly enhances the model’s computational efficiency. This is exemplified in our application of MGNN to predict vector and tensor features of molecules, such as the dipole moment and polarizability, which are crucial for computing the infrared and Raman spectra of molecules like ethanol in vacuum conditions.

Looking ahead, the potential applications of MGNN are vast and varied, extending to areas such as molecular generation and wavefunction prediction. The rapid and accurate prediction of molecular properties is contingent upon highly precise reference methods, which often come with a high computational cost. As such, machine learning models like MGNN stand to play a pivotal role in surmounting these challenges, thereby driving forward the frontier of computational chemistry and materials science research.

Methods

QM9

The QM9 data set⁴² consists of 130,831 optimized structures of molecules that contain up to 9 heavy elements from C, N, O and F. On top of the structures, several quantum-chemical properties computed at the DFT B3LYP/6-31 G(2df,p) level of theory are provided.

We use the MGNN model with a cutoff of 5 Å, F = 512, (beta) = 8, and 3 layers. More training details are provided in Supporting Information S3.2.