Accelerating crystal structure search through active learning with neural networks for rapid relaxations

Introduction
Novel crystal structures have enabled significant advances across various research fields, for example, in the discovery of solar cells1,2, catalysts3,4, superconductors5, hardware components6,7 and batteries8,9. For a single crystal composition, a vast number of stable structures can be found, each with a unique set of physical properties, such as electrical conductivity, thermal conductivity, magnetism, and optical behavior. Stable crystal structures are local minima on the potential energy surface (PES) of the respective composition, and the number of possible stable structures increases exponentially with the number of atoms per cell10. As a result of the wide variety of possible stable structures and their physical properties, crystal structure search is an important challenge, with global energy optimization being the most fundamental task.
Many computational methods have been developed for crystal structure search and, in particular, global optimization. These approaches usually start from a pool of general candidate structures that are relaxed to local minima in the PES. During structure relaxation, atom positions and lattice vectors are adjusted such that a structure moves toward its respective stable structure, resulting in a trajectory of intermediate structures for which, at every step, the energy, forces, and stress need to be evaluated. To reliably compute physical properties and identify all possible local minima in the PES of a crystal structure composition, accurate electronic structure computations such as density functional theory (DFT) are indispensable. The simplest crystal structure search method is ab initio random structure search (AIRSS)11,12,13, where structures from the candidate pool are randomly selected for evaluation with structure relaxation. However, the calculation of physical properties for all intermediate structures is computationally expensive, making the high-throughput virtual screening (HTVS) of many candidate structures infeasible for large systems with many local minima. To reduce the number of DFT calculations required for structure relaxation, the number of trajectories can be reduced with iterative machine learning methods that select only the most promising candidates for relaxation, for example, using Bayesian optimization14,15. Alternatively, the computational cost can be reduced by transforming the structures in the candidate pool such that the probability of finding low-energy minima is increased. These approaches include evolutionary algorithms (EAs)16,17,18, basin hopping19, minima hopping20,21, particle swarm optimization22,23, and simulated annealing24,25. In addition, a combination of candidate selection and transformation of the candidate pool has been used for large-scale discovery of materials26. However, all these methods rely on computationally expensive full relaxation trajectories based on ab initio methods such as DFT in order to find stable structures starting from general, usually non-stable candidates. As long as the full structure relaxation trajectories need to be computed, the potential for reducing computationally expensive DFT steps remains limited. To address this problem, a small number of promising methods have been developed that stop structure relaxations at intermediate stages, using the energies, forces, and stress of the intermediate structures combined with a scoring function27,28.
On the other hand, machine learning force fields (MLFFs)29,30,31,32, such as neural network models33,34,35,36,37,38,39,40,41,42,43,44 or kernel methods45,46, have been shown to accurately learn the PES of molecules and materials from a set of structures labeled with physical properties, while reducing computational cost by multiple orders of magnitude. When trained on a sufficiently large dataset, machine learning models have proven to be useful tools for structure relaxation47,48,49. However, in many applications, including crystal structure search, suitable datasets for training the MLFFs are often not available and need to be created. To assemble such datasets efficiently while avoiding faulty MLFF predictions for structures that are very different from the training data, active learning has been successfully applied to MLFF training50,51,52,53. There also exist generative machine learning approaches that do not rely on the evaluation of candidate structures but instead use diffusion models54 trained on large databases of stable materials such as MaterialsProject55, the open quantum materials database (OQMD)56, AFLOWLIB.ORG57, and NOMAD58.
Iterative active learning has also been used to reduce costly DFT calculations in global optimization techniques like HTVS59,60,61 and EAs62,63,64. EAs start from a small population that gradually expands by iteratively adding a small number of child structures in the approximated region of interest. By focusing on creating structures in the target area, fewer structure relaxations are needed, at the cost of more iterations due to the gradual increase in population. For this reason, EAs often use lightweight models such as moment tensor potentials (MTPs)65, Gaussian process regression (GPR)63,64 or smaller neural network architectures62, as these models can easily be retrained on small datasets. In contrast, HTVS starts from a very large candidate pool, and machine learning-based structure relaxation is used to narrow down the search space. Although less efficient with respect to the number of structure relaxations, the expensive DFT computations for labeling training data can be parallelized, and fewer retrainings of the MLFFs are needed, which makes it possible to also use models such as state-of-the-art graph neural networks. So far, neural network approaches without uncertainty estimation have been used in HTVSs of clusters59 and materials60, but learning interatomic forces with graph neural networks on relaxation trajectories of crystal structures has shown poor performance66. Furthermore, in uncertainty-driven HTVSs for global optimization within periodic systems, structure relaxation with MTPs61 and Monte Carlo sampling using ensembles of graph neural network energy predictors67 have proven effective.
In this work, we propose an iterative HTVS approach to global optimization in crystal structure search based on active learning and structure relaxations of large candidate pools with neural network MLFF ensembles. The key component of our method is a neural network ensemble that accelerates structure relaxations, selects new training data points toward the region of interest, finds low-energy clusters in the candidate pool, and finally provides a stopping criterion to measure convergence. Our process is initialized with unlabeled data, sampling a fixed number of structures for DFT labeling in each active learning cycle, and we apply random perturbations to training structures to avoid poor training performance on symmetric structures. Using ensembles of MLFFs to measure the uncertainty of predictions, which is needed for our active learning algorithm, does not pose any further limitations on the choice of model architecture and therefore allows any model to be used, as long as it is able to accurately learn the PES of interest. Especially with the increasing availability of pre-trained models and datasets26,68, a flexible choice of model architectures allows one to start from trained models, possibly further reducing the need for expensive DFT calculations during active learning. Furthermore, our method enables straightforward parallelization of computationally demanding tasks, making it efficient for use on high-performance computing (HPC) systems.
We evaluate our method for global optimization of Si16, Na8Cl8, Ga8As8, and Al4O6. Here, we reduce the computational effort by up to two orders of magnitude compared to approaches without active learning. We show that the method is capable of finding multiple relevant local minima in the low-energy region of Si16 without a large increase in computational effort compared to only finding the global energy minimum. Finally, we demonstrate the transferability of the neural network ensembles to larger systems by training the models only on the smaller structures of Si16 and Al4O6 and using them for structure search of Si46 and Al16O24, respectively. This further reduces the computational expense of DFT calculations throughout the active learning phase.
Results
Our approach is initialized with the generation of a pool of candidates, which are typically still far from their respective local minimum in the PES, and subsequently applies multiple rounds of active learning, which we will refer to as cycles, to find the global energy minimum. A schematic overview of the different stages of our method is shown in Fig. 1. Each cycle begins by sampling new training data based on a scoring function to specifically target the low-energy region of the PES. In the next step, we perform DFT computations to obtain energies, forces and stress for the sampled structures and update the training database accordingly. We then train an ensemble of neural network MLFFs on the updated training data and use the trained models for structure relaxation of all structures in the candidate pool until an uncertainty criterion is triggered for each structure. Finally, we define a convergence criterion based on low-energy clusters in the optimized candidate pool that are obtained by using the MLFFs. The cycle is repeated until the convergence criterion is fulfilled, and the low-energy structures are selected for further post-processing steps and validation with DFT. The following sections provide a detailed description of the different stages of our method.

Fig. 1: Initially, a pool of candidate structures is generated based on random or symmetric generation. Next, the active learning cycle is started by the selection of promising data points and by training the neural network ensembles. These are then used to accelerate structure relaxations for all structures in the candidate pool. At the end of the active learning cycle, the neural network models propose the most promising clusters of low-energy structures, and a stopping criterion is applied. In the case of convergence, the most promising structures are further validated through structure relaxation with DFT.
Initial structure generation
Structure generation can either be done naively by generating an empty cell and filling it with atoms at random positions, which we call random generation, or by sampling a space group and generating a structure that satisfies the sampled symmetry with the use of PyXtal69, which we call symmetric generation. For both strategies, we use the structure generator as implemented in CrySPY70, and we show example structures for each method, rendered with VESTA71, in Fig. 2. To keep the initial candidates within a reasonable range, minimum interatomic distances and ranges for cell volumes are applied (see section “Methods: Constraints for initial structure generation”).
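As an illustration, the snippet below sketches how a symmetric candidate could be generated directly with PyXtal; the space group and stoichiometry are illustrative examples, and the actual simulations use the generator implemented in CrySPY together with the distance and volume constraints described in the Methods.

```python
from pyxtal import pyxtal

def generate_symmetric_candidate(species, num_ions, space_group):
    """Sample one random structure that satisfies the given space group."""
    crystal = pyxtal()
    crystal.from_random(dim=3, group=space_group, species=species, numIons=num_ions)
    return crystal.to_ase()  # convert to an ASE Atoms object for later relaxation

# Example: a symmetric Si16 candidate in space group 227
candidate = generate_symmetric_candidate(["Si"], [16], space_group=227)
```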

Fig. 2: Structures on the left are generated with random generation and structures on the right are generated with symmetric generation.
We observe that candidates obtained by symmetric generation generally require fewer steps during structure optimization and are therefore well suited for classical crystal structure search algorithms. However, we also find that highly symmetric crystal structures provide challenging data points for training MLFFs, often leading to inaccurate predictions compared to models trained on structures without symmetries (see Supplementary Information A). In contrast, training MLFFs on structures that are created by random generation or by adding perturbations to symmetric structures66 significantly improves accuracy. This may be because structural symmetry often leads to repeated local geometries within a single cell, causing some local atomic neighborhoods to be rotated copies of others, thus reducing variance in the training data and over-representing some local environments compared to others. The longer optimization trajectories of randomly generated structures, on the other hand, make them poor candidates for classical crystal structure search algorithms. Furthermore, some symmetries might be hard to find based solely on fully random candidates.
As our method requires good candidates for discovering local minima through structure relaxation and for training neural network force fields, we generate candidates with both random and symmetric generation to bridge the gap. We use a split pool with randomly generated structures for training the MLFFs and symmetric structures that provide promising candidates to find local minima in the PES.
Neural network models
We use an SO(3)-equivariant neural network architecture as implemented in SchNetPack72 to learn the PES $E(Z_1, \ldots, Z_N, \mathbf{r}_1, \ldots, \mathbf{r}_N)$ based on nuclear charges $Z_i \in \mathbb{N}$ and positions $\mathbf{r}_i \in \mathbb{R}^3$ with the atom index $i$. The neural network can be divided into two main components. First, for every atom $i$ in the cell of $N$ atoms, a message passing architecture computes an $f$-dimensional, roto-translationally invariant feature representation $\mathbf{x}_{\theta,i} \in \mathbb{R}^f$ based on its local atomic environment. Subsequently, a multilayer perceptron reduces the dimensionality of the feature vectors $\mathbf{x}_{\theta,i}$ to a scalar $E_{\theta,i}$, the so-called local energy contribution, and these energy contributions of the local environments are aggregated to obtain the total energy of the cell, $E_\theta = \sum_{i=1}^{N} E_{\theta,i}$. Response properties such as forces $\mathbf{f}_\theta$ and stress $\mathbf{s}_\theta$ are calculated as derivatives of the energy prediction with respect to atomic positions and cell parameters, respectively.
For active learning, it is crucial to find promising new data points for labeling and to filter out predictions with low accuracy, which is usually done based on uncertainty estimates of model predictions. Because uncertainty estimates cannot be easily derived from a single neural network, we train an ensemble of independent models on the same dataset. Predictions of physical properties are obtained from the ensemble means $\overline{E}_\theta$, $\overline{\mathbf{f}}_\theta$ and $\overline{\mathbf{s}}_\theta$, and uncertainty estimates can be defined based on the standard deviations $\sigma_{E,\theta}$, $\sigma_{\mathbf{f},\theta}$ and $\sigma_{\mathbf{s},\theta}$. Because this ensemble-based uncertainty estimate does not constrain the choice of model architecture, any MLFF can be used.
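A minimal sketch of this ensemble aggregation is shown below; the `predict_energy` and `predict_forces` methods stand in for whatever interface the trained MLFFs expose and are assumptions for illustration.

```python
import numpy as np

def ensemble_predict(models, structure):
    """Mean and standard deviation of energy and force predictions
    over an ensemble of independently trained MLFFs."""
    energies = np.array([m.predict_energy(structure) for m in models])
    forces = np.stack([m.predict_forces(structure) for m in models])  # (M, N, 3)
    return (energies.mean(), energies.std(),          # E_mean, sigma_E
            forces.mean(axis=0), forces.std(axis=0))  # f_mean, sigma_f, each (N, 3)
```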
A detailed description of the model architecture and the training settings can be found in section “Methods: Neural network models.”
Clustering
In a large pool of candidates, many structures will, after a sufficiently large number of structure relaxation steps, converge to the same local minimum of the PES. To avoid costly redundant reference calculations and to identify local energy minima, clustering methods can find similar structures in the pool of candidates.
However, comparing the similarity between atomistic systems is not straightforward, and periodic boundary conditions (PBC) pose additional challenges beyond invariance with respect to roto-translation and permutation of atoms. For example, duplicating the cell in any direction does not change any physical properties of the system, and small positional changes of atoms near the edges of the cell can move an atom to the opposite side of the cell. To address this, several descriptors73,74,75,76,77 have been introduced that map atomistic systems with an arbitrary number of atom types $Z_i$ and positions $\mathbf{r}_i$ to a feature space with a fixed dimension. Because the trained neural network architectures make it easy to extract local roto-translationally invariant descriptors $\mathbf{x}_{\theta,i}$ for every local atomic environment, we utilize these descriptors for detecting clusters of similar structures. Still, the local descriptors are not invariant to the permutation of atoms, which is why we apply mean pooling over the atomic dimension $i$, resulting in a global descriptor $\mathbf{x}_\theta$. Although global descriptors are straightforward to implement, computationally efficient, and yield good results for the evaluated systems, averaging over features leads to some loss of information. In some cases, this can reduce the quality of clustering, and the use of other methods, e.g., the Hungarian algorithm78,79 for comparing local descriptors, becomes advantageous. A more comprehensive analysis is available in Supplementary Information G.
After computing the global descriptors for all candidates in the pool, the descriptors are standardized to have the same mean and standard deviation in every dimension. In principle, the standardized global descriptors could be used with any clustering algorithm. In our problem setting, however, the number of local minima is unknown, and many structures may be too far from their stable structure and should be regarded as outliers. To address these requirements, we use the density-based clustering algorithm DBSCAN80 as implemented in scikit-learn81, because it can treat un-relaxed structures as outliers and finds the required number of clusters automatically. However, DBSCAN requires specifying a density parameter. For all our experiments, we use a density parameter of 0.5 without further optimization, which is consistently effective across all evaluated systems.
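A minimal sketch of this clustering step with scikit-learn is given below; the descriptor array and the `min_samples` value are assumptions for illustration, while `eps=0.5` corresponds to the density parameter used in our experiments.

```python
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

def cluster_candidates(descriptors, eps=0.5, min_samples=5):
    """Cluster standardized global descriptors; label -1 marks outliers,
    i.e., structures that are still far from any local minimum."""
    X = StandardScaler().fit_transform(descriptors)  # zero mean, unit std per dim
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
```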
Collecting training data
For the first cycle, no knowledge about the candidate structures is available except for cell shape, atom types and positions. For this reason, the first batch of candidate structures is selected at random. For all subsequent cycles, the property predictions $\overline{E}_\theta$, $\overline{\mathbf{f}}_\theta$ and $\overline{\mathbf{s}}_\theta$ of the trained ensembles, as well as their uncertainty estimates $\sigma_{E,\theta}$, $\sigma_{\mathbf{f},\theta}$ and $\sigma_{\mathbf{s},\theta}$, can be used to improve the selection of promising candidates. Traditionally, many active learning approaches have been designed for scenarios where the pool of unlabeled data points is fixed, and uncertainty sampling82,83,84 has been shown to provide good results for extracting the most suitable candidates for labeling. However, our scenario involves the relaxation of the candidate pool with the trained neural networks until an uncertainty threshold is reached. Therefore, all candidates have high uncertainty, and we need a different selection criterion. To efficiently guide our algorithm toward finding low-energy minima, we select structures with a high potential for low relaxation energies. Although many selection strategies could be used to select the next batch of candidates, we use a fairly simple criterion based on the predicted energies $\overline{E}_\theta$ and forces $\overline{\mathbf{f}}_\theta$. Our criterion is inspired by the LAQA27 selection method and assigns high scores $L_{\theta,j}$ to structures with low energies $\overline{E}_\theta$ and high forces $\overline{\mathbf{f}}_\theta$ according to equation (2).
The weighting parameter $\omega_{\mathbf{f}}$ can be used to find a trade-off between the exploitation of low-energy structures and the exploration of structures with a high potential for further optimization due to their large forces. After computing the selection scores for all structures in the candidate pool, we rank all structures accordingly. Because similar structures are assigned similar selection scores, we detect clusters of similar structures as described in section “Results: Clustering” and only allow the structure with the highest score $L_{\theta,j}$ per cluster to be selected. The batch of candidates with the highest scores is then selected for evaluation with single-point DFT calculations.
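Since equation (2) is not reproduced here, the sketch below shows one plausible LAQA-inspired score consistent with the description: low predicted energies and large predicted forces both increase the score, with $\omega_{\mathbf{f}}$ controlling the trade-off. The exact functional form is defined by equation (2) in the text.

```python
import numpy as np

def selection_score(mean_energy, mean_forces, omega_f=1.0):
    """Hypothetical score L_j rewarding low energy and large residual forces.
    mean_forces: (N, 3) array of ensemble-mean force predictions."""
    mean_force_norm = np.linalg.norm(mean_forces, axis=-1).mean()  # eV/Angstrom
    return -mean_energy + omega_f * mean_force_norm
```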
As already mentioned in section “Results: Initial structure generation,” highly symmetric structures can reduce the prediction accuracy of the trained MLFFs. Over the course of multiple cycles, structure relaxation will also increase the symmetry of the training structures that have been created through random generation and therefore make them less valuable training data points. To prevent this, we break the symmetry of the selected structures by adding Gaussian noise $\epsilon$ to the positions $\mathbf{r}_i$ (equation (3)).
However, because large perturbations disturb the stability of the selected candidates, small interatomic forces $\mathbf{f}_{\mathrm{DFT}}$ become less likely, depending on the noise level. To allow for smaller forces in the dataset during subsequent cycles $c$ of the method, the noise level is scheduled by a damping factor $\omega_\epsilon$ according to equation (4).
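The following sketch illustrates the perturbation step; the exponential damping schedule $\sigma_c = \sigma_0 \, \omega_\epsilon^c$ is an assumption standing in for equation (4), and the base noise amplitude $\sigma_0$ is illustrative.

```python
import numpy as np

def perturb_positions(positions, cycle, sigma_0=0.1, omega_eps=0.8, seed=None):
    """Break symmetry by adding Gaussian noise to atomic positions,
    with a noise level that is damped over active learning cycles."""
    rng = np.random.default_rng(seed)
    sigma_c = sigma_0 * omega_eps**cycle  # assumed damping schedule (cf. eq. (4))
    return positions + rng.normal(0.0, sigma_c, size=positions.shape)
```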
Finally, the set of perturbed candidates is evaluated with DFT to compute the energies $E_{\mathrm{DFT}}$, the forces $\mathbf{f}_{\mathrm{DFT}}$, and the stress $\mathbf{s}_{\mathrm{DFT}}$. Details on the reference computations can be found in section “Methods: Reference computations.”
Structure relaxation
After retraining the MLFFs on the updated dataset, they are used for structure relaxation of the candidate pool. To reduce computational cost, we replace expensive DFT computations with fast MLFF predictions for energy, forces, and stress. We avoid relaxation steps in regions of the PES with large prediction errors by using an uncertainty criterion $U_\theta$ based on the mean force predictions $\overline{\mathbf{f}}_\theta$ and the standard deviation of the predictions $\sigma_{\mathbf{f},\theta}$ (see equation (5)).
During structure relaxation, initial structures are transformed according to the derivatives of the PES with respect to positions and cell parameters; thus, the forces and stress values decrease during the process. Structures close to their local minima require very accurate predictions, whereas less accurate predictions are often sufficient for structures far from the local minima. To account for this behavior, we use a relative uncertainty criterion that allows larger absolute errors for structures with large force predictions. Although the stress predictions $\overline{\mathbf{s}}_\theta$ also matter during structure relaxation with PBC, they are not considered in the uncertainty criterion, because the stress predictions directly depend on accurate force predictions. Furthermore, we observe that stress predictions converge mainly during the early steps of structure relaxation, while forces are often optimized later in the process. Therefore, a relative uncertainty criterion for stress predictions would often yield very large values, although the models still predict meaningful values for the forces.
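A sketch of such a relative criterion is shown below; the exact normalization is given by equation (5), and the per-atom averaging used here is an assumption for illustration.

```python
import numpy as np

def relative_force_uncertainty(mean_forces, std_forces, tiny=1e-8):
    """Assumed form of U_theta: ensemble force spread relative to the
    mean force magnitude, averaged over all atoms in the cell."""
    spread = np.linalg.norm(std_forces, axis=-1)      # per-atom sigma_f
    magnitude = np.linalg.norm(mean_forces, axis=-1)  # per-atom |f_mean|
    return float(np.mean(spread / (magnitude + tiny)))

# during relaxation: stop a trajectory once this value exceeds U_max
```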
We optimize every structure in the pool of candidates until either the optimization has converged, the maximum number of steps has been reached, or the uncertainty criterion exceeds the threshold $U_{\max}$. Afterward, the original structure in the candidate pool is replaced with its optimized version. This drives the pool of candidates closer to their local minima with limited computational effort, while large inaccuracies of the neural network predictions are prevented by the applied uncertainty criterion.
Stopping criterion
After a number of cycles, structure relaxation with the MLFFs has decreased the energies of structures in the candidate pool to such an extent that they are close to their respective local minima in the PES. In a sufficiently large candidate pool, many structures converge to the same local minima, so that they can be detected through clustering as described in section “Results: Clustering.”
To measure the convergence of our algorithm, we utilize this observation and compare low-energy clusters of the current candidate pool to the low-energy clusters found in the candidate pools of previous cycles. That is, we track the candidate pools of the previous $n_{\mathrm{cycle}}$ cycles and use the clustering algorithm with the most recent MLFF to obtain the respective clusters. We then define the cluster energies $\overline{E}_\theta^{l,c-m}$ as the lowest energy present in a cluster as predicted by the MLFF and sort the clusters accordingly, with $l$ being the sorted cluster index, $c$ the current cycle, and $m$ an index over previous cycles. For a fixed cluster index $l \in \{1, \ldots, k\}$ we apply a threshold $\epsilon$, and all cluster energies of the $m \in \{1, \ldots, n_{\mathrm{cycle}}\}$ previous cycles must obey the convergence criterion $\overline{E}_\theta^{l,c-m} - \overline{E}_\theta^{l,c} < \epsilon$. If no improvement greater than $\epsilon$ is obtained for the $k$ most stable clusters during the selected number of cycles, the method is terminated. However, if the convergence criterion is not met, the next cycle is started by selecting new structures for single-point computations. The number of stable clusters $k$, the number of previous cycles to track $n_{\mathrm{cycle}}$, and the maximum energy difference $\epsilon$ are hyperparameters of our stopping criterion. An illustrative figure for the stopping criterion can be found in Supplementary Information H.
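A compact sketch of this check is given below, assuming `cluster_energies[m][l]` holds the $l$-th lowest cluster energy predicted for cycle $c - m$ (with $m = 0$ being the current cycle); the threshold `eps` is illustrative.

```python
def converged(cluster_energies, k=3, n_cycle=1, eps=0.01):
    """Return True if none of the k most stable clusters improved by
    more than eps compared to any of the n_cycle previous cycles."""
    for m in range(1, n_cycle + 1):
        for l in range(k):
            improvement = cluster_energies[m][l] - cluster_energies[0][l]
            if improvement > eps:  # cluster l still improved noticeably
                return False
    return True
```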
Post-processing
Once the stopping criterion has been triggered, we validate the top $k$ stable structures. However, depending on the chemical composition, clustering can be inaccurate in some cases: a local minimum can be split into sub-clusters, remain undetected, or be merged with another local minimum into a single cluster, resulting in one minimum not being selected. For example, in the compositions Si and GaAs, the global energy minimum and the second lowest local minimum are structurally very similar and have a minor energy difference of only about 0.01 eV/atom, and we observe that their clusters can falsely be merged into one. But because the number of selected low-energy clusters is small, these mistakes can easily be detected. Merged clusters can be found, for example, by visually inspecting the clusters for dissimilar structures, by using structure matching tools with a threshold value, or by measuring energy differences within a cluster. Furthermore, by checking whether the structures with the lowest energy predictions in the candidate pool are part of the selected low-energy clusters, we can account for undetected clusters or single low-energy structures.
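As one example of the structure matching tools mentioned above, pymatgen's StructureMatcher can split a falsely merged cluster; the tolerances shown are pymatgen defaults, not values from our simulations.

```python
from pymatgen.analysis.structure_matcher import StructureMatcher

def split_merged_cluster(structures):
    """Group pymatgen Structure objects by similarity; more than one
    group indicates that distinct minima were merged into one cluster."""
    matcher = StructureMatcher(ltol=0.2, stol=0.3, angle_tol=5)
    return matcher.group_structures(structures)
```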
Finally, after selecting the low-energy structures for validation, we confirm the energies of the respective local minima by structure relaxation with DFT. Structure relaxation with DFT involves multiple single-point DFT calculations and therefore introduces significantly larger computational cost than the single-point labeling calculations described in section “Results: Collecting training data.”
Finding global energy minima
To evaluate the performance of our approach, we select a number of benchmark systems with well-known low-energy minima. We run simulations on Si16, Na8Cl8, Ga8As8 and Al4O6 and compare our findings with the baseline methods random search, Bayesian optimization, and LAQA as reported in ref. 27. The global energy minima that we target are the structures MP-149 for Si16, MP-2534 for Ga8As8, MP-22862 for Na8Cl8 and MP-1143 for Al4O6, which we obtain from the materials explorer of MaterialsProject55,68,85.
For every system, we generate a candidate pool containing 30 starting structures for each space group that is valid for the respective composition, as well as 1000 randomly generated structures. Only structures from the pool of initial candidates without any symmetry constraints are selected as data points for training. A detailed list of parameters for the DFT computations is shown in section “Methods: Reference computations,” and the hyperparameters for the neural network ensembles are shown in section “Methods: Neural network models.” For the stopping criterion, we target the $k = 3$ lowest energy clusters and track the last $n_{\mathrm{cycle}} = 1$ cycle. All experiments are performed until the stopping criterion terminates the simulation.
To account for the stochasticity of our method, we perform three simulations for every composition and report the total number of single-point DFT computations as well as the standard deviation, which is shown as error bars. Our analysis includes the number of computations used for labeling training data, as well as the number of steps for validating the predicted local minima. Furthermore, it also includes unconverged DFT calculations that may occur due to exceeding the timeout limit or reaching the maximum number of convergence steps. The results of the experiments are shown in Fig. 3.

Fig. 3: The bar plot shows a comparison of DFT single-point evaluations required for finding the global energy minimum of different chemical compositions for random search, Bayesian optimization (BO), look ahead based on quadratic approximation (LAQA), and our method. Error bars show the standard deviation over a set of 3 simulations with equal settings.
For all systems tested, our algorithm requires a similar number of 400–700 single-point DFT evaluations until convergence. This stands in strong contrast to AIRSS, BO and LAQA, where the number of DFT computations highly depends on the composition of the selected material and, therefore, on the number of possible local minima in the PES as well as the likelihood of sampling the global energy minimum. Please note that we run our method until the stopping criterion is triggered, while the other methods only run until the targeted global minimum structure is observed. Since this cannot be done for compositions where the global minimum is unknown, the number of reference calculations required by the baseline methods should be interpreted as an optimistic lower bound; it remains unclear how many additional steps would be required until their stopping criterion is reached. In comparison to the baselines, we reduce the number of expensive DFT evaluations by up to two orders of magnitude for the most challenging systems Al4O6 and Ga8As8. We provide a detailed runtime analysis in Supplementary Information B, where we show that the additional computational cost of our approach due to training and utilizing the neural network ensembles is negligible compared to the DFT calculations. Except for one of the three simulations of Si16, our method reliably finds the global minimum based on clustering the candidate pool and without further post-processing steps. In this one case, where the clustering did not provide the global energy minimum, the two lowest energy clusters had been merged due to their similarity and could easily be separated by hand.
Finding multiple low-energy structures
In many cases, not only the structure at the global energy minimum but also other stable low-energy structures are of interest. Because our method performs structure optimization on all structures in the candidate pool, other local minima can also be found without large additional computational effort. To investigate this, we continue one simulation of Si16 from the previous section but change the parameter $k$, which determines the number of monitored low-energy clusters, from 3 to 10. Once the stopping criterion is triggered, we validate the 10 most promising low-energy clusters proposed by our algorithm through structure relaxation with DFT. We find that for this particular simulation, the stricter convergence criterion does not have any effect, so the simulation does not perform any additional active learning cycles. This amounts to a total of 500 DFT calculations during the active learning cycles, 27 DFT steps to validate the 3 lowest energy clusters, and an additional 121 DFT calculations to validate the low-energy minima 4–10. Figure 4 compares the energies of structures proposed by our simulation before and after validation with the top 10 lowest energy structures for Si16 reported in MaterialsProject.

Fig. 4: a DFT energies of the lowest 10 local minimum structures of Si16. Purple dots represent the energies of structures as proposed by our method after convergence, and blue dots show the energies of those structures after validation through structure relaxation with DFT. The dashed lines mark energy levels of known Si16 structures reported in MaterialsProject, and red circles are drawn for structures where our method matches one of the target structures. b Renders of Si16 local minima found by our method that can be matched to structures in MaterialsProject (see red circles).
We observe that our simulation finds 10 local minima in the PES of Si16 with energies that are, on average, lower than the 10 lowest energy structures of MaterialsProject. The structure with the highest energy is still below the 7th lowest energy minimum of the target structures, and we are able to find 4 out of the 5 lowest energy structures of MaterialsProject. The other 6 structures found by our simulation are stable with regard to the DFT validation but are not reported in MaterialsProject. Figure 4b displays renders of our structures that correspond to those of MaterialsProject, and the full set of structures can be found in Supplementary Information C. Apart from one outlier, all structures proposed by our method are very close to the local minima based on validation through structure relaxation with DFT. To validate the structures, a median number of 10.5 steps is required per structure and the median energy difference between the structures before and after validation is 2.6 meV/atom. Additionally, to demonstrate how the structure search is improved with an increasing number of active learning cycles, we deactivate the stopping criterion and continue the simulation for 10 active learning cycles. The energies of the top 10 low-energy minima for all cycles are reported in Supplementary Information D.
Transferability to larger structures
The computational cost of DFT calculations increases substantially with growing system size. For complex systems with many atoms per periodic cell, even single-point evaluations for labeling training data can become computationally costly or infeasible. Because the MLFFs use local atomic environments to model the structures, they are by design also transferable to larger periodic cells. However, some local geometries of larger systems might not be captured by the training data and result in inaccurate predictions. To test the transferability of the MLFFs toward systems with more atoms per cell, we perform simulations on crystal structures of the two different chemical compositions Si and Al2O3.
For larger systems, the complexity of the PES, and therefore also the number of possible local minimum structures, grows significantly, such that the probability that a candidate structure yields the global minimum after relaxation decreases. To account for this, we increase the sizes of the initial candidate pools compared to the experiments in section “Results: Finding global energy minima,” based on a rough estimate of the increased difficulty of finding the global minimum. However, generating initial candidates is cheap, and during every cycle a fixed number of structures is chosen for labeling, ensuring that only a small portion undergoes costly DFT calculations. For simulations on Si, we generate 100 symmetric structures per space group, and for simulations on Al2O3, we generate 500 symmetric structures per space group. Accordingly, we also increase the number of randomly generated initial structures to guarantee a diverse subset of possible data points for training the MLFFs. Table 1 shows a detailed overview of the initial structures generated for these simulations, as well as the number of DFT evaluations per active learning cycle.
As the target of our simulation in the PES of Si, we use the lowest energy structure MP-971662 reported in MaterialsProject for Si46 structures, and we monitor the $k = 3$ lowest energy clusters for measuring convergence. For our simulations on Al2O3, we target MP-2254 as the Al16O24 structure with the lowest energy reported in MaterialsProject and monitor the $k = 10$ lowest energy clusters to determine convergence. To account for stochasticity, we again perform three independent simulations for the baseline and transferability settings of each chemical composition. We compare the number of DFT steps performed during the active learning cycles as well as during the validation of the global energy minimum to estimate the reduction in computational effort due to training on smaller structures. Based on the average number of steps, the average computation times per step, and the 16 CPU cores used during the DFT computations, we estimate the total computation time of all DFT computations in CPU hours. The number of DFT steps and the computation times for all simulations of Si and Al2O3 are shown in Fig. 5 (left).

Fig. 5: a Computational cost for global optimization simulations of Si and Al2O3 using transfer learning compared to a baseline without transfer learning. For Si, the transferability from the Si16 structures to the Si46 structures is tested, and for Al2O3, the transferability is tested from Al4O6 to Al16O24. The average number of DFT steps performed is shown by the blue bars, and the respective computation times are shown by the orange bars. The light-shaded bars represent the computational cost for validating the proposed global energy minimum, and the dark-shaded bars measure computational cost during the active learning cycles. Every simulation is performed three times, and the error bars show the standard deviation from the mean. b Comparison of energies for the top 10 low-energy clusters of Al16O24. All energies of the structures proposed by the MLFFs are reported after validation through DFT structure relaxation. Orange dots are structures suggested by a model trained only on Al4O6 structures (transfer) and blue dots are suggested by MLFFs trained on Al16O24 (baseline). Dashed lines show the energy levels of the lowest-energy Al2O3 structures provided in MaterialsProject, whereas black lines show the structures with more than 10 atoms per cell. For all models, the structures are sorted by their energy above the hull.
The simulations on Si show that in both the transferability and the baseline simulations we are able to reliably find the target structure MP-971662. In both cases, this is achieved by selecting and validating the Si46 structure with the lowest energy prediction in the candidate pool after the stopping criterion has been triggered. It is worth noting that MP-971661, the lowest energy structure of Si23, is also found within the three lowest energy clusters of all simulations. However, for measuring the computation times of the Si simulations, we only count the validation of the global energy minimum. The transferability simulations require approximately 4–5 cycles until convergence with about 500 single-point DFT computations to find the global energy minimum, which is very similar to the baseline, but the computational cost is significantly smaller in the transferability simulations. We estimate the computation time per DFT step by averaging over all DFT steps performed during the iterative cycles and during validation, respectively, and we clip the maximum computation time per DFT step at a timeout of 100 min because failed DFT calculations would otherwise distort the estimate of the mean. We further analyze the impact of timeout thresholds on the mean computation times in Supplementary Information E. Using 16 CPUs, we measure an average computation time of 36 min for labeling a Si16 structure and 65 min for labeling a Si46 structure during active learning cycles. During validation, the mean computation time per step is significantly lower at 28 min for Si46, because the structures are already close to their local minimum and therefore do not suffer from timeouts or non-converging computations. Multiplying the number of DFT steps by their average computation times yields a total computation time for all DFT steps of approximately 4322 CPU hours for the transferability simulation and approximately 8863 CPU hours for the baseline simulation, reducing the computational cost by 58%.
Transferability simulations on Al2O3 require approximately 1200 DFT calculations until the stopping criterion is triggered, which is slightly above the approximately 1000 steps performed in the baseline simulations. We measure an average computation time during the active learning cycles of 6 min per DFT calculation on Al4O6, 47 min per DFT calculation on Al16O24, and an average computation time of 19 min during validation, so that the overall computational cost of all DFT calculations is reduced by approximately 74% due to transferability.
However, one out of three transferability simulations is not able to find the lowest energy structure MP-2254, while all baseline simulations do. Figure 5 (right) shows the energy levels of the top 10 low-energy minima and the corresponding simulations that found each local minimum, as well as the target structures of MaterialsProject for Al2O3. For the sake of clarity, we show only local minima that have been discovered by at least two different simulations. The figure illustrates that all simulations find the structures MP-1143, MP-7048 and MP-776475, the second lowest energy minimum of Al16O24. The MP-2254 structure is found in all baseline simulations, but only two out of three transferability simulations successfully discover it. Furthermore, we also discover several local minima that are not included in MaterialsProject, but these minima are usually not discovered by all simulations. Illustrations of all low-energy clusters from Fig. 5 (right) can be found in Supplementary Information F.
Discussion
In this study, we introduced an iterative method for the efficient discovery of low-energy crystal structures based on active learning with ensembles of neural network MLFFs. Compared to the baseline methods, we avoid up to 95% of the demanding DFT calculations in the relaxation trajectories by predicting the required physical properties with the trained MLFFs. Besides speeding up structure relaxations, the MLFFs are also used to select promising structures for active learning, to extract representations for clustering the pool of candidates, and to provide a stopping criterion for our algorithm. Thus, they enable automated, fast, and accurate global optimization of crystal structures. For efficient computations on HPC systems, computationally demanding tasks can be easily parallelized. Since our uncertainty criterion is based on an MLFF ensemble, our method can be implemented with a plethora of state-of-the-art neural network architectures out of the box, providing the possibility of using pre-trained models or, eventually, foundation models that might arise in the near future. We show that training neural network models on symmetric crystal structures significantly reduces the prediction accuracy, ultimately leading to MLFFs that cannot be used for reliable structure relaxations. As remedies, we propose to create an additional training pool of randomly generated structures or to add small perturbations to selected structures before evaluation with DFT, which allows us to achieve the accuracy required for successful relaxation.
To evaluate the performance of our approach, we compared it with several approaches with full or partial structure relaxation trajectories on four different chemical compositions. Our simulations show that we reduce the computational cost for finding the global energy minimum by up to two orders of magnitude. Furthermore, because we simultaneously perform structure relaxation on the full candidate pool during every iteration, we demonstrate at the example of Si16 that our method can explore multiple low-energy structures while only adding minor computational effort. Finally, we show that transferability from smaller cells that are computationally less expensive to evaluate toward large, challenging cells is possible, and can further reduce the computational cost.
Because our method allows a wide range of model architectures to be used as MLFFs, the computational cost could be reduced further in future work by starting the training of the MLFFs from pre-trained models. Furthermore, extending our method with ML-based approaches for the initial candidate generation, such as diffusion models, which produce better initial candidates and therefore provide a stronger starting point for the method, presents another avenue to reduce the computational cost of DFT calculations for crystal structure search. Finally, the computational cost of screening a single composition could be significantly reduced by simultaneously searching for low-energy structures of multiple chemical compositions while sharing the MLFFs. Overall, we believe that our method will be broadly applicable, as it constitutes a flexible framework for the accelerated exploration of crystal structures based on machine learning.
Methods
Neural network models
For all experiments, we use an ensemble of 5 SO(3)-equivariant neural networks based on spherical harmonics as implemented in SchNetPack72,86. After labeling new data points, the models are randomly initialized and trained independently on the updated database, excluding structures with forces larger than 3 standard deviations above the mean. Due to the comparatively small amount of training data, 2 interaction layers with a feature dimension of 64 are sufficient to accurately model the PES. Furthermore, we allow for spherical harmonics features up to degree $l_{\max} = 2$ and use a cutoff of 7 Å. The models are trained using the Adam optimizer87 with a batch size of 10 and an initial learning rate of 0.0005. We monitor training progress on a 20% validation split and reduce the learning rate accordingly. To handle the different magnitudes of the predicted values, we scale the MSE loss (see equation (6)) with the weighting factors $\omega_E = 0.0001$, $\omega_{\mathbf{f}} = 0.9999$, and $\omega_{\mathbf{s}} = 300$ for energy, forces, and stress, respectively.
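A minimal sketch of this weighted loss in PyTorch is shown below; the dictionary keys are illustrative, and the per-term reductions may differ from the exact definition in equation (6).

```python
import torch

def weighted_mse_loss(pred, target, w_E=0.0001, w_f=0.9999, w_s=300.0):
    """Weighted sum of MSE terms for energy, forces, and stress."""
    loss_E = torch.mean((pred["energy"] - target["energy"]) ** 2)
    loss_f = torch.mean((pred["forces"] - target["forces"]) ** 2)
    loss_s = torch.mean((pred["stress"] - target["stress"]) ** 2)
    return w_E * loss_E + w_f * loss_f + w_s * loss_s
```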
Reference computations
All DFT computations are performed with the projector-augmented wave (PAW) method88,89, using the implementation of Quantum Espresso90,91. We use pseudopotentials from PSlibrary92, and the generalized gradient approximation (GGA) by Perdew, Burke, and Ernzerhof (PBE)93 is employed for the exchange-correlation functional. We use plane-wave cutoffs of 58 Ry, 66 Ry, 259 Ry and 323 Ry for Si, NaCl, GaAs and Al2O3, respectively. The k-point mesh is dynamically generated with a mesh interval of 0.2 Å$^{-1}$. Non-converging calculations are stopped after a timeout of 2 h during all baseline simulations and 5 h during the transferability simulations.
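The dynamic mesh generation can be sketched as below; whether the 2π factor is included in the reciprocal lattice when applying the 0.2 Å$^{-1}$ spacing is an assumption of this illustration.

```python
import numpy as np

def kpoint_mesh(cell, spacing=0.2):
    """Number of k-points per direction from a target mesh interval.
    `cell` is a 3x3 matrix of lattice vectors in Angstrom (rows)."""
    reciprocal = 2 * np.pi * np.linalg.inv(np.asarray(cell)).T  # rows: b1, b2, b3
    return [max(1, int(np.ceil(np.linalg.norm(b) / spacing))) for b in reciprocal]
```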
Structure relaxation
For all structure relaxations, we use the ASE94 implementation of the L-BFGS95 optimizer with a force convergence threshold of 0.05 eV/Å. Furthermore, we simultaneously update the cell shape and the atomic positions during each step. Structure relaxations based on property predictions of MLFFs use, in addition to the convergence threshold, an uncertainty stopping threshold of $U_{\max} = 0.5$ according to equation (5). To avoid overly long relaxation trajectories due to inaccuracies of the MLFFs, we stop structure relaxation if the energy has not decreased for more than 20 subsequent steps or if a maximum number of 300 steps is exceeded.
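A sketch of a single relaxation with ASE is shown below; the `calculator` argument stands in for a hypothetical wrapper around the MLFF ensemble, and the uncertainty- and stagnation-based early stopping described above is omitted for brevity.

```python
from ase.constraints import UnitCellFilter
from ase.optimize import LBFGS

def relax(atoms, calculator, fmax=0.05, max_steps=300):
    """Relax atomic positions and cell simultaneously with L-BFGS."""
    atoms.calc = calculator  # e.g., a hypothetical ensemble MLFF calculator
    optimizer = LBFGS(UnitCellFilter(atoms), logfile=None)
    optimizer.run(fmax=fmax, steps=max_steps)
    return atoms
```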
Constraints for initial structure generation
To ensure that the initial candidate structures remain within an acceptable range and to minimize the number of DFT calculations that do not converge, we establish constraints during the generation of structures. If a structure violates any constraint, it is not added to the pool of candidates and a new structure is generated. We specify $d_{ab}$ as the shortest permissible distance between two atoms of types $a$ and $b$, and the lengths of the lattice vectors are restricted to lie between $l_{\min}$ and $l_{\max}$. We report the constraints that have been used during our simulations in Table 2. We observe that during the baseline simulations in section “Results: Transferability to larger structures,” none of the DFT calculations for the initial cycle of Al16O24 structures converges with the default constraints of Al4O6 from section “Results: Finding global energy minima,” which is why we update the constraints for Al2O3 for the transferability simulations.