Constructing multicomponent cluster expansions with machine-learning and chemical embedding

Constructing multicomponent cluster expansions with machine-learning and chemical embedding

Introduction

The cluster expansion (CE)1,2 is a versatile tool to model atomistic interactions across several material classes. On-lattice CE models are routinely used to compute order-disorder3,4,5,6, vibrational7,8,9,10 and magnetic11,12,13,14 thermodynamics of multicomponent materials. Though CE models primarily serve as surrogates for formation energies of atomic configurations, the method has been extended to compute tensorial properties of materials15, energies of defects16,17, and to parameterize effective Hamiltonians that couple several microscopic degrees of freedom18.

Cluster expansion models are generally trained on zero Kelvin data, such as the formation energies of a set of orderings computed with density functional theory (DFT). CE models are then used together with statistical mechanics techniques to compute finite temperature properties of materials. Within the CE formalism, the formation energy (or other material property), is expanded as a linear series of cluster basis functions multiplied with expansion coefficients. Researchers have made significant strides towards simplifying the parametrization of CE models from a limited pool of first-principles calculations. For instance, predictive CE models can be obtained by regressing against cluster functions chosen from genetic algorithms19. Conventional data science techniques such as weighting20 and cross-validation21 are reported to improve the predictive power of these lattice models. More recently, regularization and cross-validation have been employed to obtain sparse cluster expansions22,23,24,25,26,27,28. Efforts have also been made towards choosing training datasets that minimize the number of expensive electronic structure calculations and in characterizing errors in finite temperature properties with Bayesian techniques29,30,31,32,33.

Effective Hamiltonians based on the CE have provided critical insights into the behavior of structural3,34,35,36, catalytic37,38, electrochemical39,40,41,42, thermoelectric6 and semiconductor8,43 materials. However, the use of CE models is usually limited to alloys containing ≈3-4 chemical species. Although there are no theoretical limitations to applying CE models to multicomponent materials, several practical difficulties arise when parameterizing and deploying multi-element CE. For instance, the number of fitting coefficients rises polynomially with the number of chemical species. As a result, parameterizing CE models for chemically-complex alloys requires large training datasets and is often computationally expensive to coarse-grain with statistical mechanics techniques. CE models have found limited applicability in predicting the finite-temperature properties of multi-principal element alloys. Alloys containing mixtures of several elements, also referred to as high-entropy alloys, are attractive candidates for high-performance structural materials44,45,46,47, energy storage48 and catalytic applications49. Accurate atomistic models are crucial to enabling the design of the next generation of high-performance multicomponent materials.

Inspired by recent efforts in chemical dimensionality reduction, we describe a formalism to build on-lattice cluster expansion models for alloys containing several elements or site degrees of freedom. The embedded cluster expansion (eCE), employs machine-learning techniques to simultaneously learn an embedding of site basis functions within a lower dimensional space, and the weights of an energy model. Symmetrized cluster functions constructed from the transformed site basis functions are used to compute the energy of a configuration. We apply the eCE formalism to build a formation energy cluster expansion for 6-component mixtures of elements in groups 5 and 6. Our results show that eCE models accurately predict formation energies in the complex alloy. Chemical trends in the alloying elements, such as the similarities between elements of the same group are naturally learnt by the model based on a small pool of electronic structure calculations. Site basis functions embedded in a three-dimensional space with the eCE model are sufficient to reproduce the energetics of the senary alloy to within 4 meV/atom. As the eCE models learn chemical similarities based on the electronic structure calculations, they are also able to extrapolate into chemical spaces that have not been sampled. As a proof of concept, we compare finite-temperature predictions of short-range order in a binary Cr-W alloy to a conventional CE and employ our new eCE framework to predict SRO in the equiatomic senary alloy.

Results

We begin by reviewing key aspects of the CE formalism, following which we describe a toy model to illustrate the embedded cluster expansion (eCE) before presenting the general eCE method. The eCE model is then applied to model the thermodynamics of a 6-component V-Nb-Ta-Cr-Mo-W refractory alloy.

On-lattice cluster expansions

Consider a crystal with N sites where each site, i, can be occupied by c chemical species (denoted ϵ1, ϵ2, , ϵc). In general, there are cN distinct arrangements of the c elements over the N sites. Any chemical decoration may be represented as a vector of occupation variables, (vec{sigma }=[{sigma }_{1},{sigma }_{2},cdots ,,{sigma }_{N}]). The occupation variable, σi, takes a unique value for each element ϵl. For instance σi = l if ϵl occupies site i. The site basis functions at site i can be represented as:

$$vec{varphi }({sigma }_{i})={[{varphi }_{1}({sigma }_{i}),{varphi }_{2}({sigma }_{i}),cdots ,{varphi }_{c}({sigma }_{i})]}^{T}$$
(1)

where (vec{varphi }({sigma }_{i})) is a vector containing c site basis functions. To ensure completeness of the cluster expansion, the site functions, {φ1, φ2, , φc}, must be linearly independent. Common choices for site basis functions include Chebychev polynomials1, occupation or indicator functions2, and trigonometric or sinusoidal functions50. Cluster functions are then constructed by taking products of site basis functions across all N sites of the crystal:

$${Phi }_{vec{alpha }}(vec{sigma })=mathop{prod}limits_{(i,nu )in vec{alpha }}{varphi }_{nu }({sigma }_{i})$$
(2)

(vec{alpha }) is a list of size N with each entry being a tuple that contains the site index i and the site function index ν. Every cluster function is a product of N site basis functions. It is usually convenient to choose site basis functions such that one of the functions is constant, i.e. ϕ1(σi) = 1, σi. This allows for the construction of a hierarchical cluster expansion. For example, if all site basis functions are chosen to be 1 except that of the first site, the corresponding cluster function is given by Φ1 = ϕ2(σ1) × 1 × 1 × × 1 = ϕ2(σ1). Φ1 is the cluster function associated with a point cluster located at site 1. Similarly, choosing all site functions to be 1 except for two sites will result in a cluster function that represents a pair cluster.

Any scalar property, such as the formation energy of a configuration, is given by1,2:

$$begin{array}{ll}E(vec{sigma }),=,mathop{sum}limits_{vec{alpha }in Lambda }{tilde{J}}_{vec{alpha }}{Phi }_{vec{alpha }}(vec{sigma })\qquadquad ,=,{tilde{J}}_{0}+mathop{sum}limits_{vec{alpha }in {Lambda }_{point}}{tilde{J}}_{vec{alpha }}{Phi }_{vec{alpha }}(vec{sigma })\qquadquad ,+mathop{sum}limits_{vec{beta }in {Lambda }_{pair}}{tilde{J}}_{vec{beta }}{Phi }_{vec{beta }}(vec{sigma })+cdots end{array}$$
(3)

where (E(vec{sigma })) is the formation energy of configuration (vec{sigma }), ({tilde{J}}_{vec{alpha }}), are effective cluster interations (ECI) for cluster (vec{alpha }), (Lambda ={{vec{alpha }}_{1},{vec{alpha }}_{2},cdots ,}) is the set of all clusters in the crystal, Λpoint is the set of point clusters, and Λpair is the set of pair clusters. Choosing the first site basis functions to be constant partitions the total energy of a crystal into contributions arising from points, pairs, triplets etc.

Symmetry reduces the number of independent expansion coefficients in eq. (3). If cluster (vec{alpha }) can be transformed to another cluster (vec{lambda }) by a symmetry operation of the undecorated crystal, the interaction coefficients for both functions must be equal, i.e. ({tilde{J}}_{vec{alpha} }={tilde{J}}_{vec{lambda} }). All symmetrically equivalent cluster functions can be collected together in an orbit, denoted as ({Omega }_{vec{alpha }}={vec{alpha },vec{lambda },cdots ,}). (vec{alpha }) refers to a prototype cluster function that represents the entire orbit. The symmetrized cluster expansion contains energy contributions from each orbit:

$$E(vec{sigma })=mathop{sum}limits_{{Omega }_{vec{alpha }}in Lambda }{tilde{J}}_{{Omega }_{vec{alpha }}}mathop{sum}limits_{vec{beta }in {Omega }_{alpha }}{Phi }_{vec{beta }}(vec{sigma })$$
(4)

where (Lambda ={{Omega }_{vec{alpha }},{Omega }_{vec{gamma }},cdots ,}) is the set of all orbits. The expansion of eq. (4) can be further partitioned into energy contributions arising from each site in the crystal51:

$$E(vec{sigma })=mathop{sum}limits_{i=1}^{N}{E}_{i}(vec{sigma })=mathop{sum}limits_{i=1}^{N}mathop{sum}limits_{{Omega }_{vec{alpha }}^{i}in {Lambda }^{i}}frac{{tilde{J}}_{{Omega }_{vec{alpha }}}}{| vec{alpha }| }mathop{sum}limits_{vec{delta }in {Omega }_{vec{alpha }}^{i}}{Phi }_{vec{delta }}(vec{sigma })$$
(5)

Ei is the energy contributed by site i to the total energy of the crystal, Λi is the set of all clusters radiating from site i, ({Omega }_{vec{alpha }}^{i}) is the orbit of cluster function (vec{alpha }) centered around site i and (| vec{alpha }|) is the number of sites in the cluster. The additional factor (frac{{tilde{J}}_{{Omega }_{vec{alpha }}}}{| vec{alpha }| }) arises due to overcounting of each cluster in the site-centric cluster expansion. For simplicity of notation we will define ({J}_{{Omega }_{vec{alpha }}}=frac{{tilde{J}}_{{Omega }_{vec{alpha }}}}{| vec{alpha }| }). The symmetrized cluster functions ({Theta }_{{Omega }_{vec{alpha }}^{i}}=mathop{sum}nolimits_{vec{delta }in {Omega }_{vec{alpha }}^{i}}{Phi }_{vec{delta }}(vec{sigma })) are site-centric descriptors that can distinguish between all symmetrically distinct neighborhoods around site i. Figure 1 schematically shows the symmetrically equivalent sets of nearest neighbor and next-nearest neighbor pair clusters on a triangular lattice. The symmetrized site-centric descriptors can serve as inputs to any regression model that parameterizes the site energy Ei51:

$$E(vec{sigma })=mathop{sum}limits_{i=1}^{N}{E}_{i}({{Theta }_{{Omega }_{vec{alpha }}},{Theta }_{{Omega }_{vec{beta }}},cdots ,})$$
(6)

Although there are an infinite number of site-centric descriptors, in practice, the number of cluster functions must be truncated. Linear CE models typically enumerate cluster functions up to a maximal cluster size and cluster radius before fitting the CE model with techniques such as compressed sensing, genetic algorithms etc. Non-linear CE models may be advantageous as they have been found to converge at significantly smaller cluster sizes than linear models51.

Fig. 1: Schematic illustration of a triangular lattice with two symmetrically distinct pair clusters.
Constructing multicomponent cluster expansions with machine-learning and chemical embedding

The central site (shown in red) has six symmetrically equivalent nearest neighbor pair clusters marked in purple. Next-nearest neighbor pair clusters are marked in green. All pair clusters shown in a single color belong to the same orbit.

Full size image

Though exact and complete, the cluster expansion formalism is not easily amenable to capture atomistic interactions in multicomponent alloys. The number of cluster functions in a cluster containing k sites with c chemical species scales polynomially as (c−1)k. As shown in Fig. 2, in alloys with over 5 chemical species, the number of features can exceed a thousand even for relatively small cluster sizes. The rapid growth in the number of cluster functions with chemical complexity is thus a major impediment to parameterizing accurate multicomponent cluster expansions.

Fig. 2: Polynomial increase in the number of features with the number of alloying elements.
figure 2

Variation in the number of symmetrically distinct cluster functions with the number of unique clusters in multicomponent alloys containing between 2 and 10 alloying elements. Clusters are enumerated on a bcc crystal structure with a maximum size of 10Å for the pair clusters and 7Å for the triplet clusters.

Full size image

Encoding chemical similarity through a linear transformation of site basis functions

Multicomponent alloys often exhibit additional degeneracies that arise due to chemical similarities between elements. This is typically manifested as relationships between interaction coefficients of the multicomponent CE. For instance, consider a ternary alloy with three chemical elements A, B and C that can occupy each site of the triangular lattice (Fig. 1). If the elements B and C are chemically similar, we would expect that the ECI of cluster functions that involve either the B or C element are related. This is readily seen in a CE that employs the occupation basis. Occupation site basis functions adopt the following values:

$${{boldsymbol{varphi }}}=left[begin{array}{ccc}{varphi }_{1}(A)&{varphi }_{1}(B)&{varphi }_{1}(C)\ {varphi }_{2}(A)&{varphi }_{2}(B)&{varphi }_{2}(C)\ {varphi }_{3}(A)&{varphi }_{3}(B)&{varphi }_{3}(C)end{array}right]=left[begin{array}{ccc}1&1&1\ 0&1&0\ 0&0&1end{array}right]$$
(7)

The matrix φ contains the value of the three site basis functions φ1, φ2, φ3. The columns of the matrix correspond to the values of the basis functions if either A, B or C occupies the site. Assuming that nearest neighbor pair interactions are sufficient to describe the ordering energies in the ternary alloy, the formation energy of a configuration is given by:

$$begin{array}{ll}E(vec{sigma }),=,N{J}_{0}+{J}_{B}mathop{sum}limits_{i}{varphi }_{2}({sigma }_{i})+{J}_{C}mathop{sum}limits_{i}{varphi }_{3}({sigma }_{i})\ qquadqquad,+,{J}_{BB}^{NN}mathop{sum}limits_{i,jin NN}{varphi }_{2}({sigma }_{i}){varphi }_{2}({sigma }_{j})\ qquadqquad,+,{J}_{CC}^{NN}mathop{sum}limits_{i,jin NN}{varphi }_{3}({sigma }_{i}){varphi }_{3}({sigma }_{j})\ qquadqquad,+,{J}_{BC}^{NN}mathop{sum}limits_{i,jin NN}({varphi }_{2}({sigma }_{i}){varphi }_{3}({sigma }_{j})+{varphi }_{2}({sigma }_{j}){varphi }_{3}({sigma }_{i}))end{array}$$
(8)

where J0 is the energy of the empty cluster, JB, JC are the point energies of the B and C chemical elements, ({J}_{BB}^{NN},{J}_{CC}^{NN},{J}_{BC}^{NN}) are the pair energies of a B-B, C-C and B-C pair respectively. The chemical similarity of B and C should manifest in the energy expansion as equalities between the interaction coefficients. Specifically, JB = JC and ({J}_{BB}^{NN}={J}_{CC}^{NN}={J}_{BC}^{NN}). In practice, the relationships between ECI are learnt based on a training dataset of electronic structure calculations, but are never exploited.

The degeneracies between ECI for this system can be used to learn a simpler CE. Consider the following linear transformation of the site basis functions of eq. (7):

$$begin{array}{lll}qquadqquad{{mathcal{T}}}vec{varphi }({sigma }_{i})qquadquadqquadquad=,vec{tilde{varphi }}({sigma }_{i})\ left[begin{array}{l}1quad0quad0\ 0quad1quad1end{array}right]left[begin{array}{l}{varphi }_{1}({sigma }_{i})=1\ quad{varphi }_{2}({sigma }_{i})\quad {varphi }_{3}({sigma }_{i})end{array}right],=,left[begin{array}{l}qquadquad1\ {varphi }_{2}({sigma }_{i})+{varphi }_{3}({sigma }_{i})end{array}right]end{array}$$
(9)

where the original site basis functions, (vec{varphi }({sigma }_{i})), have been transformed by the matrix ({{mathcal{T}}}), into a set of new site basis functions (vec{tilde{varphi }}({sigma }_{i})). The transformed site basis functions, (vec{tilde{varphi }}({sigma }_{i})) span a two-dimensional sub-space within the three-dimensional space of the original site functions. The projected space is embedded as shown in Fig. 3. The new site functions are not mathematically complete and evaluate to identical values for both the B and C specie:

$$tilde{{{boldsymbol{varphi }}}}=left[begin{array}{ccc}{tilde{varphi }}_{1}(A)&{tilde{varphi }}_{1}(B)&{tilde{varphi }}_{1}(C)\ {tilde{varphi }}_{2}(A)&{tilde{varphi }}_{2}(B)&{tilde{varphi }}_{2}(C)end{array}right]=left[begin{array}{ccc}1&1&1\ 0&1&1end{array}right]$$
(10)

In the site basis function space of (vec{tilde{varphi }}), it is impossible to distinguish between the B and C specie. However, a CE constructed through tensor products of the transformed site basis functions, (vec{tilde{phi }}) can reproduce the ordering energies of the three chemical species on a triangular lattice:

$$begin{array}{ll}E(vec{sigma }),=,N{J}_{0}+{J}_{B}mathop{sum}limits_{i=1}^{N}{tilde{varphi }}_{2}({sigma }_{i})\ qquadqquad,,+,{J}_{BB}^{NN}mathop{sum}limits_{i,jin NN}^{N}{tilde{varphi }}_{2}({sigma }_{i}){tilde{varphi }}_{2}({sigma }_{j})end{array}$$
(11)

As the transformed site basis functions evaluate to identical values for both the B and C specie, the CE of Eq. (11) will compute the exact same energy for orderings where B atoms are replaced with C or vice-versa. Embedding the “chemical similarity” of elements B and C into the site basis functions allowed us to reduce the number of cluster basis functions from 6 to 3. Though chemical rules are often not known a-priori, they can be simultaneously learnt together with the interaction coefficients based on a small pool of electronic structure calculations.

Fig. 3: Effect of the embedding matrix on site basis functions.
figure 3

The depicted embedding matrix corresponds to the transformation in eq. (9). The values of the site basis functions evaluated for A, B and C are shown as solid circles. The original 3-dimensional space is transformed into a two-dimensional sub-space (shown in red). The values of the site basis functions, evaluated in the transformed basis for the B and C chemical specie are the same. Site basis function values for the A specie remain unchanged.

Full size image

Embedded cluster expansions (eCE)

In general, any set of linearly independent and complete site basis functions can be embedded in a lower dimensional space through a linear transformation:

$$vec{tilde{varphi }}({sigma }_{i})={{mathcal{T}}}vec{varphi }({sigma }_{i})$$
(12)

(vec{varphi }({sigma }_{i})) is a vector of size c × 1, the transformed site basis (vec{tilde{varphi }}({sigma }_{i})) is a vector of size k × 1 and the transformation ({{mathcal{T}}}) is a matrix of size k × c, where k ≤ c and the rank of the transformation matrix is k. The elements of ({{mathcal{T}}}) linearly mix the site functions of (vec{varphi }) to obtain a lower-dimensional site basis. Maintaining the hierarchy of the cluster expansion requires that the constant site basis function remains in the transformed site basis. This can be enforced by fixing the first row and column of ({{mathcal{T}}}) to [1, 0, 0 ]. The remaining entries of the transformation are learnable parameters of the model.

Cluster functions at each site are computed from the transformed site basis functions similar to eq. (2) and symmetrized as detailed in eq. (5):

$${tilde{Theta }}_{{Omega }_{alpha }^{i}}=mathop{sum}limits_{vec{delta }in {Omega }_{alpha }^{i}}mathop{prod}limits_{(j,nu )in vec{delta }}{tilde{varphi }}_{nu }({sigma }_{j})$$
(13)

Site-centric energies can then be expanded in terms of the symmetrized cluster functions:

$$E(vec{sigma })=mathop{sum}limits_{i=1}^{N}{E}_{i}({{tilde{Theta }}_{{Omega }_{alpha }^{i}},cdots ,})$$
(14)

The symmetrized cluster functions are invariant under all symmetry operations of the disordered phase51 and can be used as inputs to any regression method to parameterize Ei.

We refer to the CE formalism that embeds the site basis functions in a lower dimensional space as embedded cluster expansions or eCE. Throughout this study, we will indicate the number of effective chemical elements, i.e. the number of rows in ({{mathcal{T}}}) with a number. For instance, 2-eCE refers to a model with the site functions embedded in a two-dimensional space. Though the model is mathematically incomplete, exploiting chemical similarities between the elements significantly reduces the complexity of the energy expansion and results in fewer site-centric descriptors. For instance, 2-eCE models have an identical number of descriptors as a binary cluster expansion. A 2-eCE model of a 6-component alloy with all clusters shown in Fig. 2 will require ≈101 features, while the exact CE will contain ≈103 descriptors.

The elements of the transformation matrix and regression coefficients for the energy model can be simultaneously learnt by minimizing a loss function through gradient descent techniques:

$${{mathcal{L}}}= mathop{argmin}limits_{{{boldsymbol{w}}},{{mathcal{T}}}}mathop{sum}limits_{vec{sigma }}{left({E}^{DFT}(vec{sigma })-mathop{sum}limits_{i = 1}^{N}{E}_{i}^{eCE}(vec{sigma },{{boldsymbol{w}}},{{mathcal{T}}})right)}^{2}+{{{mathcal{L}}}}_{reg}$$
(15)

where the first term is a least-squares error between the computed formation energies and the values predicted by the model, ({{{mathcal{L}}}}_{reg}) is a regularization term to prevent overfitting, w are the energy weights and ({{mathcal{T}}}) is the transformation matrix.

We demonstrate the advantages of the eCE formalism in a senary V-Nb-Ta-Cr-Mo-W alloy. This senary alloy is of current interest for high-temperature applications44,45,46,47. Mixtures of elements in groups 5 and 6 of the periodic table form disordered solid solutions on the body-centered cubic crystal structure, or ordered Laves phases52. We focus on the configurational thermodynamics of orderings on the bcc crystal structure for the rest of this study.

Hyperparameter optimization

Parameterizing embedded cluster expansions (eCE) with Eqs. (12)–(14) requires several hyperparameters to be carefully tuned. Figure 4 shows the variation in validation error with the number of effective chemical species in the eCE (i.e. the embedding dimension), the number of symmetrically distinct clusters, and the number of datapoints in the training dataset. The CE models of Fig. 4a are trained to reproduce formation energies of 2936 randomly sampled datapoints. Separate models were parameterized over 10 random instantiations of the training dataset. The site energies of eCE models are computed with neural networks. Additional information about the neural network parameters, and regularization are provided in the methods section.

Fig. 4: Learning curves for eCE and conventional CE.
figure 4

Average test errors (solid markers) and the standard deviations (error bars) are computed over 10 separate model parameterizations. a Variation in the test error with the number of clusters for training and test datasets containing 2936 and 1147 configurations, respectively. b Variation in the test error with the number of training datapoints for eCE models with 14 clusters.

Full size image

The mean validation error in Fig. 4a is computed over 1147 configurations that are not included in the training set. A cluster expansion containing two effective chemical species, 2-eCE, results in high validation errors of ≈15meV/atom. As 2-eCE embeds the 6 linearly independent site basis functions in a two-dimensional sub-space, this model is similar to a “binary” cluster expansion. Unlike a conventional binary cluster expansion where one of the site functions can take two possible values, the site function in 2-eCE can take six distinct values. The saturation in validation error with increasing number of clusters suggests the model with two site basis functions lacks the chemical flexibility to reproduce the ordering energies of this senary alloy. Increasing the dimensionality of the projected site basis functions to three lowers the validation error to ≈3 meV/atom. In fact, the “ternary” 3-eCE model is essentially as accurate as cluster expansions with six linearly independent site basis functions (purple line in Fig. 4a). The relatively small spread in validation errors for eCE models suggests that the models are not very sensitive to the exact configurations included in the training dataset.

Figure 4 a also compares a linear CE model (parameterized with ridge regression) to non-linear models that use neural networks. Similar to a previous study51, the linear model requires more cluster basis functions to reach accuracies comparable to non-linear models. Parameterizing a linear cluster expansion model with triplet cluster sizes larger than 4Å was prohibitively expensive due to the large memory requirements needed for ridge regression. Cluster functions built with pair clusters up to a size of 10Å and triplet clusters with a size of 4Å are found to be highly accurate for non-linear models. Linear eCE models are found to have significantly higher prediction errors than non-linear eCE models that employ neural networks to compute site energies.

Having identified the optimal number of clusters, we next study the variation in validation error with the size of the training dataset. Figure 4b shows learning curves of eCE models with projection dimensions ranging from 2-6. 2-eCE models trained with up to 3000 datapoints have validation errors of over 15 meV/atom. Interestingly, ≈500−1000 randomly chosen training datapoints are sufficient to reproduce the complex chemical interactions in the senary alloy for eCE models that contain 3 or more embedding dimensions. The validation errors for a linear senary CE are higher than the non-linear models.

Often, despite low validation errors, atomistic models may fail to reproduce phase stability at low temperatures. Figure 5 depicts the formation energies of orderings in three binary alloys. DFT calculations (gray circles in Fig. 5) predict a single binary ground state in Cr-Ta, no stable ground states in the Nb-V alloy, and several stable ground states in the Mo-Ta alloy system. A 3-eCE model trained with 2936 datapoints and 14 symmetrically distinct clusters reproduces all the salient features of the zero Kelvin energies in binary alloys (green crosses in Fig. 5). The stable states in Cr-Ta and Nb-V are exactly reproduced by the 3-eCE model, while most ground states are captured by the 3-eCE in Mo-Ta. Minor discrepancies between eCE and the DFT calculations may be due to fitting errors or small numerical errors in the electronic structure calculations. Nevertheless, there is excellent agreement between the 3-eCE model and DFT, indicating that eCE models are able to capture thermodynamic ground states in addition to the overall ordering energetics of multicomponent alloys.

Fig. 5: Formation energies in the Cr-Ta, Mo-Ta and Nb-V binary alloys.
figure 5

Gray circles are formation energies computed with DFT and green crosses with 3-eCE. Configurations on the convex hull are shown with larger markers. Convex hulls for each system are shown with either a green line (3-eCE) or a gray line (DFT).

Full size image

Extrapolating in chemical space

Training datasets for conventional CE models usually contain orderings that span the entire composition space of an alloy. As a result, multicomponent alloys with 3 or more chemical species can require orders of magnitude more data than simpler binary alloys. eCE models are able to leverage chemical similarities between alloying elements to extrapolate into unsampled regions of composition space. Figure 6 compares two models that are trained on all configurations except those containing both molybdenum and tantalum. The linear CE and 3-eCE models should be unable to learn the interactions between Mo and Ta as configurations containing both elements are not included in the training dataset. Surprisingly, the 3-eCE model in Fig. 6 reproduces the ordering energies of binary Mo-Ta configurations. 3-eCE is able to reproduce several ground states and the overall shape of the convex hull. In contrast, the conventional linear CE of Fig. 6 fails to reproduce the energies of binary orderings. The failure of the conventional CE is not surprising as the model has to extrapolate into the binary Mo-Ta space. On the other hand, the 3-eCE model learns from chemical similarities in the dataset to effectively extrapolate into unseen composition spaces.

Fig. 6: Chemical extrapolation.
figure 6

Formation energies of orderings in the Mo-Ta binary alloy as computed with DFT (gray circles), a 3-eCE model (green triangles) and a linear CE (gray stars). The 3-eCE and CE models are trained on a dataset that does not contain any configurations with both Mo and Ta. Convex hulls are shown as dark gray (DFT), green (3-eCE) and light gray (linear CE) lines.

Full size image

Figure 7 benchmarks the extrapolation ability of eCE models across all possible pairs of left-out elements. Similar to Fig. 6, 15 separate training datasets were constructed by including all configurations from the senary dataset, except those containing a specific pair of elements. Each training dataset was then used to parameterize five eCE models with chemical embedding dimensions ranging from 2 to 6. A linear senary CE was also parameterized for each dataset. The linear CE serves as a baseline against which we compare the predictions of eCE. The validation errors computed over all left-out data and the left-out binary configurations are shown in Fig. 7. The benchmarks of Fig. 7 are similar to the leave one out cross validation (LOOCV) metric. Rather than individual datapoints being left out of the dataset, entire alloys are used to cross-validate models.

Fig. 7: Extrapolating the formation energies over left-out element pairs.
figure 7

Average RMSE computed over all left-out element pairs for eCE models with embedding dimensionality from 2-6 and a linear CE model. Training sets are composed of all orderings except a specific pair of elements. Blue bars correspond to the errors computed over all left-out data. Green bars are the errors computed over only binary orderings of the left-out element pair. The spread for each bar is the minimum and maximum errors computed over all pairs of left-out elements.

Full size image

Figure 7 clearly demonstrates that eCE models are able to extrapolate into unseen composition spaces with significantly higher accuracy than conventional CE models. Average extrapolation errors range from ≈30 meV/atom for a linear CE to ≈8 meV/atom for 3-eCE models. eCE models with 3 or 4 embedding dimensions are found to have the smallest extrapolation error, while all other eCE models have significantly larger errors. Although the average energy errors of 3-eCE models are small, we do find some degree of sensitivity to the exact pair of elements left out of the training dataset. Figure 7 shows the range of energy errors across all 15 pairs of elements that are left out of the training dataset. For instance, the 3-eCE model is found to have extrapolation errors on left-out binary configurations that range from ≈5−30 meV/atom. The binary alloy comprised of the left-out elements corresponds to the most challenging datapoints to reproduce with eCE models. As a result the average prediction errors over configurations containing just the left-out elements (green bars in Fig. 7) is higher than the prediction error over configurations containing other elements in addition to the left-out pair. The highest extrapolation errors are found to occur for binary alloys containing either Cr and V or Cr and Nb. This suggests that even with chemical compression, the bonding between some elements may be too complex to extrapolate from interactions in other alloys.

Embedded cluster expansions are able to utilize chemical similarities to achieve low prediction errors in chemical spaces that would be entirely extrapolative when using conventional CE. For instance, as shown in Fig. 7, eCE models with embedding dimensions of 3 and 4 perform significantly better than eCE models with higher embedding dimensions. As the number of effective chemical species increase, the eCE model treats chemical species with a greater degree of independence. Thus, higher dimensional embeddings lose the ability to leverage chemical trends to lower prediction errors. In Fig. 7, where all configurations containing a pair of elements are left out, 6-eCE models perform poorly as they are fully extrapolative. In contrast, 3-eCE models utilize chemical trends to achieve lower prediction errors.

Similar to the CALPHAD method, CE models of multicomponent alloys can be parameterized starting from training data that spans lower-order constituent systems such as binaries, ternaries, etc53. Figure 8 tests the ability of eCE models to extrapolate into multicomponent space when it is trained on lower-order chemical spaces. All models in the first panel of Fig. 8 were parameterized with a dataset containing only binary orderings of the elements in groups 5 and 6 of the periodic table. The models were then tested on higher-order composition spaces containing 3 or more elements. As shown in Fig. 8, 3-eCE models have extrapolation errors ranging between 13-16 meV/atom, with the more complex orderings being better determined than the simpler ternary orderings. In contrast, 2-eCE and linear CE models have significantly larger errors. Similar to Figs. 4, 6 and 7, 2-eCE lacks sufficient flexibility to capture the chemical interactions of this alloy system. The energies of binary datapoints are likely insufficient to accurately describe the multicomponent energetics in the senary alloy as 3-eCE models parameterized over separate random instantiations resulted in errors ranging from 10 to 20 meV/atom.

Fig. 8: Extrapolation error in multicomponent chemical space.
figure 8

Each panel in the figure corresponds to a training dataset constructed by choosing orderings with at most the number of elements indicated. For instance, the ternary training dataset included all ternary and binary orderings. The RMSE is then computed for higher order systems. eCE models with embedding dimensions ranging from 2-6 are compared with a linear model.

Full size image

Adding configurations with orderings of up to 3 elements drastically lowers the extrapolation error of most CE models. In fact, adding configurations beyond ternary orderings shows only marginal improvement in extrapolation errors (≈3–4 meV/atom). While configurations with multiple alloying elements may be critical to accurately capturing low-energy ground states, the results of Fig. 8 suggest that the senary alloy formed from elements in groups 5 and 6 of the periodic table are primarily composed of unary, binary and ternary interactions. This is in excellent agreement with our findings from previous sections indicating that 3-eCE models are able to accurately reproduce the ordering energetics of this senary alloy.

Finite-temperature predictions

Surrogate models such as CE are ultimately used to compute finite-temperature quantities such as free energies, heat capacities, or short-range order parameters. Figure 9 compares the nearest neighbor Warren-Cowley short-range order parameters (SRO) in the Cr-W binary alloy computed with a linear CE to the values obtained from a 3-eCE model. The linear CE was parameterized with binary orderings of Cr and W in our dataset. Clusters containing up to 4 sites and a distance of 5.5Å were included in the model that achieved an RMSE of 3.5 meV/atom. The 3-eCE model is identical to the model used in Fig. 5. Canonical Monte-Carlo simulations are employed to compute ensemble averages of the Cr-W SRO as a function of temperature. Both models show essentially identical values of the SRO at elevated temperatures and start to slightly deviate at temperatures approaching ≈1600 K. The Cr-W SRO value is found to be slightly negative, indicating an elevated number of Cr-W pairs as compared to the truly disordered alloy. Similar values for the Cr-W alloy have been computed recently with both CE and off-lattice interatomic potentials36. The agreement in finite-temperature properties indicates that the 3-eCE model has comparable accuracies to a conventional CE.

Fig. 9: Warren-Cowley short-range order (SRO) parameters for Cr-W pairs in an equiatomic Cr-W binary alloy.
figure 9

The SRO values computed from canonical Monte-Carlo simulations with a conventional CE is compared against the values based on a 3-eCE model.

Full size image

Having established the accuracy of eCE, we employ the 3-eCE model to compute finite-temperature SRO values in an equiatomic senary V-Nb-Ta-Cr-Mo-W alloy (Fig. 10). Figure 10a shows the SRO values for elements plotted based on their group numbers. Pairs of elements in the same group display positive values of SRO, corresponding to fewer nearest neighbor pairs as compared with a random alloy. Choosing one element from group 5 and another from group 6 results in negative values of SRO. The short-range order parameter values for all pairs of elements at 3000 K are shown in Fig. 10b. Element pairs involving one element from group 5 and another from group 6 either have SRO values close to zero or are strongly negative even at elevated temperatures. Our finite-temperature simulations suggest that pairs of elements from groups 5 and 6 are attractive and there should be an elevated number of such pairs even at elevated temperatures. In turn, this causes a decrease in the number of element pairs from the same group.

Fig. 10: Warren-Cowley short-range order (SRO) parameters in an equiatomic senary V-Nb-Ta-Cr-Mo-W alloy.
figure 10

The SRO is computed with a 3-eCE model and canonical Monte-Carlo simulations. a SRO variation with temperature for pairs of elements labeled by the group number, i.e., group 5 elements with group 5 elements in blue, group 5 elements with group 6 elements in orange, and group 6 elements with group 6 elements in green. b SRO values for all pairs of elements at 3000 K.

Full size image

Discussion

Cluster expansions are the tool of choice to study phase transformations and finite temperature properties of multicomponent alloys. The embedded cluster expansion (eCE) model introduced in this study leverages chemical similarities between elements to construct atomistic models for alloys containing several elements. The eCE model simultaneously learns a lower-dimensional embedding of site basis functions along with the regression coefficients of a site-centric energy model. The site energies within eCE use cluster functions constructed from the transformed site functions that lie in a lower-dimensional space. As fewer site functions are required to describe occupants at any given site, there is a drastic reduction in the number of cluster functions. The results of Fig. 4 show that eCE models can reach accuracies comparable to conventional CE models. Zero Kelvin phase stability predicted by eCE models is also found to be quantitatively accurate (Fig. 5). Allowing the model to learn chemical similarities between elements enables robust extrapolation into unsampled chemical spaces. Despite leaving pairs of elements out in Figs. 6, 7, eCE models capture the energetics of the left-out alloy. The results of Fig. 8 suggest that orderings from simpler chemical sub-systems may be sufficient to capture multicomponent interactions in concentrated alloys. Finite temperature properties such as short-range order are also found to be well-reproduced by eCE models (Fig. 9), allowing us to investigate trends in SRO for the multicomponent senary alloy (Fig. 10).

The values of the site basis functions learnt by an eCE model can shed light on chemical similarities between alloying elements. Site function values learned by a 3-eCE model for the six refractory elements in groups 5 and 6 of the periodic table are shown in Fig. 11. 10 separate 3-eCE models were parameterized starting from an identical initialization for the elements of the embedding matrix (({{mathcal{T}}}) in Eq. (12)). The initial values of the site basis functions for each element are shown as circles and the final values learnt by the 3-eCE model are represented as triangles in the figure.

Fig. 11: Site basis function values learnt by 3-eCE models.
figure 11

The initial values of the site basis functions are shown as circles. The final values learned by the 3-eCE model across 10 separate instantiations are represented as triangles. Each element is denoted with a different color.

Full size image

Figure 11 shows several chemical trends across all 3-eCE models. Pairs of elements, such as molybdenum and tungsten or tantalum and niobium have similar site basis function values. Chromium and vanadium on the other hand are separated from the site basis function values of the other elements. Both elements are not found to cluster with any other element in Fig. 11. The clustering of site basis function values can be correlated with chemical similarities between elements. For example, molybdenum and tungsten have similar metallic radii and belong to group 6 of the periodic table. This results in very similar chemical interactions as reflected by embedded values of the site basis functions for both elements in Fig. 11. In contrast, the other element of group 6, chromium, is smaller than Mo and W. This results in qualitatively different interactions of Cr and causes the model to separate the element in the projected space. The elements of group 5 in the periodic table have very different site basis function values than the elements of group 6. The chemically similar elements, niobium and tantalum, have embedded site function values that are in close proximity, while vanadium, that is smaller than both elements is clearly differentiated in Fig. 11. The lack of elements similar to Cr and V may be related to the large extrapolation errors observed for Cr-V containing configurations in Fig. 7. Some degree of scatter is evident over the models due to the stochasticity of the gradient descent technique used to minimize eq. (15). Nevertheless, all parameterizations show similar trends.

Our results also suggest that learning the transformation matrix, ({{mathcal{T}}}), and careful initialization of the embedding matrix is crucial to obtaining predictive models. Figure 12 compares the validation errors of four different learning schemes. Two sets of models are parameterized while allowing for the transformation matrix to be learnt during model training. In the other two groups of models, the embedding matrix is fixed to its initial value. We also attempt two different initialization strategies. The first group of models are initialized based on similarities in the chemical properties of each element (details are outlined in the methods section). Random orthogonal projection vectors are used for the initial embedding matrix in the other models. Figure 12 shows the range of validation errors obtained over 10 instantiations of a 3-eCE model. Random initializations result in a large variance of the validation error. Difficulty in learning predictive eCE models from random initializations could be due to the existence of multiple local minima of the loss function. All models that are initialized with a transformation matrix containing some chemical information are able to achieve significantly lower prediction errors than random embedding matrices. The 3-eCE model with the lowest validation error that was initialized with a random projection matrix was also able to learn elemental similiarities like those shown in Fig. 11. Interestingly, a learnable transformation matrix seems to be necessary to enhance the predictive power of the model. Comparing the models with chemically informed initializations of the transformation matrix, Fig. 12 suggests that learning the best embedding matrix could lower errors by ≈3 meV/atom. This can also be seen in the reduced spread of validation errors for random initializations with a learnable embedding matrix. While the benefits of learning the embedding matrix for the senary refractory alloy system are not very large, this may be important for materials with more alloying elements.

Fig. 12: Effect of the embedding matrix initialization and learning scheme on validation error.
figure 12

Box plots of the validation RMSE computed for 10 different 3-eCE parameterizations for four different projection schemes. Each group of models is either initialized with an embedding matrix as described in the methods section or with a random embedding matrix. One set of models are allowed to learn the embedding matrix starting from the initial value, while in the other set of models the embedding matrix is fixed.

Full size image

The eCE framework of eqs. (12) to (14), is similar to recently proposed chemical embedding schemes for off-lattice interatomic potentials54,55,56,57,58. The application of chemical compression schemes to off-lattice models have enabled researchers to investigate alloys containing several elements59,60. Parameterizing such off-lattice models can be very expensive, often requiring tens of thousands of calculations. Additionally, as suggested in a recent study59, resolving the small energy differences between competing intermetallic orderings with a general interatomic potential can be challenging. The eCE surrogate model could provide sufficient energy accuracy and computational speed to bridge this gap and allow researchers to study complex alloy thermodynamics. As shown by Fig. 4, relatively small training datasets are required to parameterize on-lattice models with chemical compression schemes. The eCE models are sufficiently flexible to extrapolate into higher dimensional composition spaces. Additionally, computing finite-temperature properties from eCE models is relatively straight forward and computationally cheap. This could enable alloy designers to rapidly screen materials for compositions with desirable properties through eCE based surrogate models. Promising alloy chemistries that require more accurate simulations that account for all sources of entropy can subsequently be studied with bespoke interatomic potentials.

The chemical flexibility of the eCE formalism and the smaller dataset sizes needed to parameterize these models will enable the systematic exploration of high-dimensional composition spaces. eCE models will provide significant advantages against conventional CE in alloys where some chemical trends (similarities or dissimilarities) exist between groups of elements. In materials where elements are chemically uncorrelated or with very complex chemical trends, eCE models may require larger embedding dimensions, perhaps even approaching the number of elements in the alloy. Benchmarks such as the learning curves of Fig. 4 can be used to discern the appropriate dimensionality of the embedding space.

The eCE formalism is also subject to several of the same restrictions as conventional CE. For instance, alloys with long-range interactions, or significant structural relaxations will continue to remain challenging to parameterize with eCE models. The problem can be somewhat alleviated through structure matching algorithms61 to prune large relaxations out of training databases and the explicit inclusion of additional terms accounting for long-range interactions39,62. Further, all significant sources of entropy will need to be included in the formalism to enable a rigorous comparison with experiment. This may require the coupling of eCE models to other site degrees of freedom such as magnetic moments, vibrations or lattice distortions. The extension of the eCE model to such coupled effective Hamiltonians can be done similarly to existing methods that couple site occupancy with other discrete or continuous degrees of freedom.

Methods

DFT calculations

Formation energies of 4083 symmetrically distinct orderings between elements of groups 5 and 6 (Cr-Mo-Nb-Ta-V-W) are calculated with the generalized gradient approximation (GGA-PBE) to density functional theory (DFT) and projector augmented-wave (PAW) pseudopotentials as implemented in the Vienna Ab-Initio Simulation Package (VASP)63,64,65,66. A plane-wave cutoff energy of 550 eV with a k-point grid density of 55Å and smearing of 0.1 eV were used to relax the positions of atoms and lattice parameters of all orderings. Symmetrically distinct orderings on a parent bcc crystal structure are enumerated with the CASM code4. 2487 symmetrically distinct orderings on supercells of sizes up to 12 are enumerated within the binary and ternary sub-systems of the senary V-Nb-Ta-Cr-Mo-W alloy. Symmetrically distinct equiatomic orderings in the quaternary, quinary, and senary alloys are enumerated in supercells containing up to 6 atoms. 387 random arrangements of the 6 elements in a supercell containing 8 atoms are also included in the training dataset. SRO values are computed with canonical Monte-Carlo simulations performed in a 10 × 10 × 10 supercell of the conventional bcc cell. The short-range order parameters are computed by averaging over 1000 Monte-Carlo passes.

Embedded cluster expansion

The embedded cluster expansion was implemented in python using the PyTorch library. Gradient descent of the loss function in eq. (15) is performed with the stochastic Adam algorithm for 100 epochs. A learning rate scheduler is applied and overfitting is controlled through the L2-regularization. A graph for the site energy is built within PyTorch starting from Chebyshev site basis functions that are projected into a lower dimensional space through a learnable embedding matrix. Symmetrized site-centric cluster functions constructed as tensor products of the embedded site basis functions are used as input to a 4-layer neural network (32 × 32 × 8 × 1) that uses the ReLU activation function for each node except the final layer, where we use a linear activation function. The rows of the learnable linear transformation ({{mathcal{T}}}) are re-normalized after each iteration.

eCE models were initialized with a projection matrix, ({{mathcal{T}}}) computed from chemical properties of each element. 8 elemental properties (atomic number, radius, electronegativity, density, melting point, bulk modulus, Youngs modulus and Brinell hardness) were collected for each element from pymatgen67. The material properties for each element were used to form the columns of a matrix, A. The rows of A were standardized to have zero mean and a standard deviation of 1. An embedding matrix, with an embedding dimensionality of k was initialized with the first k right-singular vectors of A.

Related Articles

First-principles and machine-learning approaches for interpreting and predicting the properties of MXenes

MXenes are a versatile family of 2D inorganic materials with applications in energy storage, shielding, sensing, and catalysis. This review highlights computational studies using density functional theory and machine-learning approaches to explore their structure (stacking, functionalization, doping), properties (electronic, mechanical, magnetic), and application potential. Key advances and challenges are critically examined, offering insights into applying computational research to transition these materials from the lab to practical use.

Accelerating crystal structure search through active learning with neural networks for rapid relaxations

Global optimization of crystal compositions is a significant yet computationally intensive method to identify stable structures within chemical space. The specific physical properties linked to a three-dimensional atomic arrangement make this an essential task in the development of new materials. We present a method that efficiently uses active learning of neural network force fields for structure relaxation, minimizing the required number of steps in the process. This is achieved by neural network force fields equipped with uncertainty estimation, which iteratively guide a pool of randomly generated candidates toward their respective local minima. Using this approach, we are able to effectively identify the most promising candidates for further evaluation using density functional theory (DFT). Our method not only reliably reduces computational costs by up to two orders of magnitude across the benchmark systems Si16, Na8Cl8, Ga8As8 and Al4O6 but also excels in finding the most stable minimum for the unseen, more complex systems Si46 and Al16O24. Moreover, we demonstrate at the example of Si16 that our method can find multiple relevant local minima while only adding minor computational effort.

The design space of E(3)-equivariant atom-centred interatomic potentials

Molecular dynamics simulation is an important tool in computational materials science and chemistry, and in the past decade it has been revolutionized by machine learning. This rapid progress in machine learning interatomic potentials has produced a number of new architectures in just the past few years. Particularly notable among these are the atomic cluster expansion, which unified many of the earlier ideas around atom-density-based descriptors, and Neural Equivariant Interatomic Potentials (NequIP), a message-passing neural network with equivariant features that exhibited state-of-the-art accuracy at the time. Here we construct a mathematical framework that unifies these models: atomic cluster expansion is extended and recast as one layer of a multi-layer architecture, while the linearized version of NequIP is understood as a particular sparsification of a much larger polynomial model. Our framework also provides a practical tool for systematically probing different choices in this unified design space. An ablation study of NequIP, via a set of experiments looking at in- and out-of-domain accuracy and smooth extrapolation very far from the training data, sheds some light on which design choices are critical to achieving high accuracy. A much-simplified version of NequIP, which we call BOTnet (for body-ordered tensor network), has an interpretable architecture and maintains its accuracy on benchmark datasets.

Rapid brain tumor classification from sparse epigenomic data

Although the intraoperative molecular diagnosis of the approximately 100 known brain tumor entities described to date has been a goal of neuropathology for the past decade, achieving this within a clinically relevant timeframe of under 1 h after biopsy collection remains elusive. Advances in third-generation sequencing have brought this goal closer, but established machine learning techniques rely on computationally intensive methods, making them impractical for live diagnostic workflows in clinical applications. Here we present MethyLYZR, a naive Bayesian framework enabling fully tractable, live classification of cancer epigenomes. For evaluation, we used nanopore sequencing to classify over 200 brain tumor samples, including 10 sequenced in a clinical setting next to the operating room, achieving highly accurate results within 15 min of sequencing. MethyLYZR can be run in parallel with an ongoing nanopore experiment with negligible computational overhead. Therefore, the only limiting factors for even faster time to results are DNA extraction time and the nanopore sequencer’s maximum parallel throughput. Although more evidence from prospective studies is needed, our study suggests the potential applicability of MethyLYZR for live molecular classification of nervous system malignancies using nanopore sequencing not only for the neurosurgical intraoperative use case but also for other oncologic indications and the classification of tumors from cell-free DNA in liquid biopsies.

Solution-processable 2D materials for monolithic 3D memory-sensing-computing platforms: opportunities and challenges

Solution-processable 2D materials (2DMs) are gaining attention for applications in logic, memory, and sensing devices. This review surveys recent advancements in memristors, transistors, and sensors using 2DMs, focusing on their charge transport mechanisms and integration into silicon CMOS platforms. We highlight key challenges posed by the material’s nanosheet morphology and defect dynamics and discuss future potential for monolithic 3D integration with CMOS technology.

Responses

Your email address will not be published. Required fields are marked *