Soliquidy: a descriptor for atomic geometrical confusion

Introduction
Rational control of the properties of materials requires understanding the phase transitions involved in the synthesis of the material1,2,3. During the transition from liquid/gas/plasma to solid/crystalline, the movement of the atoms (transport) and the specific heat rate change (latent heat) determine the final outcome. Balancing the interplay between these factors can lead to the formation of non-ground-states configurations, often technologically advantageous. Despite the critical importance of kinetics, searches for new materials often focus primarily on the energetic landscape. Even theories of synthesizability are mostly discussed in terms of initial and final energetic states, neglecting the process used for the synthesis4. Until now, geometrical descriptors mostly aim to capture the interconnection of the atoms or encode the crystalline structure of a material itself, as it is commonly used in machine learning approaches5,6,7,8,9,10. Hence, the need for characterizing atomic movements in liquid-to-solid transitions remains.
In this article, we introduce Soliquidy (({mathfrak{S}})), a geometrical descriptor capturing the optimal-transport aspect of the transition between translationally disordered (melt, plasma, solution, vapor – atom position is not fixed) versus ordered materials states in a single numerical value, measuring the integrated Euclidean distance between a structured and an uniformly randomized state. We test Soliquidy on bulk metallic glasses (BMGs)11,12,13,14,15,16,17, perfect candidates since the rapid solidification needed for their formation emphasizes the kinetic (Greer’s) “confusion” of atomic rearrangements18. During the first stage of solidification, many different crystal structures are created concurrently: the more distinct, low-energy structures can energetically coexist, the more the alloy is likely to form an amorphous phase19,20. This correlates with Soliquidy: geometrical organizations capable of moving atoms around equilibrium positions with small transport costs will be capable of generating many dissimilar structures with similar energy.
The approach is fourfold. (i) We extend our experimental thin-film library of glasses21,22,23,24, analyze the mixtures, and label the compositions as glass or ordered; (ii) We equip such data with geometrical/energetic information extracted from the the aflow.org database25; (iii) By using a machine learning/artificial intelligence approach, specifically the sure independence screening and sparsifying operator (SISSO)26, we find several effective classifiers based on Soliquidy and enthalpy values (note that to account for the composition spread in the nucleation phase, we apply a new weighting scheme that treats this problem statistically). (iv) We apply the best-found classifier to an independently published dataset of BMGs27,28. The final results demonstrate the usefulness of Soliquidy to tackle amorphous systems.
Discussion
Soliquidy ({mathfrak{S}})
The goal is to develop a robust geometric descriptor that captures the transition from a translationally disordered to an ordered (solid, crystalline – atom positions are fixed) state. The descriptor should be based on the full atomic distribution in three dimensions without relying on symmetry, making it suitable for even structurally disordered systems. The term Soliquidy derives from the merge of solid- and liquid-like states.
We describe the transition as an optimal-transport problem29,30, where the goal is to address the logistical problem of moving atoms at the minimal cost, functional of the distances between the atoms’ positions from the translationally disordered state to the final ordered one. The idea is the following: (i) one sits on an atom and creates a Voronoi cell around it; (ii). the atom is then uniformly smeared inside such cell; (iii) an integral transport cost is defined for bringing the smeared atom back to its original position, where the cost is a functional of the traveled distance; and (iv) the procedure is repeated and summed for all the available atoms to give a macroscopic transport cost; i.e., Soliquity.
Voronoi tesselation
The choice of Voronoi tesselation31,32 comes from its uniquitous use in representing geometric properties of materials, for example: to estimate coordination numbers33, to capture structural heterogeneity34, to define degenerate Delaunay clusters to study crystallization35, and to generate the representation of crystalline compounds in machine learning models36. However, as the atoms of a specific element are distributed evenly over the observed volume in the translationally disordered state, each atom needs to be assigned a surrounding cell with the same volume fraction VVor = V/N, where V and N are the volume and the number of atoms of the cell. The standard Voronoi tessellation breaks this requirement. This limitation is overcome by weighting the growth rate of the different Voronoi cells so that the resulting cells are all equal in size, as shown in the example in Fig. 1. For this purpose, we employ the Kitagawa, Mèrigot and Thibert (KMT) algorithm37, giving the unique distribution of weights assigned of the Voronoi cells. This approach has no limitations when the chosen volume is periodic. Otherwise, the shape and weight associated with Voronoi cells bordering the surface will depend on the chosen boundary — yet the influence of the errors caused by the outer layer decreases with the number of Voronoi cells. As such, large amorphous systems can still be tackled. Nevertheless, all systems in this article are periodic, so the limitation does not appear.

a The grey filling of the cell represents the translationally disordered state, while the black dots represent the center of atoms fixed in the solid state (randomly placed to emphasize different cell forms); (b) each atom is assigned a cell using Voronoi tessellation; and (c) final generated Voronoi cells with optimized growth rate resulting in equal portions of the cell area, the original cells are outlined with dashed gray lines.
Euclidean transport cost
Given the whole cell weights/tesselation, the value of Soliquity for each atom can be calculated. Integrating through the Voronoi volume VVor(i), around the i-atom and located in position ri, the transport cost C(i) is defined from the distance of the smeared atom to the equilibrium position:
In such definition, we have chosen to associate the cost to the Euclidean distance — linked to the typical transport costs; e.g., total fuel or time spent in moving an object30. A normalization is necessary for comparisons based on shape. Thus, C(i) is normalized by a sphere having a four-dimensional volume given by the three-dimensional spherical radius of the Voronoi cell. One gets ({r}_{{rm{S}}}(i)equiv root3of{3{V}_{{rm{Vor}}}(i)/4pi }) and ({C}_{{rm{S}}}(i)equiv pi {r}_{{rm{S}}}^{4}). Soliquidy is then expressed as the sum over all atoms of the root mean squared difference of all normalized costs:
For multi-component systems, one can obtain a species-restricted Soliquity, by limiting the sum in Equation (2) to atoms belonging to a given j-species, thus obtaining ({{mathfrak{S}}}_{j}). A concentration-average can also be performed to obtain the macroscopic Soliquity value:
An example of Soliquidy from the different species in C3Ti4W is shown in Fig. 2. There, ({{mathfrak{S}}}_{{rm{Ti}}}) is significantly lower than ({{mathfrak{S}}}_{{rm{W}}}). This difference arises by the shape of the generated cells: close to spherical for Ti, and elongated for W. The location of the atoms in their cells also affects the total ({mathfrak{S}}): the higher, the further from the center.

The volumes highlighted in gray for each element are the equal sized Voronoi cells, that form the basis of the Soliquidy descriptor.
Figure 3 depicts the distribution over the whole aflow.org computational database. The value of Soliquidy ranges from approximately 0.9 to 100. The experimentally observed materials listed in the Inorganic Crystal Structure Database (ICSD)38,39 show a smooth distribution with most materials having ({mathfrak{S}} ,<, 10). When looking at aflow.org dataset, including many computationally derived structures, the fraction of systems having ({mathfrak{S}} ,<, 2) is significant higher. The reason lies in historical convenience: i.e., easy-to-calculate systems with few atoms per cell plus their combinatorial chemical combinations were the first to be added during the fast paced growth of the repository40,41,42,43. Conversely, the region with high Soliquidy is populated with more “computationally-demanding” complex structures (e.g., like layered composites, multicomponent and disordered ceramics), and therefore they are appearing at lower rate4,44,45,46.

({mathfrak{S}}) values in the whole aflow.org database and a sub-collection based on the experimentally observed materials listed in the Inorganic Crystal Structure Database (ICSD)38,39.
Nuclei composition kernel
As discussed above, entries in materials repositories like aflow.org are clustered at a limited number of compositions. In contrast, the composition of a material can be finely controlled in an experimental setting. Bridging this difference is crucial to using the enormous amount of data available in computational material databases. Our approach averages entries in the database into “virtual entries” at any given concentration, to make predictions possible. For this task, we introduce the nuclei composition kernel (NCK) to weigh database entries according to their distance from the target concentration.
In our chosen testbed of metallic glasses, the transition from the translationally disordered into the ordered solid phase with fixed atom positions occurs under a time constraint. Numerous nucleation seeds with different compositions and crystal systems are competing at the same time, and due to the high cooling rates, thermodynamic balancing does not have enough time. This results in geometrical confusion, as proposed by Greer in 199318. If a high enough number of different phases with incompatible structures are created alongside each other, the growth from the nucleation sites is severely hindered, and the resulting material will lack long-range ordering.
This important insight into solidification guides us in the design of the weighting approach for the database entries, by considering the earliest stage of nucleation as a statistical problem. Let us consider a small random alloy sample during its transition from the translationally disordered to the ordered state, as shown in Fig. 4. Local compositions in the beginning phase deviate quite substantially from the overall composition.

The selected example nuclei show diverging compositions for the four small samples (a) 11.1∣33.3∣55.5 %, b 33.3∣33.3∣33.3 %, c 11.1∣55.5∣33.3 % and, d 22.2∣44.4∣33.3 %.
To calculate the weight w of an ordered sample, we use the multinomial discrete probability density function. Here, the composition of a sample is distributed onto the chosen number of atoms in the nucleus NN. So for each of the NS species in the system, an integer number ND is set by rounding the product from concentration fraction and NN. As the sum of these integers can differ from NN, the discrepancy is corrected by adjusting the number with the largest error accordingly. Then, the weight can then be expressed as:
where NE represents the number of equal outcomes and pj is the concentration of the jth species in the translationally disordered phase. As the number of atoms in the nucleus grows, the accessible compositions increase. This leads to a distribution centered around the target composition, lowering the probability of deviation from the target. In the limit of NN → ∞ the distribution will resemble a Gaussian. An example of an alloy with three species and different nuclei sizes is shown in Fig. 5.

The red dot is placed at the nominal composition. Each grey point represents a possible composition given the number of atoms in the nuclei. The color around each point shows the relative weight of that region of the composition space. The composition area sharing the same weight increases with the decreasing number of atoms from (a) to (c).
A further desirable characteristic of the NCK is that the original composition is recovered even when the samples are randomly distributed in the composition space. While this property is already helpful in effectively averaging database information for specific compositions, in the case of general material databases, an additional step is needed to avoid a shift in the resulting overall composition from highly sampled points. For example, atomic compositions numbers having the largest “greatest common factor” have the smallest basis in the primitive cell, thus requiring less computational resources. As such, they tend to have significantly more calculations available in databases compared to other compositions. For this reason, all samples that belong to the same nucleus (gray points in Fig. 5) are binned together. The combined entries created with this methodology are available in Supplementary Datasets 2 and 3.
Results
Constructing and integrating the thin-film dataset with the aflow.org database
This comprehensive dataset was constructed by combining our labeled (amorphous, crystalline) experimental data of thin-film alloys produced by DC Magnetron co-sputtering21 with theoretical properties of crystalline materials with similar compositions retrieved from the aflow.org database. The resulting dataset provides a unique platform to test if Soliquidy can be an integral part of a classifier capable of differentiating between the amorphous and crystalline labeled entries. For further details on the aflow.org computational database and the experimental setup employed in this study, readers are referred to the Method section.
SISSO analysis
By exploring the combined thin-film dataset and aflow.org database with SISSO, we found several possible combinations of features able to tackle glass formation. The ten best-performing classifiers all rely on Soliquidy as part of the model (quality is defined from the distances of wrongly classified entries from the separator boundary). The best classifying manifold we found is the following:
This two-dimensional locus combines: Soliquidy, volume per atom, enthalpy per cell, and two coefficients, A = 7.44 Å3eV and B = 2.73Å3. By using Equation (6), our experimental dataset can be divided by a simple linear equation into the two categories amorphous and crystalline (Fig. 6a), with a categorization success of 83.8%. This split means that out of 1126 samples, 943 are correctly categorized. When leaving parts of the dataset out of the fit in a 5-fold cross-validation utilizing Support Vector Classification, the average positive classification rate is 80.0%. The isolated island in the upper left corner of Fig. 6a is formed by the entries from CuMgY and AlCuZr, two alloy systems known as good glass formers, that exhibit amorphous structures over the full experimentally covered concentration range. Overall, our positive categorization rate is similar to other approaches. For example, by using a general-purpose machine learning framework, Ward et al. obtained a characterization rate between 80–90%34. Other models, leveraging only composition information, can reach accuracies of 89%6. Our model is based on the glass-agnostic assumption – the purposeful exclusion of features that focus on closeness to known glass formers. Training on such features would drive models to learn that compositional neighbors of glass formers are also quite likely glass formers without attempting to elucidate the underlying mechanism of solidification. Our method aims to predict — with a certain degree of accuracy — if an alloy at a specific composition can form an amorphous phase purely on the first-principle calculation of geometric and energetic features of crystalline structures.

a The solid black line separates the amorphous and crystalline systems in our thin-film library at a positive categorization rate of 83.8%. In the case of (b) the public bulk metallic glasses (BMG) dataset, the original separator needs to be shifted in parallel to accomplish the optimal positive categorization rate of 78.9% (dashed line).
Testing soliquity on a BMG public dataset
To test the proposed classifier, we are also applying it to the public dataset published by Ward in 201827,28. It needs to be noted that the published dataset contained over 150 entries with stoichiometries that did not correspond to the published works from where they were taken. These mistakes were corrected before its use.
In Fig. 6b, all ternary compositions that are labeled in the public dataset as BMGs/none are plotted. As the classifier depends on aflow.org data, only ternary alloy systems with at least 300 entries were considered. The solid black line in Fig. 6b represents the same separator used for our experimental data. It is clear that the separation would not be sufficient. This is not very surprising as the experimental methods used in both cases are quite different. The cooling rate of the samples in the thin-film library (≈ 108 K s[−147) is orders of magnitude faster than the bulk cooling rates experienced by those in the public dataset. It is to be expected that the higher effective cooling rate during the sputtering process led to a higher ratio of amorphous systems. Splitting the public dataset optimal reveals that only a parallel shift of the original separator into the amorphous region was necessary. By changing the B parameter of the model from 2.73 to 3.89 Å3 (dashed line in Fig. 6b) we can achieve a positive categorization rate of 78.9%. The 5-fold cross-validation for this dataset results in a positive classification rate of 73.0% While a considerable amount of sample points are misclassified, the trend inside the corresponding alloy systems still follows the same separation. Overall, we can observe a change in the balance between the samples’ energetically driven crystallization and the geometric hurdles opposing it. The shift demonstrates that energetically less favorable structures are more important at lower cooling rates.
In addition to a binary separation, we can also consider the distance from the dividing line to visualize the distribution within the ternary alloy system. We define the separation distance dS as:
Positive/negative values of dS indicate an amorphous/crystalline structure. In Fig. 7 this is plotted for AlCuV and AlCuMo respectively.

Tiles (a) AlCuV and (b) AlCuMo show the separation distance for these training set alloys with colored backgrounds. The symbols on top indicate the experimental outcome in the thin-film library. Bold black symbols show regions with correct predictions.
The results demonstrate a strong correlation between the newly introduced classifier utilizing Soliquidy and the tendency of a specific composition to form an amorphous solid phase. By using SISSO, we generated a straightforward relationship between the information available in the aflow.org database and our experimental data. The transferability of the trends observed in our data to a public dataset reinforces our confidence in the found classifier and the Soliquidy descriptor itself. The shift of one coefficient for the separation line in the case of the public BMG datasets is expected, as the coefficients capture the environment of the observed phase transition. In this case, the cooling rates in our experimental setup are significantly higher than for the public dataset. Overall, these findings validate Soliquidy’s capability to capture relevant geometrical information to investigate phase transitions.
The reliance on a consistently filled material database limits the number of systems that could be examined from the public dataset, as we only considered alloys with at least 300 entries. To further improve the comparability between systems and reduce the impact of over-sampled compositions, a subset of prototypes should be selected and populated for all systems in question. This approach leads to a selection process in well-explored systems, while in under-explored systems, it provides a roadmap of required calculations. For this study, we only employed the prototype approach to expand the number of entries in alloy systems, which had experimental results but needed to be computationally explored more thoroughly.
In this article we have introduced Soliquity, a descriptor that measures confusion in an atomic geometrical organization. Soliquity is calculated with the Euclidean transport cost of each atom moving through the surrounding Voronoi cell to its final equilibrium position. By extending our library of thin-film alloys and combining it with geometrical/energetic quantities extracted from the aflow.org computational database, we show that the combination of Soliquity and formation enthalpies can generate effective classifiers for metallic-glass formation. Most importantly, the glass-agnostic assumption does not require any knowledge of glass formation of nearby compositions. Soliquity can then be used to tackle other kinetic-controlled transformations where translational disorder plays a critical role, such as spinodal decomposition or solidification.
Methods
AFLOW.org database
To enable predictions regarding the glass-forming abilities of metallic systems, a reliable data source is required. For this work, we use entries from the aflow.org database25. The systems collected are computed based on the AFLOW standard48, enabling a consistent baseline for comparisons and analytical tasks. With over 4 million entries, many alloy systems can be explored directly. The inclusion of ICSD structures ensures that relevant and representative structures are available in different alloy systems. Overall, 9805 entries are used here. The list, grouped by alloy system, is available in Supplementary Dataset 1. In cases of under-explored systems, the AFLOW library of crystallographic prototypes40,41,49,50 can be used to fill in the gaps in a reproducible way.
SISSO (Sure independence screening and sparsifying operator)
The algorithm combines symbolic regression and compressed sensing26 to identify mathematical functions that best predict a target property of a dataset. It can model complex phenomena using simple descriptors for regression and classification tasks from tens to thousands of data points. For this work, we are utilizing the SISSO++ package51, an open-source implementation of the SISSO method.
Data points of ternary alloys from the aflow.org database are combined, using the nuclei composition kernel, into a collection of features for each composition available in the experimental dataset. For this, a nucleus size of NN = 25 is selected, ensuring that the nucleus is significantly smaller than the critical size while still focusing on aflow.org entries close to the experimental composition. Overall, a collection of 1126 entries, matching the experimental data across six ternary metal alloy systems (AlCuMo, AlCuV, AlCuZr, AlMoPd, AlMoV, CuMgY) was created. Each entry is labeled either amorphous or crystalline, based on the experimental results. Both labels serve as the target for the training.
After a first exploration using all the available properties in the database, we removed the ones having little or no correlation, and we focus on the following feature subset: energy per atom (eV), enthalpy per atom (eV), enthalpy per cell (eV), formation enthalpy per atom (eV), Soliquidy (unitless), and volume per atom (Å3). All features are based on computed quantities. To allow SISSO to be effective and efficient, the exploration was limited to the following set of mathematical operations: A + B, A − B, ∣A − B∣, A ⋅ B, A/B, ∣A∣, (exp (A)), (exp (-A)), (ln (A)), A−1, A2, A3, A6, (sqrt{A}), and (sqrt[3]{A}).
Experimental thin-film library preparation
The thin-film library is fabricated through confocal DC Magnetron co-sputtering (AJA International ATC2200). Sputtering targets of purity better than 99.95% are used (AJA International and Kurt J. Lesker Company). Silicon wafers of 4 inches with a 100 nm thick thermal oxide layer are used as substrates. Prior to sputtering, the chamber is evacuated to a pressure of 1 × 10−6 Torr or better. The films are sputtered in ultra-high purity argon gas at a pressure of 5.8 × 10−3 Torr. Sputtering guns are arranged in a tetrahedral geometry with the substrate. This arrangement leads to a compositionally graded film, where the film at each point in the library represents an alloy with a different chemistry. Typically, the species atomic fraction can vary by up to 40% within an approximate range of 10 to 95%. If one considers that a distinction of alloys requires at least one atomic percent52, up to 1000 alloys can be represented within one compositional library21,22,23,24.
The chemical composition of the various alloys in the thin-film library is characterized using high-throughput energy-dispersive x-ray spectroscopy (Helios G4 Focused Ion Beam – Scanning Electron Microscopy with UltraDry EDX detector, 25 kV accelerating voltage). The atomic structure is characterized using high-throughput x-ray diffraction (XRD), performed at beamline 1–5 at the Stanford Synchrotron Radiation Lightsource (SSRL) at SLAC National Accelerator Laboratory (12.7 keV photon energy, automated xy-stage to perform measurement across a 6 mm square grid)53. Each ternary alloy system is measured through 177 evenly spaced xy positions representing 177 alloys with unique compositions. The only exception is CuMgY, measured on a finer grid containing 241 points.
Responses