Active learning of effective Hamiltonian for super-large-scale atomic structures

Introduction

First-principles (FP) methods based on density functional theory (DFT) have become indispensable to scientific research in physics, chemistry, materials, and other fields¹. However, studying the structure and properties of large-scale systems, such as thermally driven phase transitions or multidomain states in ferroic materials, remains a great challenge because of the large computational cost of ab initio molecular dynamics. The recent development of first-principles-based machine-learning force fields (MLFFs) for molecular dynamics makes it possible to study large-scale structures with an accuracy comparable to that of first-principles calculations^2,3,4,5,6. Another method that can handle large-scale structures is the first-principles-based effective Hamiltonian, which is also physically interpretable and faster than MLFF-based molecular dynamics. The first-principles-based effective Hamiltonian approach has been proposed to describe the couplings between local order parameters (both long-ranged and short-ranged), in which the coupling parameters are computed from first principles and have direct physical meanings^7,8. Such a method has successfully reproduced or predicted the structural phase transitions^9,10,11,12 and various properties of many compounds^11,13,14,15,16, such as the piezoelectric effect, electrocaloric effect, dielectric response, and optical response. Moreover, complex polar vortices¹⁷, ferroelectric labyrinthine domains¹⁸, polar skyrmions¹⁹, and merons²⁰ were also recently found by effective Hamiltonian methods in complex perovskite systems.

For the effective Hamiltonian, the parameters of the order-parameter couplings are obtained by fitting FP calculations for many structures with special structural distortions^7,21. These fitting procedures may be tricky and complex, and some approximations (such as the virtual crystal approximation)^22,23,24 may need to be included, leading to uncertainties and even errors for some complex interactions and structures. Additional manual adjustment of the values of some parameters may be necessary to reproduce experimental results^11,12. Therefore, to avoid these complications and approximations in the parameterization of the effective Hamiltonian, a new parameterization scheme that is reliable, precise, convenient, and automatic is highly desirable. Recently, there have been some reports of building and fitting first-principles-based models using machine learning and energy mapping schemes^25,26,27. Though mainly focused on magnetic effective Hamiltonians, they also shed some light on the building and parameterization of the atomic effective Hamiltonian with machine-learning-based approaches.

In this article, a general effective Hamiltonian is proposed, and an on-the-fly active-learning method is applied to the parameterization of this general effective Hamiltonian. Only a small number of FP calculations are required in this parameterization process. Perovskite structures [BaTiO₃, PbTiO₃, Pb(Zr,Ti)O₃, and (Pb,Sr)TiO₃] are taken as examples, where the active-learning method provides simulation results that agree very well with other first-principles-based calculations and experiments. Such a reliable and highly automatic way to construct the effective Hamiltonian parameters makes it possible to simulate super-large-scale and complex atomic structures.

Results

Effective Hamiltonian

The effective Hamiltonian describes the couplings between order parameters, and it is developed based on the Taylor expansion of small distortions around the reference structure. Various order parameters are considered (see Fig. 1a for an example), which are further explained in “Mode and basis”. Briefly, the degrees of freedom of the effective Hamiltonian are: (1) local modes attributed to each unit cell i, to be denoted as {s₁}, {s₂}, ⋯ , representing atomic displacements with respect to the reference structure, usually associated with different phonon modes; (2) the homogeneous strain tensor η⁷; and (3) the variable {σ} representing the atom occupation in each unit cell i [for example, in Pb(Zr,Ti)O₃, σ_i = 1 (respectively, σ_i = 2) represents a Ti (respectively, Zr) atom sitting in unit cell i]²².

The potential energy E_pot contains four main parts: (i) E_single, which contains the self energies of each mode, involving only one site in each term; (ii) E_strain, which contains all the energy terms directly related to the strain tensor η; (iii) E_inter, which contains several terms describing the two-body interactions between different local modes or the same local modes at different sites; and (iv) E_spring, which describes the effect of the atomic configuration of different elements (known as the “alloying effect”²²) and consists of several “spring” terms (using the terminology of ref. ²⁸).

In principle, the formalism of the effective Hamiltonian could be applied to any structure for which a reference structure with high symmetry can be defined. Here, we only focus on perovskites with the formula ABX₃, in which the A- or B-sites can be occupied by multiple elements. Practically, the local dipolar mode vector {u}, the antiferrodistortive (AFD) pseudovector {ω}, and the inhomogeneous strain vector (acoustic mode) {v} are considered as the modes {s} (see Fig. 1a; in total, nine degrees of freedom describe the state of each unit cell). More details of the effective Hamiltonian for perovskites are described in “Methods”.

Mode and basis

The mode is the local collective displacement of atoms in a specified pattern (see Fig. 1a), which could be either a “local mode”⁷ or, more generally, a “lattice Wannier function”^29,30. In the examples depicted in this work, the local mode is simply employed. As discussed in Sec. VI of the Supplementary Information, for simple perovskites like BaTiO₃, this simple local mode basis is sufficient to capture the soft phonon bands related to the onset of ferroelectricity. In perovskites, the local mode basis of the dipole motion u is typically chosen to be the local phonon mode having Γ₁₅ symmetry centered on the A or B site²¹. Typically, the local mode basis is determined from the eigenvector associated with the dipolar mode of the force constant or dynamical matrix of the cubic perovskite, which takes the form ξ = (ξ_A, ξ_B, ξ_X1, ξ_X2). For example, the local mode motion u_iα centered at the B site consists of the displacement of the central B atom by u_iα ξ_B, the displacement of each of the eight neighboring A atoms by u_iα ξ_A/8, and the displacement of each of the six neighboring X atoms by u_iα ξ_X1/2 or u_iα ξ_X2/2, all along the α direction. Note that it is also possible to get the local mode basis by fitting against the atomic displacement between the reference structure and a low-energy structure³¹. The local motion v is similar to u but with the basis corresponding to the translational motion of all the atoms in the unit cell, i.e. with ξ_A = ξ_B = ξ_X1 = ξ_X2.
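The sharing factors in this construction (1/8 for A atoms, 1/2 for X atoms) can be checked with a small sketch. The function below is a hypothetical illustration on a periodic grid, not the authors' implementation. The basis values are illustrative numbers only, and the assignment of ξ_X1 to displacements along the B-X bond is an assumption made here for concreteness.

```python
import numpy as np
from itertools import product

def local_mode_displacements(u, xi_A, xi_B, xi_X1, xi_X2):
    """Map B-centered local modes u (shape L x L x L x 3, periodic) to atomic
    displacements. Returns x_A, x_B (L x L x L x 3) and x_X (L x L x L x 3 x 3),
    where x_X[..., d, :] belongs to the X atom between cell i and cell i + d."""
    x_B = xi_B * u
    x_A = np.zeros_like(u)
    # Each A atom is a corner shared by eight cells, each contributing xi_A / 8.
    for shift in product((0, 1), repeat=3):
        x_A += (xi_A / 8.0) * np.roll(u, shift, axis=(0, 1, 2))
    x_X = np.zeros(u.shape[:3] + (3, 3))
    for d in range(3):
        pair_sum = u + np.roll(u, -1, axis=d)   # the two cells sharing this X atom
        for alpha in range(3):
            # Assumption: xi_X1 applies when the displacement is along the B-X bond.
            xi = xi_X1 if alpha == d else xi_X2
            x_X[..., d, alpha] = 0.5 * xi * pair_sum[..., alpha]
    return x_A, x_B, x_X

# Illustrative basis values only (not taken from this work).
xi = dict(xi_A=0.20, xi_B=0.76, xi_X1=-0.53, xi_X2=-0.21)
u_uniform = np.zeros((3, 3, 3, 3))
u_uniform[..., 2] = 0.1                         # uniform mode along z
x_A, x_B, x_X = local_mode_displacements(u_uniform, **xi)
```

For a uniform mode, every atom ends up displaced by its full basis component times the mode amplitude, which is what the 1/8 and 1/2 factors are meant to guarantee.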

The AFD mode ω is somewhat different from the u and v modes, since neighboring BX₆ octahedra share the same X atom, and thus the ω modes are not completely independent of each other. The actual movement of the X atom shared by the i and j sites associated with the AFD mode is given by

$$\Delta \boldsymbol{r}_{X}=\frac{a_{0}}{2}\hat{\boldsymbol{R}}_{ij}\times (\boldsymbol{\omega}_{i}-\boldsymbol{\omega}_{j}),$$

(1)

where \(\hat{\boldsymbol{R}}_{ij}\) is the unit vector joining site i and site j. By the definition in Eq. (1), there are multiple (actually, infinitely many) different sets of {ω} modes representing the same atomic structure (i.e. with the same set of atomic displacements)³². For example, adding an arbitrary constant ω₀ to all of the AFD modes does not change the displacement Δr_X, since the displacement only depends on the difference between ω_i and ω_j. To eliminate such arbitrariness, we typically impose the following extra restriction, together with its cyclic permutations, on the AFD vectors:

$$\forall x_{0},\quad \sum_{i,\,n_{x}(i)=x_{0}}\omega_{i,x}=0,$$

(2)

where i is the index of the unit cell centered at \(n_{x}(i)\hat{x}a_{0}+n_{y}(i)\hat{y}a_{0}+n_{z}(i)\hat{z}a_{0}\), \(\hat{x},\hat{y},\hat{z}\) are unit vectors along the x, y, z axes, and a₀ is the lattice constant of the five-atom perovskite unit cell. The summation runs over all the sites in the same layer marked with n_x(i) = x₀. Note that our definition of atomic displacement [Eq. (1)] is identical to that in ref. ³² [Eq. (1) therein], but our formalism differs from that of ref. ³² by the extra restrictions [Eq. (2)].
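The redundancy of the {ω} representation and the effect of the restriction in Eq. (2) can be illustrated numerically. The sketch below is a minimal illustration on a periodic grid with random AFD vectors. The function names and the layer-mean subtraction used to enforce Eq. (2) are assumptions made here, not necessarily the procedure used in this work.

```python
import numpy as np

def afd_x_displacements(omega, a0=1.0):
    """omega: (L, L, L, 3). Returns (L, L, L, 3, 3): displacement of the X atom
    shared by site i and its neighbor j = i + d (index d), following Eq. (1)."""
    out = np.zeros(omega.shape[:3] + (3, 3))
    for d in range(3):
        r_hat = np.zeros(3)
        r_hat[d] = 1.0
        diff = omega - np.roll(omega, -1, axis=d)        # omega_i - omega_j
        out[..., d, :] = 0.5 * a0 * np.cross(r_hat, diff)
    return out

def impose_gauge(omega):
    """For each Cartesian component c, subtract the mean of omega_c over every
    layer of fixed n_c, so that Eq. (2) and its cyclic permutations hold."""
    fixed = omega.copy()
    for c in range(3):
        in_plane = tuple(ax for ax in range(3) if ax != c)
        fixed[..., c] -= fixed[..., c].mean(axis=in_plane, keepdims=True)
    return fixed

rng = np.random.default_rng(0)
omega = rng.normal(size=(4, 4, 4, 3))
reference = afd_x_displacements(omega)
shifted = afd_x_displacements(omega + np.array([0.1, -0.2, 0.3]))
gauged = afd_x_displacements(impose_gauge(omega))
```

Both `shifted` and `gauged` reproduce `reference`: a layer-constant change of ω_x only enters differences between sites in the same n_x layer, where it cancels, and it is removed by the cross product for neighbors along x.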

It is clear from above that all of the u, v, and ω modes are linked linearly with atomic displacement about the reference structure. For a periodic supercell containing N = L_x × L_y × L_z five-atom perovskite unit cells, the relation between the modes and atomic displacements could be written as

$$\mathbf{M}\mathbf{s}=\mathbf{x},$$

(3)

where s is a 9N column vector containing the modes u, v, ω in each unit cell, x is a 15N column vector containing the atomic displacement of each atom in the supercell, and M is the matrix containing the information of local mode basis. The force acting on the mode could then be obtained from the chain rule

$$f_{s,i}=-\frac{\partial E_{\mathrm{pot}}}{\partial s_{i}}=-\sum_{j}\frac{\partial E_{\mathrm{pot}}}{\partial x_{j}}M_{ji}.$$

(4)

This equation can be written in matrix form as

$$\mathbf{f}_{\mathbf{s}}=\mathbf{M}^{T}\mathbf{f}_{\mathbf{x}},$$

(5)

where f_s and f_x gather the forces acting on the modes and atoms, respectively.
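The projection in Eq. (5) can be verified against a finite-difference derivative. The sketch below uses a random matrix as a stand-in for M and a toy harmonic energy in the atomic displacements. Both are assumptions for illustration only and carry no physical content.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 2
M = rng.normal(size=(15 * n_cells, 9 * n_cells))   # stand-in for the mode basis matrix
K = rng.normal(size=(15 * n_cells, 15 * n_cells))
K = K @ K.T                                        # toy symmetric force constants

def energy_from_modes(s):
    """Toy harmonic energy evaluated through the atomic displacements x = M s."""
    x = M @ s
    return 0.5 * x @ K @ x

s = rng.normal(size=9 * n_cells)
f_x = -K @ (M @ s)          # forces on atoms, -dE/dx
f_s = M.T @ f_x             # forces on modes, Eq. (5)

# Central finite differences of the energy with respect to each mode amplitude.
eps = 1e-6
f_s_fd = np.array([
    -(energy_from_modes(s + eps * e) - energy_from_modes(s - eps * e)) / (2 * eps)
    for e in np.eye(s.size)
])
```

In the fitting workflow, the same transpose product would convert the atomic forces returned by an FP code into forces on the modes.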

Similar to the second-principles lattice dynamics formalism³³, the actual atomic coordinates in a supercell with homogeneous strain η and atomic displacements are defined as

$$\boldsymbol{r}_{lk}=(\mathbb{1}+\boldsymbol{\eta})(\boldsymbol{R}_{l}+\boldsymbol{\tau}_{k})+\boldsymbol{x}_{lk},$$

(6)

where \(\mathbb{1}\) is the 3 × 3 identity matrix, η is the homogeneous strain (in 3 × 3 matrix format), R_l is the lattice vector corresponding to the unit cell l, and τ_k is the coordinate of atom k inside the unit cell. Thus, the stress compatible with that calculated from FP should be obtained using the chain rule

$$\sigma_{m}=-\frac{\partial^{\prime}E_{\mathrm{pot}}}{\partial^{\prime}\eta_{m}}=-\frac{\partial E_{\mathrm{pot}}}{\partial \eta_{m}}-\sum\limits_{lk\alpha}\frac{\partial E_{\mathrm{pot}}}{\partial x_{lk\alpha}}\frac{\partial x_{lk\alpha}}{\partial \eta_{m}},$$

(7)

as described in Appendix A of ref. ³³. Practically in this work, this relation is used inversely. The stress \(-\partial^{\prime}E_{\mathrm{pot}}/\partial^{\prime}\eta_{m}\) obtained from the FP calculations is converted to −∂E_pot/∂η_m, compatible with the direct definition of the effective Hamiltonian.

Formalism of the parametrization

Instead of doing many FP energy calculations on special structures with distortions to compute the coefficients of order-parameters coupling in effective Hamiltonian as in previous reports^7,21, our present approach is to use the on-the-fly active-learning approach to automatically compute the parameters for effective Hamiltonian. The parameters related to the long-range dipolar interaction E_long [Eq. (S6) in the Supplementary Information] [i.e. the lattice constant a₀, the dipolar mode Born effective charge Z^* and optical dielectric constant ϵ_∞ (using the notations of ref. ⁷)] are first determined directly from first principles calculations. Then, all the remaining parameters are determined through an on-the-fly machine-learning process. As indicated in Methods, the effective Hamiltonian can be written in the following form

$$E_{\mathrm{pot}}=E_{\mathrm{long}}+\sum\limits_{\lambda=1}^{M}w_{\lambda}t_{\lambda}(\boldsymbol{u},\boldsymbol{v},\boldsymbol{\omega},\sigma,\eta),$$

(8)

where E_long is fixed during the fitting process, M is the number of parameters to be fitted, w_λ is the parameter to be fitted, and t_λ is the energy term associated with the parameter w_λ, which is called a symmetry-adapted term (SAT), using the terminology of ref. ³³. In other words, except for the long-range dipolar interactions, the energy of the system is linearly dependent on the parameters. Moreover, the force (respectively, stress) has a similar form to the energy, and is obtained by taking the derivative of E_long and t_λ with respect to the local mode (respectively, strain). Such linearity is similar to that of second-principles lattice dynamics³³ and of MLFF with the Gaussian approximation potential³⁴, allowing the application of similar regression algorithms. Here, we use a Bayesian linear regression algorithm similar to that previously used for MLFF², with several key modifications for the effective Hamiltonian context.

Given the linearity above, the linear parts of energy, force, and stress for each structure a calculated from the effective Hamiltonian could be written in the following matrix form

$$\tilde{\mathbf{y}}_{a}\equiv \mathbf{y}_{a}-\mathbf{y}_{a}^{\mathrm{long}}=\boldsymbol{\phi}_{a}\mathbf{w},$$

(9)

where y is a vector containing the energy per unit cell with respect to the reference structure, the forces acting on the modes, and the stress tensor (in total m_a = 1 + 9N_a + 6 elements, where N_a is the number of unit cells in the structure a, and we consider the above-mentioned nine local modes in our models); y^long is a vector with the same layout associated with the E_long term; w is a vector that consists of all the parameters w_λ, λ = 1, ⋯ , M; and ϕ_a is an m_a × M matrix consisting of the SATs and their derivatives with respect to modes and strain. Note that in the effective Hamiltonian formalism, the potential energy of the reference structure is zero by definition. Thus, the energy of the reference structure should be subtracted from the FP energy entering y to be consistent with the effective Hamiltonian.

In the parametrization process, a set of structures is selected as the training set (see “On-the-fly learning”), and the structures are indexed by a = 1, ⋯ , N_T. First-principles calculations are performed on these structures to get the energy per unit cell, forces acting on the atoms, and the stress tensor. The forces acting on the modes are then obtained by applying Eq. (5). The \(\tilde{\mathbf{y}}_{a}\) vectors of all the structures in the training set then constitute the vector Y containing ∑_a m_a elements. On the other hand, the ϕ_a matrices of the structures in the training set constitute the Φ matrix. In this form, the parametrization problem is to adjust w to fit Φw against Y. To balance the energy, force and stress values with different dimensions properly, they are typically divided by their standard deviation in the training set to get dimensionless values. Furthermore, an optional weight could be assigned to each of the types to adjust the preference between different fitting targets. Practically, this is achieved by left multiplying Φ and Y by a diagonal matrix H made up of h_i/σ_i, where h_i and σ_i are the weight and standard deviation of the specified type of values (energy, force of different modes and stress), respectively.
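The assembly and rescaling step can be sketched as follows. The function name, the row labels, and the synthetic data are hypothetical, and the standard deviation of each type is taken here over the stacked Y values, which is one reading of the description above.

```python
import numpy as np

def scale_training_set(phi_list, y_list, type_list, weights=None):
    """Stack per-structure blocks and apply H = diag(h_i / sigma_i).
    type_list[a][r] labels row r of structure a ('energy', 'force', 'stress')."""
    Phi = np.vstack(phi_list)
    Y = np.concatenate(y_list)
    types = np.concatenate(type_list)
    h = np.ones(len(Y))
    for t in np.unique(types):
        rows = types == t
        sigma = Y[rows].std()
        weight = 1.0 if weights is None else weights.get(t, 1.0)
        h[rows] = weight / sigma if sigma > 0 else weight
    return h[:, None] * Phi, h * Y, h

# Synthetic one-cell structures: m_a = 1 + 9 + 6 rows with very different magnitudes.
rng = np.random.default_rng(1)
w_true = np.array([0.5, -2.0, 1.5])
row_types = np.array(["energy"] + ["force"] * 9 + ["stress"] * 6)
row_scale = np.where(row_types == "energy", 0.01,
                     np.where(row_types == "force", 1.0, 100.0))
phi_list = [rng.normal(size=(16, 3)) * row_scale[:, None] for _ in range(6)]
y_list = [phi @ w_true for phi in phi_list]
Phi_s, Y_s, h = scale_training_set(phi_list, y_list, [row_types] * 6)
w_fit = np.linalg.lstsq(Phi_s, Y_s, rcond=None)[0]
```

Because Φ and Y are multiplied by the same diagonal matrix, exact data still return the exact parameters. The scaling only changes how residuals of different types are traded off when the data are noisy.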

Provided that two necessary assumptions are satisfied (see Appendix B of ref. ²), the posterior distribution of the parameters is a multidimensional Gaussian distribution

$$p(\mathbf{w}|\mathbf{Y})=\mathcal{N}(\bar{\mathbf{w}},\mathbf{\Sigma}),$$

(10)

where the center of the distribution

$$\bar{\mathbf{w}}=\frac{1}{\sigma_{v}^{2}}\mathbf{\Sigma}\mathbf{\Phi}^{T}\mathbf{Y}$$

(11)

gives the desired optimal parameters, and the covariance

$$\mathbf{\Sigma}=\left[\frac{1}{\sigma_{w}^{2}}\mathbf{I}+\frac{1}{\sigma_{v}^{2}}\mathbf{\Phi}^{T}\mathbf{\Phi}\right]^{-1}$$

(12)

is a measure of the uncertainty of the parameters. Here, I is the identity matrix, σ_v is a hyperparameter describing the deviation of the FP data from the model prediction ϕ_a w, and σ_w is a hyperparameter describing the covariance of the prior distribution of the parameter vector w (see Appendix B of ref. ² for more details).
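Equations (11) and (12) translate directly into a few lines of linear algebra. The sketch below applies them to synthetic data with known parameters. The hyperparameter values are fixed by hand here as an assumption, whereas in the scheme described in this work they are determined by the evidence approximation.

```python
import numpy as np

def posterior(Phi, Y, sigma_v, sigma_w):
    """Posterior mean and covariance of the parameters, Eqs. (11) and (12)."""
    n_params = Phi.shape[1]
    precision = np.eye(n_params) / sigma_w**2 + Phi.T @ Phi / sigma_v**2
    Sigma = np.linalg.inv(precision)             # Eq. (12)
    w_bar = Sigma @ Phi.T @ Y / sigma_v**2       # Eq. (11)
    return w_bar, Sigma

# Synthetic training data generated from known parameters plus Gaussian noise.
rng = np.random.default_rng(2)
w_true = np.array([1.0, -0.5, 0.25])
Phi = rng.normal(size=(200, 3))
sigma_v = 0.01
Y = Phi @ w_true + sigma_v * rng.normal(size=200)
w_bar, Sigma = posterior(Phi, Y, sigma_v=sigma_v, sigma_w=10.0)
param_std = np.sqrt(np.diag(Sigma))              # one-sigma uncertainty per parameter
```

The square roots of the diagonal of Σ give a per-parameter uncertainty, which shrinks as more rows are added to Φ.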

Given the observation of the training set, the posterior distribution of the energy, forces, and stress of a new structure is also shown to be a Gaussian distribution²

$$p(\tilde{\mathbf{y}}|\mathbf{Y})=\mathcal{N}(\boldsymbol{\phi}\bar{\mathbf{w}},\boldsymbol{\sigma}),$$

(13)

where the covariance matrix

$$\boldsymbol{\sigma}=\sigma_{v}^{2}\mathbf{I}+\boldsymbol{\phi}\mathbf{\Sigma}\boldsymbol{\phi}^{T}$$

(14)

measures the uncertainty of the prediction of the new structure. Following ref. ², the diagonal elements of the second term are used as the Bayesian error. If the Bayesian error is large, the prediction of the energy, forces, and stress by the current effective Hamiltonian model is considered unreliable, and a new FP calculation is required to refit the parameters.
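The Bayesian error defined by the second term of Eq. (14) can be computed without forming the full covariance matrix. The sketch below is a minimal illustration with random stand-in matrices. The threshold value is hypothetical and the helper names are not taken from any existing code.

```python
import numpy as np

def bayesian_error(phi_new, Sigma):
    """Diagonal of phi Sigma phi^T, the second term of Eq. (14)."""
    return np.einsum("ij,jk,ik->i", phi_new, Sigma, phi_new)

def posterior_covariance(Phi, sigma_v, sigma_w):
    """Sigma of Eq. (12) for a given training matrix."""
    n_params = Phi.shape[1]
    return np.linalg.inv(np.eye(n_params) / sigma_w**2 + Phi.T @ Phi / sigma_v**2)

rng = np.random.default_rng(3)
phi_new = rng.normal(size=(4, 3))                  # rows of a new structure
Phi_small = rng.normal(size=(5, 3))                # little training data
Phi_large = np.vstack([Phi_small, rng.normal(size=(200, 3))])
err_small = bayesian_error(phi_new, posterior_covariance(Phi_small, 0.01, 10.0))
err_large = bayesian_error(phi_new, posterior_covariance(Phi_large, 0.01, 10.0))
needs_fp = bool(err_small.max() > 1e-6)            # hypothetical threshold
```

Adding rows to the training matrix can only reduce this error, which is why FP calls become rare as the training set grows.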

The hyperparameters σ_v and σ_w are determined by the evidence approximation², in which the marginal likelihood function corresponding to the probability of observing the FP data Y with σ_v and σ_w is maximized [see Eq. (31) and Appendix C in ref. ²]. Practically, the hyperparameters σ_v and σ_w are calculated along with Σ and \(\bar{\mathbf{w}}\) by executing self-consistent iterations each time FP data for a new structure is collected.

The Bayesian linear regression described above is equivalent to ridge regression^5,35, in which the target function

$$\mathcal{O}=\parallel \mathbf{\Phi}\mathbf{w}-\mathbf{Y}\parallel^{2}+\lambda \parallel \mathbf{w}\parallel^{2}$$

(15)

is minimized, where λ is the Tikhonov parameter, which is equivalent to \(\sigma_{v}^{2}/\sigma_{w}^{2}\) here^5,35. The main purpose of imposing the Tikhonov parameter is to prevent overfitting³⁵. However, in the context of effective Hamiltonian parametrization, there are only a small number of parameters to be determined (typically from several tens to over a hundred), while the number of values collected from FP calculations is typically much larger, which means that the linear system Φw = Y is greatly overdetermined. In such a case, the regularization is usually not necessary. If the regularization term λ∥w∥² in Eq. (15) is removed, the problem becomes a simple linear least-squares fit, and the parameters can be obtained from

$$\mathbf{w}=\mathbf{\Phi}^{+}\mathbf{Y},$$

(16)

where Φ⁺ is the Moore-Penrose pseudoinverse of the matrix Φ, which could be computed by performing the singular-value decomposition of Φ³⁶. Indeed, our numerical tests show that the parameters resulting from such fitting without regularization are very close to those obtained by Bayesian linear regression. A similar observation is also reported in the context of MLFF with the Gaussian approximation potential model (or its analogs)³⁶. On the other hand, in the effective Hamiltonian, unlike in the Gaussian approximation potential, the parameters in w have different dimensions, and it is hard to balance the values between different parameters, indicating that the regularization may not be suitable for the effective Hamiltonian. Based on the two reasons above, in our fitting scheme, it is typically assumed that σ_w → ∞, and thus the equivalent Tikhonov regularization parameter λ approaches zero, and the fitting scheme is equivalent to linear least-squares fitting.
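The limit λ → 0 can be checked numerically on an overdetermined synthetic problem. The sketch below compares the standard closed-form ridge solution (ΦᵀΦ + λI)⁻¹ΦᵀY with the pseudoinverse solution of Eq. (16). The problem size and noise level are arbitrary choices for illustration.

```python
import numpy as np

def ridge(Phi, Y, lam):
    """Closed-form ridge solution (Phi^T Phi + lam I)^-1 Phi^T Y."""
    n_params = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_params), Phi.T @ Y)

# Overdetermined synthetic system: 500 observations, 20 parameters.
rng = np.random.default_rng(4)
Phi = rng.normal(size=(500, 20))
Y = Phi @ rng.normal(size=20) + 0.05 * rng.normal(size=500)

w_lsq = np.linalg.pinv(Phi) @ Y                    # Eq. (16), computed via SVD
gaps = [np.linalg.norm(ridge(Phi, Y, lam) - w_lsq) for lam in (1.0, 1e-3, 1e-6)]
```

The entries of `gaps` decrease with λ, and for a well-conditioned, strongly overdetermined Φ even λ = 1 changes the parameters only slightly, in line with the observation reported above.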

On-the-fly learning

In our approach, the parameters of the effective Hamiltonian are fitted in a scheme similar to that used for generating on-the-fly machine-learning force fields (MLFF)², with some modifications for the effective Hamiltonian scheme. The parameters are fitted during effective Hamiltonian MD simulations on relatively small cells. The effective Hamiltonian MD simulations are performed by solving the equations of motion of each degree of freedom

$$\begin{array}{rcl}\displaystyle\frac{\partial p_{i}}{\partial t}&=&-\displaystyle\frac{\partial E_{\mathrm{pot}}}{\partial s_{i}},\\ \displaystyle\frac{\partial s_{i}}{\partial t}&=&\displaystyle\frac{p_{i}}{m_{i}},\end{array}$$

(17)

where p_i is the momentum associated with the degree of freedom s_i; m_i is the effective mass of the degree of freedom s_i; and t is the time. In this work, the effective mass of each degree of freedom is obtained by taking the weighted average mass of the participating atoms according to the square of their normalized distortion amplitudes in the basis, as in ref. ³⁷. Note that there are also some other strategies for choosing the effective masses, for example, fitting against the phonon frequencies as in ref. ³¹. As shown in Fig. 1b, in each MD step, the energy, forces and stress tensor of the structure as well as their uncertainties are predicted by the effective Hamiltonian with the current parameters and collected data using the Bayesian linear regression. If the uncertainty (Bayesian error) of the energy, forces, or stress tensor is large, the FP calculation is executed, the corresponding results are stored in the training set, and the parameters are refitted using the updated training set; otherwise, the FP calculation is skipped. Then, the structure is updated by executing one MD step with the forces and stress from the FP calculation (if available) or those from the effective Hamiltonian.

During the fitting process, the Bayesian errors of the energy, forces and stress predicted by the effective Hamiltonian are calculated by Eq. (14) and compared to the threshold to determine whether an FP calculation is necessary. At the beginning, the threshold is typically initialized to zero. Before a non-zero threshold is set up, the FP calculations take place at a fixed interval of several steps (say, 10 or 20 MD steps). The threshold is then updated dynamically during the fitting process using a flow similar to that in ref. ², with the exception that the spilling factor is not used in this work. Note that, unlike in ref. ², the parameters are typically refitted as soon as each new FP calculation is finished, rather than after several FP results have been obtained. This difference stems from the observation that the parameter fitting for the effective Hamiltonian is typically very fast compared to the FP calculations. Such immediate fitting is helpful for reducing the required number of FP calculations and improving the fitting efficiency.
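The overall control flow can be sketched on a toy problem. In the example below, a one-dimensional double well with two linear parameters plays the role of the effective Hamiltonian, and an exact evaluation of the same model stands in for the FP calculation. The fixed threshold, the hyperparameters, and the simple symplectic-Euler integrator are simplifying assumptions; the scheme described above updates the threshold dynamically and is not reproduced here.

```python
import numpy as np

W_TRUE = np.array([-1.0, 1.0])     # hypothetical "FP" double well E = -s^2 + s^4

def features(s):
    """phi for one configuration: energy row, then force row (-dE/ds)."""
    return np.array([[s**2, s**4], [-2.0 * s, -4.0 * s**3]])

def fp_calculation(s):
    """Stand-in for a first-principles call returning [energy, force]."""
    return features(s) @ W_TRUE

def fit(Phi, Y, sigma_v=1e-3, sigma_w=1e3):
    """Bayesian linear regression, Eqs. (11) and (12), with fixed hyperparameters."""
    precision = np.eye(Phi.shape[1]) / sigma_w**2 + Phi.T @ Phi / sigma_v**2
    Sigma = np.linalg.inv(precision)
    return Sigma @ Phi.T @ Y / sigma_v**2, Sigma

def on_the_fly(n_steps=2000, dt=0.01, mass=1.0, threshold=1e-4):
    s, p = 0.3, 0.0
    Phi, Y = np.empty((0, 2)), np.empty(0)
    w, Sigma = np.zeros(2), 1e6 * np.eye(2)      # prior: nothing learned yet
    n_fp = 0
    for _ in range(n_steps):
        phi = features(s)
        error = np.einsum("ij,jk,ik->i", phi, Sigma, phi).max()
        if error > threshold:
            # Prediction is unreliable: call "FP", store the data, refit at once.
            y = fp_calculation(s)
            n_fp += 1
            Phi, Y = np.vstack([Phi, phi]), np.concatenate([Y, y])
            w, Sigma = fit(Phi, Y)
            force = y[1]
        else:
            force = (phi @ w)[1]                 # trust the current model
        p += force * dt                          # one MD step
        s += p / mass * dt
    return w, n_fp

w_fit, n_fp = on_the_fly()
```

In this toy run, the "FP" oracle is called only on a small fraction of the MD steps, mostly while the trajectory first explores new configurations.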

Applications

Simple perovskite BaTiO₃

The on-the-fly learning effective Hamiltonian is first applied to the simple perovskite BaTiO₃, which is one of the most studied ferroelectric perovskites. Figure 2a, b, and c show the parameter evolution during on-the-fly machine learning. The simulation is performed on a 2 × 2 × 2 supercell (40 atoms) at a temperature of 50 K. The Bayesian error during the fitting process is displayed in Fig. 2a. At the beginning (about 500 steps), the Bayesian error is quite large, and FP calculations are called frequently. As the fitting progresses, more FP data is collected, and the parameters are updated, leading to a rapid decline of the Bayesian error. The threshold is also adjusted dynamically in this process. After about 1000 MD steps, the threshold is nearly unchanged and the FP calculations are only rarely required. Figure 2b shows the potential energy predicted by the effective Hamiltonian and that computed from FP calculations in the simulation, showing that they are close to each other at each step. Figure 2c shows the mode evolution during the simulation. In the range shown in Fig. 2a–c, about 35000 MD steps are taken, and only 36 FP calculations are performed. The fitting process is then continued on a 2 × 4 × 4 supercell (160 atoms) to get the parameters corresponding to Fig. 2d, e.

**Fig. 2: On-the-fly machine learning of parametrization for BaTiO₃.**

Figure 2d, e show the phase diagrams of BaTiO₃ obtained from the effective Hamiltonian simulation with parameters from the conventional parameterization and from on-the-fly learning, respectively. The supercell size is chosen to be 12 × 12 × 12 (corresponding to 8640 atoms). At high temperatures, all components of the dipolar mode are zero, characterizing a paraelectric cubic (C) phase. As the temperature decreases, the C phase sequentially transforms into ferroelectric tetragonal (T), orthorhombic (O), and rhombohedral (R) phases, characterized by one, two, and three non-zero components of the dipolar mode, respectively. This C-T, T-O, and O-R phase transition sequence is correctly reproduced by the effective Hamiltonian with both sets of parameters and is consistent with experimental results^7,9. For the calculations with the parameters from conventional FP calculations (Fig. 2d), the C-T, T-O, and O-R phase transition temperatures are about 280, 230, and 200 K, respectively, whereas for the calculation with on-the-fly learning parameters, they are 380, 270, and 220 K, respectively, much closer to the experimental values of 403, 278, and 183 K⁹. Note that such improvement of the critical temperatures mainly originates from the inclusion of new anharmonic intersite interactions. More precisely, it is found that the following two terms play important roles in the improvement of the phase transition temperatures, namely,

$$E_{\mathrm{inter},u^{3}-u^{1},2}=K_{2}^{u^{3}-u^{1}}\left(u_{i,z}^{2}u_{i,y}u_{i+\boldsymbol{z},y}+\text{symmetrically equivalent terms}\right),$$

(18)

and

$$E_{\mathrm{inter},u^{3}-u^{1},5}=K_{5}^{u^{3}-u^{1}}\left(u_{i,x}^{2}u_{i,y}u_{i+\boldsymbol{z},y}+\text{symmetrically equivalent terms}\right),$$

(19)

where u_i,x indicates the x component (here the x, y, z Cartesian coordinates are along the pseudocubic [100], [010], [001] directions, respectively) of the local dipolar mode at the unit cell indexed i, and u_i+z,y indicates the y component of the dipolar mode at the neighbor unit cell of i with relative position vector \(a_{0}\hat{\boldsymbol{z}}\) (where a₀ is the lattice constant of the five-atom perovskite unit cell, and \(\hat{\boldsymbol{z}}\) is the unit vector along the z direction). For simplicity, only one representative product is given for each of the two terms, with the other (11 for each) symmetrically equivalent terms omitted. Such terms could be understood as a “transverse” modification to the interaction between parallel local dipolar modes (see Fig. S3 in the Supplementary Information) at finite temperatures, making the ferroelectric phase more stable and thus improving the prediction of the phase transition temperatures. A more detailed discussion is given in the Supplementary Information.
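To make the index structure of these terms concrete, the representative products can be evaluated on a periodic grid. The sketch below sums only the single product written explicitly in Eqs. (18) and (19) over all unit cells. The 11 symmetry-equivalent products of each term are not generated, and the coefficient values are placeholders, not fitted parameters.

```python
import numpy as np

def representative_u3_u1_terms(u, K2, K5):
    """u: (Lx, Ly, Lz, 3) dipolar modes on a periodic grid.
    Evaluates only the representative product of Eqs. (18) and (19),
    summed over all unit cells i; equivalent products are omitted."""
    ux, uy, uz = u[..., 0], u[..., 1], u[..., 2]
    uy_next_z = np.roll(uy, -1, axis=2)          # u_{i+z, y}
    e2 = K2 * np.sum(uz**2 * uy * uy_next_z)     # u_{i,z}^2 u_{i,y} u_{i+z,y}
    e5 = K5 * np.sum(ux**2 * uy * uy_next_z)     # u_{i,x}^2 u_{i,y} u_{i+z,y}
    return e2, e5

# Uniform test configuration with placeholder coefficients.
u = np.zeros((4, 4, 4, 3))
u[...] = [0.1, 0.2, 0.3]
e2, e5 = representative_u3_u1_terms(u, K2=1.0, K5=2.0)
```

In both products the factor u_{i,y} u_{i+z,y} couples the same component on two cells stacked along z, while the squared factor comes from a different component on site i. The two terms differ only in whether that squared component is along the bond (z) or perpendicular to it (x).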

Multidomain PbTiO₃

The on-the-fly learning approach is then applied to the ferroelectric PbTiO₃ to investigate the domain wall structure. It was previously reported³⁸ that multidomain PbTiO₃ could exhibit a Bloch feature in the domain wall. To investigate the domain wall structure with our effective Hamiltonian scheme, multidomain PbTiO₃ with a Bloch-like domain wall is built (Fig. 3a). After simulated annealing using the effective Hamiltonian, we obtain the configuration depicted in Fig. 3b, c. The local dipolar mode (directly proportional to the local polarization) is along the x-axis in both domains. Near the domain wall, the magnitude of the x component of the dipolar mode decreases and its sign is then reversed. Meanwhile, the y component of the local dipolar mode is almost zero in both domains, while it shows relatively significant values in the domain wall. Such changes in the local dipolar mode show a Bloch-like character in the domain wall, which is qualitatively consistent with what was previously reported. Note, however, that there are some quantitative differences from the cited work. More specifically, in ref. ³⁸, the polarization (along the y direction) in the domain wall is comparable to that in the domain, while in this work, the polarization in the domain wall is much smaller than that in the domain. Such a difference may be attributed to the reduced degrees of freedom of the effective Hamiltonian, to some details of the physical conditions of the simulation (for example, the starting temperature of the simulated annealing, the size of the supercell, etc.), or to some details of the construction of our model (for example, the local mode basis, the fitting configurations, or even details of the employed FP method). While the detailed investigation into these conditions remains for another work, it is remarkable that our on-the-fly learning scheme, with MD simulations restricted to relatively small supercells and no explicit consideration of ferroelectric domain walls, already captures the essential Bloch character of this boundary.

**Fig. 3: Dipolar mode distribution for a PbTiO₃ multidomain with domain walls.**

Solid-solution Pb(Zr_0.75Ti_0.25)O₃

The on-the-fly learning approach is then applied to the solid solution of ferroelectric Pb(Zr_1−xTi_x)O₃, which is of great interest because of its high piezoelectricity and widespread applications³⁹. We choose the solid solution Pb(Zr_1−xTi_x)O₃ with x = 0.25 (PZT25) to demonstrate our on-the-fly learning effective Hamiltonian. The active learning is performed on PZT25 using 2 × 2 × 2 and 2 × 4 × 4 supercells (40 and 160 atoms, respectively) with random arrangements of Ti and Zr atoms (see Supplementary Information). Using the parameters from this active learning, effective Hamiltonian calculations with a 12 × 12 × 12 supercell show that PZT25 possesses cubic (C), rhombohedral with space group R3m, and rhombohedral with space group R3c phases as the temperature decreases from 700 K to 20 K, with transition temperatures around 540 K (C-R3m) and 340 K (R3m–R3c) (see Fig. S4 in the Supplementary Information), close to the experimental values of 593 K (C to R3m) and 390 K (R3m to R3c)⁴⁰, indicating the validity of our on-the-fly learning scheme for solid-solution structures.

Polar skyrmion-like nanodomains

The emergent and exotic phases of polar topological configurations, such as polar skyrmions, have garnered enormous interest in condensed-matter physics. Most polar topological configurations were found near the interface of polar superlattices or heterostructures^19,41. Here, we find polar skyrmions in a SrTiO₃/PbTiO₃ bilayer using our on-the-fly learning effective Hamiltonian.

The structure of PbTiO₃ with surface capping by a few SrTiO₃ layers is considered. A 48 × 48 × 48 supercell with 43 PbTiO₃ layers, 5 SrTiO₃ layers, and vacuum layers along the z direction is used. The parameters of the effective Hamiltonian are obtained by performing our on-the-fly learning approach on (Pb_7/8Sr_1/8)TiO₃ solid solutions with random arrangements of Pb and Sr atoms. Figure 4a shows the local dipole configuration averaged over the top 10 PbTiO₃ layers obtained from hybrid Monte Carlo (HMC) simulations at 10 K. One can clearly see that there are topological upwards-oriented nanodomains embedded in a downwards-oriented matrix. The in-plane polarization within such nanodomains has a center-divergent character, with the two-dimensional winding number equal to one for each domain^19,42. Figure 4b shows the local dipoles in the (010) plane of the nanodomain delimited by the white circle in Fig. 4a, indicating the center-divergent polar skyrmion-like nanodomain (see the sketch in Fig. 4c). Note that such a polar skyrmion-like nanodomain is consistent with the experimental findings, as shown in Sec. IX of the Supplementary Information. The above simulation and verification confirm the validity of our on-the-fly learning effective Hamiltonian approach for such complex interactions and complex systems.

**Fig. 4: Polar distribution of SrTiO₃/PbTiO₃ bilayer.**

Discussion

To demonstrate the computational efficiency of our effective Hamiltonian methods, we compare the time consumed by the effective Hamiltonian MD with other methods, such as deep potential MD⁴³, MLFF MD, and ab initio MD simulations². As shown in Fig. 5, the time consumed by ab initio MD simulations increases drastically with the supercell size, and is much longer than that of the other methods, as expected. The time spent by the other methods increases slowly with the supercell size, with similar slopes in the log-log scale. For the same supercell size, the time spent by the effective Hamiltonian simulation is less than that of the deep potential MD and MLFF MD by about 3 orders of magnitude. Notably, the size of the structure investigated by the effective Hamiltonian here is up to a 128 × 128 × 128 supercell, corresponding to 10485760 atoms, with only one CPU core. The time spent by the effective Hamiltonian simulation shows a nearly linear dependency on the number of atoms in the log-log scale, with a slope of about 1.036 as obtained from linear fitting. Such a result indicates that, numerically, the consumed time t_MD scales nearly linearly with the number of atoms N_at as \(t_{\mathrm{MD}}\approx C N_{\mathrm{at}}^{1.036}\), where C is a constant. Note that the theoretical asymptotic time complexity should be \(O(N_{\mathrm{at}}\log_{2}N_{\mathrm{at}})\)⁴⁴, which is reasonably consistent with the numerical result. Note that these methods possess similar accuracy as they are all based on FP calculations. Although the effective Hamiltonian method only includes 6 ionic degrees of freedom in each five-atom perovskite unit cell (i.e., the local dipolar mode {u_i} and inhomogeneous strain variable {v_i}, each containing three degrees of freedom per unit cell) while the other methods include the full set of 15 degrees of freedom, all of these methods capture the most important distortions related to the ferroelectric phase transitions.
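The exponent quoted above corresponds to the slope of a straight-line fit in log-log space. The sketch below shows how such a fit can be carried out and checks the atom count of the largest supercell. The timings are synthetic values generated from the reported exponent with an arbitrary prefactor, not the measured data behind Fig. 5.

```python
import numpy as np

def atoms_in_supercell(L, atoms_per_cell=5):
    """Number of atoms in an L x L x L supercell of five-atom perovskite cells."""
    return atoms_per_cell * L**3

def fit_power_law(n_atoms, times):
    """Least-squares line in log-log space: log t = log C + k log N."""
    k, log_c = np.polyfit(np.log(n_atoms), np.log(times), 1)
    return k, np.exp(log_c)

sizes = np.array([8, 16, 32, 64, 128])
n_atoms = atoms_in_supercell(sizes)
# Synthetic timings generated from the reported exponent, not measured data.
times = 1e-6 * n_atoms**1.036
k, C = fit_power_law(n_atoms, times)
```

With measured timings, a fitted slope slightly above one would be expected for an N log N algorithm over a finite size range, since the logarithmic factor varies slowly.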

Fig. 5: Computational time for 100 MD steps calculations as a function of the number of atoms in the simulated BaTiO₃ supercell, using the effective Hamiltonian (H_eff), MLFF, deep potential MD, and ab initio MD (AIMD).

It is worth noting that in some previous MLFF or second-principles works, the parameters are fitted against pre-generated distorted structures. For example, AIMD trajectories are used in ref. ³³. In fact, we have also tried to fit the effective Hamiltonian using trajectories generated from AIMD, but the resulting parameters are not accurate (they could not reproduce some important phase transitions) and depend strongly on the choice of AIMD conditions (for example, the starting configuration and the temperature of the AIMD simulation). This indicates that the effective Hamiltonian parameters are rather sensitive to the training set, so a pre-generated dataset must be constructed with special care. In this sense, active-learning strategies are preferred.

In summary, an on-the-fly active-learning scheme is developed to obtain the parameters of the effective Hamiltonian. The parameters are computed during MD simulations. The energy, forces, and stress, as well as their Bayesian errors, are computed at each MD step based on the effective Hamiltonian, and FP calculations are called to refit the parameters when the Bayesian errors are large. Typically, very few FP calculations are required in this process (usually for much less than 1% of the total MD steps). The fitting procedure, based on Bayesian linear regression, provides not only the values of the parameters but also their uncertainties. Such a learning scheme offers a new, high-precision way to parametrize the effective Hamiltonian in a universal and automatic process, and it is especially applicable to systems with complex interactions.
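The logic of the scheme summarized above can be illustrated with a toy example. The sketch below is not the implementation used in this work: it fits only the energy of a single scalar mode, the feature set, the reference energy (standing in for an FP call), the random-walk "MD" step, and all numerical values are assumptions chosen for illustration. It shows how Bayesian linear regression yields both parameters and an uncertainty, and how that uncertainty decides when a reference calculation is requested.

```python
import numpy as np

def features(x):
    # Hypothetical invariants of a single mode amplitude x.
    return np.array([x**2, x**4])

def reference_energy(x):
    # Stand-in for a first-principles call (toy double well, assumed weights).
    return -1.0 * x**2 + 0.5 * x**4

class BayesianLinearModel:
    def __init__(self, n_features, prior_precision=1.0, noise_precision=1.0e4):
        self.alpha, self.beta = prior_precision, noise_precision
        self.rows, self.targets = [], []
        self.mean = np.zeros(n_features)
        self.cov = np.eye(n_features) / self.alpha

    def add_sample(self, phi, y):
        # Store the new reference point and recompute the Gaussian posterior.
        self.rows.append(phi)
        self.targets.append(y)
        design = np.array(self.rows)
        precision = self.alpha * np.eye(design.shape[1]) + self.beta * design.T @ design
        self.cov = np.linalg.inv(precision)
        self.mean = self.beta * self.cov @ design.T @ np.array(self.targets)

    def predict(self, phi):
        # Predicted energy and the uncertainty coming from the parameters.
        variance = max(float(phi @ self.cov @ phi), 0.0)
        return float(phi @ self.mean), np.sqrt(variance)

rng = np.random.default_rng(0)
model = BayesianLinearModel(n_features=2)
threshold = 0.05                      # assumed error tolerance (arbitrary units)
n_steps, n_reference_calls = 2000, 0
x = 0.1
for _ in range(n_steps):
    # Random walk standing in for one MD step of the current model.
    x = float(np.clip(x + 0.05 * rng.normal(), -1.5, 1.5))
    phi = features(x)
    _, sigma = model.predict(phi)
    if sigma > threshold:             # uncertain: call the reference and refit
        model.add_sample(phi, reference_energy(x))
        n_reference_calls += 1

print("reference calls:", n_reference_calls, "of", n_steps, "steps")
print("fitted weights:", model.mean)
```

In this toy setting, a reference call is triggered only when the trajectory enters a region that the posterior does not yet constrain, which is why the number of calls stays small compared with the number of steps.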

Methods

Effective Hamiltonian for perovskites

In the effective Hamiltonian of perovskites, the local modes {s_i} include the following types: (1) the local dipolar mode u_i in each five-atom perovskite unit cell i, which is directly proportional to the local electric dipole in unit cell i⁷; (2) the pseudovector ω_i centered at the B site, characterizing the BX₆ octahedral tilting, also known as antiferrodistortive (AFD) distortions⁴⁵; and (3) the local variable v_i characterizing the inhomogeneous strain around the unit cell i⁷. Note that the u_i and v_i vectors could be chosen to be centered at either the A site or the B site for different materials. The potential energy of perovskites contains four main parts:

$$\begin{array}{ll}E_{\mathrm{pot}}\,=\,E_{\mathrm{single}}(\{\boldsymbol{u}_{i}\},\{\boldsymbol{\omega}_{i}\},\{\boldsymbol{v}_{i}\})+E_{\mathrm{strain}}(\{\boldsymbol{u}_{i}\},\{\boldsymbol{\omega}_{i}\},\{\boldsymbol{v}_{i}\},\eta)\\ \qquad\quad+E_{\mathrm{inter}}(\{\boldsymbol{u}_{i}\},\{\boldsymbol{\omega}_{i}\},\{\boldsymbol{v}_{i}\})+E_{\mathrm{spring}}(\{\boldsymbol{u}_{i}\},\{\boldsymbol{\omega}_{i}\},\{\boldsymbol{v}_{i}\},\{\sigma_{i}\}).\end{array}$$

(20)

The first two terms, E_single and E_strain, contain mainly the terms already reported in previous effective Hamiltonian works^7,12 (with minor extensions; see the Supplementary Information for more details). The last two terms, E_inter and E_spring, differ from previous works (see, e.g., refs. ^45,46,47). Here, such terms are derived directly from symmetry, which makes them more general and allows the accuracy of the effective Hamiltonian to be improved systematically.

The E_inter in Eq. (20) contains several two-body interaction terms that take the following form

$$E_{\mathrm{inter}}^{pq}=\sum_{ijab}p_{a}(\boldsymbol{R}_{i})\,q_{b}(\boldsymbol{R}_{j})\,K_{ab}(\boldsymbol{R}_{i}-\boldsymbol{R}_{j}),$$

(21)

where R_i and R_j are the positions of the sites indexed by i and j, p and q are the two variables participating in the interaction, and a, b are their component subscripts. The interaction matrix K_ab(R_i − R_j) encodes the symmetry and the parameters of the interaction. Its specific form is determined by finding the terms that are invariant under the symmetry operations of the reference structure (see ref. ⁴⁸). The interaction term [Eq. (21)] may be either long-ranged or short-ranged. For long-range interactions, both the i and j indices run over all the sites in the simulated supercell. For short-range interactions, the i index runs over all the sites in the simulated supercell, while j runs over the neighbor sites around i (within a certain range for each type of interaction); in this case, the interaction matrix is localized. Note that the interaction variables p and q can be not only the primitive degrees of freedom (i.e., u, v, ω) but also their onsite direct products. For example, the six-dimensional vector U expressing the onsite direct product u ⊗ u, with the subscript a in Voigt notation (\(U_{1}=u_{x}^{2},\ U_{4}=u_{y}u_{z}\)), is a valid interaction variable in Eq. (21). Throughout this article, the expression “p^m − qⁿ interaction” denotes the interaction that includes the mth-order contribution from p and the nth-order contribution from q. For example, the “u¹ − ω² interaction” is the interaction equivalent to

$$E_{\mathrm{inter}}^{u^{1}-\omega^{2}}=\sum_{ij\alpha\beta\gamma}u_{\alpha}(\boldsymbol{R}_{i})\,\omega_{\beta}(\boldsymbol{R}_{j})\,\omega_{\gamma}(\boldsymbol{R}_{j})\,K_{\alpha\beta\gamma}(\boldsymbol{R}_{i}-\boldsymbol{R}_{j}).$$

(22)
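The relation between the third-rank form of Eq. (22) and the two-body form of Eq. (21) can be checked numerically for a single pair of sites. In the sketch below, the Voigt ordering of components 2, 3, 5, and 6 is assumed to follow the standard convention (only U₁ and U₄ are stated in the text), the coupling tensor is a random placeholder rather than a symmetry-adapted one, and the helper names are hypothetical.

```python
import numpy as np

# Voigt index pairs (0-based): xx, yy, zz, yz, xz, xy.
VOIGT_PAIRS = [(0, 0), (1, 1), (2, 2), (1, 2), (0, 2), (0, 1)]

def onsite_product(vec):
    """Six-component Voigt vector of vec (x) vec, e.g. U_1 = u_x^2, U_4 = u_y u_z."""
    return np.array([vec[i] * vec[j] for i, j in VOIGT_PAIRS])

def fold_kernel(k3):
    """Fold K[alpha, beta, gamma] into a 3 x 6 matrix acting on the Voigt vector."""
    columns = [k3[:, i, j] if i == j else k3[:, i, j] + k3[:, j, i]
               for i, j in VOIGT_PAIRS]
    return np.stack(columns, axis=1)

rng = np.random.default_rng(1)
u, omega = rng.normal(size=3), rng.normal(size=3)
k3 = rng.normal(size=(3, 3, 3))   # placeholder coupling for one pair of sites

# Third-rank form, as in Eq. (22), for one (i, j) pair.
e_third_rank = np.einsum("a,b,c,abc->", u, omega, omega, k3)
# Two-body form, as in Eq. (21), with p = u and q = onsite product of omega.
e_two_body = u @ fold_kernel(k3) @ onsite_product(omega)

print(e_third_rank, e_two_body)
```

Both expressions give the same number because the off-diagonal Voigt components collect the two equivalent index orderings of the coupling tensor.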

As in previous MD and HMC works⁴⁴, the interaction in Eq. (21) could be handled in the reciprocal space by using fast Fourier transformation to improve the computational efficiency.
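A minimal sketch of this reciprocal-space evaluation is given below, assuming periodic boundary conditions and a kernel tabulated on the supercell grid. The kernel and the fields are random placeholders rather than physical interactions, and the function names are hypothetical. The inner sum over j in Eq. (21) is a circular convolution, so it becomes a pointwise product after a Fourier transform.

```python
import numpy as np

def energy_direct(p, q, kernel):
    """O(N^2) reference: sum_ij sum_ab p_a(R_i) q_b(R_j) K_ab(R_i - R_j)."""
    shape = p.shape[:3]
    sites = list(np.ndindex(*shape))
    total = 0.0
    for ri in sites:
        for rj in sites:
            # Displacement R_i - R_j folded back into the periodic supercell.
            d = tuple((a - b) % n for a, b, n in zip(ri, rj, shape))
            total += p[ri] @ kernel[d] @ q[rj]
    return float(total)

def energy_fft(p, q, kernel):
    """Same energy in O(N log N) via the convolution theorem."""
    axes = (0, 1, 2)
    q_k = np.fft.fftn(q, axes=axes)
    kernel_k = np.fft.fftn(kernel, axes=axes)
    # field_a(R_i) = sum_j sum_b K_ab(R_i - R_j) q_b(R_j)
    field_k = np.einsum("xyzab,xyzb->xyza", kernel_k, q_k)
    field = np.fft.ifftn(field_k, axes=axes).real
    return float(np.sum(p * field))

rng = np.random.default_rng(2)
shape = (3, 3, 3)
p = rng.normal(size=shape + (3,))
q = rng.normal(size=shape + (3,))
kernel = rng.normal(size=shape + (3, 3))   # placeholder K_ab(R) on the grid

e_direct = energy_direct(p, q, kernel)
e_fft = energy_fft(p, q, kernel)
print(e_direct, e_fft)
```

The direct double loop is kept only as a check on a tiny supercell; for large supercells only the FFT route is practical.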

The detailed interaction terms that are used in this work are listed in the Supplementary Information. A brief discussion about the “inhomogeneous strain” η_I introduced in previous works^7,12 is also given in the Supplementary Information.

The E_spring term in Eq. (20) consists of several so-called “spring” terms that take the following form

$$E_{\mathrm{spring}}^{p}=\sum_{ija}p_{a}(\boldsymbol{R}_{i})\,J_{a}(\sigma_{j},\boldsymbol{R}_{j}-\boldsymbol{R}_{i}),$$

(23)

where p is a variable [like that in Eq. (21)] that can be one of the primitive local modes (u, v, ω) or their onsite direct products, and J_a(σ_j, R_j − R_i) is the interaction matrix containing the symmetry and the parameters, which depend on the occupation of site j and on the position difference between sites i and j. To determine the specific form of the J matrix, the σ variable is treated as an onsite scalar variable that is invariant under any symmetry operation. Then, the interactions allowed by symmetry are found by applying the symmetry operations of the reference structure space group to the products p(R_i)σ(R_j) and finding the terms that are invariant under such operations. Practically, the following spring interactions are considered: (1) the spring interaction of u of first, second, and third order; (2) the spring interaction of v of first order; (3) the spring interaction of ω of second order. Note that, with ω centered on the B site, the first-order spring interaction of ω is forbidden by symmetry whether the σ variables are centered on the A site or on the B site (i.e., for both multi-A-site and multi-B-site compositions). Thus, the second-order interaction is the lowest-order one. A brief discussion of the relations and differences between the treatment of the “alloying effect” (using the terminology of ref. ²²) in previous works and in the current work is given in the Supplementary Information.
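The structure of Eq. (23) can be illustrated with a deliberately simplified first-order spring term. In the sketch below, p and σ are placed on the same simple cubic lattice, only the six nearest neighbors are included, and J is assumed to point along the bond direction with a species-dependent strength; these choices, the strength values, and the function name are assumptions for illustration and do not reproduce the symmetry-adapted terms used in this work.

```python
import numpy as np

NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def spring_energy(p, sigma, strength):
    """sum_ij p(R_i) . J(sigma_j, R_j - R_i), with J = strength[sigma_j - 1] * bond direction."""
    total = 0.0
    for d in NEIGHBORS:
        # Occupation at the neighbor site R_j = R_i + d (periodic boundaries).
        sigma_j = np.roll(sigma, shift=tuple(-c for c in d), axis=(0, 1, 2))
        bond = np.array(d, dtype=float)
        total += np.sum(strength[sigma_j - 1] * (p @ bond))
    return float(total)

rng = np.random.default_rng(3)
shape = (4, 4, 4)
p = rng.normal(size=shape + (3,))
strength = np.array([0.3, -0.2])         # assumed values for sigma = 1 and sigma = 2

sigma_uniform = np.ones(shape, dtype=int)           # single species everywhere
sigma_random = rng.integers(1, 3, size=shape)       # random mixture of the two species

e_uniform = spring_energy(p, sigma_uniform, strength)
e_random = spring_energy(p, sigma_random, strength)
print(e_uniform, e_random)
```

For a uniform occupation the contributions from opposite neighbors cancel, so in this simplified form the first-order spring term contributes only when the occupation varies from site to site.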

Note that, for specified materials, some of the above terms may not be used since their effects are not important.

Computational details

On-the-fly machine learning for the parametrization of the effective Hamiltonian is performed on 2 × 2 × 2 or 2 × 4 × 4 supercells (corresponding to 40 or 160 atoms). The MD simulations are performed in the isothermal-isobaric (NPT) ensemble using the Evans-Hoover thermostat⁴⁹. Typically, each MD simulation on a given structure is executed for 20 ps to 200 ps. At each MD step for which an FP calculation is requested by the on-the-fly machine-learning process, a first-principles self-consistent calculation within density functional theory (DFT) is performed. All the FP calculations are performed using the VASP package⁵⁰ with the projector augmented wave (PAW) method. The solid-revised Perdew-Burke-Ernzerhof (PBEsol)⁵¹ functional is used. The 3 × 3 × 3 and 3 × 2 × 2 k-point meshes are used for the supercells with 40 and 160 atoms, respectively, and a plane-wave cutoff of 550 eV is employed. The optical dielectric constant is computed using density functional perturbation theory (DFPT)⁵². The Born effective charge of the local mode is obtained by fitting the polarization against the local mode amplitude, where the polarization is computed using the Berry phase method⁵³.

The phase transition simulations are conducted with the effective Hamiltonian using Monte Carlo (MC) simulations with the Metropolis algorithm⁵⁴ or the hybrid MC (HMC) algorithm⁴⁴. Each HMC sweep consists of 40 MD steps. Supercells of 12 × 12 × 12 (corresponding to 8640 atoms) are used unless otherwise noted. For the phase transition simulations, the systems are cooled down from high temperatures (450 K and 700 K for BaTiO₃ and PZT25, respectively) to 20 K in relatively small temperature steps of 10 K.
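The cooling protocol and the Metropolis acceptance rule can be sketched as follows. The temperature ranges and the 10 K step are taken from the text; the function names are hypothetical, and the energy differences are assumed to be expressed in eV.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def cooling_schedule(t_start, t_end=20.0, step=10.0):
    """Temperatures from t_start down to t_end (inclusive) in equal steps."""
    n_temps = int(round((t_start - t_end) / step)) + 1
    return t_start - step * np.arange(n_temps)

def metropolis_accept(delta_e, temperature, rng):
    """Accept a trial move with probability min(1, exp(-delta_e / (k_B T)))."""
    if delta_e <= 0.0:
        return True
    return bool(rng.random() < np.exp(-delta_e / (K_B * temperature)))

temps_bto = cooling_schedule(450.0)   # BaTiO3: 450 K down to 20 K
temps_pzt = cooling_schedule(700.0)   # PZT25: 700 K down to 20 K
md_steps_per_sweep = 40               # each HMC sweep consists of 40 MD steps

print(len(temps_bto), "temperatures for BaTiO3;", len(temps_pzt), "for PZT25")
```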

In the MD and MC simulations, the following quantities are computed: (i) the average dipolar mode, defined as \(\boldsymbol{u}=\frac{1}{N}\sum_{i}\boldsymbol{u}_{i}\); (ii) the average amplitude of the dipolar mode, defined as \(|u|=\frac{1}{N}\sum_{i}|\boldsymbol{u}_{i}|\); and (iii) the AFD at the R point, defined as \(\boldsymbol{\omega}_{R}=\frac{1}{N}\sum_{i}\boldsymbol{\omega}_{i}(-1)^{n_{x}(i)+n_{y}(i)+n_{z}(i)}\).
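These three quantities can be evaluated directly from the local-mode arrays. The sketch below assumes that the modes are stored as arrays of shape (L_x, L_y, L_z, 3), that N is the number of unit cells, and that n_x(i), n_y(i), n_z(i) are the integer cell indices of site i; the test configuration (uniform polarization along z with a staggered tilt) is an invented example.

```python
import numpy as np

def order_parameters(u, omega):
    """Return (average u, average |u|, R-point AFD) for fields of shape (Lx, Ly, Lz, 3)."""
    u_mean = u.reshape(-1, 3).mean(axis=0)
    u_amplitude = float(np.linalg.norm(u, axis=-1).mean())
    nx, ny, nz = np.indices(u.shape[:3])
    sign = (-1.0) ** (nx + ny + nz)          # alternates between neighboring cells
    omega_r = (omega * sign[..., None]).reshape(-1, 3).mean(axis=0)
    return u_mean, u_amplitude, omega_r

# Invented test configuration on a 4 x 4 x 4 supercell.
shape = (4, 4, 4)
u = np.zeros(shape + (3,))
u[..., 2] = 0.1                              # uniform dipolar mode along z
nx, ny, nz = np.indices(shape)
omega = np.zeros(shape + (3,))
omega[..., 2] = 0.05 * (-1.0) ** (nx + ny + nz)   # tilt alternating from cell to cell

u_mean, u_amplitude, omega_r = order_parameters(u, omega)
print(u_mean, u_amplitude, omega_r)
```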

In the study of BaTiO₃, the dipolar mode {u_i} is chosen to be centered at the B site, the inhomogeneous strain variable {v_i} is chosen to be centered at the A site, and the AFD variables {ω_i} are frozen at zero since they are not important in BaTiO₃. Such a configuration of the degrees of freedom is consistent with previous works⁷. In the effective Hamiltonian, the dipolar mode onsite energy [Eq. (S2)] is considered up to the quartic order, the elastic energy [Eq. (S4)] is considered up to the quadratic order, and the η − u interaction is considered only up to the first term in Eq. (S5). The 2 × 2 × 2 and 2 × 4 × 4 supercells (corresponding to 40 and 160 atoms, respectively) are used in turn for on-the-fly learning. The parameters associated with Fig. 2d are obtained using the conventional method^7,21 with the PBEsol functional. The j₅ and j₇ parameters (see ref. ⁷) are set to zero, as in ref. ⁵⁵. For the phase transition simulations, a negative pressure of −3 GPa is applied for both models (Fig. 2d, e) to correct the possible underestimation of the lattice constant by the DFT calculations.

In the study of PbTiO₃, the dipolar mode {u_i} is chosen to be centered at the A site, the inhomogeneous strain variable {v_i} is chosen to be centered at the B site, and the AFD variables {ω_i} are frozen at zero since they are not important in PbTiO₃. Such a configuration of the degrees of freedom is consistent with previous works³¹. The local mode basis is chosen by fitting the atomic displacements between the cubic reference structure and the ferroelectric tetragonal structure. In the effective Hamiltonian, the dipolar mode onsite energy [Eq. (S2)] is considered up to the quartic order, the elastic energy [Eq. (S4)] is considered up to the quadratic order, and the η − u interaction is considered only up to the first term in Eq. (S5). The 2 × 2 × 2 and 2 × 4 × 4 supercells (corresponding to 40 and 160 atoms, respectively) are used in turn for on-the-fly learning.

The domain wall structure is simulated using a 12 × 12 × 20 supercell (corresponding to 14400 atoms). The simulated annealing process starts from an HMC simulation at a relatively low temperature, followed by a sequence of MD simulations with decreasing temperature down to 1 × 10⁻⁷ K. The negative pressure of −6 GPa is applied to correct the potential underestimation of the lattice constant by the FP calculations.

In the study of PZT25, the local dipolar mode {u_i} is chosen to be centered at the A site, while the inhomogeneous strain variable {v_i} and the AFD pseudovector {ω_i} are centered at the B site. The variable {σ_i} is introduced to describe the occupation of the B site by Zr and Ti atoms, where σ_i = 1 and σ_i = 2 denote that a Ti or a Zr atom, respectively, sits at the B site indexed by i. The dipolar mode onsite energy is expanded up to the fourth order. The spring interaction of u is considered up to the third order and the nearest neighbors, the spring interaction of v is considered up to the first order and the second nearest neighbors, and the spring interaction of ω is considered up to the second order (which is the lowest-order interaction allowed by symmetry) and the nearest neighbors. The Zr and Ti atoms are distributed randomly in the simulated supercells.

In the study of the SrTiO₃/PbTiO₃ bilayer, the effective Hamiltonian is fitted to (Pb_7/8Sr_1/8)TiO₃ solid solutions. The local dipolar mode {u_i} is chosen to be centered at the A site, while the inhomogeneous strain variable {v_i} and the AFD pseudovector {ω_i} are centered at the B site. The variable {σ_i} is introduced to describe the occupation of the A site by Pb and Sr atoms, where σ_i = 1 and σ_i = 2 denote that a Pb or a Sr atom, respectively, sits at the A site indexed by i. The local mode basis of the local dipolar mode is obtained by fitting against the atomic distortion between the ferroelectric tetragonal phase and the cubic perovskite phase. The dipolar mode onsite energy is expanded up to the fourth order. The spring interaction of u is considered up to the third order and the nearest neighbors, the spring interaction of v is considered up to the first order and the first nearest neighbors, and the spring interaction of ω is considered up to the second order (which is the lowest-order interaction allowed by symmetry) and the nearest neighbors. The Pb and Sr atoms are distributed randomly in the supercell during the fitting process. The SrTiO₃/PbTiO₃ bilayer is modeled by a 48 × 48 × 48 supercell (corresponding to approximately 552960 atoms) that consists of 43 unit-cell layers of PbTiO₃ and 5 unit-cell layers of SrTiO₃ along the z-axis, where periodic boundary conditions are imposed along the x and y axes but not along the z-axis. Both (001) bilayers are terminated with an A-site layer. An epitaxial strain of −0.58% is imposed to mimic the SrTiO₃ substrate. The local configuration of Fig. 4a, b is obtained from a quench simulation (fast cooling from 410 K to 10 K with a temperature step of 100 K and 5000 HMC sweeps at each temperature).

In the computational efficiency tests for Fig. 5, the effective Hamiltonian simulation is conducted with one CPU core on different supercell sizes of BaTiO₃ for 10000 steps, and the average time spent per MD step is then calculated (with the initial preparation time excluded). For each supercell size, this process is repeated 5 times to obtain the average time. The MLFF simulation is performed using the VASP² package, version 6.4.2. The force field for BaTiO₃ is first trained within a 2 × 2 × 2 supercell (40 atoms) at 300 K using 10000 MD steps. In this process, 461 local reference structures are collected. Then, after refitting the force field (with ML_MODE = refit), the simulation is switched to the prediction-only mode (ML_MODE = run) to measure the consumed time. For each supercell size, the simulation is performed with 1 CPU core for 100 MD steps, and the times consumed by each step (excluding the first and last steps) are averaged to produce the results. The deep potential MD⁴³ simulation is performed with the LAMMPS package with one CPU core. Five repeated simulations, each lasting for 1000 steps, are conducted for each supercell size, and the time spent per MD step is averaged. All the tests above are performed on an Intel(R) Xeon(R) Silver 4210R CPU using one core. The ab initio MD simulation is performed using the VASP package on 2 × 2 × 2 and 3 × 3 × 3 supercells (40 and 135 atoms, respectively) with Gamma-centered k-point meshes of 3 × 3 × 3 and 2 × 2 × 2, respectively. For each supercell size, the simulation lasts for 100 steps, and the time consumed by each step (apart from the first step) is averaged. The ab initio MD simulations are performed on an Intel(R) Xeon(R) E5-2680 v3 CPU using 24 cores.

Sample deposition

The SrTiO₃/PbTiO₃ bilayer heterostructures were deposited by pulsed laser deposition. The PbTiO₃ films, about 40 nm in thickness, were deposited on (001)-oriented SrTiO₃ substrates with 80-nm-thick SrRuO₃ electrodes, followed by the deposition of a 2-nm-thick SrTiO₃ capping layer. The SrRuO₃ electrode, PbTiO₃ film, and SrTiO₃ capping layer were deposited at 660, 620, and 700 °C, respectively, using a 248-nm KrF excimer laser (COMPex Pro 205F, Coherent) with an energy flux density of 1.5 J/cm² on the SrRuO₃, PbTiO₃, and SrTiO₃ ceramic targets and a repetition rate of 3 Hz. A 20% excess of Pb was added to the PbTiO₃ target to compensate for the Pb loss during deposition. The oxygen partial pressure was 100 mTorr for the deposition of SrRuO₃ and SrTiO₃ and 80 mTorr for the deposition of PbTiO₃.

PFM measurement

Ferroelectric domain structures of various SrTiO₃/PbTiO₃ bilayers were characterized at room temperature by atomic force microscope (Cypher ES, Asylum Research). NanoWorld EFM platinum/iridium-coated tips and Adama Supersharp Au tips, both 2.8 N/m in force constant, were used in PFM measurements. The ac signal applied on the tip for all the PFM measurements is 800 mV in amplitude. The samples were grounded in all the measurements. Piezoresponse phase-voltage hysteresis loops were collected in the dual a.c. resonance tracking mode. The vector PFM was conducted with different in-plane sample rotation angles to reconstruct the domain structures^56,57.