A physics-enforced neural network to predict polymer melt viscosity

Introduction
Additive Manufacturing (AM) enables the rapid creation of metal or polymer parts with previously unimaginable features and topologies and is therefore poised to disrupt a variety of industries1,2. For polymers, achieving desired properties in the final component depends on appropriate choices of material chemistries with suitable rheological properties, as well as on the conditions adopted during the AM process, such as temperature and extrusion rate. At present, a limited palette of chemistries, properties, and conditions is utilized, generally guided by experience, intuition, and empiricism.
In this contribution, we adopt an informatics approach relevant to AM across the chemical and process condition space, to predict one critical rheological property of polymers, namely, the melt viscosity η. Informatics approaches have made major inroads in recent years within materials research3,4,5, leading to accelerated means for property predictions and providing guidance for the design of new materials6,7,8,9,10. These methods start with available materials data on properties of interest. The materials are then represented numerically to capture and encode their essential features in a machine-readable format. The numerical representations, or fingerprints, are then mapped to available property data using machine learning (ML) algorithms, ultimately producing predictive models for the property considered6,11,12,13,14,15. Within the AM space, similar methods have been used for final component property prediction16, process monitoring17, geometric configuration18, composition optimization19, and optimization of printing parameters (albeit mainly for powder-bed AM1,18, but not as much for polymer melt extrusion AM2). Extrusion AM relies on the precise control of polymer melts, which currently requires data from extensive rheological experiments for each new chemistry. This is a bottleneck in the ink development process2. Therefore, predictive capabilities for rheological properties, such as η, are useful to reduce the number of physical experiments aimed at optimization and design.
Melt viscosity of polymers, beyond being a critical property, is attractive to model with ML because there is a reasonable amount of related literature data, although with limited chemical diversity compared to other polymer property datasets5,12,15,20,21. Additionally, there are known physical equations (albeit with empirical parameters) that describe the dependence of η on its governing conditions: temperature (T), average molecular weight (Mw), and shear rate ($\dot{\gamma}$) (Fig. 1A). For instance, it is known that η increases with increasing Mw (via piece-wise power law dependencies), decreases (non-linearly) with increasing $\dot{\gamma}$, and decreases (exponentially) with increasing T. Explicit functional forms and additional background on these behaviors are provided in the Methods section. Molecular weight distributions, quantified by the polydispersity index (PDI), are also known to affect melt viscosity22,23,24. With this situation in mind, previous works have also addressed the modeling of η using ML25,26,27,28. While promising, the majority of these works have focused on specific scenarios or have been shown to predict unphysical results, making them difficult to apply.

A Depictions of the functions used to describe the behavior of η with respect to temperature (T), molecular weight (Mw), and shear rate ($\dot{\gamma}$). The functions are parametrized by empirical parameters with physical significance, elaborated in Table 1 and in the Methods section. The η dependence on Mw is given by $\log \eta_{M_w}$ (Eq. (8) in the Methods section). Empirical parameters define the slopes of the relationship at low Mw (α1) and high Mw (α2), the critical molecular weight (Mcr), the y-intercept of $\eta_{M_w}$ (k1), and the rate of transition from the low to the high Mw region ($\beta_{M_w}$). The η dependence on T and Mw is given by $\log \eta_0(T, M_w)$ (Eq. (5) in the Methods section), and is parameterized by the reference temperature (Tr) and empirical fitting parameters (C1 and C2). The effects of C1 and C2 are visualized by comparing the trends with different sampled values. The η dependence on $\dot{\gamma}$ is given by $\log \eta(T, M_w, \dot{\gamma})$ (Eq. (4) in the Methods section). The relevant parameters include the shear thinning slope (n), the critical shear rate ($\dot{\gamma}_{cr}$), and the rate of transition from η0 to shear thinning ($\beta_{\dot{\gamma}}$). B The Physics-Enforced Neural Network (PENN) architecture starts with an input containing the polymer fingerprint and the PDI. A Multi-Layer Perceptron (MLP) uses the concatenated input to predict the empirical parameters. Next, the computational graph uses the predicted empirical parameters to calculate η, via the encoded $\log \eta_{M_w}$, $\log \eta_0(T, M_w)$, and $\log \eta(T, M_w, \dot{\gamma})$ functions. The physical condition variables $\log M_w$, $\log \dot{\gamma}$, and T are input to their respective functions. C A physics-unaware Artificial Neural Network (ANN) and a Gaussian Process Regression (GPR) serve as baselines to compare with the PENN model. The input features to the ANN and GPR models are the concatenated polymer fingerprint, T, Mw, $\dot{\gamma}$, and PDI.
In the present work, we create a physics-enforced neural network (PENN) framework that produces a predictive model of polymer melt viscosity, which explicitly encodes the known physical equations while also learning the empirical parameters for new chemistries directly from available data. Physics-informed ML frameworks have shown great promise recently in many application spaces, including atomic modeling, chemistry-informed materials property prediction, and Physics Informed Neural Networks (PINNs) that solve partial differential equations29,30,31,32,33. Our PENN for polymer melt viscosity prediction involves a Multi-Layer Perceptron (MLP) that takes as input the polymer chemistry (fingerprinted using our Polymer Genome approach6) along with the PDI of the sample, and predicts the empirical parameters (listed in Table 1) as a latent vector, used to estimate η as a function of T, Mw, and $\dot{\gamma}$. A computational graph then encodes the dependence of η on Mw, $\dot{\gamma}$, and T (see Fig. 1A) using the equations described in the Methods section. The entire framework (Fig. 1B) is trained on our dataset (elaborated in the Dataset section); a sketch of the data flow follows below, and the detailed architecture is described in the Methods section.
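To make the data flow concrete, the following is a minimal PyTorch sketch of such an architecture. The class name, layer sizes, parameter names, and bounding ranges are illustrative assumptions (the published bounds are in Supplementary Table 1), and the physics graph is injected as a callable, sketched later in the Methods section.

```python
import torch
import torch.nn as nn

class PENNSketch(nn.Module):
    """An MLP predicts bounded empirical parameters from the polymer
    fingerprint and PDI; a fixed physics graph turns them into log(eta)."""

    # (name, lower, upper) bounds -- illustrative, not the published values.
    PARAM_BOUNDS = [
        ("alpha1", 0.5, 1.5), ("alpha2", 2.0, 4.5), ("log_Mcr", 2.0, 6.0),
        ("log_k1", -10.0, 5.0), ("beta_Mw", 1.0, 50.0),
        ("n", 0.0, 1.0), ("log_gdot_cr", -4.0, 5.0), ("beta_gdot", 1.0, 50.0),
        ("C1", 1.0, 20.0), ("C2", 50.0, 300.0), ("Tr", 100.0, 500.0),
    ]

    def __init__(self, fp_dim, physics_fn, hidden=256):
        super().__init__()
        self.physics = physics_fn  # e.g. the log_eta sketch in the Methods
        self.mlp = nn.Sequential(
            nn.Linear(fp_dim + 1, hidden), nn.ReLU(),  # fingerprint + PDI
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, len(self.PARAM_BOUNDS)),
        )

    def forward(self, fingerprint, pdi, log_Mw, log_gdot, T):
        raw = self.mlp(torch.cat([fingerprint, pdi], dim=-1))
        # Squash each raw output into its physically admissible range.
        params = {name: lo + (hi - lo) * torch.sigmoid(raw[..., j])
                  for j, (name, lo, hi) in enumerate(self.PARAM_BOUNDS)}
        return self.physics(params, log_Mw, log_gdot, T)
```

Because the MLP outputs the empirical parameters rather than the viscosity itself, every prediction comes with a physically interpretable latent vector; this is what enables the parameter-distribution analysis reported in the Results.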
We find that this strategy is critical to obtain results that are physically meaningful in extrapolative regimes (e.g., ranges of T, Mw, and $\dot{\gamma}$ where there is no training data for chemistries similar to the queried new polymer). This ability is vital given our benchmarking dataset's sparsity: it contains only 93 unique repeat units, although the total number of datapoints is 1903 (including T, Mw, $\dot{\gamma}$, and composition variations). As baselines to assess this PENN, we trained artificial neural network (ANN) and Gaussian process regression (GPR) models without any physics encoded. We find that the PENN model is more useful in obtaining credible extrapolative predictions. Our results indicate that informatics-based data-driven and physics-enforced (when possible) strategies can aid and accelerate extrusion AM innovations in sparse data situations.
Results
Dataset
Melt viscosity data was collected from the PolyInfo repository34 and from the literature cited by PolyInfo. Cited literature data was extracted from tables and figures with the help of the WebPlotDigitizer tool35. The final dataset shown in Fig. 2 includes a total of 1903 datapoints composed of 1326 homopolymer datapoints, 446 co-polymer datapoints, and 113 miscible polymer blend datapoints. The dataset spans a total of 93 unique repeat units with variations in Mw, $\dot{\gamma}$, T, and PDI. For datapoints without a recorded PDI, we impute 2.06, the median PDI of the dataset. Due to inconsistencies in the reporting of polymer structures such as branching and crosslinking across a wide variety of chemistries, the dataset only includes linear polymers.

The joint distributions of (A) molecular weight (Mw), (B) shear rate ($\dot{\gamma}$), (C) temperature (T), and (D) polydispersity index (PDI) with respect to melt viscosity (η) are presented. The single distributions for the physical conditions are given on the top axes and the distribution of η is given on the right-most axis. Each subplot contains all 1903 datapoints from the dataset. (A–C) have highlighted samples in red that exemplify the dependencies depicted in Fig. 1A. E Visual depiction of train-test splitting across chemical space and physical spaces for N monomers in the dataset.
We found that η measurements at low Mw were underrepresented compared to those at high Mw. Using the zero-shear viscosity (η0) relationship with Mw (Fig. 1A), we added 126 datapoints at low Mw (included in the 1903 datapoints). This was achieved by identifying polymer chemistries with more than five η0 datapoints at high Mw and a recorded Mcr36. Equation (6) (Methods section) was fitted to each chemistry and extrapolated to estimate η values at low Mw.
Because the viscosity values span several orders of magnitude (Fig. 2), we use the Order of Magnitude Error (OME) to assess ML model accuracy. OME is calculated by taking the Mean Absolute Error of the logarithmically scaled η values. Models with lower OME exhibit more accurate predictions.
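Concretely, a minimal implementation of this metric (the function name is ours):

```python
import numpy as np

def order_of_magnitude_error(eta_true, eta_pred):
    """Mean absolute error between log10-scaled viscosities: an OME of
    1.0 means predictions are off by one order of magnitude on average."""
    eta_true = np.asarray(eta_true, dtype=float)
    eta_pred = np.asarray(eta_pred, dtype=float)
    return float(np.mean(np.abs(np.log10(eta_pred) - np.log10(eta_true))))

# Predictions off by a factor of 10 everywhere give OME = 1.
print(order_of_magnitude_error([1e3, 1e5], [1e4, 1e6]))  # -> 1.0
```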
Overall assessment of physical intuition with sparse chemical knowledge
An important future use case of our ML models is to estimate the melt viscosity in new physical regimes, given a small amount of knowledge of a given polymer and other chemistries. For example, given a few costly measurements of a new polymer at a handful of molecular weights, one should be able to predict the viscosity at the remaining molecular weights and, likewise, across different shear rates and temperatures. Figure 2E depicts how this ability was tested through a unique splitting of data into train/test sets across the chemical and physical regimes. First, the monomers were split into train (90%) and test (10%) sets. For each test monomer, the median of that monomer's datapoints with respect to a variable in the physical space was calculated. The median was used to split all datapoints containing that monomer: half for a final test split, and the other half for training. Whether the upper or lower half went to testing was chosen at random. This approach ensures that all the test data focuses on predicting in new physical regimes given a sparse amount of monomer data. This process was repeated three times for each of Mw, $\dot{\gamma}$, and T to ensure that diverse tests were used for evaluation; a sketch of the procedure follows below.
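In code, one trial of this split might look like the following sketch; the column names and the handling of ties are our assumptions.

```python
import numpy as np
import pandas as pd

def physical_split(df, variable, test_monomer_frac=0.1, seed=0):
    """Chemistry-then-physics split: hold out 10% of monomers, then send
    half of each held-out monomer's data (above or below its median value
    of `variable`, chosen at random) to the test set."""
    rng = np.random.default_rng(seed)
    monomers = df["monomer"].unique()
    n_test = max(1, int(round(test_monomer_frac * len(monomers))))
    test_monomers = rng.choice(monomers, size=n_test, replace=False)

    test_mask = np.zeros(len(df), dtype=bool)
    for m in test_monomers:
        rows = (df["monomer"] == m).to_numpy()
        median = df.loc[rows, variable].median()
        upper = (df[variable] >= median).to_numpy()
        # The remaining half of this monomer's data stays in training,
        # so the model sees the chemistry but not the physical regime.
        test_mask |= rows & (upper if rng.random() < 0.5 else ~upper)
    return df[~test_mask], df[test_mask]

# e.g. train_df, test_df = physical_split(data, "Mw")
```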
Figure 3 shows the combined results of three trials for splits across all three physical variables. Supplementary Fig. 1 shows parity plots that specify the results from each trial. The GPR, ANN, and PENN predictions all have acceptable OMEs, indicating that all three can capture some chemical information and physical trends. The PENN yields a distinct decrease in OME (an average improvement of 35.97%) and an increase in R2 (up to 79% for the $\dot{\gamma}$ split) relative to the ANN. The PENN also outperforms the GPR for the Mw and T splits, but the GPR is more accurate on the test set for $\dot{\gamma}$. In the further analysis below, we scrutinize the physical viability of these predictions beyond the high-level trends of the parity plots.

Parity plots are used to assess the models' overall predictive capabilities in new physical regimes based on the physical variable split for molecular weight (Mw), shear rate ($\dot{\gamma}$), and temperature (T). Results are compared between Gaussian Process Regression (GPR), Artificial Neural Network (ANN), and Physics Enforced Neural Network (PENN) models. Each plot compares experimental values of melt viscosity (η) to the predicted η across 3 unique train-test splits for each physical variable. The top row (A–C) contains GPR results for (A) the Mw split, (B) the $\dot{\gamma}$ split, and (C) the T split. The middle row (D–F) contains ANN results for (D) the Mw split, (E) the $\dot{\gamma}$ split, and (F) the T split. The bottom row (G–I) contains PENN results for (G) the Mw split, (H) the $\dot{\gamma}$ split, and (I) the T split. The dotted black lines represent perfect predictions. The coefficient of determination (R2) and Order of Magnitude Error (OME) are reported over these test sets.
Distribution of predicted empirical parameters
Despite the high overall performance of all three models, only the PENN model can produce physically credible predictions in regimes with restricted and sparse data. A comparison of the GPR, ANN, and PENN models in estimating crucial empirical parameters (found in Table 1) from sparse data in the held-out set is detailed in Fig. 4.

Normalized distributions of empirical parameter values found in the dataset (Ground Truth) are compared to parameter values predicted by Gaussian Process Regression (GPR), Artificial Neural Network (ANN), and Physics Enforced Neural Network (PENN) models. Each column compares a different parameter for the melt viscosity (η) relationship with molecular weight (Mw), shear rate ($\dot{\gamma}$), and temperature (T). The examined parameters include: (A) α1, the slope of the zero-shear viscosity (η0) vs. Mw correlation at low Mw (accepted value of 1 depicted by the red dashed line), (B) α2, the slope of η0 vs. Mw at high Mw (accepted value of 3.4 depicted by the red dashed line), (C) the critical molecular weight (Mcr), (D) n, the rate of shear thinning (accepted range of 0.2–0.8 depicted by the dashed red lines), (E) the critical shear rate ($\dot{\gamma}_{cr}$), and (F) the reference temperature (Tr) of a polymer. (G, H) show distributions for the C1 and C2 fitting parameters of the η–T trend. The ground truth distributions represent 41 samples for Mw parameters, 33 samples for $\dot{\gamma}$ parameters, and 22 samples for T parameters. The Kullback–Leibler (KL) divergence of the model estimation distributions from the ground truth is given in the top left of each histogram. The lowest KL divergence among the three models is bolded for each parameter.
To establish a benchmark for comparing the three models, we obtained ground truth values of the parameters from the dataset. We did this by identifying subsets of our dataset involving the same polymer with η measured at several T, Mw, or $\dot{\gamma}$ values. If a subset contained at least five points, we fitted the corresponding equation (Eqs. (5), (6), or (2)) to obtain empirical parameters. The distributions of these ground truth parameter values are shown in the first row of Fig. 4. There are a limited number of ground truth values because only a small number of datapoints satisfy the above conditions. Nevertheless, this small sample set allowed us to make a few inferences about expected viscosity trends. The ground truth values of α1 and α2 are close to the theoretical values of 1 and 3.4, respectively37 (background provided in Methods). α2 values were occasionally less than the expected 3.4, possibly due to outliers or errors in fitting a small number of datapoints. The fitted Mcr values fell within a range of $10^{2.5}$–$10^5$ g/mol. For shear parameters, the majority of samples were found to have n in the range of 0.2–0.8, which is typical for polymer melts38. The obtained $\dot{\gamma}_{cr}$ values were found in the range of $10^{-3}$–$10^4$ 1/s. The fitted Tr values are mostly in a range of Tr < 250 K. This is low when compared to Tg values found in thermal property datasets12. In our dataset, the datapoints that could be fitted to the η–T relationship were observed at T < 475 K, so low Tr values could be overrepresented in the ground truth. The C1 parameter average was 11.8 and the C2 parameter average was 159.42 K. This analysis of the ground truth data indicates the parameter values our models should predict.
We used two different methods to obtain parameter estimates from the models: one unique to the PENN model, and another for the purely data-driven ANN and GPR. The PENN model directly predicts each of the empirical parameters (see Fig. 1B), which are used in the computational graph to predict η. The ANN and GPR do not directly predict the parameters, so we used a fixed extrapolation procedure, sketched below. The procedure involved selecting an unseen data point and varying one physical variable (one of Mw, $\dot{\gamma}$, and T) within a predetermined range while holding the other two constant. The ranges for each variable encompass similar orders of magnitude as those present in the training dataset (Fig. 2). For Mw extrapolation, a range of $10^2$–$10^7$ g/mol was used to encompass low and high Mw. For shear rate extrapolation, a range of $10^{-5}$–$10^6$ 1/s was used to model behaviors in the zero-shear and shear-thinning regimes. For temperature extrapolation, a range of ±20 K around the original data point's temperature was used to stay within the boundary constraints of Eq. (5). Using this procedure, sets of predictions were made on every unseen datapoint and fitted using Eqs. (5), (6), or (2), yielding estimated values of the empirical parameters.
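For the Mw case, for example, the extrapolate-then-fit procedure can be sketched as follows; the `model.predict` signature and the initial guesses are our assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_mw_parameters(model, fingerprint, pdi, T, gdot):
    """Sweep Mw over 1e2-1e7 g/mol at fixed T and shear rate, query the
    black-box model for log10(eta), then fit the piecewise power law."""
    log_Mw = np.linspace(2.0, 7.0, 50)
    log_eta0 = np.array([model.predict(fingerprint, pdi, T, 10.0**m, gdot)
                         for m in log_Mw])

    def piecewise(log_mw, log_k1, a1, a2, log_mcr):
        # Two power laws in log-log space, continuous at log_mcr (Eq. (7)).
        low = log_k1 + a1 * log_mw
        high = log_k1 + (a1 - a2) * log_mcr + a2 * log_mw
        return np.where(log_mw < log_mcr, low, high)

    popt, _ = curve_fit(piecewise, log_Mw, log_eta0, p0=[0.0, 1.0, 3.4, 4.0])
    return dict(zip(["log_k1", "alpha1", "alpha2", "log_Mcr"], popt))
```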
In Fig. 4, we show the feasibility of the models' empirical parameter predictions evaluated against the ground truth values and accepted values (elaborated in the Methods section). For parameters where a theoretical value is well-defined, the Root Mean Square Error (RMSE) of the predictions' deviation from this value is calculated. The parameter prediction distribution is also compared to the ground truth distribution through a discrete Kullback–Leibler (KL) divergence,

$$D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}.$$
Intuitively, the KL divergence is a measure of how one probability distribution P deviates from a reference distribution Q over a set of intervals i. A lower divergence indicates that the predicted parameter distribution is closer to the ground truth. The KL divergence was calculated as the relative entropy between the discretized probability distributions of the ML predictions and the ground truth.
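A minimal sketch of this computation; the bin count and the epsilon used to guard empty bins are our choices.

```python
import numpy as np

def discrete_kl(predictions, ground_truth, bins=20, eps=1e-10):
    """KL divergence D(P||Q) between histogram estimates of the predicted
    (P) and ground-truth (Q) parameter distributions on a shared grid."""
    lo = min(np.min(predictions), np.min(ground_truth))
    hi = max(np.max(predictions), np.max(ground_truth))
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(predictions, bins=edges)
    q, _ = np.histogram(ground_truth, bins=edges)
    p = p / p.sum() + eps  # eps keeps empty bins from producing infinities
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))
```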
From Fig. 4, it can be seen that the GPR struggles to predict expected parameter values. The GPR predictions for α1 deviate from 1 by an RMSE of 1.26. For some polymers, the GPR predicts α1 ≤ 0. The GPR predictions for α2 deviate from 3.4 by an RMSE of 2.87, and are significantly lower than the ground truth values in the dataset. Most predicted values of $\log M_{cr}$ are within the same range as the ground truth, but the proper low- and high-entanglement behavior is not captured, which decreases the credibility of these fits. For the shear thinning parameter n, some values fall within the expected range38 of 0.2–0.8 for polymer melts, but others are closer to 0, indicating that the expected shear thinning behavior is not always predicted. The predicted $\dot{\gamma}_{cr}$ distribution is lower than the ground truth, indicating that the GPR forecasts the onset of shear-thinning at a significantly lower $\dot{\gamma}$ than observed (if shear thinning is predicted at all). On temperature dependence, some Tr values are predicted higher than what is seen in the dataset.
The ANN's failure to capture correct physical trends is also evident in the distributions of its fitted parameters. The RMSEs for the ANN's estimated α1 and α2 values are 1.31 and 2.79, respectively. The ANN overestimates α1 and underestimates α2, and therefore does not capture the effects of high-Mw chain entanglement. The ANN estimates a low n for a subset of polymers, which goes against the definition of shear thinning. The predicted $\dot{\gamma}_{cr}$ values are lower than the ground truth distribution, indicating that the ANN struggles to capture the shear-thinning transition region from the dataset. The ANN predictions for the T trend are the closest to the ground truth among its trends for the three variables, because the η–T dependence is a smooth exponential function (Fig. 1A), enabling an easier average fit.
The PENN outperforms the ANN in estimating feasible empirical parameters, as depicted by the lower KL divergence values in the last row of Fig. 4. The RMSEs of the predicted α1 and α2 values are 0.05 and 0.17, which are substantially smaller than those of the ANN. Moreover, all the predicted values of $\log M_{cr}$ are within the ground truth range of 2.5–5. The PENN model can also learn the correct shear thinning phenomenon, predicting n values within the expected range38 of 0.2–0.8 and a $\dot{\gamma}_{cr}$ distribution that mirrors the dataset. The PENN's predicted range of Tr is closest to the ground truth. For the C1 parameter, the PENN-predicted distribution is closest to the proposed value of C1 = 7.60 (detailed in the Methods section), also having the lowest divergence from the ground truth. For C2 predictions, although the KL divergence of the PENN is lower than that of the ANN, the PENN is confined to much lower values of C2, with an average much lower than some experimentally derived values, such as C2 = 227.3 K39.
Overall, the average KL divergences across all parameter distributions for the GPR, ANN, and PENN are 14.59, 22.24, and 1.74, respectively. The overall distributions of empirical parameters point to the PENN having greater capabilities for producing physically correct results than a purely data-driven model.
Performance in extrapolative regimes
In Table 2, we summarize the performance of predicted η profiles over wide ranges of Mw (256 extrapolations), $\dot{\gamma}$ (71 extrapolations), and T (127 extrapolations) for all three models considered. We define a successful extrapolation as one in which the model predicts the correct trends while maintaining accuracy over the train and test points. In this study, a prediction is considered accurate for an experimental point if the experimental data falls reasonably within the predicted uncertainty bounds; in practice, the required precision may vary depending on the specific use case. Overall, the PENN successfully predicts 80.4% of Mw extrapolations, 49.2% of $\dot{\gamma}$ extrapolations, and 54.1% of T extrapolations. The ANN rarely achieves correct physical trends for Mw or $\dot{\gamma}$ extrapolations in the span of the dataset, and only predicts successful profiles for 17.2% of T extrapolations. The GPR model also exhibits low performance in extrapolation. There are several instances (given in brackets in Table 2) where the ANN and GPR successfully fit the data points but fail to extrapolate correctly beyond the dataset. This underscores the need for information beyond experimental data to enable extrapolation to new physical regimes.
Figure 5 shows a few examples of the extrapolation results summarized in Table 2. A much larger set of examples of both successful and unsuccessful extrapolations by the PENN compared to the GPR and the ANN is provided in Supplementary Figs. 2–7. Figure 5A–C shows examples of the PENN correctly extrapolating η into unseen regimes given a small amount of information about a monomer in another part of the physical space. The ANN and GPR models are uncertain in these unseen regimes, resulting in large confidence intervals. In Fig. 5A, the PENN model accurately predicts the region near Mcr where the η–Mw relationship transitions from unentangled to entangled, and can therefore accurately predict η values at high Mw, despite not having seen any data in this region. The errors for the ANN and GPR in Fig. 5A are low, within approximately an order of magnitude. However, the ANN predictions have a near-constant slope around Mcr (implying α1 ≈ α2) and are inconsistent with the effects of polymer chain entanglements at high Mw. The GPR model also fails to predict a higher α2 slope. In Fig. 5B, only the PENN model predicts both a zero-shear and a shear-thinning region for the η–$\dot{\gamma}$ relationship of the given copolymer. The GPR model fits the training points but mispredicts shear-thinning at high shear rates. The ANN model predicts a decreasing relationship consistent with shear-thinning but does not predict the zero-shear region. This could be an example of spectral bias within neural networks, which describes how ANNs prioritize global or “low frequency” patterns in data over local or “high frequency” patterns40. The general decreasing trend of η–$\dot{\gamma}$ is “low frequency” and is captured by the ANN. In contrast, the transition regions are of a “higher frequency” and are not captured by the ANN. In Fig. 5C, the PENN model predicts the correct η–T relationship. The ANN model also predicts an exponential relationship, but with lower accuracy. The GPR model fits both the training and unseen datapoints, but predicts an unphysical trend beyond them. Overall, the PENN model makes predictions that follow the expected behaviors (Fig. 1A) of polymer melts.

Examples of successful (A–C) and unsuccessful (D–F) melt viscosity (η) and zero-shear melt viscosity (η0) predictions over wide ranges of molecular weight (Mw), shear rate ($\dot{\gamma}$), and temperature (T) by the Physics Enforced Neural Network (PENN) models. The extrapolated predictions are compared to those by Gaussian Process Regression (GPR) and Artificial Neural Network (ANN) models given the same training information. A is a good η0–Mw extrapolation for [*]CCCCCCCCCCOC(=O)CCCCC(=O)O[*] at T = 382.15 K. B is a good η–$\dot{\gamma}$ extrapolation for a copolymer of [*]CC([*])CC(C)C and [*]CC([*])CCCCCCCC (0.968:0.032) (Mw = 290000 g/mol, PDI = 7.8) at T = 543.15 K. C is a good η–T extrapolation for [*]CCOCCOCCOC(=O)CCCCCCCCC(=O)O[*] (Mw = 2000 g/mol, $\dot{\gamma}$ = 60 1/s). D is an unsuccessful η0–Mw extrapolation for [*]C=CCC[*] at T = 490.15 K, with possible mispredictions of Mcr and k1. E is an unsuccessful η–$\dot{\gamma}$ extrapolation for a copolymer of [*]C[*] and [*]CC([*])OC(C) (0.72:0.28) (Mw = 60000 g/mol), with possible mispredictions of $\hat{\dot{\gamma}}_{cr}$ and η0. F is an unsuccessful η–T extrapolation for [*]CC(O)COc1ccc(C(C)(C)c2ccc(O[*])cc2)cc1 (Mw = 1696 g/mol, $\dot{\gamma}$ = 0 1/s) with a possible misprediction of Tr.
Correctly extrapolated samples by the PENN model, such as the ones in Fig. 5A–C, make up 67.5% of the extrapolated test cases, a significant improvement relative to both the ANN and GPR. The PENN model also has room for improvement, especially when applied to datasets with low chemical diversity. Overfitting to a small set of chemistries in training can lead to inaccurate parameter predictions for unseen chemistries. This behavior is demonstrated in Fig. 5D–F, where the PENN predicts a plausible rheological trend but incorrect values for unseen polymers. However, the PENN model introduces a layer of interpretability unavailable to physics-unaware models. Based on the predictions, we can reasonably infer which parameters were over- or under-estimated. In Fig. 5D, the PENN model predicts near-correct α1 and α2 slopes, but the predicted Mcr and k1 values are underestimated. Figure 5E depicts how an underestimated η0 (caused by inaccuracies in the predicted Mcr, α1, α2, and/or k1) can cause inaccurate η predictions for all other $\dot{\gamma}$ values. We also see this phenomenon in Fig. 5F, where Tr is likely underestimated. The propagating error causes the PENN model to predict an inaccurate trend across the entire spectrum of T. Despite these errors, pinpointing the PENN's weak spots in this way can be used to add targeted training data to improve the model. This level of interpretation is unique to the PENN and cannot be achieved with the GPR or ANN.
These examples of extrapolations provide insights into the applicability of the PENN versus purely data-driven methods when using datasets that contain limited chemistries. The equations used in the PENN are based on assumptions and generalizations, and may not account for all physical nuances. These limitations must be considered when applying PENNs to future material design and process optimization problems.
Discussion
In this study, we introduce the Physics Enforced Neural Network (PENN), a strategy that combines data-driven techniques with established empirical equations to predict the viscosity of polymer melts with better physics-guided generalization and extrapolation. The PENN makes predictions across many chemical compositions and relevant physical parameters, including molecular weight, shear rate, temperature, and polydispersity index. We compared our PENN approach against the purely data-driven, physics-unaware Artificial Neural Network and Gaussian Process Regression. In extrapolative regimes, our PENN model outperforms its physics-unaware counterparts and offers an elevated level of interpretability and generalizability. To enhance generalizability across chemistries, future work could expand the chemical space of the dataset through new experiments, molecular dynamics simulations, and/or more aggressive data acquisition from the literature.
This work has profound implications for additive manufacturing (AM) and materials informatics. The PENN model’s capability to guide the rheological control of diverse polymer resins accelerates the development of new printing materials, thereby expanding AM’s utility. Our methodology offers a blueprint for modeling other properties governed by empirical equations. The initial success of the PENN architecture for melt viscosity is a powerful demonstration of how data-driven insights combined with established knowledge can propel us into a new era of rapid advancements in materials science and engineering.
Methods
Fingerprinting and feature engineering
The chemical attributes of a polymer are represented by a unique fingerprinting scheme. The fingerprints (FPs) contain features derived from atomic-level, block-level, chain-level, and morphological descriptors of a polymer as described at length earlier6. The dataset contains homo- and co-polymers, and miscible polymer blends. Co-polymers and blends contain multiple repeating units, each with a separate FP. For co-polymers, the FP of each unit was aggregated to a single copolymer FP using a weighted average (with weight equal to composition percentage)12. Similar to previous work12, all co-polymers were treated as random. For miscible polymer blends, the FP of each unit was aggregated to a single FP using a weighted harmonic average (with the weight equal to composition percentage)20. For blends containing units with different Mw and/or PDI, the weighted average over each unit was used.
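A minimal sketch of the two aggregation rules; it assumes fingerprints are stored as strictly positive vectors, which the harmonic mean requires.

```python
import numpy as np

def copolymer_fp(unit_fps, fractions):
    """Composition-weighted arithmetic mean of repeat-unit fingerprints
    (random co-polymers)."""
    fps = np.asarray(unit_fps, dtype=float)           # (units, fp_dim)
    w = np.asarray(fractions, dtype=float)[:, None]   # composition weights
    return (w * fps).sum(axis=0) / w.sum()

def blend_fp(unit_fps, fractions):
    """Composition-weighted harmonic mean (miscible blends); fingerprint
    components must be nonzero."""
    fps = np.asarray(unit_fps, dtype=float)
    w = np.asarray(fractions, dtype=float)[:, None]
    return w.sum() / (w / fps).sum(axis=0)
```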
Enforced polymer physics trends
In this section, we detail the physics-based correlations included within the Physics Enforced Neural Network (PENN).
We enforce the dependencies of η on temperature (T), molecular weight (Mw), and shear rate ($\dot{\gamma}$) through $\eta(M_w, T, \dot{\gamma})$, which we derive below.
Preamble: smoothing of piecewise functions
When going from one function g(a, b) in a low regime (a < b) to another function h(a, b) in a high regime (a > b), we can use the smoothened Heaviside step function,

$$H_{\beta}(a, b) = \frac{1}{1 + e^{-\beta (a - b)}},$$

where β is a tunable rate of transition.

A function f(a, b) that transitions from g(a, b) to h(a, b) is given by

$$f(a, b) = \left[1 - H_{\beta}(a, b)\right] g(a, b) + H_{\beta}(a, b)\, h(a, b).$$
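In code, a direct transcription of the logistic forms above:

```python
import torch

def smooth_heaviside(a, b, beta):
    """Sigmoid-smoothed step: ~0 for a << b, ~1 for a >> b; beta sets
    how sharply the transition occurs around a = b."""
    return torch.sigmoid(beta * (a - b))

def smooth_piecewise(g, h, a, b, beta):
    """Blend g (low regime, a < b) into h (high regime, a > b)."""
    H = smooth_heaviside(a, b, beta)
    return (1.0 - H) * g + H * h
```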
η dependence on $\dot{\gamma}$, T, and Mw
The η dependence on $\dot{\gamma}$ follows the physics of shear-thinning fluids41,42,43. In these fluids, at low $\dot{\gamma}$, there is not enough force between chains to break entanglements and cause movement, so η remains constant at η0. At a critical shear rate, $\dot{\gamma}_{cr}$, the shear force is high enough to cause chain alignment, making chain diffusion easier. Beyond $\dot{\gamma}_{cr}$, η decreases according to a shear-thinning linear power law41. This trend can be represented by a function (Equation (2)) across both the zero-shear and shear thinning regimes41,44,45,46,

$$\eta(\dot{\gamma}) = \frac{\eta_0}{1 + \left(\dot{\gamma} / \dot{\gamma}_{cr}\right)^{1-n}},\qquad(2)$$
where the parameter n describes the sensitivity to shearing47. For shear-thinning fluids, n < 1. For most polymer melts38, n is empirically known to lie in the range 0.2–0.8.
Equation (2) is unfavorable to use directly because $\dot{\gamma}$ spans several orders of magnitude, so $\log \dot{\gamma}$ must be used as an input. Eq. (2) cannot be adapted to take $\log \dot{\gamma}$ as an input (due to the +1 in the denominator), so we depict the relationship across the low-$\dot{\gamma}$ and high-$\dot{\gamma}$ regimes as a piecewise function on the log-scale,

$$\log \eta = \begin{cases} \log \eta_0, & \dot{\gamma} < \dot{\gamma}_{cr} \\ \log \eta_0 - (1-n)\left(\log \dot{\gamma} - \log \dot{\gamma}_{cr}\right), & \dot{\gamma} \geq \dot{\gamma}_{cr}. \end{cases}\qquad(3)$$
We smooth Eq. (3) with $H_{\beta_{\dot{\gamma}}}$ to get $\log \eta(M_w, T, \dot{\gamma})$ (Eq. (4)),

$$\log \eta(M_w, T, \dot{\gamma}) = \log \eta_0(M_w, T) - H_{\beta_{\dot{\gamma}}}\!\left(\log \dot{\gamma}, \log \dot{\gamma}_{cr}\right)(1-n)\left(\log \dot{\gamma} - \log \dot{\gamma}_{cr}\right),\qquad(4)$$
where $\beta_{\dot{\gamma}}$ is a parameter that dictates the rate of the shift from zero-shear to shear-thinning behavior. In our implementation, we found that optimization over the $\dot{\gamma}$ domain performed best with $\beta_{\dot{\gamma}} = 30$.
$\log \eta_0(M_w, T)$ is defined by the T dependence. As temperature increases, so does the rate of molecular self-diffusion, resulting in the lower η seen in fluidic polymer melts43. The Williams–Landel–Ferry (WLF) equation39,48 describes the exponential decrease in η as the temperature increases. Therefore, we can encode temperature dependence as

$$\log \eta_0(M_w, T) = \log \eta_{M_w} - \frac{C_1 (T - T_r)}{C_2 + (T - T_r)},\qquad(5)$$

where Tr is a reference temperature and C1 and C2 are material-dependent empirical parameters. The values for these depend on polymer chemistry. C1 = 7.60 and C2 = 227.3 K are examples of values that have been proposed39 from observations of experiments on a small subset of polymers. The reference temperature Tr is within a few degrees of the glass transition temperature Tg. It has been proposed that the WLF relationship holds within the range of Tg to Tg + 200 K39.
$\eta_{M_w}$ is defined by the Mw dependence. Longer and heavier polymer chains experience increased entanglements, which hinder chain reptation in the polymer melt at low shear37,43. Equation (6) is a piece-wise power law that describes this phenomenon,

$$\eta_0 = \begin{cases} k_1 M_w^{\alpha_1}, & M_w < M_{cr} \\ k_2 M_w^{\alpha_2}, & M_w \geq M_{cr}, \end{cases}\qquad(6)$$

where Mcr is the critical molecular weight, above which the entanglement density is high enough to increase the impact of Mw on η0. The two power laws intersect at Mw = Mcr37. Mcr is found to be approximately 2–4 times the molecular weight at which chain entanglement starts, but the exact value is polymer dependent43. α1 is the slope of the $\log \eta_0$–$\log M_w$ curve if Mw < Mcr and α2 is the slope if Mw ≥ Mcr. Typically, α1 is theoretically and empirically determined to be about 1, while α2 is found to be about 3.437,43, but the exact values are polymer dependent. k1 and k2 are the y-intercepts of each power law and are polymer-dependent.
Mw and η0 span several orders of magnitude, so we use Eq. (6) in the log-scale to get Eq. (7),

$$\log \eta_0 = \begin{cases} \log k_1 + \alpha_1 \log M_w, & M_w < M_{cr} \\ \log k_2 + \alpha_2 \log M_w, & M_w \geq M_{cr}. \end{cases}\qquad(7)$$
Smoothing Eq. (7) with $H_{\beta_{M_w}}$ gives Eq. (8),

$$\log \eta_{M_w} = \left[1 - H_{\beta_{M_w}}\!\left(\log M_w, \log M_{cr}\right)\right]\left(\log k_1 + \alpha_1 \log M_w\right) + H_{\beta_{M_w}}\!\left(\log M_w, \log M_{cr}\right)\left(\log k_2 + \alpha_2 \log M_w\right),\qquad(8)$$

with $\log k_2 = \log k_1 + (\alpha_1 - \alpha_2) \log M_{cr}$ following from the intersection of the two power laws at Mw = Mcr,
where $H_{\beta_{M_w}}$ is the smoothened Heaviside step function using $\beta_{M_w}$, a parameter which dictates the rate of the shift from α1 to α2.
Therefore, Eqs. (4), (5), and (8) determine $\log \eta(M_w, T, \dot{\gamma})$. The predicted parameters n, $\dot{\gamma}_{cr}$, and $\beta_{\dot{\gamma}}$ determine the $\dot{\gamma}$ dependence in $\log \eta(M_w, T, \dot{\gamma})$, which is also a function of η0(Mw, T). The predicted parameters C1, C2, and Tr determine the T dependence in η0(Mw, T), which is also a function of $\eta_{M_w}$. The predicted parameters α1, α2, Mcr, $\beta_{M_w}$, and k1 determine the Mw dependence in $\eta_{M_w}$. The parameter outputs of the MLP are bounded to physically appropriate ranges (reported in Supplementary Table 1).
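Putting the pieces together, a compact sketch of the encoded graph; the parameter dictionary matches the illustrative PENN sketch in the Introduction, and the equation forms follow our reconstructions of Eqs. (4), (5), and (8).

```python
import torch

def log_eta(p, log_Mw, log_gdot, T):
    """Compose Eqs. (8), (5), and (4): Mw dependence, WLF temperature
    shift, then the smoothed zero-shear/shear-thinning crossover."""
    # Eq. (8): smoothed piecewise power law in Mw; k2 is implied by the
    # continuity of the two power laws at Mcr.
    H_mw = torch.sigmoid(p["beta_Mw"] * (log_Mw - p["log_Mcr"]))
    log_eta_mw = (p["log_k1"] + p["alpha1"] * log_Mw
                  + H_mw * (p["alpha2"] - p["alpha1"])
                  * (log_Mw - p["log_Mcr"]))

    # Eq. (5): WLF shift relative to the reference temperature Tr.
    dT = T - p["Tr"]
    log_eta0 = log_eta_mw - p["C1"] * dT / (p["C2"] + dT)

    # Eq. (4): smoothed crossover from zero-shear to shear thinning.
    H_g = torch.sigmoid(p["beta_gdot"] * (log_gdot - p["log_gdot_cr"]))
    return log_eta0 - H_g * (1.0 - p["n"]) * (log_gdot - p["log_gdot_cr"])
```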
η dependence on PDI
The dispersity of molecular weights in a polymer melt affects the bulk motion of polymer chains43. For example, a short and a long chain may diffuse differently compared to two medium-sized chains. Therefore, using just the Mw without any knowledge of dispersity can mislead the ML model. We account for dispersity by using the polydispersity index (PDI),

$$\mathrm{PDI} = \frac{M_w}{M_n},$$
where Mn is the number average molecular weight. Empirical models for this relationship22,24 may require detailed information on the specific shape of the molecular weight distribution of a polymer melt. Not all of our data points contain proper information on PDI (as discussed in the Results section), so we do not directly encode η–PDI trends within the computational graph. Instead, the PDI could affect the transitions in the critical regimes of the η0–Mw relationship and the η–$\dot{\gamma}$ relationship (when $\dot{\gamma} = \dot{\gamma}_{cr}$ or Mw = Mcr)22,23,24. We incorporate this effect through the parameters $\beta_{M_w}$ and $\beta_{\dot{\gamma}}$ (described in Table 1). A higher value of $\beta_{M_w}$ or $\beta_{\dot{\gamma}}$ creates a quicker transition within the respective critical regime.
PENN training
This entire PENN architecture is trained, in part, to minimize the error of the viscosity predictions. The sum of these errors across all n training points is called the viscosity loss $\mathcal{L}_\eta$, defined in Eq. (9). Each data point is denoted by its index i.
During training, we add loss terms (see Eq. (10)) to penalize the predicted α1 and α2 for the ith training point ($\hat{\alpha}_{1,i}$ and $\hat{\alpha}_{2,i}$, respectively) for deviating from their average values. The viscosity loss plus the penalty terms form the total loss $\mathcal{L}$.
$w_\alpha$ is a hyperparameter that controls the impact that the known values of the α1 and α2 parameters have on the final loss.
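A sketch of the combined loss; we assume a squared-error form on log-scaled viscosity for $\mathcal{L}_\eta$ and squared deviations of the predicted slopes from the accepted values of 1 and 3.4 for the penalty, which is our reading of Eqs. (9) and (10).

```python
import torch

def penn_loss(log_eta_pred, log_eta_true, alpha1_pred, alpha2_pred,
              w_alpha=0.01, alpha1_ref=1.0, alpha2_ref=3.4):
    """Viscosity loss plus the alpha-slope penalty; w_alpha and the
    reference slopes here are illustrative."""
    loss_eta = torch.mean((log_eta_pred - log_eta_true) ** 2)
    penalty = torch.mean((alpha1_pred - alpha1_ref) ** 2
                         + (alpha2_pred - alpha2_ref) ** 2)
    return loss_eta + w_alpha * penalty
```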
Machine learning approaches
The PENN and ANN models were implemented in PyTorch49. All models were trained on the same 9:1 (train:test) split. Before training, the features and η were scaled to a range of (−1, 1). The polymer fingerprint, PDI, and temperature were scaled with the Scikit-learn MinMaxScaler50. $\dot{\gamma}$ was scaled by first adding a small value of $10^{-5}$, taking the $\log_{10}$, and then scaling to (−1, 1). Mw was scaled by taking the $\log_{10}$ value and then scaling to (−1, 1). For the PENN, $\log M_w$ and $\log \dot{\gamma}$ use the same scaling bounds as η.
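The preprocessing can be transcribed directly; only the packaging into one helper is ours.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def scale_inputs(fingerprints, pdi, T, Mw, gdot, eta):
    """Scale features and targets to (-1, 1) as described above, returning
    the scaled arrays plus the fitted scalers for inverse transforms."""
    def minmax(x):
        s = MinMaxScaler(feature_range=(-1, 1))
        x = np.asarray(x, dtype=float)
        return s.fit_transform(x.reshape(len(x), -1)), s

    X, x_scaler = minmax(np.column_stack([fingerprints, pdi, T]))
    log_mw, mw_scaler = minmax(np.log10(Mw))
    # Offset by 1e-5 so zero-shear (gdot = 0) points survive the log.
    log_gdot, g_scaler = minmax(np.log10(np.asarray(gdot) + 1e-5))
    log_eta, e_scaler = minmax(np.log10(eta))
    return X, log_mw, log_gdot, log_eta, (x_scaler, mw_scaler,
                                          g_scaler, e_scaler)
```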
Within the training set, 10-fold cross-validation (CV) was used to ensure that the models did not overfit the training set. Separate ANN and PENN models were trained for each CV split. Hyperparameter optimization was performed using the Hyperband51 algorithm over each CV fold for both the ANN and the PENN models, using RayTune52 implementations. The ANN and PENN models, both containing 4 layers (including 2 hidden layers), involved optimization of the same hyperparameters: layer 1 size (64, 128, 256, 512), layer 1 dropout (0, 0.01, 0.015, 0.02, 0.025, 0.03), layer 2 size (64, 128, 256, 512), layer 2 dropout (0, 0.01, 0.015, 0.02, 0.025, 0.03), and weight decay (0.00001, 0.00005, 0.0001, 0.0005, 0.001). For the PENN, $w_\alpha$ (0.001, 0.005, 0.01, 0.03, 0.05) was also optimized. The hyperparameter values corresponding to the lowest $\mathcal{L}_\eta$ (Eq. (9)) on the CV test split were used.
The Adam optimizer was used to train the models, with a learning rate (LR) reduction by a factor of 0.5 on a plateau of the validation loss, with a patience of 20 epochs. An initial LR of 0.0001 was used for the PENN; empirically, we found that PENN tuning was sensitive to a high LR. The initial LR for the ANN was 0.001. Training was stopped early if the validation loss did not improve for 25 epochs.
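A sketch of this optimization loop (PENN settings; the ANN would use an initial LR of 0.001). The `training_loss` hook and data handling are placeholders.

```python
import torch

def train(model, loader, validate, max_epochs=1000, lr=1e-4,
          weight_decay=1e-4, patience=25):
    """Adam with LR halving on a validation plateau plus early stopping."""
    opt = torch.optim.Adam(model.parameters(), lr=lr,
                           weight_decay=weight_decay)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
        opt, factor=0.5, patience=20)   # halve LR when validation stalls
    best, stale = float("inf"), 0
    for epoch in range(max_epochs):
        for batch in loader:
            opt.zero_grad()
            loss = model.training_loss(batch)  # e.g. penn_loss above
            loss.backward()
            opt.step()
        val = validate(model)
        sched.step(val)
        best, stale = (val, 0) if val < best else (best, stale + 1)
        if stale >= patience:  # stop after 25 epochs without improvement
            break
```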
The GPR model was implemented using Scikit-learn50 and trained using Bayesian optimization to tune key hyperparameters. The optimized hyperparameters include the noise level (alpha) with a range of [$10^{-2}$, $10^1$], the length scale of the RBF kernel (length_scale) with a range of [$10^{-2}$, $10^2$], and the constant value used in the kernel (constant_value) with a range of [$10^{-2}$, $10^2$], each with a log-uniform prior. The optimization was performed over 50 iterations for each of the 10 cross-validation folds, with the best-performing hyperparameters selected based on the results. The scaling of the inputs and outputs of the GPR was the same as for the ANN.
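The baseline itself reduces to a few lines of Scikit-learn; the default values below are illustrative mid-range points of the stated search ranges, not the optimized ones.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def make_gpr(constant_value=1.0, length_scale=1.0, alpha=0.1):
    """GPR with the three tuned hyperparameters named above."""
    kernel = ConstantKernel(constant_value=constant_value) \
        * RBF(length_scale=length_scale)
    return GaussianProcessRegressor(kernel=kernel, alpha=alpha)
```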