Deep-learning real-time phase retrieval of imperfect diffraction patterns from X-ray free-electron lasers

Introduction

The phase problem, a well-known inverse problem involving the extraction of phase information hidden in the interference fringes of measured intensities, is prevalent in nature and complicates the direct interpretation of diffraction signals from target objects1. Its impact spans research modalities including X-ray crystallography and high-resolution imaging, driving active development of methods to recover the lost phase information. While several methods with good performance have been developed, they often require significant data-handling time, and their effectiveness depends on the completeness of the measured data, noise contamination, and technique-specific challenges in data collection. Deep learning (DL) has shown considerable potential in addressing these issues2,3. After appropriate training, it reduces processing time by replacing conventional iterative approaches with non-iterative operations that can be accelerated with graphics processing units (GPUs). Owing to these advantages, substantial efforts have been directed toward using DL for denoising, classification, and phase retrieval4,5,6,7,8,9,10,11,12, though these tasks remain challenging in X-ray diffraction.

Recent advancements in the development of brighter X-ray sources that provide ultrashort X-ray pulses, such as X-ray free-electron lasers (XFELs), have significantly enhanced the ability to observe ultrafast molecular bonding processes, transient material dynamics, and hidden material phases in strongly driven nonequilibrium states13,14,15,16. Diffraction imaging, which retrieves phase information through numerical iterations as an optimization process, holds great promise for determining the structure of single specimens. However, the diffraction signals, often plagued by low signal-to-noise ratios due to limited photons and data imperfections, have constrained the practical application of DL for interpreting experimental data17.

In this study, we propose a new deep neural network (DNN) for phase retrieval of imperfect diffraction patterns, enabling real-time image reconstruction for single-particle imaging experiments using XFELs. The network is based on a residual neural network (ResNet) with weight-corrected convolution layers designed to handle diffraction signals18. It was trained using masked diffraction patterns as inputs, which were generated from pseudo-random objects. We imposed a nonnegative real-value constraint on the images, which is legitimate in X-ray diffraction from nanoparticles with weak scattering and negligible absorption19,20. We demonstrated the network's excellent performance on simulated data by comparing it to conventional iterative phase retrieval algorithms. After verifying its effectiveness, we applied the network to single-pulse diffraction data obtained with XFELs, where it exhibited robust real-time image reconstruction with improved image quality. By providing a solution to the phase problem in X-ray diffraction imaging, this study addresses a significant bottleneck in data processing time by eliminating the need for computationally expensive iterative phase retrievals. With advancements in the development of new light sources that offer high brilliance and repetition rates, data accumulation rates have increased exponentially, necessitating rapid data processing. We believe that the proposed DNN method will serve as a crucial basis for advancing scientific discoveries through effective data mining.

Results

DNN for the phase problem of diffraction patterns

DNNs deliver superior performance on computer vision tasks using pretrained parameters and have become increasingly efficient through combinations of convolution operations21. Various convolution types have been developed to enhance performance for specific applications. Depth-wise separable convolution (DSC) is an efficient convolution method, typically using ten times fewer parameters than plain convolution22. Partial convolution (PC) functions as a mask-aware convolution, allocating occluded data based on known data23. Fast Fourier convolution (FFC) provides a global receptive field through an additional convolution of the Fourier transform of the input24. Recently, a ResNet-based DNN named LaMa has demonstrated exceptional performance in image inpainting, even with large masks25. LaMa’s straightforward architecture, which includes downscaling layers, residual blocks, and upscaling layers, employs FFC across the entire network to leverage the global receptive field. Building on this basic architecture, we introduce deep phase retrieval (DPR), a new DNN featuring an encoder–decoder architecture with two novel operations: an encoder with weighted partial convolutions (WPC) and a two-stage decoder with intermediate Fourier modulation (Fig. 1b and Supplementary Table 1). DPR is a promising network for the immediate reconstruction of imperfect, noisy diffraction patterns, utilizing WPC and FFC to reflect the nature of X-ray diffraction.

Fig. 1: DNN for real-time phase retrieval of imperfect single-pulse diffraction patterns.

a Schematic diagram of data generation and the network training. b Schematic diagram of DPR. The network consists of a WPC-based encoder and a two-stage decoder, including a base decoder and a diffraction-compensated decoder (+D). c, d Evolution of validation loss during training iterations (c) and evaluation metrics (d) for WPC-, PC-, and FFC-based encoders with and without +D. The boxes and whiskers represent the average and standard errors of each metric, respectively. Differences with WPC + D are verified by the Mann–Whitney U test (not indicated, p ≤ 10⁻⁸; ***10⁻⁸ < p ≤ 0.001; **0.001 < p ≤ 0.01; *0.01 < p ≤ 0.05; ns, p > 0.05).


While PC distributes known information equally to missing values within convolving regions, WPC employs a physics-based approach, assigning position-dependent weights based on the Guinier–Porod model, which describes the radial intensity distribution in the small-angle region (see Methods)26. As the diffraction intensity typically decreases as \(Q^{-4}\) in the momentum transfer \({\boldsymbol{Q}}\) \((={{\boldsymbol{k}}}_{f}-{{\boldsymbol{k}}}_{i})\), where \({{\boldsymbol{k}}}_{i}\) and \({{\boldsymbol{k}}}_{f}\) are the wave vectors of the incoming and outgoing light, WPC assigns \(Q\)-dependent weight factors from the Guinier–Porod model for a smooth sphere to the known values during the PC operation. In the two-stage decoder, the diffraction-compensated decoder, which performs Fourier modulation before the unit blocks of the residual structure, is connected serially to the base decoder. The Fourier modulation replaces the Fourier magnitudes of the primary outputs with the measured input magnitudes. This operation restores the initial diffraction patterns, which attenuate through the deep layers of the network, aiding the accurate generation of Fourier-transform pairs. Additionally, FFC operates analogously to conventional phase retrieval algorithms, which iteratively connect Fourier-space information using the discrete Fourier transform (DFT) and inverse DFT between diffraction patterns and objects. These components work together to reconstruct the lost phase information from imperfect, photon-limited diffraction data.
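In numpy terms, the magnitude-replacement step can be sketched as follows. This is a minimal illustration of the operation only, not the authors' network code, and the function name is ours:

```python
import numpy as np

def fourier_modulate(x, measured_mag):
    """Replace the Fourier magnitude of an intermediate output `x` with the
    measured diffraction magnitude while keeping the current phase estimate."""
    F = np.fft.fft2(x)
    phase = np.exp(1j * np.angle(F))
    return np.fft.ifft2(measured_mag * phase)
```

Re-imposing the input magnitudes in this way restores diffraction information that would otherwise attenuate through the deep layers of the network.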

For dataset generation, we constructed a diffraction model reflecting experimental conditions and applied it to pseudo-random objects (see Methods). The shape and density of the objects were assigned separately from two pre-existing datasets, EMNIST (handwritten character digits) and CIFAR-100 (images of real-world objects), respectively, with proper image transformations27,28. These objects maximally exclude physical or human biases and, for the goal of DPR (finding real-space pairs corresponding to measured diffraction patterns from general objects), are more appropriate than projected images of plausible three-dimensional objects based on specific models. Diffraction patterns were generated by performing a fast Fourier transform (FFT) according to the diffraction model, with additional operations accounting for experimental conditions such as spatial coherence, limited photon counts, and measurement noise. The X-ray scattering of the sample was treated within the first-order Born approximation, ignoring multiple scattering, which is valid in the X-ray regime19. The approximation holds when the sample thickness satisfies \(D\lesssim 2\pi \lambda C/\left|1-n\right|\), where \(n\) is the complex refractive index and \(C\approx 0.2\); this limit is typically a few microns in the X-ray regime, much larger than the size of typical nanoparticles29. The resulting patterns were then partially obscured with irregular masks, simulating data loss due to pixel arrangements on a detector and a central beam stop that blocks the intense direct beam in actual measurements. These patterns were fed into the network as inputs (Fig. 1a). We used the AdamW optimizer with a custom-designed loss function for backpropagation through the network (see Methods)30.
The loss function consisted of the mean absolute error (MAE) of the final outputs, \({\mathcal{L}}\), the MAE of their gradients, \({{\mathcal{L}}}_{\rm{grad}}\), the perceptual loss, \({{\mathcal{L}}}_{\rm{perc}}\), and the R-factor with the ground-truth Fourier magnitudes, \({R}_{\rm{F}}^{\rm{GT}}\)31. After an ablation study of each component, the final loss function was settled as \({{\mathcal{L}}}_{\rm{total}}={\mathcal{L}}+10{{\mathcal{L}}}_{\rm{grad}}+0.1{{\mathcal{L}}}_{\rm{perc}}+0.01{R}_{\rm{F}}^{\rm{GT}}\) (Supplementary Fig. 1).

DPR in phase retrievals and evaluation of its performance

We first validated the improved performance of the WPC-based encoder and the diffraction-compensated decoder within the DPR framework. Compared to encoders based on PC and FFC, the WPC-based encoder exhibited lower validation loss during training and showed significant improvements in the R-factor (\({R}_{\rm{F}}\)) while maintaining or even surpassing the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) (see Methods, Fig. 1c, d). PSNR and SSIM are well-known metrics commonly used to evaluate image quality against given reference images. Because noise, including intrinsic shot noise, dominates single-pulse diffraction signals, evaluating the validity of image reconstructions solely with \({R}_{\rm{F}}\), a normalized MAE of the Fourier magnitudes, is insufficient. To avoid such inaccurate evaluation, we employed PSNR and SSIM as additional metrics that assess validity in real space using ground-truth images, complementing the limitations of \({R}_{\rm{F}}\). Additionally, the two-stage decoder including the diffraction-compensated decoder outperformed a single base decoder with doubled residual blocks, despite having 43% fewer total trainable parameters (Fig. 1c, d). This confirms that DPR provides enhanced performance with an efficient architecture and effective handling of imperfect diffraction signals.

We also examined how the reconstruction performance of DPR depends on the size of the training dataset by successively doubling it from 12,000 to 192,000 (Fig. 2a). Although slight improvements were observed in \({R}_{\rm{F}}\), PSNR, and SSIM, the network performance quickly saturated with larger training datasets. This indicates that the current amount of training data is sufficient for DPR, given the efficiency of network training. After validating the DPR architecture, we compared its phase retrieval performance with that of conventional iterative projection algorithms, specifically hybrid input–output (HIO) and generalized proximal smoothing (GPS) (see Methods)32,33. We simulated diffraction patterns using the same diffraction model as for the training datasets, applying irregular masks. Additionally, we generated a second set of test data with a constant central mask covering 16 × 16 pixels at the center to compare scenarios with and without substantial data loss. We compared the image reconstructions from DPR, DPR with refinement, and the two conventional phase retrieval algorithms, HIO and GPS. Refinement for DPR involved a few iterations of GPS to obtain the final images (see Methods). The results demonstrate that DPR significantly outperformed HIO and GPS in reconstructing real-space images (Fig. 2b). While HIO and GPS produced indistinct images, especially with large masks, DPR consistently provided superior performance regardless of the masked areas. Moreover, DPR with refinement reconstructed more detailed features, though at the cost of increased noise from the diffraction signals.
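For context, a single HIO iteration can be sketched as follows. This is a textbook-style numpy sketch with an assumed known support and feedback parameter β, not the authors' implementation (their HIO settings are described in Methods):

```python
import numpy as np

def hio_step(x, measured_mag, support, beta=0.9):
    """One hybrid input-output (HIO) iteration with a nonnegativity constraint.
    x: current real-space estimate; measured_mag: measured Fourier magnitudes;
    support: boolean mask of the allowed object region."""
    # Fourier-space projection: impose measured magnitudes, keep current phases
    F = np.fft.fft2(x)
    x_p = np.real(np.fft.ifft2(measured_mag * np.exp(1j * np.angle(F))))
    # real-space update: accept x_p where constraints hold, damp it elsewhere
    ok = support & (x_p >= 0)
    return np.where(ok, x_p, x - beta * x_p)
```

Iterating this update from a random start drives the estimate toward consistency with both the measured magnitudes and the support, which is the baseline behavior DPR is compared against.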

Fig. 2: Phase retrieval performance of DPR on simulated data.

a Change in evaluation metrics for different sizes of the training dataset. b Examples of real-space image reconstructions from simulated data. Diffraction patterns were simulated using the diffraction model with irregular masks and a constant central mask. Reconstructed real-space images from each algorithm are compared with ground-truth (GT) images. Because all real-space images were constrained to be real-valued, the centrosymmetry of the Fourier magnitudes effectively reduces the impact of the irregular masks. c, d Comparisons of evaluation metrics (c) and processing times (d) between HIO and GPS, as representative conventional algorithms, DPR trained with the AdamW optimizer (DPR0), DPR trained with the AdamWR optimizer and ASAM (DPR), and DPR with refinement (DPR + R). The boxes and whiskers represent the average and standard errors of each metric, respectively. Differences with DPR are verified by the Mann–Whitney U test (not indicated, p ≤ 10⁻⁸; ***10⁻⁸ < p ≤ 0.001; **0.001 < p ≤ 0.01; ns, p > 0.05).


Overall, DPR demonstrated significantly better image quality, with higher PSNR and SSIM than GPS, indicating robust performance on noisy diffraction signals (Fig. 2c). Additionally, DPR exhibited similarly high PSNR and SSIM for both the irregular and constant central mask cases, highlighting its superior performance in scenarios involving partial data loss. DPR with refinement achieved a significantly lower \({R}_{\rm{F}}\), even lower than that of GPS, though at the expense of image quality. For further application to experimental data, the AdamWR optimizer with adaptive sharpness-aware minimization (ASAM) was employed to mitigate failures by improving generalization (see Methods)30,34. This approach also led to a notable improvement in the quality of the reconstructed images from the test data (Fig. 2c).

DPR also benefits from having fewer trainable parameters (1.52 × 10⁷ in total) than conventional deep convolutional neural networks, despite the large input size of 512 × 512. It efficiently addresses complex phase problems using WPC and FFC, handles imperfect diffraction signals with an appropriate physical model, and utilizes Fourier-space information in a manner similar to conventional phase retrieval methods. The processing times were 9.02 ± 0.00215 ms and 52.2 ± 2.45 ms per pattern for DPR and DPR with refinement, respectively, using a single NVIDIA GeForce RTX 3090 GPU (Fig. 2d). This represents a more than 1000-fold speedup over conventional iterative phase retrieval algorithms, underscoring the suitability of DPR for real-time processing of data from upcoming MHz-repetition-rate XFELs35.

DPR in phase retrievals of experimental data from the XFEL

After demonstrating its performance on simulated data, DPR was applied to experimental data obtained from XFELs. Single-pulse X-ray diffraction imaging experiments were conducted at the Pohang Accelerator Laboratory-XFEL (PAL-XFEL) (see Methods). X-rays with a photon energy of 5 keV and a spectral bandwidth of \(\Delta E/E\approx 5\times 10^{-3}\) were focused into a 5 µm (horizontal) × 7 µm (vertical) area, giving an effective photon flux of 8 × 10⁹ photons·µm⁻² per pulse. Single-pulse diffraction patterns were recorded by a 1-megapixel multi-port charge-coupled device (CCD) detector with a pixel size of 50 × 50 µm² positioned 1.60 m downstream of the sample, providing a pixel resolution of 15.5 nm for a 512 × 512 window. Specimens of Ag nanoparticles, with characteristic flower and cube shapes, were randomly dispersed and mounted on thin Si₃N₄ membranes for measurement.

Real-space images were directly obtained from the diffraction signals using the as-trained DPR without any fine-tuning or additional data processing (Fig. 3a). The central regions of the diffraction patterns were obscured by a beam stop and strong parasitic scattering near the direct beam. The fringe oscillations from the specimen appeared blurry due to the experimental conditions, including imperfect spatial coherence and other signal contaminations. Despite these challenges, DPR successfully extracted accurate images from the measured diffraction patterns. The images obtained with DPR and DPR with refinement showed distinct shapes with relatively high contrast compared to the results from the conventional iterative algorithms, HIO and GPS. Since reference real-space images are unavailable for experimental data, a local \({R}_{\rm{F}}\), i.e., a pixelwise \({R}_{\rm{F}}\), was introduced for a more detailed evaluation of the results (see Methods). Notably, DPR is not biased toward low-\(Q\) signals near the diffraction center, which represent a significant portion of the total diffraction intensity, resulting in a lower local \({R}_{\rm{F}}\) for high-\(Q\) signals (Fig. 3a and Supplementary Fig. 2a). Since the diffraction signals in the high-\(Q\) region provide high-resolution information on internal structures, DPR produces real-space images with clearer shapes and more detailed structures.

Fig. 3: DPR on data from single-pulse X-ray diffraction imaging experiments using the XFEL.

a Comparisons of reconstruction results from diffraction patterns of Ag flower, double Ag flower, and Ag cube nanoparticles measured at PAL-XFEL. The first row displays the measured single-pulse diffraction patterns, with the central part blocked by the beam stop and missing data along vertical line gaps due to detector chip arrangements. Phase retrievals using DPR, DPR with refinement (DPR + R), HIO, and GPS were performed, and the reconstructed images are shown in the left column beneath each diffraction pattern. Local \({R}_{\rm{F}}\) distributions, derived from the Fourier magnitudes of the images, are presented in color in the right columns. Scale bars represent 150 nm. b Pairwise Pearson correlation coefficients (PCCs) for all pairs of results obtained using each method. The boxes and whiskers represent PCCs and their confidence intervals at a 95% confidence level.


The results showed strong positive correlations (above 0.8) with those from conventional algorithms, indicating a high degree of agreement in their morphologies (Fig. 3b). An important advantage of the DNN method is that DPR does not require support constraints: it directly converts Fourier-space data into corresponding real-space objects without additional information, unlike conventional phase retrieval algorithms that require support estimation for real-space constraints. As a hybrid option, refined DPR, with 50 iterations of GPS after DPR, achieved an improved \({R}_{\rm{F}}\), even lower than the GPS result after a thousand iterations (Supplementary Fig. 2a). This indicates that DPR enables efficient optimization by providing starting points already close to the global minimum.

To further evaluate the phase retrieval performance of DPR on general single-pulse diffraction data, we applied it to public datasets from the Coherent X-ray Imaging Data Bank (CXIDB)36. We obtained three datasets, i.e., chlorovirus PBCV-1, bacteriophage T4, and Fe₂O₃ ellipsoid nanoparticles, from the repository37. In these experiments, X-rays with a photon energy of 1.2 keV were used, and diffraction signals were measured with a 1-megapixel pnCCD with a pixel size of 75 × 75 µm² positioned 0.738 m downstream of the sample. This setup provided ideal pixel resolutions of 19.9 nm with a 512 × 512 window and 9.93 nm after 2 × 2 binning for the Fe₂O₃ ellipsoid dataset.

Despite the completely different samples and experimental conditions, real-space images were successfully obtained using DPR (Fig. 4a). The images produced by DPR exhibited distinct shapes with clear internal structures and higher contrast compared to those from conventional algorithms. DPR generated real-space objects that were better aligned with high-\(Q\) diffraction signals and demonstrated strong positive correlations with the results from conventional algorithms, similar to the findings from our own experiments (Fig. 4a, b, and Supplementary Fig. 2b). Thus, DPR was validated as effective for extracting real-space information from diffraction patterns, showing robustness to experimental noise and partial data loss. Notably, DPR was trained without any physical bias and did not require fine-tuning for different types of samples. The consistently improved performance across various datasets confirms the general applicability of DPR. This method enables rapid reconstruction of real-space images from imperfect, noisy diffraction patterns within 10 ms using a single GPU, regardless of experimental conditions or sample types, achieving real-time phase retrieval for XFEL data. In addition, although multiple-scattering effects are ignored in the present DPR, they can be incorporated by modifying the diffraction model with the multislice method38. As DNNs have shown superior performance on various types of data buried in strong noise, even succeeding at phase retrieval of very-low-photon-count measurements in the optical regime, DPR also has the potential to handle diffraction data with extreme noise levels when trained on much weaker diffraction datasets39. Moreover, the techniques employed in DPR, such as WPC, are not limited to phase retrieval but are applicable to various problems in X-ray diffraction experiments, including classification and denoising of measured data.

Fig. 4: DPR on publicly available single-pulse X-ray diffraction data.

a Comparisons of reconstruction results from diffraction patterns of chlorovirus PBCV-1, bacteriophage T4, and Fe₂O₃ ellipsoid nanoparticles from CXIDB. Measured diffraction patterns served as inputs, with masked regions not recorded due to detector chip arrangements. For Fe₂O₃ ellipsoid nanoparticles, 2 × 2 binning was applied to reduce the oversampling ratio before phase retrieval. Phase retrievals using DPR, DPR with refinement (DPR + R), HIO, and GPS were compared. Reconstructed images for each phase retrieval method are displayed in the left column beneath the diffraction patterns in the first row. Local \({R}_{\rm{F}}\) distributions, derived from the Fourier magnitudes of the images, are shown in color in the right columns. Scale bars represent 200 nm. b Pairwise PCCs for all pairs of results obtained using each method. The boxes and whiskers represent PCCs and their confidence intervals at a 95% confidence level.


Discussion

The DNN with the newly proposed architecture excels in solving the phase problem, demonstrating outstanding performance in the phase retrieval of X-ray diffraction patterns. Notably, this network shows excellent tolerance to experimental noise and partial data loss. When applied to single-pulse XFEL diffraction patterns, it achieves rapid and direct reconstruction of real-space images, enabling real-time phase retrieval. High-speed data processing is increasingly important because next-generation X-ray sources generate large volumes of data in a short time, and DPR is well suited to handle such extensive datasets. While the present form of DPR is not designed for complex-valued or 3D objects, it can be readily adapted to such datasets with simple modifications: generating two-channel outputs, with the channels corresponding to the real and imaginary parts, for complex-valued objects, or replacing all 2D operations in DPR with 3D ones for 3D diffraction data. It is also possible to reconstruct 3D objects from 2D images by using real-space tomographic reconstruction algorithms such as RESIRE40. On the other hand, because DPR does not account for correlations between input diffraction patterns, it cannot yet reconstruct 3D objects from correlated 2D diffraction patterns of a specimen in random orientations, which limits its use in 3D single-particle imaging41,42.

The WPC, which utilizes the Guinier–Porod model to guide lost information, highlights the importance of properly handling diffraction data to extract structural information. Despite the WPC-based encoder comprising only 10% of the total trainable parameters in DPR, this approach can be easily adapted to various types of incomplete experimental data, such as X-ray absorption or emission data, by applying appropriate physical models for further improvements in DL-based operations. Thus, DPR not only provides real-time phase retrieval for imperfect diffraction patterns but also represents a novel method for managing partially damaged data from various experiments with distinctive characteristics. The approach is particularly relevant for time-resolved diffraction imaging with high-repetition-rate XFELs, allowing observation of femtosecond dynamics in systems driven far from equilibrium, thus revealing hidden material phases not accessible through equilibrium thermodynamics. DPR is poised to significantly advance this research area by fully utilizing massive datasets in parallel with data collection.

Methods

Weighted partial convolution

Building on the concept of PC, WPC incorporates position-dependent weights based on the Guinier–Porod model, which describes the radial intensity distribution in small-angle scattering data23,26. For an ideal sphere with a smooth surface, the Guinier–Porod model provides the relationship between intensity \(I\) and momentum transfer \(Q\) as follows:

$$I\left(Q\right)=\left\{\begin{array}{ll}G\exp\left(-\dfrac{R^{2}Q^{2}}{5}\right), & \mathrm{for}\ Q\le Q_{1}\\ G\exp\left(-\dfrac{R^{2}Q_{1}^{2}}{5}\right)\left(\dfrac{Q_{1}}{Q}\right)^{4}, & \mathrm{for}\ Q>Q_{1}\end{array}\right.$$
(1)

where \(G\) is the Guinier scale factor; \(R\) is the sphere radius; and \(Q_{1}\) is the boundary momentum transfer between the Guinier and Porod regimes, defined as \(Q_{1}=\sqrt{10}/R\). Here, \(R\) is given by \(\pi/\sigma\), where \(\sigma\) is the oversampling ratio along an axis, to match a unit of momentum transfer with a pixel of the measured diffraction pattern. The position-dependent weights of the WPC were determined using Eq. (1), with \(\sigma=\min\left(H,W\right)/64\), where \(H\) and \(W\) are the height and width of the input for each layer, respectively, and 64 represents the matrix size allocated for the final real-space images. The operation of WPC with convolution kernel \({\boldsymbol{K}}\) is defined as

$$x^{\prime}=\left\{\begin{array}{ll}{\boldsymbol{K}}^{T}\left({\boldsymbol{X}}\odot{\boldsymbol{M}}\right)\dfrac{\sum_{i}{\boldsymbol{W}}_{i}}{\sum_{{\boldsymbol{M}}_{i}\ne 0}{\boldsymbol{W}}_{i}}, & \mathrm{if}\ \sum_{i}{\boldsymbol{M}}_{i}\ne 0\\ 0, & \mathrm{otherwise}\end{array}\right.$$
(2)

where \({\boldsymbol{X}}\) is the input, \({\boldsymbol{M}}\) is the binary mask for the valid data points, \({\boldsymbol{W}}\) is the weight in the region covered by the kernel during convolution, and \(\odot\) denotes element-wise multiplication.
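A minimal numpy sketch of Eqs. (1) and (2) for a single-channel input is given below. The \(Q\)-grid scaling and the naive loop are illustrative simplifications, and the names are ours, not the authors' implementation:

```python
import numpy as np

def guinier_porod_weights(H, W):
    """Position-dependent weights from the Guinier-Porod model (Eq. 1)
    for a smooth sphere, with G = 1 and R = pi / sigma."""
    sigma = min(H, W) / 64                 # oversampling ratio along an axis
    R = np.pi / sigma
    Q1 = np.sqrt(10) / R                   # Guinier-Porod crossover
    y, x = np.indices((H, W))
    # assumed Q grid: radial pixel distance from the pattern center
    Q = np.hypot(y - H / 2, x - W / 2) * (2 * np.pi / max(H, W))
    guinier = np.exp(-R**2 * Q**2 / 5)
    porod = np.exp(-R**2 * Q1**2 / 5) * (Q1 / np.clip(Q, 1e-12, None))**4
    return np.where(Q <= Q1, guinier, porod)

def weighted_partial_conv(X, M, K, Wt):
    """Naive single-kernel WPC (Eq. 2): the convolution over valid pixels is
    rescaled by the ratio of total to valid Guinier-Porod weights."""
    kH, kW = K.shape
    H, W = X.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            Xp = X[i:i + kH, j:j + kW]
            Mp = M[i:i + kH, j:j + kW]
            Wp = Wt[i:i + kH, j:j + kW]
            if Mp.any():
                out[i, j] = np.sum(K * Xp * Mp) * Wp.sum() / Wp[Mp > 0].sum()
    return out
```

With a fully valid mask the rescaling factor is 1 and WPC reduces to a plain convolution; occluded regions are compensated according to the physically expected intensity falloff rather than uniformly, as in PC.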

Diffraction model

The diffraction model generates diffraction patterns from objects, reflecting the properties of single-pulse X-ray diffraction imaging experiments using XFELs. Basic diffraction patterns are produced by taking the absolute square of the FFT of pseudo-random objects derived from a combination of EMNIST and CIFAR-100 datasets27,28. EMNIST consists of handwritten character digits that define the shapes of the objects, while CIFAR-100 includes images of real-world objects in 100 classes that provide internal density distributions. Specifically, EMNIST images are enlarged using maximum filters with random widths ranging from 3 to 7 pixels, then modified by affine transforms with random angles (0° to 90°) and scales (0.8 to 1.5), and finally cropped to 64 × 64 pixels. This results in an oversampling ratio of approximately 10 to 20 within a 512 × 512 window. CIFAR-100 images are cropped with random scales and aspect ratios ranging from 0.08 to 1 and 0.75 to 1.33, respectively, and then resized to 64 × 64 pixels. After generating the basic diffraction patterns, the Gaussian Schell model is used to account for the finite spatial coherence length of the radiation from the XFELs, as follows:

$${\boldsymbol{I}}^{\prime}=\left|\mathrm{FT}\left\{\mathrm{FT}^{-1}\left[{\boldsymbol{I}}\right]\odot\exp\left(-\dfrac{{\boldsymbol{r}}^{2}}{4\sigma_{\mu}^{2}}\right)\right\}\right|$$
(3)

where \({\boldsymbol{I}}\) is the diffraction pattern, \({\boldsymbol{r}}\) is the matrix of radial distances from the center, and \(\sigma_{\mu}\) is the spatial coherence length43. \(\sigma_{\mu}\) is set to 200 pixels with 10% random deviations. The diffraction patterns are then scaled to total diffraction intensities in the range of 10⁶–10⁷, and mixed Poisson–Gaussian noise is added to the patterns as follows:

$${\boldsymbol{I}}_{i}^{\prime}=\mathrm{Pois}\left({\boldsymbol{I}}_{i}\cdot\dfrac{{\boldsymbol{I}}_{\mathrm{total}}}{\sum_{j}{\boldsymbol{I}}_{j}}\right)+\mathcal{N}\left(0,\sigma\right)$$
(4)

where \(\mathrm{Pois}\left(\lambda\right)\) generates random values from a Poisson distribution with rate \(\lambda\), and \(\mathcal{N}\left(\mu,\sigma\right)\) generates random values from a normal distribution with mean \(\mu\) and standard deviation \(\sigma\). \(\sigma\) was set to 1/2.35482, giving a full width at half maximum of 1 for the Gaussian distribution. The final diffraction patterns were paired with random masks. These masks were created using a combination of center masks with random radii (ranging from 8 to 32 pixels) and positional deviations (ranging from −8 to 8 pixels along each axis), along with irregular masks from the NVIDIA Irregular Mask Dataset23. The occlusion ratio for the irregular masks was limited to 50%. The total number of generated patterns was 96,000 for training, 12,000 for validation, and 12,000 for testing.
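The pipeline of Eqs. (3) and (4) can be condensed into a short numpy sketch. The FFT-ordered radius grid and the fixed parameter values are our simplifications of the randomized settings described above, not the authors' exact generator:

```python
import numpy as np

def simulate_diffraction(obj, sigma_mu=200.0, total_intensity=1e6,
                         sigma_noise=1 / 2.35482, rng=None):
    """Diffraction model sketch: |FFT|^2 of the object, partial-coherence
    damping (Gaussian Schell model, Eq. 3), scaling to a photon budget,
    then mixed Poisson-Gaussian noise (Eq. 4)."""
    rng = np.random.default_rng() if rng is None else rng
    I = np.abs(np.fft.fft2(obj, s=(512, 512)))**2
    # Eq. (3): damp the autocorrelation by a Gaussian of the coherence length
    H, W = I.shape
    y, x = np.indices((H, W))
    r2 = np.minimum(y, H - y)**2 + np.minimum(x, W - x)**2  # FFT-ordered radii
    I = np.abs(np.fft.fft2(np.fft.ifft2(I) * np.exp(-r2 / (4 * sigma_mu**2))))
    # Eq. (4): scale to the target total intensity, add shot + detector noise
    I *= total_intensity / I.sum()
    return rng.poisson(I) + rng.normal(0.0, sigma_noise, I.shape)
```

Each generated pattern would then be multiplied by a random mask (center mask plus irregular mask) before being fed to the network.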

Loss function and network training

The loss function comprises the MAE, the MAE of image gradients, the perceptual loss, and \({R}_{\rm{F}}\) with the ground-truth Fourier magnitudes. These functions are defined as

$$\begin{array}{c}\mathcal{L}({\boldsymbol{X}},{\boldsymbol{Y}})=\dfrac{1}{N}\sum_{i=1}^{N}\left|{\boldsymbol{X}}_{i}-{\boldsymbol{Y}}_{i}\right|,\qquad \mathcal{L}_{\mathrm{grad}}({\boldsymbol{X}},{\boldsymbol{Y}})=\dfrac{\sum_{{\boldsymbol{Y}}_{i}\ne 0}\left\|\nabla{\boldsymbol{X}}_{i}-\nabla{\boldsymbol{Y}}_{i}\right\|_{1}}{\sum_{{\boldsymbol{Y}}_{i}\ne 0}1}\\ \mathcal{L}_{\mathrm{perc}}({\boldsymbol{X}},{\boldsymbol{Y}})=\mathcal{L}(\Phi[{\boldsymbol{X}}],\Phi[{\boldsymbol{Y}}]),\qquad R_{\mathrm{F}}^{\mathrm{GT}}({\boldsymbol{X}},{\boldsymbol{Y}})=\dfrac{\sum_{i}\left|\left|\mathrm{FT}[{\boldsymbol{X}}]\right|_{i}-\left|\mathrm{FT}[{\boldsymbol{Y}}]\right|_{i}\right|}{\sum_{i}\left|\mathrm{FT}[{\boldsymbol{Y}}]\right|_{i}}\end{array}$$
(5)

where \({\boldsymbol{X}}\) is the output from the network, \({\boldsymbol{Y}}\) is the target, and \(\Phi\) is the pretrained neural network. For the perceptual loss, intermediate outputs after the 4th and 5th blocks of ImageNet-pretrained VGG-19 were used31. Additional weights based on the square root of the total diffraction intensity were applied to the outputs to reduce the influence of weak data. Based on this loss function, the network was trained with the AdamW optimizer with \(\beta_{1}=0.9\), \(\beta_{2}=0.999\), and a weight decay of 0.0001 for 500 epochs followed by 100 epochs, with learning rates of 0.001 and 0.0001, respectively30. For the case using the AdamWR optimizer with ASAM, the ASAM parameters were set to \(\rho=0.2\) and \(\eta=0.01\); the learning rate was determined by cosine annealing with a warm-restart scheduler as \(\alpha_{i}=\alpha_{\min}+0.5\left(\alpha_{\max}-\alpha_{\min}\right)\left(1+\cos\left(\pi T/T_{i}\right)\right)\), where \(\alpha_{\min}=10^{-8}\), \(\alpha_{\max}=0.005\), \(T\) is the number of epochs after the most recent restart, and \(T_{i}\) is the number of epochs between two restarts, initially set to 40 and doubled after each restart34. Twelve NVIDIA GeForce RTX 3090 GPUs were used for network training.
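The warm-restart schedule can be written directly from the formula above (a minimal Python sketch; PyTorch's `CosineAnnealingWarmRestarts` provides an equivalent scheduler, and the function name here is ours):

```python
import math

def warm_restart_lr(epoch, alpha_min=1e-8, alpha_max=0.005, T0=40):
    """Learning rate for cosine annealing with warm restarts: the cycle
    length starts at T0 epochs and doubles after each restart."""
    T_i, t = T0, epoch
    while t >= T_i:          # locate the current restart cycle
        t -= T_i
        T_i *= 2
    return alpha_min + 0.5 * (alpha_max - alpha_min) * (1 + math.cos(math.pi * t / T_i))
```

The rate starts at \(\alpha_{\max}\), decays to \(\alpha_{\min}\) over each cycle, and jumps back to \(\alpha_{\max}\) at every restart (epochs 40, 120, 280, ...).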

Evaluation metrics

The performance of DPR was evaluated using three metrics: \(R_{\mathrm{F}}\), PSNR, and SSIM44. The metrics are defined as follows:

$$\begin{array}{c}R_{\mathrm{F}}\left(\boldsymbol{X},\boldsymbol{I}\right)=\frac{\sum_{i,\mathrm{valid}}\left|\,|\mathrm{FT}[\boldsymbol{X}]|_{i}-\sqrt{\boldsymbol{I}_{i}}\right|}{\sum_{i,\mathrm{valid}}\sqrt{\boldsymbol{I}_{i}}},\qquad \mathrm{PSNR}\left(\boldsymbol{X},\boldsymbol{Y}\right)=20\log_{10}\frac{\max\left(\boldsymbol{Y}\right)}{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\boldsymbol{X}_{i}-\boldsymbol{Y}_{i}\right)^{2}}}\\ \mathrm{SSIM}\left(\boldsymbol{X},\boldsymbol{Y}\right)=\frac{\left(2\mu_{\boldsymbol{X}}\mu_{\boldsymbol{Y}}+c_{1}\right)\left(2\sigma_{\boldsymbol{XY}}+c_{2}\right)}{\left(\mu_{\boldsymbol{X}}^{2}+\mu_{\boldsymbol{Y}}^{2}+c_{1}\right)\left(\sigma_{\boldsymbol{X}}^{2}+\sigma_{\boldsymbol{Y}}^{2}+c_{2}\right)}\end{array}$$
(6)

where \(\boldsymbol{X}\) is the output from the network, \(\boldsymbol{Y}\) is the target, \(\boldsymbol{I}\) is the diffraction pattern, \(\mu_{\boldsymbol{X}}\) is the mean of \(\boldsymbol{X}\), \(\sigma_{\boldsymbol{X}}^{2}\) is the variance of \(\boldsymbol{X}\), and \(\sigma_{\boldsymbol{XY}}\) is the covariance of \(\boldsymbol{X}\) and \(\boldsymbol{Y}\). \(c_{1}\) and \(c_{2}\) in SSIM are given by \(\left(0.01\max\left(\boldsymbol{Y}\right)\right)^{2}\) and \(\left(0.03\max\left(\boldsymbol{Y}\right)\right)^{2}\), respectively. A two-sided Mann–Whitney U test was also performed on the evaluation metrics to identify statistical differences in DPR. For cases involving experimental data, the local \(R_{\mathrm{F}}\) and Pearson correlation coefficients (PCCs) for all pairs were calculated. The local \(R_{\mathrm{F}}\) was calculated pixelwise for data points with photon counts exceeding 0.5, while the PCC was defined as \(\mathrm{PCC}\left(\boldsymbol{X},\boldsymbol{Y}\right)=\sigma_{\boldsymbol{XY}}/\sigma_{\boldsymbol{X}}\sigma_{\boldsymbol{Y}}\).
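The metrics in Eq. (6) can be sketched in a few lines of numpy (a simplified illustration, not the evaluation code used in the paper; SSIM is computed here over the whole image rather than in sliding windows):

```python
import numpy as np

def psnr(x, y):
    """Peak signal-to-noise ratio in dB, with the peak taken as max(Y)."""
    rmse = np.sqrt(np.mean((x - y) ** 2))
    return 20 * np.log10(np.max(y) / rmse)

def ssim_global(x, y):
    """Single-window SSIM with c1 = (0.01 max Y)^2 and c2 = (0.03 max Y)^2."""
    c1 = (0.01 * np.max(y)) ** 2
    c2 = (0.03 * np.max(y)) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def pcc(x, y):
    """Pearson correlation coefficient between two images."""
    return np.corrcoef(x.ravel(), y.ravel())[0, 1]
```

Note that a constant offset leaves the PCC at 1 but lowers the PSNR, which is why the paper reports several complementary metrics.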

Phase retrieval parameters

For phase retrieval using HIO and GPS, 1000 iterations were performed with 100 initial random phases32,33. The HIO algorithm was employed with \(\beta=0.9\), and the error-reduction algorithm accounted for 10% of the total iterations. GPS was executed as its R variant (GPS-R) with the following parameters: \(t=1\), \(s=0.9\), \(\sigma\) increasing from 0.01 by a factor of 10 at 40% and 70% of the total iterations, and \(\gamma=1/2\alpha^{2}\) with \(\alpha\) linearly decreasing from 1024 by 10% every 100 iterations. Both algorithms also used the shrink-wrap algorithm, with \(\sigma\) linearly decreasing from 3 pixels by 1% and a threshold of 20% of the maximum value to update the support constraints every 50 iterations. The initial supports were 60 × 60 pixels for the test data and 30 × 30 pixels for the experimental data. The final images were selected based on \(R_{\mathrm{F}}\): a single image for the test data and an average of five images for the experimental data. To refine the outputs from DPR, the support constraints were derived from the output images by thresholding at 1% of the 99th percentile values. Using these supports, GPS-R was conducted for 50 iterations with the following parameters: \(t=1\), \(s=0.9\), \(\sigma\) increasing from 0.1 to 1 at 40% of the total iterations, and \(\gamma=1/2\alpha^{2}\) with \(\alpha\) linearly decreasing from 1024 by 20% every 10 iterations.
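A single HIO iteration can be sketched as follows (a minimal illustration with the paper's \(\beta=0.9\) as the default; it omits the error-reduction steps, GPS-R, and the shrink-wrap support update, and the function name is hypothetical):

```python
import numpy as np

def hio_iteration(g, magnitudes, support, beta=0.9):
    """One hybrid input-output (HIO) iteration: enforce the measured
    Fourier magnitudes while keeping the current phases, then keep the
    result inside the support and relax pixels outside it."""
    G = np.fft.fft2(g)
    # Replace the modulus with the measured one, keep the phase.
    G_prime = magnitudes * np.exp(1j * np.angle(G))
    g_prime = np.real(np.fft.ifft2(G_prime))
    # Inside the support: accept g'; outside: feedback update g - beta * g'.
    return np.where(support, g_prime, g - beta * g_prime)
```

When the current estimate already satisfies both constraints, the update is a fixed point: feeding in the true object with its own Fourier magnitudes returns it unchanged.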

Single-pulse X-ray diffraction imaging experiments

The experiments were conducted at the nanocrystallography and coherent imaging (NCI) beamline of the PAL-XFEL45. X-ray pulses from self-amplified spontaneous emission with a nominal photon energy of 5 keV and a bandwidth of \(\Delta E/E\) ≈ 5 × 10−3 were used for the experiments. The X-ray pulses were focused into a 5 µm (horizontal) × 7 µm (vertical) area by a pair of Kirkpatrick–Baez mirrors installed 5 m upstream of the sample position, giving an effective photon flux of approximately 8 × 109 photons·μm−2 per pulse. Diffraction patterns were recorded using a 1-megapixel multi-port CCD with a pixel size of 50 × 50 μm2, located 1.6 m downstream of the sample position. A beam stop was placed in front of the detector to block the direct X-ray beam; it covered a quadrant of the detector plane. The samples were Ag flower and cube nanoparticles with approximate widths of 250 nm and 150 nm, respectively. These were spread on 100 nm-thick Si3N4 membranes and loaded into the imaging chamber. All beam paths, including the imaging chamber, were kept under vacuum during the measurements. Background signals were subtracted from the measured diffraction patterns, and multiple scattering effects were ignored based on the first-order Born approximation19. Missing values were substituted with values at centrosymmetric positions in accordance with Friedel’s law, ignoring the imaginary parts of the atomic form factors, which are much smaller than their real parts in these experiments. As the Ewald sphere curvature introduced differences of at most 3.20 × 10−3% and 3.38 × 10−2% in the in-plane components of the momentum transfers for the data from PAL-XFEL and CXIDB, respectively, its contribution was ignored. The out-of-plane components of the momentum transfers arising from the Ewald sphere curvature, up to 8.11 × 10−4 nm−1 and 2.06 × 10−3 nm−1, respectively, were also negligible considering the size of the samples (Supplementary Note 1).
