Phase change computational sensor

Introduction

Much of the information we perceive about the external world comes in the form of real-valued signals conveyed through sensory organs. These organs perform data pre-processing such as signal filtering, compression, amplification, and digitization^1,2,3,4. For example, neural circuits in the cochlea (retina) and cortex leverage non-volatile plasticity in synapses to pre-process auditory (visual) sequences, enabling subsequent downstream computations in the brain^1,5,6,7. This observation suggests that emerging brain-inspired non-von Neumann hardware concepts^8,9 could improve compute efficiency (in terms of energy and latency) and data privacy by incorporating processing capabilities within sensor units. Recent progress has demonstrated processing within sensors using three-terminal photodiodes based on 2D materials^10,11,12. These devices modulate the photoresponsivity of each pixel through the field effect. The first category of such devices uses an active gate signal, so the compute capability is lost when the gate signal is removed. This volatility necessitates buffers to store model weights (thereby strictly following the von Neumann architecture). In more recent demonstrations, charge-trapping effects have been proposed to program the photoresponsivities. While this approach benefits from non-volatility, it is generally challenged by poor cyclability and high voltage requirements¹³. One promising alternative is to decouple the sensing and compute elements within the commercial image pixel unit while still maintaining dense integration. Such an approach can enable more manufacturable computational sensors for certain in-sensor-in-memory processing tasks (see Fig. 1a).

**Fig. 1: Computational sensor concept.**

Here, we propose a computational sensor that utilizes embedded phase-change memory^8,9 (PCM). The key idea behind our approach is that two-terminal non-volatile memory technologies, such as PCM, can be readily integrated at the back end of the line with commercial sensors. Indeed, circuits using conductive ReRAM devices within image sensors have been proposed for tasks such as spike generation and averaging¹⁴, dynamic background subtraction¹⁵, image recording¹⁶, and adaptive dynamic range modulation¹⁷, among others. Broadly, in these applications, the conductance states of the devices are either fixed (to provide reference thresholds)^14,15, or they evolve dynamically and incrementally during exposure to the stimulus^16,17,18. An application in which memristive devices within active sensor units are pre-programmed to select states to enable real-time visual inference remains to be demonstrated. Here, we consider an image sensor that incorporates PCM computational memory devices^19,20,21,22 within its active m × n pixel array to perform dot-product operations for in-sensor visual inference. The PCM devices are pre-programmed in an analog manner to execute scalar multiplication on the photo-generated current, thereby transforming the pixel output into an effective computational result. The accumulation step, which adds the products from multiple pixels, is achieved by summing the outputs of neighboring pixels (whose extent is determined by the kernel size) in parallel along the interconnects of the sensor's crossbar. Hence, as a k × k kernel traverses a segment of the pixel array (see Fig. 1b), the corresponding (m, n) pixels can be read out and their values accumulated as output signals. Consequently, the sensor generates an image that represents a pre-computed version of the raw input.
This output can be passed downstream to PCM computational memory cores for subsequent processing, as has been previously suggested with other memristive-type memories^23,24,25,26,27.
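
The read-out scheme described above can be summarized with a short, idealized Python model. It is a sketch under stated assumptions rather than the sensor's circuit behavior: each pixel's PCM device is treated as a perfect scalar weight on its photo-signal, the summation along the crossbar interconnects is assumed lossless, and the stride equals the kernel size (s = k). The function name `in_sensor_conv` and the toy values are hypothetical.

```python
def in_sensor_conv(image, kernel):
    """Idealized in-sensor convolution with stride s = k (non-overlapping windows).

    image:  m x n nested list of photo-signals, one per pixel
    kernel: k x k nested list of weights, assumed pre-programmed into the
            PCM device of the corresponding pixel in every window
    """
    m, n, k = len(image), len(image[0]), len(kernel)
    out = []
    for r0 in range(0, m - k + 1, k):      # kernel advances k rows at a time
        row = []
        for c0 in range(0, n - k + 1, k):  # and k columns at a time
            # Each pixel scales its signal by its stored weight (multiply);
            # the k x k products are summed on the shared interconnect (accumulate).
            acc = sum(image[r0 + i][c0 + j] * kernel[i][j]
                      for i in range(k) for j in range(k))
            row.append(acc)
        out.append(row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
box_2x2 = [[0.25, 0.25],
           [0.25, 0.25]]  # averaging (blur) kernel, chosen for illustration
print(in_sensor_conv(image, box_2x2))  # [[3.5, 5.5], [11.5, 13.5]]
```

With s = k, each pixel contributes to exactly one output value, so a single stored weight per pixel (per filter) suffices in this model.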

Pixel characteristics

A PCM device integrated into the sensor's active pixel acts as a signal divider on the output (see Fig. 2a(i)), obeying $v_{m,n} = v_{mo,no} \times \frac{D_{m,n}}{D_{m,n} + G_{m,n(\kappa)}}$. Here, v_m,n is the output of pixel P_m,n in the array, G_m,n(κ) is the conductance programmed into the κ^th PCM device of the pixel, D_m,n is the conductance of the pixel, which scales with the input signal ϕ, and v_mo,no is the PCM-independent output of the pixel. Crucially, the expression implies that for a fixed input, v_m,n increases with decreasing G_m,n(κ), and for a fixed G_m,n(κ), it increases with increasing amplitude of ϕ. The device characteristics can be collectively represented using load lines (RL). Figure 2a(ii) shows the simulated current–voltage characteristics of a pixel under increasing light flux (ϕ_1→5) and decreasing G_m,n (RL_1→5). The plot illustrates that modulating G_m,n makes it possible to select the sensitivity and dynamic range of light detection at the individual pixel level. For instance, in bright conditions, a large dynamic range can be achieved to avoid pixel saturation (at the expense of sensitivity) using a high G_m,n, while in dark conditions, high sensitivity can be enabled for faint-signal detection (at the expense of dynamic range) using a small G_m,n.
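
The divider relation can be explored numerically. The following minimal Python sketch evaluates v_m,n = v_mo,no × D_m,n / (D_m,n + G_m,n(κ)) and, by solving the same relation for D, the pixel conductance at which the output reaches a chosen fraction of v_mo,no (used here as a simple proxy for saturation). The function names, the 90 % saturation criterion, and the conductance values are illustrative assumptions, not measured device parameters.

```python
def pixel_output(v0, d, g):
    """Signal-divider pixel output v = v0 * D / (D + G).

    v0: PCM-independent pixel output (V)
    d:  photo-dependent pixel conductance D (S); grows with light flux
    g:  conductance G programmed into the PCM device (S)
    """
    return v0 * d / (d + g)


def saturation_conductance(g, fraction=0.9):
    """Pixel conductance D at which v reaches `fraction` of v0.

    Solving fraction = D / (D + G) for D gives D = G * fraction / (1 - fraction),
    so a larger G pushes saturation to brighter light (larger dynamic range).
    """
    return g * fraction / (1.0 - fraction)


v0 = 1.0
g_low, g_high = 1e-6, 100e-6  # illustrative RESET-like and SET-like conductances (S)
d_dim = 1e-7                  # illustrative pixel conductance under faint light (S)

# Faint light: the low-G (high-sensitivity) setting gives a much larger output.
print(pixel_output(v0, d_dim, g_low), pixel_output(v0, d_dim, g_high))
# Bright light: the high-G setting saturates only at a 100x larger D.
print(saturation_conductance(g_low), saturation_conductance(g_high))
```

In this idealized model the trade-off in the text follows directly: the saturation point scales linearly with G, while the small-signal response under faint light falls as G grows.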

In our toy demonstration, a pixel comprises two separate components: a prototype circuit board that hosts the phototransistor circuitry, and a silicon chip containing isolated PCM devices (supporting information section S1 shows the setup and a SPICE simulation of the circuitry). The PCM device is of the contemporary mushroom type and uses an 80 nm thin film of Ge₂Sb₂Te₅ (GST) phase-change material. During read-out under illumination, the state of the PCM device modulates the output. Programming the state involves write operations, specifically electrical current pulses that induce Joule heating for the amorphization (RESET) and crystallization (SET) of the phase-change material. A PCM device can be programmed to various non-volatile conductance states by adjusting the amplitude of the programming pulses. Figure 2b shows the dependence of the pixel output on the conductance state of the PCM device. The experiment is conducted under constant illumination, and the measurement is repeated 10 times. The plot validates that the sensitivity of the phototransistor can be configured through the phase configuration of the PCM device. The diode can be persistently tuned to high sensitivity (HS) by programming the PCM device to RESET states and to low sensitivity (LS) by programming it to SET states. Furthermore, the extent of this tunability can be significant, constrained only by the memory window (G_Set − G_Reset) of the PCM device (in our measurements, a conductance compliance of 10 μS reduces this range by an order of magnitude, to ~30×).

Thus, a computational sensor unit enables detection optimized for changing environmental conditions via non-volatile modulation of the conductance states of its PCM devices, as highlighted in the inset of Fig. 2b. In this experiment, we read the pixel 1800 times for three conductance states of the PCM device (SET, partial RESET, RESET) to demonstrate the sensor's adaptability to varying brightness conditions. In Fig. 2c, we show how the sensor's output scales under different illumination conditions. In this measurement, the PCM device is configured to the SET state. The output increases proportionally with illumination intensity, owing to the rising photocurrent generated in the diode (the measurement is repeated 10 times). Given the expected low noise in the SET state of PCM devices, this measurement suggests that the spread in the output is primarily caused by peripheral components on the circuit board. In the inset of Fig. 2c, we plot the sensor's output immediately after programming its PCM device to a partial RESET state. The measurement extends over 1500 s and illustrates the stability of the output signal. This stability is attributed to two factors: the signal-divider read-out scheme (as opposed to the standard current read-out, in which conductance drift becomes prominent) and the pseudo-projection²⁸ provided by the conductance-limiting component in the pixel.

In-sensor convolutions

A prominent class of computational models that stands to gain from in-sensor computing is convolutional operations. With convolutions, images can be blurred, sharpened, or embossed for standalone use cases, or prepared in real time as formatted/pre-processed inputs for deep networks (see Fig. 3a), such as convolutional neural networks (CNNs). In a convolution between an image of dimension n × n and a filter of dimension k × k at unit stride, the kernel visits (n − k + 1)² positions, each requiring k² MAC (multiply-and-accumulate) operations. When n ≫ k, which is the typical case (e.g., a 1280 × 1024 pixel sensor using 16 × 16 canonical filters²⁹), the compute becomes very expensive. Therefore, one approach toward efficient hardware is to divide the computational effort between the sensor and the processor (see Fig. 3a–b). That is, by performing convolutions as the data are captured using in-sensor computing, the convolution operations of the first layer can be offloaded from the processor. As an example, using data gathered from our experimental setup, in Fig. 3c we simulate in-sensor convolutions for an image-blurring operation. Image blurring (or smoothing) acts as a point-spread operation that reduces noise and speckle in the input, and is a common pre-processing task.
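
The cost argument can be made concrete with a short count. The Python sketch below assumes a "valid" convolution (no padding) in which every kernel position costs k² MACs; the helper name `mac_count` is hypothetical. For comparison, it also evaluates the case in which the stride equals the kernel size.

```python
def mac_count(m, n, k, s=1):
    """MAC operations for a 'valid' k x k convolution over an m x n image.

    The kernel visits ((m - k) // s + 1) * ((n - k) // s + 1) positions,
    and each position needs k * k multiply-and-accumulate operations.
    """
    positions = ((m - k) // s + 1) * ((n - k) // s + 1)
    return positions * k * k


# Example from the text: a 1280 x 1024 sensor with a 16 x 16 filter.
print(mac_count(1280, 1024, 16))        # unit stride: 326754560
print(mac_count(1280, 1024, 16, s=16))  # stride equal to kernel size: 1310720
```

With a stride equal to the kernel size, each pixel is multiplied exactly once per filter, so the MAC count equals the number of pixels (1280 × 1024 = 1,310,720).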

**Fig. 3: In-sensor-in-memory operations.**

Additionally, depending on the circuit design, the accumulations can be performed either on the image sensor array (MAC_Sensor), which is the mode discussed so far, or on the word lines of a PCM computational memory array (MAC_PCM-tile) (supporting information section S2 illustrates these configurations). In either case, the optimal scenario for in-sensor convolutions is s ≥ k, where s is the fixed stride, i.e., the number of pixels by which the kernel shifts between subsequent MAC operations. This constraint has two benefits: (i) convolutional operations on all pixels in select rows can be carried out in parallel, reducing the computational complexity to O(c) (or O(fc) with f filters) under MAC_Sensor, where c(k, s) < m; and (ii) the number of PCM devices within each pixel can be kept to a minimum. For s = k, the number of PCM devices in a pixel scales with f, which simplifies the integration and arbitration schemes. In contrast, when s < k, the kernels overlap, leading to a loss of parallelization (owing to the requirement to toggle between different kernel values in the overlapping regions). Such overlaps also require a disproportionate number of PCM devices per pixel. For example, for s = 1, a pixel located at least k − 1 pixels away from every edge of the array requires f ⋅ k² PCM devices. Nonetheless, since n ≫ k is the typical condition, the constraint s = k may not be a limitation: the resolution of the output or the quality of the image transformation can be reasonably preserved.
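
The device-count argument can be checked by counting, for each pixel, how many kernel windows cover it. This Python sketch assumes one PCM device per filter per covering window; the helper names are hypothetical. It reproduces the two cases in the text: a single device per filter when s = k, and f ⋅ k² devices for interior pixels when s = 1.

```python
def windows_covering(i, length, k, s):
    """Number of kernel windows along one axis that contain index i."""
    return sum(1 for start in range(0, length - k + 1, s)
               if start <= i < start + k)


def pcm_devices_per_pixel(i, j, m, n, k, s, f):
    """PCM devices needed in pixel (i, j): one per filter per covering window."""
    return f * windows_covering(i, m, k, s) * windows_covering(j, n, k, s)


m = n = 9
k, f = 3, 2
# s = k: every pixel lies in exactly one window, so it needs f devices.
print({pcm_devices_per_pixel(i, j, m, n, k, k, f)
       for i in range(m) for j in range(n)})       # {2}
# s = 1: an interior pixel lies in k * k windows, so it needs f * k**2 devices.
print(pcm_devices_per_pixel(4, 4, m, n, k, 1, f))  # 18
print(pcm_devices_per_pixel(0, 0, m, n, k, 1, f))  # 2 (corner pixel)
```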

Model-based learning

Beyond contemporary CNNs, convolutional operations remain crucial in model-based vision, for instance in model-based object recognition, where the types and instances of a set of objects in a given scene are known beforehand. As an illustration, we consider lane/line detection in an image using the Hough transformation³⁰. The computational workflow involves image pre-processing (conducted through in-sensor convolutions, using the framework discussed earlier) followed by the downstream Hough transformation performed in computational memory (see Fig. 3d). To showcase this, we use the IBM HERMES Project Chip, fabricated in 14 nm complementary metal-oxide-semiconductor technology³¹ and featuring a 256 × 256 crossbar array of PCM unit cells.

The transformation converts each point (x, y) in the image to the parameter-space coordinate (r, θ) using the expression $r = x\cos(\theta) + y\sin(\theta)$, where r is the distance from the origin to the closest point on the straight line, and θ is the quantized angle between the x axis and $\vec{r}$, representing the line in the image. This operation is followed by a voting procedure in the accumulator space. The cells with the highest counts in the parameter space signify the most likely parameters describing a shape (supporting information section S3 discusses the implementation of the Hough transformation in more detail). As an initial step, we adapt these transformations for in-memory computation. This can be accomplished using in-memory matrix-vector multiplications (MVMs) for the parameter space and conductance accumulations to implement the accumulator space. Interestingly, the same task uses two computational primitives of PCM devices that are otherwise used separately: scalar multiplication from the multilevel conductance values and the accumulative behavior arising from crystallization dynamics^32,33. In the MVM, columns of the crossbar array are assigned θ_n values, such that m × n PCM devices can encode fixed values of cos(θ_n) and sin(θ_n). In this way, parallel MAC operations are performed on the inputs, and the outputs represent the r(θ) values. The accumulator operation is then performed in a computational memory array whose elements represent the (θ, r) tuples. In this accumulation scheme, all cells are initialized in the RESET state. Each cell's conductance evolves according to the number of constant-amplitude crystallization pulses it receives, and the computation result is stored in place owing to PCM's non-volatility.
By reading out the PCM devices with the highest conductance values using a threshold scheme, the most likely lines are extracted and their approximate geometric parameters determined. These operations are illustrated in Fig. 3e. Both the MVM and the accumulation operations are carried out in the same computational memory array, using non-overlapping areas. Figure 3f(i) shows the matrix encoding the trigonometric values, and Fig. 3f(ii) shows an example of the conductance change from pulse accumulation. Figure 3g shows MVM results for 82 points in an input image. The MVM results are then used to locate the (θ, r) pairs for the accumulation operations, as illustrated in Fig. 3h. Starting from the RESET state, different devices attain different conductance values after the entire image is processed. The most conductive devices encode the correct angles subtended by the lines.
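
The full pipeline, an MVM against a fixed cos/sin weight matrix followed by pulse accumulation and thresholding, can be emulated in software. The Python sketch below is an idealized model under explicit assumptions: the crossbar computes an exact MVM, each vote is one constant-amplitude pulse into the (θ, r) cell, and the pulse-to-conductance mapping `toy_conductance` is a hypothetical saturating curve chosen only because it increases monotonically (it is not a fit to the data in Fig. 3f(ii)). All function names, the θ grid, and the threshold are illustrative.

```python
import math
from collections import Counter


def hough_mvm(points, thetas):
    """r(theta) for each point via one matrix-vector product per point.

    The 2 x len(thetas) weight matrix mimics the crossbar: row 0 holds
    cos(theta_n) and row 1 holds sin(theta_n), one column per angle.
    """
    weights = [[math.cos(t) for t in thetas],
               [math.sin(t) for t in thetas]]
    return [[x * weights[0][c] + y * weights[1][c] for c in range(len(thetas))]
            for x, y in points]


def accumulate_pulses(r_values, r_step=1.0):
    """Send one pulse to the (theta index, quantized r) cell for every vote."""
    pulses = Counter()
    for r_row in r_values:
        for c, r in enumerate(r_row):
            pulses[(c, round(r / r_step))] += 1
    return pulses


def toy_conductance(n_pulses, g_reset=0.1, g_set=10.0, tau=5.0):
    """Hypothetical saturating conductance vs. pulse count, starting at RESET."""
    return g_reset + (g_set - g_reset) * (1.0 - math.exp(-n_pulses / tau))


def detect_lines(points, thetas, g_threshold):
    """Return the (theta index, r) cells whose conductance reaches the threshold."""
    pulses = accumulate_pulses(hough_mvm(points, thetas))
    return sorted(cell for cell, n in pulses.items()
                  if toy_conductance(n) >= g_threshold)


# Ten points on the vertical line x = 5; the theta grid is deliberately coarse.
points = [(5, y) for y in range(10)]
thetas = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
print(detect_lines(points, thetas, g_threshold=6.0))  # [(0, 5)]
```

Index 0 corresponds to θ = 0, so the detected cell describes x cos 0 + y sin 0 = 5, i.e., the vertical line x = 5. Because thresholding only depends on the ranking of conductances, any monotonically increasing pulse response, with a suitably chosen threshold, would select the same cell here.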

Discussion

Processing data quasi-locally, i.e., at the edge, has traditionally required substantial processing power, memory, and communication bandwidth. One of the key ideas we propose is to implement convolutional operations within the sensor, in particular those of the initial layer of a computing network. Under the typical rolling-shutter scheme, k rows in the sensor are read out in parallel when performing the convolutional operations. For s = k, a frame of m rows then requires m/s read-out cycles, so the read-out time of a single frame becomes $T_R = \frac{m \times t_R}{s}$, where t_R is the time to digitize a single row. Therefore, larger kernels inherently improve the frame rate of the sensor. However, this improvement appears to hold only for f ≤ s. Since f depends on the application, this improvement metric must be considered application specific. An additional gain comes from the reduction in the volume of data that must be transferred to the memory or processor, because convolution reduces an image of dimension m × n to (m − k + 1) × (n − k + 1). We also discuss approaches to speed up model-based methods by leveraging the crossbar topologies of the sensor and computational memory units. As an exemplar problem, we consider Hough-transformation-based object detection and show how, by embedding this model into the proposed approach, the time complexity³⁴ (O(N⁴)) can be reduced to a constant O(c), where c ≪ N (supporting information section S3 estimates the time complexities). It is also worth noting that in-sensor computations can benefit standalone imaging sensors by giving pixels a means to adapt to varying lighting conditions. Since this adaptation occurs at low power expense owing to the non-volatility of the PCM devices, the battery life of sensors, such as those in hand-held devices, can be extended.
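
The read-out and data-volume arguments can be checked with simple arithmetic. The Python sketch below assumes a rolling shutter in which each read-out cycle digitizes s = k rows in parallel in a time t_R, so that a frame of m rows needs ⌈m/s⌉ cycles; it also computes the output size of a "valid" convolution for a given stride. The function names and the 10 µs cycle time are hypothetical.

```python
def frame_readout_time(m, s, t_row):
    """Frame read-out time when s rows are digitized in parallel per cycle.

    Assumes m rows in total and a per-cycle digitization time t_row,
    so the frame needs ceil(m / s) cycles.
    """
    cycles = -(-m // s)  # ceiling division for integers
    return cycles * t_row


def output_shape(m, n, k, s=1):
    """Size of a 'valid' k x k convolution output over an m x n image."""
    return ((m - k) // s + 1, (n - k) // s + 1)


m, n, k = 1024, 1280, 16
t_row_us = 10  # hypothetical per-cycle digitization time, in microseconds
print(frame_readout_time(m, 1, t_row_us))  # 10240 (conventional row-by-row read-out)
print(frame_readout_time(m, k, t_row_us))  # 640 (k rows per cycle)
print(output_shape(m, n, k))               # (1009, 1265) at unit stride
print(output_shape(m, n, k, s=k))          # (64, 80) with s = k
```

Under these assumptions, the data reduction is modest at unit stride but close to a factor of k² when s = k (64 × 80 outputs versus 1024 × 1280 pixels).
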
Although our concept can be applied to other non-volatile memory technologies, we believe PCM holds the most promise for computational sensors. PCM has reached a very high level of maturity and has been commercialized as both standalone and embedded memory^8,35,36. This, together with the ease of embedding PCM on logic platforms, makes this technology of unique interest^31,37.

We identify the following limiting cases in which in-sensor computations are expected to accelerate processing: when applied to shallower networks (e.g., a single or a few user-defined filters), when applied to downsampled images (smaller m values) in deep networks, when s filters are offloaded from the processor to the sensor, and when applied to certain pre-processing tasks for machine learning. In supporting information section S4, we estimate the performance gains (area, energy, and latency) by emulating the implementation of convolutions on ISC-IMC. Some important challenges, however, must be pointed out. To avoid read disturbance of the PCM devices, the output voltage range must be kept below the threshold voltages of the phase configurations. When scaling up, that is, integrating PCM with stacked CMOS sensor chips, the interconnects and their connectivity will become an important factor. This could necessitate novel integration methods, including hybrid bonding³⁸ (i.e., physical stacking of wafers). In summary, we propose a computational sensor that combines phase-change memory technology with contemporary sensors to enable in-sensor-in-memory computing for edge intelligence.

Methods

Electrical characterization

The devices for optoelectronic measurements comprised an 80 nm thick film of GST phase-change material sandwiched between bottom and top metal-nitride electrodes, with a bottom-electrode radius of 20 nm. The IBM HERMES Project Chip comprised similar mushroom-type devices but with doped-GST phase-change material; see reference³¹ for more information about the chip. The electrical measurements were performed in a custom-built probe station. DC measurements of the device state and biasing of the optoelectronic circuitry were performed with a Keithley 2600 System SourceMeter. AC signals were applied to the device and to the white LED used for illumination with an Agilent 81150A pulse function arbitrary generator. A Tektronix oscilloscope (DPO5104) recorded the voltage pulses applied to and transmitted by the device and the LED. For read-out and programming of the pixel unit, switching between the DC and AC measurement circuits was achieved with mechanical relays. See supporting information section S1 for more information about the measurement circuitry.