Efficient optimisation of physical reservoir computers using only a delayed input

Introduction

In recent years, the quest for efficient and powerful machine learning paradigms has led researchers to explore unconventional computational architectures inspired by the dynamics of physical systems. Reservoir Computing (RC)1 is one such promising approach. RC diverges from conventional neural networks by using an architecture characterized by a fixed, randomly initialized recurrent layer, called the “reservoir”, coupled with a simple linear readout layer: this characteristic structure reduces the computational complexity of training and, consequently, the energy consumption. RC excels at processing temporal data, distinguishing itself across a wide variety of tasks such as equalization of distorted nonlinear communication channels1,2, audio processing3,4, weather prediction5, and image6,7 and video8,9 classification. RC has also gathered attention for its suitability for implementation on a wide range of physical substrates10,11. Amongst the different platforms, optical implementations of RC stand out as some of the most promising, thanks to their advantages in parallelization12,13, high speed14 and minimal hardware requirements15.

Hyperparameter optimisation is a crucial aspect of tuning reservoir computers, aiming to enhance their performance across various tasks. Various optimisation techniques are employed to search for the most effective hyperparameter values: grid search, random search, and more sophisticated methods like Bayesian optimisation16,17 and genetic algorithms18,19 have been proposed. Nonetheless, the optimisation of hyperparameters remains a critical obstacle to the widespread adoption of RC20. The challenge lies in the sensitivity of the reservoir to its hyperparameters, and in exploring the vast hyperparameter space to find configurations that yield optimal results for a given task. The high-dimensional nature of the hyperparameter space can make the search process computationally expensive and time-consuming, even more so in the case of slow experimental reservoirs that involve more hyperparameters and for which one iteration of the grid search can take up to several hours8.

In this work we study an approach, first introduced in ref. 21 in numerical simulations, based on the use of a delayed input. In ref. 21 and later in ref. 22, Jaurigue et al. show that adding a time-delayed version of the input can considerably improve the performance of a reservoir whose hyperparameters are not adjusted. This powerful approach requires tuning only two parameters instead of the many hyperparameters usually associated with a reservoir, thus allowing optimisation at little computational cost. This optimisation approach is similar to that proposed in ref. 23 where, instead of feeding the delayed inputs into a reservoir, one takes (digitally) polynomials of the delayed inputs. More recently, the use of time-delayed inputs was investigated numerically in ref. 24 in the context of a RC based on silicon microrings.

We test this approach on an experimental RC. We use the architecture based on time multiplexing first introduced in ref. 25, but in an optoelectronic version as introduced in refs. 26,27 and since studied extensively, see e.g. refs. 14,28,29,30. Previous works31,32 show how the insertion of a second delay loop in the feedback can be beneficial to an optoelectronic delay-based reservoir in terms of computational capabilities. Contrary to the latter approaches, in our work we enhance the information processing without modifying the reservoir itself, since we add a delay only in the input layer of the reservoir. In the present proof-of-principle demonstration, the addition of the delayed input is implemented in a digital preprocessing step. However, we emphasize that a physical implementation using a delay line should be rather simple.

We assess the impact of this technique on different tasks involving time-series forecasting and audio classification, so as to verify its effectiveness on data of different natures and with different intrinsic timescales. We show the superiority of this procedure over standard hyperparameter optimisation. The present work thus demonstrates that these improvements, already reported in software implementations in refs. 21,22, continue to hold even in the presence of experimental constraints such as added noise. The use of delayed inputs could thus be particularly relevant for hardware reservoir computing, where noise is present and where it may not be possible to optimise all hyperparameters. We further discuss the perspectives of our work in the conclusion.

Methods

Time-delay Reservoir Computing

A reservoir computer consists of three main components: an input layer, a reservoir and an output layer. The input layer maps the input signal into the reservoir. The reservoir is a recurrent neural network with fixed interconnections. The reservoir layer exploits the rich dynamics of its internal state, acting as a complex computational substrate that temporally transforms input signals into high-dimensional representations. During the training phase, the reservoir’s states are used to evaluate the parameters of the linear output layer.

In this work, we use the well-known delay-based reservoir computer based on a single nonlinear node25. The reader can refer to refs. 15,33 for reviews on time-delay RC. In this implementation, the reservoir nodes are arranged in a ring-like topology in which each node is connected to its neighboring nodes. The dynamics of a time-delay reservoir of size N (with N being the number of virtual nodes) in discrete time is described by:

$$\begin{array}{rcl}{x}_{0}(n+1)&=&f(\alpha {x}_{N-1}(n-1)+{\beta }_{1}{M}_{0}u(n)+{I}_{0})\\ {x}_{i}(n+1)&=&f(\alpha {x}_{i-1}(n)+{\beta }_{1}{M}_{i}u(n)+{I}_{0})\quad i=1,\ldots,N-1,\end{array}$$
(1)

while the reservoir’s output y is evaluated through a simple linear operation y(n) = Wout x(n). Here n = 1, …, K is the discrete time; x(n) is the N-sized vector of the reservoir states at timestep n; u(n) is the time-dependent input signal; f is a nonlinear activation function (in our experimental system described below, f is a sinusoidal function); α is the feedback attenuation, which represents the strength of the interconnections between the reservoir’s nodes; the input mask M maps the input signal into the reservoir, and its coefficients are usually drawn at random from a uniform distribution in the range [−1, +1]; β1 is the scaling of the input signal, usually called the input strength; I0 is a constant bias. After running the reservoir, all the state vectors x(n) are collected in a K×N state matrix X which is used for the training, i.e. to obtain the N weights Wout of the linear readout layer. In this work we use regularized linear regression43:

$$W_{\mathrm{out}}={({X}^{T}X+\lambda I)}^{-1}{X}^{T}\tilde{y},$$
(2)

where \(\tilde{y}\) is the target signal and λ is the regularization parameter. In the case of a classification task with C output classes, Wout becomes a C×N matrix. The reservoir’s output y is then used to evaluate the performance of the reservoir computer using the task-specific figure of merit.
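To make the above concrete, here is a minimal numerical sketch (in Python/NumPy) of the dynamics of equation (1) and the ridge-regression readout of equation (2). All parameter values, the input series and the target are placeholders chosen for illustration, not the experimental settings; we take f = sin as in our setup.

```python
import numpy as np

# Minimal sketch of equations (1)-(2); all values are illustrative.
N, K = 50, 1000                        # virtual nodes, timesteps
alpha, beta1, I0, lam = 0.15, 1.0, 0.3, 1e-5
rng = np.random.default_rng(0)
M = rng.uniform(-1, 1, N)              # input mask
u = rng.uniform(0, 0.5, K)             # placeholder input time series

x = np.zeros((K, N))                   # state matrix X (one row per timestep)
for n in range(1, K - 1):
    # node 0 closes the ring: it sees node N-1 one extra step in the past
    x[n + 1, 0] = np.sin(alpha * x[n - 1, N - 1] + beta1 * M[0] * u[n] + I0)
    # nodes 1..N-1 see their left neighbour at the previous timestep
    x[n + 1, 1:] = np.sin(alpha * x[n, :N - 1] + beta1 * M[1:] * u[n] + I0)

y_target = np.roll(u, -10)             # placeholder target signal
W_out = np.linalg.solve(x.T @ x + lam * np.eye(N), x.T @ y_target)
y = x @ W_out                          # reservoir output y(n)
```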

Reservoir Computing with Delayed Input

The idea of the present work is to use a delayed version of the input time-series u(n) as part of the input signal to the reservoir. By adding a delay d in the input layer, it is possible to augment the memory of the reservoir22, and therefore the achievable performance. Here, by ‘memory’ of the reservoir we refer to the influence exerted on the reservoir states at the current timestep by the input signal at past timesteps. More specifically, better performance can be achieved by tuning the delay d itself, so that the reservoir’s memory is augmented in a way that matches the task requirements.

To construct the new input, we use two N×K input masks M1 and M2 with values sampled from uniform distributions in the range [−1, 1]. The value of the new driving signal J at timestep n is then defined as:

$$J(n)={\beta }_{1}u(n){M}_{1}(n)+{\beta }_{2}u(n-d){M}_{2}(n)+{J}_{0},$$
(3)

where d is the delay, β1 and β2 are the scaling parameters of the input and the delayed input, respectively, and J0 is a constant bias. The dynamics of the reservoir of equation (1) then become:

$$\begin{array}{rcl}{x}_{0}(n+1)&=&f(\alpha {x}_{N-1}(n-1)+J(n+1))\\ {x}_{i}(n+1)&=&f(\alpha {x}_{i-1}(n)+J(n+1))\quad i=1,\ldots,N-1.\end{array}$$
(4)

The new driving signal J contains two different components: one related to the input at the present timestep, and one related to the input value d timesteps in the past. The masking process is now performed with the two different masks M1 and M2. The idea is to optimise the reservoir by tuning only the parameters of the delayed input, namely β2 and d, while keeping fixed the parameters specific to the reservoir (such as the internal feedback strength α and the input scaling β1). Unlike traditional methods that predominantly rely on modifications within the reservoir itself (for instance by tuning α in equation (1)), our approach manipulates the input signal. In other words, here the enhancement of the memory capability is achieved without modifying the reservoir. In this way, it is possible to simultaneously augment the memory and simplify the hyperparameter optimisation.
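A minimal sketch of the construction of the driving signal of equation (3) follows; the masks, the input series and the parameter values are placeholders:

```python
import numpy as np

# Sketch of equation (3): driving signal with a delayed input component.
N, K, d = 50, 1000, 9                  # illustrative values
beta1, beta2, J0 = 1.0, 1.5, 0.3
rng = np.random.default_rng(1)
M1 = rng.uniform(-1, 1, (N, K))        # mask for the current input
M2 = rng.uniform(-1, 1, (N, K))        # mask for the delayed input
u = rng.uniform(0, 0.5, K)             # placeholder input time series

u_del = np.concatenate([np.zeros(d), u[:-d]])    # u(n - d), zero-padded
J = beta1 * u * M1 + beta2 * u_del * M2 + J0     # column n of J is J(n)
```

Only β2 and d are scanned during the optimisation; β1, J0 and the reservoir parameters stay fixed.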

Experimental Setup

Our optoelectronic setup, shown in Fig. 1, implements the time-delay RC introduced above. It is similar to experimental systems used in previous works28,29. A superluminescent diode (Thorlabs SLD1550P-A40) generates broadband light at 1550 nm. A Field Programmable Gate Array (FPGA) board generates the reservoir’s driving signal J(n) described by equation (3). This means that in this experiment we are not implementing the input delay with a physical delay line; rather, J(n) is obtained through digital preprocessing. The FPGA thus generates an electrical signal proportional to the driving signal J(n), which drives an electro-optic Mach-Zehnder (MZ) intensity modulator (EOSPACE AX2X2-0MSS-12): the driving signal is thereby represented in the optical domain. Moreover, the sinusoidal transfer function of the MZ modulator acts on the intensity only, so that the MZ acts as \({I}_{0}{\sin }^{2}(V+\frac{\pi }{4})={I}_{0}(\frac{1}{2}+\frac{1}{2}\sin (2V))\). Because the photodetector and the amplifier act as a low-pass filter, the constant term plays no role in the dynamics, leading to equations (4), with f a sinusoidal nonlinearity. The output of the FPGA takes roughly 5 ns to reach a steady-state response, while the mask step duration is on the order of a hundred ns: it is therefore possible to describe the system using discrete-time equations. Note that much faster implementations are possible, see for instance ref. 14.

Fig. 1: Optoelectronic reservoir computer.

In orange the optical part and in blue the electronic part (thin blue lines represent the information processing inside the FPGA). The figure highlights the division of signal processing between the FPGA and the other hardware devices. SLD: superluminescent diode. MZ: Mach-Zehnder intensity modulator. OA: Optical Attenuator. PDf: feedback photodetector. PDr: readout photodetector. FPGA: Field Programmable Gate Array. Amp: amplifier. The delay of the feedback loop is denoted by τ. The delayed driving signal J(n) (equation (3)), the readout weights Wout (equation (2)) and the output signal y(n) are computed on the FPGA.


The light passes through an Optical Attenuator (JDS HA9), which attenuates the light intensity in the loop by a fixed, configurable factor. Then, the light travels through a fiber spool, whose length of 1.7 km corresponds to a delay of τ = 7.94 μs. This is the feedback delay used to implement the delay-based RC. Since τ is fixed by hardware constraints, to select the size of the reservoir N we have to select the proper operating frequency. In this work we implement the reservoir in a ‘desynchronized’ regime (which gives rise to equations (1) and (4)): the relationship between the feedback delay τ and the clock cycle T is given by τ = T + θ, where θ is the duration of a single mask value and is equal to θ = τ/(N + 1). Given τ fixed by the hardware and N fixed by design choice, we derive the value of T and use it to implement our reservoir. After the spool, the light represents the time-multiplexed values of the reservoir’s nodes. Part of this light is collected by the feedback photodetector PDf, electrically summed with the input coming from the FPGA, and amplified (with a Mini Circuits ZHL-32A+ coaxial amplifier) to drive the MZ modulator over its full Vπ range. The other part of the light is collected by the readout photodetector PDr and stored by the FPGA for offline computation of the output signal y(n) using the output weights Wout given in equation (2).
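As a worked example of the timing relations above, assuming the values quoted in the text:

```python
# Desynchronised regime: tau = T + theta, with theta = tau / (N + 1).
tau = 7.94e-6             # feedback delay fixed by the 1.7 km fibre spool [s]
N = 50                    # number of virtual nodes chosen by design
theta = tau / (N + 1)     # duration of one mask value: ~155.7 ns
T = tau - theta           # clock cycle: ~7.78 us
```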

Tasks

We test our optoelectronic reservoir computer on four different tasks: Mackey-Glass system, NARMA10, Spoken Digit Recognition and Speaker Recognition. These are all widely used benchmark tasks in the RC community. The first two are time-series prediction tasks, whose goal is to forecast future values of a sequence based on historical information; the last two are classification tasks, aiming to accurately identify and categorize audio-based temporal signals.

Mackey-Glass system

The Mackey-Glass system is described by a time-delay differential equation that exhibits chaotic behavior34:

$$\frac{dx}{dt}=\beta \frac{x(t-{\tau }_{\mathrm{M}})}{1+x{(t-{\tau }_{\mathrm{M}})}^{n}}-\gamma x.$$
(5)

We take as parameter values β = 0.2, τM = 17, n = 10, γ = 0.1 and timestep dt = 1. The aim is to forecast the value of the series ten time steps into the future. We evaluate the accuracy of the system using the normalised mean square error (NMSE), defined as

$$\mathrm{NMSE}=\frac{\langle {\left(y(n)-\tilde{y}(n)\right)}^{2}\rangle }{{\langle \tilde{y}(n)\rangle }^{2}},$$
(6)

where y(n) is the reservoir’s output and \(\tilde{y}(n)\) is the target signal. (Specifically, for the Mackey-Glass task, \(\tilde{y}(n)=x((n+10)\,dt)\) is the value of the time series 10 time steps ahead.)
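For reference, a minimal sketch of how the Mackey-Glass series can be generated by Euler integration of equation (5) with the parameters given above; the constant initial history and the series length are assumptions made for illustration (the exponent n of equation (5) is written p here to avoid clashing with the timestep index):

```python
import numpy as np

# Euler integration of equation (5) with dt = 1 (tau_M is then 17 steps).
beta, tau_M, p, gamma, dt = 0.2, 17, 10, 0.1, 1.0
K = 12000
x = np.zeros(K)
x[:tau_M + 1] = 1.2                   # constant initial history (assumption)
for t in range(tau_M, K - 1):
    x_d = x[t - tau_M]                # delayed term x(t - tau_M)
    x[t + 1] = x[t] + dt * (beta * x_d / (1 + x_d ** p) - gamma * x[t])

u, y_target = x[:-10], x[10:]         # input and 10-step-ahead target
```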

NARMA10

The NARMA10 task involves predicting the next value in a sequence generated by a NARMA (Nonlinear Auto-Regressive Moving Average) process of order 10. The input signal u(n) is drawn randomly from a uniform distribution over [0, 0.5]. The output q(n) of the NARMA10 system is defined as:

$$q(n+1)=0.3q(n)+0.05q(n)\left({\sum }_{i=0}^{9}q(n-i)\right)+1.5u(n-9)u(n)+0.1.$$
(7)

The aim is, given u(n), to predict q(n). For this task we also use the NMSE (equation (6)) as the figure of merit.
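A minimal sketch of how a NARMA10 sequence following equation (7) can be generated; the sequence length and random seed are illustrative:

```python
import numpy as np

# Generation of a NARMA10 sequence (equation (7)).
rng = np.random.default_rng(2)
K = 11000
u = rng.uniform(0, 0.5, K)            # input drawn uniformly from [0, 0.5]
q = np.zeros(K)
for n in range(9, K - 1):
    q[n + 1] = (0.3 * q[n]
                + 0.05 * q[n] * q[n - 9:n + 1].sum()   # sum of q(n-i), i=0..9
                + 1.5 * u[n - 9] * u[n]
                + 0.1)
```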

Spoken Digit Recognition

The Spoken Digit Recognition dataset consists of 500 total utterances of the 10 digits (from 0 to 9). Each digit is repeated ten times by five different individuals. We use a version of the dataset with babble noise at a 3 dB signal-to-noise ratio (SNR), to increase the difficulty of the task and to evaluate the system’s performance in noisy environments. The dataset is split 450/50 for train/test. The audio signals are pre-processed using the Lyon Passive Ear model35. This model emulates the biological response of the auditory canal and transforms the original time-domain utterances into an 86-channel frequency representation, which is used as the input signal. The length of the utterances ranges from 22 to 95 timesteps, with an average length of approximately 68 timesteps.

In order to address the small size of our dataset of 500 digits, we implement k-fold cross-validation with k = 10. This technique involves dividing the dataset into 10 equal parts of 50 digits each. The training process is then repeated 10 times, where each time a different subset of 50 digits is used for testing while the remaining 450 are used for training. Since this is a classification task with 10 output classes, we use 10 distinct linear classifiers: each classifier is designed to output “+1” if the input matches the corresponding digit and “−1” otherwise. The system selects a winning class at every timestep of an utterance, and the most voted class over the duration of the utterance is selected as the prediction for that utterance: this approach is usually referred to as winner-takes-all. Since this is a classification task and not a time-series prediction task, we no longer use the NMSE as the figure of merit, but rather the Error Rate, defined as the ratio between the number of incorrectly classified utterances and the total number of utterances.
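As an illustration of the winner-takes-all decision and the Error Rate, a minimal sketch assuming the 10 classifier outputs for one utterance are collected in a (timesteps × 10) score array; the function names are hypothetical:

```python
import numpy as np

def predict_utterance(scores: np.ndarray) -> int:
    """scores: (timesteps, 10) outputs of the 10 linear classifiers."""
    votes = scores.argmax(axis=1)      # winning class at each timestep
    return int(np.bincount(votes, minlength=10).argmax())  # most voted class

def error_rate(predicted, true) -> float:
    # fraction of utterances classified incorrectly
    return float(np.mean(np.asarray(predicted) != np.asarray(true)))
```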

Speaker Recognition

The Japanese Vowels dataset36 consists of 640 utterances of the Japanese vowel ‘ae’, pronounced by 9 individuals. The goal is to correctly identify the speaker of each utterance. For consistency with previous works, the dataset is split into 270 training sequences and 370 test sequences. The audio samples are pre-processed into Mel-frequency cepstral coefficients (MFCCs) to obtain their frequency representation, used as the input signal to the reservoir. Each sample thus consists of 12 MFCC coefficients per timestep. The length of the utterances ranges from 7 to 29 timesteps, with an average length of approximately 15 timesteps. For statistical purposes, the training procedure is repeated 10 times: each time we select 270 audio samples at random for training and use the rest for testing. Here too, as for the Spoken Digit Recognition task, the Error Rate and the winner-takes-all approach are used to evaluate the accuracy of the system.

Results

In this section we show the experimental results obtained with our optoelectronic reservoir on the tasks listed above. We use N = 50 reservoir nodes for the Mackey-Glass and NARMA10 tasks, and N = 100 nodes for the audio tasks. For the Mackey-Glass and NARMA10 tasks, we use a time-series length of K = 21000. We divide these data points into four sets: the first 500 points are removed for washout, the next 10000 points are used for training, then 500 points are again removed for washout, and the last 10000 points are used for testing. By washout we mean the removal of initial transient states, which ensures that the model focuses on the system’s steady-state behavior rather than on transient effects that might not represent its long-term dynamics. For the Digits and the Speakers tasks, the input length is given by the dataset itself. The frequency-encoded audio samples are sent to the reservoir sequentially: the states of the reservoir are not reset between digits, meaning that the delayed input can combine data points coming from the present and the previous digit.
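For clarity, the split of the K = 21000 points described above, written as index ranges (a sketch; the variable names are ours):

```python
K = 21000
washout, n_train, n_test = 500, 10000, 10000
train_idx = slice(washout, washout + n_train)      # points 500 .. 10499
test_idx = slice(2 * washout + n_train, K)         # points 11000 .. 20999
```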

To test the effectiveness of the optimisation using the delayed input, we use the same approach for all the tasks. The input scaling β1 and the bias J0 (cf. equation (3)) are selected for each task so that the driving signal J(n) falls in the range [0.2, 1.4] when β2 is set to zero. These values, though not optimised, are considered reasonable inputs for the reservoir. The feedback attenuation is set to 2 dB with the OA of Fig. 1 (which roughly corresponds to α = 0.15 in equation (1)), a reasonable operating value for our optoelectronic reservoir, and the ridge regression parameter is set to λ = 10⁻⁵ for all the tasks. We emphasize that we do not optimise the values of α, β1 and λ as hyperparameters for every task, as is usually done in RC. Another important parameter that is not optimised is the feedback delay time. This highlights the reliability of the delayed-input optimisation, which does not rely on the traditional tuning of hyperparameters. Moreover, we also test the Mackey-Glass and NARMA10 tasks with a feedback attenuation of 15 dB (which roughly corresponds to α = 10⁻⁴), very far from reasonable values, to assess the effectiveness of this approach when the hyperparameters are far from acceptable values. Table 1 summarizes all the above-mentioned values for all the tasks.

Fig. 2: Experimental results for the Mackey-Glass system.

The figure shows the experimental results on the prediction of the Mackey-Glass system 10 time steps into the future, as a function of β2 and the delay d of Eq. (3), using different optical feedback attenuations. In (a) we use 2 dB attenuation. In (b) we use 15 dB attenuation. NMSE: normalised mean square error.

Table 1 Parameters of the driving signal and the reservoir, for all the considered tasks

The only two parameters that we need to tune for each task are then the scaling of the delayed part of the input β2 and the delay d (cf. equation (3)). We also include the case where β2 is scanned while d = 0, which gives a rough idea of how the reservoir behaves when scanning the input strength with no delayed input.
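The optimisation thus reduces to a simple two-parameter grid scan. A minimal sketch is given below; run_reservoir is a hypothetical stand-in for one full run (drive the reservoir with the driving signal built from β2 and d, train the readout, return the test NMSE), and the scan ranges are illustrative:

```python
import numpy as np

def run_reservoir(beta2: float, d: int) -> float:
    """Hypothetical helper: one reservoir run + readout, returns test NMSE."""
    raise NotImplementedError

beta2_grid = np.linspace(0.0, 3.0, 13)     # illustrative scan ranges
d_grid = np.arange(0, 25)

best_nmse, best_params = np.inf, None
for beta2 in beta2_grid:
    for d in d_grid:
        nmse = run_reservoir(beta2=beta2, d=d)
        if nmse < best_nmse:
            best_nmse, best_params = nmse, (beta2, d)
```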

Effectiveness of optimisation with delayed input

Mackey-Glass system

Panel (a) of Fig. 2 shows the results for the Mackey-Glass system. We scan the parameters β2 and d looking for the best configuration. Without delay (β2 = 0) the performance is already acceptable (NMSE = 0.20), but it is further improved by the delay, reaching an NMSE of 0.063 with d = 11. Similar results were obtained on the same task in numerical simulations in ref. 21, using the delayed input approach. Our results are also comparable to previous works37,38 which use similar time-delay reservoirs but with thousands of nodes.

Additionally, we wanted to test how the optimisation with delayed input works in the case where the reservoir hyperparameters are rather poor. Panel (b) of Fig. 2 shows the results when an attenuation of 15 dB is applied. Without delay the system performs very poorly, with the NMSE exceeding 2. Instead, by tuning the delayed input we reach an NMSE of 0.141. This proves that, even when very inadequate hyperparameters are selected, the use of a delayed input can still bring the reservoir into a valid operating region.

Note that one would naively expect the optimal delay d to be identical to the delay τM = 17 in equation (5). However, in both examples above this is not the case. This complex interplay between reservoir nonlinearity, reservoir memory and task requirements was previously reported in ref. 22. Further studies are needed to fully understand it.

NARMA10 task

Figure 3(a) shows results on the NARMA10 task. Once more we scan the parameters β2 and d looking for the best configuration. Our best result is an NMSE of 0.4, obtained for a delay of d = 9 timesteps, which corresponds to the intrinsic timescale of the NARMA10 task (cf. equation (7)). Figure 3(b) shows results for the reservoir with a very strong feedback attenuation. Here also the delay d = 9 considerably improves performance. (We note that the NMSE of 0.4 reported for this task is not very good compared to the state of the art, possibly because of excess noise during the experiment. However, the trends of how the input delay improves the performance (shown in Fig. 3) are exactly what is expected.)

Fig. 3: Experimental results on the NARMA10 (Nonlinear Auto-Regressive Moving Average process of order 10) task.

The figure shows the experimental results on the NARMA10 task as a function of β2 and the delay d of Eq. (3), using different optical feedback attenuations. In (a) we use 2 dB attenuation. In (b) we use 15 dB attenuation. NMSE: normalised mean square error.


Spoken digit recognition

The results on “noisy” Spoken Digit Recognition are reported in Fig. 4. For this task the error rate is considerably improved by the delayed input, decreasing from 0.250 with no delay (β2 = 0) to 0.156 with a delay d = 23. For comparison, in ref. 39, on the same task and with the same optoelectronic setup, we reported similar performance but using a deep architecture with two interconnected reservoirs of size N = 100 each, or using a single reservoir of size N = 600, while in this work we use only one reservoir of size N = 100.

Fig. 4: Experimental results on the Spoken Digit Recognition task.

The figure shows the experimental results on the Spoken Digit Recognition task with 3 dB noise, as a function of the input delay scaling β2 and the input delay d of Eq. (3).


Speaker recognition

Figure 5 shows results on the Speaker Recognition task. In this case too, the use of a delayed input increases the performance. The error rate decreases from 0.0384 with no delay (β2 = 0) to 0.0170 with a delay d = 5. As a comparison, in ref. 40 the authors reach a best error of roughly 0.020 with an optical reservoir of N = 200 nodes, whereas in ref. 41 the authors report an error of roughly 0.035 with a reservoir size comparable to ours.

Fig. 5: Experimental results on the Speaker Recognition task.

The figure shows the experimental results on the Speaker Recognition task as a function of the input delay scaling β2 and the input delay d of Eq. (3).


For these two tasks involving audio processing, the explanation for the optimum values of the delay is not straightforward. For both tasks, the optimum delay value is shorter than the average duration of an utterance: the Digits task has an average sample length of 68 timesteps and an optimum delay d = 23, while the Speakers task has an average sample length of 15 timesteps and an optimum delay d = 5. A delay shorter than the utterance duration means that most of the time the delayed and non-delayed input terms refer to the same utterance. Thus, these results suggest that the performance of the reservoir can be increased when different temporal components of the same utterance are combined. While d and the sample duration seem somewhat correlated, their relationship is not immediately quantifiable, and it is so far not possible to estimate the optimum delay a priori. This may be due to the different audio preprocessing algorithms used for the two datasets. Nonetheless, these results suggest that the memory increase due to the input delay (when tuned to the correct timescale) and the memory inherent in the reservoir’s recurrence both contribute to the overall performance of the system.

Optimisation with delayed input vs standard hyperparameters optimisation

The previous section reports how, for all the tasks, the use of a delayed input outperforms the case where no delay is used, i.e. β2 = 0. However, all the results presented so far refer to a reservoir whose hyperparameters are not optimised, but rather set to reasonable values. What happens instead if the hyperparameters are optimised for each task? Would the superiority of the optimisation with delayed input still hold? We address this question in Fig. 6. The purple curves represent the “standard” approach in RC with no delay, i.e. tuning the hyperparameters (in this case α, β1 and λ) for every task. The orange curves represent the optimisation using the delayed input, setting the parameters to the reasonable values of Table 1 and tuning only the delay parameters β2 and d. In both cases, we explore the behaviour of the system when sweeping the optical attenuation (expressed in dB), which is related to the feedback attenuation α.

Fig. 6: Experimental results on all the tasks, to compare the standard approach (without input delay) and the approach proposed in this work (with input delay).

The purple curves are obtained using no input delay, optimising the hyperparameters β1 and λ and then sweeping the optical feedback attenuation. The orange curves are obtained using the delayed input, optimising the input delay parameters β2 and d of equation (3), and then sweeping the optical feedback attenuation of the internal delay. NMSE: Normalised Mean Square Error; NARMA10: Nonlinear Auto-Regressive Moving Average process of order 10.


For all the tasks, it is clear that the optimisation with a delayed input prevails: its accuracy is higher for every value of attenuation considered. This is especially true at high optical attenuation, where performance without delay is very poor, while the use of a delay ensures that accuracy remains high.

Figure 7 presents another illustrative example of the difference between the two approaches. Here the comparison is done as a function of the number of nodes N. Typically, in delay-based time-multiplexed RC, one can increase N by extending the length of the feedback loop, and thus the feedback delay τ; alternatively, one can increase the operating clock frequency (cf. section Experimental Setup) while keeping τ fixed. In this work we use the latter approach, since our feedback loop (mainly constituted by the fiber spool) is fixed by hardware constraints. In other words, Fig. 7 shows how the performance of the reservoir is impacted by the two different delays in the system: the input delay line (orange curve), and the feedback delay line (which increases with N). These results show that, for the same N, the optimisation with the delayed input is systematically superior to the standard hyperparameter optimisation. Moreover, without the delayed input, one needs to roughly double N to reach the same performance as with the delayed input. This is an important consideration, as it means that for a given performance one could use a faster or smaller experimental reservoir.

Fig. 7: Experimental results on the NARMA10 (Nonlinear Auto-Regressive Moving Average process of order 10) task.

The standard approach (without input delay) and the approach proposed in this work (with input delay) are compared as a function of the number of reservoir nodes N, for N = 30, 50, 100, 200. The purple curve is obtained by optimising the hyperparameters α (optical attenuation), β1 and λ for every considered point. The orange curve is obtained using the delayed input optimisation proposed in this work, optimising the parameters β2 and d of equation (3), and keeping the other parameters fixed. NMSE: Normalised Mean Square Error.


Discussion

In this work we investigate a recently introduced approach for the optimisation of physical reservoir computers using a delayed input, previously studied numerically in ref. 21. We verify its effectiveness using an experimental setup. In this way, we transition from purely theoretical simulation to practical, experimental validation, in which noise, parameter drift and hardware constraints could limit performance. We evaluate this approach on a series of benchmark tasks, including time-series prediction and audio signal classification, which differ widely in their temporal scales. We report experimental performance similar to that reported numerically in ref. 21, thus confirming the efficacy and robustness of this technique.

We emphasize the advantages of this approach. It is a very simple technique that does not require complex search algorithms. In fact, the reservoir’s performance can be optimised by tuning only two parameters: the strength of the delayed component of the input (β2), and the delay itself (d). Therefore, it is not necessary to carefully select the reservoir’s hyperparameters for every task: it is sufficient to set them to some reasonable values and fine-tune the delayed input. In addition, the performance is not very sensitive to the value of β2: in our setup, values in the range 1–3 work well and need only slight refinement to obtain the best results.

Adding delayed versions of the input can be done in any implementation of reservoir computing, whether software or physical. Whether or not a delayed input can improve the performance depends on the memory requirements of the task and the memory of the reservoir. If the task does not require memory, or the reservoir already fulfills the memory requirements, then an additional delayed input will not help. This also follows from the fact that the total information capacity of the reservoir, as defined in ref. 42, depends only on the number of internal nodes. Introducing delayed inputs will modify the tradeoff between memory and nonlinearity described in ref. 42, but will not increase the total capacity. There will therefore be some tasks for which adding delayed inputs does not help. What the examples studied here and in ref. 22 show is that for many practical tasks, adding delayed inputs is useful for improving performance, or for making the system more robust to parameter changes and other imperfections. Further, optimising the memory of a reservoir via hyperparameter tuning can be cumbersome and is often not a viable option in experimental settings; in such cases, including a delayed input can improve performance with minimal effort.

We remark that in this work we implement the delayed input simply by acting on the input signal before sending it to the reservoir. This means that we augment the memory of the reservoir with no hardware modifications. Moreover, differently from other approaches31,32, we do not need to modify the reservoir internally, since we act only on the input layer. Reservoir computers have been implemented on a wide variety of experimental setups and physical substrates; nonetheless, a common fundamental requirement of these systems is the ability to drive the reservoir with an input signal. Therefore, our proposed optimisation technique is universally applicable in a simple way to all such reservoir computers, including those whose hyperparameters cannot be tuned due to real-world constraints.

Furthermore, even when the hyperparameters are considerably far from acceptable values, the use of a delayed input can still bring the reservoir to satisfactory performance. Moreover, for a given reservoir size, this optimisation procedure gives better performance than traditional hyperparameter tuning. Alternatively, as illustrated in Fig. 7, if the goal is to reach a specific performance, this approach allows one to use a smaller reservoir, which is beneficial in terms of computational or hardware resources (e.g. smaller footprint, or faster operation).

Several questions remain open for future research: the relationship between the timescale of a temporal task and the optimum amount of delay to employ; the use of multiple delays (in which case improved optimisation schemes, such as Bayesian optimisation rather than the simple grid scan used here, would be useful); and, on the experimental side, the implementation of the delayed input with a physical delay line.
