Epidemic forecast follies

Introduction
Now that the most severe (we hope) manifestations of the Covid-19 epidemic have passed, one can’t help but realize that many of the early forecasts of the Covid-19 epidemic toll were wildly inaccurate and inconsistent with each other. Moreover, individual forecasts could change dramatically over a period of a few days. For the USA, in particular, the earliest estimates for the Covid-19 epidemic death toll ranged from tens of thousands to many millions, with the current death toll (as of September 2023) reported to be 1.175 million out of a total of 108.5 million cases (all data taken from ref. 1). Perhaps even more striking are the huge fluctuations and the dramatically different time courses in the daily death rate in different countries.
To illustrate these statements, Fig. 1 plots the reported daily death rates for the six countries with populations greater than 60 million that have the largest total death rates. They are: USA (3.507 deaths/1000), UK (3.339/1000), Brazil (3.275/1000), Italy (3.174/1000), Russia (2.743/1000), and France (2.556/1000). For reference, the country with the largest reported total death rate is Peru (6.582/1000), while the world average is 0.887/1000. For many reasons, the accuracy of the data may vary widely from country to country, so some of the data reported in ref. 1, such as the suspiciously smooth curve for Russia, should be interpreted with caution.

Fig. 1. The reported daily Covid death rates (7-day moving average) for (a) the USA, (b) UK, (c) Brazil, (d) Italy, (e) Russia, and (f) France. These data cover the period from Feb. 15, 2020 until July 29, 2023 and are all taken from ref. 1.
One of the many confounding features of Covid-19 is asymptomatic transmission, in which the epidemic may be unknowingly spread by individuals who did not know that they were contagious. Partly because of this feature, a wide variety of increasingly sophisticated multi-compartment models were developed that build on the classic SIR and SIS models of epidemic spread. These models typically attempted to faithfully account for subpopulations in various stages of the disease and recovery, as well as the transitions between these stages. Models of this type gave rise to complex dynamical behaviors that could sometimes mirror reality in a specific setting or over a limited time range. However, embellishments of SIR and SIS-type models still seem to be incomplete because of the difficulty in simultaneously accounting for both the disease dynamics and its interaction with social forces.
The discrepancy between the observed wildly varying features of Covid-19 and the supposedly deterministic outcomes of SIR and SIS models is especially striking. In fact, the determinism of the SIR and SIS models is illusory. The SIR model, for example, is an inherently stochastic process (refs. 2, 3) that is characterized by the reproductive number R0. This quantity is defined as the average number of individuals to whom a single infected individual transmits the infection before this single individual recovers. Even in the supercritical regime, R0 > 1, the outbreak may quickly die out. This happy event occurs with probability \(R_0^{-1}\) if one individual was initially infected. Otherwise, the infection quickly spreads, and the behavior becomes effectively deterministic because the distribution of the epidemic size becomes narrow. In this case, a finite fraction c = c(R0) of individuals catch the disease, with c implicitly determined by the criterion \(c + e^{-cR_0} = 1\) (ref. 4).
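The two supercritical quantities quoted above can be evaluated numerically. The following is a minimal sketch, not taken from the original work: the function names are hypothetical, and the fixed-point iteration c → 1 − exp(−R0 c) is simply one convenient way to find the nonzero root of the criterion.

```python
import math

def extinction_probability(R0: float) -> float:
    """Probability that an outbreak seeded by one individual dies out early.

    Uses the 1/R0 result quoted in the text for R0 > 1; for R0 <= 1 the
    outbreak always dies out.
    """
    return 1.0 if R0 <= 1.0 else 1.0 / R0

def final_size_fraction(R0: float, tol: float = 1e-12, max_iter: int = 100_000) -> float:
    """Nonzero root c of c + exp(-c*R0) = 1, found by fixed-point iteration.

    Assumption: starting from c = 1 and iterating c -> 1 - exp(-R0*c)
    converges to the positive root whenever R0 > 1.
    """
    if R0 <= 1.0:
        return 0.0  # only the trivial root c = 0 exists
    c = 1.0
    for _ in range(max_iter):
        c_next = 1.0 - math.exp(-R0 * c)
        if abs(c_next - c) < tol:
            return c_next
        c = c_next
    return c

if __name__ == "__main__":
    R0 = 2.5
    print("extinction probability:", extinction_probability(R0))
    print("final-size fraction c:", final_size_fraction(R0))
```

For R0 = 2.5, the value used later in the text, the early-extinction probability is 0.4 and the root lies a little below 0.9, so a large outbreak, if it occurs, reaches most of the population.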
Conversely, if R0 < 1, the outbreak quickly dies out, so while the subcritical SIR process is still manifestly stochastic, it is not a threat to the population at large. The most interesting and most strongly stochastic behavior emerges in critical SIR and SIS models (refs. 5–16). For the SIR model, in particular, the distribution of the number of infected individuals has a power-law tail. For a finite population of size N, the critical SIR model does not lead to a pandemic, because the average number of individuals who contract the disease scales as \(N^{1/3}\).
We argue here that significant forecasting uncertainties are an integral feature of processes caused by the interplay between the dynamics of the disease transmission and the social forces that arise in response to the epidemic. Each attribute alone typically leads to either exponential growth (due to disease transmission at early times) or to exponential decay (due to effective mitigation strategies). Within our model, the competition between these two exponential processes leads to a dynamics that is extremely sensitive to seemingly minor details.
While a variety of models have been proposed to incorporate these competing effects and to understand how they give rise to significant uncertainties in the outcome of an epidemic (refs. 17–19), here we present a different perspective to account for forecasting uncertainties. Our approach is based on mimicking the inherent stochasticity in the development of epidemics through a stochastic dynamics in the reproductive number R0. The basic mechanism in our modeling is that R0 can sometimes decrease, due to the imposition of public-health measures, such as social distancing, vaccinations, etc., and sometimes increase, because of the relaxation of these measures. Focusing only on the dynamics of the reproductive number serves as a useful proxy for the myriad of influences that control the true epidemic dynamics. The central variable in our model is the number of newly infected individuals in each incubation period. Within this framework, we will determine the duration of an epidemic, the time dependence of the number of infected individuals, and the total number of individuals infected when an epidemic finally ends. All three quantities exhibit huge fluctuations that are reminiscent of the actual data.
Results
Systematic mitigation
In this section we investigate what we term the systematic mitigation strategy. Here, increasingly stringent controls are imposed as soon as an outbreak is detected, in which the reproductive number R0 exceeds 1, to reduce R0 to less than 1. The condition R0 = 1 defines the peak of the epidemic because the number of newly infected individuals reaches a maximum at this point. Once R0 becomes less than 1, progressively fewer individuals are infected after each incubation period and the epidemic begins to disappear. The number of individuals that become infected after R0 has been reduced to less than 1 decays exponentially with time and constitutes a small contribution to the total number of infections.
Because society is complicated, with many competing social forces in play, we posit that it is not possible to reduce R0 instantaneously, but rather, the reduction happens gradually. We therefore assume that after each successive incubation period R0 is reduced by a random factor r whose average value 〈r〉 is less than 1. Let us define Rk as the reproductive number in the kth period. Then Rk is given by
\[ R_k = r_k R_{k-1} = R_0 \prod_{j=1}^{k} r_j, \tag{1} \]
where rk is the value of the random variable r in the kth period. The typical number of periods k until the reproductive number reaches 1 is determined by \(R_0 \langle r\rangle^{k} = 1\). In what follows, we assume that when the epidemic is first detected, the reproductive number R0 = 2.5, and we take 〈r〉 = 0.95 for illustration. Using these values,
\[ k = \frac{\ln R_0}{\ln(1/\langle r\rangle)} = \frac{\ln 2.5}{\ln(1/0.95)} \approx 18. \tag{2} \]
Thus the epidemic typically reaches its peak after 18 periods. However, because of the inherent randomness in the mitigation, with the reproductive number sometimes reduced by a factor smaller than 0.95 and sometimes by a factor larger than 0.95 after each incubation period, the true epidemic dynamics can be very different, as illustrated in Fig. 2.
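The naive estimate of the peak time follows from solving R0 〈r〉^k = 1 for k. A minimal sketch of that arithmetic is given here; the function name is hypothetical and the calculation deliberately ignores all fluctuations in r.

```python
import math

def typical_peak_period(R0: float, mean_r: float) -> float:
    """Solve R0 * mean_r**k = 1 for k (naive estimate, fluctuations ignored)."""
    if not (R0 > 1.0 and 0.0 < mean_r < 1.0):
        raise ValueError("need R0 > 1 and 0 < mean_r < 1")
    return math.log(R0) / math.log(1.0 / mean_r)

if __name__ == "__main__":
    # Values used in the text: R0 = 2.5 and <r> = 0.95.
    print(round(typical_peak_period(2.5, 0.95), 2))
```

With R0 = 2.5 and 〈r〉 = 0.95 this gives about 17.9, which rounds to the 18 periods quoted in the text. The same estimate with 〈r〉 = 0.975 gives about 36 periods.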

Fig. 2. a The probability q(k) that the epidemic reaches its peak after k periods. b The probability p(I) that I people have been infected when the epidemic reaches its peak (under the assumption that the initial epidemic size is one person).
We simulate the systematic mitigation strategy by starting with a single infected individual and reproductive number R0 = 2.5. We then choose a set of random numbers r1, r2, r3, …, each of which is uniformly distributed between 0.9 and 1, so that 〈r〉 = 0.95. We first measure how long it takes until Rk is reduced to 1, which signals the epidemic peak. We perform this same measurement for \(5 \times 10^{6}\) different choices of the set of random numbers r1, r2, …, rk. As shown in Fig. 2a, the probability q(k) that the epidemic reaches its peak in the kth period has a maximum at roughly k = 18 periods, in agreement with the above naive estimate. If one is lucky, that is, if most of the reduction factors ri are close to 0.9, the epidemic reaches its peak in as few as 11 periods. If one is unlucky (many of the ri close to 1), the epidemic can continue to grow for more than 30 periods.
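The measurement of the peak time described above can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the authors' code: the function names are hypothetical, the peak is taken to be the first period in which Rk is at or below 1, and far fewer realizations are used than the 5 × 10^6 quoted in the text.

```python
import random
from collections import Counter

def periods_until_peak(R0: float, a: float, rng: random.Random) -> int:
    """Number of periods k until R_k = R0 * r_1 * ... * r_k first drops to <= 1.

    Each r_j is drawn uniformly from [a, 1]; a < 1 is assumed so the loop ends.
    """
    R, k = R0, 0
    while R > 1.0:
        R *= rng.uniform(a, 1.0)
        k += 1
    return k

def peak_time_distribution(n_runs: int = 20_000, R0: float = 2.5,
                           a: float = 0.9, seed: int = 0) -> dict:
    """Empirical estimate of q(k) from n_runs independent realizations."""
    rng = random.Random(seed)
    counts = Counter(periods_until_peak(R0, a, rng) for _ in range(n_runs))
    return {k: n / n_runs for k, n in sorted(counts.items())}

if __name__ == "__main__":
    q = peak_time_distribution()
    for k, prob in q.items():
        print(f"k = {k:2d}  q(k) ~ {prob:.4f}")
```

With a = 0.9 no realization can peak before the ninth period, because 2.5 × 0.9^8 is still larger than 1. The much rarer short and long durations reported in the text only become visible with many more realizations.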
While the distribution of epidemic durations is fairly narrow, the total number I of people who were infected during the course of an epidemic can vary by several orders of magnitude. The number of people infected in the kth period, Ik, is given by \(I_k = R_{k-1} I_{k-1}\). Thus according to the dynamics of the reproductive number in (1), and starting from a single infected individual, the total number of infected individuals is
\[ I = 1 + R_0 + R_0^{2} r_1 + R_0^{3} r_1^{2} r_2 + R_0^{4} r_1^{3} r_2^{2} r_3 + \cdots \tag{3} \]
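Thus, because the rj are independent and identically distributed, the average number of infected individuals is
\[ \langle I\rangle = 1 + R_0 + R_0^{2}\langle r\rangle + R_0^{3}\langle r^{2}\rangle\langle r\rangle + R_0^{4}\langle r^{3}\rangle\langle r^{2}\rangle\langle r\rangle + \cdots \tag{4} \]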
This expression converges because the kth term quickly decreases with k for an arbitrary distribution of r with support on [0, 1). It is important to point out that the number of newly infected people at each incubation period is based on the assumption that this number is small compared to the total population size, so that the growth in the number of new infections is truly exponential. As shown in Fig. 2b, while the most probable epidemic size is \(\approx 10^{4}\) (again starting with a single infected individual), there is a non-vanishing probability that the outbreak size can be as small as a few hundred or greater than \(10^{7}\). This large disparity in outbreak sizes illustrates how small changes in the way that the epidemic is mitigated can lead to huge changes in the outbreak size.
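The spread in outbreak sizes can be explored with a short simulation of the recursion Ik = Rk−1 Ik−1. The sketch below is an illustration under stated assumptions rather than the authors' procedure: the function names are hypothetical, the count is accumulated only up to the period in which Rk first drops to 1 or below, and the number of realizations is small.

```python
import math
import random
import statistics

def size_at_peak(R0: float, a: float, rng: random.Random) -> float:
    """Cumulative number infected when R_k first drops to <= 1.

    Starts from one infected individual (I_0 = 1) and applies
    I_k = R_{k-1} * I_{k-1}, then R_k = r_k * R_{k-1} with r_k uniform in [a, 1].
    """
    R = R0
    new_cases = 1.0
    total = 1.0
    while R > 1.0:
        new_cases *= R            # infections in the next period
        total += new_cases
        R *= rng.uniform(a, 1.0)  # mitigation step
    return total

def log10_sizes(n_runs: int = 20_000, R0: float = 2.5,
                a: float = 0.9, seed: int = 0) -> list:
    """log10 of the size at the peak for n_runs independent realizations."""
    rng = random.Random(seed)
    return [math.log10(size_at_peak(R0, a, rng)) for _ in range(n_runs)]

if __name__ == "__main__":
    logs = log10_sizes()
    print("min / median / max of log10(size):",
          round(min(logs), 2), round(statistics.median(logs), 2), round(max(logs), 2))
```

Working with the logarithm of the size is a design choice: the sizes span several orders of magnitude, so a histogram of log10(size) is far easier to read than one of the raw counts.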
More dramatically, suppose that the mitigation strategy is slightly less effective and that the reproductive number is reduced at each period by a uniform random variable that lies in [0.95, 1] rather than in [0.9, 1]. Now the peak of the epidemic can occur between 22 and 55 periods, with a most probable duration of 36 periods. However, the epidemic size when the peak of the epidemic is reached ranges between roughly \(10^{5}\) and \(10^{12}\), with a most probable size of roughly \(7 \times 10^{7}\). The upper value is much larger than the world population, so the finiteness of the population would now provide the upper bound. Although the peak of this second epidemic occurs only a factor of 2 later than that of the first one, it typically infects 7000 times more people! We emphasize that the stochastic nature of the random variables rj plays a decisive role. Very different behaviors emerge in the deterministic case (ref. 20).
Vacillating mitigation
During the acute period of the pandemic in 2020–2021, there was considerable and even vitriolic debate about the efficacy of various mitigation strategies, or even about the utility of any mitigation. If the epidemic is severe, as quantified by the reproductive number Rk in the kth period being substantially greater than 1, people may be more likely to accept restrictions on their behaviors, such as isolating, masking, vaccinating, etc., to reduce their risk of getting sick. These adaptations will reduce the reproductive number. If, however, the reproductive number becomes less than 1, then people will want to relax their vigilance and may also advocate for the opening of various public venues, such as schools, theaters, stadiums, etc. We model this tug-of-war between increased and decreased restrictions by what we term the vacillating mitigation strategy. The competition between epidemiology and social behavior was previously treated in more sophisticated models (refs. 21, 22). We emphasize that our model is merely a proxy for the two competing influences of epidemiology and social behavior.
The two competing steps of the vacillating strategy are the following:
- Mitigation: if Rk > 1, decrease Rk by a factor r that is uniformly distributed in [a, 1], with a < 1.
- Relaxation: if Rk < 1, change Rk by a factor s that is uniformly distributed in [a, 3 − 2a].
The first option is the same as in the systematic mitigation strategy. We construct the second option by requiring that \(\langle s\rangle = 1 + \frac{1}{2}(1-a)\) and \(\langle r\rangle = 1 - \frac{1}{2}(1-a)\) are symmetrically located about 1. That is, the average decrease in Rk in a mitigation step equals the average increase in Rk in the relaxation step. This symmetrical construction seems appropriate to probe the long-term influence of vacillation on the dynamics. If the vacillation strategy were biased towards relaxation, R0 would remain greater than 1 and the entire planet would be infected. If this strategy were biased towards mitigation, the epidemic would be similar to that in systematic mitigation. Neither of these cases is interesting from the viewpoint of probing long-time behaviors.
In this vacillating strategy, Rk varies between values greater than 1 and values less than 1. This would lead to an eternal epidemic. To avoid this unrealistic outcome, the other important feature of the relaxation step is that the value of Rk could still decrease during a relaxation step because a < 1. This possibility ensures that eventually less than one person will be infected in the current incubation period. We now define this event as signaling the end of the epidemic.
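The vacillating strategy and its stopping rule can be sketched as a single-trajectory simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name and the max_periods safety cap are hypothetical, the measure-zero case Rk = 1 is treated as a relaxation step, and the final period with fewer than one new infection is not added to the total.

```python
import random

def vacillating_epidemic(R0: float = 2.5, a: float = 0.9, seed=None,
                         max_periods: int = 1_000_000):
    """Simulate one epidemic under the vacillating mitigation strategy.

    Returns (trajectory, total), where trajectory[k] is the number of new
    infections in period k and total is the cumulative number infected.
    """
    rng = random.Random(seed)
    R = R0
    new_cases = 1.0            # one initially infected individual
    total = 1.0
    trajectory = [new_cases]
    for _ in range(max_periods):
        new_cases *= R         # I_k = R_{k-1} * I_{k-1}
        trajectory.append(new_cases)
        if new_cases < 1.0:
            break              # fewer than one new infection: epidemic ends
        total += new_cases
        if R > 1.0:
            R *= rng.uniform(a, 1.0)              # mitigation: factor r in [a, 1]
        else:
            R *= rng.uniform(a, 3.0 - 2.0 * a)    # relaxation: factor s in [a, 3 - 2a]
    return trajectory, total

if __name__ == "__main__":
    for seed in range(4):
        traj, total = vacillating_epidemic(seed=seed)
        print(f"seed {seed}: duration {len(traj) - 1} periods, total infected ~ {total:.3g}")
```

Running the sketch with different seeds gives a feel for how strongly both the duration and the final size vary from one realization to the next.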
Figure 3a–d shows a few representative trajectories of the number of people infected I(t) as a function of time (incubation periods) from the same initial condition of a single infected person and R0 = 2.5. While there are some qualitative differences between the trajectories of Fig. 1 and the model outcomes, the important points that are common to the real data and the simulation results are the disparities in the individual trajectories and the strongly fluctuating temporal behavior.

Fig. 3. a–d Representative trajectories for the number of people I(t) infected at time t for the vacillating mitigation strategy when starting with R0 = 2.5 and a single infected person. The four realizations shown illustrate the highly unpredictable outcomes of individual epidemics.
For the vacillating strategy and for the choice a = 0.9, the most likely duration of the epidemic is roughly 400 periods (Fig. 4a), compared to 18 periods for the systematic strategy. The probability that the epidemic lasts much longer than the most likely value decays exponentially with time. An even more dramatic feature of the vacillating strategy is the number of people that are ultimately infected. The most probable outcome is that \(3 \times 10^{5}\) people are infected when the epidemic ends (Fig. 4b). However, the size of the epidemic can range from \(10^{4}\) to \(10^{8}\). Compared to the systematic mitigation strategy with a reduction factor uniformly distributed in the range [0.9, 1], the epidemic now lasts roughly 20 times longer and infects a factor of 30 more individuals.

Fig. 4. a The probability Q(k) that the epidemic lasts k periods. b The probability P(s) that the epidemic ultimately infects s people starting with R0 = 2.5 and a single infected person.
Discussion
This work should not be construed to mean that public-health measures should be ignored. Indeed, the extremely rapid development of a vaccine that is effective against Covid-19 is an outstanding triumph of modern medical science. It should also be pointed out that some of the many forecasting models for Covid-19 were useful during the early stages of the pandemic. However, when social influences with competing viewpoints began to dictate individual and collective policy decisions, much of the predictive power of forecasting models was lost.
We also emphasize that our simplistic model has little connection to the actual epidemiological and social processes that determine the spread of the epidemic and the changes in individual and collective behaviors in response to the epidemic. Nevertheless, our model seems to capture the tug of war between public-health mandates to control the spread of the disease and the social forces that often advocate for a more laissez-faire approach. Our main message is that there are huge uncertainties in predicting the time course of an epidemic, its ultimate duration, and the final outbreak size. This unpredictability seems to be intrinsic to the dynamics of epidemics where epidemiological influences occur in concert with social forces. In this setting, forecasting ambiguity is unavoidable.