Article info

Article history:
Received 5 May 2020
Revised 8 June 2020
Accepted 15 June 2020
Available online 16 June 2020

Keywords:
COVID-19
Discrete cosine transform (DCT)
Fourier decomposition method (FDM)
Gaussian mixture model (GMM)
Mathematical model
Susceptible-infected-recovered (SIR) model

Abstract

COVID-19 is caused by a novel coronavirus and has wreaked havoc in many countries across the globe. A majority of the world population has now been living in a restricted environment for more than a month, with minimal economic activity, to prevent exposure to this highly infectious disease. Medical professionals are going through a stressful period while trying to save the larger population. In this paper, we develop two different models to capture the trend in the number of cases and to predict the cases in the days to come, so that appropriate preparations can be made to fight this disease. The first is a mathematical model accounting for various parameters relating to the spread of the virus, while the second is a non-parametric model based on the Fourier decomposition method (FDM), fitted on the available data. The study is performed for various countries, but detailed results are provided for India, Italy, and the United States of America (USA). The turnaround dates for the trend of infected cases are estimated. The end-dates are also predicted and are found to agree well with a very popular study based on the classic susceptible-infected-recovered (SIR) model. Worldwide, the total numbers of expected cases and deaths are $12.7 \times 10^{6}$ and $5.27 \times 10^{5}$, respectively, predicted with data as of 06-06-2020 and 95% confidence intervals. The proposed study produces promising results with the potential to serve as a good complement to existing methods for continuous predictive monitoring of the COVID-19 pandemic.

© 2020 Elsevier Ltd. All rights reserved.
https://doi.org/10.1016/j.chaos.2020.110023
0960-0779/© 2020 Elsevier Ltd. All rights reserved.
2 A. Singhal, P. Singh and B. Lall et al. / Chaos, Solitons and Fractals 138 (2020) 110023
cussed using a case study in Li et al. [8]. A study conducted by the Singapore University of Technology and Design (SUTD) [9] uses a data-driven SIR model, characterized by regular updating of parameters, to predict the end of this pandemic in different parts of the world. Batista [10] has implemented the SIR model to estimate the final size and other parameters of the COVID-19 epidemic across the globe.

This research area is still nascent, and hence it is difficult to rely on any single model for prediction. In this work, we design two contrasting models for capturing the daily variations in the number of cases. The first model takes the form of a mathematical series with different parameters that account for the various physical phenomena dictating the count of people getting infected by the virus. The model estimates the parameter values for three different countries, India, Italy, and the United States of America (USA), and thereafter, the prediction is performed for the next 30 days to forecast the turnaround (peak active cases) day. The second model extracts the trend and variability from the available data using the Fourier decomposition method (FDM) based on the discrete cosine transform (DCT). The DCT works as an optimal method for many applications, such as image denoising, the fractal-based least mean squares (LMS) algorithm, image compression, and first-order Gauss-Markov random signals [11]. Prediction is then performed using the Gaussian mixture model curve-fitting approach to predict the total number of cases and the end-dates (occurrence of 99% of the total expected cases) for the disease in various parts of the world.

The rest of the paper is organized as follows: Section 2 discusses the two models proposed in this work, defines the various parameters associated with these models, and lays out the strategies for predicting the cases in the next few days. Results are presented in Section 3 for the three countries considered in this work, along with end-date predictions for some other countries. Finally, the paper is concluded in Section 4.

2. Proposed methodology

2.1. Mathematical model

In this model, we capture the role of various parameters in determining the total number of active cases, $Y_n$, on the nth day after the disease started spreading. The average number of people who come in contact with an infected person on a daily basis is denoted by $N_c$. Parameters α and γ represent the daily rate of testing and the daily death rate, respectively, i.e., α is the ratio of people getting tested and quarantined to the total number of unidentified active cases on any given day, while γ is the ratio of people dying in a day to the total number of active cases on that day. The number of new confirmed cases, $X_n$, reported on the nth day is computed as

$$X_n = \Big[X_{n-1}(1-\alpha)(1-\gamma)\,p_1 + X_{n-2}(1-\alpha)^2(1-\gamma)^2\,p_2 + X_{n-3}(1-\alpha)^3(1-\gamma)^3\,p_3 + \dots + X_1(1-\alpha)^{n-1}(1-\gamma)^{n-1}\,p_{n-1}\Big]\,N_c, \tag{1}$$

where $p_i$ denotes the probability of an infected person causing infection to another person i days after he/she got infected. The virus is said to have an average life of 14-15 days inside a human, and in the first few days, it multiplies in numbers before its degradation starts. Hence, we assume that for d days after catching the virus, $p_i$ remains unity and decays exponentially thereafter [12], i.e.,

$$p_i = \begin{cases} 1, & 1 \le i \le d, \\ \exp[-\lambda(i-d)], & i > d, \end{cases} \tag{2}$$

where the rate of decay λ = 1/7, and d is assumed to vary between 6 and 10 days, depending on the immunity levels or the treatment offered to the infected individual. The patient recovers after the virus has degraded substantially. The total number of active cases on the nth day is obtained as

$$Y_n = X_n + \sum_{i=1}^{n-1} X_{n-i}\,(1-\gamma)^{i}\,p_i, \tag{3}$$

where the multiplicative factors $(1-\gamma)$ and $p_i$ account for the deaths and the recoveries of the infected people, respectively. The value of $N_c$ depends on the precautions being practiced by the people, such as social distancing, wearing masks, washing hands on a regular basis, and staying in a quarantine environment after any suspected exposure to the virus. Government measures, including the closing of shops, schools, offices, markets, and restaurants, travel restrictions, or the imposition of a complete lock-down, also help in reducing $N_c$ and thus contain the spread of this highly contagious disease. Further, as the value of α increases, more and more infected people are quarantined and hence cannot infect others, thereby reducing the number of new cases $X_n$. The total number of infected cases depends on all the parameters discussed above, with $N_c$ and α being the most significant. Based on the most recent values of these parameters, as observed from the available data, the model can be used to predict the number of cases in the near future.

2.2. The Fourier decomposition method

The Fourier representation is a widely used tool for the modeling and analysis of various physical phenomena. It decomposes a time-series in terms of sine and cosine basis functions. Here, the main idea is to decompose the COVID-19 time-series into a set of desired frequency bands using the Fourier decomposition method (FDM), and to obtain various trends (low-pass components capturing the average behavior) and variabilities (high-pass components denoting the variations from the trend). These trends are then fitted with a mixture of Gaussian functions to predict the size of the pandemic. The FDM is an adaptive time-series and data analysis approach based on zero-phase filtering [13]. It decomposes a time-series into a constant and a set of band-limited components termed Fourier intrinsic band functions (FIBFs). The FIBFs are zero-mean, adaptive, and energy-preserving functions.

The FDM can be practically implemented using (a) Fourier representations such as the discrete Fourier transform, the discrete sine transform, and the discrete cosine transform (DCT); or (b) finite impulse response and infinite impulse response based zero-phase filtering. In this study, we have used the DCT-based implementation of the FDM. Let c[n] be a time-series of length N. The type-2 DCT of c[n] is defined as [14]

$$C[k] = \sqrt{\frac{2}{N}}\,\sigma_k \sum_{n=0}^{N-1} c[n]\cos\!\left(\frac{\pi k(2n+1)}{2N}\right), \tag{4}$$

where 0 ≤ k ≤ N − 1, $\sigma_k = 1/\sqrt{2}$ for k = 0, and $\sigma_k = 1$ for k ≠ 0. The original time-series c[n] is recovered using the inverse DCT (IDCT) as

$$c[n] = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} \sigma_k\,C[k]\cos\!\left(\frac{\pi k(2n+1)}{2N}\right). \tag{5}$$

The DCT basis functions $\cos\left(\pi k(2n+1)/(2N)\right)$ are a class of discrete polynomials [14] which form an orthogonal set. The time-series c[n] can be written as a superposition of a constant $c_0$ and M FIBFs $c_i[n]$:

$$c[n] = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} \sigma_k\,C[k]\cos\!\left(\frac{\pi k(2n+1)}{2N}\right) = c_0 + \sum_{i=1}^{M} c_i[n], \tag{6}$$
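As an illustration of Eqs. (4)-(6), the orthonormal DCT pair and the split of a series into a constant plus band-limited components can be sketched in plain Python. This is a minimal O(N²) sketch, not the authors' implementation; the helper names and the band edges passed to `fibf_decompose` are hypothetical, chosen only to show that the constant and the bands add back up to c[n].

```python
import math

def _sigma(k):
    # Normalization of Eq. (4): sigma_0 = 1/sqrt(2), sigma_k = 1 otherwise
    return 1 / math.sqrt(2) if k == 0 else 1.0

def _basis(k, n, N):
    # DCT-II basis function cos(pi k (2n + 1) / (2N))
    return math.cos(math.pi * k * (2 * n + 1) / (2 * N))

def dct2(c):
    """Orthonormal type-2 DCT of the sequence c, Eq. (4)."""
    N = len(c)
    scale = math.sqrt(2 / N)
    return [scale * _sigma(k) * sum(c[n] * _basis(k, n, N) for n in range(N))
            for k in range(N)]

def idct2(C):
    """Inverse DCT, Eq. (5)."""
    N = len(C)
    scale = math.sqrt(2 / N)
    return [scale * sum(_sigma(k) * C[k] * _basis(k, n, N) for k in range(N))
            for n in range(N)]

def fibf_decompose(c, edges):
    """Split c into a constant c0 and band components, as in Eq. (6).

    edges is an increasing list of DCT indices starting at 1 and ending at
    len(c); band i keeps the coefficients with edges[i] <= k < edges[i + 1].
    Hypothetical helper: the band edges are illustrative only."""
    N = len(c)
    C = dct2(c)
    c0 = C[0] / math.sqrt(N)  # the k = 0 term of Eq. (5) equals the sample mean
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_C = [C[k] if lo <= k < hi else 0.0 for k in range(N)]
        bands.append(idct2(band_C))
    return c0, bands
```

For example, `fibf_decompose(series, [1, 4, 8, len(series)])` returns the sample mean and three band components, each with zero mean, whose sum with the constant reconstructs `series` up to rounding. Because the transform is orthonormal, the energy of c[n] equals that of C[k] (Parseval's relation).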
Fig. 1. Mathematical model fitted to the number of active cases for India (top), Italy (middle) and USA (bottom).
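The recursion in Eqs. (1)-(3), together with the fitting criterion of Eq. (10), can be sketched as below to reproduce the kind of fit shown in Fig. 1. This is a minimal illustration under our own assumptions: the series is seeded with a single value $X_1$, the helper names (`p_weight`, `simulate`, `relative_mse`, `fit_nc_alpha`) are hypothetical, and a brute-force grid search stands in for the optimizer, which the text does not specify.

```python
import math

def p_weight(i, d, lam=1 / 7):
    """Infection probability p_i of Eq. (2): 1 for i <= d, then exp(-lam (i - d))."""
    return 1.0 if i <= d else math.exp(-lam * (i - d))

def simulate(n_days, x1, nc, alpha, gamma, d, lam=1 / 7):
    """New cases X_n (Eq. (1)) and active cases Y_n (Eq. (3)) for n = 1..n_days.

    Lists are 0-based, so X[n - 1] holds X_n. Seeding with X_1 = x1 is an assumption."""
    X = [float(x1)]
    for n in range(2, n_days + 1):
        s = sum(X[n - i - 1] * ((1 - alpha) * (1 - gamma)) ** i * p_weight(i, d, lam)
                for i in range(1, n))
        X.append(nc * s)
    Y = [X[n - 1] + sum(X[n - i - 1] * (1 - gamma) ** i * p_weight(i, d, lam)
                        for i in range(1, n))
         for n in range(1, n_days + 1)]
    return X, Y

def relative_mse(y_sim, y_obs):
    """Objective of Eq. (10): mean of ((Y_i - Y_i*) / Y_i*) ** 2."""
    return sum(((ys - yo) / yo) ** 2 for ys, yo in zip(y_sim, y_obs)) / len(y_obs)

def fit_nc_alpha(y_obs, x1, gamma, d, nc_grid, alpha_grid):
    """Grid search for (Nc, alpha) minimizing Eq. (10); returns (error, Nc, alpha)."""
    best = None
    for nc in nc_grid:
        for alpha in alpha_grid:
            _, y_sim = simulate(len(y_obs), x1, nc, alpha, gamma, d)
            err = relative_mse(y_sim, y_obs)
            if best is None or err < best[0]:
                best = (err, nc, alpha)
    return best
```

For instance, `fit_nc_alpha(y_obs, x1, 0.0031, 7, nc_grid, alpha_grid)` would use the India values of γ and d quoted in Section 3.1, with `y_obs` a list of observed active cases; in the paper, the fit is repeated whenever the error exceeds the threshold $e_0 = 0.02$.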
Fig. 2. Plots of confirmed cases (or new cases) per day, with various trends and variabilities. Trend and variability estimation on six time-scales from the COVID-19 data using the FDM with six frequency bands (FBs): (i) trend ≥ 7 days with FB [0, 1/7], variability with FB (1/7, 0.5]; (ii) trend ≥ 14 days with FB [0, 1/14], variability with FB (1/14, 0.5]; (iii) trend ≥ 21 days with FB [0, 1/21], variability with FB (1/21, 0.5]; (iv) trend ≥ 28 days with FB [0, 1/28], variability with FB (1/28, 0.5]; (v) trend ≥ 35 days with FB [0, 1/35], variability with FB (1/35, 0.5]; (vi) trend ≥ 42 days with FB [0, 1/42], variability with FB (1/42, 0.5].
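The trend/variability split listed in the Fig. 2 caption can be sketched with the same orthonormal DCT. Assuming the usual correspondence for the DCT-II basis, index k sits at frequency k/(2N) cycles per day, so a trend on a time-scale of T days or longer keeps the coefficients with k/(2N) ≤ 1/T (equivalently kT ≤ 2N), and the variability is the remainder. This is an illustrative sketch; the function name is hypothetical.

```python
import math

def dct2(c):
    """Orthonormal DCT-II, as in Eq. (4)."""
    N = len(c)
    return [math.sqrt(2 / N) * (1 / math.sqrt(2) if k == 0 else 1.0)
            * sum(x * math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n, x in enumerate(c))
            for k in range(N)]

def idct2(C):
    """Inverse DCT, as in Eq. (5)."""
    N = len(C)
    return [math.sqrt(2 / N) * sum((1 / math.sqrt(2) if k == 0 else 1.0) * Ck
                                   * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                                   for k, Ck in enumerate(C))
            for n in range(N)]

def trend_and_variability(c, T):
    """Trend in band [0, 1/T] and variability in (1/T, 0.5] for daily series c.

    DCT index k is taken to correspond to k / (2N) cycles per day, so the
    trend keeps the coefficients with k * T <= 2N (integer test, no rounding)."""
    N = len(c)
    C = dct2(c)
    trend = idct2([Ck if k * T <= 2 * N else 0.0 for k, Ck in enumerate(C)])
    variability = [x - t for x, t in zip(c, trend)]
    return trend, variability
```

Calling `trend_and_variability(daily_cases, T)` for T in (7, 14, 21, 28, 35, 42), where `daily_cases` is a hypothetical list of daily counts, yields the six trend/variability pairs of the caption; by construction, each pair sums back to the input.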
Fig. 3. Trend estimation from the COVID-19 data using the FDM; estimated trends are fitted with the Gaussian mixture model for the prediction of number of cases per day
for World (top) and USA (bottom).
estimated, it is fitted using the Gaussian mixture model (GMM) defined as

$$g[n] = \sum_{i=1}^{L} a_i \exp\!\left[-\left(\frac{n-\mu_i}{\sigma_i}\right)^{2}\right], \tag{8}$$

where parameters $a_i$, $\mu_i$, and $\sigma_i$ represent the amplitude, location, and width, respectively, and L is the number of peaks to fit. All the parameters are computed using the MATLAB tool with 95% confidence bounds by minimizing the error $e[n] = \tau[n] - g[n]$. To measure how well g[n] fits the estimated trend $\tau[n]$, the mean absolute error is obtained as

$$\mathrm{MAE} = \frac{1}{N}\sum_{n=0}^{N-1} |e[n]|. \tag{9}$$

Finally, predictions are obtained by extrapolating the GMM (8) for time Q > N. The total number of cases is obtained by computing the area under the curve, i.e., the summation of g[n] over the time range n ∈ [0, Q − 1].
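Evaluating the GMM of Eq. (8), the MAE of Eq. (9), the total as the sum of g[n] over [0, Q − 1], and the end-date as the first day on which the running total reaches 99% of that sum (the definition used for Table 3) can be sketched as follows. The fitting step itself, done in MATLAB by the authors, is not reproduced; the helper names are hypothetical, and day indices are relative to the start of the fitted series.

```python
import math

def gmm(n, params):
    """Eq. (8): g[n] = sum_i a_i exp(-((n - mu_i) / sigma_i) ** 2)."""
    return sum(a * math.exp(-((n - mu) / sigma) ** 2) for a, mu, sigma in params)

def mae(tau, params):
    """Eq. (9): mean absolute error between the trend tau[n] and g[n], n = 0..N-1."""
    return sum(abs(t - gmm(n, params)) for n, t in enumerate(tau)) / len(tau)

def total_and_end_day(params, Q, fraction=0.99):
    """Total cases as the sum of g[n] over n in [0, Q - 1], and the first day
    index at which the running sum reaches `fraction` of that total."""
    g = [gmm(n, params) for n in range(Q)]
    total = sum(g)
    running = 0.0
    for n, value in enumerate(g):
        running += value
        if running >= fraction * total:
            return total, n
    return total, Q - 1  # defensive fallback for floating-point round-off
```

As an input example, the Italy column of Table 1 gives `italy = [(3310, 55.51, 13.33), (3200, 75.75, 27.3)]`, and `total_and_end_day(italy, Q=365)` extrapolates the daily-case trend over a hypothetical 365-day horizon.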
Fig. 4. Trend estimation from the COVID-19 data using the FDM; estimated trends are fitted with the Gaussian mixture model for the prediction of number of cases per day
for Italy (top) and India (bottom).
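Section 3.2 states that the predictions obtained from the different trend time-scales are averaged after excluding outliers, without specifying the outlier rule. The sketch below assumes a median-absolute-deviation (MAD) rule with threshold k = 3; both the rule and the function name are hypothetical.

```python
import statistics

def combine_predictions(preds, k=3.0):
    """Average several time-scale predictions after excluding outliers.

    Assumed outlier rule (not specified in the paper): drop any value farther
    than k * MAD from the median, where MAD is the median absolute deviation."""
    med = statistics.median(preds)
    mad = statistics.median(abs(p - med) for p in preds)
    if mad > 0:
        kept = [p for p in preds if abs(p - med) <= k * mad]
    else:
        # Zero spread: keep only the values that coincide with the median
        kept = [p for p in preds if p == med]
    return statistics.mean(kept), kept
```

A robust rule such as MAD is used here because, with only four to six time-scales, a single extreme extrapolation would otherwise dominate a plain mean.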
3. Results and discussion

3.1. Mathematical model

In this work, the data on the number of active COVID-19 cases for India, Italy, and the USA are taken from [17], last updated on 06-06-2020. The average value of γ is computed individually for each country. The values of $N_c$ and α are updated whenever the pattern in the data changes, owing to the precautions observed by the government and the residents of the nation, or to sporadic events that lead to a sudden spike in the spread of infections. The results for the three countries are as follows:

India: The first case in India appeared on 30-01-2020, but the number of cases started increasing rapidly after 01-03-2020. Hence, the proposed model is applied considering 01-03-2020 as day 1. The initial values for γ, $N_c$, and α are empirically estimated as 0.0031, 1.59, and 0.5, respectively, and d = 7. γ is estimated from the data on daily deaths, and its average value is then computed over the time period under consideration. The values of $N_c$ and α are estimated so as to minimize the mean square error (MSE), taken relative to the actual values, between the simulated number of active cases $Y_i$ and the actual number $Y_i^{*}$, i.e.,

$$(N_c, \alpha) = \arg\min_{N_c,\,\alpha}\ \frac{1}{n}\sum_{i=1}^{n}\left(\frac{Y_i - Y_i^{*}}{Y_i^{*}}\right)^{2}, \tag{10}$$
Table 1
Parameters of the GMM (8) for confirmed cases (or new cases) per day with 95% confidence intervals (CI) for World, USA, Italy and India.

Parameter  World                    USA                      Italy            India
L          2                        2                        2                2
a1         1.171e+05                2.207e+04                3310             468.9
CI         (1.124e+05, 1.217e+05)   (1.733e+04, 2.682e+04)   (2914, 3706)     (128.4, 809.4)
μ1         153.8                    82.45                    55.51            126.9
CI         (148.7, 158.8)           (81.7, 83.2)             (55.13, 55.89)   (125.8, 128)
σ1         52.41                    16.28                    13.33            2.796
CI         (45.48, 59.34)           (14.18, 18.38)           (12.32, 14.35)   (0.476, 5.115)
a2         5.44e+04                 2.45e+04                 3200             1.537e+04
CI         (4.863e+04, 6.018e+04)   (2.366e+04, 2.534e+04)   (3008, 3393)     (1.124e+04, 1.95e+04)
μ2         89.99                    117.1                    75.75            165.8
CI         (89.38, 90.6)            (113.6, 120.7)           (73.57, 77.93)   (153.6, 178)
σ2         18.96                    35.09                    27.3             52.63
CI         (17.4, 20.52)            (28.92, 41.26)           (25.84, 28.77)   (47.52, 57.74)
Table 2
Parameters of the GMM (8) for deaths per day with 95% confidence intervals (CI) for World, USA, Italy and India.

Parameter  World           USA             Italy           India
L          2               2               2               2
a1         4766            1603            495.8           129
CI         (3927, 5606)    (1479, 1728)    (442.3, 549.4)  (103.1, 154.8)
μ1         96.39           88.16           60.35           130.9
CI         (95.48, 97.3)   (87.77, 88.54)  (59.87, 60.84)  (129.2, 132.7)
σ1         25.39           11.49           12.91           11.54
CI         (24.09, 26.69)  (10.66, 12.32)  (11.79, 14.03)  (9.61, 13.48)
a2         4126            1571            433.4           129.8
CI         (4026, 4226)    (1515, 1627)    (404.2, 462.5)  (119.9, 139.7)
μ2         143.4           110.9           80.56           116.7
CI         (140.7, 146)    (109.4, 112.5)  (78.19, 82.92)  (113.1, 120.2)
σ2         42.75           29.93           29.41           32.82
CI         (32.92, 52.58)  (28.39, 31.48)  (27.76, 31.06)  (30.52, 35.11)
where the subscript i denotes the ith day, and n is the number of days considered. The initial values are carried forward until the MSE crosses a threshold $e_0$, and updated parameters are then obtained by minimizing the MSE again. In this work, we set $e_0 = 0.02$. The number of active cases is depicted in Fig. 1 (top) as a function of the number of days. The lock-down was imposed on 22-03-2020, and thereafter the slope of the plot started decreasing, barring sporadic occurrences on a few occasions. As per current statistics, the approximate values for $N_c$ and α are 0.94 and 0.48, respectively. The model is used to predict the cases for the next 30 days, and the plot indicates a turnaround (peak active cases) 30 days from now, i.e., on 07-07-2020. The lower number of deaths in India compared with other countries is a result of early action by the government and, probably, a higher immunity of people than in developed nations.

Italy: The first case was identified on 29-01-2020, and the progression was not rapid in the early days. However, the disease started spreading fast after 19-02-2020, which we consider as day 1. The parameter values for Italy are initialized as 0.0072, 2.49, and 0.5 for γ, $N_c$, and α, respectively, with d = 8. Fig. 1 (middle) shows the active cases in the country. Lock-down orders were passed by the government on 09-03-2020, but the number of deaths has been higher, owing to a lack of preparedness and lower immunity levels of the people. Moreover, after a sharp increase in the early days, the active cases have been declining since 21-04-2020 (turnaround date), as the medical staff and the government put up a consolidated fight, with people adhering to the advisories circulated by global health organizations.

USA: It is a very big country with a population spread across large areas, so social distancing is easier to observe in its sparsely populated areas. Most of the cases have been reported from the densely populated areas of the country, with the first case being reported on 20-01-2020. The initial values are estimated as 0.0041, 1.75, and 0.5 for γ, $N_c$, and α, respectively, and d = 8. In our model, we consider 22-02-2020 as day one, as the number of cases started increasing at a faster pace after this day. It is observed from Fig. 1 (bottom) that, after crossing 1,300,000 active cases as of today, the turnaround may occur 28 days from now, i.e., on 05-07-2020. No lock-down was imposed in the country; however, suitable restraining orders were observed by the various states, leading to a gradual decline in the slope of the curve for the active cases.

3.2. FDM-based model

This model derives the trends and variabilities of COVID-19 data [18] for daily confirmed cases using the FDM, as shown in Fig. 2. Since new confirmed cases are reported daily, the COVID-19 data are sampled once per day. Taking the normalized sampling frequency of the data as $F_s = 1$, the maximum frequency component present in the data is $f_{max} = 0.5$, as per the Nyquist sampling criterion. For example, the low-pass signal with cutoff frequency 1/14 occupies the band [0, 1/14], which corresponds to a trend on a time-scale of 14 days or longer, and the remaining high-pass component in the frequency band (1/14, 0.5] represents the corresponding variability. A single time-scale may not suffice to capture the trend for all the countries. Moreover, it is evident from Fig. 2 that a trend with a time-scale of 35 days or more may not capture local maxima of smaller magnitude, as it represents a long-term trend, while a shorter time-scale of 7 (or 14) days is more capable of capturing the local variations. However, one may question whether the local variations should be captured in the trend or simply be regarded as variability. Also, the predictions for the future depend on the choice of time-scale, and it is difficult to ascertain a single time-scale, given the uncertain nature of the future trend. A time-scale of 14 days may turn out to be accurate for one country but inaccurate for another. Further, a given time-scale may suit the current data but become unfit for future data. Therefore, trends are estimated on various time-scales (from a trend of 14 days or longer to a trend of 35 days or longer). They are extrapolated using the GMM to obtain a forecast for the future. Fig. 3 depicts these trends and the future predictions for the world and the USA, while the plots for Italy and India are presented in Fig. 4. Considering the multiple trends and their corresponding predictions, an averaging operation, excluding outliers if any, is performed to obtain the final predictions. Total expected cases are obtained as a cumulative sum of the cases reported daily.

All the predictions are performed with 95% confidence intervals (CI). The parameter values for the bi-modal GMM (L = 2) and their CI estimated from the data are listed in Table 1 for the world, USA, Italy, and India. The parameters $a_1$ and $a_2$ indicate the peak values, while $\mu_1$ and $\mu_2$ mark the times of the peaks, with $\sigma_1$ and $\sigma_2$ referring to the flatness (or sharpness) of these Gaussian curves. For example, the peak number of daily cases for Italy occurs on 25-03-2020 (55th day after the outbreak on 29-01-2020). Similar dates for the world, USA, and India are estimated as 25-06-2020, 26-04-2020, and 05-07-2020, respectively. Further, the trends estimated from the data on daily deaths are also fitted using the GMM, and the corresponding predictions are obtained. Table 2 shows the GMM parameters and their CI estimated from these data. It is observed from this table that the peak number of daily deaths for the USA occurs on 04-05-2020. To measure the accuracy of the proposed GMM model, we obtain the mean absolute error (MAE): (i) for daily new cases: World (MAE: 1842.5), USA (MAE: 731.00), Italy (MAE: 102.38), and India (MAE: 53.53); and (ii) for daily new deaths: World (MAE: 135.25), USA (MAE: 41.26), Italy (MAE: 15.51), and India (MAE: 2.13).

The end-date is defined as the date to reach 99% of the total expected cases. These dates are estimated from the predicted values for various countries and are shown in Table 3, along with the current total number of cases and the total cases expected until the end-date. Similar results obtained by SIR [9] are also indicated for comparison. The data source considered by SIR is different from the one considered in this work, and thus the values differ at some instants. The results reported in this work are more accurate in comparison to the earlier works [2,5].

Table 3
Prediction of the total expected cases and end-date (date to reach 99% of the total expected cases), SIR prediction [10] with data as of 06-06-2020, and proposed prediction with data as of 06-06-2020 [18] with 95% confidence intervals.
S. No.   Country Name   Total cases as of 06-06-20   Total expected cases (Proposed)   End-date (Proposed)   End-date (SIR)

4. Conclusion

In this paper, we have proposed two distinct methods for modeling the number of people getting infected with the novel coronavirus (COVID-19). First, a mathematical model captures various factors critical in determining the spread of the virus, and appropriate values are estimated using the available data. The turnaround day for active cases is forecast by predicting values for the next 30 days. The measures taken by the authorities to contain the infections are analyzed for three different countries, i.e., India, Italy, and the USA. The second method develops a data-driven model to segregate the trend and variability from the data on daily cases of infection. The Gaussian mixture model is developed to obtain suitable predictions for the trend, which are used to ascertain the peak value and the corresponding date for the fresh cases reported in a single day. Further, the total number of cases, as well as the end-dates for this pandemic across various parts of the world, are estimated with 95% confidence intervals and are compared with a similar study performed earlier. This study is performed for academic and research purposes only, and the predictions for the future are based on the assumption that the current restrictive conditions would continue.

Declaration of Competing Interest

We declare that we have no conflict of interest.

CRediT authorship contribution statement

Amit Singhal: Conceptualization, Methodology, Software, Validation, Writing - original draft, Writing - review & editing.
Pushpendra Singh: Conceptualization, Methodology, Software, Validation, Writing - original draft, Writing - review & editing.
Brejesh Lall: Supervision, Visualization, Validation, Writing - review & editing.
Shiv Dutt Joshi: Supervision, Visualization, Validation, Writing - review & editing.

Acknowledgment

We would like to thank the editors and reviewers of this manuscript, who took out precious time during these difficult times of the COVID-19 pandemic and provided valuable suggestions to improve the overall quality of the paper.

References

[1] Coronavirus disease 2019 (COVID-19) situation report–73. World Health Organization 2020.
[2] Zhang X, Ma R, Wang L. Predicting turning point, duration and attack rate of COVID-19 outbreaks in major western countries. Chaos Solitons Fractals 2020;135:109829.
[3] Ghosal S, Sengupta S, Majumder M, Sinha B. Prediction of the number of deaths in India due to SARS-CoV-2 at 5–6 weeks. Diabetes Metab Syndr 2020;14:311–15.
[4] Tomar A, Gupta N. Prediction for the spread of COVID-19 in India and effectiveness of preventive measures. Sci Total Environ 2020;728:138762.
[5] Fanelli D, Piazza F. Analysis and forecast of COVID-19 spreading in China, Italy and France. Chaos Solitons Fractals 2020;134:109761.
[6] Chintalapudi N, Battineni G, Amenta F. COVID-19 disease outbreak forecasting of registered and recovered cases after sixty day lockdown in Italy: a data driven model approach. J Microbiol Immunol Infect 2020;53(3):396–403.
[7] Zhong L, Mu L, Li J, Wang J, Yin Z, Liu D. Early prediction of the 2019 novel coronavirus outbreak in the mainland China based on simple mathematical model. IEEE Access 2020;8:51761–9.
[8] Li L, Zhang Q, Wang X, Zhang J, Wang T, Gao T-L, et al. Characterizing the propagation of situational information in social media during COVID-19 epidemic: a case study on weibo. IEEE Trans Comput Soc Syst 2020;7(2):556–62.
[9] When will COVID-19 end? Data-driven prediction. https://ddi.sutd.edu.sg/when-will-covid-19-end/; Accessed: 04-05-2020.
[10] Batista M. Estimation of the final size of the COVID-19 epidemic. medRxiv preprint 2020:01–11. URL https://doi.org/10.1101/2020.02.16.20023606.
[11] Gupta A, Joshi SD, Singh P. On the approximate discrete KLT of fractional Brownian motion and applications. J Frankl Inst 2018;355:8989–9016.
[12] Singhal A, Mallik RK, Lall B. Performance analysis of amplitude modulation schemes for diffusion-based molecular communication. IEEE Trans Wireless Commun 2015;14(10):5681–91.
[13] Singh P, Joshi SD, Patney RK, Saha K. The Fourier decomposition method for nonlinear and non-stationary time series analysis. Proc R Soc Lond A: Mathematical, Physical and Engineering Sciences 2017;473:1–27. 20160871
[14] Ahmed N, Natarajan T, Rao KR. Discrete cosine transform. IEEE Trans Comput 1974:90–3.
[15] Britanak V, Yip PC, Rao KR. Discrete cosine and sine transforms: general properties, fast algorithms and integer approximations. 1 edition. Academic Press; 2006. ISBN-10: 0123736242, ISBN-13: 978-0123736246.
[16] Singh P. Novel Fourier quadrature transforms and analytic signal representations for nonlinear and non-stationary time series analysis. Royal Society Open Science 2018;5:1–26. 181131
[17] Novel Coronavirus (COVID-19) Cases Data. https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases; Accessed: 06-06-2020.
[18] WHO COVID-19 Dashboard. https://covid19.who.int/; Accessed: 06-06-2020.