Wednesday, April 01, 2020

Corona virus situation in India: A statistics perspective


The world has been battling the corona virus pandemic for the last several months, and the situation does not seem to be coming under control. As the days pass, the number of affected persons keeps rising and has crossed the 850,000 mark worldwide. The contagion is now spreading so rapidly that around 75,000 new cases are being recorded daily, a growth rate of approximately 9% per day. Such a high growth rate implies a doubling period of about 8 days, i.e. the total number of cases worldwide is doubling in a little over a week. Such a rapidly growing pandemic is sure to cause panic and fear among people, along with a great deal of uncertainty and anxiety about the future. We have seen its effects in the tumbling of share markets across the world and in outbreaks of panic on social media. Many governments have taken drastic measures to contain the spread of the virus, such as sealing national and provincial borders and imposing nationwide lock-downs.
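The link between a daily growth rate and the doubling period is simple compound-growth arithmetic: cases double after ln(2) / ln(1 + g) days when they grow at a constant rate g per day. A minimal sketch (the function name doubling_time is our own, chosen for illustration):

```python
import math

def doubling_time(daily_growth_rate: float) -> float:
    """Days needed for the case count to double at a constant daily growth rate."""
    return math.log(2) / math.log(1 + daily_growth_rate)

# Worldwide growth rate of roughly 9% per day quoted above
print(f"{doubling_time(0.09):.1f} days")
```

The same function can be applied to any other constant growth rate, as long as the rate is expressed as a fraction (0.09 for 9%).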

India came onto the radar of the corona virus rather late, at the end of January. The first case was reported on 30th January in Kerala. However, cases did not grow substantially until the first week of March, and until then the numbers remained below 10. In this period the affected people were those with a recent history of foreign travel, the stage 1 cases. Beyond this point the cases started to rise rapidly; India had probably entered stage 2 around this time. Measures to contain the virus started on a small scale at many institutions, corporate houses etc., on an individual basis and in an unsystematic way. Many firms in the service sector started allowing their employees to work from home, some enforcing it for select employee groups. Amid this came the total curfew of 22 March. The entire country observed the curfew and then expressed its gratitude towards emergency workers by clapping, blowing conches and beating plates as drums. However, the situation kept worsening, and a nationwide lock-down was imposed, barring any public gathering as well as commuting, except for essentials. Even after these strict measures, the total number of cases has by now well surpassed the thousand mark.

After a week of the lock-down, people are experiencing depression, anxiety and mental agitation, since they are unable to grasp how long such measures will need to stay in effect. There is also uncertainty about whether life will return to normal once the official lock-down is lifted, or whether the threat of contagion will still loom. It is not really clear if this lock-down will help in containing the spread of the disease. Nor can we ignore that such a drastic measure will have an impact on the economic situation of the country; the worst hit will be the people operating almost entirely in the cash economy, such as daily wage workers.

Here we try to look for some simple answers based on the available data on the number of cases recorded in India, with the objective of assessing the near-term growth dynamics[1]. There are myriad estimates of the growth of the pandemic that are based on inputs from virology, pandemic studies, population studies etc. We, however, wish to keep it simple. Without going into the causes of growth and containment, we look at it purely from a statistical perspective and attempt to provide some estimates by employing the techniques of time series analysis.

Growth Rate

The first metric we look at is the growth rate, defined as the percentage increase in the number of cases from one day to the next. The growth rate thus defined is an important statistical parameter since it succinctly captures the exponential growth pattern of the pandemic. This number is meaningful only once the number of cases has crossed a threshold. We see in the figure below that the daily growth rate has varied between 3.3% and 35.2%, covering a wide range. Nor do we see any clear trend in this plot. Slightly lower numbers are seen after the lock-down of 23rd March, but the drop is not significant enough to deduce a negative trend. We also witnessed high growth on 30th March, when the number of affected people rose by 22%, adding 227 new cases in a day, which is the highest single-day increase observed so far.
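The growth rate defined above can be computed directly from a series of cumulative case counts. The following is a minimal sketch; the case numbers are made up for illustration and are not the actual Indian series, and the cut-off of 100 cases (min_cases) is an assumed threshold, since no specific value is fixed in this article.

```python
def daily_growth_rates(cumulative_cases, min_cases=100):
    """Day-over-day fractional growth, skipping days below the case threshold."""
    rates = []
    for previous, current in zip(cumulative_cases, cumulative_cases[1:]):
        # Growth rates on very small counts are too noisy to be meaningful
        if previous >= min_cases:
            rates.append((current - previous) / previous)
    return rates

# Illustrative numbers only, NOT the actual Indian data
example_cases = [50, 80, 120, 150, 180]
rates = daily_growth_rates(example_cases)
print([f"{r:.1%}" for r in rates])
```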

Time series models

Though the daily growth appears random in the figure below, detailed scrutiny shows that this is not the case. In fact, the growth rate on any given date depends on the growth rates observed over the last few days. This phenomenon is called autocorrelation in time series models: a significant correlation is observed between a variable and its own past (lagged) values. Many time series processes can give rise to autocorrelation; the one particularly relevant here is the moving average (MA) process. It can be shown that the growth rate in India follows an MA(2) process, which involves information from the last two time steps. A similar model is also applicable to other countries, though the number of lags required differs in each case. The model for the growth rates observed in India can be specified as G_t = 0.157 + 0.016 ε_{t-1} + 0.41 ε_{t-2} + ε_t, where ε_t is the random shock to the growth rate on day t. We can see from the coefficients that the growth rate depends strongly on the shock observed the day before yesterday (0.41) and only weakly on yesterday's shock (0.016). The constant term is the average growth rate level, which is 15.7%. This implies a doubling period of about 5 days. We also tested a GARCH model on ε_t; however, it did not produce significant results.
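To see what these coefficients mean in terms of autocorrelation, we can use the standard textbook result for an MA(2) process: the lag-1 autocorrelation is (θ1 + θ1 θ2) / (1 + θ1² + θ2²), the lag-2 autocorrelation is θ2 / (1 + θ1² + θ2²), and all higher lags are zero. The sketch below simply evaluates these formulas for the coefficients quoted above; the function name is our own.

```python
def ma2_autocorrelations(theta1, theta2):
    """Theoretical lag-1 and lag-2 autocorrelations of an MA(2) process."""
    denom = 1 + theta1 ** 2 + theta2 ** 2
    rho1 = (theta1 + theta1 * theta2) / denom
    rho2 = theta2 / denom
    return rho1, rho2

# Coefficients of the MA(2) model quoted in the text
rho1, rho2 = ma2_autocorrelations(0.016, 0.41)
print(f"lag 1: {rho1:.3f}, lag 2: {rho2:.3f}")
```

Under this model nearly all of the memory sits at lag 2, which matches the observation that the dependence on the day before yesterday dominates.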

The MA model identified for the growth rate can now be used to simulate future growth rates and, in turn, the expected number of cases in future. We can also comment on when to expect a downturn in the number of cases that would eventually extinguish the pandemic. A simple inspection of the above formula tells us that this will not happen within this model. This is because the mean of an MA process equals its constant term, 0.157 in this case, which is positive. Thus the number of cases will continue to grow at 15.7% on average if there is no dampening effect. We can see this in the plot above, which shows 20 such simulated paths, all growing exponentially. If the current evolution of the growth rate continues, we are looking at nearly 100,000 cases by the end of the month. The actual number could lie somewhere between 54,000 and 156,000. Although this appears a little unrealistic, it is worth considering where we will be heading if no measures are taken to curtail the pandemic.
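The simulation described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the three coefficients come from the formula above, but the standard deviation of the shocks (sigma = 0.05), the starting count of 1,250 cases, the 30-day horizon and the function name simulate_path are our own illustrative choices, so the printed range will not reproduce the figures quoted in the text.

```python
import random

MU, THETA1, THETA2 = 0.157, 0.016, 0.41   # coefficients quoted in the text

def simulate_path(start_cases, days, sigma, rng):
    """One simulated path of total cases under the MA(2) growth-rate model."""
    eps_1 = eps_2 = 0.0            # shocks at t-1 and t-2, assumed zero at the start
    cases = start_cases
    path = [cases]
    for _ in range(days):
        eps = rng.gauss(0.0, sigma)                    # today's random shock
        growth = MU + THETA1 * eps_1 + THETA2 * eps_2 + eps
        cases *= 1 + growth
        path.append(cases)
        eps_2, eps_1 = eps_1, eps                      # shift the shock history
    return path

rng = random.Random(42)            # fixed seed so the run is repeatable
# Starting count, horizon and sigma are ASSUMED values for illustration
paths = [simulate_path(1250, 30, 0.05, rng) for _ in range(20)]
finals = sorted(p[-1] for p in paths)
print(f"lowest {finals[0]:,.0f}, highest {finals[-1]:,.0f}")
```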
A more realistic model will have a dampening term. Since the objective here is not to assess causation but only to obtain statistical estimates from a simple time series model, we achieve the dampening by adding a simple linear trend. The growth rate with a dampening trend is then given by G_t = (0.157 + ω t) + 0.016 ε_{t-1} + 0.41 ε_{t-2} + ε_t. The dampening rate ω is kept as a free parameter (negative when growth is slowing), and we assess the results as a function of ω. One immediate effect of the dampening is that the growth rate will slowly turn negative (though the MA shocks will push it back to the positive side from time to time). A reversal in the sign of the growth rate indicates a peak in the number of cases, beyond which they start declining. We see this effect clearly in the figure below, where the average number of cases turns around after reaching the peak.
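The turning point can be illustrated by following only the average path, i.e. setting every shock ε to its mean of zero. The average growth rate 0.157 + ω t then changes sign at roughly t = 0.157 / |ω| days. In the minimal sketch below, ω is expressed as a fraction per day, and the value ω = -0.01, the starting count of 1,250 cases and the function names are assumptions made for illustration only.

```python
MU = 0.157   # constant term of the growth-rate model

def mean_path_with_dampening(start_cases, omega, days):
    """Average path of cases when every shock is set to its mean of zero."""
    cases = start_cases
    path = [cases]
    for t in range(1, days + 1):
        growth = MU + omega * t      # linear dampening of the mean growth rate
        cases *= 1 + growth
        path.append(cases)
    return path

def peak_day(path):
    """Index of the day on which the path reaches its maximum."""
    return max(range(len(path)), key=path.__getitem__)

# omega and the starting count are ASSUMED values for illustration
path = mean_path_with_dampening(1250, omega=-0.01, days=30)
print("peak on day", peak_day(path), "with about", round(max(path)), "cases")
```

A stronger (more negative) ω moves the sign change, and hence the peak, to an earlier day and lowers the peak itself.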

As the dampening rate is increased from 0.5% per day to 2.5% per day, the peak is reached earlier and its magnitude is also lower. We investigate this further by looking at the peak reaching time, the peak number of cases and the expected range of cases at the end of a one-month period.

Dampening rate | Peak reaching time (days) | Peak number of cases | Terminal cases
-0.5%          | 23 – 30                   | 7,420 – 20,930       | 6,320 – 19,680
-0.1%          | 11 – 19                   | 2,800 – 5,960        | 570 – 1,930
-0.15%         | 6 – 13                    | 2,060 – 3,770        | 40 – 150
-0.2%          | 4 – 10                    | 1,750 – 2,930        | 2 – 8

The table above shows that to achieve significant containment of the pandemic, we must have a dampening rate of around -0.15% sustained over a prolonged period of at least two weeks. Lower rates of dampening could also limit the number of cases significantly, but the peak could then be delayed by nearly a month. It must also be remembered that the model uses a linear dampening rate. In reality the dampening could vary from day to day, and we need to look at the effective dampening rate. If we look at the change in growth rate over the last fifteen days, we see that it is not always negative. In fact, there are as many upward changes as downward ones. If we look at only the last few days, the average trend appears positive because of the large upward shock on 30th March. However, averaging over 8 – 10 days gives a weak negative trend of around -0.5%.
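One simple way to estimate an effective dampening rate from day-to-day data is the least-squares slope of the recent growth rates against time. This is a sketch of one possible approach and not necessarily the averaging used for the numbers above; the growth rates in the example are made up for illustration and are not the observed Indian data.

```python
def linear_trend(values):
    """Least-squares slope of values against the day index 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    numerator = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator

# Illustrative growth rates only, NOT the observed Indian data
recent_growth = [0.20, 0.14, 0.18, 0.11, 0.15, 0.09, 0.22, 0.12]
changes = [b - a for a, b in zip(recent_growth, recent_growth[1:])]
print("upward changes:", sum(c > 0 for c in changes))
print("downward changes:", sum(c < 0 for c in changes))
print(f"effective dampening per day: {linear_trend(recent_growth):+.4f}")
```

Because a single large shock can dominate a short window, the slope should be read together with the count of upward and downward changes.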


Summary

Simple time series models imply that the number of corona virus cases could hit the 100,000 mark within the next month if the spread is not contained properly. We expect that the measures currently in place, such as the nationwide lock-down, will help ease the situation. Unfortunately, the data from the first week of the lock-down does not really suggest that this is happening. Given that the incubation period of the virus is around 14 days, we may hope for some downturn in the second week, i.e. by 7th April. However, as we saw before, even if we contain the total number of cases, the peak may come very late if the dampening is not strong enough. Hence precautions must be taken even after the lock-down is lifted, and we should refrain from public gatherings as much as possible. What we certainly don’t want is a fresh outbreak after 14th April, which would badly affect not just the medical and economic situation but also the public morale that has held up so well during this lock-down. We must remain vigilant until the last battle is won.





[1] The data used for the various statistics in this article is from the GitHub repository of Johns Hopkins University.
