Thursday, April 16, 2020

Corona Virus: New Data

Since I last wrote on COVID-19, a lot has changed, both in the spread dynamics of the disease and in the measures taken by the government to prevent the contagion. I therefore thought it best to return to the subject after a couple of weeks and take stock of the situation, again from the perspective of numbers, stats and charts. So here I am with some new numbers and plots.

Since my last blog, there has been a huge surge in the number of cases as well as in the growth rate. In the plot below we see a large surge on 1st April with the growth rate exceeding 40%, followed by around 25% growth on the 2nd. 3rd April showed a very low value, which could just be a statistical glitch where the cases were probably reported the next day. This trend continued for a few more days, hitting the 30% mark again on the 6th. It was a large setback to the support systems fighting COVID-19, but even more so to the morale of people who had survived two weeks of lockdown and were looking at the last week with apprehension. Many argued that this surge was related to the Markaz incident in Delhi, where a large congregation of the Tablighi Jamaat took place, flouting the social distancing norms. The congregation had happened some two weeks earlier, just before the lockdown, and it was surmised that the cases became visible after the known 7-14 day incubation period of COVID-19. I would say that although the argument seems plausible given the timing of the events, we cannot firmly say that there was a causal link.



Near the end of the first lockdown, the chief ministers of several states extended the lockdown in their respective states. That was a signal of stricter measures to come, and India has now entered a second lockdown until 3rd May. This nearly one and a half months of total lockdown has already hurt the economy, and there are reports that India's growth rate will be severely low this year. It is also expected to lead to a large number of job cuts. However, the situation may not be as bad as it sounds, since the calamity is worldwide and we will be just as badly off as the rest of the world (probably a little better).

So the key question is: do we see any improvement in the situation due to this two-phase lockdown? This time I intend to bring simpler answers, not through complex time series models but via a simple chart. We saw the daily growth rate above. To make better sense of the numbers, we can simply take a moving average of it. Those who are familiar with the technical analysis of stocks will quickly recall that this is one of the simplest indicators of a trend. One key difference must be noted here. Since growth is an exponential beast (it compounds from one day to the next), simple arithmetic averaging does not work. What is more appropriate is what is called a 'geometric mean'. Once we do that, the rest of the logic just follows. So what do we see from these moving average charts?


We see that the longer-period moving average looks smoother. This is expected, since a moving average is a smoothing filter. We also see repeated crossovers between the 5-day and 10-day moving averages. There is a negative crossover (the 5-day average dropping below the 10-day one) around 27 March, which was probably the effect of the first lockdown. There is clearly a positive crossover around 1st April, indicating a significant rise in the growth rate, and hence in the number of cases, around that date. And then we see a very reassuring negative crossover again around 7 April. Keen and cautious observers would also notice that the crossovers switch direction every 6-7 days. But we have avoided such a positive crossover around 13 or 14 April. In fact, the 5-day moving average is now firmly below the 10-day one.
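
For readers who want to try this on their own numbers, here is a minimal sketch of a geometric moving average and a crossover finder. It is not the exact code behind the charts: the function names are mine, the growth rates are assumed to be fractions (0.10 for 10%), and I assume the geometric mean is taken over the daily growth factors (1 + growth rate). The toy series is made up and uses 3-day and 6-day windows only because it is short; the charts use 5 and 10 days.

```python
import math

def geometric_moving_average(growth_rates, window):
    """Trailing geometric mean of daily growth rates given as fractions.

    The growth factors (1 + g) are averaged geometrically and converted
    back to a rate. Positions with fewer than `window` observations
    are returned as None.
    """
    out = []
    for i in range(len(growth_rates)):
        if i + 1 < window:
            out.append(None)
            continue
        chunk = growth_rates[i + 1 - window : i + 1]
        log_sum = sum(math.log(1.0 + g) for g in chunk)
        out.append(math.exp(log_sum / window) - 1.0)
    return out

def crossovers(fast, slow):
    """Return (index, kind) for every day the fast average crosses the slow one.

    kind is 'negative' when the fast average drops below the slow one
    and 'positive' when it rises above it.
    """
    events = []
    prev = None
    for i, (f, s) in enumerate(zip(fast, slow)):
        if f is None or s is None:
            continue
        above = f > s
        if prev is not None and above != prev:
            events.append((i, "positive" if above else "negative"))
        prev = above
    return events

# Made-up growth rates, purely for illustration (not the data in the charts).
rates = [0.20, 0.18, 0.25, 0.12, 0.10, 0.15, 0.30, 0.22, 0.11, 0.09, 0.08, 0.10]
fast = geometric_moving_average(rates, 3)
slow = geometric_moving_average(rates, 6)
print(crossovers(fast, slow))
```

The geometric version matters because growth compounds: a day of 0% followed by a day of 21% is equivalent to two days of 10%, not two days of 10.5%.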

So do we take that as a good sign? Yes, but only a sign. We can see that the daily growth rate is still lingering around the 10% mark. It has to go substantially below 10% before we can say that we are in any kind of recovery mode. And even after that, social distancing is probably going to stay till September. So we may truly get to meet all our relatives in person only at Diwali. Till then, stay home, stay safe.

Wednesday, April 01, 2020

Corona virus situation in India: A statistics perspective


The world has been battling the corona virus pandemic for the last several months and the situation does not seem to be coming under control. As the days pass, the number of affected persons keeps rising and has crossed the 850,000 mark worldwide. The contagion is now spreading so rapidly that around 75,000 new cases are being recorded daily, a phenomenal growth rate of approximately 9% per day. Such a high growth rate implies a doubling period of about 8 days, i.e. the total number of cases worldwide is doubling in a little over a week. Such a rapidly growing pandemic is sure to cause panic and fear among people, and a great amount of uncertainty and anxiety about the future. We have seen the effects of this in the tumbling of share markets across the world and in outbursts on social media. Many governments have taken drastic measures to contain the spread of the virus, such as sealing national and provincial borders and imposing nationwide lockdowns.

India came onto the radar of the corona virus rather late, at the end of January. The first case was reported on 30th January in Kerala. However, cases did not grow substantially until the first week of March, with the numbers remaining below 10. In this period the affected people were the ones with recent foreign travel history, the stage 1 cases. Beyond this point in time the cases started to rise rapidly. India had probably entered stage 2 around this time. Measures to contain the virus started on a small scale at many institutions, corporate houses etc., on an individual basis and in an unsystematic way. Many firms in the service sector started allowing their employees to work from home, some enforcing it for select employee groups. Amid this came the total curfew of 22 March. The entire country observed the curfew and then expressed its gratitude towards the emergency workers by clapping, blowing conches and beating plates as drums. However, the situation kept worsening and a nationwide lockdown was imposed, barring any public gathering as well as commuting, except for essentials. Even after these strict measures, the total number of cases has well surpassed the thousand mark as of now.

After a week of the lockdown, people are now experiencing depression, anxiety and mental agitation about the situation, since they are unable to grasp how long such measures will need to remain in effect. There is also uncertainty about whether life will return to normal after the official lockdown is lifted, or whether the threat of contagion will still loom. It is not really clear if this lockdown will help in containing the spread of the disease. Nor can we ignore that such a drastic measure will have an impact on the economic situation of the country as well; the worst hit would be the people operating almost entirely in the cash economy, such as daily wage workers.

Here we try to look for some simple answers based on the available data on the number of cases recorded in India, with the objective of assessing the near-term growth dynamics[1]. There are a myriad of estimates of the growth of the pandemic based on inputs from virology, pandemic studies, population studies etc. We, however, wish to keep it simple. Without going into the causes of growth and containment, we simply look at it from a statistical perspective and attempt to provide some estimates by employing the techniques of time series analysis.

Growth Rate

The first metric we look at is the growth rate. The growth rate is the percentage increase in the number of cases on a daily basis. The growth rate thus defined is an important statistical parameter, since it succinctly captures the exponential growth pattern of the pandemic. This number is meaningful only after the number of cases has crossed a threshold. We see in the figure below that the daily growth rate has varied between 3.3% and 35.2%, covering a wide range. Nor do we see any clear trend in this plot. Slightly lower numbers are seen after the lockdown of 23rd March, but the drop is not significant enough to deduce a negative trend. We also witnessed high growth on 30th March, when the number of affected people rose by 22%, adding 227 new cases in a day, which is the highest increase observed in a single day.
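
As a small illustration of the definition, the sketch below computes daily growth rates from a series of cumulative case counts and converts a growth rate into a doubling period. The function names and the threshold of 100 cases are my own assumptions (the exact threshold is not important here), and the sample counts are invented just to show the mechanics.

```python
import math

def daily_growth_rates(cumulative_cases, min_cases=100):
    """Fractional day-over-day growth of cumulative case counts.

    Days where the previous count is below `min_cases` are skipped,
    because growth rates on tiny counts are not meaningful.
    `min_cases=100` is an assumed threshold, not a value from the data.
    """
    rates = []
    for prev, curr in zip(cumulative_cases, cumulative_cases[1:]):
        if prev < min_cases:
            continue
        rates.append(curr / prev - 1.0)
    return rates

def doubling_time(rate):
    """Days needed for cases to double at a constant daily growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)

# Invented counts, only to show how the functions are used.
print(daily_growth_rates([50, 100, 110, 121]))
print(round(doubling_time(0.09), 1))   # 9% per day  -> 8.0 days
print(round(doubling_time(0.157), 1))  # 15.7% per day -> 4.8 days
```

The doubling period follows from solving (1 + g)^n = 2 for n, which is why a 9% daily growth rate corresponds to roughly 8 days.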

Time series models

Though the daily growth appears random in the figure, detailed scrutiny shows that this is not the case. In fact, the growth rate on any given date depends on the growth rates observed over the last few days. This phenomenon is called autocorrelation in time series models, where a significant correlation is observed between a variable and its own past (lagged) values. Many time series processes can give rise to autocorrelation; the one particularly relevant here is the moving average (MA) process. It can be shown that the growth rate in India follows an MA-2 process, which involves the information from the last two time steps. A similar model is also applicable to other countries, though the number of lags required is different in each case. The model that can be used for the growth rates observed in India can be specified as Gt = 0.157 + 0.016 εt-1 + 0.41 εt-2 + εt, where Gt is the growth rate on day t and εt is the random shock on that day. We can see from the coefficients that the growth rate depends strongly on the shock observed the day before yesterday, and hardly at all on yesterday's. The first constant term is the average growth rate level, which is 15.7%. This implies a doubling period of about 5 days. We also tested a GARCH model on εt; however, it did not produce significant results.
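
To see what the coefficients say about the memory of the process, one can plug them into the textbook autocorrelation formulas for an MA(2) process. The sketch below does only that; the function name is mine, and this is not the fitting procedure that produced the coefficients.

```python
def ma2_autocorrelation(theta1, theta2):
    """Theoretical lag-1 and lag-2 autocorrelation of an MA(2) process
    x_t = mu + e_t + theta1 * e_{t-1} + theta2 * e_{t-2}.
    Autocorrelations at lags beyond 2 are zero for such a process.
    """
    denom = 1.0 + theta1 ** 2 + theta2 ** 2
    rho1 = (theta1 + theta1 * theta2) / denom
    rho2 = theta2 / denom
    return rho1, rho2

rho1, rho2 = ma2_autocorrelation(0.016, 0.41)
print(f"lag-1: {rho1:.3f}, lag-2: {rho2:.3f}")  # lag-1: 0.019, lag-2: 0.351
```

The lag-1 correlation is practically zero while the lag-2 correlation is around 0.35, which is the "day before yesterday" effect expressed as a number.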

The MA model identified for the growth rate can now be used to simulate future growth rates and, in turn, the expected number of cases in future. We can also possibly comment on when to expect a downturn in the number of cases, eventually extinguishing the pandemic. A simple inspection of the above formula tells us that this will not happen within this model. This is because the mean of an MA process is equal to its constant term, 0.157 in this case, which is positive. Thus the number of cases will continue to grow at 15.7% on average if there is no dampening effect. We can see this in the plot above, which shows 20 such simulated paths, all growing exponentially. If the current evolution of the growth rate continues, we are looking at nearly 1,00,000 cases by the end of the month. The actual number could lie somewhere between 54,000 and 1,56,000. Although it appears a little unrealistic, it is worthwhile considering where we will be heading if no measures are taken to curtail the pandemic.
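
A simulation of this kind can be sketched in a few lines. The coefficients below are the ones quoted above, but the noise level SIGMA and the starting count START_CASES are assumptions made only for illustration, so the printed numbers should not be expected to reproduce the ranges quoted in this post.

```python
import random

MU, THETA1, THETA2 = 0.157, 0.016, 0.41  # coefficients quoted in the text
SIGMA = 0.05        # assumption: standard deviation of the shocks
START_CASES = 1000  # assumption: illustrative starting number of cases

def simulate_path(days, rng, mu=MU, sigma=SIGMA, start=START_CASES):
    """Simulate one path of case counts driven by MA(2) growth rates."""
    eps1 = eps2 = 0.0  # shocks at t-1 and t-2
    cases = float(start)
    path = [cases]
    for _ in range(days):
        eps = rng.gauss(0.0, sigma)
        growth = mu + THETA1 * eps1 + THETA2 * eps2 + eps
        cases *= 1.0 + growth
        path.append(cases)
        eps1, eps2 = eps, eps1  # shift the shocks by one day
    return path

rng = random.Random(42)  # fixed seed so the run is repeatable
paths = [simulate_path(30, rng) for _ in range(20)]
finals = sorted(p[-1] for p in paths)
print(f"day-30 cases: min {finals[0]:,.0f}, "
      f"median {finals[len(finals) // 2]:,.0f}, max {finals[-1]:,.0f}")
```

With the shocks switched off (sigma=0.0) every path simply compounds at 15.7% per day, which is the average behaviour described above.
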
A more realistic model will have a dampening term. Since the objective here is not to assess causation, but only to obtain statistical estimates based on a simple time series model, we achieve the dampening by adding a simple linear trend. Thus the growth rate with a dampening trend is given by Gt = (0.157 + ω t) + 0.016 εt-1 + 0.41 εt-2 + εt. The dampening rate ω, which is negative when the growth is being dampened, is kept as a free parameter and we assess the results as a function of ω. One immediate effect of the dampening is that the growth rate will slowly turn negative (though the MA process will keep pulling it to the positive side from time to time). A reversal in the sign of the growth rate indicates a peak in the number of cases, beyond which they start declining. We see this effect clearly in the figure below, where the average number of cases turns around after reaching the peak.

As the magnitude of the dampening rate is increased from 0.5% per day to 2.5% per day, the peak is reached earlier and its magnitude is also lower. We investigate this further by looking at the peak reaching time, the peak number of cases and the expected range of cases at the end of a one-month period.

Dampening rate    Peak reaching time (days)    Peak number of cases    Terminal cases
-0.5%             23 – 30                      7,420 – 20,930          6,320 – 19,680
-0.1%             11 – 19                      2,800 – 5,960           570 – 1,930
-0.15%            6 – 13                       2,060 – 3,770           40 – 150
-0.2%             4 – 10                       1,750 – 2,930           2 – 8
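
The kind of experiment summarised in the table can be sketched as below. Again this is an illustration and not the code that produced the table: SIGMA and START_CASES are assumed values, the omega values in the loop are illustrative picks from the 0.5% to 2.5% per day range, and the floor on the growth rate is my own safeguard so that cases never go negative.

```python
import random

MU, THETA1, THETA2 = 0.157, 0.016, 0.41  # coefficients quoted in the text
SIGMA = 0.05        # assumption: standard deviation of the shocks
START_CASES = 1000  # assumption: illustrative starting number of cases

def simulate_damped(days, omega, rng, sigma=SIGMA, start=START_CASES):
    """Simulate cases with growth G_t = (MU + omega * t) + MA(2) noise."""
    eps1 = eps2 = 0.0
    cases = float(start)
    path = [cases]
    for t in range(1, days + 1):
        eps = rng.gauss(0.0, sigma)
        growth = (MU + omega * t) + THETA1 * eps1 + THETA2 * eps2 + eps
        growth = max(growth, -1.0)  # safeguard: cases cannot drop below zero
        cases *= 1.0 + growth
        path.append(cases)
        eps1, eps2 = eps, eps1
    return path

def peak(path):
    """Return (day, cases) at the maximum of a simulated path."""
    day = max(range(len(path)), key=path.__getitem__)
    return day, path[day]

rng = random.Random(7)  # fixed seed so the run is repeatable
for omega in (-0.005, -0.015, -0.025):  # illustrative dampening rates per day
    peaks = [peak(simulate_damped(30, omega, rng)) for _ in range(20)]
    days = [d for d, _ in peaks]
    sizes = [c for _, c in peaks]
    print(f"omega={omega:+.3f}: peak day {min(days)}-{max(days)}, "
          f"peak cases {min(sizes):,.0f}-{max(sizes):,.0f}")
```

Without noise the peak falls on the last day for which 0.157 + ω t is still positive, so a stronger dampening rate brings the peak forward and lowers it; the noise then spreads this into a range of peak days and peak sizes.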

The table above shows that to achieve significant containment of the pandemic, we must have a dampening rate of around -0.15% consistently over a prolonged period of at least two weeks. Lower rates of dampening could also limit the number of cases significantly, but the peak could then be delayed by nearly a month. It must also be remembered that the model uses a linear dampening rate. In reality the dampening could vary from day to day, and we need to look at the effective dampening rate. If we look at the change in growth rate over the last fifteen days, we see that it is not always negative. In fact, there are as many positive turns as negative ones. If we look at only the last few days, the average trend appears positive because of the large upward shock received on the 30th. However, averaging over 8 – 10 days gives a weak negative trend of around -0.5%.


Summary

Simple time series models imply that the number of corona virus cases can hit the 1,00,000 mark in the next month if the spread is not contained properly. We expect that the measures currently in place, such as the nationwide lockdown, will help ease the situation. Unfortunately, the data from the first week of the lockdown does not really suggest that to be the case. Given that the incubation period of the virus is around 14 days, we may hope for some downturn in the second week, i.e. by 7th April. However, as we saw before, even if we contain the total number of cases, the peak may come very late if the dampening is not strong enough. Hence precautions must be taken even after the lockdown is lifted, and we should refrain from public gatherings as much as possible. What we certainly don’t want is a fresh outbreak after 14th April, which would badly affect not just the medical and economic situation but also the public morale that has held up so well during this lockdown. We must remain vigilant until the last battle is won.





[1] The data used for the various statistics in this article is from the GitHub repository of Johns Hopkins University.