Real-time data without context can be either too troubling or not troubling enough, leading to panic and wrong decisions.
I’ve been at my laptop, continuously refreshing, eyes glued to the screen. In the evening, after my laptop is shut, I am obsessively checking my mobile phone. I am looking for the latest COVID-19 updates, along with the Johns Hopkins University dashboard showing the march of the COVID-19 virus across the globe, with red bubbles marking outbreaks spreading like measles across a world map. I know many others who are looking at this dashboard or similar ones as well.
Looks like three new cases just popped up in Denmark. Twenty-two more souls lost in Italy. Oops, there’s one more in Russia. As of this writing, we’re passing the 10,000-case mark in the United States.
This is global reporting data as close to real-time as you can get it, pure and simple. It’s not fun data, as it’s a scoreboard of disease and death. And we watch and get more and more scared. The estimates and percentages fly — two percent mortality rate here, eight percent there.
It’s wonderful that all this real-time data is freely available for everyone to see and share. The challenge is, these are the early stages of the pandemic with a still relatively unknown virus, so it’s missing context. Real-time data without context can be either too troubling or not troubling enough, leading to panic and wrong decisions. You look at the data, and its growth in each country, and you wonder: how bad is this? How does this compare to other outbreaks? What can you deduce about the lethality of the disease? Is it affecting certain population segments? Fascinating grist for data scientists and analysts, the stuff of anxiety for non-analytic viewers.
Nate Silver of FiveThirtyEight notes in a recent tweet that “there’s a role for someone to emerge as the world’s leading interpreter of coronavirus statistics. You see a lot of data reported in real time but not always obvious what’s ‘real’ and what’s an artifact of insufficient testing and under-reporting etc.”
The erroneous assumptions that raw data can invite are explored in great detail on a website led by Max Roser, an economist at the University of Oxford, in partnership with Drs. Hannah Ritchie and Esteban Ortiz-Ospina. As they observe: “In an ongoing outbreak the final outcomes – death or recovery – for all cases is not yet known. The time from symptom onset to death ranges from two to eight weeks for COVID-19. This means that some people who are currently infected with COVID-19 will die at a later date. This needs to be kept in mind when comparing the current number of deaths with the current number of cases.”
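To see why that caveat matters, here is a minimal sketch of two ways the same snapshot could be turned into a fatality percentage. The function names and every number are my own invention for illustration; none of this comes from Roser's site or from real outbreak data.

```python
# Illustrative only: every number below is made up, not real outbreak data.

def naive_cfr(deaths: int, confirmed: int) -> float:
    """Deaths divided by all confirmed cases, including unresolved ones."""
    return deaths / confirmed


def resolved_cfr(deaths: int, recovered: int) -> float:
    """Deaths divided by only the cases whose outcome is already known."""
    return deaths / (deaths + recovered)


# Hypothetical snapshot: most of the 1,000 cases have no final outcome yet.
confirmed, deaths, recovered = 1000, 20, 180

print(f"naive:    {naive_cfr(deaths, confirmed):.1%}")
print(f"resolved: {resolved_cfr(deaths, recovered):.1%}")
```

With these invented numbers, the same snapshot reads as 2 percent one way and 10 percent the other. Neither should be taken as the true rate; the gap simply shows how much hinges on cases whose outcome is still unknown.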
Testing may help reveal the extent to which populations are infected, though this is fallible as well, Roser and his co-authors point out. “The number of COVID-19 tests does not reflect the number of people who have a definitive diagnosis. Some people require more than one test because of false-negative outcomes.” Under-reporting of cases is a challenge too, since it can dramatically recast the apparent lethality of a disease. Roser’s team points, for example, to the CDC’s calculations for annual influenza outbreaks: the agency’s disease modeling attempts to adjust for under-reporting, meaning those who did not seek medical attention. Such modeling will be applied to COVID-19 as the outbreak continues, and ultimately may lower the fatality percentages.
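The effect of under-reporting on apparent lethality can be sketched with simple arithmetic. The function below is a hypothetical illustration, not the CDC's model, and the detected fraction is an assumed input that nobody actually knows this early in an outbreak.

```python
# Illustrative only: made-up numbers and an assumed detection fraction.

def adjusted_fatality(deaths: int, confirmed: int, detected_fraction: float) -> float:
    """Fatality rate after scaling confirmed cases up to estimated infections.

    detected_fraction is the assumed share of all infections that were
    actually confirmed (0 < detected_fraction <= 1).
    """
    if not 0 < detected_fraction <= 1:
        raise ValueError("detected_fraction must be in (0, 1]")
    estimated_infections = confirmed / detected_fraction
    return deaths / estimated_infections


deaths, confirmed = 20, 1000  # hypothetical snapshot

for fraction in (1.0, 0.5, 0.25):
    rate = adjusted_fatality(deaths, confirmed, fraction)
    print(f"assume {fraction:.0%} of infections detected -> {rate:.2%} fatality")
```

In this made-up example, assuming that only a quarter of infections were ever confirmed drops the apparent rate from 2 percent to 0.5 percent, which is the direction of adjustment the paragraph above describes.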
In short, we still don’t have all this raw real-time data in its proper context. There’s even a lesson somewhere in here for businesses making out-of-context decisions based on real-time data feeds. Until we do, I, like many others, will keep my eyes glued to the real-time dashboards, hoping for the best, and washing my hands as often as possible.