The COVID-19 pandemic has produced vast quantities of publicly available data of intense global interest. While the integrity and accuracy of that data have been scrutinized and questioned from nation to nation, the data still provides a great jumping-off point for exploring some fundamental data science tools.
In this post we will explore how nations compared over their first 40 days of COVID-19 exposure, in terms of both infections and mortality. The data set of national infections and deaths is large enough that a brute-force, curve-by-curve comparison would be a challenge. One strategy is to group nations that followed similar infection and mortality curves and compare the groups rather than individual nations. A common algorithm for this type of analysis is k-means clustering, which partitions the curves into k groups so that each curve belongs to the group whose average curve (its centroid) it most closely resembles.
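To make the grouping idea concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm) applied to 40-day curves. The four "nations" and their doubling times are invented for illustration and are not real COVID-19 data. The log scaling, the deterministic farthest-point seeding, and all function names are assumptions of this sketch rather than the method used in this post's analysis; in practice a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps
    until the cluster labels stop changing."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each curve joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no curve changed cluster
        labels = new_labels
        # Update step: each centroid becomes the day-by-day mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(day) / len(members) for day in zip(*members)]
    return labels, centroids


DAYS = 40
# Hypothetical doubling times (in days) for made-up nations; NOT real data.
doubling_times = {"Nation A": 3.0, "Nation B": 3.5, "Nation C": 10.0, "Nation D": 12.0}
curves = {
    name: [10 * 2 ** (day / d) for day in range(DAYS)]
    for name, d in doubling_times.items()
}

# Cluster on a log scale so growth rate, not absolute case count, drives grouping.
features = [[math.log1p(v) for v in curve] for curve in curves.values()]
labels, _ = kmeans(features, k=2)
for name, label in zip(curves, labels):
    print(f"{name}: cluster {label}")
```

Here the two fast-growing hypothetical nations (A and B) land in one cluster and the two slow-growing ones (C and D) in the other. Farthest-point seeding keeps the toy result deterministic; k-means++ applies a randomized version of the same spread-out seeding idea.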
As nations and regions plan to restart their economies and relax social distancing restrictions, concerns have been raised that the data does not support the curve-flattening optimism used to justify reopening. The data for Canada and Alberta is of selfish personal interest to me. Monitoring the trajectories of their infection and mortality curves provides insight into any impact that easing these isolation restrictions might have.