This Jupyter notebook analyses publicly available data on COVID-19 infections and deaths using the pandas data analysis library.
The data on COVID-19 is published by the European Centre for Disease Prevention and Control (ECDC) and can be downloaded daily from https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide.xlsx
Population data is taken from the World Bank (included in the dataset above).
Start by downloading the latest data:
!curl -O https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide.xlsx
Read the data into a pandas DataFrame, using the dateRep column as the index:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel("COVID-19-geographic-disbtribution-worldwide.xlsx").set_index("dateRep")
The dataset contains worldwide data on all countries. Note that each daily entry only includes the new cases/deaths for that day:
df
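Because each row holds only the new figures for one day, running totals have to be built with a cumulative sum. A minimal sketch of that step, using hypothetical figures rather than values from the ECDC file:

```python
import pandas as pd

# Hypothetical daily figures for one country (not taken from the ECDC file),
# deliberately listed out of date order
daily = pd.DataFrame(
    {"cases": [1, 0, 4, 10], "deaths": [0, 0, 1, 2]},
    index=pd.to_datetime(["2020-03-04", "2020-03-01", "2020-03-02", "2020-03-03"]),
)

# Sort by date first: cumsum() accumulates in row order, not in date order
cumulative = daily.sort_index().cumsum()
print(cumulative)
```

The last row of `cumulative` then holds the overall totals, which is how the statistics file is filled in later in this notebook.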
Count the countries in the data set, save the worldwide totals, and print the totals for a few selected countries:
# list of countries available in the data set
print("Number of countries in data set: %d" % len(df["countriesAndTerritories"].unique()))
# sum cases and store statistics
import matplotlib.patches as mpatches
import yaml
import datetime
df_all = df[["cases", "deaths"]].sort_index().cumsum()
# save statistics to a data file
totals = df_all.tail(1).values.tolist()[0]
with open(r"docs/_data/statistics.yml", 'w') as file:
    yaml.dump({
        "cases": {"value": totals[0]},
        "deaths": {"value": totals[1]},
        "updated": {"value": datetime.datetime.now().strftime("%A %d %B %Y at %H:%M:%S")}
    }, file)
countries=["China", "Italy", "United_Kingdom", "France", "United_States_of_America", "Spain", "Germany", "Austria"]
def print_df(df, label):
    df_c = df[["cases", "deaths"]].sum()
    print(label)
    print("\tcases: %s" % df_c["cases"])
    print("\tdeaths: %s" % df_c["deaths"])
# worldwide
print_df(df, "Worldwide")
# for selected countries
for c in countries:
    print_df(df[df["countriesAndTerritories"] == c], c)
Now group the data set by country ready for the next section of country-based statistics:
# sum total of cases and deaths by country
cases_deaths_df = df.groupby(by="countriesAndTerritories")[["cases", "deaths"]].sum()
cases_deaths_df
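As a small illustration of the grouping step, the sketch below uses hypothetical rows in the same shape as the ECDC data and checks that the grouped total agrees with the total obtained by filtering a single country (the approach used by `print_df` above):

```python
import pandas as pd

# Hypothetical rows in the same shape as the ECDC data
rows = pd.DataFrame({
    "countriesAndTerritories": ["Italy", "Spain", "Italy", "Spain"],
    "cases": [10, 5, 20, 15],
    "deaths": [1, 0, 2, 1],
})

# One row per country, summing the daily figures
totals = rows.groupby("countriesAndTerritories")[["cases", "deaths"]].sum()

# The same total obtained by filtering a single country
italy_cases = rows.loc[rows["countriesAndTerritories"] == "Italy", "cases"].sum()
print(totals)
```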
Start with a simple plot of the overall infection numbers per country. This compares the absolute figures and does not take into account the population of a country.
# use seaborn for nicer graphs
import seaborn as sns
sns.set(style="whitegrid")
df_graph = cases_deaths_df.sort_values("cases")[["deaths", "cases"]].tail(20)
# seaborn needs index as a column (via reset_index) and convert to long-form via pd.melt
df_graph_sns = pd.melt(df_graph.reset_index(), ["countriesAndTerritories"])
sns.barplot(x="value",
y="countriesAndTerritories",
hue="variable",
data=df_graph_sns,
palette="Blues").set_title("Countries with highest number of recorded COVID-19 cases")
#df_graph.plot(kind="barh", title="Countries with highest number of recorded COVID-19 cases")
plt.tight_layout()
plt.savefig("docs/graphs/totals/countries_with_highest_number_of_recorded_COVID-19_cases.png")
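The wide-to-long conversion used for the grouped bar plot can be seen more easily on a tiny table. This sketch uses made-up numbers and only shows what `pd.melt` returns; the plotting itself is unchanged:

```python
import pandas as pd

# Hypothetical wide-form table: one row per country, one column per metric
wide = pd.DataFrame(
    {"deaths": [2, 3], "cases": [40, 50]},
    index=pd.Index(["A", "B"], name="countriesAndTerritories"),
)

# Long form: one row per (country, metric) pair, with the metric name in
# "variable" and its number in "value"
long_form = pd.melt(wide.reset_index(), id_vars=["countriesAndTerritories"])
print(long_form)
```

The `variable` column is what `hue="variable"` uses to draw one bar per metric for each country.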
Countries with larger populations appear higher in the lists simply as a result of the size.
Also, the statistics do not consider how the levels of testing vary across countries. There appears to be a significant difference across countries which directly affects the numbers of recorded cases. This can be due to many reasons:
Some estimates indicate that only 3% of populations are being tested for COVID-19 which would suggest a significant number of undetected infections.
This graph shows those countries with the highest number of COVID-19 deaths. Again, the figures are presented directly without any scaling to take into account the population size (see below for mortality rates).
# using dataframe.plot
#cases_deaths_df.sort_values("deaths").tail(20).plot(kind="barh", y=["deaths"], title="Countries with highest number of recorded COVID-19 deaths")
# using seaborn
sns.barplot(x="deaths",
y="countriesAndTerritories",
data=cases_deaths_df.sort_values("deaths").tail(20).reset_index(),
palette="Blues_d").set_title("Countries with highest number of recorded COVID-19 deaths")
plt.tight_layout()
plt.savefig("docs/graphs/totals/countries_with_highest_number_of_recorded_covid-19_deaths.png")
print(cases_deaths_df.sort_values("deaths").tail(20))
Figures for COVID-19 deaths may be more accurate than those of infections (due to the significant under-counting of infections for the reasons cited above). However:
# create a pie chart with the top 20, and all other countries combined into a single row
top_20_deaths = cases_deaths_df["deaths"].sort_values().tail(20)
top_20_deaths.loc["Other"] = cases_deaths_df["deaths"].sum() - top_20_deaths.sum()
#print(top_20_deaths)
top_20_deaths.plot.pie(figsize=(8,8))
plt.tight_layout()
plt.savefig("docs/graphs/totals/countries_with_highest_number_of_recorded_covid-19_deaths_vs_rest_of_world.png")
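The "top 20 plus Other" step can be written as a small helper. The function name `top_n_with_other` and the figures below are hypothetical and only serve to illustrate the idea:

```python
import pandas as pd

def top_n_with_other(values: pd.Series, n: int) -> pd.Series:
    """Keep the n largest entries and fold the remainder into one 'Other' entry."""
    top = values.sort_values().tail(n).copy()
    top.loc["Other"] = values.sum() - top.sum()
    return top

# Hypothetical death counts for four countries
deaths = pd.Series({"A": 50, "B": 30, "C": 15, "D": 5})
result = top_n_with_other(deaths, 2)
print(result)
```

The entries of the result still add up to the worldwide total, so the pie chart slices remain proportions of the whole.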
At the time of writing, the countries with recorded COVID-19 cases that have yet to record any COVID-19 related deaths are:
no_deaths = cases_deaths_df[cases_deaths_df["deaths"] == 0].index
#print("Number of countries without COVID-19 deaths: %d" % len(no_deaths))
#print("\n".join(no_deaths))
# plot countries in a graph
sns.barplot(y="countriesAndTerritories",
x="cases",
data=cases_deaths_df[cases_deaths_df["deaths"] == 0].reset_index(),
palette="Blues_d").set_title("Countries with no recorded COVID-19 deaths")
plt.tight_layout()
plt.savefig("docs/graphs/totals/countries_with_no_recorded_covid-19_deaths.png")
plt.show()
This section looks at metrics that are more independent of the underlying population and allow comparisons between countries.
This graph adjusts for population size by expressing all cases and deaths in terms of the underlying population size (i.e. infection/mortality rates). Countries with missing population data are excluded.
# top 20 worst pop/death and pop/cases ratios
# take one population value per country (popData2019 repeats on every daily row)
cases_deaths_pop_df = cases_deaths_df.join(df.groupby("countriesAndTerritories")["popData2019"].first())
# drop rows with missing population data
cases_deaths_pop_df = cases_deaths_pop_df.dropna(subset=["popData2019"])
# unclear what this data set is
#cases_deaths_pop_df = cases_deaths_pop_df.drop(index="Cases_on_an_international_conveyance_Japan")
cases_deaths_pop_df["cases_per_capita"] = cases_deaths_pop_df["cases"] / cases_deaths_pop_df["popData2019"] * 100 # as % of population
cases_deaths_pop_df["deaths_per_capita"] = cases_deaths_pop_df["deaths"] / cases_deaths_pop_df["popData2019"] * 1000 # per 1000 people
# infection rates
#cases_deaths_df.sort_values("cases_per_capita").tail(20).plot(kind="barh", y="cases_per_capita", title="Countries with highest recorded COVID-19 infection rates")
sns.barplot(x="cases_per_capita",
y="countriesAndTerritories",
data=cases_deaths_pop_df.sort_values("cases_per_capita").tail(20).reset_index(),
palette="Blues_d").set_title("Countries with highest recorded COVID-19 infection rates")
plt.tight_layout()
plt.savefig("docs/graphs/totals/countries_with_highest_number_of_recorded_COVID-19_cases_per_capita.png")
plt.show()
# mortality rates
#cases_deaths_df.sort_values("deaths_per_capita").tail(20).plot(kind="barh", y="deaths_per_capita", title="Countries with highest recorded COVID-19 mortality rates")
sns.barplot(x="deaths_per_capita",
y="countriesAndTerritories",
data=cases_deaths_pop_df.sort_values("deaths_per_capita").tail(20).reset_index(),
palette="Blues_d").set_title("Countries with highest recorded COVID-19 mortality rates (per 1000)")
plt.tight_layout()
plt.savefig("docs/graphs/totals/countries_with_highest_number_of_recorded_covid-19_deaths_per_capita.png")
plt.show()
While many of the hotspot countries (Italy, Spain) are still present, some of the larger countries no longer appear in the graphs. The USA is a notable example: its recorded infections are far higher than those of any other country, but its infection rate is lower once population size is taken into consideration.
As before, differences in testing policies and recording deaths will skew the statistics. Countries with particularly small populations can also appear disproportionately (e.g. San Marino, Luxembourg, Guernsey, Monaco). This may be due to statistical margins of error, or may indicate that the official populations do not accurately reflect the numbers of active people in the country (e.g. large numbers of frontier workers travelling across borders can significantly increase the intra-day population).
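The effect of scaling by population can be reproduced with two made-up countries. The names and figures are hypothetical; the column names `cases_pct` and `deaths_per_1000` are chosen for this sketch only:

```python
import pandas as pd

# Hypothetical daily rows; popData2019 repeats on every row, as in the ECDC file
rows = pd.DataFrame({
    "countriesAndTerritories": ["Big", "Big", "Small", "Small"],
    "cases": [600, 400, 30, 20],
    "deaths": [30, 20, 3, 2],
    "popData2019": [1_000_000, 1_000_000, 10_000, 10_000],
})

grouped = rows.groupby("countriesAndTerritories")
totals = grouped[["cases", "deaths"]].sum()
# first() takes one population value per country
totals["popData2019"] = grouped["popData2019"].first()

totals["cases_pct"] = totals["cases"] / totals["popData2019"] * 100
totals["deaths_per_1000"] = totals["deaths"] / totals["popData2019"] * 1000
print(totals.sort_values("cases_pct"))
```

"Big" has twenty times as many recorded cases, yet "Small" has the higher infection rate, which is the pattern seen for very small countries in the graphs above.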
This metric calculates the proportion of deaths compared to the recorded cases. It describes the likelihood of a COVID-19 infection being fatal.
While this removes the bias against population sizes, the figures are still subject to the same issues arising from the recording of the raw data (level of testing; classification of death as COVID-related etc). Unlike the Infection Fatality Rate, these graphs only use recorded cases and do not try to take account of undiagnosed cases.
df_sum = df[["cases", "deaths"]].sum()
print("Worldwide CFR: %.2f%%" % (df_sum["deaths"].astype(int) / df_sum["cases"].astype(int) * 100))
# top 20 worst death/cases ratios
cases_deaths_pop_df["deaths_to_cases_ratio"] = cases_deaths_pop_df["deaths"] / cases_deaths_pop_df["cases"] * 100
#cases_deaths_df.sort_values("deaths_to_cases_ratio").tail(20).plot(kind="barh", y="deaths_to_cases_ratio", title="Countries with highest Case Fatality Risk")
sns.barplot(x="deaths_to_cases_ratio",
y="countriesAndTerritories",
data=cases_deaths_pop_df.sort_values("deaths_to_cases_ratio").tail(20).reset_index(),
palette="Blues_d").set_title("Countries with highest Case Fatality Risk")
plt.tight_layout()
plt.savefig("docs/graphs/totals/countries_with_highest_case_fatality_risk.png")
plt.show()
The majority of top countries are under-developed nations. This may indicate limitations in the health care systems (where a COVID infection has an increased risk of a fatality), and/or be a result of more limited testing programmes (which reduce the number of recorded infections).
More generally, there is agreement that the number of recorded infections is significantly lower than the number of actual infections. The CFR metric (currently over 5% worldwide) should therefore be treated with caution: it is likely to be significantly higher than the real Infection Fatality Rate (which estimates the actual number of infections).
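The gap between the two metrics follows directly from their denominators. The sketch below uses hypothetical figures, including an assumed recording ratio of 1 in 10, purely to show the arithmetic; it is not an estimate of the real rates:

```python
def fatality_rate(deaths: int, infections: int) -> float:
    """Deaths as a percentage of infections."""
    return deaths / infections * 100

# Hypothetical figures, chosen only to illustrate the effect
deaths = 50
recorded_cases = 1_000
actual_infections = 10_000  # assumes only 1 in 10 infections is recorded

cfr = fatality_rate(deaths, recorded_cases)      # uses recorded cases
ifr = fatality_rate(deaths, actual_infections)   # uses all infections
print("CFR: %.1f%%, IFR: %.1f%%" % (cfr, ifr))
```

With the same number of deaths, every unrecorded infection lowers the IFR but leaves the CFR unchanged.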
The following graphs look at how the COVID-19 infection rate is changing over time. This can identify peak infection rates (where the number of new infections is at its highest) and show how these vary across countries.
The graphs also show how infection curves flatten (how quickly the number of new cases reduces).
# cumulative cases per country and date
cumsum_df = df.sort_index().reset_index().groupby(by=["countriesAndTerritories", "dateRep"])[["cases"]].sum().groupby(level=0).cumsum()
# threshold to be considered infected
threshold = 10
ordered_cases_df = cumsum_df[cumsum_df["cases"] >= threshold].reset_index(level=0)[["countriesAndTerritories", "cases"]].drop_duplicates(subset="countriesAndTerritories", keep="first").sort_index()
# group by month
ordered_cases_df["sequence"] = ordered_cases_df.index.to_period("M")
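The same idea, finding the first date on which a country's cumulative cases reach a threshold, can be sketched on a small hypothetical table. This version uses a per-country minimum date instead of `drop_duplicates`, which is an alternative formulation rather than the code used above:

```python
import pandas as pd

# Hypothetical daily cases for two countries
daily = pd.DataFrame({
    "countriesAndTerritories": ["A", "A", "A", "B", "B", "B"],
    "dateRep": pd.to_datetime(["2020-01-30", "2020-01-31", "2020-02-01",
                               "2020-02-27", "2020-02-28", "2020-03-01"]),
    "cases": [4, 7, 9, 2, 3, 20],
})

threshold = 10
daily = daily.sort_values("dateRep")
# running total within each country
daily["cumulative"] = daily.groupby("countriesAndTerritories")["cases"].cumsum()

# earliest date on which each country is at or above the threshold
first_crossing = (
    daily[daily["cumulative"] >= threshold]
    .groupby("countriesAndTerritories")["dateRep"]
    .min()
)
# group by month, as in the map colouring
print(first_crossing.dt.to_period("M"))
```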
# 1. brew install geos proj gdal
# 2. download http://www.naturalearthdata.com/downloads/10m-cultural-vectors/
# 3. ogr2ogr -simplify .05 -lco ENCODING=UTF-8 countries/ ne_10m_admin_0_countries_lakes/ne_10m_admin_0_countries_lakes.shp
import geopandas as gpd
# world map
world = gpd.read_file("countries/ne_10m_admin_0_countries_lakes.shp")
# clean data: rename japan
ordered_cases_df.loc[(ordered_cases_df["countriesAndTerritories"] == "Cases_on_an_international_conveyance_Japan"), "countriesAndTerritories"] = "Japan"
# clean data: replace underscores with spaces
ordered_cases_df["countriesAndTerritories"] = ordered_cases_df["countriesAndTerritories"].str.replace("_", " ")
# merge into world map data
for_plotting = world.merge(ordered_cases_df, how="right", left_on = "SOVEREIGNT", right_on = "countriesAndTerritories").sort_values("sequence")
# show any countries that could not be matched in the map data (ordered by cases)
errors = len(for_plotting[for_plotting["SOVEREIGNT"].isnull()])
if errors:
    print("Warning: Could not find %d countries from %d" % (errors, len(ordered_cases_df)))
for_plotting[for_plotting["SOVEREIGNT"].isnull()][["countriesAndTerritories", "cases"]].sort_values("cases", ascending=False)
ax = for_plotting.plot(column="sequence", figsize=(15,9), cmap="Blues_r", legend=True)
plt.title("COVID-19 spread across the world")
plt.tight_layout()
plt.savefig("docs/graphs/rates/covid-19_spread_across_the_world.png")
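Country names that fail to match the map data can also be found with the `indicator` option of `merge`. The sketch below uses plain pandas tables as hypothetical stand-ins for the shapefile and the case data, so it runs without geopandas:

```python
import pandas as pd

# Hypothetical stand-ins for the map table and the case table
world_names = pd.DataFrame({"SOVEREIGNT": ["Italy", "United States of America"]})
case_names = pd.DataFrame({
    "countriesAndTerritories": ["Italy", "United States of America", "Czechia"],
    "cases": [100, 200, 50],
})

merged = world_names.merge(
    case_names,
    how="right",
    left_on="SOVEREIGNT",
    right_on="countriesAndTerritories",
    indicator=True,  # adds a _merge column: "both", "left_only" or "right_only"
)

# rows present in the case data but missing from the map data
unmatched = merged.loc[merged["_merge"] == "right_only", "countriesAndTerritories"]
print(unmatched.tolist())
```

Checking `_merge` is equivalent here to testing `SOVEREIGNT` for nulls, but it states the intent more directly.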
These graphs show how the total number of COVID-19 infections/deaths has evolved over time for some selected countries. The gradients of the curves show how the spread of the virus is changing: a steep curve shows large numbers of new cases when the virus is spreading the quickest while a flatter curve shows when the virus is either yet to start spreading (start of the curve) or is slowing (top of the curve).
The first graph shows the cases and deaths worldwide. Subsequent graphs break down the figures for selected countries.
ax_cases = df_all.plot(y="cases", title="Worldwide COVID-19 statistics", logy=False)
ax_cases.set_ylabel("cases")
ax_deaths = ax_cases.twinx()
df_all.plot(y="deaths", ax=ax_deaths, color="red")
ax_deaths.set_ylabel("deaths")
plt.legend(handles=[mpatches.Patch(label="COVID-19 infections"), mpatches.Patch(color="red", label="COVID-19 deaths")], loc=2)
plt.tight_layout()
plt.savefig("docs/graphs/totals/worldwide_covid-19_statistics.png")
ax_cases = df_all.plot(y="cases", title="Worldwide COVID-19 statistics with logarithmic scale", logy=True)
ax_cases.set_ylabel("cases")
ax_deaths = ax_cases.twinx()
df_all.plot(y="deaths", ax=ax_deaths, color="red", logy=True)
ax_deaths.set_ylabel("deaths")
plt.legend(handles=[mpatches.Patch(label="COVID-19 infections"), mpatches.Patch(color="red", label="COVID-19 deaths")], loc=2)
plt.tight_layout()
plt.savefig("docs/graphs/totals/worldwide_covid-19_statistics_with_logarithmic_scale.png")
# helper function to plot selected countries
def plot_over_time(y, title, filename):
    for log_scaling, title_suffix, file_suffix in [(False, "", ""), (True, " with logarithmic scale", "_with_logarithmic_scale")]:
        ax = None
        for c in countries:
            # transform daily delta into cumulative
            df_c = df[df["countriesAndTerritories"] == c][["cases", "deaths"]].sort_index().cumsum()
            ax = df_c.plot(ax=ax, y=y, title=title + title_suffix, logy=log_scaling)
        ax.set_ylabel(y)
        ax.legend(countries)
        plt.tight_layout()
        plt.savefig(filename + file_suffix + ".png")
plot_over_time("cases", "COVID-19 cases over time", "docs/graphs/totals/covid-19_cases_over_time")
plot_over_time("deaths", "COVID-19 deaths over time", "docs/graphs/totals/covid-19_deaths_over_time")
All governments are aiming to keep the peak of the curve as low as possible to reduce the numbers of infections and overall deaths. However, the shape of the curve is key to understanding the spread of the virus and projecting its evolution. Much of the news coverage talks of 'flattening the curve' in order to:
As the rate of infection decreases, the curves start to flatten (eventually flattening completely as no new cases are discovered). Note how the gradients of the curves - indicating the rates of new infections/deaths - vary across countries. This may be due to many reasons (more effective policies, differences in health care and health system capacity, differences in demographics etc).
Note how some countries are starting to flatten the curve (Europe, China) which indicates a slowdown in the spread of the virus. However, the current worldwide trajectory does not suggest the peak has been reached.
These graphs show the numbers of infections and deaths as a percentage of the underlying population. This takes account of population size and - as with some of the earlier graphs - allows figures to be compared across countries.
As before, any variations in data collection methods, testing policies etc will skew the statistics.
# COVID-19 cases/deaths for selected countries (scales as rates per capita)
fig = plt.figure()
fig.tight_layout()
plot_data = {}
# prepare the data separately from plotting
for c in countries:
    # transform daily delta into cumulative cases and deaths
    df_c = df[df["countriesAndTerritories"] == c][["cases", "deaths"]].sort_index().cumsum()
    # lookup population from raw data (no join because missing country)
    population = df[df["countriesAndTerritories"] == c]["popData2019"].iloc[0]
    df_c["cases_rate"] = df_c["cases"] / population * 100 # as %
    df_c["deaths_rate"] = df_c["deaths"] / population * 1000 # per 1000
    # store the data in a dictionary
    plot_data[c] = df_c
# plot infection rates
ax_cases = None
for c, df_c in plot_data.items():
    ax_cases = df_c.plot(ax=ax_cases, y="cases_rate", title="COVID-19 infection rate over time", legend=False)
ax_cases.set_ylabel("% cases per capita")
ax_cases.legend(plot_data.keys())
plt.tight_layout()
plt.savefig("docs/graphs/rates/covid-19_infection_rate_over_time.png")
# plot mortality rates
ax_deaths = None
for c, df_c in plot_data.items():
    ax_deaths = df_c.plot(ax=ax_deaths, y="deaths_rate", title="COVID-19 mortality rate per 1000 over time")
ax_deaths.set_ylabel("deaths per 1000")
ax_deaths.legend(plot_data.keys())
plt.tight_layout()
plt.savefig("docs/graphs/rates/covid-19_mortality_rate_per_1000_over_time.png")
Note how the gradients of the curves - indicating the infection/mortality rates - vary across countries. For example, Italy and Spain have broadly similar numbers of infections, but when scaled as a fraction of the underlying populations, the Spanish infection rate is rather higher. Note also how the gradients of the curves differ: the infection was slower to take hold in Spain (or at least, as reflected in the recorded figures), but quickly accelerated to match Italy.
Figures for Luxembourg show a very high infection rate as a percentage of the overall population. This is largely due to the figure used as the population: the active working population is many times higher than the actual number of residents due to the high daily influx of frontier workers. These clearly impact infection rates but are not taken into account in the statistics.
While infection numbers for Spain and Italy are broadly similar, the infection rate for Italy is comparatively lower due to the larger population.
These graphs show the number of new infections and deaths each day: in other words, the gradients of the curves above. This is the way the raw data is presented in the downloaded file.
These figures show the delta changes per day i.e. the 1st derivative of the cumulative infection and death curves. As the infection rate slows, the number of new daily cases will decrease, eventually tending to zero. Conversely, the delta changes are at their highest when the gradients of the earlier graphs are steepest.
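The relationship between the daily and cumulative figures can be checked on a short hypothetical series: differencing the cumulative totals gives back the daily figures, and the largest daily value marks the steepest part of the cumulative curve.

```python
import pandas as pd

# Hypothetical daily new cases
daily = pd.Series([1, 3, 8, 5, 2], index=pd.date_range("2020-03-01", periods=5))
cumulative = daily.cumsum()

# diff() recovers the daily figures; the first value has no predecessor,
# so it is filled with the first cumulative value
recovered = cumulative.diff().fillna(cumulative.iloc[0]).astype(int)

# day with the steepest rise in the cumulative curve
peak_day = daily.idxmax()
print(recovered.tolist(), peak_day.date())
```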
# Derivative of COVID-19 cases (unscaled) i.e. new cases/days
# Selected countries merged on to 1 plot
# cases
df_all = pd.DataFrame()
for c in countries:
    # no need to transform, just sort on date
    df_all[c] = df[df["countriesAndTerritories"] == c]["cases"].sort_index()
ax = df_all.plot(title="New COVID-19 daily cases")
ax.legend(df_all.columns)
plt.tight_layout()
plt.savefig("docs/graphs/rates/new_covid-19_daily_cases.png")
# deaths
df_all = pd.DataFrame()
for c in countries:
    df_all[c] = df[df["countriesAndTerritories"] == c]["deaths"].sort_index()
ax = df_all.plot(title="New COVID-19 daily deaths")
ax.legend(df_all.columns)
plt.tight_layout()
plt.savefig("docs/graphs/rates/new_covid-19_daily_deaths.png")
These graphs show the rise of the virus over its lifetime. The peak of each curve is when the virus is spreading fastest through a population. The fall of the curve is when the virus is slowing down and reaching the end of its spread.
These derivative curves show more clearly the rate at which the virus is spreading. Temporary decreases often offer false hope and the infection rate soon returns to previously higher levels. This may be due to changes in the spread of the virus, delayed reporting of cases over weekends etc.
The curve is generally bell-shaped, but there is concern that the peak starts to plateau over a prolonged period before dropping. However, the virus appears to be clearly slowing down in some countries (e.g. Austria).
There is a fear over the longer term that the infection rate may rise again as social distancing measures are relaxed (a so-called "2nd wave").
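Temporary dips caused by delayed weekend reporting can be smoothed with a rolling mean. The sketch below uses a hypothetical series with a constant weekly total; the 7 day window is an assumption made here to match the weekly cycle, whereas the per-country graphs in this notebook use a 5 day window.

```python
import pandas as pd

# Hypothetical series: a steady 70 cases per week, under-reported at weekends
# and caught up on Mondays. 2020-03-02 is a Monday.
week = [30, 10, 10, 10, 10, 0, 0]
daily = pd.Series(week * 3, index=pd.date_range("2020-03-02", periods=21))

smooth_5 = daily.rolling(5).mean()   # window used in this notebook
smooth_7 = daily.rolling(7).mean()   # assumption: window matching the weekly cycle

print(smooth_5.dropna().min(), smooth_5.dropna().max())
print(smooth_7.dropna().unique())
```

In this artificial case the 7 day mean is completely flat, while the 5 day mean still oscillates with the reporting pattern.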
These graphs show how the daily numbers of new cases and deaths vary over time for each country.
# grid lines with multiple y-axes are confusing
sns.set(style="white")
# Derivative of COVID-19 cases (unscaled) i.e. new cases/days
# 1 graph per country. Cases and deaths plotted on separate y-axes
for c in df["countriesAndTerritories"].unique():
    # no need to transform, just sort on date
    ax_cases = df[df["countriesAndTerritories"] == c].sort_index().plot(y="cases", legend=False)
    ax_deaths = ax_cases.twinx()
    ax_deaths = df[df["countriesAndTerritories"] == c].sort_index().plot(y="deaths", color="red", ax=ax_deaths)
    ax_cases.set_title("New COVID-19 daily cases and deaths - %s" % c)
    ax_cases.set_ylabel("cases")
    ax_deaths.set_ylabel("deaths")
    plt.legend(handles=[mpatches.Patch(label="COVID-19 infections"), mpatches.Patch(color="red", label="COVID-19 deaths")], loc=2)
    plt.tight_layout()
    plt.savefig("docs/graphs/countries/%s_-_new_covid-19_daily_cases_and_deaths.png" % c)
    #plt.show()
    plt.close()
    #break
These graphs show the total COVID-19 infections and deaths for each country.
sns.set(style="whitegrid")
# helper function to plot all countries
def plot_over_time(y, title, filename):
    for log_scaling, title_suffix, file_suffix in [(False, "", ""), (True, " with logarithmic scale", "_with_logarithmic_scale")]:
        ax = None
        for c in df["countriesAndTerritories"].unique():
            # transform daily delta into cumulative
            df_c = df[df["countriesAndTerritories"] == c][[y]].sort_index().cumsum()
            ax = df_c.plot(title=title + title_suffix + " - " + c, logy=log_scaling)
            ax.set_ylabel(y)
            # 5 day rolling mean to smooth curve
            df_rolling = df_c.rolling(5).mean().rename({y: y + " (rolling)"}, axis="columns")
            df_rolling.plot(ax=ax, title=title + title_suffix + " - " + c, logy=log_scaling)
            plt.tight_layout()
            plt.savefig("docs/graphs/countries/%s_-_%s.png" % (c, filename + file_suffix))
            #plt.show()
            plt.close()
            #break
plot_over_time("cases", "COVID-19 cases over time", "covid-19_cases_over_time")
plot_over_time("deaths", "COVID-19 deaths over time", "covid-19_deaths_over_time")
# restore grid lines (single y-axis per graph)
sns.set(style="whitegrid")
# Derivative of COVID-19 cases (unscaled) i.e. new cases/days
# 1 graph per country. Daily cases plotted with a 5 day rolling mean
for c in df["countriesAndTerritories"].unique():
    # no need to transform, just sort on date
    df_c = df[df["countriesAndTerritories"] == c][["cases"]].sort_index()
    ax_cases = df_c.plot(legend=True)
    ax_cases.set_title("New COVID-19 daily cases and 5 day rolling mean - %s" % c)
    ax_cases.set_ylabel("cases")
    # 5 day rolling mean to smooth curve
    df_rolling = df_c.rolling(5).mean().rename({"cases": "cases (rolling)"}, axis="columns")
    df_rolling.plot(ax=ax_cases, legend=True)
    plt.tight_layout()
    plt.savefig("docs/graphs/countries/%s_-_new_covid-19_daily_cases_and_5_day_rolling_mean.png" % c)
    #plt.show()
    plt.close()
    #break
Show how the virus moved to each country and established itself in the population (>= 100 cases).
# cumulative cases per country and date
cumsum_df = df.sort_index().reset_index().groupby(by=["countriesAndTerritories", "dateRep"])[["cases"]].sum().groupby(level=0).cumsum()
hundred_cases_df = cumsum_df[cumsum_df["cases"] >= 100].reset_index(level=0)[["countriesAndTerritories"]].drop_duplicates(subset="countriesAndTerritories", keep="first").sort_index()
hundred_cases_df
# COVID-19 cases for selected countries (normalise 0-1)
from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()
ax_cases = None
ax_deaths = None
# extract data for each country
for c in countries:
    df_c = df[df["countriesAndTerritories"] == c][["cases", "deaths"]].sort_index().cumsum()
    # normalise the values 0-1
    np_scaled = min_max_scaler.fit_transform(df_c)
    df_c_normalized = pd.DataFrame(np_scaled, columns=df_c.columns, index=df_c.index)
    ax_cases = df_c_normalized.plot(ax=ax_cases, y="cases", title="COVID-19 cases over time (normalised)")
    ax_deaths = df_c_normalized.plot(ax=ax_deaths, y="deaths", title="COVID-19 deaths over time (normalised)")
ax_cases.legend(countries)
ax_cases.set_ylabel("cases")
ax_deaths.legend(countries)
ax_deaths.set_ylabel("deaths")
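The 0-1 normalisation applied above rescales each column as (x - min) / (max - min). A minimal sketch of the same formula in plain pandas, with a hypothetical helper `min_max` and made-up figures:

```python
import pandas as pd

def min_max(frame: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column to the range 0-1: (x - min) / (max - min)."""
    return (frame - frame.min()) / (frame.max() - frame.min())

# Hypothetical cumulative figures for one country
cumulative = pd.DataFrame({"cases": [10, 20, 60, 110], "deaths": [0, 1, 3, 5]})
normalised = min_max(cumulative)
print(normalised)
```

Each country ends at 1.0 regardless of its totals, so the curves compare shape rather than size. Note that in this sketch a constant column would produce NaN values because the denominator is zero.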
# 2nd derivative of COVID-19 cases (unscaled) i.e. day-on-day change in new cases
# 1 graph per country
# Shows when the infection rates increase/decrease
ax = None
for c in countries:
    # the data is already a daily delta, so diff() gives the 2nd derivative
    ax = df[df["countriesAndTerritories"] == c][["cases", "deaths"]].sort_index().diff(axis=0).plot(title="New COVID-19 cases & deaths over time f''(x)")
    ax.legend([c])
    #plt.savefig("docs/graphs/countries_covid_derivative2_%s.png" % c)
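The chain from cumulative totals to the 2nd derivative can be followed on a short hypothetical series. The variable names are chosen for this sketch only:

```python
import pandas as pd

# Hypothetical cumulative cases
cumulative = pd.Series([0, 2, 6, 14, 24, 30, 32])

new_per_day = cumulative.diff()      # 1st derivative: new cases per day
acceleration = new_per_day.diff()    # 2nd derivative: change in new cases per day

# first position where daily cases start to fall
turning_point = acceleration[acceleration < 0].index[0]
print(acceleration.tolist(), turning_point)
```

A positive 2nd difference means daily cases are still rising; the first negative value marks the point where the cumulative curve starts to flatten.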