This open source pipeline aggregates public COVID-19 data sources, including hospitalization, ICU, and ventilator data for the countries listed in the Data Sources section. Contributions that add other COVID-19-relevant data types are welcome.

The pipeline is designed so that researchers can build models quickly and engineers can add new data sources with little effort.

We support data sources that can be downloaded automatically in structured formats such as .csv and .xlsx, and we also aggregate manually scraped data from charts, tables, PDFs, and (occasionally) tweets.

To use the data

If you just want to use the data for models, visualizations, or research, you can download the aggregated CSV directly from data/exported/hospitalizations.csv. Each release of the dataset is tagged, so every version of the data has a stable GitHub URL.
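
As a minimal sketch, the aggregated file can be read with Python's standard csv module from the root of a local clone. The function name load_hospitalizations is hypothetical, and because this README does not list the CSV's columns, the sketch does not assume any column names. It simply returns whatever header the file has. For reproducible work, you would presumably check out one of the tagged releases before reading the file.

```python
import csv
from pathlib import Path

# Path taken from this README, relative to the root of a local clone.
# load_hospitalizations is a hypothetical helper, not part of the repository.
DEFAULT_PATH = Path("data/exported/hospitalizations.csv")


def load_hospitalizations(path=DEFAULT_PATH):
    """Read the aggregated CSV and return (column names, list of row dicts).

    Values are returned as strings, exactly as csv.DictReader yields them;
    no column names or types are assumed.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        header = list(reader.fieldnames or [])
    return header, rows
```

Calling header, rows = load_hospitalizations() inspects the available columns before you commit to a schema. If you prefer pandas, pandas.read_csv on the same path is a natural drop-in.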

Please see the Data Sources section for the attribution and license of each source. If you want to understand the data aggregation pipeline and how to contribute to the repository, read on.

