covidphdata: An Interface to the Philippines Department of Health COVID-19 Data Drop

Image credit: Ernest Guevarra

The Philippines Department of Health has made COVID-19 related data publicly available as part of its mandate to promote transparency and accountability in governance. The Philippines COVID-19 Data Drop is distributed via Google Drive, with updated data generally released daily. Data from previous days are archived and also made available through Google Drive. This package provides a coherent, robust and performant API to the latest and archived Philippines COVID-19 data.

Motivation and development history

In early April 2020, as part of CoMo Philippines’s contribution to the CoMo Consortium’s COVID-19 modelling work for the Philippines context, we developed an R package called comoparams that provided functions for accessing, handling and processing the data required to set COVID-19 modelling parameters for the Philippines. Within comoparams was a set of functions developed to interface with the then newly announced Philippines Department of Health (DoH) Data Drop for COVID-19. We maintained the comoparams package from then until late June 2020. In particular, the functions for interfacing with the DoH Data Drop required several updates (see the package's history of issues and changes) in response to known issues with the DoH Data Drop system and the data it provides. A good discussion of some of these issues is available from the UP COVID-19 Pandemic Response Team.

Other issues that required syntax-breaking changes to the comoparams functions related to how the DoH Data Drop was released and distributed. At its first release, the DoH Data Drop was distributed through Google Sheets via a single, unchanging link for both the latest release and the archived data. A few weeks later, it was moved to Google Drive, again via a single, unchanging link for both the latest and the archived data. A month or so after that, the latest DoH Data Drop began to be distributed through a link that changed with every release (usually daily), while the archived data were distributed through a separate, constant link. This distribution system has remained in place to date, although the archive now includes only the current month, whereas in late June 2020 all previous months of archive data were available.

Given the persistent issues raised about the DoH Data Drop system and the datasets it distributes, we felt it was important to create a separate, focused R package just for accessing, handling and processing the DoH Data Drop, one that any R user can use regardless of their end-use for the data (e.g., reporting, visualisation, modelling). We also wanted to contribute to the work of fellow scientists and researchers in the Philippines whom we know use R for the epidemiologic analyses they share with the general public. From our own experience with the DoH Data Drop, we felt that consistent and performant functions for data access, handling and processing would be extremely helpful to these R users, potentially reducing the daily or weekly workload of producing COVID-19 analyses and reports and streamlining their routine analytical workflows.

To this end, we took inspiration from the functions we developed in the comoparams package and developed this covidphdata package based on the following R package design principles:

  • use of a modular and refactorable approach to determining and developing the functions required;

  • creation of a robust and performant R-based application programming interface (API) to the Google Drive-based DoH Data Drop;

  • application of modern data handling and processing techniques available in R; and,

  • output of coherent and compliant data structures ready for various applications.

Finally, we want to contribute to both the public discourse on and the practice of open data and open science in the Philippines, and to encourage others to do the same. Our group’s codebase for our work on COVID-19 modelling (primarily in the R language for statistical computing) is publicly available on GitHub, including that of the covidphdata package (see https://github.com/como-ph/covidphdata). Our hope is that by creating this package, making it available to all R users and keeping its codebase open source, we can standardise how the DoH Data Drop is accessed, handled and processed, thereby making the steps applied to the data before analysis or visualisation transparent.

What does covidphdata do?

The covidphdata package primarily provides functions that serve as low-level wrappers around specific googledrive package functions for accessing files and folders in Google Drive, since the DoH Data Drop is distributed through this platform.

Currently, the covidphdata package:

  • Provides functions (datadrop_id*) to dynamically retrieve unique DoH Data Drop file and folder identifiers used by Google Drive;

  • Provides a function (datadrop_ls) to list the files and folders within a specified DoH Data Drop folder in Google Drive;

  • Provides a function (datadrop_download) to download a specified file within DoH Data Drop in Google Drive; and,

  • Provides a function (datadrop_get) to retrieve a specified file within DoH Data Drop in Google Drive into R.
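
As a rough illustration of how the functions listed above might be chained together, here is a minimal sketch in R. The function names datadrop_id, datadrop_ls and datadrop_get come from the list above; the argument names (id, tbl, fn, path), the "Case Information" file name fragment, the helper name fetch_case_information and the googledrive::drive_deauth() step for unauthenticated access to a public folder are all assumptions made for illustration and should be checked against the package website before use.

```r
# Minimal sketch only. Argument names (id, tbl, fn, path) and the
# "Case Information" file name fragment are assumptions, not confirmed here.
# fetch_case_information is a hypothetical helper, not part of covidphdata.
fetch_case_information <- function(fn = "Case Information", path = tempfile()) {
  # Fail early with a clear message if the required packages are missing
  if (!requireNamespace("covidphdata", quietly = TRUE) ||
      !requireNamespace("googledrive", quietly = TRUE)) {
    stop("Please install the covidphdata and googledrive packages first.",
         call. = FALSE)
  }

  # Assumption: the Data Drop folder is public, so no Google sign-in is needed
  googledrive::drive_deauth()

  # Step 1: retrieve the Google Drive identifier of the latest Data Drop folder
  gid <- covidphdata::datadrop_id()

  # Step 2: list the files and folders inside that Data Drop folder
  tbl <- covidphdata::datadrop_ls(id = gid)

  # Step 3: read the file whose name matches `fn` into R
  covidphdata::datadrop_get(tbl = tbl, fn = fn, path = path)
}
```

Calling fetch_case_information() requires an internet connection and depends on the Data Drop being reachable at the time. Because the link to the latest release changes with each release, looking up the folder identifier on every call, rather than hard-coding it, is the part of this pattern that matters most.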

To learn more about the covidphdata package - how to install in R, how to use the package, and how to contribute to its development and testing - visit the package website at https://como-ph.github.io/covidphdata.


