Abstract
Objective
The objective of this project is to advance the science of biosurveillance by providing a user-curated cataloging system, for use by health departments and other users, that supports daily surveillance operations by better characterizing three key issues in available surveillance systems: duplication in biosurveillance activities; differing perspectives and analyses of the same data; and inadequate information sharing.
Introduction
A variety of government reports have cited challenges in coordinating national biosurveillance efforts at strategic and tactical levels. The Government Accountability Office (GAO), an independent, nonpartisan agency that investigates how the federal government spends taxpayer dollars and performs analyses at the request of congressional committees or by public mandate, has published 64 reports on biosurveillance since 2005. This project aims to better characterize these issues by collecting and analyzing a sample of publicly documented biosurveillance systems and by making our data and results available for the public health community to review and evaluate. The study openly publishes the collected data files (CSV and XLS), the Python NLP scripts, and a freely available web-based application, developed in R Shiny, that filters the 227 biosurveillance systems and activities, with the goal of promoting a more transparent understanding of how public health practitioners conduct surveillance activities.
Methods
Collected and reviewed data on 424 systems, of which 227 systems and activities met our criteria;
Implemented a new approach to develop a standard framework for data collection using natural language processing (NLP);
Published all data files openly on GitHub and developed an online analytics application; and
Convened a workshop of experts from federal, state, not-for-profit, academic, and commercial entities in November 2015 in Washington, D.C., to review the methodology and results of this study.
Results
The results of this project include a fully functional web application and code (available through GitHub) for the continued expansion, categorization, and analysis of surveillance systems. Findings from the 227 cataloged surveillance systems include the following: 20 systems were established in 2006 alone, and the number of systems established increased after 1990; 68% of all cataloged systems focus solely on human surveillance; 45% used statistical analysis, while only 4% used natural language processing; and 43% reported using “health department” data as a data source.
Conclusions
We believe this project is a first step toward a transparent inventory of systems and activities to which public health practitioners and researchers can contribute. The results provide meaningful metadata indicating an overemphasis on human surveillance, an over-reliance on a single data source (health departments), and a lack of advanced data science practices applied to systems in the field. This project 1) provides a starting point for developing a standard framework of categories for cataloging biosurveillance systems, 2) offers openly available data and code on GitHub [3] for others to integrate into their research, and 3) introduces a set of methodological issues to consider in a biosurveillance inventorying exercise.
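The percentages reported in the Results are shares of the cataloged systems that fall into a given category. The sketch below shows one way such shares could be tabulated from an exported catalog file. It assumes a hypothetical CSV layout (columns `name`, `target`, `methods`, `data_sources`, with semicolon-separated multi-valued fields) and uses toy rows; it is not the project's published schema, data, or scripts.

```python
import csv
import io

# Toy stand-in for an exported catalog CSV. The column names and rows are
# hypothetical placeholders, not the project's actual schema or data.
SAMPLE_CSV = """name,target,methods,data_sources
System A,human,statistical analysis,health department;laboratory
System B,human;animal,NLP,news media
System C,human,statistical analysis,health department
System D,animal,,veterinary clinics
"""


def load_catalog(text):
    """Parse catalog rows, splitting semicolon-delimited multi-valued fields into sets."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "name": record["name"],
            # filter(None, ...) drops empty strings left by blank fields
            "target": set(filter(None, record["target"].split(";"))),
            "methods": set(filter(None, record["methods"].split(";"))),
            "data_sources": set(filter(None, record["data_sources"].split(";"))),
        })
    return rows


def percent(rows, predicate):
    """Percentage of rows satisfying predicate, rounded to a whole number."""
    if not rows:
        return 0
    matches = sum(1 for r in rows if predicate(r))
    return round(100 * matches / len(rows))


catalog = load_catalog(SAMPLE_CSV)
# "Solely human" means the target set is exactly {"human"}.
human_only = percent(catalog, lambda r: r["target"] == {"human"})
uses_nlp = percent(catalog, lambda r: "NLP" in r["methods"])
health_dept = percent(catalog, lambda r: "health department" in r["data_sources"])
print(human_only, uses_nlp, health_dept)  # prints: 50 25 50
```

Storing multi-valued fields as sets keeps "solely human" (exact match) distinct from "includes human" (membership), a distinction that matters when a system monitors both human and animal populations.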