Published in Vol 9, No 1 (2017):

Using a Bayesian Method to Assess Google, Twitter, and Wikipedia for ILI Surveillance



Objective

To comparatively analyze Google, Twitter, and Wikipedia by evaluating how well change points detected in each web-based source correspond to change points detected in CDC ILI data.

Introduction

Traditional influenza surveillance relies on reports of influenza-like illness (ILI) by healthcare providers, capturing individuals who seek medical care and missing those who may instead search, post, and tweet about their illnesses. Existing research has shown some promise in using data from Google, Twitter, and Wikipedia for influenza surveillance, but findings conflict, and studies have evaluated these web-based sources only individually or in pairs, without comparing all three [1-5]. A comparative analysis of all three web-based sources is needed to determine which performs best and could therefore be considered as a complement to traditional methods.

Methods

We collected publicly available, de-identified data from the CDC ILINet system, Google Flu Trends, HealthTweets.org, and Wikipedia for the 2012-2015 influenza seasons. Bayesian change point analysis was used to detect change points, or seasonal changes, in each of the web-based data sources for comparison with change points in CDC ILI data. All analyses were conducted using the R package ‘bcp’ v4.0.0 in RStudio v0.99.484. Sensitivity and positive predictive value (PPV) were then calculated.

Results

During the 2012-2015 influenza seasons, Google had a high sensitivity of 92% and a PPV of 85%. Twitter had a low sensitivity of 50% and a low PPV of 43%. Wikipedia had the lowest sensitivity (33%) and the lowest PPV (40%).

Conclusions

Google had the best combination of sensitivity and PPV in detecting change points that corresponded with change points found in CDC data. Overall, change points in Google, Twitter, and Wikipedia data occasionally aligned well with change points captured in CDC ILI data, yet these sources did not detect all changes in CDC data, which could indicate limitations of the web-based data or signify that the Bayesian method is not adequately sensitive. These three web-based sources need to be further studied and compared using other statistical methods before being incorporated as surveillance data to complement traditional systems.

Figure 1. Detection of change points, 2012-2013 influenza season

Figure 2. Detection of change points, 2013-2014 influenza season

Figure 3. Detection of change points, 2014-2015 influenza season
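
As an illustration of the sensitivity and PPV step named in Methods, the following Python sketch shows one way such values could be computed once change points have been detected in a reference series (CDC ILI) and in a web-based series. It is not the authors' code: the abstract does not state how a web-based change point was judged to correspond to a CDC change point, so the one-week tolerance, the greedy one-to-one matching, the function names, and the week indices are all assumptions made for illustration only.

```python
def match_change_points(reference, candidate, tolerance=1):
    """Count true positives, false positives, and false negatives.

    Assumption (not from the abstract): a candidate change point counts as a
    true positive if it falls within `tolerance` weeks of a reference change
    point that has not already been matched.
    """
    unmatched_ref = sorted(reference)
    tp = 0
    fp = 0
    for week in sorted(candidate):
        # Reference change points still available within the tolerance window
        hits = [r for r in unmatched_ref if abs(r - week) <= tolerance]
        if hits:
            nearest = min(hits, key=lambda r: abs(r - week))
            unmatched_ref.remove(nearest)  # each reference point matches once
            tp += 1
        else:
            fp += 1
    fn = len(unmatched_ref)  # reference change points the candidate missed
    return tp, fp, fn


def sensitivity_and_ppv(tp, fp, fn):
    """Sensitivity = TP / (TP + FN); PPV = TP / (TP + FP)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Hypothetical week indices of detected change points (not study data)
cdc_weeks = [3, 10, 18, 25]
web_weeks = [3, 11, 20, 30]

tp, fp, fn = match_change_points(cdc_weeks, web_weeks, tolerance=1)
sens, ppv = sensitivity_and_ppv(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} sensitivity={sens:.0%} PPV={ppv:.0%}")
```

Under these assumptions, sensitivity is the share of CDC change points that the web-based source also detected, and PPV is the share of the web-based source's change points that correspond to a CDC change point. A wider tolerance or a different matching rule would change both values.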