Abstract
Objective
To explore the value of Wikipedia as a data source for monitoring seasonal disease trends in metropolitan France.

Introduction
Today, the Internet, and Wikipedia in particular, is an important part of everyday life. People notably use this popular free online encyclopedia to search for health-related information. Recent studies showed that Wikipedia data can be used to monitor and forecast influenza-like illnesses in near real time in the United States [1,2]. We carried out a study to explore whether French Wikipedia data can be used to monitor the trends of five seasonal diseases in metropolitan France: influenza-like illness, gastroenteritis, bronchiolitis, chickenpox and asthma.

Methods
To collect Wikipedia data, we used two free web applications (https://stats.grok.se and https://tools.wmflabs.org/pageviews), which aggregate daily views for each French entry of the encyclopedia. As some articles have several entries (redirects), we collected view statistics for all the entries of each article and summed them to build time series from January 1, 2009 to June 30, 2016 (Figure 1). We then compared these data with those of the OSCOUR® network, a robust national surveillance system based on emergency departments (ED). For each disease, we modelled the relationship between daily Wikipedia views and daily ED visits using Poisson regression models allowing for overdispersion. The following adjustment variables were included in the model: long-term trend, seasonality and day of the week. We tested several lags (day -7 to day +7) to explore whether one of the two indicators (Wikipedia views or ED visits) varied earlier than the other.

Results
The mean number of daily views was 764 [16-8271] for influenza-like illness, 202 [6-1660] for bronchiolitis, 1228 [59-10030] for gastroenteritis, 475 [21-2729] for asthma and 879 [25-4081] for chickenpox. Time series analyses showed a positive association between page views and ED visits for each seasonal disease (Figure 2). For each increase of 100 Wikipedia views, the number of ED visits on the same day increased by 2.9% (95% CI=[2.5-3.3]) for influenza-like illness, 1.8% (95% CI=[1.4-2.2]) for bronchiolitis, 2.4% (95% CI=[2.2-2.7]) for gastroenteritis, 1.4% (95% CI=[1.0-1.7]) for asthma and 2.9% (95% CI=[1.7-4.1]) for chickenpox. Overall, the highest relative risks were observed for lag -1 (day -1) to lag 0.

Conclusions
This study showed that French Wikipedia data can be useful for monitoring the trends of seasonal diseases. They were significantly associated with data from a robust surveillance system, with a maximum lag of one day. Wikipedia can therefore be considered an interesting complementary data source, notably when data from traditional surveillance systems are not available in real time. Further work will be necessary to develop forecasting models for these seasonal diseases.

Figure 1. Daily number of page views and ED visits for seasonal diseases, January 1, 2009 to June 30, 2016
Figure 2. Relative risk between Wikipedia page views and ED visits for seasonal diseases, by lag
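
The modelling steps described in Methods (summing views over redirects, shifting one series by a lag, and fitting a Poisson model that allows for overdispersion) can be illustrated with a minimal sketch. It is not the study's code: the function names are hypothetical, the data are synthetic, ED visits are assumed to be the outcome (as the results are reported per 100 views), and the adjustment for long-term trend, seasonality and day of the week is omitted for brevity. Overdispersion is handled here with a quasi-Poisson approach, in which a Pearson dispersion factor scales the standard errors; the abstract does not say which approach the authors used.

```python
import math

def sum_entries(views_by_entry):
    """Add the daily views of all entries (title and redirects) of one article."""
    return [sum(day) for day in zip(*views_by_entry.values())]

def align_with_lag(views, visits, lag):
    """Pair ED visits on day t with page views on day t + lag (lag may be negative)."""
    days = [t for t in range(len(visits)) if 0 <= t + lag < len(views)]
    return [views[t + lag] for t in days], [visits[t] for t in days]

def _information(x, mu):
    """Entries of the 2x2 Fisher information matrix and its determinant."""
    i00 = sum(mu)
    i01 = sum(m * xi for xi, m in zip(x, mu))
    i11 = sum(m * xi * xi for xi, m in zip(x, mu))
    return i00, i01, i11, i00 * i11 - i01 * i01

def fit_quasi_poisson(x, y, max_iter=50, tol=1e-12):
    """Fit log E[y] = a + b*x by Newton-Raphson; return (a, b, phi, se_b)."""
    n = len(x)
    a, b = math.log(sum(y) / n), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        g0 = sum(yi - m for yi, m in zip(y, mu))                  # score for a
        g1 = sum((yi - m) * xi for xi, yi, m in zip(x, y, mu))    # score for b
        i00, i01, i11, det = _information(x, mu)
        da = (i11 * g0 - i01 * g1) / det
        db = (i00 * g1 - i01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    mu = [math.exp(a + b * xi) for xi in x]
    # Pearson dispersion: values above 1 indicate overdispersion.
    phi = sum((yi - m) ** 2 / m for yi, m in zip(y, mu)) / (n - 2)
    i00, _, _, det = _information(x, mu)
    se_b = math.sqrt(phi * i00 / det)  # standard error inflated by the dispersion
    return a, b, phi, se_b

def percent_change(b):
    """Percent change in expected visits per one-unit increase of the covariate."""
    return (math.exp(b) - 1.0) * 100.0

# Synthetic toy data (NOT study data): views in hundreds, visits rising 2.9% per 100 views.
views_hundreds = [float(v) for v in range(1, 11)]
visits = [50.0 * 1.029 ** v for v in views_hundreds]

for lag in (-1, 0, 1):
    x, y = align_with_lag(views_hundreds, visits, lag)
    a, b, phi, se_b = fit_quasi_poisson(x, y)
    print(f"lag {lag:+d}: {percent_change(b):.2f}% per 100 views, dispersion {phi:.3g}")
```

Expressing views in hundreds makes the coefficient directly comparable with the "per 100 Wikipedia views" figures reported in Results. In practice a statistical package with a full design matrix for the adjustment variables would be used instead of this two-parameter fit.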