Abstract
Objective
To develop a detailed data validation strategy for facilities sending emergency department data to the Massachusetts Syndromic Surveillance program, and to evaluate the strategy by comparing data quality metrics before and after its implementation.

Introduction
As a participant in the National Syndromic Surveillance Program (NSSP), the Massachusetts Department of Public Health (MDPH) has worked closely with our statewide Health Information Exchange (HIE) and NSSP technical staff to collect and transmit emergency department (ED) data from eligible hospitals (EHs) to the NSSP. Our goal is to ensure complete and accurate data using a multi-step process that begins with pre-production data and continues after EHs are sending live data to production.

Methods
We used an iterative process to establish a framework for monitoring data quality during onboarding of EHs into our syndromic surveillance system, and kept notes of the process.
To evaluate the framework, we compared data received during January 2016 to the most recent full month of data (June 2016) on the following primary data quality metrics and their change over time: total and daily average message and visit volume; percentage of visits with a chief complaint or diagnosis code received in the NSSP dataset; and percentage of visits with a chief complaint or diagnosis code received within a specified time of admission to the ED.

Results
The validation strategies we found effective included examination of pre-production test HL7 messages and execution of R scripts to validate live data in the staging and production environments. Both the staging and production validations are performed at the individual message level as well as the aggregated visit level, and include measures of completeness for required fields (chief complaint, diagnosis codes, discharge disposition), timeliness, examples of text fields (chief complaint and triage notes), and demographic information. We required EHs to pass validation in the staging environment before granting access to send data to the production environment.
From January to June 2016, the number of EHs sending data to the production environment increased from 44 to 48, and the number of messages and visits captured in the production environment increased substantially (see Table 1). The percentage of visits with a chief complaint remained consistently high (>99%); however, the percentage of visits with a chief complaint within three hours of admission decreased during the study period. Both the overall percentage of visits with a diagnosis code and the percentage of visits with a diagnosis code within 24 hours of admission increased.
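The completeness and timeliness metrics above can be computed from a visit-level extract. The following is a minimal sketch, not the actual MDPH validation script; the data frame and its column names (admit_time, first_cc_time, first_dx_time, chief_complaint, dx_code) are illustrative assumptions.

```r
library(dplyr)

visit_quality <- function(visits) {
  visits %>%
    mutate(
      hrs_to_cc = as.numeric(difftime(first_cc_time, admit_time, units = "hours")),
      hrs_to_dx = as.numeric(difftime(first_dx_time, admit_time, units = "hours"))
    ) %>%
    summarise(
      n_visits   = n(),
      # Completeness: share of all visits with a non-missing field.
      pct_cc     = 100 * mean(!is.na(chief_complaint) & chief_complaint != ""),
      pct_dx     = 100 * mean(!is.na(dx_code) & dx_code != ""),
      # Timeliness: chief complaint within 3 hours and diagnosis code within
      # 24 hours of ED admission; visits missing the timestamp count as untimely.
      pct_cc_3h  = 100 * mean(!is.na(hrs_to_cc) & hrs_to_cc <= 3),
      pct_dx_24h = 100 * mean(!is.na(hrs_to_dx) & hrs_to_dx <= 24)
    )
}

# Example: compare the two study months (all_visits and its admit_time
# column, assumed POSIXct, are hypothetical).
# jan <- visit_quality(subset(all_visits, format(admit_time, "%Y-%m") == "2016-01"))
# jun <- visit_quality(subset(all_visits, format(admit_time, "%Y-%m") == "2016-06"))
```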
Conclusions
From January to June 2016, Massachusetts syndromic surveillance data improved in the percentage of visits with diagnosis codes and in the time from admission to first diagnosis code. This was achieved while the volume of data coming into the system increased. The timeliness of chief complaints decreased slightly during the study period, which may be due to the inclusion of several new facilities that are unable to send real-time data. Even with the improvements in the timeliness of the diagnosis code field, and the concurrent decrease in the timeliness of the chief complaint field, chief complaints remained the more timely option for syndromic surveillance. Pre-production and ongoing data quality assurance activities are crucial to ensure that meaningful data are acquired for secondary analyses.
We found that reviewing test HL7 messages and staging data, daily monitoring of production data for key factors such as message volume and percent of visits with a diagnosis code, and monthly full validation in the production environment were, and will continue to be, essential to ensuring ongoing data integrity (a simplified sketch of such a daily check appears after Table 1).

Table 1: ED Data in the Production Environment
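As a sketch of the daily production monitoring described in the Conclusions, the check below flags facilities whose daily message volume drops sharply against their trailing seven-day average and reports the share of the day's visits with a diagnosis code. It assumes a message-level extract with hypothetical columns facility_id, message_dt, visit_id, and dx_code; thresholds are illustrative, not MDPH's actual alerting rules.

```r
library(dplyr)

daily_check <- function(messages, today = Sys.Date()) {
  daily <- messages %>%
    mutate(day = as.Date(message_dt)) %>%
    group_by(facility_id, day) %>%
    summarise(
      n_messages = n(),
      # Share of the day's visits with at least one diagnosis code.
      pct_dx = 100 * n_distinct(visit_id[!is.na(dx_code) & dx_code != ""]) /
               n_distinct(visit_id),
      .groups = "drop"
    )

  # Trailing 7-day average message volume per facility.
  baseline <- daily %>%
    filter(day >= today - 7, day < today) %>%
    group_by(facility_id) %>%
    summarise(avg_messages = mean(n_messages), .groups = "drop")

  # Flag facilities sending less than half their recent average volume.
  daily %>%
    filter(day == today) %>%
    left_join(baseline, by = "facility_id") %>%
    mutate(volume_alert = n_messages < 0.5 * avg_messages)
}
```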