Background: Travel history can help differentiate a public health emergency from a travel-related infection by providing information on exposure, but such information is often available only in unstructured clinical documents. We explored the feasibility of automatically extracting travel history mentions from the electronic health record.
Methods: In collaboration with the National Biosurveillance Integration Center (NBIC), we extracted clinical notes from patient encounters that included Zika, dengue, or chikungunya virus testing in the Department of Veterans Affairs (VA; a large healthcare system providing care in its facilities from Puerto Rico to the Philippines) between January 1, 2015 and February 28, 2016. From a corpus of 250,133 notes, we used an automated bootstrapping process, based on travel-related phrases and location names, to identify documents containing potentially relevant information and gathered 4,584 unique snippets. After establishing an annotation guideline, we manually annotated the snippets for travel affirmation and locations visited (see Figure 1). Using machine learning methods, including a neural language model, we trained a Conditional Random Field (CRF) model on the annotated snippets to extract affirmed travel locations outside of the continental US. We did not extract the time of travel.
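The snippet-gathering step can be illustrated with a minimal sketch. It assumes a simple keyword-window approach; the trigger phrases, seed locations, window size, and the function name `extract_snippets` are all hypothetical, since the abstract does not specify how the bootstrapping process was implemented.

```python
import re

# Hypothetical trigger phrases and seed locations (not taken from the study).
TRIGGERS = ["travel", "traveled", "returned from", "trip to", "visited"]
SEED_LOCATIONS = ["Puerto Rico", "Philippines", "Mexico"]

# Longest terms first so that "traveled" is tried before "travel".
PATTERN = re.compile(
    "|".join(
        re.escape(term)
        for term in sorted(TRIGGERS + SEED_LOCATIONS, key=len, reverse=True)
    ),
    re.IGNORECASE,
)


def extract_snippets(note, window=40):
    """Return unique text windows around each trigger or seed-location match."""
    seen = set()
    snippets = []
    for match in PATTERN.finditer(note):
        start = max(0, match.start() - window)
        end = min(len(note), match.end() + window)
        # Collapse runs of whitespace so identical windows compare equal.
        snippet = " ".join(note[start:end].split())
        key = snippet.lower()
        if key not in seen:
            seen.add(key)
            snippets.append(snippet)
    return snippets
```

In a bootstrapping setup, location strings found in annotated snippets could be added to the seed list and the search repeated, which is one plausible way new place names enter the vocabulary.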
Results: Of the annotated snippets, 2,659 (58%) contained an affirmed mention of travel history, whereas 347 (7.6%) were negated. An inter-rater reliability (IRR) analysis showed 89% agreement, with an associated kappa coefficient of 0.65. Analysis of the annotated snippets identified 551 unique location strings (see Figure 2). On a held-out test set of 459 snippets (10%), the machine learning model achieved a Positive Predictive Value of 85.6% and a Sensitivity of 76.7%. The algorithm now runs daily and is being evaluated for prospective use (see Figure 3).
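For readers less familiar with these metrics, the following sketch shows how they are conventionally computed. It assumes the reported kappa is Cohen's kappa for two annotators, which the abstract does not state; the function names are illustrative and are not taken from the study's code.

```python
def ppv(tp, fp):
    """Positive Predictive Value (precision): TP / (TP + FP)."""
    return tp / (tp + fp)


def sensitivity(tp, fn):
    """Sensitivity (recall): TP / (TP + FN)."""
    return tp / (tp + fn)


def f1(precision, recall):
    """Harmonic mean of PPV and sensitivity."""
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Undefined (division by zero) when expected chance agreement is 1.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)
```

Plugging the reported 85.6% and 76.7% into `f1` gives roughly 0.81, a figure derived here rather than one reported by the authors. Kappa falls below the raw 89% agreement because it discounts the agreement expected by chance.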
Conclusion: Targeted travel history extraction is feasible in a large medical system with acceptable accuracy. Our approach extracted novel places that would not necessarily be found in a curated list (e.g., Mexican Riviera). Further research could improve accuracy and incorporate the extracted travel history into models that improve the early detection of autochthonous transmission.
O. V. Patterson, None
M. Jones, None