1353. HIV Risk Assessment using Longitudinal Electronic Health Records
Session: Poster Abstract Session: HIV Care Continuum
Friday, October 6, 2017
Room: Poster Hall CD

Universal HIV screening programs are costly, labor-intensive, and in practice unable to identify all individuals at risk of HIV infection. Automated risk assessment methods that leverage longitudinal electronic health records (EHRs) could catalyze targeted screening programs in emergency departments and across public health jurisdictions. While information on social and behavioral determinants of health is typically collected in unstructured fields, previous analyses have considered only structured EHR data. We sought to determine whether clinical notes can improve predictive models of HIV diagnosis.


The study cohort included 181 individuals who received care at an academic medical center in New York City prior to a confirmatory HIV diagnosis. Using propensity score matching, we selected 543 HIV-negative controls with similar utilization patterns. Demographics, laboratory tests, and diagnosis codes were extracted from longitudinal records. Clinical notes were preprocessed using both topic modeling and an n-gram approach. We fit three predictive models using Random Forests: a baseline model that included only structured EHR data, the baseline model plus topic modeling, and the baseline model plus clinical keywords.
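As an illustration of the n-gram preprocessing step, the following is a minimal sketch of how clinical notes might be turned into keyword count features for a downstream classifier. It is not the authors' pipeline: the tokenizer, the function names, the vocabulary, and the example notes are all hypothetical, and the abstract does not specify the n-gram length or any weighting scheme.

```python
import re
from collections import Counter


def tokenize(note):
    """Lowercase a note and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", note.lower())


def ngrams(tokens, n):
    """Return all n-grams in a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def keyword_features(notes, vocabulary, max_n=2):
    """Count how often each vocabulary term occurs across one patient's notes.

    Returns a list of counts in the same order as `vocabulary`.
    """
    counts = Counter()
    for note in notes:
        tokens = tokenize(note)
        for n in range(1, max_n + 1):
            counts.update(g for g in ngrams(tokens, n) if g in vocabulary)
    return [counts[term] for term in vocabulary]


# Hypothetical vocabulary (assumption: chosen for illustration only).
VOCABULARY = ["msm", "unprotected", "hiv", "methamphetamine", "unprotected sex"]

# Hypothetical notes for a single patient (not real data).
example_notes = [
    "Pt reports unprotected sex; requests HIV test.",
    "Denies methamphetamine use.",
]

features = keyword_features(example_notes, VOCABULARY)
print(dict(zip(VOCABULARY, features)))
```

Note that the second example note negates the keyword ("Denies methamphetamine use") yet still contributes a count. Plain n-gram counting ignores negation and context, which is one limitation of this kind of feature.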


Predictive models demonstrated a range of performance, with F-measures of 0.59 for the baseline model, 0.63 for the baseline plus topic modeling model, and 0.74 for the baseline plus clinical keyword model. The baseline plus topic modeling model displayed low precision but high recall, while the baseline plus clinical keyword model displayed high precision but low recall. Clinical keywords including ‘msm’, ‘unprotected’, ‘hiv’, and ‘methamphetamine’ were indicative of elevated risk.
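To make the relationship between precision, recall, and the F-measure concrete, here is a small sketch that computes all three from binary labels. It assumes the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which F-measure variant was used. The labels below are invented for illustration and are not study data.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = HIV diagnosis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented labels: four cases followed by four controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]

# A conservative classifier: few positive calls, all of them correct.
conservative = [1, 1, 0, 0, 0, 0, 0, 0]

# A liberal classifier: flags nearly everyone, so it misses no cases.
liberal = [1, 1, 1, 1, 1, 1, 1, 0]

for name, y_pred in [("conservative", conservative), ("liberal", liberal)]:
    p, r, f = precision_recall_f1(y_true, y_pred)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

The conservative classifier shows the high-precision, low-recall pattern, and the liberal one shows the reverse. Two models with opposite error profiles can therefore land on similar F-measures, so the single score is best read alongside its components.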


Clinical notes improved the performance of predictive models for automated HIV risk assessment. Future studies should explore novel techniques for extracting social and behavioral determinants from unstructured text in longitudinal EHRs.

Daniel Feller, BA (1), Jason Zucker, MD (2), Michael Yin, MD (3), Peter Gordon, MD (1) and Noemie Elhadad, PhD (1); (1) Columbia University, New York, NY, (2) Division of Infectious Diseases, Columbia University Medical Center, New York, NY, (3) Columbia University Medical Center, New York, NY


D. Feller, None

J. Zucker, None

M. Yin, None

P. Gordon, None

N. Elhadad, None


Findings in the abstracts are embargoed until 12:01 a.m. PDT, Wednesday Oct. 4th with the exception of research findings presented at the IDWeek press conferences.