1360. Identifying Septic Shock Hospitalizations Using Supervised Machine Learning Classification Algorithms with Electronic Clinical Data
Session: Poster Abstract Session: HAI: Epidemiologic Methods
Friday, October 28, 2016
Room: Poster Hall
Background: Reliably identifying sepsis cases is critical to understanding its epidemiology and the impact of prevention and treatment initiatives. However, sepsis is under-recognized and poorly documented. We evaluated the ability of multiple supervised machine learning classification algorithms to retrospectively identify septic shock hospitalizations using clinical data easily abstracted from electronic health records.

Methods: We trained a series of classification algorithms using two cohorts previously reviewed by clinicians for the presence of septic shock. The first cohort (Cohort 1) included 700 randomly selected hospitalizations at Massachusetts General Hospital and Brigham and Women’s Hospital with at least 1 blood culture, and the second cohort (Cohort 2) included 267 patients at Georgetown University Hospital with at least 1 vasopressor. Variables included daily antibiotic and vasopressor prescriptions, blood culture orders, hospital stay characteristics, age, and ICD-9 diagnosis and procedure codes. Using chart review classifications for septic shock as the gold standard, we compared sensitivity and positive predictive values for the cross-validated classification algorithms against ICD-9 codes for septic shock (785.52).
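The comparison described above comes down to two quantities computed against the chart-review gold standard: sensitivity, TP / (TP + FN), and positive predictive value, TP / (TP + FP). A minimal Python sketch follows. The function name and the ten-hospitalization toy data are hypothetical and are not taken from the study cohorts.

```python
def sensitivity_and_ppv(gold, predicted):
    """Return (sensitivity, ppv) for two equal-length sequences of booleans.

    gold      -- chart-review classification (True = septic shock)
    predicted -- flag from the method under evaluation (ICD-9 code or classifier)
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Hypothetical toy data: 10 hospitalizations, 4 confirmed by chart review.
chart_review = [True, True, True, True, False, False, False, False, False, False]
icd9_flag = [True, False, False, False, True, False, False, False, False, False]
model_flag = [True, True, True, False, True, False, False, False, False, False]

icd9_result = sensitivity_and_ppv(chart_review, icd9_flag)
model_result = sensitivity_and_ppv(chart_review, model_flag)
```

In this toy example the code flag finds 1 of 4 true cases with 1 false positive, while the model flag finds 3 of 4 with the same single false positive.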

Results: There were 54 clinician-confirmed septic shock cases in Cohort 1 and 93 cases in Cohort 2. Random forest algorithms were the best performing classifiers in both cohorts. In Cohort 1, when the random forest algorithm was tuned to a PPV comparable with ICD-9 coding (75.9% vs 76.2%, p=0.98), it demonstrated superior sensitivity (75.9% vs 29.6%, p<0.01). Similarly in Cohort 2, the random forest algorithm tuned to a slightly lower PPV than coding (82.3% vs 93.2%, p=0.05) also demonstrated superior sensitivity (84.9% vs 59.1%, p<0.01).
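The abstract does not say how the classifier was "tuned" to a given PPV. One common approach, sketched here as an assumption, is to sweep the decision threshold over the cross-validated predicted probabilities and keep the most sensitive cutoff whose PPV still meets the target. The function name, scores, and labels below are hypothetical.

```python
def tune_threshold_to_ppv(scores, gold, target_ppv):
    """Return (threshold, sensitivity, ppv) for the most sensitive cutoff
    whose PPV is at least target_ppv, or None if no cutoff qualifies.

    scores -- predicted probabilities (assumed to be out-of-fold predictions)
    gold   -- chart-review labels (True = septic shock)
    """
    total_pos = sum(1 for g in gold if g)
    best = None
    # Each distinct score is a candidate cutoff; a case is flagged when
    # its score is at or above the cutoff.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, g in zip(scores, gold) if s >= threshold and g)
        fp = sum(1 for s, g in zip(scores, gold) if s >= threshold and not g)
        ppv = tp / (tp + fp)  # at least one case is always flagged
        sens = tp / total_pos if total_pos else float("nan")
        if ppv >= target_ppv and (best is None or sens > best[1]):
            best = (threshold, sens, ppv)
    return best


# Hypothetical toy data: 8 hospitalizations, 4 confirmed by chart review.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
gold = [True, True, False, True, True, False, False, False]

matched = tune_threshold_to_ppv(scores, gold, target_ppv=0.75)
strict = tune_threshold_to_ppv(scores, gold, target_ppv=0.90)
```

Raising the target PPV pushes the cutoff higher and costs sensitivity, which is the trade-off visible in the two cohorts above.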

Conclusion: Supervised machine learning classification algorithms trained on clinical data including vasopressors, blood cultures, antibiotics, and ICD-9 codes can accurately identify septic shock hospitalizations and are more sensitive than diagnosis codes alone. Other conditions whose prevalence estimates are similarly confounded by identification difficulties may be good candidates for classification techniques based on clinical data.

John T. Menchaca, BA1, Sameer Kadri, MD2, Jeffrey Strich, MD3, Megan Morales, MD4, Samuel Hohmann, PhD5,6, Robert L. Danner, MD2, Michael Klompas, MD, MPH, FRCPC, FIDSA1 and Chanu Rhee, MD, MPH1, (1)Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, (2)Critical Care Medicine, National Institutes of Health, Bethesda, MD, (3)Department of Internal Medicine, Georgetown University Hospital, Washington, DC, (4)Division of Infectious Diseases, Georgetown University Hospital, Washington, DC, (5)Center for Advanced Analytics, Vizient, Chicago, IL, (6)Department of Health Systems Management, Rush University, Chicago, IL


J. T. Menchaca, None

S. Kadri, None

J. Strich, None

M. Morales, None

S. Hohmann, None

R. L. Danner, None

M. Klompas, None

C. Rhee, None

Findings in the abstracts are embargoed until 12:01 a.m. CDT, Wednesday Oct. 26th with the exception of research findings presented at the IDWeek press conferences.