A small proportion of patients account for a high proportion of healthcare use. Accurate preemptive identification may facilitate tailored intervention. We sought to determine whether machine learning techniques using text from a family practice electronic medical record can be used to predict future high emergency department use and total costs by patients who are not yet high emergency department users or high cost to the healthcare system.
Methods
Text from fields of the cumulative patient profile within an electronic medical record of 43,111 patients was indexed. Separate training and validation cohorts were created. After processing, 11,905 words were used to fit a logistic regression model. The primary outcomes of interest in the 12 months after prediction were 3 or more emergency department visits and being in the top 5% in healthcare expenditures. Outcomes were assessed through linkage to administrative databases housed at the Institute for Clinical Evaluative Sciences.
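The modeling step described above (indexing words from free text, then fitting a logistic regression on them) can be illustrated with a minimal sketch. It is not the authors' implementation: the binary bag-of-words encoding, the gradient-descent settings, the L2 penalty, the example notes, and every function name below are assumptions chosen for illustration, and the toy vocabulary is far smaller than the 11,905 words used in the study.

```python
import math
import re


def tokenize(text):
    """Lowercase a free-text field and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Map each distinct word in the training text to a column index."""
    words = sorted({w for doc in documents for w in tokenize(doc)})
    return {w: i for i, w in enumerate(words)}


def vectorize(text, vocab):
    """Binary bag-of-words: 1.0 if the word occurs in the text, else 0.0."""
    vec = [0.0] * len(vocab)
    for w in tokenize(text):
        if w in vocab:
            vec[vocab[w]] = 1.0
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500, l2=0.01):
    """Fit weights and bias by batch gradient descent (assumed settings)."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Average gradient plus an L2 penalty on the weights (not the bias).
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(text, vocab, w, b):
    """Predicted probability of the outcome for one patient's text."""
    x = vectorize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical profile snippets; 1 = outcome occurred in the following 12 months.
train_notes = [
    "copd chf frequent falls",
    "diabetes chf dialysis",
    "seasonal allergies",
    "healthy annual physical",
    "copd dialysis home oxygen",
    "mild acne",
]
train_labels = [1, 1, 0, 0, 1, 0]

vocab = build_vocabulary(train_notes)
X = [vectorize(t, vocab) for t in train_notes]
w, b = fit_logistic(X, train_labels)
print(predict_proba("chf dialysis", vocab, w, b))
print(predict_proba("annual physical allergies", vocab, w, b))
```

In this sketch, each word receives one coefficient, so a profile containing words seen mainly among patients with the outcome is scored higher than one containing words seen mainly among patients without it.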
Results
In the model to predict frequent emergency department visits, after excluding patients who were high emergency department users in the previous year, the area under the receiver operating characteristic curve was 0.71. Using the same methodology, the model to predict the top 5% in total system costs had an area under the receiver operating characteristic curve of 0.76.
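For readers less familiar with the metric, the area under the receiver operating characteristic curve equals the probability that a randomly chosen patient who had the outcome is scored higher than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes it that way and applies the exclusion of prior high users before scoring. The record layout and function names are hypothetical, and the numbers are invented toy values, not study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def auc_excluding_prior_high_users(records):
    """Compute AUC only over patients who were not high users last year.

    Each record is a hypothetical tuple:
    (was_high_user_last_year, had_outcome_next_year, predicted_score).
    """
    eligible = [(y, s) for prior, y, s in records if not prior]
    labels = [y for y, _ in eligible]
    scores = [s for _, s in eligible]
    return roc_auc(labels, scores)


# Toy records: the first patient is dropped because of prior high use.
toy_records = [
    (True, 1, 0.95),
    (False, 1, 0.90),
    (False, 1, 0.40),
    (False, 0, 0.50),
    (False, 0, 0.10),
]
print(auc_excluding_prior_high_users(toy_records))
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which gives a scale for reading the 0.71 and 0.76 reported above.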
Conclusions
Machine learning techniques can be applied to analyze free text contained in electronic medical records. This dataset is more predictive of patients who will generate high future costs than of patients who will make frequent future emergency department visits. It remains to be seen whether these predictions can be used to reduce costs through early intervention in this cohort of patients.
-David W. Frost, MD, Shankar Vembu, PhD, Jiayi Wang, BSc, Karen Tu, MD, Quaid Morris, PhD, Howard B. Abrams, MD
This article originally appeared in the May 2017 issue of The American Journal of Medicine.