Research Questions

  1. What are the advantages of using NLP to analyze patient narratives?
  2. What are the potential pitfalls?
  3. Which elements of ML-based NLP are most likely to affect the performance of a system for analyzing patient narratives?

Patient narratives about experiences with health care contain a wealth of information about what is important to patients. These narratives are valuable for both identifying strengths and weaknesses in health care and developing strategies for improvement. However, rigorous qualitative analysis of the extensive data contained in these narratives is a resource-intensive process, and one that can exceed the capabilities of human analysts. One potential solution to these challenges is natural language processing (NLP), which uses computer algorithms to extract structured meaning from unstructured natural language. Because NLP is a relatively new undertaking in the field of health care, the authors set out to demonstrate its feasibility for organizing and classifying these data in a way that can generate actionable information.

In doing so, the authors focused on two steps that a machine learning (ML) system must perform to classify narratives using codes like those typically applied by human coders (e.g., positive or negative statements regarding care coordination). These steps are (1) representing the text data numerically (in this case, entire narratives as provided by patients) and (2) classifying the data by code based on that representation. The authors also compared four related approaches to deploying ML algorithms, identified potential pitfalls in data processing, and showed how NLP can supplement and support human coding.
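The two steps described above can be illustrated with a minimal sketch. It is not the authors' method: it assumes a simple bag-of-words count vector for the numerical representation and a nearest-centroid classifier with cosine similarity for the coding step. The example narratives, the labels, and all function names are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase, split on whitespace, strip surrounding punctuation.
    tokens = (w.strip(".,!?;:").lower() for w in text.split())
    return [t for t in tokens if t]

def build_vocab(narratives):
    # Map each distinct token in the training narratives to a column index.
    tokens = sorted({tok for text in narratives for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(text, vocab):
    # Step 1: represent a whole narrative as a vector of word counts.
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:  # words unseen in training are ignored
            vec[vocab[tok]] += 1
    return vec

def train_centroids(vectors, labels):
    # Average the training vectors that share a code (label).
    counts = Counter(labels)
    sums = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(text, vocab, centroids):
    # Step 2: assign the code whose centroid is most similar to the narrative.
    vec = vectorize(text, vocab)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))

# Hypothetical, hand-written training narratives (not data from the report).
training = [
    ("The nurse explained my medications clearly and my doctors shared my records.", "positive"),
    ("My specialists talked to each other and the staff followed up quickly.", "positive"),
    ("Nobody told my doctor about the test results and I waited for weeks.", "negative"),
    ("The office lost my referral and nobody called me back.", "negative"),
]
texts = [t for t, _ in training]
labels = [lab for _, lab in training]
vocab = build_vocab(texts)
centroids = train_centroids([vectorize(t, vocab) for t in texts], labels)

print(classify("Nobody called me about my referral", vocab, centroids))
print(classify("The staff explained my medications clearly", vocab, centroids))
```

A model trained on four sentences says nothing about real-world performance; the sketch only shows how the representation step and the classification step fit together. In practice, the choice of representation and of classifier are separate decisions, and each can be swapped out independently.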

Key Findings

  • The success of the fairly simple models described in this pilot study supports the promise of these approaches for analyzing patient narratives at larger scale.
  • There is labor-saving potential in leveraging the strengths of both machine and human coders, potentially in creative ways.
  • Coding performance was significant even with relatively off-the-shelf computing equipment and routines and would likely improve with even modest computing investments.
  • Perhaps the most obvious opportunity for additional investment is increasing the size of the data set on which to train the models, which the authors expect would improve performance.
  • Efficiency may be gained by contracting model building to specialized companies.
  • Broad stakeholder discussions could help coordinate use of NLP for patient narratives.

Table of Contents

  • Chapter One

    Introduction

  • Chapter Two

    Background on Machine Learning Approaches to Natural Language Processing

  • Chapter Three

    Data Source and Methods

  • Chapter Four

    Performance of Demonstration Models

  • Chapter Five

    Discussion

  • Appendix

    Additional Results

Research conducted by

The research described in this report was prepared for the Agency for Healthcare Research and Quality (AHRQ) and conducted by RAND Health Care.

This report is part of the RAND Corporation research report series. RAND reports present research findings and objective analysis that address the challenges facing the public and private sectors. All RAND reports undergo rigorous peer review to ensure high standards for research quality and objectivity.

Permission is given to duplicate this electronic document for personal use only, as long as it is unaltered and complete. Copies may not be duplicated for commercial purposes. Unauthorized posting of RAND PDFs to a non-RAND Web site is prohibited. RAND PDFs are protected under copyright law. For information on reprint and linking permissions, please visit the RAND Permissions page.

The RAND Corporation is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.