This reference document presents a collection of lessons learned by practitioners from RAND Corporation projects that employed natural language processing (NLP) tools and methods. NLP is an umbrella term for the range of tools and methods that enable computers to analyze human language. The descriptions of lessons learned are organized around four steps: data collection, data processing (i.e., NLP-specific text processing in preparation for modeling), modeling, and application development and deployment.
These NLP practitioners spend, or spent, most of their time at RAND working on projects related to national defense, national intelligence, international security, or homeland security; thus, the lessons learned are drawn largely from projects in these areas. Although few of the lessons apply exclusively to the U.S. Department of Defense and its NLP tasks, many may prove particularly salient for the department, because its terminology is highly domain-specific and full of jargon, much of its data are classified or sensitive, its computing environment is more restricted than is typical elsewhere, and its information systems are generally not designed to support large-scale analysis.