- Can machine learning (ML) techniques detect deception in the language used by applicants during security clearance investigations?
Security clearance investigations are onerous for applicants and investigators alike, and they are expensive for the U.S. government. In this report, the authors present results from an exploratory analysis that tests automated tools for detecting when applicants attempt to deceive the government during the interview portion of the process. How interviewees answer interview questions could be a useful signal for detecting when they are trying to be deceptive.
- Models that used word counts were the most accurate at distinguishing participants who were trying to be deceptive from those who were truthful.
- The authors found similar accuracy rates for detecting deception whether interviews were conducted over video teleconferencing or text-based chat.
- ML transcription is generally accurate, but errors do occur, and ML methods often miss subtle features of informal speech.
- Although models that used word counts produced the highest accuracy rates across all participants, there was evidence that these models were more accurate for men than for women.
- The federal government should test ML modeling of interview data that uses word counts to identify attempts at deception.
- The federal government should test alternatives to the in-person security clearance interview method—including video teleconferencing and chat-based modes—for certain cases.
- The federal government should test the use of asynchronous interviews via text-based chat to augment existing interview techniques. The data from an in-person (or in-person virtual) interview and chat could help investigators identify topics of concern that merit further investigation.
- The federal government should use ML tools to augment existing investigation processes by conducting additional analysis on pilot data, but it should not replace existing techniques with these tools until they are sufficiently validated.
- The federal government should validate any ML models that it uses for security clearance investigations to limit the bias in accuracy rates on the basis of the ascribed characteristics (e.g., race, gender, age) of interviewees.
- The federal government should have a human in the loop to continuously calibrate any ML models used to detect deception during the security clearance investigation process.
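The word-count approach named in the findings above can be illustrated with a minimal sketch. This is not the authors' model: the tokenizer, the scoring rule, and the example weights below are all hypothetical, chosen only to show how word frequencies in an interview transcript might feed a deception score.

```python
from collections import Counter
import re


def word_count_features(transcript: str) -> Counter:
    """Lowercase the transcript, tokenize on letters/apostrophes, and count tokens."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return Counter(tokens)


def deception_score(features: Counter, weights: dict) -> float:
    """Weighted sum of word counts; a higher score suggests deception.

    In practice these weights would be learned from labeled interview data;
    the values used below are invented for illustration.
    """
    return sum(weights.get(word, 0.0) * count for word, count in features.items())


# Hypothetical learned weights, for illustration only.
example_weights = {"honestly": 0.8, "never": 0.5, "i": -0.1}

features = word_count_features("Honestly, I never did that. Honestly.")
score = deception_score(features, example_weights)
```

A real system would also need a decision threshold calibrated on validation data, and, per the recommendations above, audits of accuracy across demographic groups before operational use.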
Table of Contents
Relevant Background Literature
Description of Data
Results from Analysis of Interview Data
Potential Sources of Bias
Limitations, Conclusions, and Recommendations
Modified Cognitive Interviewing
Example Output from Amazon Web Services Transcribe
Proof of Concept: Deep Learning Contradiction Model
This research was sponsored by the Performance Accountability Council's Program Management Office and conducted within the Forces and Resources Policy Center of the RAND National Security Research Division (NSRD).
This report is part of the RAND Corporation Research report series. RAND reports present research findings and objective analysis that address the challenges facing the public and private sectors. All RAND reports undergo rigorous peer review to ensure high standards for research quality and objectivity.
The RAND Corporation is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.