
Safe Use of Machine Learning for Air Force Human Resource Management

Volume 4, Evaluation Framework and Use Cases

Published Feb 29, 2024

by Joshua Snoke, Matthew Walsh, Joshua Williams, David Schulker

Download

Full Document: PDF, 4.1 MB
Research Summary: PDF, 0.1 MB

Purchase Print Copy

Paperback, 78 pages: $32.00

Research Questions

  1. How can HR managers choose the AI system design that best satisfies objectives while also meeting safety criteria?
  2. How should safety be defined for these systems? Can all safety criteria be met simultaneously?
  3. What kind of decision support should a system provide to, for example, a selection board? What are the strengths and weaknesses of different system designs?
  4. Are there strategies for evaluating the performance of ML models, human raters, and joint human-machine teams?

Private-sector companies are applying artificial intelligence (AI) and machine learning (ML) to diverse business functions, including human resource management (HRM), to great effect. The Department of the Air Force (DAF) is poised to adopt new analytic methods, including ML, to transform key aspects of HRM. Yet ML systems, as compared with other information technologies, present distinct safety concerns when applied to HRM because they do not use well-understood, preprogrammed rules set by human resources experts to achieve objectives. The DAF cannot confidently move forward with valuable AI and ML systems in the HRM domain without an analytic framework to evaluate and augment the safety of these systems.

To understand the attributes needed to apply ML to HRM in a responsible and ethical manner, the authors reviewed relevant bodies of literature, policy, and DAF documents. From that review, they developed an analytic framework centered on measuring and augmenting three attributes of ML systems: accuracy, fairness, and explainability. In this report, safety is defined by these three qualities. The authors then applied a case-study approach, developing ML systems and exercising the framework on the examples of officer promotion and developmental education boards.

Key Findings

  • For any given HRM process, AI systems can provide different types of decision support via many possible implementation designs. The choice of implementation design affects both the effectiveness of the system and its level of safety. The framework that RAND researchers generated helps HR managers choose the AI system design that best satisfies objectives while also meeting safety criteria.
  • The three safety principles of accuracy, fairness, and explainability can be, though are not always, in competition with one another. For example, limiting how model outputs are used may increase fairness, and limiting model complexity may increase explainability; however, placing such limits on how a model functions could reduce accuracy (see the first sketch following this list).
  • The case study of selection board processes shows that AI systems can provide many different types of decision support. The possible options present different opportunities to reap business value, but they also carry different risks.
  • Multiple strategies are available to evaluate the performance of ML models, human raters, and joint human-machine teams (see the second sketch following this list).
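
The tension between accuracy and fairness noted above can be made concrete with a small numerical example. The sketch below is illustrative only and is not drawn from the report: it fits a simple promotion model to synthetic data with scikit-learn and compares overall accuracy with a demographic-parity gap (the difference in selection rates between two groups), before and after withholding the group attribute. The data, variable names, and choice of metrics are assumptions made for illustration.

```python
# Minimal sketch (not from the report) of the accuracy-fairness tension,
# using hypothetical, synthetic promotion-board data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)                       # hypothetical demographic indicator
performance = rng.normal(0, 1, n) + 0.3 * group     # feature correlated with group
promoted = (performance + rng.normal(0, 1, n) > 0.5).astype(int)

X = np.column_stack([performance, group])
model = LogisticRegression().fit(X, promoted)
pred = model.predict(X)

# Accuracy: how often the model's recommendation matches the historical outcome.
print("accuracy:", accuracy_score(promoted, pred))

# One common fairness check, demographic parity: compare selection rates by group.
gap = abs(pred[group == 0].mean() - pred[group == 1].mean())
print("selection-rate gap:", gap)

# A simple mitigation is to drop the group feature ("fairness through unawareness");
# this may narrow the gap but can also lower accuracy, illustrating the trade-off.
model_blind = LogisticRegression().fit(X[:, [0]], promoted)
pred_blind = model_blind.predict(X[:, [0]])
print("blind accuracy:", accuracy_score(promoted, pred_blind))
print("blind gap:", abs(pred_blind[group == 0].mean() - pred_blind[group == 1].mean()))
```

In practice, the DAF would substitute its own personnel data, protected attributes, and whatever fairness definitions it adopts under the framework.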
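
Likewise, one simple strategy for evaluating ML models, human raters, and joint human-machine teams is to score each against a common reference decision using a chance-corrected agreement statistic. The second sketch below is a hedged illustration on synthetic data; the human-machine blending rule, the cutoffs, and the use of Cohen's kappa are assumptions rather than the report's methodology.

```python
# Minimal sketch (not the report's methodology): compare an ML model, a human
# rater, and a blended human-machine score against a reference board outcome.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(1)
n = 500
true_quality = rng.normal(0, 1, n)                   # latent record quality
board_outcome = (true_quality > 0.5).astype(int)     # reference selection decision

human_score = true_quality + rng.normal(0, 0.8, n)   # noisy human rating
model_score = true_quality + rng.normal(0, 0.6, n)   # noisy model score
team_score = 0.5 * human_score + 0.5 * model_score   # simple human-machine blend

for name, score in [("human", human_score), ("model", model_score), ("team", team_score)]:
    # Convert each score to a select/non-select call at the same selection rate,
    # then measure chance-corrected agreement with the reference outcome.
    cutoff = np.quantile(score, 1 - board_outcome.mean())
    decision = (score >= cutoff).astype(int)
    print(name, "kappa vs. board outcome:", round(cohen_kappa_score(board_outcome, decision), 3))
```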

Recommendations

  • Before implementing an ML system, the DAF should specify the HRM objectives motivating the application.
  • The DAF should define acceptable limits for accuracy, fairness, and explainability and clarify the importance of each.
  • The DAF should follow an implementation strategy that involves applying ML to limited cases before gradually expanding the scope and consequence of applications.
  • The DAF should use an iterative framework to select, design, and evaluate ML systems.
  • The DAF should invest in means of generating automated summaries of narrative text contained in performance evaluations (an illustrative sketch follows this list).
  • The DAF should adopt a layered test and evaluation strategy.
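
To make the summarization recommendation concrete, the sketch below shows one way such summaries could be produced with an off-the-shelf model from the Hugging Face transformers library. The example narrative and the default model are illustrative assumptions, not the DAF's system or the report's prescription.

```python
# Minimal sketch (an assumption, not the report's approach) of automatically
# summarizing the narrative portion of a performance evaluation.
from transformers import pipeline

summarizer = pipeline("summarization")  # downloads a default pretrained model

narrative = (
    "Captain Doe led a 12-person maintenance team through a no-notice inspection, "
    "resolving 40 discrepancies ahead of schedule and mentoring three junior airmen "
    "who earned early promotion recommendations."
)  # hypothetical evaluation text

summary = summarizer(narrative, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```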

Research conducted by

This research was commissioned by the Director of Plans and Integration, Deputy Chief of Staff for Manpower and Personnel, Headquarters U.S. Air Force (AF/A1X) and conducted within the Workforce, Development, and Health Program of RAND Project AIR FORCE.

This report is part of the RAND research report series. RAND reports present research findings and objective analysis that address the challenges facing the public and private sectors. All RAND reports undergo rigorous peer review to ensure high standards for research quality and objectivity.


RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.