A Fairness Evaluation of Automated Methods for Scoring Text Evidence Usage in Writing

Published in: Lecture Notes in Computer Science, Volume 12748, pages 255–267 (2021). doi: 10.1007/978-3-030-78292-4_21

Posted on RAND.org on January 12, 2022

by Diane Litman, Haoran Zhang, Richard Correnti, Lindsay Clare Matsumura, Elaine Lin Wang

This article was published outside of RAND. The full text of the article can be found at the link above.

Automated Essay Scoring (AES) can reliably grade essays at scale and reduce human effort in both classroom and commercial settings. There are currently three dominant supervised learning paradigms for building AES models: feature-based, neural, and hybrid. While feature-based models are more explainable, neural network models often outperform feature-based models in terms of prediction accuracy. To create models that are accurate and explainable, hybrid approaches combining neural network and feature-based models are of increasing interest. We compare these three types of AES models with respect to a different evaluation dimension, namely algorithmic fairness. We apply three definitions of AES fairness to an essay corpus scored by different types of AES systems with respect to upper elementary students' use of text evidence. Our results indicate that different AES models exhibit different types of biases, spanning students' gender, race, and socioeconomic status. We conclude with a step towards mitigating AES bias once detected.
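The abstract does not spell out the three fairness definitions it applies, so the sketch below is only an illustration of the general idea: compare machine scores against human scores separately within each demographic group (e.g., gender, race, or socioeconomic status) and check whether agreement or score differences vary across groups. The function name, column layout, and the choice of quadratic weighted kappa and mean signed difference as summary statistics are assumptions made for illustration, not the paper's exact formulation.

```python
# Illustrative sketch (not the paper's implementation): per-group fairness
# checks for an automated essay scoring (AES) model, comparing machine
# scores against human scores within each demographic group.
#
# Assumed inputs: parallel lists of human scores, machine scores, and a
# group label (e.g., gender, race, or socioeconomic status) per essay.

from collections import defaultdict
from statistics import mean

from sklearn.metrics import cohen_kappa_score  # quadratic weighted kappa


def fairness_report(human, machine, groups):
    """Summarize human-machine score agreement separately for each group."""
    by_group = defaultdict(lambda: ([], []))
    for h, m, g in zip(human, machine, groups):
        by_group[g][0].append(h)
        by_group[g][1].append(m)

    report = {}
    for g, (h_scores, m_scores) in by_group.items():
        report[g] = {
            # Agreement between machine and human scores within the group.
            "qwk": cohen_kappa_score(h_scores, m_scores, weights="quadratic"),
            # Signed difference: positive means the model over-scores the group.
            "mean_score_diff": mean(m - h for h, m in zip(h_scores, m_scores)),
            "n": len(h_scores),
        }
    return report


if __name__ == "__main__":
    # Toy example with hypothetical 1-4 evidence-use scores and two groups.
    human = [1, 2, 3, 4, 2, 3, 1, 4]
    machine = [1, 2, 3, 3, 3, 4, 1, 4]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    for g, stats in fairness_report(human, machine, groups).items():
        print(g, stats)
```

Under this kind of analysis, a noticeably lower agreement or a larger signed score difference for one group than another would flag the model for potential bias against that group; the specific thresholds and statistical tests used to make that call would follow the fairness definitions chosen in the study.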

This report is part of the RAND Corporation External publication series. Many RAND studies are published in peer-reviewed scholarly journals, as chapters in commercial books, or as documents published by other organizations.

Our mission to help improve policy and decisionmaking through research and analysis is enabled through our core values of quality and objectivity and our unwavering commitment to the highest level of integrity and ethical behavior. To help ensure our research and analysis are rigorous, objective, and nonpartisan, we subject our research publications to a robust and exacting quality-assurance process; avoid both the appearance and reality of financial and other conflicts of interest through staff training, project screening, and a policy of mandatory disclosure; and pursue transparency in our research engagements through our commitment to the open publication of our research findings and recommendations, disclosure of the source of funding of published research, and policies to ensure intellectual independence. For more information, visit www.rand.org/about/principles.

The RAND Corporation is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.