A Fairness Evaluation of Automated Methods for Scoring Text Evidence Usage in Writing

Published in: Lecture Notes in Computer Science, Volume 12748, pages 255–267 (2021). doi: 10.1007/978-3-030-78292-4_21

Posted on Jan 12, 2022

by Diane Litman, Haoran Zhang, Richard Correnti, Lindsay Clare Matsumura, Elaine Lin Wang

Automated Essay Scoring (AES) can reliably grade essays at scale and reduce human effort in both classroom and commercial settings. There are currently three dominant supervised learning paradigms for building AES models: feature-based, neural, and hybrid. While feature-based models are more explainable, neural network models often outperform feature-based models in terms of prediction accuracy. To create models that are accurate and explainable, hybrid approaches combining neural network and feature-based models are of increasing interest. We compare these three types of AES models with respect to a different evaluation dimension, namely algorithmic fairness. We apply three definitions of AES fairness to an essay corpus scored by different types of AES systems with respect to upper elementary students' use of text evidence. Our results indicate that different AES models exhibit different types of biases, spanning students' gender, race, and socioeconomic status. We conclude with a step towards mitigating AES bias once detected.

This report is part of the RAND external publication series. Many RAND studies are published in peer-reviewed scholarly journals, as chapters in commercial books, or as documents published by other organizations.

RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.