Report
General-Purpose Artificial Intelligence (GPAI) Models and GPAI Models with Systemic Risk: Classification and Requirements for Providers
Aug 8, 2024
The European Union (EU)'s Artificial Intelligence (AI) Act is a landmark piece of legislation that lays out a detailed and wide-ranging framework for the comprehensive regulation of AI in the EU, covering the development, testing, and use of AI.[1] This is one of several reports intended to serve as succinct snapshots of a variety of interconnected subjects that are central to AI governance discussions in the United States, in the European Union, and globally. This report, which focuses on AI's impacts on privacy law, is not intended to provide a comprehensive analysis but rather to spark dialogue among stakeholders on specific facets of AI governance, especially as AI applications proliferate worldwide and complex governance debates persist. Although we refrain from offering definitive recommendations, we explore a set of priority options that the United States could consider in relation to different aspects of AI governance in light of the EU AI Act.
AI promises to usher in an era of rapid technological evolution that could affect virtually every aspect of society in both positive and negative manners. The beneficial features of AI require the collection, processing, and interpretation of large amounts of data—including personal and sensitive data. As a result, questions surrounding data protection and privacy rights have surfaced in the public discourse.
Privacy protection plays a pivotal role in enabling individuals to maintain control over their personal information and in helping agencies safeguard individuals' sensitive information and prevent fraudulent use of and unauthorized access to individuals' data. Nevertheless, the United States lacks a comprehensive federal statutory or regulatory framework governing data rights, privacy, and protection. Currently, the only consumer protections that exist are state-specific privacy laws and federal laws that offer limited protection in specific contexts, such as health information. The fragmented nature of a state-by-state data rights regime can make compliance unduly difficult and can stifle innovation.[2] For this reason, President Joe Biden called on Congress to pass bipartisan legislation "to better protect Americans' privacy, including from the risks posed by AI."[3]
There have been several attempts at comprehensive federal privacy legislation, including the American Privacy Rights Act (APRA), which aims to regulate the collection, transfer, and use of Americans' data in most circumstances.[4] Although some data privacy issues could be addressed in legislation, gaps in data protection would remain because of AI's unique attributes. In this report, we identify those gaps and highlight possible options to address them.
From a data privacy perspective, one of AI's most concerning aspects is the potential lack of understanding of exactly how an algorithm may use, collect, or alter data or make decisions based on those data.[5] This potential lack of understanding is referred to as algorithmic opacity, and it can result from the inherent complexity of the algorithm, the purposeful concealment of a company using trade secrets law to protect its algorithm, or the use of machine learning to build the algorithm—in which case, even the algorithm's creators may not be able to predict how it will perform.[6] Algorithmic opacity can make it difficult or impossible to see how data inputs are being transformed into data or decision outputs, limiting the ability to inspect or regulate the AI in question.[7]
There are other general privacy concerns, such as data repurposing, data spillover, and data persistence, that take on unique aspects related to AI or are further exacerbated by AI's unique characteristics.[8]
In most comprehensive data privacy proposals, the foundation is set by granting individuals fundamental rights over their data and privacy, from which the rest of the framework follows. The EU General Data Protection Regulation (GDPR) and the California Consumer Privacy Act—two notable comprehensive data privacy regimes—both begin with a general guarantee of fundamental rights protection, including rights to privacy, personal data protection, and nondiscrimination.[15] Specifically, the data protection rights include the right to know what data have been collected, the right to know what data have been shared with third parties and with whom, the right to have data deleted, and the right to have incorrect data corrected.
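To make these rights concrete, the sketch below shows one way an organization's data systems might support them programmatically. It is a minimal illustration only; the class and method names (DataRightsRegistry, access, disclosures, correct, delete) are hypothetical and are not drawn from the GDPR, the California Consumer Privacy Act, or any existing system.

```python
from dataclasses import dataclass, field


@dataclass
class SubjectRecord:
    """Hypothetical per-individual record of collected data and downstream sharing."""
    data: dict = field(default_factory=dict)          # field name -> stored value
    shared_with: dict = field(default_factory=dict)   # field name -> list of third parties


class DataRightsRegistry:
    """Minimal sketch of the data-subject rights described above (illustrative only)."""

    def __init__(self) -> None:
        self._records: dict[str, SubjectRecord] = {}

    def access(self, subject_id: str) -> dict:
        """Right to know what data have been collected."""
        return dict(self._records.get(subject_id, SubjectRecord()).data)

    def disclosures(self, subject_id: str) -> dict:
        """Right to know what data have been shared with third parties, and with whom."""
        return dict(self._records.get(subject_id, SubjectRecord()).shared_with)

    def correct(self, subject_id: str, field_name: str, value) -> None:
        """Right to have incorrect data corrected."""
        self._records.setdefault(subject_id, SubjectRecord()).data[field_name] = value

    def delete(self, subject_id: str) -> None:
        """Right to have data deleted."""
        self._records.pop(subject_id, None)
```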
Prior to the proliferation of AI, good data management systems and procedures could make compliance with these rights attainable. However, the AI privacy concerns listed above render full compliance more difficult. Specifically, algorithmic opacity makes it difficult to know how data are being used, so it becomes more difficult to know when data have been shared or whether they have been completely deleted. Data repurposing makes it difficult to know how data have been used and with whom they have been shared. Data spillover makes it difficult to know exactly what data have been collected on a particular individual. These issues, along with the plummeting cost of data storage, exacerbate data persistence, or the maintenance of data beyond their intended use or purpose.
The unique problems associated with AI give rise to several options for resolving or mitigating these issues. Here, we provide summaries of these options and examples of how they have been enacted within other regulatory regimes.
Data minimization refers to the practice of limiting the collection of personal information to that which is directly relevant and necessary to accomplish specific and narrowly identified goals.[16] This stands in contrast to the approach used by many companies today, which is to collect as much information as possible. Under the tenets of data minimization, the use of any data would be legally limited only to the use identified at the time of collection.[17] Data minimization is also a key privacy principle in reducing the risks associated with privacy breaches.
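As a concrete illustration of the principle, the short sketch below filters an incoming record down to the fields declared necessary for a stated collection purpose and rejects collection for undeclared purposes. The purpose names and field lists are hypothetical placeholders, not requirements taken from APRA, the GDPR, or the EU AI Act.

```python
# Hypothetical mapping of declared collection purposes to the minimum fields
# needed to accomplish each purpose.
NECESSARY_FIELDS = {
    "order_fulfillment": {"name", "shipping_address", "items"},
    "fraud_detection": {"payment_token", "billing_postcode"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Retain only the fields necessary for the declared purpose; drop everything else."""
    allowed = NECESSARY_FIELDS.get(purpose)
    if allowed is None:
        raise ValueError(f"Purpose {purpose!r} was not declared at collection time")
    return {key: value for key, value in record.items() if key in allowed}


# Example: the birth date and browsing history are dropped because they are not
# necessary to fulfill the order.
submitted = {
    "name": "A. Customer",
    "shipping_address": "1 Main St",
    "items": ["widget"],
    "birth_date": "1990-01-01",
    "browsing_history": ["..."],
}
stored = minimize(submitted, "order_fulfillment")
```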
Several privacy frameworks incorporate the concept of data minimization. The proposed APRA includes data minimization standards that prevent collection, processing, retention, or transfer of data "beyond what is necessary, proportionate, or limited to provide or maintain" a product or service.[18] The EU GDPR notes that "[p]ersonal data should be adequate, relevant and limited to what is necessary for the purposes for which they are processed."[19] The EU AI Act reaffirms that the principles of data minimization and data protection apply to AI systems throughout their entire life cycles whenever personal data are processed.[20] It also imposes strict rules on collecting and using biometric data. For example, it prohibits AI systems that "create or expand facial recognition databases through the untargeted scraping of facial images from the internet or CCTV [closed-circuit television] footage."[21]
As another example of incorporating data minimization, APRA would establish a duty of loyalty, which would prohibit covered entities from collecting, using, or transferring covered data beyond what is reasonably necessary to provide the service requested by the individual, unless the use is one of the explicitly permissible purposes listed in the bill.[22] Among other things, the bill would require covered entities to obtain a consumer's affirmative, express consent before transferring their sensitive covered data to a third party, unless a specific exception applies.
Algorithmic impact assessments (AIAs) are intended to require accountability for organizations that deploy automated decisionmaking systems.[23] AIAs counter the problem of algorithmic opacity by surfacing potential harms caused by the use of AI in decisionmaking and call for organizations to take practical steps to mitigate any identifiable harms.[24] An AIA would mandate disclosure of proposed and existing AI-based decision systems, including their purpose, reach, and potential impacts on communities, before such algorithms were deployed.[25] When applied to public organizations, AIAs shed light on the use of the algorithm and help avoid political backlash regarding systems that the public does not trust.[26]
APRA includes a requirement that large data holders conduct AIAs that weigh the benefits of their privacy practices against any adverse consequences.[27] These assessments would describe the entity's steps to mitigate potential harms resulting from its algorithms, among other requirements. The bill would require large data holders to submit these AIAs to the Federal Trade Commission and make them available to Congress on request. Similarly, the EU GDPR mandates data protection impact assessments (DPIAs) to highlight the risks of automated systems used to evaluate people based on their personal data.[28] AIAs and DPIAs are similar, but they differ substantially in their scope: While DPIAs focus on the rights and freedoms of data subjects affected by the processing of their personal data, AIAs address risks posed by the use of nonpersonal data.[29] The EU AI Act further expands the notion of impact assessment to encompass broader risks to fundamental rights not covered under the GDPR. Specifically, the EU AI Act mandates that bodies governed by public law, private providers of public services (such as education, health care, social services, housing, and administration of justice), and banking and insurance service providers using AI systems must conduct fundamental rights impact assessments before deploying high-risk AI systems.[30]
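As a rough sketch of what such a disclosure might capture, the data structure below records the elements named above (purpose, reach, potential impacts, and mitigation steps). The fields and example values are illustrative assumptions, not a template prescribed by APRA, the GDPR, or the EU AI Act.

```python
from dataclasses import dataclass, field


@dataclass
class AlgorithmicImpactAssessment:
    """Illustrative disclosure record for an AI-based decision system."""
    system_name: str
    purpose: str                                   # what decisions the system informs or makes
    reach: str                                     # populations and communities affected
    potential_harms: list[str] = field(default_factory=list)
    mitigation_steps: list[str] = field(default_factory=list)
    completed_before_deployment: bool = False


# Hypothetical example for a public-sector screening tool.
assessment = AlgorithmicImpactAssessment(
    system_name="benefits_eligibility_screener",
    purpose="Prioritize benefits applications for manual review",
    reach="All applicants to a public benefits program",
    potential_harms=["Disparate error rates across demographic groups"],
    mitigation_steps=["Quarterly bias testing", "Human review of all denials"],
    completed_before_deployment=True,
)
```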
Whereas the AIA assesses impact, an algorithmic audit is "a structured assessment of an AI system to ensure it aligns with predefined objectives, standards, and legal requirements."[31] In such an audit, the system's design, inputs, outputs, use cases, and overall performance are examined thoroughly to identify any gaps, flaws, or risks.[32] A proper algorithmic audit includes definite and clear audit objectives, such as verifying performance and accuracy, as well as standardized metrics and benchmarks to evaluate a system's performance.[33] In the context of privacy, an audit can confirm that data are being used within the context of the subjects' consent and the tenets of applicable regulations.[34]
During the initial stage of an audit, the system is documented, and processes are designated as low, medium, or high risk depending on such factors as the context in which the system is used and the type of data it relies on.[35] After documentation, the system is assessed on its efficacy, bias, transparency, and privacy protection.[36] Then, the outcomes of the assessment are used to identify actions that can lower any identified risks. Such actions may be technical or nontechnical in nature.[37]
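The staged process described above can be summarized in a short sketch. The risk-classification criteria, assessment dimensions, and recommended actions below are hypothetical placeholders rather than criteria drawn from any published audit standard.

```python
from dataclasses import dataclass, field


def classify_risk(uses_sensitive_data: bool, affects_individual_rights: bool) -> str:
    """Stage 1 (illustrative): designate the documented system as low, medium, or high risk."""
    if uses_sensitive_data and affects_individual_rights:
        return "high"
    if uses_sensitive_data or affects_individual_rights:
        return "medium"
    return "low"


@dataclass
class AuditFinding:
    """One assessed dimension, e.g., efficacy, bias, transparency, or privacy protection."""
    dimension: str
    passed: bool
    recommended_actions: list[str] = field(default_factory=list)


def plan_mitigations(findings: list[AuditFinding]) -> list[str]:
    """Stage 3 (illustrative): collect actions that would lower the identified risks."""
    return [action for finding in findings if not finding.passed
            for action in finding.recommended_actions]


# Stage 2 (illustrative): assess the documented system on each dimension.
findings = [
    AuditFinding("efficacy", passed=True),
    AuditFinding("bias", passed=False, recommended_actions=["Rebalance training data"]),
    AuditFinding("privacy protection", passed=False,
                 recommended_actions=["Confirm uses stay within the scope of consent"]),
]
risk_tier = classify_risk(uses_sensitive_data=True, affects_individual_rights=True)
actions = plan_mitigations(findings)
```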
This notion of an algorithmic audit is embedded in the conformity assessment provided for in the EU AI Act.[38] The conformity assessment is a formal process in which a provider of a high-risk AI system must demonstrate compliance with the requirements for such systems, including those concerning data and data governance. Specifically, the EU AI Act requires that training, validation, and testing datasets be subject to data governance and management practices appropriate for the system's intended purpose. In the case of personal data, those practices should address the original purpose and origin of the data as well as data collection processes.[39] Upon completion of the assessment, the entity is required to draft a written EU declaration of conformity for each relevant system, and these declarations must be maintained for ten years.[40]
Mandatory AI disclosures are another option to address privacy concerns, such as by requiring the organization employing the technology to disclose its uses of AI to the public.[41] For example, the EU AI Act mandates the labeling of AI-generated content. It also requires disclosure when people are exposed to AI systems that can assign them to groups or infer their emotions or intentions based on biometric data (unless the system is intended for law enforcement purposes).[42] In the United States, the proposed AI Labeling Act of 2023 would require that companies properly label and disclose when they use an AI-generated product, such as a chatbot.[43] The proposed legislation also calls for developers of generative AI systems to implement reasonable procedures to prevent downstream use of those systems without proper disclosure.[44]
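As a simple illustration of how a provider of a generative AI system might implement such a disclosure, the sketch below attaches a machine-readable provenance label to generated content. The label fields and wording are assumptions for illustration and are not taken from the AI Labeling Act of 2023 or the EU AI Act.

```python
import json
from datetime import datetime, timezone


def label_ai_content(content: str, model_name: str) -> dict:
    """Wrap generated content with a machine-readable disclosure label (illustrative)."""
    return {
        "content": content,
        "disclosure": {
            "ai_generated": True,
            "generator": model_name,
            "generated_at": datetime.now(timezone.utc).isoformat(),
            "notice": "This content was generated by an artificial intelligence system.",
        },
    }


# A downstream application could surface the notice to users before displaying the content.
labeled = label_ai_content("Sample chatbot reply.", model_name="example-model-v1")
print(json.dumps(labeled["disclosure"], indent=2))
```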
As noted in the previous section, some of these options have been proposed in the United States, and others have been applied in other countries. A key issue is finding the right balance between regulation and innovation. Industry and wider stakeholder input into regulation may help alleviate concerns that regulations could throttle the development and implementation of the benefits offered by AI. The EU AI Act appears to take this approach in the drafting of the codes of practice for general-purpose AI models: The EU AI Office is expected to invite stakeholders—especially developers of models—to participate in drafting the codes, which will operationalize many EU AI Act requirements.[45] The EU AI Act also has explicit provisions for supporting innovation, especially among small and medium-sized enterprises, including start-ups.[46] For example, the law introduces regulatory sandboxes: controlled testing and experimentation environments under strict regulatory oversight. The specific purpose of these sandboxes is to foster AI innovation by providing frameworks to develop, train, validate, and test AI systems in a way that ensures compliance with the EU AI Act, thus alleviating legal uncertainty for providers.[47]
AI is an umbrella term that covers various types of technology, from the generative AI used to power chatbots to the neural networks that spot potential fraud on credit cards.[48] Moreover, AI technology is advancing rapidly and will continue to change dramatically over the coming years. For this reason, rather than focusing on the details of the underlying technology, legislators might consider regulating the outcomes of the algorithms. Such regulatory resiliency may be accomplished by applying rules to the data that go into the algorithms, the purposes for which those data are used, and the outcomes that are generated.[49]
Tifani Sedek is a professor at the University of Michigan Law School. From RAND, Karlyn D. Stanley is a senior policy researcher, Gregory Smith is a policy analyst, Krystyna Marcinek is an associate policy researcher, Paul Cormarie is a policy analyst, and Salil Gunashekar is an associate director at RAND Europe.
This research was sponsored by the RAND Institute for Civil Justice and conducted in the Justice Policy Program within RAND Social and Economic Well-Being and the Science and Emerging Technology Research Group within RAND Europe.
This publication is part of the RAND research report series. Research reports present research findings and objective analysis that address the challenges facing the public and private sectors. All RAND research reports undergo rigorous peer review to ensure high standards for research quality and objectivity.
This document and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited; linking directly to this product page is encouraged. Permission is required from RAND to reproduce, or reuse in another form, any of its research documents for commercial purposes. For information on reprint and reuse permissions, please visit www.rand.org/pubs/permissions.
RAND is a nonprofit institution that helps improve policy and decisionmaking through research and analysis. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors.