Predictive Coding Could Reduce E-Discovery Costs, but More Guidance Needed on Data Preservation
April 11, 2012
Companies could lower the high cost of large-scale electronic discovery in lawsuits by using a computer application known as predictive coding to reduce the number of documents requiring human review, according to a new study from the RAND Corporation.
The study also calls for rule changes to address concerns about the scope and process of preserving information in anticipation of future litigation.
Pretrial discovery procedures are designed to help narrow the issues being litigated, eliminate surprise at trial and achieve substantial justice. But in recent years, claims have been made that the societal shift from paper documents to electronically stored information has led to sharp increases in discovery costs compared to the overall costs of litigation. Some claim that these escalating costs are preventing people from litigating legitimate disputes.
The study includes 57 case studies from eight large corporations, reviews the literature on electronic discovery, estimates the costs of complying with discovery requests and examines the challenges of preserving electronic information.
Coauthors Nicholas M. Pace and Laura Zakaras also interviewed key legal personnel to find out how each company responded to new requests for e-discovery, the steps taken to comply with such requests, the nature and size of each company's information technology infrastructure, and its document retention, disaster-recovery and archiving practices.
The costs associated with e-discovery can be grouped into three main categories: collection (locating potential sources of information following a demand to produce electronic documents and data); processing (reducing the volume of collected electronic data and converting it to forms more suitable for review); and review (evaluating the information to identify relevant, responsive and nonprivileged documents). About 8 percent of the costs are incurred during the collection phase, 19 percent during the processing phase and 73 percent during the review phase.
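To make the proportions concrete, the short sketch below applies the reported shares (8, 19 and 73 percent) to a production budget. The $1,000,000 total and the name `split_by_phase` are hypothetical, chosen only for illustration; the study reports the percentages, not this dollar figure.

```python
# Phase shares reported in the study, as whole percentages.
PHASE_SHARES_PERCENT = {"collection": 8, "processing": 19, "review": 73}


def split_by_phase(total_dollars):
    """Split a total e-discovery spend across the three phases."""
    return {
        phase: total_dollars * percent // 100
        for phase, percent in PHASE_SHARES_PERCENT.items()
    }


# Hypothetical total, used only to illustrate the proportions.
breakdown = split_by_phase(1_000_000)
for phase, dollars in breakdown.items():
    print(f"{phase:<10} ${dollars:,}")
```

Under that assumed total, review alone would account for $730,000, which is why the study concentrates on the review phase.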
"Typically, in the review process, you're talking about someone, usually an outside attorney, having to sift through documents to find the ones that are relevant and responsive to the case, and eliminating the ones that are privileged," said Pace, a senior social scientist at RAND, a nonprofit research organization. "If it's just a few boxes of documents, the costs of conducting such a review are likely to be modest. When the volume increases to tens of thousands or even millions of e-mails and other electronic documents, the labor costs associated with an eyes-on examination can be enormous."
While some litigants have tried to reduce costs by hiring lower-cost temporary attorneys or even English-speaking lawyers in countries such as India and the Philippines, significant reductions in future costs in this area are unlikely. In addition, techniques to group the documents in such a way as to make the review more efficient are unlikely to yield the dramatic savings necessary to address stakeholder concerns.
On the other hand, predictive coding—a type of computer-categorized review application that classifies documents according to how well they match the concepts and terms in sample documents—may provide substantial savings. Human reviewers are still necessary, but only to review a much smaller subset of documents. One study estimated that the number of hours attorneys spend reviewing materials could be cut by about 80 percent.
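The general idea of scoring documents against sample documents can be sketched in a few lines. This is a deliberately simplified illustration, not the method used by any product or evaluated in the study: it assumes plain word counts and cosine similarity, and the function names, the sample texts and the 0.2 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count lowercased words in one document."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_for_review(samples, collection, threshold=0.2):
    """Return (score, document) pairs that resemble the samples, best first.

    `samples` are documents a human has already marked as responsive.
    The threshold is an arbitrary illustrative cutoff.
    """
    profile = Counter()
    for text in samples:
        profile.update(term_vector(text))
    scored = [(cosine(term_vector(doc), profile), doc) for doc in collection]
    flagged = [pair for pair in scored if pair[0] >= threshold]
    return sorted(flagged, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample and collection texts.
samples = [
    "pricing agreement with distributor",
    "distributor contract pricing terms",
]
collection = [
    "Revised pricing terms for the distributor contract",
    "Lunch menu for Friday",
]
flagged = rank_for_review(samples, collection)
for score, doc in flagged:
    print(f"{score:.2f}  {doc}")
```

Only the documents that score above the cutoff are passed to human reviewers, which is how the reviewed subset shrinks. Real systems typically rely on more sophisticated models and on repeated rounds of attorney feedback.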
Pace said none of the companies in the RAND study was using predictive coding for review purposes in the cases examined. Litigants' concerns about predictive coding may include whether the approach will be able to identify all potentially responsive documents, while avoiding any overproduction, and whether it will be able to identify privileged or confidential information.
The biggest obstacle, however, is the dearth of judicial guidance on the issue. There isn't a large body of judicial decisions squarely approving or disapproving of the use of predictive coding—and few law firms are going to want to become early adopters, Pace said.
Another challenge for companies is determining how much digital information should be preserved for the purposes of future litigation and how best to prevent inadvertent destruction or alteration of potentially discoverable data. None of the companies surveyed were able to calculate how much it costs to track and preserve data, although those practices contribute to the overall cost of litigation.
Some judicial decisions have addressed preservation scope and process; however, those decisions serve as precedent only in specific jurisdictions and sometimes conflict with one another. As a result, attorneys reported they had no clear understanding of whether their decisions about what to preserve, and how, were legally defensible and safe from serious sanctions. Litigants reported that they secured more information than was actually necessary in order to minimize the chances that their decisions could be called into question later.
The study recommends that companies adopt computer categorization to reduce the costs of review in large-scale e-discovery efforts and that they improve their tracking of production and preservation costs. The researchers also suggest that policymakers and the courts bring certainty to legal authority concerning preservation.
The study, "Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery," can be found at www.rand.org.
Research for the study was conducted by the RAND Institute for Civil Justice, a research institute within RAND Law, Business and Regulation. The Institute for Civil Justice is dedicated to improving the civil justice system by supplying policymakers and the public with rigorous and independent research. Research is supported by pooled grants from a range of sources, including corporations, trade and professional associations.