The Future of Automated Document Review

A Posse List blog post earlier this year, Computer-aided document review has arrived (12 January 2010), comments on a thought-provoking e-discovery study described in an academic journal article. The premise of the underlying study, which compares computer classification of documents with manual review, is that automated systems are capable of categorizing documents at least as well as teams of human reviewers in an e-discovery setting. While it raises interesting points, I am not convinced that the evidence supports the authors’ conclusion that computer-aided review has arrived quite yet. It is still a stretch to suggest that human document reviewers face an imminent risk of being supplanted by artificial intelligence-based processes.

The underlying study by a trio of recognized experts in cognitive science, information management, and e-discovery, Herb Roitblat, Anne Kershaw, and Patrick Oot, is described in detail in their journal article, Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, published in the January 2010 issue of the Journal of the American Society for Information Science and Technology (link is to PDF at the Posse List).

There is no question that software can detect ever-more sophisticated language patterns in documents and classify them by theme. Whether this translates readily into a persuasive argument that legal document review will be more fully automated in the near future, or even that such a development is inevitable, is a different question. And I don’t think the study provides conclusive support for the proposition that computer classification could reasonably substitute for human review.

Before turning to my reservations about the study, I want to raise a potentially bigger issue. Irrespective of advances in technology and compelling statistics, we need to ask whether courts and litigators will accept a discovery process that is ever more reliant on technology in pursuit of efficiency. I think the likely answer is, ‘only to a point.’

Right now, e-discovery professionals use many tools to filter large collections to focus lawyers’ attention on a subset likely to be most relevant. Are we missing another step? Is there a way to emulate or even exceed human review to cull the relevant from the extraneous? If so, how do we determine the “reasonableness” and “completeness” of the process? That is the multi-million-dollar question.

Courts now often accept concept clustering and searching in aid of review, but not all the time nor in all settings. For many lawyers, the key argument for using these tools is the ability to reassure courts that these tools focus or prioritize review on the most promising “clusters” of data but do not wholly eliminate review of apparently “unimportant” clusters. I disagree with my colleagues who cite clustering technology as a harbinger of full automation of review. In my experience, it is just a tool that helps humans to make decisions that are more consistent and coordinated.

I think most practicing lawyers would agree with the study’s central premise that lawyers should use technology to reduce cost. That said, the senior lawyers who ultimately decide how to conduct a document review often do not understand the latest technology, and neither they nor their clients are likely to accept the challenge of having to defend an entirely new approach to discovery. And that is why I think the answer to the question, ‘will lawyers and courts accept an expanding role of technology in order to reduce discovery costs’ is, ‘only to a point.’ If that is indeed the case, how does the study advance us toward that critical point?

It raises - and partially answers - the important question of whether we are approaching a breakthrough in the capability of automated review tools to render ‘consistent’ and ‘correct’ decisions, as measured against an existing standard, while classifying documents in a legal discovery context. The study pitted two teams of contract attorneys against two commercial electronic discovery applications to review a limited set - 5,000 documents - culled from a collection of 1.6 million documents. The larger collection had been reviewed two years earlier by attorney teams in connection with a Second Request relating to Verizon’s acquisition of MCI. The authors’ hypothesis was that “the rate of agreement between two independent [teams of] reviewers of the same documents will be equal to or less than the agreement between a computer-aided system and the original review.”

The study set out to test whether an automated review tool would show similar levels of agreement with classifications made by the original reviewers as did the two contract teams. The two re-review teams agreed with the original review on about 75% of document classification decisions; the commercial automated applications fared slightly better. By this measure, my initial read would be that the two approaches were proven equally unreliable and inconsistent. But I concede that the study demonstrated that automated applications can match or exceed the performance of two human teams working under certain and arguably limited conditions. The question is whether the study demonstrated more than that.
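To make the measure concrete, here is a minimal sketch of a simple percent-agreement calculation between an original review and a re-review. It is only an illustration: the function name, the document IDs, and the toy coding decisions are all hypothetical, and I am not suggesting the study's authors computed agreement in exactly this way.

```python
def agreement_rate(original, rereview):
    """Fraction of shared documents on which two reviews made the same call.

    Both arguments map a document ID to a coding decision
    (here True = responsive, False = not responsive).
    """
    shared = original.keys() & rereview.keys()
    if not shared:
        raise ValueError("the two reviews have no documents in common")
    matches = sum(1 for doc_id in shared if original[doc_id] == rereview[doc_id])
    return matches / len(shared)


# Toy data, invented purely for illustration.
original = {"D1": True, "D2": False, "D3": True, "D4": False}
team_a = {"D1": True, "D2": False, "D3": False, "D4": False}

print(agreement_rate(original, team_a))  # 3 of 4 decisions match
```

Note that a raw agreement rate says nothing about which side was right when the two disagree; it only measures consistency with the earlier review.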

The study is properly understood as a demonstration of capability rather than as a controlled study, which, by definition, operates under controlled conditions with a constrained set of variables. Several factors could have affected the study outcome:

  • The sample set was extremely small. While 5,000 documents out of 1.6 million may be statistically significant for certain purposes, it is not a big sample for a document review. Empirically, we know that a review team gains proficiency and confidence in stages during a review project, eventually reaching a ‘steady state,’ where its decisions are more likely to be consistently correct on first pass. The two contract teams, which were pitted against the computers to see which alternative more closely mirrored the original review team’s decisions, did not have the benefit of a warm-up, which the original review team likely did have. In addition, we do not know whether the original review team was more senior or experienced than members of the review teams in the study.
  • The study write-up does not make clear whether instructions given to the original review team were provided in identical form to the two review teams in the study. In essence, what we ask review teams to accomplish is to approximate, as closely and consistently as possible, decisions that would have been made by the supervising attorney (or client’s lead counsel) had that attorney personally reviewed each and every document. (In TREC terminology, this is the topic authority.) A key to running a successful review and delivering clean and correct work product is first ensuring that counsel provides clear and complete instructions to the team at the outset and resolves questions raised in the course of review.
  • We also do not know what standards of quality control were applied for any of the review processes. A well-designed and executed regime of quality control enhances the consistency of review results. Much more on this theme, below.

Although I would not hang my proverbial hat on the results of the study, it does open the door for follow-on work. We may yet see a convincing demonstration that automated systems render more consistent decisions than human review teams. The issue will remain whether any amount of proof would persuade the bar to accept use of automated review as a substitute for attorney review in the context of discovery.

A more immediate question is how we can make good use of what appear to be very capable software engines. Can we integrate automated applications to assist human review and make it more consistent? Indeed, making review more consistent has been the aim of quality control and validation processes all along.

A well-designed quality control regime makes use of intelligently-designed searches to identify likely errors and inconsistencies in a set of reviewed documents, and directs “suspect” documents to a team of more experienced reviewers for a focused re-review. This continuously re-focuses QC review toward outliers and other documents that may warrant a second look.
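One simple kind of QC search can be sketched in a few lines: find documents whose text is identical after normalization but which were coded differently, and route them for re-review. This is a hypothetical example of the idea, not a description of any particular product; the function name, the tuple layout, and the sample documents are my own assumptions.

```python
import hashlib
from collections import defaultdict


def flag_inconsistent(docs):
    """Return sorted IDs of documents that share normalized text with
    another document but carry a different coding decision.

    docs is an iterable of (doc_id, text, code) tuples.
    """
    groups = defaultdict(list)
    for doc_id, text, code in docs:
        # Normalize case and whitespace so trivial differences are ignored.
        normalized = " ".join(text.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[key].append((doc_id, code))

    suspects = []
    for members in groups.values():
        # More than one distinct code within a group signals an inconsistency.
        if len({code for _, code in members}) > 1:
            suspects.extend(doc_id for doc_id, _ in members)
    return sorted(suspects)


# Toy data, invented purely for illustration.
reviewed = [
    ("D1", "Merger pricing memo", "responsive"),
    ("D2", "merger  pricing MEMO", "not responsive"),
    ("D3", "Lunch order", "not responsive"),
]
print(flag_inconsistent(reviewed))  # D1 and D2 conflict; D3 does not
```

Exact-match grouping is the bluntest possible check. A real QC regime would presumably layer near-duplicate detection, term searches, and sampling on top of it.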

My reservations notwithstanding, I think there is a good amount of value in the study, which sets a stage for further comparisons of automated and human review. The best purpose, in my view, however, might not be to aim toward replacing human reviewers on a first pass review.

One possibly feasible and very attractive target is to develop a process in which we can fully leverage automation to reduce the effort (number of attorneys) required for first pass review. With the right process, fewer, more senior reviewers may need to review only a set of representative documents in a collection, while technology acts as a ‘force multiplier’ to speed review and ensure consistent coding.

Indeed, there are applications, both in development and commercially available, that have demonstrated an ability to “crawl” a large collection of documents to identify documents with substantially similar content. If we can properly leverage that technology, in tandem with a small elite team of attorneys, we have a hybrid workflow that benefits from both human intellect and machine power. I would like to see a study proving the capabilities of such a hybrid model.

A potential hybrid model would have senior attorneys review representative sets of documents and the tool analyze features of the reviewed documents to identify and auto-tag “like” documents in the larger collection. As the review proceeded, the tool would ‘percolate’ to the review team’s attention subsets of documents from the collection dissimilar from those already reviewed. Based on the reviewers’ decisions as to these documents, the tool continues to apply tags to more of the collection.
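A single round of that workflow might look like the sketch below. It assumes, purely for illustration, that "like" documents are found by word-overlap (Jaccard) similarity against the attorney-coded seed documents; commercial tools use their own, far more sophisticated methods, and the function names, the 0.6 threshold, and the sample texts here are all hypothetical.

```python
def tokens(text):
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propagate(seeds, collection, threshold=0.6):
    """Apply attorney decisions on seed documents to similar documents.

    seeds: {doc_id: (text, tag)} coded by senior attorneys.
    collection: {doc_id: text} not yet reviewed.
    Returns (auto_tags, needs_review): documents similar enough to a seed
    inherit its tag; the rest are surfaced for human review.
    """
    seed_tokens = [(tokens(text), tag) for text, tag in seeds.values()]
    auto_tags, needs_review = {}, []
    for doc_id, text in collection.items():
        doc_tokens = tokens(text)
        best_score, best_tag = 0.0, None
        for s_tokens, tag in seed_tokens:
            score = jaccard(doc_tokens, s_tokens)
            if score > best_score:
                best_score, best_tag = score, tag
        if best_score >= threshold:
            auto_tags[doc_id] = best_tag
        else:
            needs_review.append(doc_id)
    return auto_tags, needs_review


# Toy data, invented purely for illustration.
seeds = {
    "S1": ("merger pricing analysis for board", "responsive"),
    "S2": ("office holiday party schedule", "not responsive"),
}
collection = {
    "D1": "merger pricing analysis for the board",
    "D2": "holiday party schedule office",
    "D3": "quarterly network outage report",
}
auto_tags, needs_review = propagate(seeds, collection)
print(auto_tags)
print(needs_review)
```

In a full process, the documents in `needs_review` would go back to the attorneys, their decisions would become new seeds, and the loop would repeat until the dissimilar remainder was exhausted or small enough to review by hand.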

The attraction of this approach is two-fold: human attorneys are still making initial determinations but the application magnifies the effect of their determinations by propagating decisions to similar documents throughout the larger collection. It has been suggested that, in the proper context, this approach would permit a single attorney to “review” a vast collection of documents in several hours. A test of that claim is warranted and, if the premise were proved, it would be impressive and could directly influence the increased use of automation in review, even if, for all the reasons stated above, wide adoption of such processes would take a while.

An alternate model - one more readily proven in a single study, and one that I believe holds promise as an immediate aid in improving the quality and consistency of document review - would be to fully integrate available automated searching and categorization applications into existing quality control processes. So I would very much like to see a well-designed test of an automated-classification tool in that context, one that demonstrates whether the model outperforms quality control processes that use a series of term searches to identify sets of documents for targeted QC.
