Grossman/Cormack: the eDiscovery Medicine Show

Black and white photo of Doc Randall's Ole Medicine Show wagon, with man in a top hat and a dog.

In a forthcoming article in the Ohio State Technology Law Journal 18:1 (2021), Dr. Gordon Cormack and Dr. Maura Grossman explore the landscape of technology-assisted review and analytics in the context of legal document review. The focus is on marketing claims of excellence, the use of statistics and samples, and the mixing and matching of keyword search, manual review and computer-assisted review.

The article, titled “The eDiscovery Medicine Show,” asserts that:

The practice of bloodletting gradually fell into disfavor as a growing body of scientific evidence showed its ineffectiveness and demonstrated the effectiveness of various pharmaceuticals for the prevention and treatment of certain diseases. At the same time, the patent medicine industry promoted ineffective remedies at medicine shows featuring entertainment, testimonials, and pseudo-scientific claims with all the trappings–but none of the methodology–of science. Today, many producing parties and eDiscovery vendors similarly promote obsolete technology as well as unvetted tools labeled “artificial intelligence” or “technology-assisted review,” along with unsound validation protocols. This situation will end only when eDiscovery technologies and tools are subject to testing using the methods of information retrieval.

Dr. Cormack and Dr. Grossman have been pioneers in the use and measurement of information retrieval in the eDiscovery context, publishing an authoritative glossary, participating in the TREC challenge and writing many widely cited scientific and legal articles in support of using technology to review massive and ever-increasing volumes of data more efficiently.

Decrying the failure of courts and other authorities to discriminate “between practice and sound practice—let alone best practice—or between science and pseudo-science,” Dr. Cormack and Dr. Grossman take on claims about recall, precision, sample size, efficacy, Sedona Principle 6 (producing parties are best situated) and more as they advocate for sound scientific information retrieval methodology.

One common practice, combining keyword culling with TAR or CAL and manual review, is described as yielding diminishing returns. The authors set out the math as follows:

When multiple information retrieval methods are used in sequence, overall recall is the product of the recall for each constituent method. If keyword culling were to achieve 70% recall, the TAR tool were to achieve 80% recall, and manual review were to achieve 75% recall, the recall of a review effort combining them in sequence would be 70%×80%×75%=42%. It is possible to quibble with the numbers presented here, but not with the fact that each constituent part is imperfect, and that overall or end-to-end recall is considerably less than the weakest link in the chain.
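The arithmetic in the passage above can be sketched in a few lines of Python. This is only an illustration of the product rule the authors describe. The function name `end_to_end_recall` and the stage labels are hypothetical, and the recall figures are the illustrative ones from the quotation, not measured results.

```python
from math import prod


def end_to_end_recall(stage_recalls):
    """Overall recall when retrieval stages run in sequence.

    Each stage can only pass along relevant documents that the previous
    stage kept, so the overall recall is the product of the stage recalls.
    """
    recalls = list(stage_recalls)
    for r in recalls:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"recall must be between 0 and 1, got {r}")
    return prod(recalls)


# Illustrative figures taken from the quoted passage (not measured data).
stages = {
    "keyword culling": 0.70,
    "TAR": 0.80,
    "manual review": 0.75,
}

overall = end_to_end_recall(stages.values())
weakest = min(stages.values())

print(f"End-to-end recall: {overall:.0%}")
print(f"Weakest single stage: {weakest:.0%}")
```

With the quoted figures, the combined recall comes out at roughly 42%, well below the 70% of the weakest individual stage, which is the point the authors make about chaining imperfect methods.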

Doctors Grossman and Cormack advocate strongly for a reconsideration of current methodologies, practices and marketing claims by producing parties, receiving parties and the judiciary. “Statistically significant” samples, recall and precision are all scrutinized, and the way in which they are employed in practice is found wanting. The authors call for judicial recognition of which methodologies are sound, arguing that relying in a trial setting on peer-reviewed, published results is a sounder way forward than using court time to prove a methodology or tool in general.

Assessing the tools, methodology and the context in which they are used need not be as extensive as a Daubert hearing:

To be clear, we do not advocate a full-blown evidentiary hearing every time a TAR or validation process is challenged. Rather, we suggest that the Daubert factors offer useful guidance in determining what a reasonable process is pursuant to Fed. R. Civ. P. 26(g)(1), and what is proper evidence to this effect.

Dr. Maura Grossman and Dr. Gordon Cormack

Maura R. Grossman, J.D., Ph.D., is a Research Professor, and Gordon V. Cormack, Ph.D., is a Professor, in the David R. Cheriton School of Computer Science at the University of Waterloo, in Ontario, Canada. Professor Grossman is also Principal of Maura Grossman Law, an eDiscovery law and consulting firm in Buffalo, New York, U.S.A. In addition, she is an EDRM Global Advisory Council leader and the co-Project Trustee of EDRM’s Analytic and Machine Learning subteam on validation. This article represents the scholarship of Doctors Cormack and Grossman and does not (yet) represent an EDRM Project Team consensus.

Read the entire article here.

To get involved in any EDRM project, please email us at info@edrm.net to be connected.

Author

  • Mary Mack

    Mary Mack is the CEO and Chief Legal Technologist for EDRM. Mary was the co-editor of the Thomson Reuters West Treatise, eDiscovery for Corporate Counsel for 10 years and the co-author of A Process of Illumination: the Practical Guide to Electronic Discovery. She holds the CISSP among her certifications.