3. Uses of AI in eDiscovery

a. Ranking, Classification, and Review of ESI

In litigation, the volume and diversity of text-based data, together with the cost of reviewing that data manually, have rapidly accelerated the adoption of AI-based techniques.

One common use of AI is to segregate potentially relevant information that needs to be reviewed for litigation or investigations from likely irrelevant information.  Another common use of AI by parties facing document requests is to prioritize documents for review and/or to confirm human reviewers' relevancy decisions during the quality control (“QC”) process.

AI technology is also used by receiving parties to prioritize incoming data to find the most important evidence to build their case. Products that offer nuanced entity extraction, such as the identification of people, places, and organizations, also bring AI to the ESI search realm.  All these techniques coexist with more traditional keyword and metadata search, along with visualization and other analytic tools, to provide comprehensive data analysis capabilities.

The above applications most commonly assume that the input data is textual.  Of course, there are also other data types.  Vendors are increasingly finding better ways to handle video, audio, images, and structured data (such as databases or GPS data).  Such solutions often incorporate AI technologies.

The quality of the ranking or classification resulting from a supervised machine-learning system typically depends more on the consistency and accuracy of the human input than on the precise machine learning method used to produce the ranking or classification.

b. Document Review and Quality Control

There are several different types of AI-driven technologies that can assist with document culling and review.  First, some information can be culled from review with the assistance of unsupervised machine learning and related tools, such as clustering, email threading, and similar grouping tools, that organize information in ways that allow certain categories of documents or data to be eliminated from further consideration.
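As a rough illustration of how email threading groups messages so that a conversation can be reviewed (or set aside) as a unit, the sketch below groups emails by a normalized subject line. This is a deliberate simplification: production threading tools typically also rely on message headers that record reply relationships and on message content. The `id` and `subject` field names are hypothetical.

```python
import re
from collections import defaultdict

# Matches one or more leading reply/forward prefixes such as "RE:", "Fw:", "FWD:".
_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)

def normalize_subject(subject):
    """Strip reply/forward prefixes, lowercase, and collapse whitespace (simplified heuristic)."""
    stripped = _PREFIX.sub("", subject)
    return " ".join(stripped.lower().split())

def group_threads(emails):
    """Group emails (dicts with hypothetical 'id' and 'subject' keys) by normalized subject."""
    threads = defaultdict(list)
    for email in emails:
        threads[normalize_subject(email["subject"])].append(email["id"])
    return dict(threads)
```

For example, messages titled "Q3 pricing", "RE: Q3 pricing", and "FW: RE: Q3  Pricing" would all land in one group, so a reviewer could assess the thread once rather than three times.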

After initial culling is completed, TAR can be used to prioritize documents for human review and further eliminate irrelevant information from manual review. There are a variety of TAR tools available, some of which are calibrated based on training sets of documents and some of which continue to be calibrated throughout the course of human review.
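The second kind of TAR tool, which keeps learning throughout review (often described as continuous active learning), can be sketched as a loop: train on everything coded so far, rank the uncoded documents, send the top batch to human reviewers, and repeat. The scoring model here is a deliberately simple smoothed log-odds term weighting chosen for illustration, not the method of any particular TAR product; `reviewer` stands in for human coding, and all names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simple)."""
    return text.lower().split()

def train(coded):
    """Learn smoothed log-odds weights per term from (text, is_relevant) pairs."""
    rel_df, non_df = Counter(), Counter()
    for text, relevant in coded:
        (rel_df if relevant else non_df).update(set(tokenize(text)))
    n_rel = sum(1 for _, relevant in coded if relevant)
    n_non = len(coded) - n_rel
    vocab = set(rel_df) | set(non_df)
    # Add-one smoothing keeps terms seen in only one class from producing infinite weights.
    return {
        t: math.log((rel_df[t] + 1) / (n_rel + 2)) - math.log((non_df[t] + 1) / (n_non + 2))
        for t in vocab
    }

def score(weights, text):
    """Sum term weights; terms never seen in training contribute nothing."""
    return sum(weights.get(t, 0.0) for t in set(tokenize(text)))

def continuous_review(docs, seed, reviewer, batch_size=2, rounds=3):
    """docs: {doc_id: text}; seed: {doc_id: is_relevant}; reviewer(doc_id) -> bool."""
    coded = dict(seed)
    for _ in range(rounds):
        weights = train([(docs[d], label) for d, label in coded.items()])
        queue = sorted(
            (d for d in docs if d not in coded),
            key=lambda d: score(weights, docs[d]),
            reverse=True,
        )
        if not queue:
            break
        for d in queue[:batch_size]:
            coded[d] = reviewer(d)  # each human decision feeds the next training round
    return coded
```

The contrast with a tool calibrated only on a fixed training set is that `train` is called again every round, so reviewer decisions made mid-review change the ranking of the documents still waiting in the queue.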

AI can also augment the capabilities of document reviewers or provide quality control of their work.  Software may auto-suggest tagging choices, highlight key text in a document, or provide context for references to people or key terms. While some of these capabilities can be built without employing AI, more advanced techniques tend to involve AI.  Quality control is perhaps the simplest and least controversial place to apply the classification algorithms being adopted for prioritization or identification of relevant documents.  For example, QC can target high-scoring documents that a human reviewer coded as non-relevant and low-scoring documents that a human reviewer coded as relevant. Even a review conducted entirely by humans can benefit from using AI to look for inconsistencies. In fact, algorithms tend to detect features or patterns that differ from those people identify, which lends greater confidence to situations where the algorithms agree with the humans. Cases where the algorithms and humans disagree can be a good place to focus quality control efforts.
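The QC check described above can be sketched as a filter over model scores and human coding decisions. The 0-to-1 score scale and the 0.8 and 0.2 cut-offs are assumptions chosen for illustration; in practice, thresholds would be set for each matter.

```python
def qc_disagreements(reviewed, high=0.8, low=0.2):
    """Return documents where the model score and the human relevance call disagree.

    reviewed: iterable of (doc_id, model_score, coded_relevant) tuples, with
    model_score assumed to lie between 0 and 1 (hypothetical scale).
    Results are sorted so the strongest disagreements come first.
    """
    flagged = []
    for doc_id, score, coded_relevant in reviewed:
        if score >= high and not coded_relevant:
            flagged.append((doc_id, score, "high score, coded non-relevant"))
        elif score <= low and coded_relevant:
            flagged.append((doc_id, score, "low score, coded relevant"))
    # Distance from 0.5 serves as a rough measure of how strongly the model disagrees.
    flagged.sort(key=lambda item: abs(item[1] - 0.5), reverse=True)
    return flagged
```

Documents the model scores in the middle are left alone, so QC effort concentrates on the clearest conflicts between the algorithm and the reviewer.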

c. Privilege Determinations

Privilege review is one of the greatest pain points in eDiscovery, and one where AI may also be helpful.  As the tools have become more sophisticated and the magnitude of privilege review has grown, some lawyers have gradually adopted AI to assist with identifying privileged documents and verifying privilege coding.  Privilege-review technology has not evolved to the point where it should be the only source used to categorize privileged documents or to make final decisions on privilege.  This is due in part to the complexity of privilege (for example, distinguishing legal from business advice) and in part to the fact that privilege calls can differ from document to document based on subtle changes in language or additions to the recipient list.  Two emails from the same email thread could have a different privilege status simply because one of them adds a single person whose inclusion waives the privilege.  Not all classification tools use metadata to factor the sender, recipients, and date into the equation when classifying documents.  A human reviewer with knowledge of the elements that comprise privilege (or its waiver) may still have to check machine-made privilege calls.  In addition, because some documents fall into “gray” areas where reasonable reviewers can reach opposite conclusions, it can be difficult to train AI tools to make such distinctions.  Therefore, most privilege-review tools must work hand in hand with human reviewers.  However, the technology can speed up and improve the privilege identification and review process and can provide a quality control element.
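The point about metadata can be illustrated with a rule-based screen that looks only at a message's participants. The sketch treats a message as a privilege candidate when counsel is involved and every participant is counsel or on a client domain, and flags a potential waiver when a third party appears. The addresses, domains, and rule are hypothetical; a screen like this only routes documents to a human reviewer and does not decide privilege (some third parties, for example, may not break privilege at all).

```python
def privilege_screen(message, counsel, client_domains):
    """Classify a message by its participants (hypothetical rule, for triage only).

    message: dict with 'from', 'to', and optional 'cc' email addresses.
    counsel: set of attorney addresses; client_domains: set of client email domains.
    Returns (status, third_parties).
    """
    participants = [message["from"], *message["to"], *message.get("cc", [])]
    has_counsel = any(p in counsel for p in participants)
    third_parties = sorted(
        p for p in participants
        if p not in counsel and p.rsplit("@", 1)[-1] not in client_domains
    )
    if not has_counsel:
        return "no counsel involved", third_parties
    if third_parties:
        return "potential waiver: third party present", third_parties
    return "privilege candidate: route to reviewer", third_parties
```

Applied to two emails from one thread, where the second copies an outside banker, the first would come back as a privilege candidate and the second as a potential waiver, mirroring the example in the paragraph above.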

AI tools can also be used to identify inconsistent privilege determinations across similar or identical documents, to refine privilege search terms, and to identify unknown parties involved in privileged communications.
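For identical documents, the consistency check can be sketched by hashing lightly normalized text and flagging any group of matching documents that received more than one privilege call. This covers only exact matches after whitespace and case normalization; finding merely similar documents would require near-duplicate detection, which the sketch does not attempt. Document IDs and labels are hypothetical.

```python
import hashlib
from collections import defaultdict

def content_key(text):
    """Hash of lightly normalized text, so trivially different copies match (simplified)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def inconsistent_privilege_calls(docs):
    """docs: iterable of (doc_id, text, privilege_call).

    Returns groups of matching documents that received more than one distinct call.
    """
    groups = defaultdict(list)
    for doc_id, text, call in docs:
        groups[content_key(text)].append((doc_id, call))
    return [
        sorted(members)
        for members in groups.values()
        if len({call for _, call in members}) > 1
    ]
```

Each returned group is a ready-made QC assignment: the same content was coded two different ways, so at least one call needs a second look.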

d. Investigations

Investigations may be conducted for various purposes.  Sometimes they take the form of Early Case Assessment (“ECA”) to quickly get a handle on key facts, projected costs, and likely outcomes.  Sometimes they are part of an internal investigation into employee conduct or compliance issues that may lead to later employment-related or legal action. Sometimes they involve responding to government investigations that could include second requests, whistleblower complaints, regulatory inquiries or subpoenas, or general government oversight.

By using techniques such as data visualization, interactive dashboards, communication network analysis, clustering, email threading, concept search, and TAR, just to name a few, legal professionals can quickly analyze the document collection and gain an understanding of the key underlying facts.  Below we discuss some of the specific ways that AI can be used with regard to Early Case Assessment, internal investigations, and government investigations.

  • Early Case Assessment
    AI is making significant inroads into the ECA process. Identifying the potential cost, risks, and related issues earlier in a case has always been a goal for legal professionals who are responsible for deciding on case strategies.  Over the years, this process has become more complicated because data volumes have continued to grow and datasets have become more heterogeneous and complex. To address these challenges, legal professionals are employing AI during ECA to help them quickly analyze data, assess facts, determine case strategies, and identify relevant parties involved in a case.

    Even before a complaint is filed, legal teams are implementing AI on a subset of the document collection to learn the “players” and facts of a case. Legal teams can examine relationships between concepts and custodians.  AI can help the legal team identify additional custodians who should be interviewed or placed on legal hold.  Techniques such as query expansion and concept search can help identify new, unknown keywords and search phrases to find relevant ESI. Many types of AI software also help narrow the focus to highly relevant custodians and concepts.  AI can also help legal teams identify large chunks of data that are likely non-relevant, allowing those documents to be quickly removed from search and review. These initial steps can save time and cost by reducing data volumes, as well as helping legal teams to focus on what is important and to set case strategy earlier.

    Keyword search has not yet disappeared.  In fact, search terms are sometimes used to “jump-start” AI software by identifying good seed or training documents.  Legal professionals are using search terms to identify “low hanging fruit” – documents that are easily accessible and relevant to the case issues. Once these relevant documents are identified, they are used to locate additional, conceptually similar documents and to train supervised machine-learning algorithms.  AI tools can use an entire record or group of documents, rather than only a list of terms, to bring up likely relevant ESI.  Concept-search tools can also help legal teams identify terms or concepts in the collection that they may not previously have known about.

    Communication network analysis tools provide visualizations of communication patterns among individuals or email domains.  Legal teams can quickly see the quantity, frequency, and timing of communication traffic; how people are self-organizing; and when new or unexpected parties enter the conversation.  These tools allow legal teams to prepare and refine custodian lists and streamline custodian interviews.  Once legal teams have identified the custodians, they can concentrate on specific individuals to further investigate the topics being discussed.

  • Internal Investigations
    Internal investigations may be initiated for a variety of reasons.  Companies may receive whistleblower complaints about specific conduct; they may learn of other legal or compliance issues impacting other companies in their industry; or internal investigations may be undertaken as part of the process of ensuring regulatory compliance or to confirm that employees are behaving in accordance with legal requirements and company rules.

    As in the Early Case Assessment example, AI can be used to quickly home in on key facts and circumstances.  Indeed, some AI may operate in the background on an ongoing basis to identify potential employee misconduct or even external threats, such as data breaches.  Similarly, AI can be used during mergers and acquisitions as part of “due diligence” to evaluate potential risks.

  • Government Investigations
    There are several aspects of government investigations that may differ from eDiscovery in typical litigation or even in internal investigations. For example, in some investigations, government agencies may not want to fully disclose what they are investigating, but subjects or targets of the investigation may still be compelled to respond to broad Civil Investigative Demands (“CIDs”) or subpoenas, or be asked to respond to voluntary requests.  In such circumstances, AI tools can give the recipient of such a demand an overall picture of the data set returned by particular search parameters.  This can help the subject or target home in on what the government may be investigating.  A second difference from ordinary eDiscovery in litigation is that there typically is no judge to referee discovery disputes, and the subject or target of the investigation may be entirely at the mercy of the investigating agency when seeking to limit the scope of the demands.  Here, some AI tools may provide data that will help in negotiations with investigating authorities.  Government investigations may also have shorter or less flexible response deadlines, increasing the need for AI tools that help prepare a timely response. Note that when responding to government agency demands, it may be important to discuss in advance any AI tools that the responding party proposes to use to identify responsive records, to avoid surprise and any adverse reaction from the investigating agency.
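The communication network analysis described under Early Case Assessment can be approximated with simple counts over email metadata: message volume for each sender and recipient pair, and the date each participant first appears, which surfaces new or unexpected parties entering the conversation. The record layout and addresses are hypothetical, and commercial tools layer visualization and much richer analytics on top of counts like these.

```python
from collections import Counter
from datetime import date

def communication_stats(messages):
    """messages: iterable of (sent_date, sender, recipients) tuples (hypothetical layout).

    Returns (pair_counts, first_seen): message counts per (sender, recipient) pair
    and the earliest date on which each address appears in the traffic.
    """
    pair_counts = Counter()
    first_seen = {}
    for sent, sender, recipients in sorted(messages, key=lambda m: m[0]):
        for person in [sender, *recipients]:
            first_seen.setdefault(person, sent)  # keeps the earliest date seen
        for recipient in recipients:
            pair_counts[(sender, recipient)] += 1
    return pair_counts, first_seen

def new_participants(first_seen, since):
    """Addresses whose first appearance falls on or after `since`."""
    return sorted(p for p, d in first_seen.items() if d >= since)
```

Run over a custodian's mailbox, `pair_counts` suggests who talks to whom and how often, while `new_participants` lists people who joined the traffic after a date of interest and may belong on the custodian list or interview schedule.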