[EDRM’s Editor’s Note: This article was first published here on August 16, 2024, and EDRM is grateful to Rob Robinson, editor and managing director of Trusted Partner ComplexDiscovery, for permission to republish.]
ComplexDiscovery’s Editor’s Note: This article is essential for cybersecurity, information governance, and eDiscovery professionals navigating the complexities of legal document review. With the rapid advancements in AI technology, understanding the benefits and limitations of manual review, TAR, and generative AI is crucial for optimizing review processes. The added discussion on human interaction highlights the necessity of oversight and the innovative approaches to integrating AI in a balanced, iterative manner, ensuring that human judgment remains a critical component in legal processes.
eDiscovery Review in Transition: Manual Review, TAR, and the Role of AI
ComplexDiscovery Staff
The ever-evolving world of legal document review has transformed significantly with the advent of new technologies and methodologies. Traditionally, legal teams relied on manual review, a meticulous and labor-intensive process in which human reviewers examine every document to determine its relevance and responsiveness. While this method remains a cornerstone of thorough and defensible legal work, it is often slow and expensive, especially in the face of increasingly large data sets.
To address these challenges, Technology-Assisted Review (TAR) emerged as a groundbreaking solution. TAR leverages machine learning algorithms to assist in identifying relevant documents, drastically improving efficiency and reducing the time and cost associated with manual review. Over time, TAR has evolved into more sophisticated iterations, enhancing the review process’s accuracy and speed.
Recently, Generative AI has entered the field as the latest innovation, promising even greater efficiencies. Unlike TAR, which requires a training phase, Generative AI can process documents with minimal initial input, offering rapid results. However, this technology is still in its early stages, and its acceptance within the legal community faces hurdles, particularly regarding judicial approval and the need for human oversight.
Each of these methodologies—manual review, TAR, and Generative AI—has its own advantages and challenges. Understanding the nuances of each approach is essential for legal professionals who must navigate complex legal requirements, manage costs, and ensure that their review processes are thorough and defensible. As technology advances, choosing the right method for a given case becomes increasingly critical, requiring a careful balance of speed, accuracy, and human judgment.
Manual Review: The Traditional Approach
Manual linear review is the most traditional method: a team of reviewers examines every document to determine its relevance, responsiveness, and other legal factors such as privilege. Although time-tested and universally accepted by courts, this method is slow and expensive. Its advantage is simplicity: the review can be coordinated without advanced technology, making it a go-to for many legal teams. The major drawback is the considerable time and cost involved, particularly for large datasets.
Technology-Assisted Review (TAR): A Machine Learning Approach
TAR represents a significant leap forward in document review, leveraging machine learning to increase efficiency. The first iteration, often called TAR 1, involves an attorney reviewing a subset of documents—typically 5,000 to 20,000. This subset trains a machine learning model to differentiate between responsive and non-responsive documents. Once trained, the model can classify the rest of the document set, significantly reducing the manual workload.
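In code terms, TAR 1 is a single train-then-classify pass. The sketch below is purely illustrative, not any vendor’s implementation; it assumes a scikit-learn-style pipeline, and the variable names and 0.5 cutoff are arbitrary choices.

```python
# Minimal sketch of the TAR 1 pattern: train once on an attorney-reviewed
# seed set, then classify the rest of the corpus in a single pass.
# Illustrative only -- the pipeline, names, and threshold are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def tar1_review(seed_docs, seed_labels, remaining_docs, threshold=0.5):
    """seed_docs: attorney-reviewed texts; seed_labels: 1 = responsive, 0 = not."""
    vectorizer = TfidfVectorizer(max_features=50_000)
    X_seed = vectorizer.fit_transform(seed_docs)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_seed, seed_labels)

    # Score every unreviewed document and flag the likely responsive ones.
    X_rest = vectorizer.transform(remaining_docs)
    scores = model.predict_proba(X_rest)[:, 1]
    return [(doc, score) for doc, score in zip(remaining_docs, scores)
            if score >= threshold]
```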
An evolved form, TAR 2, introduces Continuous Active Learning®, in which the model is refined iteratively. After the initial training, the model ranks the documents, reviewers code the top-ranked documents for relevance, and those decisions are fed back in as additional training data. As the review progresses, the model continues to learn and re-ranks the remaining documents. This iterative process typically results in higher accuracy, making TAR 2 a preferred choice when quality is paramount.
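The loop behind TAR 2 can be sketched the same way: rank the pool, have reviewers code the top-ranked batch, retrain, and repeat. This is a simplified illustration rather than the trademarked CAL® protocol; the human_review callback, batch size, and stopping rule are all assumptions.

```python
# Simplified continuous-active-learning loop (illustrative only, not the
# trademarked CAL(R) protocol). human_review(doc) stands in for attorney
# coding and returns 1 (responsive) or 0 (non-responsive).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def cal_review(seed_docs, seed_labels, pool, human_review,
               batch_size=100, max_rounds=50):
    reviewed_docs, reviewed_labels = list(seed_docs), list(seed_labels)
    pool = list(pool)
    vectorizer = TfidfVectorizer(max_features=50_000).fit(reviewed_docs + pool)

    for _ in range(max_rounds):
        if not pool:
            break
        model = LogisticRegression(max_iter=1000)
        model.fit(vectorizer.transform(reviewed_docs), reviewed_labels)

        # Rank the unreviewed pool and surface the likeliest-responsive batch.
        scores = model.predict_proba(vectorizer.transform(pool))[:, 1]
        ranked = sorted(zip(pool, scores), key=lambda p: p[1], reverse=True)
        batch = [doc for doc, _ in ranked[:batch_size]]

        # Attorney decisions feed straight back into the training set.
        labels = [human_review(doc) for doc in batch]
        reviewed_docs += batch
        reviewed_labels += labels
        batch_set = set(batch)
        pool = [doc for doc in pool if doc not in batch_set]

        # Assumed stopping rule: halt once a batch yields few responsive docs.
        if sum(labels) < 0.05 * len(batch):
            break
    return reviewed_docs, reviewed_labels
```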
Building on the foundations of TAR 2, TAR 3, also known as Cluster-Centric CAL® (Continuous Active Learning®), represents another step in the evolution of technology-assisted review. TAR 3 incorporates advanced machine learning techniques and deep learning algorithms to refine the review process further. Unlike its predecessors, TAR 3 focuses on grouping similar documents into clusters based on content and context, allowing the review model to learn from these clusters more effectively. This cluster-centric approach enhances the model’s ability to adapt to complex datasets and improves its contextual understanding, leading to more accurate predictions.
TAR 3 still utilizes Continuous Active Learning® but does so within the framework of these document clusters, enabling more sophisticated analytics and model adjustments. This advanced iteration aims to deliver even higher recall and precision rates than its predecessors, offering a more reliable and thorough document review.
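The cluster-centric idea can be pictured as a pre-step to that loop: group documents by content, then draw review exemplars from each cluster so the model sees the full variety of the collection early. The k-means clustering and every parameter below are illustrative assumptions, not a description of any TAR 3 product.

```python
# Hypothetical cluster-centric seeding step (an assumption for illustration,
# not the TAR 3 implementation): cluster the corpus, then route the documents
# nearest each cluster centroid to reviewers first.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_seed(docs, n_clusters=25, per_cluster=5):
    X = TfidfVectorizer(max_features=50_000).fit_transform(docs)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)

    seeds = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        # Exemplars: the documents closest to this cluster's centroid.
        dists = km.transform(X[members])[:, c]
        seeds.extend(members[np.argsort(dists)[:per_cluster]].tolist())
    return seeds  # indices of documents to route to reviewers first
```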
However, even with these advancements, TAR (particularly TAR 1) can still be less precise and may require additional manual oversight, especially when dealing with complex datasets. TAR 2 and TAR 3 improve on this, achieving recall rates (the share of relevant documents actually identified) of up to 90%, comparable to or better than manual review.
Generative AI: The Cutting-Edge Solution
Generative AI is the latest innovation in document review, offering an automated solution that requires less initial training than TAR. In this approach, the review protocol designed for human reviewers is fed directly into a generative AI model, which then processes the documents. The primary advantage of generative AI is its speed. A review process that might take weeks or months using manual review or TAR can be completed in a fraction of the time with generative AI.
Despite its efficiency, generative AI comes with its own set of challenges. It is currently more expensive on a per-document basis than TAR, with costs reaching 60 to 70 cents per document under some pricing models. The trade-off is significant time savings. For instance, reviewing 130,000 documents might take 27 days manually, 10 days using TAR 1, and 18 to 20 days with TAR 2 or TAR 3, but only about 5 days with generative AI.
Regarding accuracy, generative AI can achieve recall rates comparable to TAR 1 right out of the box, usually in the 70% range. With further testing and iteration, it can reach recall rates in the 90% range, rivaling TAR 2 and 3 but in a much shorter time frame.
The effectiveness of Generative AI in review is also contingent on the profile of the documents being analyzed. Documents that are too small, too large, overly structured, or under-structured may present situations where generative AI is not the optimal review option. This variability necessitates careful consideration of document characteristics when deciding whether to deploy generative AI for a review task.
The Role of Human Interaction and Validation
While the efficiency and speed of TAR and generative AI are clear, the importance of human interaction and validation in the document review process cannot be overstated. Human reviewers play a critical role in ensuring that the results produced by these technologies are accurate and aligned with legal standards.
In manual review, humans are directly responsible for assessing each document, making this process inherently reliant on human judgment. With TAR and generative AI, human involvement shifts towards the oversight and validation stages. For TAR, this means verifying that the machine learning model has been trained correctly and is accurately classifying documents. For generative AI, it involves ensuring that the AI’s decisions are sound and consistent with the legal guidelines.
Generative AI also introduces three distinct approaches that integrate varying levels of human oversight:
- GenAI Autonomous Review: The generative AI model operates almost independently in this approach. The review protocol is the only guidance given to the system before it begins the review, and the AI also conducts the Quality Control (QC) of the reviewed documents. This method emphasizes managing the review technology rather than the review process itself. While it offers maximum efficiency, it may carry risks if the AI’s decisions are not sufficiently validated by human reviewers.
- GenAI Assisted Review: This approach blends AI and human expertise more closely. A training set of human-reviewed documents supplements the review protocol to initiate the review. Additionally, the QC process involves both human reviewers and the AI system, ensuring the AI’s output is thoroughly validated. This method strikes a balance between efficiency and accuracy, with human oversight enhancing the reliability of the AI’s decisions.
- GenAI Assisted Review – Iterative: The most holistic of the three, this approach builds on GenAI Assisted Review by incorporating iterative review rounds (a simplified sketch follows this list). The results are assessed after the initial AI review, and the AI model is further refined based on feedback. This iterative process continues until the desired recall and precision percentages are achieved. By combining human oversight, technology enablement, and iterative processes, this approach offers a robust solution that maximizes the efficacy of the AI while minimizing the risk of errors.
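The iterative approach can be pictured as a loop that alternates an AI classification pass with a human QC sample until a recall target is met. Everything in this sketch is hypothetical: llm_classify and human_review are stand-in callbacks rather than a real vendor API, and the protocol-refinement step is a naive placeholder.

```python
import random

# Hypothetical sketch of GenAI Assisted Review -- Iterative.
# llm_classify(protocol, text) and human_review(text) are stand-ins that
# return 1 (relevant) or 0 (not relevant); neither is a real API.

def iterative_genai_review(docs, protocol, llm_classify, human_review,
                           target_recall=0.90, sample_size=200, max_rounds=5):
    """docs maps document IDs to text; protocol is the written review protocol."""
    for _ in range(max_rounds):
        # AI pass: the review protocol is the model's only instruction set.
        calls = {doc_id: llm_classify(protocol, text)
                 for doc_id, text in docs.items()}

        # Human QC: attorneys re-review a random sample of the AI's decisions.
        sample = random.sample(list(docs), min(sample_size, len(docs)))
        truth = {doc_id: human_review(docs[doc_id]) for doc_id in sample}

        relevant = [d for d in sample if truth[d]]
        recall = (sum(1 for d in relevant if calls[d]) / len(relevant)
                  if relevant else 1.0)
        if recall >= target_recall:
            break

        # Naive placeholder for refinement: show the model what it missed.
        missed = [d for d in relevant if not calls[d]]
        protocol += "\nPreviously missed examples:\n" + "\n".join(
            docs[d][:200] for d in missed)
    return calls, recall
```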
Choosing the Right Method
Selecting the appropriate document review method depends on the case’s specific needs, including factors like the size of the document set, budget, time constraints, and the required level of precision. While expensive and slow, manual review remains the gold standard for thoroughness. TAR balances efficiency and accuracy, particularly when using the TAR 2 model. The judiciary has widely accepted both linear review and TAR, ensuring their defensibility in legal proceedings. However, generative AI, though still in its early stages, promises to revolutionize the field by offering speed and high recall rates, albeit at a higher cost. It is important to note that while generative AI holds great potential, there are still acceptance hurdles to be crossed within the judiciary as courts and legal professionals continue to evaluate its reliability and effectiveness.
As generative AI technology continues to develop, it is likely to become more cost-effective and even more accurate, further solidifying its role in the future of legal document review. Legal professionals must stay informed about these advancements to make the most strategic decisions in their review processes.
Background: Understanding Recall and Precision
In any document review process, particularly in eDiscovery, the effectiveness of the review is commonly measured by two key metrics: recall and precision. These metrics are essential for assessing the quality of the review and ensuring that the legal process is both thorough and accurate.
Recall measures how comprehensive the document review is in identifying relevant documents. Specifically, it represents the percentage of truly relevant documents correctly identified as relevant during the review process. A high recall score indicates that most of the relevant documents in the collection were successfully found and classified as relevant. For example, a recall score of 80% would mean that 80% of all the relevant documents were correctly identified, while 20% were missed.
The importance of recall lies in its ability to reduce the risk of missing critical evidence. In legal contexts, failing to identify relevant documents can lead to incomplete discovery, which may impact a case’s outcome. Therefore, ensuring a high recall rate is often prioritized, particularly in cases where comprehensiveness is crucial.
Precision measures the accuracy of the document review in correctly classifying documents as relevant. It is the percentage of documents identified as relevant that are actually relevant. A high precision score means that most of the documents marked as relevant are truly relevant, with few false positives. For example, if the precision score is 90%, it means that 90% of the documents classified as relevant are actually relevant, while 10% are irrelevant documents incorrectly marked as relevant.
Precision is critical for controlling the costs and efficiency of the review process. High precision ensures that the time and resources spent reviewing documents are focused on genuinely relevant materials, reducing the burden of sifting through irrelevant documents.
The key differences between recall and precision can be summarized as follows:
- Recall focuses on finding all relevant documents within the dataset. High recall reduces the risk of missing important documents but may include more false positives.
- Precision focuses on the accuracy of relevance classifications. High precision ensures that most documents marked as relevant are indeed relevant, but it may result in missing some relevant documents.
In practice, there is often a trade-off between optimizing for recall and precision. High recall might come at the cost of lower precision, leading to more irrelevant documents being reviewed, whereas high precision might reduce the number of irrelevant documents but at the risk of missing some relevant ones.
Importance in eDiscovery
Both recall and precision are crucial for evaluating the effectiveness of a document review process, particularly in the context of eDiscovery:
- Recall helps ensure comprehensive discovery, reducing the risk of missing key evidence that could be vital to the case.
- Precision helps control review costs by minimizing the number of irrelevant documents requiring manual review, thus streamlining the process.
Courts may examine both recall and precision to assess the defensibility and proportionality of document review processes. A balance between these metrics is often necessary to achieve an effective and efficient document review.
The F1 score is commonly used to provide an overall measure of review accuracy. It is the harmonic mean of recall and precision, combining the two into a single metric that gives equal weight to both and helps balance the trade-offs between them. A high F1 score indicates a well-rounded and effective document review process.
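In formula terms, recall = TP / (TP + FN) and precision = TP / (TP + FP), where TP, FP, and FN count true positives, false positives, and false negatives, and F1 is the harmonic mean of the two. A minimal, self-contained illustration with hypothetical sample data:

```python
# Recall, precision, and F1 from a validation sample.
# truth[i] is the attorney's call, predicted[i] is the review tool's call
# (1 = relevant, 0 = not relevant); the sample data below is made up.

def review_metrics(truth, predicted):
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)      # relevant and found
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)  # relevant but missed
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)  # flagged but irrelevant

    recall = tp / (tp + fn) if tp + fn else 0.0     # share of relevant docs found
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged docs truly relevant
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of the two
    return recall, precision, f1

# Hypothetical sample: 8 of 10 relevant documents found, 2 false positives.
truth     = [1] * 10 + [0] * 10
predicted = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
print(review_metrics(truth, predicted))  # (0.8, 0.8, 0.8)
```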
News Sources
This article is based on an interview with John Brewer, Chief Artificial Intelligence Officer and Chief Data Scientist of HaystackID, conducted on August 14, 2024, at ILTACON, as well as industry research, panel presentations, and anecdotal discussions on the topic during the conference.
Additional Reading
- The Workstream of eDiscovery: Considering Processes and Tasks
- ABA Issues Ethical Guidelines for Integrating AI in Legal Practice
Continuous Active Learning® and CAL® are registered trademarks of Maura Grossman and Gordon V. Cormack.
Source: ComplexDiscovery OÜ
Assisted by GAI and LLM Technologies per EDRM GAI and LLM Policy.