Announcing Gen II: Merlin’s Second-Generation Generative AI Platform

[EDRM Editor’s Note: EDRM is happy to amplify our Trusted Partners news and events. The opinions and positions are those of John Tredennick and Dr. William Webber.]


We are excited to announce the upcoming release of Gen II, our second-generation GenAI platform. We released the first generation of our multi-LLM-integrated DiscoveryPartner platform in early November of last year. No, not as a beta, a limited release, or an announcement that it was coming “real soon now.” Rather, and perhaps unexpectedly, as a production release to real clients.

Gen I: Advanced RAG Architecture

Our initial deployment was the first and still the only multi-LLM system in the legal market. It allows users to choose from different Large Language Models (e.g., Claude and GPT) for different tasks and price points. Gen I was the first and only system to summarize documents before synthesizing and reporting against them, which allowed users to analyze as many as 300 documents at a time (rather than being limited to an answer based on four or five documents selected by the vendor’s system). It then synthesized across hundreds of summaries, using one of the GPT-4 series models to do this work.

We offered Claude Instant (100k) for document summaries because Claude Instant was about 50 times faster and 50 times less expensive than GPT-4 (32k). And we put our users in the driver’s seat, allowing them to use our lightning-fast Sherlock Machine Learning algorithms to find and choose the relevant documents to be analyzed. Indeed, you could load our special Analyze folders with hundreds of documents and let GenAI review and rank each to determine which were most responsive to a particular topic of inquiry.
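
To make the pipeline concrete, here is a minimal sketch, in Python, of the summarize-then-synthesize pattern described above. This is an illustration only, not Merlin’s code: the call_llm helper and the model names are hypothetical stand-ins for real API calls to a fast, inexpensive summarizer (such as Claude Instant) and a stronger synthesis model (such as GPT-4).

    # Hypothetical sketch of a summarize-then-synthesize pipeline.
    def call_llm(model: str, prompt: str) -> str:
        """Placeholder for a real LLM API call; wire up your own client."""
        raise NotImplementedError

    def summarize_documents(docs: list[str], topic: str) -> list[str]:
        # A fast, inexpensive model compresses each document into a
        # topic-focused summary.
        return [
            call_llm("fast-summarizer",
                     f"Summarize this document as it relates to: {topic}\n\n{doc}")
            for doc in docs
        ]

    def synthesize(summaries: list[str], topic: str) -> str:
        # Hundreds of compact summaries fit where a handful of full
        # documents would not; a stronger model synthesizes across them.
        joined = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(summaries))
        return call_llm("strong-synthesizer",
                        f"Using only these summaries, answer: {topic}\n\n{joined}")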

Here is an example of DiscoveryPartner’s ability to quickly summarize 100 documents (using Claude Instant) and then send them to GPT-4 for synthesis and analysis, with the whole process taking about four minutes.

Gen I version of DiscoveryPartner showing summarization and analysis of 100 documents.

Gen I was the first and so far only platform with these capabilities. But we didn’t stop there.

Gen II: Scalable Enhanced RAG Architecture for All Types of Files

By December, we were working on a second generation of the platform, one that allowed us to address several key components in Gen I that we thought could be improved with another round of development. Gen II offers almost limitless scalability, along with the ability to handle multiple file types, from large and small documents to transcripts, text messaging, medical records, and even articles and reports.

We have done it and are now rolling it out in DiscoveryPartner. Here is a quick summary of the exciting new features included in Gen II.

1. Chunking: Automatically Breaking Large Documents into Sections

With Gen I we summarized documents before sending them for synthesis. When we started working with transcripts, larger text volumes, and even medical records, we realized that we had to break the documents into smaller sections. This was required to properly capture the range of information needed for initial search, for our summaries, and ultimately for answering questions. Once we saw how effective this was for transcripts, we decided to section all large documents regardless of form.

Gen II showing a section of a January 6th House Investigation Transcript.


The benefit is that Gen II can handle any type of unstructured text file, from email to large files, transcripts, chat messaging (IM), medical records, and even lengthy financial reports. You can put different types of documents in different folders for analysis, or put them all in one folder. Gen II will handle them either way.
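
As an illustration of the idea (a sketch, not Merlin’s actual algorithm), a sectioning step can be as simple as splitting on paragraph boundaries and greedily packing paragraphs into sections under a size budget:

    def chunk_into_sections(text: str, max_words: int = 1500) -> list[str]:
        # Split on blank lines, then greedily pack paragraphs into
        # sections under the word budget. A production system would also
        # respect headings, page breaks, or speaker turns in transcripts.
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        sections, current, count = [], [], 0
        for para in paragraphs:
            words = len(para.split())
            if current and count + words > max_words:
                sections.append("\n\n".join(current))
                current, count = [], 0
            current.append(para)
            count += words
        if current:
            sections.append("\n\n".join(current))
        return sections

The word budget here (1,500) is illustrative; the right size depends on the summarizer’s context window.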

2. Dividing Sections into Paragraphs

Our next step was to create internal paragraph references within the sections that could be linked to specific statements in the summary. The goal here was to make it easier to find relevant passages in the underlying section text. We highlight the relevant paragraphs, as in this result, where clicking on paragraph 14 in the summary surfaced the relevant text below:

Gen II showing links between summary and native text for email file

No special formatting is required prior to loading, even for larger documents, text messaging/chat, medical records, or transcripts.
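
One simple way to support such summary-to-paragraph links (a sketch under our own assumptions, not the product’s internal format) is to number each paragraph before summarization, ask the model to cite those numbers, and resolve clicks back to the numbered text:

    import re

    def number_paragraphs(section: str) -> tuple[str, dict[int, str]]:
        # Tag each paragraph with an index the LLM can cite, keeping a
        # lookup table so a citation like [14] resolves to its text.
        paragraphs = [p for p in section.split("\n\n") if p.strip()]
        lookup = {i + 1: p for i, p in enumerate(paragraphs)}
        tagged = "\n\n".join(f"[{i}] {p}" for i, p in lookup.items())
        return tagged, lookup

    def resolve_citations(summary_line: str, lookup: dict[int, str]) -> list[str]:
        # Pull paragraph numbers like [14] out of a summary statement and
        # return the underlying paragraphs to highlight.
        refs = [int(n) for n in re.findall(r"\[(\d+)\]", summary_line)]
        return [lookup[n] for n in refs if n in lookup]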

3. Running Searches Before Creating Summaries

Perhaps the most important architectural change is this. When you start a new topic, we don’t automatically summarize all the documents in the folder. Instead, we have devised a powerful new search method, combining natural language search with algorithmic (ranked keyword) search, that can instantly search through the document sections (remember, we are breaking large documents into sections) and surface the most relevant ones for summarization and review.

Why two kinds of searches? Because we have found that each methodology has its strengths and weaknesses. Natural language search converts document text into vectors for a mathematical type of search, with the topic statement converted the same way. It can be effective in surfacing documents that don’t have matching keywords but are about the same concept. But it seems less effective when your initial topic focuses on certain keywords or names.

Gen II showing summaries of Chinese medical research documents (100 found and summarized out of 3,000).


We found that combining the rankings for both types of searches (using a proprietary weighting we developed) was pretty effective at surfacing the most relevant sections for analysis. And we have plans to take this even further with an additional technique that, for patent reasons, I can’t disclose yet but that should be available in Gen II.
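
Merlin’s weighting is proprietary, but the general pattern of blending two rankings is well known. Here is a minimal sketch, assuming each search returns an ordered list of section IDs, using weighted reciprocal-rank fusion with illustrative weights:

    def fuse_rankings(vector_hits: list[str], keyword_hits: list[str],
                      w_vector: float = 0.6, w_keyword: float = 0.4,
                      k: int = 60) -> list[str]:
        # Weighted reciprocal-rank fusion: each ranking contributes
        # weight / (k + rank) for every section it returns. These weights
        # are illustrative, not Merlin's proprietary values.
        scores: dict[str, float] = {}
        for weight, hits in ((w_vector, vector_hits), (w_keyword, keyword_hits)):
            for rank, section_id in enumerate(hits, start=1):
                scores[section_id] = scores.get(section_id, 0.0) + weight / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

Sections that rank well in both searches float to the top; sections found by only one search can still surface if they rank highly there.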

Why is this important? Because this will allow you to load thousands of documents in an Analyze folder (rather than a few hundred). You can analyze and report on a variety of different topics without being forced to summarize and re-summarize the entire population with each new topic. Rather, we will use our multi-layered search system to bring back your top candidates for summarization from the much larger pool.

We recognize that this sets the stage for asking questions against the entire document population, literally asking and getting answers from millions of documents. That functionality won’t make it into Gen II, but it is on our development plan for future sprints.

4. Extending the Inquiry

This is clearly the second most important feature in Gen II. Many of you know that a key architectural limitation of all Large Language Models is that they can only read and analyze a limited amount of text in one go. The limit is the size of the “Context Window.” In our articles and webinars we liken the context window to a whiteboard. The prompt we send goes on the whiteboard, which the LLM can read (but never remember or share). The LLM then responds to the topic question by writing on that whiteboard. When the answer is returned to the user, the whiteboard is erased.

Early versions of ChatGPT were limited to about 3,000 tokens (roughly 2,250 words) on the whiteboard. That has quickly evolved to 8k, 16k, 32k, 100k, and even 200k tokens (Claude has just been released in a new version with 200,000 tokens and a claim that it will go up to 1 million tokens). The key point is this: even with these larger context windows, you still won’t be able to load all your documents into the LLM for analysis at one time.
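
To see why even a 200k window runs out, a rough budget calculation helps. Here is a sketch assuming roughly 0.75 words per token and summaries of about 300 words each; these are illustrative numbers, not measurements from DiscoveryPartner:

    def summaries_that_fit(context_tokens: int, summary_words: int = 300,
                           words_per_token: float = 0.75,
                           reserve_tokens: int = 4000) -> int:
        # Reserve room for instructions and the model's answer, then see
        # how many summaries fit in what remains.
        tokens_per_summary = summary_words / words_per_token  # ~400 tokens
        return int((context_tokens - reserve_tokens) / tokens_per_summary)

    # A 200k window holds roughly 490 such summaries; a 32k window about 70.
    # Either way, a million-document collection cannot fit in one pass.
    print(summaries_that_fit(200_000), summaries_that_fit(32_000))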

So, we created an important workaround. Imagine that you create a topic and receive an answer based on the top 100 summaries (possibly out of thousands of documents). The answer seems on target but you find yourself wondering if there might be more information deeper in the results set, e.g. in the next 100 sections from the ranking. With Gen I, there wasn’t an easy way to find out. You might have 500 documents in a folder but the LLM can only read 200 of the summaries. What to do?

That was the reason for our Extend functionality. Simply click on “Extend” and the system will read and summarize the next XXX document sections in the rankings, synthesize/analyze across them, and report on any new information it finds. If indicated, you can invoke multiple extensions until you have the information you need or, at least, until there seems to be no further novel information in your document set on that topic.

From here you can easily ask the LLM to consider all of its preceding answers (original plus one or more extended answers) and prepare a comprehensive report across all of them. In this fashion, we have worked around a key architectural limitation inherent in Large Language Models.
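
In outline (a sketch of the pattern, not Merlin’s code, reusing the hypothetical call_llm, summarize_documents, and synthesize helpers from the Gen I sketch above), Extend is a loop over successive batches of ranked sections, followed by a final pass that synthesizes the accumulated answers:

    def answer_with_extensions(ranked_sections: list[str], topic: str,
                               batch_size: int = 100, max_batches: int = 3) -> str:
        # Each pass summarizes and analyzes the next batch down the
        # ranking; a final pass synthesizes across all partial answers.
        answers = []
        for i in range(max_batches):
            batch = ranked_sections[i * batch_size:(i + 1) * batch_size]
            if not batch:
                break
            summaries = summarize_documents(batch, topic)
            answers.append(synthesize(summaries, topic))
        combined = "\n\n".join(f"Answer {n + 1}: {a}" for n, a in enumerate(answers))
        return call_llm("strong-synthesizer",
                        f"Combine these partial answers into one report on: {topic}"
                        f"\n\n{combined}")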

5. Search Results Graph

Careful readers who have reached this point might ask: How do I know when to extend my analysis? How deep do I need to go? How do I know that there is no more important information on my topic in my document set?

This is a tough but important question. In Gen II we set out to help our more advanced users answer it through the addition of a Relscore chart (accessible from the summaries list).

Gen II relevance score chart suggesting that extended search has found most of the relevant files.

What’s going on here? We start with the search ranking. In this case we have summarized and reported on the top 200 results out of 511 sections. We used the first 100 results for the original answer and then extended the answer to cover the second batch of 100 documents. All 200 have been ranked by GPT-4 Turbo (which is like having a human review and rank each document returned from your search).

What we see is that about 65% of the first 25 documents returned from our initial search were highly relevant to our topic, as rated by GPT-4 Turbo (a rating of 60 is our relevance cutoff). We can also see that the relevance rate dropped significantly in the second batch of 25, and indeed in each of the next groups as well. Interestingly, there were some highly relevant documents in the next-to-last grouping but few in the last three groups.

What might this mean? It suggests to me that the user (me in this case) found many, if not most, of the relevant documents on our topic (“stand your ground” in this case). I could have extended my inquiry further, but it seems likely that there won’t be many more documents in this folder that will help me better understand and analyze my topic.
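
A chart like this reduces to a simple computation over the model’s relevance scores. Here is a sketch; the 60-point cutoff comes from the discussion above and the grouping of 25 matches the chart, but the function itself is illustrative rather than Merlin’s implementation:

    def relevance_by_bucket(scores: list[float], bucket: int = 25,
                            cutoff: float = 60.0) -> list[float]:
        # Walk down the search ranking in groups of 25 and compute the
        # percentage of documents at or above the relevance cutoff in
        # each group. A falling curve suggests further extensions will
        # add little new information.
        return [
            100.0 * sum(s >= cutoff for s in scores[i:i + bucket])
            / len(scores[i:i + bucket])
            for i in range(0, len(scores), bucket)
        ]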

Just for Fun: I did a further extension on this topic, asking the LLM to pull up the next 100 documents to see if there might be more information. Here is the answer I got:

Gen II extension request suggests most relevant documents have been found and analyzed.

We now have reasonable basis to conclude that there is no additional information relevant to the request concerning Florida bills related to “Stand Your Ground” laws or the use of deadly force to protect oneself or one’s property.

And here is the chart we created over the top 300 documents in the search ranking.

Gen II Relevance Ranking chart suggesting that no further extensions required for topic analysis.

Conclusion

Gen II represents a significant leap forward in our mission to harness the power of Large Language Models for legal applications. By addressing key limitations in Gen I and introducing innovative new capabilities, we have created a platform that is more scalable, versatile, and effective at surfacing relevant information from vast troves of unstructured data.

The ability to automatically chunk large documents into sections, divide those sections into paragraphs, and link them to specific statements in the summary provides unprecedented granularity and context for analysis. Our powerful new search method, which combines natural language search with algorithmic search, enables users to quickly surface the most relevant sections from thousands of documents without the need to summarize the entire population for each new topic.

Perhaps most excitingly, the Extend functionality allows users to dig deeper into the search results, uncovering additional relevant information that might otherwise have been missed due to the limitations of LLM context windows. And with the Search Results Graph, advanced users can now make informed decisions about when to extend their analysis and how deep to go based on the relevance ratings of the returned documents.

Gen II sets the stage for even more ambitious developments in the future, such as the ability to ask questions against entire document populations numbering in the millions. While that functionality won’t make it into this release, we are confident that the architectural changes and new features in Gen II will provide our users with an unparalleled tool for navigating the complex world of legal discovery.

As always, we remain committed to pushing the boundaries of what is possible with generative AI in the legal domain. With Gen II, we have taken another significant step towards realizing that vision, and we look forward to continuing to innovate and iterate in partnership with our clients.

Assisted by GAI and LLM Technologies per EDRM GAI and LLM Policy.

Authors

  • John Tredennick

    John Tredennick (JT@Merlin.Tech) is the CEO and founder of Merlin Search Technologies, a cloud technology company that has developed Sherlock®, a revolutionary machine learning search algorithm. Prior to founding Merlin Search Technologies, Tredennick had a distinguished career as a trial lawyer and litigation partner at a national law firm. With his expertise in legal technology, he founded Catalyst in 2000, an international e-discovery search technology company that was later acquired by a large public company in 2019. Tredennick's extensive experience is evident through his authorship and editing of eight books and numerous articles on legal technology topics. He has also served as Chair of the ABA's Law Practice Management Section.

  • Dr. William Webber

    Dr. William Webber (wwebber@Merlin.Tech) is the Chief Data Scientist of Merlin Search Technologies. With a PhD in Measurement in Information Retrieval Evaluation from the University of Melbourne, Dr. Webber is a leading authority in AI and statistical measurement for information retrieval and ediscovery. He has conducted post-doctoral research at the E-Discovery Lab of the University of Maryland and has over 30 peer-reviewed scientific publications in the areas of information retrieval, statistical evaluation, and machine learning. Dr. Webber has nearly a decade of industry experience as a consulting data scientist for ediscovery software vendors, service providers, and law firms.
