Claude 3: A First Look at this Exciting New Technology

Image: John Tredennick with AI.

[EDRM Editor’s Note: EDRM is happy to amplify our Trusted Partners news and events. The opinions and positions are those of John Tredennick and Dr. William Webber.]

On March 5, 2024, Anthropic announced the release of three powerful versions of Claude 3. Early reports here and elsewhere suggest these new large language models may give GPT 4 a run for its money both in terms of performance and cost.

We loaded Claude Opus and Claude Sonnet into the lab version of DiscoveryPartner for initial testing before rolling them out for production. To put it mildly, we were impressed by the results.

Here is a look at what we found.

John Tredennick and Dr. William Webber, Merlin Search Technologies.

Here is a rundown of the three versions:

  • Claude 3 Opus is Anthropic’s most powerful model, with top-level performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding.
  • Claude 3 Sonnet strikes a balance between intelligence and speed. It delivers reliable, high-performing workloads at a lower price than its peers, and is engineered to be the dependable workhorse for scaled AI deployments.
  • Claude 3 Haiku is their fastest, most compact model for near-instant responsiveness. It answers simple queries and requests with unmatched speed, and also includes image-to-text vision capabilities that are comparable with other state-of-the-art vision models.

Topic Creation

We loaded more than 3,000 documents into our test bed and then created the following topic:

What are the key issues around this topic: Slot Machines — All documents concerning the definition, legality, and licensing of “slot machines” in Florida.

List the main points mentioned about definition, legality and licensing.

This topic relates to Jeb Bush’s emails, which he made public when running for President. The program managers for the NIST annual text retrieval conferences created a list of more than 30 topics to test different machine learning algorithms’ ability to quickly find (but not analyze) relevant documents. We now use the set against the full two million emails Bush made available to show speed and scalability. Our system not only finds relevant documents instantly but also uses integrated GenAI algorithms to analyze and report on them.

Running AI Powered Searches

The first step with our system was to run several types of searches. One used a Natural Language Processing methodology, which created embeddings for each of the document sections and then for the topic query.

The second was syntactical, using AI to analyze for keyword similarity. We combined the rankings and promoted the top 100 documents for analysis, summarization and ranking. Let me note that all of this was done automatically in a few seconds by the system after I entered the prompt.
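The article does not say how the two rankings were combined. One common approach is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of DiscoveryPartner. The function name, the constant k=60, and the document IDs are all hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60, top_n=100):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a value
    taken from the article.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID so the
    # output is deterministic.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:top_n]

# Hypothetical document IDs, for illustration only.
embedding_ranking = ["doc_a", "doc_b", "doc_c"]  # semantic (embedding) search
keyword_ranking = ["doc_b", "doc_d", "doc_a"]    # keyword-similarity search
promoted = reciprocal_rank_fusion([embedding_ranking, keyword_ranking], top_n=3)
```

A document that ranks well in both lists (doc_b here) rises above one that ranks first in only one of them, which is the usual reason to fuse semantic and keyword results before promoting a fixed number of documents.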

You can see a list of the documents we retrieved here:

Document Summaries
Image: Merlin Search Technologies

In this case Claude created the titles based on its summary of the document. It also provided a relevance score for the document. For comparison purposes, we show the results of our initial search (Search rank) as well. In this case the search ranking pretty well matched Claude’s judgments (at least for these documents). Think of Relscore as the virtual human judgment and Search rank as the initial sorting of the documents by the AI-based search engines.

Document Summaries

Here is a look at an AI-generated summary for the top ranked document.

100/511 Summarized
Image: Merlin Search Technologies

We used the mid-grade Claude 3 Sonnet model for this work. Sonnet offers a good mix of intelligence and speed at a reasonable cost, and our initial testing suggests it is a cost-effective choice for summarization work.

We now include paragraph references within the summary. These references tie to the actual text of the underlying document, allowing a reader to quickly match the summary statement to the relevant text in the document section.

Here is the link to paragraph 18.

Document Summaries
Image: Merlin Search Technologies

Our users find these linked references a helpful timesaver when reviewing source links for synthesis reports and the underlying summaries. Of course they always have quick access to the full text of the document either in text or native format.
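For readers curious how such references can be resolved, here is a minimal sketch that expands a marker like [¶10-¶11] into paragraph numbers and looks up the matching text. It assumes paragraphs are numbered from 1 and separated by blank lines, which the article does not state; both function names are hypothetical.

```python
import re

def parse_paragraph_refs(ref):
    """Expand a reference such as '[¶4-¶5¶8¶14-¶15]' into paragraph numbers."""
    numbers = []
    # Each match is a single paragraph (¶8) or an inclusive range (¶4-¶5).
    for start, end in re.findall(r"¶(\d+)(?:\s*-\s*¶(\d+))?", ref):
        first = int(start)
        last = int(end) if end else first
        numbers.extend(range(first, last + 1))
    return numbers

def paragraphs_for(ref, document_text):
    """Return {paragraph number: text} for every paragraph the reference names.

    Assumes blank-line-separated paragraphs numbered from 1; numbers that
    fall outside the document are skipped.
    """
    paragraphs = [p.strip() for p in document_text.split("\n\n") if p.strip()]
    return {
        n: paragraphs[n - 1]
        for n in parse_paragraph_refs(ref)
        if 1 <= n <= len(paragraphs)
    }

# Hypothetical three-paragraph document.
sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
linked = paragraphs_for("[¶2-¶3]", sample)
```

The same parser handles single references, ranges, and several references run together in one bracket.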

Claude 3 Opus for Reporting

The Opus model for Claude 3 is their top-of-the-line GPT 4 competitor. Like the GPT 4 series, it is more expensive (likely owing to the additional compute power required) and slower than the lesser models. While we could use it for summarization, it has felt, and continues to feel, like overkill for that part of the work. Certainly the summaries would take longer to create. They would be five times more expensive as well![1]
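The five-times figure is consistent with Anthropic’s list prices at launch: $15 and $75 per million input and output tokens for Opus, against $3 and $15 for Sonnet. The sketch below works through the arithmetic; the per-document token counts are hypothetical and chosen only for illustration.

```python
# Anthropic list prices at the Claude 3 launch, in USD per million tokens.
PRICES = {
    "claude-3-opus": {"input": 15.00, "output": 75.00},
    "claude-3-sonnet": {"input": 3.00, "output": 15.00},
}

def summarization_cost(model, docs, input_tokens_per_doc, output_tokens_per_doc):
    """Estimate the cost in USD of summarizing `docs` documents with `model`."""
    price = PRICES[model]
    input_cost = docs * input_tokens_per_doc * price["input"] / 1_000_000
    output_cost = docs * output_tokens_per_doc * price["output"] / 1_000_000
    return input_cost + output_cost

# Hypothetical workload: 100 documents, 2,000 tokens in and 300 tokens out each.
sonnet = summarization_cost("claude-3-sonnet", 100, 2_000, 300)
opus = summarization_cost("claude-3-opus", 100, 2_000, 300)
ratio = opus / sonnet
```

Because both the input and the output price differ by the same factor of five, the ratio holds whatever the mix of input and output tokens; only the absolute dollar amounts change with the workload.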

So, when it comes to the synthesis and reporting function, we use the top model, whether it is Claude Opus or GPT 4 Turbo. For this work, Claude Opus was required to read 100 summaries, analyze the information they contained and write a comprehensive report.

Let’s see how Opus did:

What are the key issues around this topic: Slot Machines — All documents concerning the definition, legality, and licensing of “slot machines” in Florida. 

List the main points mentioned about definition, legality and licensing.
DP
2024-03-06 16:49:11
claude-3-opus
Summaries used: 100

Based on the document summaries provided, the key issues around the definition, legality, and licensing of slot machines in Florida are as follows:

Definition:

  • Florida law broadly defines slot machines, potentially including both Class II and Class III machines (Document 256022, Document 256099).
  • Class II machines are computerized bingo games, while Class III machines use random number generators to determine outcomes (Document 203072, Document 203076).
  • Slot machines can have features like bonus rounds, multiple win lines, embedded bill acceptors, and recognizable game themes (Document 203072, Document 203076).
  • There is debate over whether machines that involve an element of skill, such as allowing players to stop spinning reels, qualify as slot machines or “amusement machines” (Document 253557, Document 253570, Document 253575).

Legality:

Licensing and Regulation:

Key people and organizations involved in this issue include:

The main time period covered in these documents is from 2004 to 2006, with a focus on the aftermath of the 2004 constitutional amendment allowing slot machines in Broward and Miami-Dade counties. 

For our money, this was a pretty good report, one generated in a couple of minutes. We found Claude Opus to be quick in providing its answer, certainly on par with GPT 4 Turbo, which was our past favorite. The writing was first rate and the answer comprehensive.

In our system, we provide links from report statements to the underlying summary and, if desired, to the underlying source text. You can see an example here for Document ID 250887.

Image: Merlin Search Technologies

As you can see, Claude reported the following:

Broward County drafted regulations to monitor slot machines, audit revenues, and prevent criminal involvement in the absence of state regulations (Document 250887).

It was based on the underlying summary which reported:

1. Broward County released draft guidelines outlining how the county’s two racetracks, a dog track, and a jai-alai fronton can operate casinos with slot machines. [¶3]

2. The draft regulations aim to monitor the fairness of slot machines, audit the casinos’ revenue for taxation, and ensure no one with a criminal background is involved. [¶10-¶11]

3. The regulations were drafted after the Florida Legislature failed to pass slot machine rules during its regular session, despite voters approving a constitutional amendment allowing slot machines at Broward and Miami-Dade parimutuels. [¶4-¶5, ¶8, ¶14-¶15]

4. A Broward County Circuit Court judge ordered the County Commission to create operating regulations after parimutuels sued due to the lack of state guidelines. [¶8-¶9]

5. While parimutuel owners prefer state-issued regulations, they appreciate Broward County’s effort to move the process forward. [¶12-¶13]
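The link from a report statement back to its summary depends on the document IDs cited in parentheses. As a rough sketch of how those citations could be extracted and matched, assuming the “Document NNNNNN” format seen in the report above (the function names and the summaries dictionary are hypothetical):

```python
import re

def cited_documents(statement):
    """Return the document IDs cited in a report statement, in order."""
    # Tolerates citations run together, e.g. 'Document 256022Document 256099'.
    return [int(doc_id) for doc_id in re.findall(r"Document\s+(\d+)", statement)]

def link_statement(statement, summaries):
    """Map each cited document ID to its summary, skipping unknown IDs."""
    return {
        doc_id: summaries[doc_id]
        for doc_id in cited_documents(statement)
        if doc_id in summaries
    }

statement = (
    "Broward County drafted regulations to monitor slot machines, audit "
    "revenues, and prevent criminal involvement in the absence of state "
    "regulations (Document 250887)."
)
# Hypothetical store of summaries keyed by document ID.
summaries = {250887: "Broward County released draft guidelines ..."}
links = link_statement(statement, summaries)
```

Skipping IDs that have no stored summary keeps a stray or mistaken citation from breaking the report view.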

Claude Opus also gave me a nice timeline of events.

Image: Merlin Search Technologies

While a good associate could do this work, they might take a bit longer than the 41 seconds it took Claude Opus to read the 100 summaries and provide this timeline.

What Do We Think?

Color me excited. GPT 4 now has a viable competitor at the top end of the LLM market. I have no doubt that Sam Altman and his crew are working hard on their response to these powerful new LLMs. I also don’t doubt that OpenAI or maybe even Google will come back with even more powerful and cheaper models to regain the throne.

Are the Claude 3 models better than the GPT 4 line? I can’t say, at least not yet. My work with Claude Opus tells me it is every bit as good as GPT 4 and is offered at a better price. I am particularly pleased with their decision to offer three versions of the model, allowing our users to choose which combination of capabilities and pricing matches the task at hand. Up until now, we were offering models from different vendors to provide that flexibility.

Many readers will note that I did not try to compare the Slot machine summary and report shown above to one produced by GPT 4 under similar circumstances. They would be right. That is work our prompt engineers will be doing in the lab, and which we may report on at a later time. And, I note that Anthropic published benchmark testing which suggested that Claude 3 matches or exceeds the equivalent OpenAI and Google offerings.

For now, my purpose was to give you a quick look at Anthropic’s exciting new Large Language Models. What I can say for sure is that Claude 3 is the real deal, one that we will be looking to integrate into DiscoveryPartner if our testing confirms my initial impressions.

Assisted by GAI and LLM Technologies per EDRM GAI and LLM Policy.

Notes


[1] I should note that GPT 4 (32k) can be 40 to 50 times more expensive than our prior summarization tool of choice, Claude Instant (100k). The LLM choice can be important, at least when the bill comes around.

Authors

  • John Tredennick

    John Tredennick (JT@Merlin.Tech) is the CEO and founder of Merlin Search Technologies, a cloud technology company that has developed Sherlock®, a revolutionary machine learning search algorithm. Prior to founding Merlin Search Technologies, Tredennick had a distinguished career as a trial lawyer and litigation partner at a national law firm. With his expertise in legal technology, he founded Catalyst in 2000, an international e-discovery search technology company that was later acquired by a large public company in 2019. Tredennick's extensive experience is evident through his authorship and editing of eight books and numerous articles on legal technology topics. He has also served as Chair of the ABA's Law Practice Management Section.

  • William Webber

    Dr. William Webber (wwebber@Merlin.Tech) is the Chief Data Scientist of Merlin Search Technologies. With a PhD in Measurement in Information Retrieval Evaluation from the University of Melbourne, Dr. Webber is a leading authority in AI and statistical measurement for information retrieval and ediscovery. He has conducted post-doctoral research at the E-Discovery Lab of the University of Maryland and has over 30 peer-reviewed scientific publications in the areas of information retrieval, statistical evaluation, and machine learning. Dr. Webber has nearly a decade of industry experience as a consulting data scientist for ediscovery software vendors, service providers, and law firms.
