[EDRM Editor’s Note: EDRM is happy to amplify our Trusted Partners news and events. The opinions and positions are those of John Tredennick and Dr. William Webber.]
On March 5, 2024, Anthropic announced the release of three powerful versions of Claude 3. Early reports here and elsewhere suggest these new large language models may give GPT 4 a run for its money both in terms of performance and cost.
Here is a rundown of the three versions:
- Claude 3 Opus is the most powerful AI model, with top-level performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding.
- Claude 3 Sonnet strikes a balance between intelligence and speed. It delivers reliable and high-performing workloads at a lower price than its competitors, and is engineered to be the dependable workhorse for scaled AI deployments.
- Claude 3 Haiku is their fastest, most compact model for near-instant responsiveness. It answers simple queries and requests with unmatched speed, and also includes image-to-text vision capabilities that are comparable with other state-of-the-art vision models.
We loaded Claude Opus and Claude Sonnet into the lab version of DiscoveryPartner for initial testing prior to rolling them out for production. To put it mildly, we were impressed by the results.
Here is a look at what we found.
Topic Creation
We loaded more than 3,000 documents into our test bed and then created the following topic:
What are the key issues around this topic: Slot Machines — All documents concerning the definition, legality, and licensing of “slot machines” in Florida.
List the main points mentioned about definition, legality and licensing.
This topic relates to Jeb Bush’s emails, which he made public when running for President. The program managers for the NIST annual text retrieval conferences created a list of more than 30 topics to test different machine learning algorithms’ ability to quickly find (but not analyze) relevant documents. We now use the set against the full two million emails Bush made available to show speed and scalability. Our system not only finds relevant documents instantly but also uses integrated GenAI algorithms to analyze and report on them.
Running AI Powered Searches
The first step with our system was to run several types of searches. One used a Natural Language Processing methodology, which created embeddings for each of the document sections and then for the topic query.
The second was syntactical, using AI to analyze for keyword similarity. We combined the rankings and promoted the top 100 documents for analysis, summarization and ranking. Let me note that all of this was done automatically in a few seconds by the system after I entered the prompt.
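To make the "combine the rankings and promote the top 100" step concrete, here is a minimal sketch of one common way to merge two ranked lists, reciprocal rank fusion. The article does not say how DiscoveryPartner actually combines its rankings, so the fusion method, the function name, the constant k=60 and the sample document IDs are all assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=100):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked highly by both searches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is stable.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:top_n]


# Hypothetical results from the embedding search and the keyword search.
semantic = ["doc_a", "doc_b", "doc_c", "doc_d"]
keyword = ["doc_c", "doc_a", "doc_e"]

top = reciprocal_rank_fusion([semantic, keyword], top_n=3)
print(top)  # ['doc_a', 'doc_c', 'doc_b']
```

Rank-based fusion has the practical advantage that the two searches do not need comparable scores, only an ordering. A production system might instead weight or normalize the raw similarity scores.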
You can see a list of the documents we retrieved here:
In this case Claude created the titles based on its summary of the document. It also provided a relevance score for the document. For comparison purposes, we show the results of our initial search (Search rank) as well. In this case the search ranking pretty well matched Claude’s judgments (at least for these documents). Think of Relscore as the virtual human judgment and Search rank as the initial sorting of the documents by the AI-based search engines.
Document Summaries
Here is a look at an AI-generated summary for the top ranked document.
We used the mid-grade Claude 3 Sonnet model for this work. Sonnet offers a balanced mix of intelligence, speed and cost. Initial testing suggests it is a good choice for summarization work.
We now include paragraph references within the summary. These references tie to the actual text of the underlying document, allowing a reader to quickly match the summary statement to the relevant text in the document section.
Here is the link to paragraph 18.
Our users find these linked references a helpful timesaver when reviewing source links for synthesis reports and the underlying summaries. Of course they always have quick access to the full text of the document either in text or native format.
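As an illustration of how bracketed paragraph references can be tied back to source text, here is a minimal sketch that extracts the cited paragraph numbers from a summary. The reference format ([¶3], [¶10-¶11], [¶4-¶5, ¶8, ¶14-¶15]) is taken from the summary quoted later in this article. The function name and regular expressions are our own assumptions and are not DiscoveryPartner's actual implementation.

```python
import re

# A bracketed block containing at least one paragraph mark, e.g. "[¶4-¶5, ¶8]".
REF_BLOCK = re.compile(r"\[([^\]]*¶[^\]]*)\]")
# A single paragraph or a range inside a block, e.g. "¶8" or "¶14-¶15".
REF_ITEM = re.compile(r"¶(\d+)(?:\s*-\s*¶?(\d+))?")


def paragraph_refs(summary):
    """Return the sorted paragraph numbers cited in a summary."""
    cited = set()
    for block in REF_BLOCK.findall(summary):
        for start, end in REF_ITEM.findall(block):
            first = int(start)
            last = int(end) if end else first
            cited.update(range(first, last + 1))
    return sorted(cited)


sample = (
    "The regulations were drafted after the Florida Legislature failed "
    "to pass slot machine rules. [¶4-¶5, ¶8, ¶14-¶15]"
)
refs = paragraph_refs(sample)
print(refs)  # [4, 5, 8, 14, 15]
```

With the paragraph numbers in hand, a viewer could scroll to or highlight the matching paragraphs in the underlying document.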
Claude 3 Opus for Reporting
The Opus model for Claude 3 is their top-of-the-line GPT 4 competitor. Like the GPT 4 series, it is more expensive (likely owing to the additional compute power required) and slower than the lesser models. While we could use it for summarization, it has felt, and continues to feel, like overkill for that part of the work. Certainly the summaries would take longer to create. They would be five times more expensive as well![1]
So, when it comes to the synthesis and reporting function, we use the top model, whether it is Claude Opus or GPT 4 Turbo. For this work, Claude Opus was required to read 100 summaries, analyze the information they contained and write a comprehensive report.
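The five-fold cost difference mentioned above can be checked with simple arithmetic. The sketch below assumes the per-token list prices Anthropic published at the Claude 3 launch (USD 15 and 75 per million input and output tokens for Opus, USD 3 and 15 for Sonnet). Those prices are not stated in this article and may have changed, and the workload sizes are hypothetical.

```python
# ASSUMPTION: launch list prices in USD per million tokens (input, output).
PRICES = {
    "claude-3-opus": (15.00, 75.00),
    "claude-3-sonnet": (3.00, 15.00),
}


def job_cost(model, input_tokens, output_tokens):
    """Estimated cost in USD for a job with the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload: 100 documents, about 4,000 tokens read and
# 300 tokens written per summary.
docs, tokens_in, tokens_out = 100, 4_000, 300

sonnet = job_cost("claude-3-sonnet", docs * tokens_in, docs * tokens_out)
opus = job_cost("claude-3-opus", docs * tokens_in, docs * tokens_out)
ratio = opus / sonnet

print(f"Sonnet: ${sonnet:.2f}  Opus: ${opus:.2f}  ratio: {ratio:.1f}x")
```

Because both the input and output prices differ by the same factor under these assumed prices, the ratio stays at about five regardless of the mix of tokens read and written.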
Let’s see how Opus did:
What are the key issues around this topic: Slot Machines — All documents concerning the definition, legality, and licensing of “slot machines” in Florida.
List the main points mentioned about definition, legality and licensing.
DP
2024-03-06 16:49:11
claude-3-opus
Summaries used: 100
Based on the document summaries provided, the key issues around the definition, legality, and licensing of slot machines in Florida are as follows:
Definition:
- Florida law broadly defines slot machines, potentially including both Class II and Class III machines (Document 256022, Document 256099).
- Class II machines are computerized bingo games, while Class III machines use random number generators to determine outcomes (Document 203072, Document 203076).
- Slot machines can have features like bonus rounds, multiple win lines, embedded bill acceptors, and recognizable game themes (Document 203072, Document 203076).
- There is debate over whether machines that involve an element of skill, such as allowing players to stop spinning reels, qualify as slot machines or “amusement machines” (Document 253557, Document 253570, Document 253575).
Legality:
- In 2004, Florida voters approved a constitutional amendment allowing slot machines at pari-mutuel facilities in Broward and Miami-Dade counties (Document 229876, Document 242947, Document 250887).
- The Florida Legislature failed to pass regulations for slot machines by the July 1, 2005 deadline, leading to legal battles and a court order for Broward County to establish regulations (Document 229701, Document 229876, Document 242947, Document 250887).
- There are concerns about the proliferation of unregulated “adult arcades” featuring slot machine-like games, which exploit a loophole in Florida’s gambling laws (Document 146539, Document 244261, Document 244353, Document 244365, Document 244366, Document 244367, Document 244423, Document 244425, Document 244426, Document 244436, Document 244471, Document 253449, Document 253501, Document 253502, Document 253558, Document 253571, Document 253576, Document 253600, Document 253602, Document 253675, Document 253701, Document 281926, Document 281992).
- Some argue that voters were misled about the types of slot machines that would be allowed under the constitutional amendment (Document 256022, Document 256099).
Licensing and Regulation:
- The Florida Legislature is responsible for establishing licensing requirements, payout percentages, machine types, and tax rates for slot machines (Document 202237, Document 202398, Document 229876).
- Broward County drafted regulations to monitor slot machines, audit revenues, and prevent criminal involvement in the absence of state regulations (Document 250887).
- There are proposals to limit slot machines to existing pari-mutuel facilities, restrict ownership and profit-sharing arrangements, and set tax rates based on revenue estimates from the amendment campaign (Document 210748, Document 210965, Document 256022, Document 256099).
- Concerns have been raised about the lack of regulation and potential for gambling addiction and exploitation of seniors at adult arcades (Document 244261, Document 244353, Document 244365, Document 244366, Document 244367, Document 244423, Document 244425, Document 244426, Document 244436, Document 244471, Document 253449, Document 253501, Document 253502, Document 253558, Document 253571, Document 253576, Document 253600, Document 253602, Document 253675, Document 253701).
Key people and organizations involved in this issue include:
- Jeb Bush, Governor of Florida (mentioned in multiple documents)
- Florida Legislature (mentioned in multiple documents)
- Broward County officials (Document 229701, Document 229876, Document 250887)
- Broward Circuit Judge Leroy Moe (Document 229701, Document 229876, Document 242947)
- Pari-mutuel facilities in Broward County (Document 229876, Document 250887)
- Florida Arcade Association (Document 244261, Document 244353, Document 244365, Document 244366, Document 244367, Document 244423, Document 244425, Document 244426, Document 244436)
- No Casinos campaign (Document 256022, Document 256099)
The main time period covered in these documents is from 2004 to 2006, with a focus on the aftermath of the 2004 constitutional amendment allowing slot machines in Broward and Miami-Dade counties.
For our money, this was a pretty good report, one generated in a couple of minutes. We found Claude Opus to be quick in providing its answer, certainly on par with GPT 4 Turbo, which was our past favorite. The writing was first rate and the answer comprehensive.
In our system, we provide links from report statements to the underlying summary and, if desired, to the underlying source text. You can see an example here for Document ID 250887.
As you can see, Claude reported the following:
Broward County drafted regulations to monitor slot machines, audit revenues, and prevent criminal involvement in the absence of state regulations (Document 250887).
It was based on the underlying summary which reported:
1. Broward County released draft guidelines outlining how the county’s two racetracks, a dog track, and a jai-alai fronton can operate casinos with slot machines. [¶3]
2. The draft regulations aim to monitor the fairness of slot machines, audit the casinos’ revenue for taxation, and ensure no one with a criminal background is involved. [¶10-¶11]
3. The regulations were drafted after the Florida Legislature failed to pass slot machine rules during its regular session, despite voters approving a constitutional amendment allowing slot machines at Broward and Miami-Dade parimutuels. [¶4-¶5, ¶8, ¶14-¶15]
4. A Broward County Circuit Court judge ordered the County Commission to create operating regulations after parimutuels sued due to the lack of state guidelines. [¶8-¶9]
5. While parimutuel owners prefer state-issued regulations, they appreciate Broward County’s effort to move the process forward. [¶12-¶13]
Claude Opus also gave me a nice timeline of events.
While a good associate could do this work, they might take a bit longer than the 41 seconds it took Claude Opus to read the 100 summaries and provide this timeline.
What Do We Think?
Color me excited. GPT 4 now has a viable competitor at the top end of the LLM market. I have no doubt that Sam Altman and his crew are working hard on their response to these powerful new LLMs. I also don’t doubt that OpenAI or maybe even Google will come back with even more powerful and cheaper models to regain the throne.
Are the Claude 3 models better than the GPT 4 line? I can’t say, at least not yet. My work with Claude Opus tells me it is every bit as good as GPT 4 and is offered at a better price. I am particularly pleased with their decision to offer three versions of the model, allowing our users to choose which combination of capabilities and pricing matches the task at hand. Up until now, we were offering models from different vendors to provide that flexibility.
Many readers will note that I did not try to compare the slot machine summary and report shown above to one produced by GPT 4 under similar circumstances. They would be right. That is work our prompt engineers will be doing in the lab, and which we may report on at a later time. I also note that Anthropic published benchmark testing which suggested that Claude 3 matches or exceeds the equivalent OpenAI and Google offerings.
For now, my purpose was to give you a quick look at Anthropic’s exciting new Large Language Models. What I can say for sure is that Claude 3 is the real deal–one that we will be looking to integrate in DiscoveryPartner if our testing confirms my initial impressions.
Assisted by GAI and LLM Technologies per EDRM GAI and LLM Policy.
Notes
[1] I should note that GPT 4 (32k) can be 40 to 50 times more expensive than our prior summarization tool of choice, Claude Instant (100k). The LLM choice can be important, at least when the bill comes around.