Bar Battle of the Bots – Part Two

[EDRM Editor’s Note: EDRM is proud to publish Ralph Losey’s advocacy and analysis. The opinions and positions are Ralph Losey’s copyrighted work. All images in this article were created by Ralph Losey using his ChatGPT 4o. This article was first published on the e-discoveryteam.com website on March 3, 2025, and is republished here with permission.]

The battle continues. In Part One, we examined how six advanced AI reasoning models from OpenAI and Google tackled a real Bar Exam essay question. Some impressed, others faltered, and one emerged as the clear winner—ChatGPT 4o. But what made its response stand out? In this second half of the Bar Battle of the Bots, we present the full text of ChatGPT 4o’s winning answer, followed later by its own explanation of how it reasoned through the legal issues before writing it out.

Androids in suits or looking like robots concentrating on the bar exam, one with fuzzy brain, another with head on fire.
Bar Exam Battle of the Bots. All images by Ralph Losey using ChatGPT 4o.

Part Two Introduction

Unlike human Bar Exam takers, these AI models had to “show their work,” providing meta-reasoning—a step-by-step breakdown of their thought process before generating a response. While this isn’t required for humans sitting for the Bar, evaluating how an AI reaches its conclusions is critical in determining whether it can be trusted in real legal settings.

Beyond ranking the best and worst performers, Part Two also examines one of the key cases cited in ChatGPT 4o’s answer—DePrince v. Starboard Cruise Services, an appellate decision that probably inspired the Bar’s essay question. And, as promised, we’ll reveal the unforgettable Mae West quote from Judge Leslie B. Rothenberg’s opinion—a line that perfectly captures the case’s peculiar nature.

Before concluding this two-part series, I offer a Personal Summary: Trust But Verify—a reflection on AI legal reasoning based on my four decades of experience as a lawyer. I’ve had the privilege of working with—and at times opposing—some of the best lawyers in the country. That perspective shapes my critical analysis of AI’s capabilities, its limitations, and, most importantly, what lawyers should take away from this experiment.

Let’s continue the Bar Battle of the Bots and see exactly how ChatGPT 4o won.

Robot in front of the scales of justice with a tech background, one finger up.
ChatGPT 4o is Winner of the February 2025 Legal Reasoning Battle of the Bots. Image by Ralph Losey using ChatGPT 4o.

Bar Exam Essay of ChatGPT 4o

Here is the winning response in the Bar Battle of the Bots, by ChatGPT 4o on 2/23/25. The essay concisely addresses most of the legal issues raised by the hypothetical and its three questions on contract, tort, and ethics. Part One includes the full quote of the hypothetical and three questions.

Memorandum on Legal Issues in Paul and Leda v. Helen and Timandra
To: [Attorney’s Name]
From: [Your Name]
Re: Legal Analysis of Contract, Tort, and Ethical Issues
Date: [Insert Date]

A. Breach of Contract Claim Against Helen

  1. Formation and Validity of the Contract
    The essential elements of a valid contract under Florida law are:
    • Offer
    • Acceptance
    • Consideration
    • Mutual Assent (Meeting of the Minds)
    Helen and Paul entered into a signed written contract for the sale of a 2-carat yellow diamond for $20,000. There was an offer and acceptance, and the contract appears to satisfy the statute of frauds under Florida law as a sale of goods over $500. (Fla. Stat. 672.201).
  2. Defense: Unilateral Mistake
    Helen may argue that a unilateral mistake (pricing error) occurred when she listed the price as $20,000 instead of $40,000. Under Florida law, a unilateral mistake is grounds for rescission only if:
    • The mistake was material,
    • The non-mistaken party knew or should have known about the mistake, and
    • Enforcing the contract would be unconscionable (DePrince v. Starboard Cruise Services, 163 So. 3d 586 (Fla. Dist. Ct. App. 2015)).
    Here, Paul had done research on yellow diamonds and suspected the price was lower than expected. His silence could be interpreted as bad faith, making Helen’s unilateral mistake defense stronger.
  3. Available Remedies
    If Paul and Leda prevail in their breach of contract claim, remedies may include:
    • Specific Performance: Since yellow diamonds meeting Leda’s specifications are rare, they could argue for specific performance under UCC §2-716 due to the unique nature of the goods.
    • Expectation Damages: If specific performance is denied, they may seek damages for the cost of obtaining a comparable diamond.
  4. Standing: Can Leda Sue?
    Leda was not a party to the contract; therefore, under privity of contract principles, she may lack standing to sue. However, if she can be classified as a third-party beneficiary (Fla. Stat. §672.318), she may have a cause of action.

B. Tortious Interference Claim Against Timandra

  1. Elements of Tortious Interference (Tortious Interference with a Contract)
    Paul and Leda may assert a claim against Timandra for tortious interference, which under Florida law requires:
    • A valid and enforceable contract
    • Defendant’s knowledge of the contract
    • Intentional procurement of the contract’s breach
    • Absence of justification
    • Damages (Tamiami Trail Tours, Inc. v. Cotton, 463 So. 2d 1126 (Fla. 1985)).
  2. Application to Facts
    • Valid Contract: The contract was legally formed (unless voided by Helen’s unilateral mistake defense).
    • Knowledge: Helen informed Timandra of the dispute, giving her actual knowledge of the contract.
    • Intentional Procurement of Breach: Timandra agreed to buy the diamond despite knowing of the dispute, arguably inducing Helen to breach.
    • Justification: Timandra may argue she was a bona fide purchaser without malicious intent. However, knowing about an active contract dispute may weaken her defense.
  3. Available Remedies
    • Compensatory Damages for loss of the diamond.
    • Punitive Damages if Timandra’s interference is deemed willful or malicious.

C. Ethical Issues in Lawyer’s Conduct

  1. Misuse of Client Funds
    The assistant deposited Paul and Leda’s check into the law firm’s operating account. This raises concerns under the Florida Rules of Professional Conduct:
    • Rule 5-1.1(a): Lawyers must keep client funds in a separate trust account.
    • Misappropriation: Using client funds to pay firm expenses could constitute misappropriation and potential disciplinary action.
  2. Supervision of Nonlawyers
    • Rule 4-5.3: Lawyers must ensure nonlawyer employees comply with professional obligations.
    • The assistant’s deposit of the check in the wrong account suggests a failure in oversight.
  3. Potential Consequences
    • Lawyer may face disciplinary proceedings, including suspension or disbarment, for improper handling of client funds.
    • The firm must promptly rectify the issue by transferring the funds to the appropriate trust account.

Conclusion

  1. Contract Claim: Helen may raise a unilateral mistake defense, but Paul and Leda have strong arguments for enforcement, including specific performance.
  2. Tortious Interference: Timandra knowingly induced breach, making her potentially liable.
  3. Ethical Violations: Lawyer mishandled client funds, violating Florida’s professional conduct rules.
    Immediate action should be taken to correct the ethical issue and determine the best litigation strategy for Paul and Leda.
Another robot in front of the scales of justice with one finger up.
Number one among all reasoning models. But for how long? Image by Ralph Losey using ChatGPT 4o.

Why ChatGPT 4o Won this Battle of the Bots

As mentioned, ChatGPT 4o wins in my book for including citations and because it was the only model to mention the UCC (through citations). More discussion of the UCC and greater detail on the Statute of Frauds would have helped. Still, you have to remember the Bar Exam’s instructions for this test, which emphasized the need for good, concise style:

The legal principles and rules governing the issues presented by the question should be stated concisely without unnecessary elaboration. … Style – The answer should be written in a clear, concise expository style with attention to organization and conformity with grammatical rules. … the conclusion should clearly appear at the end of the answer, stated concisely without unnecessary elaboration or equivocation.  … Avoid answers setting forth extensive discussions of the law involved or the historical basis for the law. When the question is sufficiently answered, stop.

The essay of ChatGPT 4o certainly deserves high marks for concise style. Further, you have to be impressed by the chatbot’s coming up with a case directly on point, even without being asked to research: DePrince v. Starboard Cruise Services, 163 So. 3d 586 (Fla. 3d DCA 2015) (buyer mistakenly quoted per carat price, not total; case remanded for trial). In fact, I’m pretty sure this case was the inspiration for the Bar Exam question. The 2015 opinion, written by the highly respected appellate Judge Leslie B. Rothenberg (now in private practice), who is known for her good writing, includes a famous quote that may jog the memory of this case for many Florida lawyers (it did mine):

Hollywood starlet Mae West once said, “I never worry about diets. The only carrots that interest me are the number of carats in a diamond.” Thus, it appears quite likely that Ms. West would have been interested in the diamond in this case: a twenty carat diamond that Starboard offered to DePrince for a very low sum. As it turns out, the “too good to be true” price of the diamond was just that, and the price conveyed to DePrince was a mistake. Now DePrince wants his twenty carat diamond; Starboard wants out of its sales contract; and Starboard’s supplier, who allegedly misquoted the price of the diamond upon which Starboard and DePrince relied, has not even been added as a party to the lawsuit. In short, this is truly a gem of a case.

A young Mae West with blingy rings
After taking 30 seconds to take the Bar Exam, ChatGPT 4o took another 5 seconds to generate this image of Mae West. Image by Ralph Losey using ChatGPT 4o.

Why Even the Best ChatBot Answer was a “B” Plus

Research by any decent lawyer would have picked up another appeal in this same case three years later. Perhaps some top students in the closed-book Bar Exam would have remembered the sequel, or otherwise thought of the complex legal issues resolved in the second DePrince appeal. After the trial ordered on remand in the first appeal resulted in a verdict for the jeweler, the unhappy buyer appealed again. In that appeal the Third District Court of Appeal, sitting en banc, resolved an internal conflict among its prior opinions. The full court held, in an opinion now referred to by legal scholars as DePrince III, that the jeweler did not have to prove fraudulent inducement by the buyer as an element of the unilateral mistake defense, and affirmed the verdict for the jeweler. DePrince v. Starboard Cruise Services (Fla. 3d DCA Aug. 1, 2018).

None of the AI answers, and not even the student answer that the Bar Examiners picked, went into the complex fraud inducement issue discussed in DePrince II and III. That is one reason why, although I passed all of the AIs here (just barely for Google’s pay-extra Gemini Advanced), none were on the level of superintelligent law students; none earned an A or A+.

Our winner here, ChatGPT 4o, only earned a B+ for several other reasons. First of all, the essay did not discuss all of the possible issues raised by the facts. For instance, it did not discuss the possible defense of mutual mistake. It discussed only unilateral mistake, which was the stronger defense, but it should also have discussed mutual mistake and why it would not succeed under these facts. The model student answer selected by the Bar Examiners did a good job of explaining this.

Further, the conclusion of ChatGPT 4o was weak and too concise. On the unilateral mistake issue it merely stated: “Helen may raise a unilateral mistake defense, but Paul and Leda have strong arguments for enforcement, including specific performance.” This compares poorly with the actual student answer provided, which in my opinion was an “A” or even “A+” effort. On this issue the student stated:

Here, Paul noticed that the price was much lower than he expected based on his independent research of yellow diamonds. Moreover, Paul failed to make any mention of this fact. Paul will likely argue that it was a pleasant surprise and that he relied on the expertise of Helen as a dealer in gemstones. The facts do not indicate that Paul has any experience with diamonds and he will likely argue that he lacked any ability to assess the credibility of the information he reviewed. He will also argue that the yellow diamond he purchased had particular specifications which might not have been taken into account in the research he read. On the other hand, Helen will argue that the mistake was obvious and palpable and that Paul’s failure to raise such a blatant error in pricing not only establishes Paul’s bad faith but should permit her to void the contract.

While a jury might find either way, it is probably the case that Helen would prevail given Paul’s knowledge of the significantly reduced price relative to his independent research and failure to bring this to the attention of Helen.

Answers like that show just how poorly even the best AI compares to much smarter humans. My award of a “B+” and “B” to two of the AI models may have been overly generous.

Dunce cap with the word "Average" on a dejected robot, sitting with scales of justice on two sides with human lawyers mocking it.
The best AI bots tested in mid-February were just average and could not begin to keep up with the best human lawyers. Image by Ralph Losey using ChatGPT 4o.

The essay of GPT 4o, which I gave a B+, also failed to mention several common defenses under Florida law to the specific performance remedy. My favorite defense, which I have used many times in practice, is Unclean Hands. It is an effective defense to any request for equitable relief from the court, including a request for a decree of specific performance. Here the requesting party, Paul, arguably had dirty hands. If Paul suspected or knew that the price of the diamond was mistakenly written as $20,000 instead of $40,000 but proceeded with the contract anyway, a court could hold that he acted in bad faith by remaining silent and taking advantage of Helen’s scrivener’s error. I think it could be used effectively here along with mutual mistake.

Ralph Losey’s muddy hands in a close-up photo by the fabulous Molly.
Actual photo of Ralph Losey hands taken by his wife!

Other common defenses to specific performance not raised here include adequacy of monetary damages and impracticability (Helen no longer owns the diamond).

It is interesting to note that the answer of ChatGPT o3 mini-high was grossly mistaken in its analysis of unilateral mistake, writing this:

B. Paul’s Conduct and the Doctrine of Affirmation.
1. Knowledge and Silence as Acceptance.
Even though Paul observed the discrepancy, he made no inquiry or objection before signing. Under Florida law, a party’s silence or failure to clarify a term that is manifestly mistaken—when that party has superior knowledge—can be taken as assent to the contract’s terms. Paul’s inaction suggests that he accepted the price as written. Thus, his conduct may be interpreted as a ratification of the mistaken term.
• This principle is reinforced when the error is unilateral, and the non-mistaken party (Paul) is aware of it. His decision not to challenge the price at the time may preclude him from later seeking relief on grounds of mistake.

This is simply incorrect. Moreover, it shows ChatGPT o3 mini-high was confused because Paul would not be seeking relief on grounds of mistake, Helen would. For that reason, I gave a “C-” to o3 mini-high, just slightly better than the hallucinating Gemini Advanced, which at least got the unilateral mistake answer correct.

Also, 4o was too confident about the tort claim against Timandra. I would have preferred more discussion of Timandra’s defenses. To me this looked like a weak tortious interference claim. The model student answer treated it that way too, as did the other AI exam answers. All agreed with my skeptical view of the alleged tort and provided good explanations of Timandra’s anticipated defenses.

Overall, it was a good battle of the bots, even if most were just middle-of-the-class kind of answers and none even came close to the human student model answer. Try it yourself with several models and see the differences in answers. All of the models will probably be better by the time you read this. They are now changing at an annoyingly fast pace.

Robot lawyers pointing at each other to emphasize their points (7 robots in front of a gallery of people).
Battle of the bots where the best model only earned a B+. Image by Ralph Losey using ChatGPT 4o.

ChatGPT 4o – Reasoning Explained

This OpenAI model 4o (not to be confused with 4.0) has to be prompted to show its reasoning for the answer it generated. The other OpenAI models tested show it automatically, along with disclosure of the time they took to reason before answering. The most reasoning time taken here by any model was 15 seconds. It took another 30 to 45 seconds for the AIs to generate the response. Yes, it is much faster than we humans are, but not yet as smart, especially at legal reasoning. See for yourself. Here is ChatGPT 4o’s meta-reasoning.

Ralph Losey: Show the reasoning you used of the prompt before generating the answer

ChatGPT 4o:

Reasoning and Analysis of the Prompt

Before generating the answer, I carefully analyzed the (State) Bar Exam essay prompt to identify all relevant legal issues and ensure a thorough response. Here’s how I broke it down:

Step 1: Identify the Legal Categories and Issues Raised

The prompt specifies three distinct areas of legal analysis:

1. Contract Law – Whether Helen breached a contract and whether Leda can sue.

2. Tort Law – Whether Timandra committed tortious interference with a contract.

3. Ethics – Whether the lawyer and the assistant violated professional responsibility rules.

Step 2: Contract Law Analysis

1. Formation and Terms of the Contract:

• Offer and Acceptance: A written agreement was signed for a 2-carat yellow diamond at $20,000.

• Consideration: The price of the diamond in exchange for ownership.

• Statute of Frauds: Met because it involves goods over $500.

2. Potential Defenses to Contract Enforcement:

• Unilateral Mistake: Helen mistakenly listed the price as $20,000 instead of $40,000.

• Bad Faith by Paul: Paul noticed the mistake but remained silent, possibly strengthening Helen’s defense.

3. Remedies Available:

• Specific Performance: Given the rarity of the diamond.

• Expectation Damages: The cost of obtaining a similar diamond elsewhere.

4. Standing Issue for Leda:

• Leda was not a party to the contract. She must argue third-party beneficiary status to have standing.

Step 3: Tortious Interference with Contract

1. Elements Under Florida Law:

• A valid contract existed.

• Timandra knew of the contract.

• She induced Helen to breach it.

• There was no justification for her actions.

• Paul and Leda suffered damages.

2. Possible Defenses for Timandra:

• She was a bona fide purchaser acting in good faith.

• She did not actively induce breach, but merely purchased an available diamond.

3. Remedies:

• Compensatory Damages (for loss of the diamond).

• Punitive Damages (if bad faith is proven).

Step 4: Ethics and Professional Responsibility

1. Misuse of Client Funds:

• Under Florida Rule 5-1.1(a), client funds must be held in a trust account.

• The assistant misallocated the funds into the operating account.

• The lawyer failed to properly supervise the assistant (Rule 4-5.3).

2. Consequences:

• Disciplinary action (possible suspension or disbarment).

• Immediate corrective action (transfer funds back to the trust account).

Step 5: Structuring the Memorandum

After this breakdown, I structured the answer to:

• Clearly define the legal rules.

• Apply them logically to the facts.

• Discuss likely defenses and counterarguments.

• Conclude with potential remedies and ethical responsibilities.

This reasoning guided the response in the memorandum. Let me know if you’d like any refinements!

3 robots in a law library taking an exam calmly.
Pressure and fear are not a problem for AI, but understanding the emotional complexities of Bar Exam hypotheticals is very challenging for these thinking machines. Image by Ralph Losey using ChatGPT 4o.

I took this meta-reasoning into account when evaluating all of the models’ answers, although I considered it of less importance than the actual answers. For instance, on the tort defense the reasoning states Timandra “did not actively induce breach, but merely purchased an available diamond.” This for me makes the tort claim very weak under the hypothetical facts given. Also, the reasoning on unilateral mistake was essentially correct:

• Unilateral Mistake: Helen mistakenly listed the price as $20,000 instead of $40,000.
• Bad Faith by Paul: Paul noticed the mistake but remained silent, possibly strengthening Helen’s defense.

This blog is already too long, so I will not show the second-place exam answer, nor discuss it further. Still, please click here if you want to see the essay of Gemini 2.0 Flash and click here if you want to see the meta-reasoning. If you care to review all of the answers, email me and I’ll send the other four to you for your reading or grading pleasure. I can also provide a link to the Bar’s selected student answer, although it is not too difficult to find online.

Personal Summary: Trust But Verify

My background as a lawyer from a family of lawyers, and my four-plus decades of experience in private practice, helped motivate me to run this battle of the bots. I wanted to try to evaluate the latest models as of mid-February 2025, even though I knew the models were changing weekly. I wanted to have some kind of a benchmark on legal reasoning abilities. My prior tests, and those of others, support the view that the new 2025 reasoning models were equal to the average human in general reasoning. Breaking New Ground: Evaluating the Top AI Reasoning Models of 2025 (2/12/25). But what about legal reasoning? Were they at an average lawyer level yet? How about the level of top human lawyers? Those tests had not been done.

Robot at the head of a family dinner table at holiday time.
Family of lawyers with many friendly arguments and one very techie lawyer who may seem robotic. Image by Ralph Losey using ChatGPT 4o.

From my years of practice (not family dinners!) I have come to know average lawyer skills very well. I also know the abilities and legal reasoning of superior, above-average lawyers. I have been lucky to rub shoulders with them my whole career. I have become accustomed to learning from superior legal minds, including with my firm today at Losey PLLC. Without my hands-on, nerdy skills in technology, I would not be among the best lawyers (in four fields), not even close. Technology can fill many gaps, and that now includes AI technology. Put down lawyers if you must to feel good about yourself, but I can tell you from decades of experience that the best lawyers and judges in the country are very smart indeed. Their legal reasoning abilities and overall legal abilities far exceed those of any AI, which seems like a little kid by comparison.

Young AI lawyer in goggles and bomber jacket.
AI lawyers today are like small children compared to best human attorneys. Image by Ralph Losey using ChatGPT 4o.

I believe that in order for AI to be taken seriously as a general tool for lawyers, the AI must have analytical skills at least equal to those of the average lawyer. Below-average human legal reasoning is not good enough. AI does not need to be superintelligent, with the reasoning ability of the top law professors and super lawyers. It can still be helpful at an average level, just like an average, inexperienced associate hired right out of an average U.S. law school. All law firms have drudge work that requires only entry-level lawyer skills. Those folks should be concerned about AI, especially a few years down the road.

I tried to test the distinction between average human reasoning and average lawyer legal reasoning in this Bar Battle of the Bots study. For the difference between general reasoning and legal reasoning see the AI test discussions in Breaking New Ground: Evaluating the Top AI Reasoning Models of 2025 (2/12/25).

In this Bar Exam series, I wanted to see whether any of the new reasoning models of AI had attained the rationality level of the average lawyer and if so, how close were they to the best?

The answers are yes and not very close.

2025 Reasoning Models Have Only Reached Average Human Lawyer Level

The latest AI reasoning models have reached an average level of legal intelligence—not superintelligence, but average human performance is still a significant achievement for a machine. Attaining the average human lawyer’s reasoning ability is no small feat. “Average” simply means the midpoint in a data set, outperforming roughly half while trailing the other half. If AI has truly attained this level of legal reasoning, it marks a major milestone. Even if an AI now fell below the human average, but still operated within the range of real human lawyers, it would be an impressive accomplishment. Most of the new reasoning models are already at that level. There is an important caveat, explained in more detail later in this article: we are only talking about thinking here, and there is much more to being a lawyer than that.

An unshaven upset lawyer, a calm intense well dressed lawyer and super man lawyer.
3-Levels of Human Legal Reasoning: Below Average, Average, Superior. Image by Ralph Losey using ChatGPT 4o.

The battle of the bots tests shared in this article provide evidence that the average level of legal reasoning intelligence has been attained. That means it is now safe for most lawyers to begin using AI in their work, if they have not already started doing so. If you started in 2023 with over-expectations and were disappointed, now you know why. It has taken two more years just to get to the average lawyer thinking level, and even now, it can and does still hallucinate. So, even now we should use the 2025 models carefully: trust but verify. We have seen that even the top models can still miss key issues, and some can even hallucinate key facts. It is bad enough that human clients sometimes fabricate facts; we cannot have lawyers do the same.

Superintelligent AI May Someday Be Attained

Someday, generative AI may reach the level of our best non-fabricating lawyers and judges. But the Bar Exam essay answers show we are not there yet. For instance, many of the top ten percent of human lawyers would not only have included the missing defenses described previously, such as my favorite, unclean hands; they would also have noted that key information was missing from the hypothetical. We do not know the money flow and other monetary terms of the transactions. There is no mention of what the jeweler actually paid the supplier for the diamond, or when. We know very little about the diamond itself that was ordered and delivered. We only know its weight (2 carats). There are no facts concerning the money flow, nor the actual condition of the diamond received. The missing facts are red flags to the best. They understand that legal reasoning requires much more than logic and legal knowledge.

For instance, most human lawyers know that couples would never buy an engagement ring sight unseen, especially when the fiancée has very particular expectations. There is no mention in the hypothetical of their ever seeing the diamond, before or after the contract, or even looking at photographs. Maybe the fiancée would not like the diamond if and when she ever sees it. Maybe the lawyer’s clients would not want specific performance. Maybe the diamond delivered is not in fact worth $20,000, much less $40,000. Perhaps the supplier misled everyone. Value depends on a multitude of factors, not just weight, including actual size, quality of cut, color intensity, certification, and provenance. Maybe the diamond is two carats, barely yellow, cloudy, and full of inclusions. It could be poorly cut and lack certification. Maybe it was stolen or mined from a banned source and has no value.

All of the parties here obviously trusted too much and verified too little. That is where the lawyer should have stepped in to bring an objective view and healthy skepticism. An expert gemologist should inspect and appraise the diamond, and the clients should see it too. This should be done before the lawyer advises his clients, especially as to remedies. This basic fact-finding process is essential to properly assess the parties’ legal positions. The top answers would have at least mentioned all of this in the exam essay. None did, not even the human student answer selected by the examiners. Fact-finding and discovery are always crucial. Moreover, the best litigators know human nature and that the devil is in the details.

Man in a white coat looking through a microscope at a jeweler’s.
Gemologist examining a large yellow diamond. Image by Ralph Losey using ChatGPT 4o.

This is just one example of why no one seriously claims that AI has already surpassed, or even closely matched, the top legal minds of today. From my test of the 2025 models, with the average level “C” grade scores, major omissions and even hallucinations, we see that AI still has a long way to go. AI has not yet reached the top 10% of practicing attorneys, much less the superintelligent level—the best of the best, A+, in the top 1% to 3%.

How long will it take for AI to reach the superintelligent human lawyer level? I do not know. Maybe never, or maybe as early as 2029. Ray Kurzweil and many others have predicted that superintelligence will be attained in all fields (“AGI”) by that time. Ray Kurzweil, The Singularity is Nearer (when we merge with AI) (Viking, June 25, 2024); Losey, R., Breaking the AI Black Box: A Comparative Analysis of Gemini, ChatGPT, and DeepSeek (2/6/25).

Many leaders in the field of AI contend AGI will be reached by 2028. See, e.g., Shane Legg’s Vision: AGI is likely by 2028 (11/17/23); Jensen Huang’s Life and Company … AGI by 2028 (12/28/23); Artificial General Intelligence, If Attained, Will Be the Greatest Invention of All Time (8/12/24) (reviews predictions and explains).

Please remember, as mentioned before, we are assessing only reasoning ability here—not the full complexity of the human mind, not actual consciousness, emotions, intuition, or other qualities inherent to living beings. Losey, R. The Human Edge: How AI Can Assist But Never Replace (1/30/25). There is a lot more to life than thinking!

Two lawyers with crossed arms behind a smiling android in a tech room.
Super legal reasoning AI in the future working with conscious lawyers who can have real smiles, not just fake ones. Image by Ralph Losey using ChatGPT 4o.

Conclusion

AI’s performance on this Bar Exam challenge offers a revealing glimpse into the state of legal technology. While today’s top reasoning models show impressive capabilities, they remain far from replacing human lawyers—especially when it comes to nuanced analysis, strategic thinking, and ethical considerations. The results reinforce an essential truth: AI can be a powerful tool, but it still requires human oversight, verification, and legal expertise.

For lawyers, law students, and legal tech professionals, the challenge now is to determine how best to integrate AI into legal practice—leveraging its strengths while mitigating its weaknesses. Will AI become a reliable legal assistant, helping attorneys work more efficiently? Or will it introduce new risks that demand careful regulation? The answer depends on how we, as a profession, engage with this evolving technology.

I encourage you to test these models for yourself. Run your own legal reasoning experiments. Do not believe the hype on either side. Look for professionals with no economic motivation to put their finger on the scale. Share your findings. Remember that even the best of the best human lawyers sometimes make mistakes. Be kind and encourage a collaborative, group effort. The conversation about AI’s role in law is just beginning, and your insights can help shape the future. Let’s move forward together—thoughtfully, critically, and with a clear-eyed view of what AI can and cannot do.

Attorney giving direction to a mixed human and robot team of over a dozen in a futuristic group working room.
Law firm of the future where people and AI are all smart and work well together. Image by Ralph Losey using ChatGPT 4o.

I will give the last word, as usual, to the Gemini twins podcasters I put at the end of most of my articles. Echoes of AI on Bar Battle of the Bots – Part Two. Hear two Gemini AIs talk about all of this, and much more, in just under 16 minutes. They wrote the podcast, not me. Note, for some reason the Google AIs had a real problem generating this particular podcast without hallucinating key facts and making other errors. It took me many tries. It is still not perfect but is pretty good. These podcasts are primarily entertainment programs with educational content to prompt your own thoughts. See the disclaimer that applies to all my posts, and remember, these AIs wrote the podcast, not me.

Bar Exam Battle of the Bots Part Two.
Image: Holley Robinson, EDRM with AI.

Ralph Losey Copyright 2025. All Rights Reserved. Republished with permission.


Assisted by GAI and LLM Technologies per EDRM GAI and LLM Policy.

Author

  • Ralph Losey headshot

Ralph Losey is a writer and practicing attorney specializing in providing services in Artificial Intelligence. Ralph also serves as a certified AAA Arbitrator. Finally, he is the CEO of Losey AI, LLC, providing non-legal services, primarily educational services pertaining to AI and the creation of custom GPTs. Ralph has long been a leader among the world's tech lawyers. He has presented at hundreds of legal conferences and CLEs around the world and written over two million words on AI, e-discovery, and tech-law subjects, including seven books. Ralph has been involved with computers, software, legal hacking, and the law since 1980. Ralph has the highest peer AV rating as a lawyer and was selected as a Best Lawyer in America in four categories: E-Discovery and Information Management Law, Information Technology Law, Commercial Litigation, and Employment Law - Management. For his full resume and list of publications, see his e-Discovery Team blog. Ralph has been married to Molly Friedman Losey, a mental health counselor in Winter Park, since 1973 and is the proud father of two children.

