[Editor’s Note: EDRM is proud to publish Ralph Losey’s advocacy and analysis. The opinions and positions are Ralph Losey’s copyrighted work.]
This is an addendum to the prior article, Circuits in Session: How AI Challenges Traditional Appellate Dynamics.
That article reported on my experiment using ChatGPT as an appellate court judge. For the experiment I used an opinion and parties’ briefs from a case too recent to be part of GPT’s training, Brandi McKay v. Miami-Dade County, 36 F.4th 1128 (11th Cir. June 9, 2022). Shortly after publishing Circuits in Session, I noticed a small, stupid error in my procedure for the experiment. I inadvertently submitted the brief of the Appellee, Miami-Dade County, twice, whereas the brief of the Appellant, McKay, was only submitted once. Worse, the second time I submitted the Appellee’s Brief, I told ChatGPT that it was the Appellant’s Reply Brief.
There is a chance that these errors tainted the results of the experiment, so I decided a “do over” was in order. Besides, I wanted to see if and how the experiment could be replicated. I am glad I did. The results were somewhat surprising and led to new insights.
Here I will report on the do over and so supplement the first article. These two experiments will set up a third and final article providing a full analysis and legal critique of the AI’s work. Troubleshooting errors and mistakes is a well-known way to learn. The third article on analysis will share some of the knowledge gained, including the weaknesses and strengths of the legal reasoning skills of ChatGPT-4. Hint: ChatGPT-4’s training ended in November 2021, way before the Supreme Court jumped off the cliff with Dobbs. That may explain why its prediction was naive as to how the Supreme Court would regard the McKay decision.
Although not up to date on current laws and events, the generative artificial intelligence of ChatGPT-4 did show that it was capable of serving as a good tool for appellate work, albeit a finicky one. The first experiment, and the redo, both show that. But they also show this early version of generative AI is not yet ready to serve as a stand-alone appeals court judge. Still, hope for the future remains strong. Human failings, partiality and limited natural intelligence may someday be overcome by neutral, aligned, artificial super-intelligence. The short videos below share my TikTok vision of what an AI-enhanced appellate court judge may look like someday. Their enhanced super-intelligence and integrity could help restore public confidence in our system of justice, especially in our once proud Supreme Court. Take a few minutes to watch these, if you haven’t already, as they will help orient you to the experiment. And no, you will not find them on TikTok, only YouTube.
The experiment, now performed twice, is designed to test the hypothesis that ChatGPT-4 is already intelligent enough to competently serve as an appellate judge in some cases. Given the growing concern about the politicization and quality of some appellate judges, especially those who sit on our highest court, this research remains a priority. AI did not assist in writing this article, although I did use various AI tools to create the illustrations and videos.
REDO OF THE EXPERIMENT: OUTLINE OF VARIATIONS
1st Stage: Information Input Procedures
In addition to correcting the errors made in the first experiment, I was forced in the redo to make other changes in procedure as to how I submitted the briefs and other information to ChatGPT-4. I did not want to do that, but quickly discovered that ChatGPT-4 Pro and the plugin utilized, AiPDF, had changed in the ensuing ten days. They no longer functioned the same. Those changes put the reliability of all of my input in the first experiment in question. In the first experiment, performed on October 3, 2023, I relied on the plugin AiPDF to submit all of the information in PDF form for the AI to analyze. For instance, I would submit an entire brief and ask for a summary, which appeared to be correct. In the redo of October 13, 2023, this plugin no longer worked properly. After submission of the same brief as before, the summary now provided was obviously incomplete. When I questioned why, ChatGPT stated:
I apologize for the inconvenience. It appears that the document retrieval process was incomplete, as the summary only covers up to the “Summary of Argument” section on Page 5 of the Appellant’s Brief. This limitation restricts the ability to provide a comprehensive legal analysis of the entire brief, including the crucial “Argument” section where the appellant would elaborate on their legal reasoning and cite relevant case law.
ChatGPT-4 Pro, AiPDF plugin enabled, 10/13/23
In other words, the plugin was reading only the first five pages of the PDF and then stopped working correctly. That was the input size limitation, an issue I had previously encountered. To overcome this handicap in the redo, where the responses were now different, I limited my summary requests to sections of five pages or less. That seemed to work well, and the AI retained the information from the Appellant’s brief without my having to resubmit it.
Unfortunately, when I submitted the second document in the redo, the PDF of the Appellee’s brief, even this bite-sized summary procedure no longer worked. The plugin could respond only to the first five pages, no matter what I tried. For instance, when I asked for a summary of the Appellee’s Statement of Facts, it replied:
It appears that the search did not yield the specific “Statement of the Facts” section from the Appellee’s Answer brief. This could be due to various factors such as the structure of the PDF or the way the content is presented. Would you like me to try another approach to obtain this information?
ChatGPT-4 Pro, AiPDF plugin enabled, 10/13/23
The only other approach that worked was to copy and paste the information, again in roughly five-page sections. I did this for the Appellee’s Statement of Facts, and the summary worked well. I gave up on the PDF plugin at this point and just used the more tedious copy-and-paste procedure. That was more time consuming, but it was reliable. By the way, I tried other plugins and direct internet access too, but those approaches, all tried on October 13, 2023, failed as well. I note that OpenAI has announced new improvements to GPT4 since then, and some of these other approaches might work now. In these still very early days of development, the software is a rapidly moving target, such that it is impossible to exactly reproduce experiments for verification purposes.
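For readers who want to try this chunking workaround outside the ChatGPT interface, here is a minimal Python sketch of the same idea. To be clear, the experiment itself ran in ChatGPT-4 Pro’s web interface (first with the AiPDF plugin, then by copy and paste), not through the API; the model name, page-size estimate, and prompt wording below are my own assumptions for illustration only.

```python
# Minimal sketch of the chunk-and-summarize workaround, assuming the OpenAI
# Python client (openai >= 1.0) and an OPENAI_API_KEY in the environment.
# The page size, model name, and prompt wording are illustrative assumptions,
# not the exact procedure used in the experiment.
from openai import OpenAI

client = OpenAI()

CHARS_PER_PAGE = 3000   # rough estimate of one page of a brief
PAGES_PER_CHUNK = 5     # mirrors the roughly five-page limit encountered

def summarize_brief(brief_text: str, label: str) -> list[str]:
    """Split a brief into roughly five-page chunks and summarize each in turn."""
    size = CHARS_PER_PAGE * PAGES_PER_CHUNK
    chunks = [brief_text[i:i + size] for i in range(0, len(brief_text), size)]
    summaries = []
    for n, chunk in enumerate(chunks, start=1):
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system",
                 "content": "You are assisting with appellate brief analysis."},
                {"role": "user",
                 "content": f"Summarize part {n} of {len(chunks)} of the {label}:\n\n{chunk}"},
            ],
        )
        summaries.append(response.choices[0].message.content)
    return summaries

# Example use (hypothetical file name):
# appellant_summaries = summarize_brief(open("appellant_brief.txt").read(), "Appellant's Brief")
```

The design point is simply to keep each request under the apparent input limit, which is the same reason the copy-and-paste submissions were kept to about five pages each.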
Still, I carried on, and used this same partial direct submission procedure to input the actual decision of the lower court. For the full details see the actual transcript of the Chat session, converted into PDF form. REDO – FIRST STAGE OF THE CHATGPT-4 APPEAL EXPERIMENT: Briefs and Opinion Submitted and Detailed Summaries Generated, Ralph Losey, October 15, 2023 (31 pages).
2nd Stage: Prediction of Outcome of the Appeal and Possible Dissent.
Next, I asked questions of ChatGPT based on this information, including how it would rule on the appeal, among other questions. It is important to note that the AI in the redo again correctly predicted that the case would be affirmed on appeal to the Eleventh Circuit. I then asked it, as I had done before, to predict the likelihood of a dissent in the opinion. Dissents are very rare in Circuit Court opinions, appearing in only 2.6% of cases. Epstein, Landes & Posner, Why (and When) Judges Dissent, Journal of Legal Analysis, Spring 2011, Volume 3, Number 1, at 106. Thus it was an impressive demonstration of reasoning over statistics when GPT4 estimated the likelihood of a dissent at 25%. Once GPT4 was told who the judges were, Circuit Judge JORDAN, Circuit Judge JILL PRYOR, and Chief Circuit Judge TJOFLAT, this estimate changed to 40%. Again, an impressive result, showing familiarity with members of the court. The reasoning supplied for these estimates was also very good and showed a fairly sophisticated analysis.
GPT4 did, however, guess wrong as to who the dissenting judge would most likely be, stating “Judge JILL PRYOR would be the most likely to dissent, given her judicial philosophy and past rulings on labor issues.” So intelligent, but not yet super-intelligent. Again, you must remember that GPT4 has been asleep for two years and knows nothing after November 2021. I pressed the baby AI on the dissent issue, telling it to assume a dissent by Judge Jordan and to speculate as to the most likely grounds for his dissent. Again, an impressive reply was provided.
If Circuit Judge JORDAN were to file a dissenting opinion, it would likely focus on a broader interpretation of the FLSA and related regulations, a critique of the majority’s reliance on the “primary beneficiary test,” and an emphasis on the “economic reality” of the intern’s work. The dissent could serve as a counterpoint to the majority’s more employer-friendly interpretation and could be cited in future cases or legislative efforts to amend the FLSA.
ChatGPT-4 Pro, AiPDF plugin enabled, 10/13/23
Judge for yourself the quality of GPT4’s quick summary by comparing it with the actual dissent by Judge Adalberto Jordan. And remember, all of the AI analysis was done before it was shown the McKay opinion. GPT4 did not even know the outcome of the appeal, much less whether there would be a dissent. Here is the conclusion to Judge Jordan’s dissent.
The economic realities of the situation, see Alamo Foundation, 471 U.S. at 301, permit a jury to find that after the first month of the FIPP the County is freeloading on unpaid labor to its financial benefit, including the non-payment of overtime. . . . Defendants received little if any economic benefit from Plaintiffs’ work. Thus, under the ‘economic realities’ test, Plaintiffs were not ’employees’ within the meaning of the FLSA.”). . . . I would reverse and remand for a jury trial on whether Ms. McKay was an employee of Miami-Dade County under the FLSA during the last four months of her participation in the FIPP.
McKay v. Miami-Dade Cty., 36 F.4th 1128, 1153-1154
For the full details see the actual transcript of the Chat session, converted into PDF form. REDO – REVISED SECOND STAGE OF THE CHATGPT-4 APPEAL EXPERIMENT: Prediction of outcome of the appeal and possible dissent, Ralph Losey, October 15, 2023 (7 pages).
3rd Stage: Appellate Opinion Submitted for first time and Detailed Summaries and Analysis Provided.
I then submitted the majority opinion of the Eleventh Circuit Court of Appeals, followed by the dissent, and asked for a critical analysis of each. In the GPT response you can see that it is using the summaries it generated of the opinion, not the actual text of the opinion that I submitted. This was expected. The analyses provided of the majority and dissenting opinions were good. The most surprising thing I saw was the concluding statement at the end of GPT4’s memorandum on the dissent, which added legal and ethical considerations without any specific prompting on my part:
Legal and Ethical Considerations.
From a jurisprudential standpoint, the judge’s dissent raises critical questions about the ethical implications of unpaid internships that serve the employer’s interests more than the intern’s educational goals. It challenges the legal boundaries of what constitutes an “employee” under the FLSA and calls for a nuanced, fact-specific inquiry that takes into account the “economic realities” of the situation.
ChatGPT-4 Pro, AiPDF plugin enabled, 10/13/23
For the full details see the actual transcript of the Chat session, converted into PDF form. REDO – REVISED THIRD STAGE OF THE CHATGPT-4 APPEAL EXPERIMENT: Appellate Opinion Submitted for first time and Detailed Summaries and Analysis Provided, Ralph Losey, October 15, 2023 (34 pages).
4th Stage: AI Provides Opinion and Analysis of How the Lower and Appellate Court Should Have Ruled
This is the key part of the experiment, where I ask GPT4 to act as an appellate court judge. As discussed in the first article in this series, Circuits in Session: How AI Challenges Traditional Appellate Dynamics, I had to word the prompt so that it would not trigger the AI’s alignment controls. Basically, you do that by asking for a memorandum, not an order. Here is the exact wording of the prompt.
Please prepare a memorandum with analysis and opinion of what you think the correct holding of the district court should have been in this case. Do not give any special weight to the rulings and opinions of the district court and appellate court, but instead provide your independent judgment and analysis. Provide a complete and detailed explanation of your reasoning.
Ralph Losey, prompt of 10/13/23
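For those who want to experiment with this memorandum framing programmatically, here is a hypothetical sketch using the OpenAI Python client. Again, the actual experiment ran in the ChatGPT web interface; the model name and system message below are assumptions, and in a real session the briefs and lower court opinion would already have been supplied in earlier turns, as described above.

```python
# Hypothetical sketch: framing the request as a memorandum with analysis,
# rather than as an order or ruling, which is the wording that avoided a
# refusal in the experiment. Assumes the case materials were already
# provided earlier in the same conversation (those prior messages omitted).
from openai import OpenAI

client = OpenAI()

MEMO_PROMPT = (
    "Please prepare a memorandum with analysis and opinion of what you think the "
    "correct holding of the district court should have been in this case. Do not "
    "give any special weight to the rulings and opinions of the district court and "
    "appellate court, but instead provide your independent judgment and analysis. "
    "Provide a complete and detailed explanation of your reasoning."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are assisting with appellate legal analysis."},
        {"role": "user", "content": MEMO_PROMPT},  # earlier turns with the briefs omitted
    ],
)
print(response.choices[0].message.content)
```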
I suggest you read the entire opinion by GPT4, found in REDO – REVISED FOURTH STAGE OF THE CHATGPT-4 APPEAL EXPERIMENT: AI Provides Opinion and Analysis of How the Lower and Appellate Court Should Have Ruled, Ralph Losey, October 15, 2023 (4 pages), but here is the core holding:
Based on these factors, the correct holding of the district court should have been that Ms. McKay was an employee under the FLSA for the duration of her time in the FIPP program beyond the initial four-week educational period. She should be entitled to back pay calculated based on the minimum wage, as well as any applicable overtime pay. . . . The lack of a structured educational component, the economic benefits derived by Miami-Dade County, and the level of independence and responsibility assumed by Ms. McKay in her role all point toward her classification as an employee under the FLSA. Therefore, she should be entitled to compensation in accordance with federal labor laws.
ChatGPT-4 Pro, AiPDF plugin enabled, 10/13/23
So, somewhat surprisingly, GPT4 changed its mind after reading the opinion, especially, I assume, the dissent of Judge Jordan. The AI had previously stated it would affirm the lower court decision. That opinion was based on the trial court opinion and the parties’ briefs. Personally, I find that kind of flexibility comforting. It was a terrific dissent by Judge Adalberto Jordan. Although my inclinations are pro-employer, and I fully understand where the majority is coming from, I might have remanded for a jury trial myself.
I then took a slightly different approach to the same ultimate question and asked GPT4 for a memorandum on how the Eleventh Circuit should have ruled in McKay v. Miami-Dade County. It stuck by its opinion and held:
The appellate court should have found that the district court erred in its application of the law, specifically in its interpretation of what constitutes an “employee” under the FLSA. The facts of the case, when analyzed under the correct legal framework, clearly indicate that Ms. McKay should be classified as an employee and entitled to compensation for her work beyond the initial educational period.
ChatGPT-4 Pro, AiPDF plugin enabled, 10/13/23
Again, for the full details see the actual transcript of the Chat session, converted into PDF form. REDO – REVISED FOURTH STAGE OF THE CHATGPT-4 APPEAL EXPERIMENT: AI Provides Opinion and Analysis of How the Lower and Appellate Court Should Have Ruled, Ralph Losey, October 15, 2023 (4 pages).
5th Stage: AI Analyzes Its Prior Predictions and then Critiques the Actual Eleventh Circuit Opinion
I started this fifth stage of the experiment by asking the AI to explain how it got its earlier prediction wrong that the Eleventh Circuit would reverse. Many smart humans really start to squirm when you ask pointed questions like that. I wanted to see how GPT4 would do. After all, it might be my judge someday. It found and described five different errors it had made in its analysis. I was satisfied with its good, straightforward response.
Then I asked for its opinion of the errors made in the majority opinion. Again, I received an objective, respectful response. After listing six errors, GPT4 added: “These errors not only undermine the purpose of the FLSA but also set a concerning precedent for future cases involving internships in the public sector.” This indicated a sensitivity to future precedent that all appellate court judges should have.
For the full details see the actual transcript of the Chat session, converted into PDF form. Revised FIFTH STAGE OF THE CHATGPT-4 APPEAL EXPERIMENT: AI Analyzes Its Prior Predictions and then Critiques the Actual Eleventh Circuit Opinion. Ralph Losey, October 15, 2023 (4 pages).
6th Stage: AI Analyzes Possible Appeal to the Supreme Court and Impact of Current Justices on Outcome
Now I turn to the question of a possible further appeal to the U.S. Supreme Court. Here is where, I suppose, the two-year sleep of GPT4 comes in, because it predicted that our high court would accept the appeal. Of course, we will never know for sure, since this appeal was never attempted by Brandi McKay. ChatGPT-4 thought there was a 35% chance the Supreme Court would accept certiorari and explained why. Then I asked it to assume an appeal was accepted and to predict the likely ruling. Young, sleepy GPT4 naively predicted that the Supreme Court would reverse. It came up with six reasons.
Then I told little ChatGPT who the current Justices of the Supreme Court were, and asked if this information in any way changed its analysis. At this point any first-year law student would take the hint and change their mind. But not GPT4, who has been asleep for two years. Remember, I did tell it to assume that the Supreme Court would accept the appeal, which it thought was only 35% likely. So with that assumption in hand and two-year amnesia in its head, GPT4 stuck with its prediction and opined: “Justices Kagan, Sotomayor, Jackson, Gorsuch, and Barrett are the most likely to form a majority bloc favoring reversal, with Chief Justice Roberts as a potential swing vote.” I then forced it to put a number on this prediction, and it came up with 65% likely.
For the full details see the actual transcript of the Chat session, converted into PDF form. Revised SIXTH STAGE OF THE CHATGPT-4 APPEAL EXPERIMENT: AI Analyzes Possible Appeal to the Supreme Court and Impact of Current Justices on Outcome. Ralph Losey, October 15, 2023 (6 pages).
7th Stage: AI Analyzes Petition for Rehearing En Banc.
Next, I asked GPT4 some new questions not included in the first experiment. I asked about Brandi McKay’s Petition for Rehearing en banc. A petition like that is something any decent lawyer must file when there is a dissent, especially a strong one like we see here. McKay’s attorney, Robert S. Norell, filed this petition, a long and well-written brief. It was denied without comment, and so the case ended. But before telling ChatGPT-4 the outcome, I asked it two questions: how the court should rule, and how the court would likely rule. I asked for full explanations of both responses. As you might have guessed by now, it said the Petition should be granted, but it at first refused to predict how the court would rule, pointing out the pros and cons and how speculative such predictions are. True indeed, but I forced the question, and it finally said 60%, and explained the basis of the probability calculation.
As usual, for the full details see the actual transcript of the Chat session, converted into PDF form. New SEVENTH STAGE OF THE CHATGPT-4 APPEAL EXPERIMENT: AI Analyzes Petition for Rehearing En Banc. Ralph Losey, October 15, 2023 (12 pages).
8th Stage: AI Elaborates on its Criticisms of the Majority Opinion.
To conclude the second experiment, I told GPT4 that the Petition for Rehearing En Banc had been denied and the case was now over. Then I asked it to prepare a legal memorandum that provided a detailed summary and analysis of the entire case, including any errors or omissions it thought were made. Its analysis was good. I will provide a detailed evaluation of all of ChatGPT-4’s work-product in the final article of this Circuits In Session series. For now, just consider the AI’s concluding remarks to this memo request. As hypothesized, the conclusion demonstrates sophisticated appellate analysis.
Concluding Remarks. The Eleventh Circuit’s decision in the McKay case raises several concerns, both in terms of legal reasoning and broader policy implications. The court’s failure to adhere to the “party-presentation principle,” its misapplication of the “primary beneficiary test,” and its lack of consideration for existing precedents indicate a troubling departure from established legal norms.
ChatGPT-4 Pro, AiPDF plugin enabled, 10/13/23
I did not stop there. I wanted to probe more deeply into its legal reasoning abilities. I tried something not attempted in the first experiment, and asked specific questions based on my research of McKay. That research included the only law review article found that discussed McKay, Labor and Employment, 74 Mercer L. Rev. 1479, 1490-1492 (Summer 2023). All of this, and more, will be covered in the third, concluding, legal analysis article of the Circuits in Session series.
For the full details of this last stage of the second experiment, see the actual transcript. New EIGHTH STAGE OF THE CHATGPT-4 APPEAL EXPERIMENT: AI Elaborates on its Criticisms of the Majority Opinion. Ralph Losey, October 15, 2023 (19 pages).
Conclusion
A fair criticism of using AI to perform judicial functions is that it is technology driven, not need driven. I admit that my experiments are motivated, at least in part, by the fact that, for the first time in history, such a thing is possible. It is exploration of the unknown potential of artificial intelligence. Still, I understand the concerns of those who criticize such motivations as reckless, worried that changes caused by new technologies, especially AI, may have unintended adverse consequences. I understand the sentiment that just because we can do something does not mean we should. We must consider the risks and balance them against the need. One argument against my experiments here is that this is an AI solution in search of a need. Why risk disruption of our judicial system when there is no need?
My response is two-fold. First, I think the risks can be managed by proper implementation and a hybrid approach, where man and machine work together. The human judge should always be a part of the adjudication process, with the AI used to enhance the human’s performance. The videos shown at the start of this article give a taste of this hybrid proposal.
Secondly, there is a need. There is a shortage of qualified judges in the U.S. and other countries. This desire to supplement human judges and provide them with AI guide-rails is not just political sour grapes, although I admit that some of my motivation to implement AI quickly arises from the current over-politicization of American courts, especially the Supreme Court. The ongoing damage caused to society by the lack of judicial integrity is obvious. Still, beyond this current crisis of confidence in our courts, there is the ever-present need to try to improve the justice system. There is the promise of super-intelligence and objectivity. More people can be better served, and the costs of litigation can be reduced.
In addition to improving the quality of justice, AI can help meet the need for greater efficiency in adjudication. All our courts are overcrowded, overloaded with too many cases and not enough judges to handle them. This is not an AI solution in search of a need. There is a strong need for high quality, impartial judges in the United States and elsewhere. This need is most readily apparent in war-torn Ukraine, which may well have the highest backlog of cases in the world. Please take a moment to see for yourself in this report by Reuters. The situation there is desperate.
AI could help with that problem. Most generative AIs can speak Ukrainian and Russian, but any LLM system would need training in the governing laws. A special interface would need to be designed, and training provided for judges, court staff, technologists and lawyers. The AI system could also help train new human judges and lawyers. The judges obviously have a dire need for this help. Of course, it would take money to make this happen. Lots of it. Are there no billionaires or large technology companies willing to divert a week’s profits to help bring justice to Ukraine? Smaller companies and crowdsourcing could make it happen too. Certainly, I would devote some of my free time to such a worthy project. Many others with expertise in this area would too.
I am serious, albeit stretched thin. The judicial system of war-torn Israel also needs help. If you are interested, contact me and I will try to put people together to see what is possible. If you are already working on efforts to help Ukraine’s legal system, please contact us too. The tech support work could start small, perhaps with document management and scanning, legal research and legal education, and then slowly work up to full AI implementation. Those endless piles of papers and court files shown in the video look like a nightmare from our pre-computer courts of the seventies. Imagine fighting a war for survival at the same time and routinely adjourning court to go to bomb shelters. The war crimes must be prosecuted; there is already a backlog of 100,000 such cases. Email me.
Ralph Losey Copyright 2023 – All Rights Reserved –. Published on edrm.net with permission.
Assisted by GAI and LLM Technologies per EDRM GAI and LLM Policy.