Navigating a Shifting Landscape

An Interview with Dr. Maura Grossman by Kate Halloran

Image: Kaylee Walstad, EDRM with AI – Hat Tip to Ralph Losey’s Visual Muse.

[Editor’s Note: This article was first published in June 2024. EDRM is grateful to Dr. Maura Grossman for introducing us to Jennifer D. Adams, Editor of Trial and Senior Director of AAJ Publications and Education, who granted permission to republish. The opinions and positions are those of the author.

Kate Halloran interviewed Dr. Maura R. Grossman for the magazine Trial, a publication of the American Association for Justice (AAJ). The EDRM blog is publishing an excerpt. At the end of this post, the entire article can be scrolled by clicking inside it or downloaded by clicking the download button. The interviewer’s questions are bolded and Dr. Grossman’s responses are in plain text.]


What do you recommend that law firms focus on when developing policies around the responsible use of AI?

Firms must have a very clear idea of the scope of permissible and impermissible uses for generative AI tools. Under what circumstances may such tools be used? For example, perhaps law firm staff may use a tool for internal firm communications but not for sending anything to a client. Or perhaps they can use a tool for which the firm has an enterprise license that has protections for confidential data, but they may not use a public tool like ChatGPT that lacks such protections. Firms should specify permissible or impermissible tools and uses explicitly.

There may be certain purposes for using AI tools that are fine—like an initial draft of deposition questions or preparatory materials for a hearing. But maybe you don’t want staff conducting research with the tool if it isn’t one that has been trained specifically on case law in your jurisdiction. Perhaps the firm wants to prohibit the creation of any deepfakes or cloning of anyone’s voice, even if it is meant in a funny, harmless way. Deepfakes, even as jokes, can get out of hand quickly. Whatever the firm’s policy is, spell it out in writing.

Most firms already have document retention policies. Firms will now need to add prompt retention policies and output retention policies (that is, policies covering the material generated by AI tools). Perhaps users need to keep a copy of the original output and the edited version. Then, if a court asks, you can prove that you reviewed and edited it.

Mandatory training on the AI tools that you choose to implement and your firm’s policies related to those tools is very important. Everyone who is going to use this technology needs to understand how it works and its benefits and limitations. All of this is moving at such a fast pace that we will need ongoing monitoring and compliance checks to make sure people are using the tech properly, new hires get the appropriate training, and firm policies stay current.

Firms will also need to think about what to communicate to clients. Is it going to be via an engagement letter that says, “We use AI for X, Y, and Z purposes. If you would like to discuss the use of AI on your case, please raise this with your attorney”? Or is it going to be a conversation you have with all new clients? And if you’re just using AI to correct grammar, or to make a paragraph a little tighter, or to generate an initial draft of deposition questions, is it even necessary to have that discussion at all? Clients probably don’t care about those sorts of things, but they might very well care if you are drafting court pleadings or an opening argument using AI.

Let’s shift to how AI could affect evidence in court. What are the main issues?

There are two different circumstances where parties may seek to admit AI evidence or purported AI evidence. One is where both parties agree that the evidence is AI-based. For instance, both parties agree that an AI tool was used for hiring and the plaintiff didn’t get the job because the algorithm said, “X is in the bottom quartile and doesn’t qualify for this job.” So, then the process for admitting related evidence tends to look the same as the process for most technical and scientific evidence. The questions are going to be: How does this tool work? Are there standards for its operation? How was it trained? What is the data that it was trained on? Has it been tested? What is its error rate? Has it been peer reviewed and adopted by others in the industry?

In the employment example, if you are a woman of color and the training data was gathered primarily from white males, the tool likely won’t make an accurate prediction. We want to know about the training data and whether it’s representative of the groups the tool is being used to make predictions about. What due diligence was done, or what was done to test that this tool is both valid and reliable? “Valid” meaning it measures or predicts what it’s supposed to, and “reliable” meaning that it does so consistently under similar circumstances. We also want to know about bias. Is the tool biased against certain protected groups?

The second situation is different. It involves disputed AI evidence or deepfakes. You say you have audio of me saying “X, Y, Z,” and I say, “That’s not me. I never said that.” Under ordinary circumstances, to get that admitted, all you have to do is find somebody who knows my voice really well to testify that it’s my voice. And that would authenticate that piece of information for admission, and the question of its weight would go to the jury.

But we’re now in a world where deepfakes are so good that virtually everything will pass that very low threshold of “Is it more likely than not Maura’s voice?” And it’s not enough for the opponent of the evidence to just say, “That’s not me.” It becomes more helpful if they can say, “The metadata—the data about data—in this audio says it was recorded on Wednesday, March 20 at 1:23 p.m., but here’s proof that I was in the dentist’s office under anesthesia having my teeth drilled at that time.” Then it becomes a much more complicated question for the court.

You and retired federal district court judge Paul W. Grimm have made recommendations to start addressing these concerns. What do they encompass?

For the first scenario, where the parties agree that the evidence is AI or the product of AI, the Daubert factors (FRE 702) work pretty nicely. But the opposing side may argue the technology is proprietary and shouldn’t be made available to you. So there may be a battle about whether the data or the tech is proprietary, or whether there should be a protective order and what it should say. Let’s assume the court says the underlying data or technology must be produced—what exactly is going to be produced and how? The parties must leave some time for this discovery, especially if it’s important evidence that could make or break the case. It’s not something that you should be springing on the court right before trial.

In the second scenario, if one party is either going to proffer evidence that it thinks will be questioned as a deepfake, or the other side intends to challenge the evidence as a deepfake, a hearing with experts is likely needed. And again, the parties must give the court sufficient time to address and rule on these issues.

One of the things that Judge Grimm and I emphasize about the first scenario is that we think it’s relatively straightforward and that we already have tools available. But the wording of the Federal Rules of Evidence is a little vague and confusing in this regard. The rules use the word “reliability” and, in some places, the word “accuracy.” But the terms that scientists and people who are steeped in this area use are “validity” and “reliability.” “Validity” refers to whether the tool measures what it purports to measure, and “reliability” means that it does so consistently under substantially similar circumstances. So, we proposed an amendment to use those words and then spell out how to incorporate the Daubert factors into the existing standards for admitting evidence.

The second scenario is harder because, normally, if the evidence meets the preponderance threshold, it comes in—as in the earlier example of the audio of my voice. We’re concerned that if the audio is a deepfake, it will make such an impression on the jurors (or even the judge) that they won’t be able to get it out of their minds. We’ve suggested that under these circumstances, there should be a bit of burden shifting. If the party challenging the evidence can make a preponderance showing that it’s just as likely as not that the evidence is fake, and the proponent has made a showing that it’s just as likely as not that it’s real, then the judge shouldn’t put that evidence to the jury if the potential prejudice outweighs the probative value. The judge should make the decision by looking at the totality of the circumstances and balancing those two things.

We proposed an amendment to FRE 901(b)(9) and a new FRE 901(c). Previously, we had suggested tweaking FRE 403—the rule that permits a judge to exclude evidence that is unduly prejudicial—but the threshold before FRE 403 is triggered is very high. That’s why we proposed the other changes. But at its last meeting in April, the Advisory Committee on Evidence Rules rejected our proposals and asked us to go back to the drawing board. The committee is not yet convinced that deepfakes pose a unique problem warranting a rules change.

Could you talk about how deepfakes might affect juries? 

I think there’s a risk of “automation bias”: the belief that because it came from a machine, it must be true. And some people block out information that is inconsistent with what they already believe. This is called “confirmation bias.”

And some people will become very cynical and begin to disbelieve all evidence that is put in front of them. They think you can’t trust any of it, so they start to make decisions based on things other than the evidence before them, which is very dangerous.

[Editor: To read the entire article, click below and scroll, or use the download button at the bottom of the article.]

Author

  • Trial magazine

    Kate Halloran is the Managing Editor, Departments & Production, for Trial magazine, a publication of the American Association for Justice (AAJ).
