Five Ways to Use ChatGPT for Investigations and Ediscovery

Image created by John Tredennick using Midjourney

By John Tredennick and Dr. William Webber

{Editor’s Note: This article first appeared in the May 24, 2023 issue of LegalTech News from Law.com, a publication of American Lawyer Media.]

Since its release in late 2022, ChatGPT has captivated the world, drawing a million users in its first five days and totaling over 100 million today. Whether you are fearful or excited, there is little doubt we have entered a transformational era, much like the invention of the printing press, the steam engine, cell phones, and the Internet itself. The extent and speed of ChatGPT’s progress (and of competitor systems) remains the only open question.

Like everyone else, we have read lots of flowery articles about the potential of ChatGPT to redefine the professional world. In this article, we intend to go beyond generalities by demonstrating five specific ways legal professionals can use ChatGPT (and its underlying GPT engine) to make ediscovery more efficient and cost-effective. 

Using GPT and a basic search engine, we can show how legal professionals will soon take advantage of Large Language Models like GPT to streamline their ediscovery processes and better understand the key documents in their cases.

Prepare to be amazed.

Our Topic for This Exercise

Most ediscovery efforts start with a Rule 34 Request for Production. For this exercise, we will use one of the legal track topics from NIST’s annual Text Retrieval Conference (TREC), which was created in part to test the ability of different machine learning algorithms to find relevant documents:

“2000 Recount — All documents concerning the contested result of the 2000 presidential election.”

Our discovery collection consists of approximately 290,000 emails that Jeb Bush made public from his two terms as Governor of Florida. TREC used these emails for its programs and they provide a good test bed for the integration of GPT and traditional ediscovery search and review methods.  Our GPT system makes these documents available for interactive analysis and interrogation through GPT. 

1. Using GPT to Help with Keyword Searches

One of the first steps in addressing an RFP is to identify appropriate keywords and build a Boolean search. While we can lift terms from the topic request, experienced ediscovery professionals know that the process can require hours of time testing keywords and analyzing search results to come up with a comprehensive list of terms to include in the search.  

Let’s start by asking GPT to help us determine keywords for this topic with the following prompt (a question or instruction sent to GPT). Watch as we have a conversation with GPT to flesh out our keyword list and turn it into a Boolean search:

Prompt: 

Here is my topic: “2000 Recount — All documents concerning the contested result of the 2000 presidential election.” Can you suggest keywords to help make my search better? 

GPT: 

Keywords that could improve the search for “2000 Recount — All documents concerning the contested result of the 2000 presidential election” include:

election manipulation, voting machines, Diebold, security, paper trail, ballot review, consortium, statewide hand count, overvoted ballots, polls, voting, election, fraud, revote, federal law, media projection, electoral college, standard ballot, personnel, troubleshooting, felony list, and legitimate voters. 

These keywords are supported by the following documents: –

Document 018716: Mentions “election manipulation,” “voting machines,” “Diebold,” “security,” and “paper trail.” – Document 099018: Mentions “ballot review,” “consortium,” “statewide hand count,” “overvoted ballots,” and “outcome.” –

Document 064808: Mentions “polls,” “voting,” “election,” “contested result,” “legal maneuvers,” “ambiguous conclusions,” “procedures,” “previous elections,” and “precedent.” 

Document 124142: Mentions “contested result,” “fraud,” and “revote.” –

Document 065969: Mentions “federal law,” “media projection,” “polls,” “electoral college,” “standard ballot,” “voting machine,” “recount,” and “Florida votes.”-

Document 123697: Mentions “voting equipment,” “personnel,” “troubleshooting,” “felony list,” “legitimate voters,” and “right to vote.” –

Document 145857: Mentions “touch screen voting machines,” “security,” “open election records,” “Perfect Voting System,” “investigation,” “Georgia,” “Maryland,” “audit,” and “public inspection.” –

Document 123823: Mentions “recount,” “contested result,” and “AP Election Documentary.” –
Document 064796: Mentions “recount,” “ballots,” “Secretary of State,” and “deadline.” –

Document 113386: Mentions “recount,” “contested result,” and “political careers.” –

Document 113389: Mentions “recount,” “contested result,” “political career,” and “Katherine Harris.” 

Prompt: 

Can you suggest any other keywords I might use to improve my search? 

GPT:  

Here are some additional keywords that could improve your search: –

Hanging chads
Butterfly ballot
Supreme Court
Gore v. Bush
Florida Supreme Court
Bush v. Palm Beach County Canvassing Board
Bush v. Gore
Absentee ballots
Voter disenfranchisement
Voter suppression
Partisan politics
Legal challenges
Election reform
Public opinion
Media coverage
Political polarization

These keywords are supported by the documents mentioned earlier, as well as other documents related to the 2000 presidential election recount.

Prompt: 

Thanks. Can you build me a Boolean search based on all of the keywords you have suggested? 

GPT:  

Certainly! Here’s a Boolean search string that includes all of the keywords suggested: 

(“election manipulation” OR “voting machines” OR “Diebold” OR “security” OR “paper trail” OR “ballot review” OR “consortium” OR “statewide hand count” OR “over voted ballots” OR “polls” OR “voting” OR “election” OR “fraud” OR “revote” OR “federal law” OR “media projection” OR “electoral college” OR “standard ballot” OR “touch screen voting machines” OR “open election records” OR “Perfect Voting System” OR “investigation” OR “Georgia” OR “Maryland” OR “audit” OR “public inspection” OR “hanging chads” OR “butterfly ballot” OR “Supreme Court” OR “Gore v. Bush” OR “Florida Supreme Court” OR “Bush v. Palm Beach County Canvassing Board” OR “Bush v. Gore” OR “absentee ballots” OR “voter disenfranchisement” OR “voter suppression” OR “partisan politics” OR “legal challenges” OR “election reform” OR “public opinion” OR “media coverage” OR “political polarization”) 

You can use this search string in combination with your preferred search engine or database to find relevant documents related to the 2000 presidential election recount.

Read the original article chronicling the entire experiment, including using GPT to summarize documents, using GPT to answer questions, using GPT to create an investigation report, and using GPT for document review.

Authors

  • John Tredennick

    John Tredennick (JT@Merlin.Tech) is the CEO and founder of Merlin Search Technologies, a cloud technology company that has developed Sherlock®, a revolutionary machine learning search algorithm. Prior to founding Merlin Search Technologies, Tredennick had a distinguished career as a trial lawyer and litigation partner at a national law firm. With his expertise in legal technology, he founded Catalyst in 2000, an international e-discovery search technology company that was later acquired by a large public company in 2019. Tredennick's extensive experience is evident through his authorship and editing of eight books and numerous articles on legal technology topics. He has also served as Chair of the ABA's Law Practice Management Section.

    View all posts
  • Dr. William Webber

    Dr. William Webber (wwebber@Merlin.Tech) is the Chief Data Scientist of Merlin Search Technologies. With a PhD in Measurement in Information Retrieval Evaluation from the University of Melbourne, Dr. Webber is a leading authority in AI and statistical measurement for information retrieval and ediscovery. He has conducted post-doctoral research at the E-Discovery Lab of the University of Maryland and has over 30 peer-reviewed scientific publications in the areas of information retrieval, statistical evaluation, and machine learning. Dr. Webber has nearly a decade of industry experience as a consulting data scientist for ediscovery software vendors, service providers, and law firms.

    View all posts