More Questions about Ediscovery People are too Afraid to Ask

People looking confused in a bubble, Ediscovery Cat logo with cat lasers, powered by EDRM

Unpacking surprisingly complex “simple” ediscovery topics

Having too many ideas to fit into a single installment is a sure sign of stumbling upon a good blog topic, and this is certainly the case with “Things I wish I knew about ediscovery throughout my career but was too afraid to ask.” For a practice area that at first glance seems fairly straightforward, ediscovery has a degree of nuance that makes truly mastering it quite the long and winding road! Without further ado, here are a few more answers to your burning ediscovery questions.

Kitten wrapped in a blanket inside an accordion folder (Redweld)

What counts as a document these days? 

From Bankers Boxes full of Redwelds to the deluge of email and now the mobile application invasion, uncovering evidence in discovery has been a wild ride over the last two decades. Discovery and production requests were designed with paper in mind. Now, there’s far less paper, yet discovery (or rather, “ediscovery”) has not evolved to catch up. 

How does a Slack channel with 20 people talking about a specific topic over a two-year period translate to the concept of a document? Why are we trying to conform a Zoom meeting with video, chat, and polls to 8.5”x11” production format?

While the universe of data has shifted farther and farther from the original paper documents that preceded it, the base unit in a document review (and in ediscovery platforms designed to facilitate it) remains a “document.” For an email, the document unit of measure is each individual email sent, received, or forwarded, along with its metadata. For more traditional documents like a spreadsheet, presentation, or Word document, the unit is the entirety of the individual file (no matter how lengthy) and its relevant metadata. 

For newer messaging applications, SMS, collaboration tools, and more, the document classifier varies depending on which software you use. Some will break up a text or collaboration tool chat by date or some other temporal unit while others take a more granular message-to-message approach. Video files, like their document predecessors, extend to the full duration of the video no matter the length. Anytime you engage with newer data types, it’s important to understand the document unit, because some lend themselves more readily to easy review and gaining insights. 
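
To make the "document unit" idea concrete, here is a minimal sketch of the two approaches described above: one document per message versus one document per day of conversation. The message format and function names are hypothetical and do not reflect how any particular platform works.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical chat export: (ISO timestamp, sender, text).
# This is an illustrative format, not any vendor's real export.
messages = [
    ("2023-03-01T09:15:00", "alice", "Did the contract go out?"),
    ("2023-03-01T09:17:00", "bob", "Sent it this morning."),
    ("2023-03-02T14:02:00", "alice", "Client has questions."),
]

def unitize_by_message(msgs):
    """Granular approach: one review 'document' per individual message."""
    return [[m] for m in msgs]

def unitize_by_day(msgs):
    """Temporal approach: one review 'document' per calendar day of chat."""
    by_day = defaultdict(list)
    for ts, sender, text in msgs:
        day = datetime.fromisoformat(ts).date()
        by_day[day].append((ts, sender, text))
    # Return the daily groups in chronological order.
    return [by_day[day] for day in sorted(by_day)]

per_message_docs = unitize_by_message(messages)
per_day_docs = unitize_by_day(messages)
print(len(per_message_docs), "documents when split per message")
print(len(per_day_docs), "documents when grouped per day")
```

The same conversation yields a different document count (and a different amount of context per document) depending on the unit chosen, which is why it pays to ask how your tool draws the line.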

Cartoon with 3 piles of paper: medium (need), small (don't need), and huge (will never need in a million years but can't risk throwing out because the remote risk that I MIGHT need it will keep me up at night)

What is the difference between preservation and retention?

*Co-leader and one of the original attorneys in Morgan Lewis’s eData practice, Scott A. Milner, suggested this great question among others (and demanded credit!)

While the terms retention and preservation are sometimes used interchangeably, there are differences between the two in terms of the source of obligation, mechanisms, and place in the information governance and the ediscovery lifecycles. While both involve saving documents, the difference essentially comes down to purpose: retention meets regulatory obligation and is an overarching policy, while preservation is a reaction to anticipated investigation or litigation. To dive in further:


Data retention and retention policies are a central component of any information governance program. Heck, the Information Governance Reference Model (IGRM) has data retention smack dab in the middle of the diagram! Data retention is governed by regulatory obligations (for example, the Sarbanes-Oxley Act might require you to retain certain types of content for seven years) or internal policies. It is all about balancing the competing needs of extracting value from data and mitigating risk (cyber, privacy, and more). 

Policies relating to data retention govern an organization’s obligation to retain information for operational use while ensuring adherence to the laws and regulations governing personally identifiable information, rights to privacy, and obligations to retain or defensibly delete information. These policies stipulate which data will be archived, how long it will be kept, and what happens to the data at the end of the retention period (archive or destroy).


Preservation, by contrast, is triggered by an event such as reasonable anticipation of litigation or indication of a pending investigation. The EDRM defines preservation as an effort to promptly isolate and protect potentially relevant data in ways that are legally defensible, reasonable, proportionate, efficient, auditable, and broad but tailored, and that mitigate risks. Often, the obligation to preserve requires data to be moved out of the standard retention policy (to prevent deletion or modification) and placed under legal hold. This legal hold ensures that standard deletion policies (after a certain period of time or as a result of employee turnover) are suspended and the data is preserved for future use in the ediscovery lifecycle. 
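
The interplay between the two can be sketched in a few lines. This is a simplified illustration, assuming a hypothetical `Record` type with a fixed retention period in days; real information governance systems track far more than this.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Record:
    record_id: str
    created: date
    retention_days: int          # set by the retention policy (hypothetical field)
    on_legal_hold: bool = False  # set when a preservation duty is triggered

def eligible_for_disposal(record: Record, today: date) -> bool:
    """A record may be disposed of only if its retention period has run
    AND it is not under legal hold. The hold suspends normal deletion."""
    if record.on_legal_hold:
        return False
    return today >= record.created + timedelta(days=record.retention_days)

today = date(2024, 1, 1)
seven_years = 7 * 365  # rough seven-year period, ignoring leap days

expired = Record("EMAIL-001", date(2015, 1, 1), seven_years)
held = Record("EMAIL-002", date(2015, 1, 1), seven_years, on_legal_hold=True)
recent = Record("EMAIL-003", date(2023, 6, 1), seven_years)

for rec in (expired, held, recent):
    print(rec.record_id, "eligible for disposal:", eligible_for_disposal(rec, today))
```

Note that the held record is just as old as the expired one; the only thing keeping it alive is the legal hold.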

When does my duty to preserve kick in? 

According to Rule 37(e) of the Federal Rules of Civil Procedure (FRCP), a party must preserve documents and electronically stored information (ESI) when it reasonably anticipates litigation. One of the foundational cases in the development of ediscovery case law, Zubulake v. UBS Warburg LLC, further opined that this reasonable anticipation of litigation kicked in once there was a credible threat of litigation. 

Trigger events for the purpose of preservation obligation do not only apply to litigation and can include any of the following: 

  • Receipt of a demand letter
  • Subpoena
  • Formal complaint
  • Occurrence of an event that generally results in litigation
  • Media reporting of pending litigation or investigations
  • Communication from opposing counsel

Garbage in to an analysis pipeline and then garbage out with a woman looking on with a question mark on her head

What actually happens during data processing?

At the most basic level, processing consists of organizing, indexing, and converting data from 1s and 0s into a usable and reviewable format, while at the same time getting rid of “garbage data.” Generally the saying goes, “Garbage data in, garbage data out.” So how does the processing stage of ediscovery get a legal practitioner from nonsensical code to reviewable and relevant information? 

Data processing contains three main components: data cleanup, data set refinement, and converting the data into a load file for a review platform. Data cleanup is achieved in several ways including:

  • Deduplicate: Use a hash value such as MD5 as a unique identifier to find and remove exact copies 
  • DeNIST: Use the National Institute of Standards and Technology (NIST) list to identify computer files known to be unimportant system files and remove them from your document collection.
  • Decrypt: Unlock documents protected by a password, via digital rights management (DRM) or other encryption schemes
  • Extract metadata: Pull out the data about the data 
  • Image native files: Convert to searchable image with metadata
  • Render for review: Convert from 1s and 0s to a reviewable format
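
The first two cleanup steps above can be sketched with nothing more than Python's standard `hashlib` module. This is a minimal illustration, assuming files are hashed on their raw bytes; the sample files and the `known_system_hashes` set (a stand-in for the NIST list) are made up, and real processing tools often hash emails on selected fields instead.

```python
import hashlib

def md5_of(content: bytes) -> str:
    """Return the MD5 hash of a file's bytes as a hex string."""
    return hashlib.md5(content).hexdigest()

# Hypothetical collection: file name -> raw bytes.
collected = {
    "contract_v1.docx": b"Final contract text",
    "copy of contract_v1.docx": b"Final contract text",  # exact duplicate
    "notes.txt": b"Call the client on Monday",
    "system.dll": b"\x00\x01 operating system library bytes",
}

# Stand-in for the NIST list: hashes of known system files.
known_system_hashes = {md5_of(b"\x00\x01 operating system library bytes")}

def clean(files, known_hashes):
    """Drop known system files (DeNIST) and exact duplicates (deduplicate)."""
    seen = set()
    kept = []
    for name, content in files.items():
        digest = md5_of(content)
        if digest in known_hashes:  # DeNIST: known system file
            continue
        if digest in seen:          # Deduplicate: identical bytes seen before
            continue
        seen.add(digest)
        kept.append(name)
    return kept

kept_files = clean(collected, known_system_hashes)
print(kept_files)
```

Four files go in and two come out: the duplicate contract and the system file never reach a reviewer.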

During the processing phase, there are also matter-specific parameters that can be applied to reduce the universe of data that needs to be reviewed. These may include (but are not limited to) date range limiters, key custodians, specific data sources, and search terms agreed upon in the Rule 26(f) conference. At this stage, ediscovery tools with analytics capability will also prepare the data for data visualization, concept clustering, technology-assisted review (TAR), and other machine learning functions. Once all of this is complete, the processing tool will either create a load file to promote the data for review in a different platform or load the data directly into the review portion of a platform. 
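
As a rough sketch of how those matter-specific parameters shrink the review set, the example below filters a handful of invented documents by custodian, date range, and search terms. The field names and the simple substring matching are assumptions for illustration; real platforms use proper indexes and far richer search syntax.

```python
from datetime import date

# Hypothetical processed documents with a few metadata fields.
docs = [
    {"id": 1, "custodian": "alice", "date": date(2021, 5, 3), "text": "Merger pricing discussion"},
    {"id": 2, "custodian": "carol", "date": date(2021, 6, 1), "text": "Merger timeline"},
    {"id": 3, "custodian": "alice", "date": date(2019, 1, 15), "text": "Merger rumor"},
    {"id": 4, "custodian": "bob", "date": date(2021, 7, 9), "text": "Lunch order"},
]

def cull(documents, custodians, start, end, terms):
    """Keep documents from key custodians, inside the date range,
    that contain at least one agreed search term (case-insensitive)."""
    kept = []
    for doc in documents:
        if doc["custodian"] not in custodians:
            continue
        if not (start <= doc["date"] <= end):
            continue
        text = doc["text"].lower()
        if not any(term.lower() in text for term in terms):
            continue
        kept.append(doc)
    return kept

review_set = cull(
    docs,
    custodians={"alice", "bob"},
    start=date(2021, 1, 1),
    end=date(2021, 12, 31),
    terms=["merger", "pricing"],
)
print([doc["id"] for doc in review_set])
```

Each filter knocks out a different document here, which is the point: the parameters stack, and the review set is whatever survives all of them.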

What happens to data when a case is over?

The answer to this question is not to act like the data was whisked away by the magical data fairy or pretend that it never existed in the first place. Hosting data indefinitely is costly and poses risks, so you have several options once the matter is resolved or on an indefinite hiatus. If the regulatory obligation to preserve the data or maintain it under a legal hold is lifted, a legal practitioner can request that the database be exported and sent back to them for defensible deletion or that the service provider defensibly delete the data on their behalf. 

Alternatively, if the hold period has not terminated or if there is a possibility that you will need the data for subsequent related matters or follow-on litigation in the future, you can either archive or move your databases to nearline storage (the storage option between active data and archiving). Generally, there is a cost to archive (though archiving ends the recurring hosting cost), whereas moving to nearline generally results in a lowered ongoing hosting cost. If you keep a database in either of these forms, you should periodically reevaluate the ongoing need to keep it, to minimize cost and the future exposure that comes from keeping data beyond the retention obligation. 

Black clad thief with mask carrying a bag of money in a spotlight

How much will it cost per GB? (Trick question. There’s a lot more than the per-GB cost you’re gonna be paying for)!

As with nearly every facet of ediscovery, that question is far more complicated than it first seems. I took a deep dive in a recent blog, The Gig is Up, but here are some of the many factors that can impact per-GB cost estimates: 

  • Whether the per-GB charge is based on compressed or expanded data 
  • “All-in” pricing that is not as all-in as at first glance
  • Estimate assumptions may not match the reality (cull rates, review speeds, QC volume all impact final cost)
  • The need for lots of hourly PM support vs. something more accessible (next-gen tools like DISCO require a fraction of the PM support)
  • Cost to get your data at the conclusion of a matter or if you have to migrate 
  • Fees for stuff you should be able to do yourself (logins, batching, case setup, and more)
  • Whether or not AI and advanced tech is used as promised
  • Differing bill rates for expertise
  • Overcharging for weird data types
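
To see why the compressed-versus-expanded question alone matters, here is a back-of-the-envelope sketch. Every number in it (data volume, expansion factor, per-GB rate) is invented for illustration and is not drawn from any real price list.

```python
def hosting_cost(compressed_gb, expansion_factor, rate_per_gb, bills_on_expanded):
    """Estimate a per-GB charge. If the vendor bills on expanded data,
    the billable volume grows by the expansion factor."""
    if bills_on_expanded:
        billable_gb = compressed_gb * expansion_factor
    else:
        billable_gb = compressed_gb
    return billable_gb * rate_per_gb

# Illustrative assumptions only.
compressed_gb = 100      # size of the collection as delivered
expansion_factor = 2.0   # assumed growth once containers are unpacked
rate_per_gb = 10.0       # assumed per-GB rate

cost_compressed = hosting_cost(compressed_gb, expansion_factor, rate_per_gb, False)
cost_expanded = hosting_cost(compressed_gb, expansion_factor, rate_per_gb, True)
print("Billed on compressed data:", cost_compressed)
print("Billed on expanded data:  ", cost_expanded)
```

Same data, same quoted rate, and the bill doubles under these assumptions, before any of the other factors in the list above come into play.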

What occurs during document review? 

When lawyers think of ediscovery, document review is often the first thing that comes to mind. This is the phase where case teams, legions of document review attorneys, and savvy ediscovery experts get to review and analyze documents for relevance to the case and to identify potentially privileged information. Who actually conducts the review is dependent on the client, case strategy, and the budget, but often a third party will be involved to provide additional technological and human power to complete the review. 

Up to this point, much of the ediscovery process is done without human interaction, and while increasingly advanced analytics are used to help amplify human decisions, this is the phase where people are pivotal. Document review is the final critical step that organizes the data by issue, uncovers key pieces of potential evidence, forms case strategy, and ultimately decides what is produced to the opposing side. Generally, this step is conducted within a review platform because the team reviewing the data needs the ability to search, visualize, and tag documents throughout the process. 

Hey girl, You're my gold standard, says dreamy eyed Ryan Gosling (meme)

Is “eyes on every document” the gold standard? 

Nope. Multiple studies have found that linear, purely human review is less accurate than review in which the case team leverages technology to organize, prioritize, and even provide tagging suggestions. Tools like data visualization, AI-powered data categorization, and advanced search functionality help attorneys prioritize and parse data efficiently, while TAR powered by AI helps amplify coding decisions across the entire matter and provides suggestions as to the relevance of data. 

These tools that amplify accuracy and speed are especially important in the costly review stage. Upwards of 70-80% of ediscovery cost occurs during the review phase. And when you consider that human review is not the gold standard many claim it to be, the accuracy benefits alone suggest that human teams informed by advanced technology get dramatically more accurate outcomes while incurring less time and cost. 

The technology does not replace humans, but rather supercharges them, a bit like how Iron Man’s suit informed and empowered him but was still beholden to his direction and insights. As Chris Dale aptly pointed out in a Forbes article several years ago, “[n]one of this technology solves the problem on its own. It needs a brain, and a legally trained brain at that . . . to [meet] the clients’ objective . . . [of] disposing of a dispute in the shortest time by the most cost-effective method.”

Considering the “gold standard” is no longer the gold standard in currency and that innovation drove the change — perhaps legal practitioners should consider following suit!

Cat in a woman's business suit and necklace, at a table with dollars with a Cat Casey signature background (pulsating)

Why is ediscovery so darn expensive?

For anyone who has been in the ediscovery space for a while, the good news is that individual costs for many aspects of ediscovery are declining. The bad news is that overall cost is still dramatically rising. How can this be the case? In my mind, it comes down to a few factors: 

  • Data overload – The volume, variety, and velocity of data have exploded throughout our lives, and as a result the relative size of ediscovery matters has likewise blown up. A big case earlier in my career was in the hundreds of GB. Today it may be in the tens of thousands of GB. And more than just volume, there are many more types of data that may require bespoke workflows, expertise, or tech to manage. 
  • It’s people! – While many think of technology when it comes to ediscovery, there are legions of experts at every step of the process to ensure that the case is handled in a forensically sound manner. These experts may not be charged as a line item, but they are factored into the overall cost per GB or the cost to license certain tools. While next-gen tech may feel like pressing an easy button compared to back in the day, there are hundreds of engineers, discovery practitioners, and consultants that helped develop, support, and evolve the tech for you.
  • Review, review, review – While technology is dramatically improving, there is still a massive portion (70-80%) of ediscovery spend going to human review. Some advanced tech will cut that by as much as 60%, but there is still substantial cost associated with the document review phase of ediscovery. 
  • Premium tools are costly to develop – More and varied data sources and volume require much more sophisticated tools to manage, parse, and extract insights from. This requires substantial investment in industry-leading engineers, managers, and experts. This sort of undertaking is not cheap, and will likely impact the overall cost of ediscovery.
  • Not using AI – Human review is the most costly aspect of ediscovery, and yet many people are still not deploying technology to accelerate time to insight and realize cost savings. There is a less costly way to approach ediscovery, but practitioners must first embrace it! 
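
A quick bit of arithmetic shows how the review numbers above interact. This sketch assumes a hypothetical $100,000 matter, takes 75% as the midpoint of the 70-80% review share, and applies the best-case 60% reduction to review cost only; actual results will vary by matter.

```python
def total_after_review_savings(total_cost, review_share_pct, review_reduction_pct):
    """Apply a percentage reduction to the review portion of spend only,
    leaving the non-review portion untouched."""
    review_cost = total_cost * review_share_pct / 100
    savings = review_cost * review_reduction_pct / 100
    return total_cost - savings, savings

# Illustrative assumptions only.
total_cost = 100_000       # hypothetical total ediscovery spend, in dollars
review_share_pct = 75      # midpoint of the 70-80% range
review_reduction_pct = 60  # the "as much as 60%" best case

new_total, savings = total_after_review_savings(
    total_cost, review_share_pct, review_reduction_pct
)
print("Savings:", savings)
print("New total:", new_total)
```

Under these assumptions, cutting review by 60% trims the whole bill by 45%, which is why review is the first place to look for savings.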

That’s all folks (for now) 

After writing a veritable novel on the seemingly simple and surprisingly complex aspects of ediscovery, I am certain there are still questions unanswered. Hopefully this breakdown along with part one helps get the ball rolling on your journey to ediscovery mastery. Feel free to send any burning questions you may have to me via email or LinkedIn and I promise at some point down the road I will revisit this topic. Until then, stay curious and feel free to ask this ediscovery dork any questions! 


Cat Casey

Catherine “Cat” Casey, Chief Growth Officer, Reveal

Cat Casey is the Chief Growth Officer of Reveal, the leading cloud-based AI-powered legal technology company, where she spearheads development and strategy for its advanced legal technology solutions. She is a frequent keynote speaker and outspoken advocate of legal professionals embracing technology to deliver better legal outcomes. Casey has over a decade and a half of experience assisting clients with complex ediscovery and forensic needs that arise from litigation, expansive regulation, and complex contractual relationships. Before joining Reveal, Casey was the Chief Innovation Officer of DISCO, and director of Global Practice Support for Gibson Dunn, based out of their New York office. She led a global team comprising experienced practitioners in the areas of electronic discovery, data privacy, and information governance. Prior to that, Casey was a leader in the Forensic Technology Practice for PwC. Prior to that, Casey built out the antitrust forensic technology practice and served as the national subject matter expert on ediscovery for KPMG. Casey has an A.L.B. from Harvard University and attended Pepperdine School of Law.
