What is Big Data in Ediscovery?

Young business person with jaunty hair in front of a computer with various data types visualized and a stack of three big books

[Editor’s Note: EDRM is proud to post our Guardian Plus Trusted Partner Zapproved’s Educational Resources.]

Driving Down the Cost of Ediscovery Part 1

Taming the Data Dragon

The more data you have, the higher your ediscovery costs. The first step to driving down the cost of ediscovery is therefore to reduce the amount of data you have to manage. Data reduction requires consideration of the entire lifecycle of data, beginning with a plan for information governance and ending with defensible data disposition. But first, let’s explore exactly what big data is and, more importantly, what it means for ediscovery.

What is Big Data and How Does it Affect Ediscovery?

The term big data is thrown around a lot and tends to be a hot-button topic as our world becomes more technology-forward. According to Oracle, big data is data that contains greater variety, arriving in increasing volumes and with more velocity. These three attributes (variety, volume, and velocity) are known as the three Vs.

Put simply, big data is larger, more complex data, often coming from entirely new data sources. These data sets are also so voluminous that traditional data processing software just can’t manage them. But these massive volumes of data can be used to address business problems you wouldn’t have been able to tackle before.

How does this affect ediscovery? For starters, as the amount of data that companies hold has expanded dramatically, the ediscovery process has become much more complex. For example, when the threat of litigation looms and legal holds must be put in place, it’s now much harder to segregate the data that may be subject to a hold and must not be deleted. Below, we’ll explore a few key ways ediscovery has been affected by big data.

Information Governance: Organizing Your Organization

Information governance (or “info gov”) is the overarching approach that dictates how your company controls and organizes the mountains of data it generates from sources like employees, customers, prospects, and departments. Companies need to be able to process, store, and retrieve data for a number of reasons, including regulatory requirements and litigation as well as product innovation and sales strategies.

Ediscovery professionals can play a key role in establishing the information governance initiatives that set the rules for how data is acquired or created, processed, managed, stored, retrieved, and deleted. You already have insight into the data types and repositories that have historically constituted higher volumes or greater risks for ediscovery (such as the time required to preserve, collect, and review that data). You can further refine your information governance recommendations by monitoring basic metrics in your work, such as tracking where time and money are being spent on ediscovery by data type or source.
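As a rough illustration of tracking spend by data source, the following Python sketch totals review hours and cost per source. The entries, source names, and figures are hypothetical and made up for the example; a real team would pull these from its own matter management or billing records.

```python
from collections import defaultdict

# Hypothetical log of ediscovery work: (data source, hours spent, cost in dollars).
# All names and figures here are illustrative assumptions, not real data.
entries = [
    ("email", 40.0, 6000.0),
    ("chat", 25.0, 3750.0),
    ("email", 10.0, 1500.0),
    ("shared drive", 5.0, 750.0),
]


def totals_by_source(rows):
    """Sum hours and cost for each data source."""
    totals = defaultdict(lambda: {"hours": 0.0, "cost": 0.0})
    for source, hours, cost in rows:
        totals[source]["hours"] += hours
        totals[source]["cost"] += cost
    return dict(totals)


summary = totals_by_source(entries)

# List sources from most to least expensive.
for source, t in sorted(summary.items(), key=lambda kv: kv[1]["cost"], reverse=True):
    print(f"{source}: {t['hours']:.1f} hours, ${t['cost']:,.2f}")
```

Even a simple tally like this can show which data types or repositories drive the most ediscovery effort, which in turn suggests where governance changes may pay off first.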

More Devices = More Data

Legal teams are a great resource for identifying the devices, applications, and data repositories that employees are using — even when they’re not on the “approved” list. When you interview custodians, verify your data creation and retention practices with a few key questions:

  • Are you using any personal devices or applications for business purposes?
  • Have you stored or accessed any data anywhere outside of the company-approved list of repositories (for example, in unauthorized collaboration tools or cloud storage repositories)?
  • If so, does any of that externally retained data include any personally identifiable information (PII) or payment card industry (PCI) data?

Clarifying the type of data being retained, such as PII or PCI data, can aid in reducing cybersecurity and privacy risks to the company. During ediscovery, legal teams interact with both custodians and their data, identifying the types of data captured and retained on various devices. These custodian interviews are essentially a hands-on way to understand employees’ real-world data governance practices. All of these routine ediscovery actions give you unique insight into the potential vulnerabilities and risks posed by your data, which can inform your privacy and security initiatives.

Zapproved Pro Tip

Consider circulating portions of your custodian questionnaires to employees periodically, independent of any legal hold; they represent a quick way to determine whether any unauthorized devices, applications, or data repositories have made their way into your organization.

Defensible Deletion

To adequately manage your data, you must have a plan for defensible deletion, including adherence to record retention schedules coupled with clearly understood preservation obligations. Companies are required to keep data for a variety of legitimate reasons, such as regulatory compliance, records management, and legal obligations. However, when data has outlived its legitimate business purpose, organizations should get rid of that aged, redundant, or obsolete data. The question is, how can data be deleted in a manner that will stand up to later scrutiny should a claim of spoliation be leveled against the company?

Legal teams can help assess the risk of data retention and deletion policies for legacy applications and storage devices (including old backup tapes). They can also provide guidance on reasonable practices to inform the decisions about whether to keep or discard data that is otherwise deemed redundant or obsolete. And, of course, they can pave the way for defensible deletion by routinely releasing expired legal holds after a duty to preserve no longer applies.
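The interplay between retention schedules and legal holds can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a compliance tool: the `Record` fields, the `RETENTION_DAYS` schedule, and the retention periods are all hypothetical, and any real policy should come from your records management and legal teams.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    record_id: str
    category: str
    created: date
    on_legal_hold: bool


# Hypothetical retention schedule, in days per record category.
RETENTION_DAYS = {"invoice": 7 * 365, "chat": 365}


def eligible_for_deletion(record, today):
    """Return True only if the record is past retention and not on hold."""
    if record.on_legal_hold:
        return False  # a preservation duty always overrides the schedule
    retention = RETENTION_DAYS.get(record.category)
    if retention is None:
        return False  # unknown category: keep it for manual review
    return today >= record.created + timedelta(days=retention)


# Example usage with made-up records.
today = date(2024, 1, 1)
records = [
    Record("r1", "chat", date(2022, 6, 1), on_legal_hold=False),
    Record("r2", "chat", date(2022, 6, 1), on_legal_hold=True),
    Record("r3", "invoice", date(2020, 1, 1), on_legal_hold=False),
]
for r in records:
    print(r.record_id, eligible_for_deletion(r, today))
```

The sketch checks the legal hold first, so an expired retention period never leads to deletion while a duty to preserve is in force. Releasing an expired hold is what makes a record eligible again.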

New Technology Selection

Finally, legal teams should be engaged in evaluating new applications or data repositories that are being considered for use within an organization. Any formal evaluation should include a thorough assessment of how the data that will be created can be governed, encompassing how long it should be kept, how it can be preserved when necessary, and what protocols are used to collect the data when required for ediscovery.

Author

  • Mary Mack

    Mary Mack is the CEO and Chief Legal Technologist for EDRM. Mary was the co-editor of the Thomson Reuters West Treatise, eDiscovery for Corporate Counsel, for 10 years and is the co-author of A Process of Illumination: the Practical Guide to Electronic Discovery. She holds the CISSP among her certifications.