EDRM Collection Standards

Updated January 16, 2014

In May of 2013, a small group of attendees at the EDRM meeting discussed the maturity of the e-discovery industry and how different phases of the EDRM model have developed as standards over time. These are not official standards; rather, they are processes that are repeatable and have understandable risks and rewards that can be used to evaluate a strategy in various cases. The group decided that “Collection” of ESI had evolved to the point that it made sense to document collection best practices and considerations for developing a collection strategy. A team collaborated over the last several months to develop these standards for public comment.

Accompanying the EDRM Collection Standards is the EDRM Collection Standards Glossary.

1. Forensic Image (Physical or Logical Target)

  • Definition: A forensic image is an exact, sector-by-sector bitstream copy that captures all the ones and zeros from a source. This type of image will capture both active and inactive data. Inactive data includes files, fragments and artifacts that reside in unallocated and/or slack space, including deleted files that have not yet been overwritten. The source can be a physical target such as a physical hard drive or a logical target such as a logical drive (C:\ drive, file system, etc.) contained on a hard drive. Because all user and OS data is stored on logical partitions, it is typically unnecessary to image the non-partitioned space for e-discovery purposes.
  • Also known as: bit by bit image or bitstream image, .E01 file, .Ex01 file, RAW image file, dd file, etc.
  • When to use: If date stamps, deleted data, edit history, web browser history, or registry values have any bearing on issues at bar, a forensic image should be considered. Forensic images are commonly acquired in the contexts of internal and criminal investigations. Forensic images can also contribute to e-discovery process design as a means of capturing static, verifiable preservation data sets. Images are also often acquired when employees terminate their employment so their data may be recoverable even after reassigning their computer, should it be deemed necessary in the future.
  • How to use: In a non-investigative, preservation context, a forensic image may be acquired by a person trained in best practices and competent in the use of the tool of choice. Next-generation software can simplify the acquisition process, enabling defensible self-collection even by untrained users/custodians, while the use of a hardware solution can reduce drive imaging to a simple, ministerial task. However, in an investigative context requiring forensic analysis, a forensic image should be examined by a thoroughly vetted, certified forensic examiner with a proven track record as an expert in judicial proceedings.
  • Pros:
    • Creation of static preservation data set in e-discovery context.
    • Improved defensibility since all possible data has been captured, which may allow the recovery of deleted files.
    • Provides client with a sense of security.
    • Verifiable at both device and file levels.
  • Cons:
    • May require use of third party forensic examiner/expert.
    • May incur additional cost to extract native files from forensic image format before processing.
    • Potential for over preservation/collection, especially whenever incremental/delta collections are needed.
    • Increased time of collection and potential down-time or disruption to the business.
    • Bandwidth issues if moving the image(s) over the network.
    • Increased storage needs and associated costs.
    • If using self collection, the individual may have to testify regarding technology and processes if challenged.
    • If using self collection, the individual may need to explain any error messages that came up during the copy process.
  • Glossary
    • .E01 file
    • .Ex01 file
    • Bitstream image
    • Certified forensic examiner
    • dd file
    • Logical target
    • Physical target
    • RAW image file
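
The point above that a forensic image is verifiable can be illustrated with a hash comparison. The following is a minimal sketch, assuming a RAW/dd image whose hash was logged at acquisition; the function names `hash_file` and `verify_image` are hypothetical and are not part of any forensic tool. Container formats such as .E01 typically carry their own verification data and would normally be checked with the acquiring tool instead.

```python
import hashlib
from pathlib import Path


def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        # Read in chunks so a multi-gigabyte image never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(image_path, acquisition_hash, algorithm="sha256"):
    """Compare a fresh hash of the image with the hash logged at acquisition."""
    # Normalize the logged value so case or trailing whitespace does not matter.
    return hash_file(image_path, algorithm) == acquisition_hash.strip().lower()
```

A matching hash suggests the image file has not changed since the hash was recorded; a mismatch indicates that the file and the logged value need to be investigated before the image is relied upon.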

2. Custom Content/Targeted Image

  • Definition: The resulting dataset from a process of collecting active documents and folders from a computer’s file system. The dataset is an exact copy of the source and intended to be used for evidentiary purposes. Creating such an image is intended to preserve the integrity of the source.
  • Also known as: File copy, logical evidence file or logical copy, forensic duplicate…
  • When to use: When collecting from a trusted source and there is no suspicion of data deletion by any user or system process that has access to the system. When only specific files need to be collected. When collection of specific files is sufficient and complete metadata is also required. May be used when both parties to a matter agree on specifics of time, date and/or subject matter and there is no allegation that data deletion has already taken place.
  • How to use: Performed using specific Information Technology (IT) software tools that work in conjunction with the computer’s file system to transfer the files in question to an external container while preserving the metadata. Different software solutions use various technologies to initiate the process; sometimes applications or services (i.e., “agents”) are installed on the target machine, with or without the user’s/custodian’s knowledge, while others run only in memory after the custodian downloads the application via a web browser or starts it from a preconfigured portable USB drive.
  • Pros:
    • Fast collection.
    • Targets only necessary files.
    • Can be run on a live machine or file server without interrupting service to the users.
    • Decreased storage and processing costs over the use of a forensic image.
    • Potential for decreased storage and processing costs if parties can agree on the appropriate data subset, i.e., a more “reasonable” approach for the majority of e-discovery cases.
    • Most IT departments have basic tools on hand; more comprehensive tools that include data analysis are readily available.
    • New technology conveniently enables the custodian to run the collection based on criteria set by an authorized user (e.g., Legal).
  • Cons:
    • Only collects the specified files; no analysis or recovery of deleted items or slack space.
    • System must be powered on and either available for direct access or available via an Ethernet connection to a Local Area Network.
    • The collector must have appropriate permissions, i.e. read-write access to the computer file system.
    • Some tools must be properly configured to preserve metadata.
    • Collector needs to be trained in how to select appropriate tools and how to execute collection processes.
    • Collector may need to testify in court.
  • Glossary
    • Live machine
    • Logical evidence file
    • Metadata
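
The idea of transferring selected files to an external location while preserving metadata can be sketched as follows. This is an illustrative sketch only, not a substitute for a purpose-built collection tool: the function names, the manifest columns and the CSV manifest format are assumptions made for the example. It relies on Python’s `shutil.copy2`, which copies file contents and attempts to carry over timestamps where the platform allows. Note that simply reading a source file may update its last-accessed time on some systems.

```python
import csv
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical manifest layout chosen for this example.
MANIFEST_FIELDS = ["relative_path", "size_bytes", "modified_utc",
                   "source_sha256", "copy_verified"]


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_files(source_root, relative_paths, dest_root, manifest_path):
    """Copy the listed files, keeping folder structure, and log each one."""
    source_root, dest_root = Path(source_root), Path(dest_root)
    rows = []
    for rel in relative_paths:
        src, dst = source_root / rel, dest_root / rel
        info = src.stat()              # capture source metadata before copying
        source_hash = sha256_of(src)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)         # carries timestamps where the platform allows
        rows.append({
            "relative_path": str(rel),
            "size_bytes": info.st_size,
            "modified_utc": datetime.fromtimestamp(
                info.st_mtime, tz=timezone.utc).isoformat(),
            "source_sha256": source_hash,
            # Re-hash the copy to confirm it matches the source.
            "copy_verified": sha256_of(dst) == source_hash,
        })
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=MANIFEST_FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

The manifest gives the collector a record of what was collected and when, which may help if the collector later needs to testify about the process.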

3. Non-Forensic Copy

  • Definition: Creating a copy of a file using the operating system’s own file-copying provisions, such as the command-line commands “cp” in UNIX and “copy” in MS-DOS. Operating systems with a graphical user interface, or GUI, usually provide copy/paste or drag-and-drop methods of file copying.
  • Also known as: UNIX copy, LINUX copy, Windows copy, drag-and-drop, copy/paste.
  • When to use: When preservation of metadata is not required, usually by agreement between parties. This method can be helpful when there is a small quantity of relevant documents in a larger volume and/or when tools aren’t available for logical collection or dedicated collection personnel are not available.
  • How to use: The relevant files are selected by the custodian and then copied using the appropriate command or action to a new location. It is critical to document the process and have specific instructions. Best practices suggest that for graphical system users (Windows users) the Copy/Paste function should be used as opposed to “drag-and-drop” to ensure a copy is made rather than accidentally moving the selected files and thereby deleting them from the source drive.
  • Pros:
    • This is similar to what was done in the paper world in that it relies on users to identify the location of relevant information and gather that information without supervision.
    • Typically results in a smaller collection set reducing processing and review costs.
    • May eliminate highly sensitive irrelevant information from collection.
    • Will likely include information missed by keyword search terms due to the custodian’s knowledge of local practice and usage.
  • Cons:
    • May alter metadata and lose original folder structure.
    • Requires rigorous processes and documentation.
    • May not be easy to reproduce.
    • Defensibility may be compromised if mistakes are made during the collection process.
    • May require re-collection of files in a forensically sound manner if authentication of file(s) is likely to be questioned.
    • May require notification to opposing counsel that metadata may be altered during collection.
    • Highly dependent on custodians to locate and copy data.
    • Using filtering (including keywords) during collection may result in missed files.
  • Glossary
    • Copy/paste
    • Drag-and-drop
    • Graphical user interface
    • Metadata
    • MS-DOS
    • Self collection
    • UNIX
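
The risk that a non-forensic copy “may alter metadata” can be shown with a small experiment. The sketch below is an illustration under stated assumptions: Python’s `shutil.copyfile` stands in for a plain copy that transfers contents only, and `shutil.copy2` stands in for a copy that attempts to preserve timestamps. The function name `compare_copy_methods` is hypothetical, and the exact behavior of any given operating system’s copy command should be confirmed on that system.

```python
import os
import shutil


def compare_copy_methods(source, plain_copy, preserving_copy):
    """Copy `source` two ways and report each file's modification time."""
    shutil.copyfile(source, plain_copy)       # contents only; the copy gets a new timestamp
    shutil.copy2(source, preserving_copy)     # contents plus timestamps where supported
    return {
        "source": os.stat(source).st_mtime,
        "plain_copy": os.stat(plain_copy).st_mtime,
        "preserving_copy": os.stat(preserving_copy).st_mtime,
    }
```

If the plain copy reports the time of the copy rather than the source file’s modification time, the date evidence for that file has been lost in the copy, which is why this method suits cases where the parties have agreed that metadata preservation is not required.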

4. Exports – Harvesting Email

4.1. Back-End: Server or Archive Solution

  • Definition: An automated process supported by various email platforms to export content for mailbox accounts from a server.
  • Also known as: Export, ExMerge, replicate, extract (PST, NSF, mbox, DBX).
  • When to use: Whenever you have access to a server along with a dedicated internal client IT resource to execute the process.
  • How to use: In collaboration with an appropriate internal IT resource, an automated process is staged and executed to send relevant data from the application server to an external location. This process usually leverages internal services within the application from which the data is being retrieved and is therefore application specific.
  • Pros:
    • Complete – entire mailbox account or identified folder locations.
    • Effective automated method to create a preservation data set when scope is accurately identified by location.
    • Reduced collection cost by leveraging internal client resources.
    • Export utilities provided by the authors of the application software reduce risk of altering metadata.
    • Speeds collection in a defensible manner when a large number of custodians are at issue.
    • More defensible than some forms of individual custodian collection.
  • Cons:
    • Potential for over preservation.
    • Potential increase in culling costs.
  • Glossary
    • DBX
    • mbox
    • PST
    • NSF

4.2. Front-End: Local Email Client

  • Definition: An automated or manual process supported by various email platforms to export content for mailbox accounts from the email client.
  • Also known as: Export, copy, capture (in MSG, EML, PST, HTM formats).
  • When to use: During desk-side collection that identifies content through an interview process.
  • How to use: Use email client export utilities or drag-and-drop directly from the email client into a Windows folder.
  • Pros:
    • Targeted.
    • Email client export utilities reduce risk of altering metadata.
  • Cons:
    • Incomplete.
    • Lack of automated audit trail with drag-and-drop into Windows.
    • Drag-and-drop of email increases risk of altering metadata.
  • Glossary
    • drag-and-drop
    • EML
    • Metadata
    • MSG
    • PST

4.3. Web-Based: Configure Local Email Client

  • Definition: An automated process to download webmail content by configuring a local email client.
  • Also known as: Download, synchronize, create local mail store, configure local client (PST, mbox containers).
  • When to use: When user credentials are available and access can be gained through an internet protocol.
  • How to use: Use internet protocol and email client configuration settings.
  • Pros:
    • Complete – should retrieve entire mailbox account.
    • Effective preservation data set created when scope extends to entire mailbox.
    • Reduced collection cost by leveraging internal client resources.
    • Defensible.
    • Export utilities reduce risk of altering metadata.
  • Cons:
    • Will require an expert in many cases to ensure a complete and accurate collection.
    • Lack of standards in web mail clients and configurations makes this complex.
    • Potential for over preservation.
    • Potential for increased culling costs.
    • Difficulty accessing this information.
    • Ability of provider to work with you.
    • Lack of direct access and administrative control can make these collections much more difficult.
    • Difficult to verify complete collection.
  • Glossary
    • mbox
    • Metadata
    • PST
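
Because it can be difficult to verify that a webmail collection is complete, one modest check is to inventory the resulting local mail store and compare it with what the webmail account itself reports. The following is a minimal sketch, assuming the local client produced an mbox container; the function name `summarize_mbox` and the choice of summary fields are assumptions for illustration. It uses Python’s standard `mailbox` module and does not handle PST files.

```python
import mailbox
from datetime import timezone
from email.utils import parsedate_to_datetime


def summarize_mbox(path):
    """Count the messages in an mbox file and find its earliest/latest dates."""
    box = mailbox.mbox(path, create=False)  # do not create the file if missing
    count = 0
    dates = []
    try:
        for message in box:
            count += 1
            raw_date = message.get("Date")
            if not raw_date:
                continue
            try:
                parsed = parsedate_to_datetime(raw_date)
            except (TypeError, ValueError):
                continue  # skip unparseable dates but still count the message
            if parsed.tzinfo is None:
                # Treat dates without an offset as UTC so they can be compared.
                parsed = parsed.replace(tzinfo=timezone.utc)
            dates.append(parsed)
    finally:
        box.close()
    return {
        "message_count": count,
        "earliest": min(dates) if dates else None,
        "latest": max(dates) if dates else None,
    }
```

A message count or date range that differs from what the webmail interface shows is a signal to look more closely; a match is supporting evidence of completeness, not proof of it.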

5. Exports – Non-Email

  • Definition: The use of utilities that are included in an application or application suite that enable the export of records to an external location. These utilities are usually provided to enable reuse of data from one application to another or to provide backup and recovery of key data.
  • Also known as: Exports from SharePoint 2013, Enterprise Systems, Databases, Social Media, Instant Messaging.
  • When to use: When collecting structured data.
  • How to use: Work in conjunction with a dedicated IT representative to utilize application utilities to export the relevant information. A database administrator (DBA) or system administrator (Sys Admin) is the appropriate person to engage. He or she will understand the export capabilities of the software product in question and will assist in developing a strategy for retrieving the information that has been requested for the matter in question. An export migration plan will be prepared and approved by the legal representatives. That migration plan will result in an automated process being scheduled and run by the IT department to copy the relevant data to an external container. That information then needs to be re-imported into either a standalone tool for analysis or a new “clean” copy of the application on another server to replicate the information that has been requested.
  • Pros:
    • Export utilities potentially eliminate the risk of accidentally altering metadata.
    • The only way to retrieve structured data from key systems so that it may be used elsewhere.
    • Avoids having to give opponents access to sensitive corporate operations and systems.
    • Avoids protracted arguments on running reports against databases and the incongruities those reports sometimes seem to expose.
  • Cons:
    • Requires knowledge of how data is stored and what is included/excluded in export.
    • Can only be executed by database administrators and system administrators within the IT department.
    • Exports may need to be scheduled for nights, weekends or holidays when the software system is not in use by the business.
    • Once data has been exported, it may be very expensive to rebuild a copy of the original system or reformat the data so that it may be read and understood.
  • Glossary
    • Database administrator
    • Metadata
    • Structured data
    • System administrator
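
The general shape of a scoped export of structured data to an external container can be sketched as below. This is an illustration only: SQLite and CSV stand in for whatever database and export format the real system uses, the function name `export_query_to_csv` is hypothetical, and in practice the application’s own export utilities, run by the DBA or system administrator, are the approach described above.

```python
import csv
import sqlite3


def export_query_to_csv(connection, query, params, csv_path):
    """Write the rows returned by a parameterized query to a CSV file."""
    cursor = connection.execute(query, params)
    # Column names come from the query itself, so the header matches the data.
    headers = [column[0] for column in cursor.description]
    rows_written = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(headers)
        for row in cursor:
            writer.writerow(row)
            rows_written += 1
    # The count can be reconciled against an independent COUNT(*) query.
    return rows_written
```

Recording the exact query, its parameters and the returned row count alongside the export makes it easier to explain later what was included and excluded.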

6. Exceptions

As technology continues to evolve so do best practices. At this time, the EDRM collection standards do not address the technologies below. It is recommended that you consult an expert when collecting these data sources. As best practices evolve in the industry the EDRM will update these standards to include these technologies.

6.1. Mobile Devices

Reason why not addressed here: There are different manufacturers and brands, with different carriers, different plans, different operating systems/settings, different applications and different tools. Collection therefore requires different collection software and involves merging images, working with cell phone service providers, handling SMS, and combining many different pieces.

Specific Challenges:

  • Cell phones can be wiped by the cell provider, owner or company (e.g., Find My iPhone by Apple).
  • Devices may need to be placed in a Faraday bag or stored in a shielded room.
  • The lock code may be necessary to access the phone.
  • Make sure you seize the power cables.
  • Keeping pace with new technology. New phones are released almost on a weekly basis. Tools cannot keep pace.
  • Data can be stored on the network.
  • Device access can vary from device to device (even the same model) based on carrier, plan, operating system version, etc.

6.2. Instant Messaging

Reason why not addressed here: Each tool is proprietary software that is often different from anything else.

Specific Challenges:

  • Chat logging is generally a feature that is not set to “on,” even though users can manually save chats.
  • Logs are generally saved to a default location, whereas manual saves can be anywhere.
  • There are also tools such as Off-the-Record (OTR) messaging that can affect whether messages are stored or even wiped from the computer.
  • Data, videos and images can be sent via the IM tool.
  • Images and video, if stored, are frequently kept in a separate location and can be very difficult to recombine with the original message.
  • Various tools have no formal registration process, leaving room for anonymity.
  • Chat messages can be forwarded to cell phones.

6.3. Macs (Macintosh)

Reason why not addressed here: While not as difficult as they once were, many Macs cannot be disassembled and can only be acquired with the hard drive in place. Most of the common industry-standard tools still do not handle Mac systems or images effectively.

6.4. International Protocols

Reason why not addressed here: Collection may cross into privacy standards, and the rest of the world is much stricter than the US. Each country has its own standards and processes, typically applied on a country-by-country basis, and these change continually.

Specific Challenges:

  • Many countries place privacy of the individual before all else. In addition, many European countries require an intermediary such as a Data Protection Officer (Germany) who can institute their own requirements.
  • For many businesses, sending data in or out of the country can implicate national security concerns, requiring compliance with additional domestic regulations.

6.5. Social Media (or Other Cloud Storage)

Reason why not addressed here: Platforms use different software, different firewalls and protections against hacking, viruses and spam, and different settings and protocols across the board.

Specific Challenges:

  • Difficulty accessing this information.
  • Ability or willingness of provider to work with you.
  • Lack of direct access and administrative control can make these collections much more difficult.
  • Content may include embedded audio and video.
  • Various platforms frequently link to and interact with each other (e.g., a Facebook post that links to a YouTube video).
  • Cannot be effectively collected using traditional tools.

Contributors

  • Julie Brown, Vorys (project lead)
  • Patrick Chavez
  • Teri Christensen, Faegre Baker Daniels
  • Kevin Clark
  • Justin Coffey
  • Sean d’Albertis, Faegre Baker Daniels
  • Kevin Esposito
  • Faisal Habib, AccessData Group
  • Valerie Lloyd, Excel Energy
  • Jeremy Montz, kCura
  • Rick Nalle, KPMG
  • Andrea Donovan Napp, Robinson & Cole
  • John Wilson