EDRM Evergreen/Processing/Practices to Consider
Audit and Chain of Custody
It is often appropriate for the collection team to be trained in computer forensics to ensure that the collection process follows forensic protocols, that all data collected is properly preserved, and that no harm is done to the computer. The level of training will depend upon the complexity of the collection and the computer systems. The trend is toward automation of the entire collection process in order to avoid collection errors and chain of custody problems.
Have three teams or subsets within each electronic discovery provider group. The first team consists of the forensic investigators. It is their job to collect the evidence and document that process. The second team is in charge of logging, inventorying and safeguarding the evidence. The third team is in charge of copying the original data, fingerprinting it (via MD5 hashing) and analyzing the data. While these teams may overlap, it is generally a best practice to keep the second team small and differentiated from the other two teams. This is important because the task is very different and calls for a different skill set. The logging and inventory personnel need to be among the most organized in the organization. The logging and inventory processes are very likely to be subject to the most challenge in litigation.
When the evidence (computer or media) is physically collected, document the collection by having the collector sign a form indicating: a) the date; b) the time; c) the name of the person(s) from whom the evidence was collected; and d) a description of the item(s) collected, including unique identifiers (manufacturer name and serial number if possible, and at least the manufacturer name and model number when the serial number is not apparent). A copy of the form should be provided as a receipt to the person/company from whom the evidence was collected.
Important note: If the evidence is shipped to the electronic discovery provider, it should only be shipped via a carrier that provides excellent shipping and tracking documentation, insurance and high reliability. (For these reasons, we generally ship via companies such as FedEx or UPS.) Bonded point-to-point couriers can also be used, depending on security needs and cost.
Note that if a trained forensic investigator collects the evidence, he or she should complete a lengthier form which also includes the address of the premises and lists the names and versions of any hardware or software tools used to make the collection. This form should also provide space for notes to capture the kinds of details that would help the investigator recall the events surrounding the collection should he or she ever need to testify. (Providing this longer form to persons not trained in forensic investigation may cause confusion.)
As soon after the collection as is practical, the electronic discovery provider needs to take physical custody of the evidence. Following the provider’s written procedures, the employee or employees responsible for logging the evidence should be given custody of it immediately. We recommend using a database to capture the log information. (While not yet a best practice, in the ideal electronic discovery environment this database log would be available to clients and other interested parties through a secure log-in via the Web.) The log headings should include at least the following:
- Electronic discovery identification and inventory number (we strongly recommend using a barcode labeling system)
- Date received
- Matter name
- Client name
- Client/matter number
- Name of person/company/shipper delivering evidence
- Description of item(s) (including manufacturer name, model number and unique identifier/serial number whenever possible)
- MD5 Hash of each piece of media where possible (electronic fingerprint)
- Name of person receiving evidence (Logged by)
- Check Out (check box—Yes/No)
- If “Yes”,
- Date
- Reason
- Custodian name
- Name of recipient (used when evidence is shipped from the electronic discovery provider to anyone)
- Name of shipper
- Shipper’s tracking number
- Date of shipment
- Date of receipt
- Check-in date
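As a rough illustration, the headings above could map onto a small database along the following lines. This is only a sketch, assuming SQLite through Python's standard `sqlite3` module. The table names, column names and helper functions (`open_log`, `log_receipt`, `check_out`, `check_in`, `is_checked_out`) are hypothetical choices made for this example and are not part of any EDRM specification.

```python
import sqlite3

# Hypothetical schema: the names mirror the headings listed above and are
# illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS evidence_log (
    inventory_number  TEXT PRIMARY KEY,  -- barcode / inventory identifier
    date_received     TEXT NOT NULL,     -- ISO 8601 date
    matter_name       TEXT NOT NULL,
    client_name       TEXT NOT NULL,
    client_matter_no  TEXT,
    delivered_by      TEXT,              -- person, company or shipper
    description       TEXT NOT NULL,     -- manufacturer, model, serial number
    md5_hash          TEXT,              -- NULL when the media cannot be hashed
    logged_by         TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS evidence_checkout (
    id                INTEGER PRIMARY KEY AUTOINCREMENT,
    inventory_number  TEXT NOT NULL REFERENCES evidence_log(inventory_number),
    checkout_date     TEXT NOT NULL,
    reason            TEXT NOT NULL,
    custodian_name    TEXT NOT NULL,
    recipient_name    TEXT,
    shipper_name      TEXT,
    tracking_number   TEXT,
    shipment_date     TEXT,
    receipt_date      TEXT,
    checkin_date      TEXT               -- NULL while the item is still out
);
"""


def open_log(path=":memory:"):
    """Open (or create) the evidence log database and ensure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_receipt(conn, record):
    """Insert one received item; `record` is a dict keyed by column name."""
    conn.execute(
        "INSERT INTO evidence_log (inventory_number, date_received, matter_name, "
        "client_name, client_matter_no, delivered_by, description, md5_hash, logged_by) "
        "VALUES (:inventory_number, :date_received, :matter_name, :client_name, "
        ":client_matter_no, :delivered_by, :description, :md5_hash, :logged_by)",
        record,
    )
    conn.commit()


def check_out(conn, inventory_number, date, reason, custodian_name):
    """Record that an item left the evidence store."""
    conn.execute(
        "INSERT INTO evidence_checkout "
        "(inventory_number, checkout_date, reason, custodian_name) "
        "VALUES (?, ?, ?, ?)",
        (inventory_number, date, reason, custodian_name),
    )
    conn.commit()


def check_in(conn, inventory_number, date):
    """Close any open check-out for the item."""
    conn.execute(
        "UPDATE evidence_checkout SET checkin_date = ? "
        "WHERE inventory_number = ? AND checkin_date IS NULL",
        (date, inventory_number),
    )
    conn.commit()


def is_checked_out(conn, inventory_number):
    """Return True if the item has a check-out with no check-in date."""
    row = conn.execute(
        "SELECT COUNT(*) FROM evidence_checkout "
        "WHERE inventory_number = ? AND checkin_date IS NULL",
        (inventory_number,),
    ).fetchone()
    return row[0] > 0
```

Keeping check-outs in a separate table means every movement of an item leaves its own row, so the history of an item is never overwritten by a later check-in.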
Whenever the original evidence is accessed, it should only be available to the small team in charge of logging and securing the evidence. Any activities involving the original evidence should be logged. After logging, a communication should be sent to the owner of the evidence confirming its receipt.
Copying, Fingerprinting and Analyzing Original Data
As soon as practical after logging, inventorying and safeguarding the data, the original evidence should be forensically copied using a copying tool that does not change the data in any way. Note that many forms of duplication do change the data. Even booting a computer or hard drive with its usual operating system will change the data. It is important to use only software and hardware tools that are certified for non-intrusive duplication. It is also important that these tools be operated only by persons who have been trained to operate them.
As soon as possible after collection, the evidence should be handed off to a subset of the electronic discovery provider’s team that is charged with logging and safeguarding the evidence. In a large organization, this team should be an entirely separate set of personnel from the collection and analysis teams. If the evidence is shipped to the electronic discovery provider, it should only be shipped via a carrier that provides excellent shipping and tracking documentation, insurance and high reliability. (For these reasons, we generally ship via nationally recognized package shipping companies.)
As soon as possible after collection, an MD5 hash[1](http://edrm.net/index.php/Processing_-_Audit_and_Chain_of_Custody#endnote_note1) should be obtained from the media using a non-destructive hashing tool. This hash acts as an electronic fingerprint that allows others to verify that the original evidence was not altered from that point forward and that duplicates of the media are truly identical. This is especially important in a forensic media collection. Not only can electronic discovery tools analyze files, they can also locate files and file fragments that were deleted or that reside in unallocated hard disk space. The MD5 hash of the entire media helps prove the authenticity of the original media and its copies, and lays the foundation for claims related to information found on the media that is not in the traditional file structure.
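To make the fingerprinting idea concrete, the sketch below computes an MD5 digest of a file (for example, a forensic image file) in fixed-size chunks and compares an original with its duplicate. It uses Python's standard `hashlib` module. The function names `md5_of_file` and `duplicates_match` are hypothetical, and the sketch is not a substitute for certified forensic duplication tools: it assumes the data is already being read through a mechanism that cannot write to the source.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of the file at `path`, read in chunks.

    The file is opened read-only in binary mode, so large images can be
    hashed without loading them into memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicates_match(original_path, duplicate_path):
    """Return True when both files produce the same MD5 digest."""
    return md5_of_file(original_path) == md5_of_file(duplicate_path)
```

The same pattern works with other algorithms in `hashlib`, such as `hashlib.sha256`, if a matter calls for a different hash.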
The MD5 hash of the original media should be compared to the MD5 hash of the duplicate media. This step is documented by most media copying hardware. The receipt document generated by the hardware should be kept with the original evidence, and a copy should be kept with the duplicate media.
The electronic discovery process is subject to less criticism if, where possible, the original evidence is preserved and not put back into production. If the owner requires the immediate return of the evidence, this transaction should be documented in writing with a cover letter and the evidence shipped using a highly reliable carrier. The tracking number should be contained in the cover letter, and the return process should be detailed in the logging database.
After the forensic duplicate is made, the original should be tagged with a summary of the logged information, with an identifier that ties it back to the evidence log, or both. (We recommend using a combination of written tagging on the evidence bag and a bar code.) The evidence should be placed in a sealed bag if possible. The evidence should then be secured in an environment with limited access and safeguarded from foreseeable mishaps such as fire or the accidental activation of fire sprinklers.
As part of the forensic duplication process, it is a best practice to create an MD5 hash of every file, including every .pst file. When .pst or other compilation-type files are separated into messages or smaller segments, an MD5 hash should be created for every message or segment. Again, this is a way to confirm that future duplicates have not been altered, and it is the primary way that native-file productions can be tracked.
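The per-file and per-segment hashing described above might be sketched as follows. This is an illustration only: `build_manifest` and `hash_segments` are hypothetical names, and the sketch assumes that some other tool has already extracted the messages or segments of a compilation file as raw bytes, since Python's standard library does not read .pst files.

```python
import hashlib
import os


def build_manifest(root, chunk_size=1024 * 1024):
    """Map each file's path (relative to `root`) to its MD5 hex digest."""
    manifest = {}
    for folder, _subfolders, filenames in os.walk(root):
        for name in sorted(filenames):
            full_path = os.path.join(folder, name)
            digest = hashlib.md5()
            with open(full_path, "rb") as handle:
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full_path, root)] = digest.hexdigest()
    return manifest


def hash_segments(container_id, segments):
    """Hash each extracted message or segment of a compilation file.

    `segments` is an iterable of bytes objects. Keys take the illustrative
    form "<container_id>#<position>", with positions starting at 1.
    """
    return {
        f"{container_id}#{position}": hashlib.md5(segment).hexdigest()
        for position, segment in enumerate(segments, start=1)
    }
```

Storing the resulting dictionaries alongside the evidence log gives later steps something fixed to compare against.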
When a matter closes, the original evidence should be either returned to the client/original owner of the information or stored along with the paper portions of the file, subject to the electronic discovery provider’s data retention policy. The lawyers directing your work should be consulted regarding the disposition of working copies of data. This includes all duplicates and analysis sets, including those on the network. Generally, the attorneys will direct you to deal with this data in one of three ways: place the duplicates and working data into the applicable data retention scheme for the rest of the file materials, offer the data to the client/owner, or destroy the data. No matter the disposition, this process should be documented in the logging database.
Unfortunately, most of us do not have access to a software tool or suite that can perform all of the forensic collection and analysis. The standard today is to use a collection of individual tools. It is a best practice to re-hash or re-fingerprint a sample of your data files every time files are put into a new tool or environment to confirm that the files remain identical.
Finally, in native-format productions where data is exchanged instead of paper or .tif images, it is important to perform one final MD5 hash of all data files produced (as well as one of all data files received). These MD5 hashes are sometimes used as a modern equivalent of a Bates stamp. The indexes of these hashes will be crucial in determining where a given file came from.
Every electronic discovery provider should audit its own procedures and logging methods once a year and consider augmenting them. Providers should consider having an annual audit conducted by an outside auditor from an IT consultancy, a large accounting firm or a non-competitor electronic discovery provider. The audit will have three benefits. First, it will confirm whether or not your current procedures are being followed. Second, it will be an opportunity to carefully consider your procedures and whether they should be revised. Third, a positive audit report can be a powerful sales tool. Note also that some clients are requiring audit reports before retaining electronic discovery vendors. Finally, the audit process also provides the opportunity to review procedures before they are questioned by opposing counsel in litigation.
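The practice of re-hashing a sample of files each time data moves into a new tool or environment, described earlier in this section, could be sketched like this. The function `verify_sample` and its `read_file` callback are hypothetical: `read_file` stands in for whatever mechanism retrieves a file's bytes from the new environment, and `manifest` is assumed to be a mapping of file name to the MD5 digest recorded earlier.

```python
import hashlib
import random


def verify_sample(manifest, read_file, sample_size, seed=None):
    """Re-hash a random sample of files and report any that changed.

    manifest    -- dict mapping file name to previously recorded MD5 hex digest
    read_file   -- callable taking a file name and returning its bytes
    sample_size -- number of files to check (capped at the manifest size)
    seed        -- optional seed so that a sample can be reproduced for audit

    Returns (checked_names, mismatched_names).
    """
    rng = random.Random(seed)
    names = sorted(manifest)
    checked = rng.sample(names, min(sample_size, len(names)))
    mismatched = [
        name
        for name in checked
        if hashlib.md5(read_file(name)).hexdigest() != manifest[name]
    ]
    return checked, mismatched
```

Recording the seed and the list of checked names in the log makes the sample itself part of the audit trail.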
Cost Drivers
The cost of processing documents for electronic discovery can be tremendous. Determining which party should shoulder this cost has been a critical driver of several court decisions in which the cost of producing the electronic evidence was shifted from the producing party to the demanding party. Data that resides in antiquated systems, and that was originally created and stored in multiple media formats and file structures, is costly to recover and to protect for discovery. The issues considered in these situations have been driven by the sheer volume of this data and the technology required to recover it. It is important, therefore, to understand the elements that drive the cost of processing this data and any current electronic data.
The volume and composition of the dataset is a driving element in the cost of electronic discovery. As complex as identifying and collecting the data may be, the complexity of processing electronic data imposes unique requirements on how the data must be handled. Converting and indexing the data into a common, searchable, usable format can be a labor-intensive, specialized activity.
Supporting the conversion are the software tools and service organizations that must be used to convert the data from numerous disparate formats to a common output that can be used for review. These tools run the full range from simple TIFF printers to enterprise-wide integrated products capable of processing vast collections of source files in short periods of time. The cost of these tools and services depends on the volume and rate of conversions to be performed and the time within which the data must be delivered.
Software tools and services also require infrastructure. The infrastructure may be as simple as a desktop PC or it might include networked racks of servers and roomfuls of storage arrays. Support software, such as SQL databases and native software applications, contributes to the cost of infrastructure. Therefore, the capital investment in either software tools or the infrastructure on which services are deployed drives cost. This is clearly a situation where one size does not fit all, and the need to select the correct tools, services and support environment to match the size of the discovery becomes critical. Service providers may be able to minimize costs by spreading the cost of the above elements across multiple cases.
Specialized tools and environments require a qualified and trained staff. The nature of electronic discovery requires a higher skill level than does paper document capture. Scan operators need to be replaced by IT personnel who understand and know how to operate the tools and support the environments mandated by the technology involved.
Because of the wide variability in the capability of service providers, multiple pricing schemes are in use, and there is no consistent method of pricing. Some vendors still charge by the page, some by the file count and others by the volume of data in gigabytes. For budgeting purposes, volume pricing is the most reliable since you typically know how much data you are dealing with for your case. However, you should be aware that in some instances per-page or per-file pricing can be cheaper.
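Because the three pricing models can rank differently from one dataset to the next, it can help to estimate all three side by side. The sketch below does only that arithmetic. The function names are hypothetical, and any rates passed in are whatever a given vendor quotes; no real-world rates are implied.

```python
def estimate_costs(pages, files, gigabytes, rate_per_page, rate_per_file, rate_per_gb):
    """Return the estimated cost of one dataset under each pricing model."""
    return {
        "per_page": pages * rate_per_page,
        "per_file": files * rate_per_file,
        "per_gigabyte": gigabytes * rate_per_gb,
    }


def cheapest_model(costs):
    """Return the name of the pricing model with the lowest estimated cost."""
    return min(costs, key=costs.get)
```

Page and file counts are usually estimates before processing, which is one reason the gigabyte figure tends to be the most dependable input for budgeting.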
Status Reports
Status reports provide information regarding how quickly data collections are progressing through the automated process. These reports can be categorized in a number of ways, most commonly by piece of media or by custodian. Status reports are a good way to track the progress of the dataset being processed so that, if anomalies arise that require a change to the delivery schedule, all the appropriate parties are informed as soon as possible. Status reports can also be used to refine cost estimates, because the number of files and/or the number of images resulting from the files can be tracked in real time.
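A status report grouped by custodian or by piece of media could be produced from per-file processing records along these lines. This is a sketch under assumptions: each record is taken to be a dictionary with hypothetical `custodian`, `media` and `status` fields, and the status values `"processed"` and `"exception"` are invented for the example.

```python
from collections import defaultdict


def status_report(records, group_by):
    """Summarize processing progress per custodian or per piece of media.

    records  -- iterable of dicts with at least the `group_by` key and "status"
    group_by -- field to group on, e.g. "custodian" or "media"
    """
    report = defaultdict(lambda: {"total": 0, "processed": 0, "exceptions": 0})
    for record in records:
        row = report[record[group_by]]
        row["total"] += 1
        if record["status"] == "processed":
            row["processed"] += 1
        elif record["status"] == "exception":
            row["exceptions"] += 1
    return dict(report)
```

Files that are neither processed nor flagged as exceptions are counted only in the total, so the difference shows what is still pending.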
Selection Considerations
Searching electronic data is a methodology employed during many phases of electronic discovery. Searching may be used to determine what data is collected, within a review tool to prioritize review workflow, and during the processing of electronic documents as a means to both cull and flag data prior to review. Proper understanding and implementation of searching technology and methodology can greatly affect the cost and time of any electronic discovery project, for both data processing and review.
Search Techniques
The search strategy can be determined by a number of factors. As a result, the appropriate search technique often varies from matter to matter. Below are a few examples of common search techniques.
Culling and Searching Considerations
Electronic discovery is often depicted as a linear process. In practice, a sound discovery process requires the flexibility to manage inputs from downstream activities (such as review) that impact upstream strategies and tactics (such as processing). Culling and searching occur throughout the discovery and review process and should be included as key components of the overall discovery strategy. Considerations for culling and searching include:
Sampling and Developing a Strategy
Gaining an initial understanding of the collection can be difficult because the discovery team very infrequently receives all of the data at the same time or has the capability (or resources) to process all of the data at once. A common tactic for gaining insight into the collection is sampling. To obtain a reliable sample, consider collecting and processing key identified custodians first. Key custodians typically provide the most fertile content for exploration. A solid sample set can provide the team with a mini-collection for testing theories, identifying new custodians and data sources, and developing an appropriate processing and review strategy.
Early Review Benefits Culling and Searching
In discovery, time is always a factor. The processing strategy—including culling and searching—should ensure that a review team starts reviewing documents as soon as possible. Often a key document is found in review that can impact the entire discovery process. Being able to start review during the processing phase not only saves valuable time, but also can help further refine the culling and search strategy. Early identification of key documents is very useful in focusing the search strategy and is often a good way of identifying other useful search terms and techniques.
Suppress, Don't Delete
During the culling phase, many documents will be removed from review consideration based on the criteria established by the review team and/or negotiated with the opposing counsel. When and if documents are removed, make sure that they are only suppressed from review and can still be accessed if needed later. Before culling begins, understand how culling decisions are tracked and how to audit the system(s).
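One way to picture suppression with an audit trail is a review set in which culling flips a flag on a document rather than removing it, and every decision is recorded. The classes and method names below (`Document`, `ReviewSet`, `suppress`, `restore`) are hypothetical and are meant only as a minimal sketch of the idea, not as a description of how any particular review tool works.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Document:
    doc_id: str
    custodian: str
    suppressed: bool = False  # culled documents are flagged, never removed


class ReviewSet:
    def __init__(self, documents):
        self._documents = {doc.doc_id: doc for doc in documents}
        self.audit_trail = []  # one entry per culling decision

    def _record(self, action, doc_id, reason, decided_by):
        self.audit_trail.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "doc_id": doc_id,
            "reason": reason,
            "decided_by": decided_by,
        })

    def suppress(self, doc_id, reason, decided_by):
        """Hide a document from review while keeping it in the set."""
        self._documents[doc_id].suppressed = True
        self._record("suppress", doc_id, reason, decided_by)

    def restore(self, doc_id, reason, decided_by):
        """Bring a previously suppressed document back into review."""
        self._documents[doc_id].suppressed = False
        self._record("restore", doc_id, reason, decided_by)

    def reviewable(self):
        """Documents currently visible to reviewers."""
        return [doc for doc in self._documents.values() if not doc.suppressed]

    def all_documents(self):
        """Every document, including suppressed ones."""
        return list(self._documents.values())
```

Because nothing is deleted, a change in culling criteria later in the matter only requires restoring documents, and the trail shows who made each decision and why.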
Measure Success
Large-scale document review can be an arduous process with frequently changing requirements. Help clients understand progress and success by putting metrics and reporting in place at the beginning of review. If and when a surprise occurs, this will help clients understand the related impact on time and expense. During the culling and searching phase, common metrics include average volume per custodian, culling rates, and relevancy rates.
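The metrics named above can be computed from a handful of counts. The definitions used in this sketch are assumptions made for illustration, since the text does not define them: culling rate is taken as documents culled divided by documents collected, relevancy rate as documents marked relevant divided by documents reviewed, and average volume per custodian as total volume divided by the number of custodians. The function names are hypothetical.

```python
def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def culling_metrics(volume_by_custodian, documents_collected, documents_culled,
                    documents_reviewed, documents_relevant):
    """Return the three culling and searching metrics as a dict.

    volume_by_custodian -- dict mapping custodian name to data volume
                           (any consistent unit, e.g. gigabytes)
    """
    total_volume = sum(volume_by_custodian.values())
    return {
        "average_volume_per_custodian": safe_ratio(total_volume, len(volume_by_custodian)),
        "culling_rate": safe_ratio(documents_culled, documents_collected),
        "relevancy_rate": safe_ratio(documents_relevant, documents_reviewed),
    }
```

Whatever definitions are chosen, agreeing on them with the client at the start of review keeps later reports comparable.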
Is It Defensible?
Finally, a strategy that is not defensible is not a strategy. Ensure that an accurate and automated audit trail is in place. In addition, make sure that the attorneys can explain the culling and search strategy in layman’s terms to clients, opposing counsel and a judge if necessary.
Footnotes
- [1] MD5 was developed by Professor Ronald L. Rivest of MIT. What it does, to quote the executive summary of RFC 1321, is:
[The MD5 algorithm] takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input. It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given prespecified target message digest. The MD5 algorithm is intended for digital signature applications, where a large file must be "compressed" in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem such as RSA.
In essence, MD5 is a way to verify data integrity, and is much more reliable than a simple checksum and many other commonly used methods.
[updated Jan. 29, 2008]

