April 27, 2012
Here is an interesting article, based on an empirical study of sampling in the e-discovery context:
The study was conducted by taking samples from six inactive e-discovery databases that had already been fully evaluated. By taking 10,000 samples from each database, the authors were able to measure the extent to which the actual sample results conformed to the confidence levels and margins of error predicted by statistical theory. The study concluded that simple random sampling, “when applied to eDiscovery data sets, produces results in line with accepted statistical principles.”
The study also recommends “the creation of protocols and standards for further incorporating [simple random sampling] methods into the eDiscovery workflow. This effort should also include standardized protocols for reporting on the sampling methods employed and the results obtained to ensure transparency in the process.”
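To make the study's resampling check concrete, here is a minimal sketch in Python. It is not the authors' code or data. The 50,000-document population, the 30% responsive rate, and the 95% confidence / ±5% margin targets are all assumptions chosen for illustration, and the function names are hypothetical. The idea is the same as in the study: draw many simple random samples from a collection whose true answer is already known, then count how often the sample estimate falls within the stated margin of error.

```python
import math
import random


def required_sample_size(margin, z=1.96, p=0.5):
    """Textbook sample size for estimating a proportion.

    Uses the worst-case p = 0.5 by default and ignores the
    finite-population correction, so it is slightly conservative.
    """
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def coverage_rate(population, sample_size, margin, trials, seed=0):
    """Fraction of simple random samples whose estimate lands within the margin."""
    rng = random.Random(seed)
    true_rate = sum(population) / len(population)
    hits = 0
    for _ in range(trials):
        # random.sample draws without replacement, i.e. a simple random sample.
        sample = rng.sample(population, sample_size)
        estimate = sum(sample) / sample_size
        if abs(estimate - true_rate) <= margin:
            hits += 1
    return hits / trials


# Hypothetical "fully reviewed" database: 1 = responsive, 0 = not responsive.
# The size and the 30% responsive rate are illustrative assumptions only.
population = [1] * 15_000 + [0] * 35_000

n = required_sample_size(margin=0.05)  # z = 1.96 corresponds to 95% confidence
rate = coverage_rate(population, n, margin=0.05, trials=10_000)
print(f"sample size {n}, observed coverage {rate:.3f} (nominal 0.95)")
```

If simple random sampling behaves as theory predicts, the observed coverage should be at or somewhat above the nominal 95%, since the sample size is computed for the worst case of a 50% responsive rate.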

