EDRM offers the following data sets.
EDRM File Format Data Set
EDRM_Data-Set_File-Formats_1-0.zip -- 17.6 MB
Contents:
- Readme.txt
- EDRM_Data-Set_File-Formats_1-0_Manifest.xls
- “data-set” folder containing 158 folders with 381 files
The EDRM File Format Data Set consists of 381 files covering 200 file formats.
The file types include:
Adobe Photoshop | Mac WordPerfect | PFS: Plan |
Ami Draw | Mac Works | PostScript |
Corel Draw | MacPaint | Q&A Database |
Corel Presentations | MacWrite | Q&A Write |
dBASE | Micrografx Designer | Quattro Pro |
First Choice DB, SS, WP | Microsoft Access | Reflex |
Freelance | Microsoft Excel | Smart Spreadsheet |
Harvard Graphics | Microsoft PowerPoint | SmartWare II |
Gem File | Microsoft Project | StarOffice Calc |
Gem Image | Microsoft PST | StarOffice Impress |
IBM DCA/RFT | Microsoft Visio | StarOffice Writer |
IBM DisplayWrite | Microsoft Win Metafile | SuperCalc |
IBM Graphics Data Format | Microsoft Word | Symphony |
IBM Picture Interchange | Microsoft Works | Targa |
IBM Writing Assistant | MultiMate | Total Word |
IGES Drawing | Multipage | vCard |
Kodak Photo CD | Multiplan | Volkswriter |
Lotus 1-2-3 | OfficeWriter | VP Planner |
Lotus Manuscript | Paintbrush | Wang IWP |
Lotus PIC | Paint Shop Pro | WordPerfect |
Lotus Screen Snapshot | Paradox | WordStar |
Mac PowerPoint | PerfectWorks for Windows | XyWrite |
Mac Word | | |
Download EDRM File Formats Data Set 1.0.1: EDRM_Data-Set_File-Formats_1-0-1.zip -- 17.58 MB
EDRM Internationalization Data Set
Download EDRM Internationalization Data Set: EDRM_Data-Set_I18N_1-0.zip -- 176.49 MB
The EDRM Internationalization Data Set (18.4 MB) is a snapshot of selected Ubuntu localization mailing list archives, comprising 724 MB of email in 24 languages.
The languages are:
Arabic | Catalan | Chinese |
Danish | Dutch | English |
Finnish | French | German |
Greek | Hebrew | Hungarian |
Italian | Japanese | Korean |
Norwegian | Polish | Portuguese |
Romanian | Russian | Spanish |
Swedish | Tamil | Turkish |
EDRM Micro Dataset
EDRM offers a “Micro Dataset” designed for eDiscovery testing and process validation. Software vendors, litigation support organizations, law firms and others may use these smaller sets to qualify support, test speed and accuracy in indexing and search, and conduct more forensically oriented analytics exercises throughout the eDiscovery workflow.
The EDRM community thanks these members for their active participation in this important initiative:
- Eric Robi
- Michael Lappin
- Chad Main
- Henry Moreno
The EDRM Micro Dataset is a zip file of approximately 136.9 MB containing recent versions of a wide range of file types, from Microsoft Office and Adobe Acrobat files to image files. The EDRM Dataset group searched the internet for usable, freely available data from universities, government sites and other sources, and included a selection of it in the zip file.
Download EDRM Public Micro Dataset: reel1hl3ukrr9d3xedmv1mes2vn66bto.zip -- 9.00 B
The full dataset is sourced from publicly available data and free from copyright restrictions. It was assembled by the Digital Forensics Research Laboratories at the Auckland University of Technology, in collaboration with the EDRM Dataset team.
The EDRM Micro Dataset is valued for its wide variety of file types and other challenges characteristic of ESI collected in discovery cases. The files exhibit varying levels of corruption, and the dataset includes an encrypted duplicate set of files to support exception-handling exercises and advanced testing.
The EDRM Micro Dataset mix of file types includes:
- A variety of .csv files
- Websites and web pages
- Adobe Acrobat files
- Graphic files and photographs
- Public census data
- Microsoft Office files
- Audio files
- Four mailboxes with shared correspondence, threads and attachments
- Multiple EnCase .e01 files containing data from a phone and another data source
The Dataset team includes:
- Eric Robi, president, Elluma Discovery
- Michael Lappin, director, Technology and Sales Engineering, Nuix
- Chad Main, founder, Percipient
- Henry Moreno, eDiscovery manager, Dell Inc.
- Brian Cusack, director, AUT Digital Forensic Research Laboratories, and professor, ECU Security Research Center, Auckland University of Technology
EDRM ESI Checklist
EDRM_ESI_Checklist_Version_1.00e.pdf -- 1.49 MB
Note: This PDF file can only be opened with the free Adobe Acrobat Reader. Third-party PDF viewers, including those built into web browsers, may display an error when attempting to open this file.
Other Data Sets
The EDRM Data Sets project is looking for very large data sets containing a variety of data types. Email info@edrm.net to join the project or to suggest a suitable data set.