Data Set

EDRM Data Set Project

EDRM Enron Email Data Set v2

The EDRM Enron Email Data Set v2 consists of Enron e-mail messages and attachments in two sets of downloadable compressed files: XML and PST.

Files in each group are organized by custodian and listed alphabetically with compressed file sizes in parentheses. Materials for some custodians are spread across more than a single XML or PST file.
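
Because one custodian's material can span several archives, it can help to group downloaded files by custodian and part number. The sketch below is only an illustration: the naming pattern is inferred from file names quoted in the comments on this page (for example edrm-enron-v2_kaminski-v_xml_1of2.zip), not from any published EDRM specification, and `ArchiveInfo` and `parse_archive_name` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred from names quoted in the comments such as
# edrm-enron-v2_kaminski-v_xml_1of2.zip and edrm-enron-v2_rodrigue-r_pst.zip.
ARCHIVE_NAME = re.compile(
    r"^edrm-enron-v2_(?P<custodian>[a-z0-9-]+)_(?P<fmt>xml|pst)"
    r"(?:_(?P<part>\d+)of(?P<total>\d+))?\.zip$",
    re.IGNORECASE,
)


class ArchiveInfo(NamedTuple):
    custodian: str  # e.g. "kaminski-v"
    fmt: str        # "xml" or "pst"
    part: int       # 1 when the name carries no part suffix
    total: int      # 1 when the name carries no part suffix


def parse_archive_name(name: str) -> Optional[ArchiveInfo]:
    """Split an archive file name into its parts, or return None if it does not match."""
    match = ARCHIVE_NAME.match(name)
    if match is None:
        return None
    part = int(match.group("part") or 1)
    total = int(match.group("total") or 1)
    return ArchiveInfo(
        match.group("custodian").lower(), match.group("fmt").lower(), part, total
    )
```

Comparing each `part` against `total` for a custodian is one way to notice that a multi-part download is incomplete before unpacking anything.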

Select any combination of XML files or any combination of PST files to download.
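
When deciding how much to download, a quick time estimate can be useful. The sketch below uses the totals reported by a commenter on this page (37 GB for PST, 79 GB for XML, 116 GB combined) and that commenter's example rate of 50 KB/s. It assumes decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes); the sizes themselves are the commenter's figures, not official ones, and `download_days` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400


def download_days(size_gb: float, rate_kb_per_s: float) -> float:
    """Days needed to transfer size_gb gigabytes at rate_kb_per_s kilobytes per second."""
    size_bytes = size_gb * 1_000_000_000   # decimal gigabytes (assumption)
    rate_bytes_per_s = rate_kb_per_s * 1_000  # decimal kilobytes (assumption)
    return size_bytes / rate_bytes_per_s / SECONDS_PER_DAY


# Sizes as reported in the comments on this page.
for label, size_gb in [("PST", 37), ("XML", 79), ("all", 116)]:
    print(f"{label}: {download_days(size_gb, 50):.1f} days at 50 KB/s")
```

At 50 KB/s the full 116 GB works out to a little under 27 days, which agrees with the commenter's estimate of about a month.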

If you find these files useful, please consider joining EDRM!


XML Files

The XML files are organized by custodian. Each compressed file contains some combination of the following, depending on what is available for that custodian:

  • XML
  • EML with attachments
  • Native attachments
  • Text email bodies
  • Text email attachments

PLEASE NOTE: These files may contain viruses, as can be the case with any set of files collected during discovery.

Select any combination of EDRM Enron Email Data Set v2 XML files to download:

PST Files

PST files are organized by custodian.

PLEASE NOTE: These files may contain viruses, as can be the case with any set of files collected during discovery.

Select any combination of EDRM Enron Email Data Set v2 PST files to download:

12 comments to EDRM Enron Email Data Set v2

  • 1
    jg says:

    Thanks so much for putting this data out there.

  • 2
    SGS says:

    These types of datasets are very useful in our environment. Thanks for the intimation.

  • 3
    gaultz says:

    Thanks for these. (So these Enron emails are now under Creative Commons license??)

    Membering now…

    • 3.1
      George Socha says:

      We have released this collection under a Creative Commons Attribution 3.0 United States License. This collection is a reworking of the previously available data, with substantial effort required to transform it to the released version. Thank you.

  • 4
    paul says:

    Anyone have any advice for rewriting the domain enron.com to someother.com? We’d like to use the PST data in a testing environment, and would like to use routable email addresses. Thanks, Paul.

  • 5
    Greg says:

    Thanks for making all of these files available for download. FYI, the file edrm-enron-v2_rodrique-r_pst.zip is missing and gives an error.

    • 5.1
      George Socha says:

      Greg,

      Earlier this week the folks preparing the files notified me that the file you cite was improperly named. It was supposed to be “rodrigue” with a “G” and not “rodrique” with a Q. I made the appropriate changes. That file is posted and you should be able to download it from this page.

      Thanks,

      George

  • 6
    George Socha says:

    Please note that we have replaced two files and corrected the file name on the third. We replaced “edrm-enron-v2_reitmeyer-j_pst.zip” and “edrm-enron-v2_arnold-j_pst.zip”; neither was decompressing properly. We changed the name of “edrm-enron-V2-rodrique-r_pst.zip” to “edrm-enron-V2-rodrigue-r_pst.zip”.

  • 7
    William Webber says:

    Thanks for making this data available. Would it be possible to calculate and publish md5sums of these files, so that users can verify download integrity?

  • 8
    B Jano says:

    it’s probably worth noting here that all of the v2 files together are 116GB (37GB for PST, 79GB for XML).

    if you use a downloading tool to limit your bandwidth to (say) 50KB/s (400Kb/s) so you don’t destroy your employer’s internet connection or network proxies, that translates into about a month. divide appropriately if you choose to use more bandwidth or download only PST or XML.

  • 9
    Olivier says:

    Thank you so much for your work.
    These datasets are very useful for my project.

  • 10
    Bob says:

    Thank you for making these available!

    When attempting to unzip two of these I’m repeatedly encountering problems. The zips I’m having a problem with are:

    edrm-enron-v2_kaminski-v_xml_1of2.zip
    edrm-enron-v2_kaminski-v_xml_2of2.zip

    Are there any known issues with these? The error I receive is:

    “! \edrm-enron-v2_kaminski-v_xml_2of2.zip: The archive is corrupt” and
    “! \edrm-enron-v2_kaminski-v_xml_1of2.zip: The archive is corrupt”

    Thanks again!!
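
Two of the comments above concern download integrity: one asks for published md5sums and another reports archives that fail to unzip. A minimal sketch of how a user might check a downloaded file locally follows. No checksums are published on this page, so the MD5 value is only useful for comparing your copy against someone else's; `md5_of_file` and `zip_is_intact` are hypothetical helper names built on Python's standard `hashlib` and `zipfile` modules.

```python
import hashlib
import zipfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def zip_is_intact(path: Path) -> bool:
    """Return True if the archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first bad member, or None.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        # The file is truncated or is not a zip archive at all.
        return False
```

A file that fails `zip_is_intact` after a complete download is more likely damaged at the source or in transit than by the unzip tool, so downloading it again is a reasonable first step.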
