EDRM Internationalization Data Set

The EDRM Internationalization Data Set (18.4 MB) is a snapshot of selected Ubuntu localization mailing list archives covering 23 languages in 724 MB of email.

The languages are: Arabic, Catalan, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil, and Turkish.