The Enron Mobile Email Dataset

This dataset consists of sentences written by Enron employees on BlackBerry mobile devices. We provide a series of test sets we recommend for use in text entry evaluations.

The sentences and sentence fragments were found by looking for messages with the default BlackBerry signature at the end of an email. All the sentences were manually reviewed and corrected. For each sentence, we provide metadata about the category (business, personal, Enron-specific), how easy the sentence was to remember, and how quickly and accurately the sentence was typed on full-sized keyboards.
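As a rough illustration of the signature-based filtering described above, the following Python sketch checks whether the last non-blank line of a message body starts with a given signature. The function name and the example signature string are assumptions made for illustration; the exact signature text and matching rules used to build the dataset are described in the paper and readme, not here.

```python
# Hypothetical example signature; the exact text used to build the
# dataset is an assumption here and should be checked against the paper.
EXAMPLE_SIGNATURE = "Sent from my BlackBerry Wireless Handheld"


def has_mobile_signature(body: str, signature: str = EXAMPLE_SIGNATURE) -> bool:
    """Return True if the last non-blank line of body starts with signature.

    The comparison ignores case and surrounding whitespace.
    """
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    if not lines:
        return False
    return lines[-1].lower().startswith(signature.lower())


def message_text(body: str, signature: str = EXAMPLE_SIGNATURE) -> str:
    """Return the body without its trailing signature line, if one is present."""
    if not has_mobile_signature(body, signature):
        return body.strip()
    lines = body.rstrip().splitlines()
    # Drop the final line, which has_mobile_signature confirmed is the signature.
    return "\n".join(lines[:-1]).strip()
```

A message that passes the check would still need the manual review and correction mentioned above before its sentences are usable.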

For further details, see our paper "A Versatile Dataset for Text Entry Evaluations Based on Genuine Mobile Emails".

Files:

	enronmobile.zip	Zip file containing the complete Enron Mobile Dataset (everything below and much more)
	readme.txt	Readme describing the dataset

Memorable test sets:

	mem1.txt	40 easy to remember sentences, set 1
	mem2.txt	40 easy to remember sentences, set 2
	mem3.txt	40 easy to remember sentences, set 3
	mem4.txt	40 easy to remember sentences, set 4
	mem5.txt	40 easy to remember sentences, set 5

	mem.zip	All 200 memorable sentences.
	mem_wav.zip	All 200 memorable sentences with WAV audio recordings of each.

Character combination sets:

	bi40.txt	40 sentences with representative character bigram frequencies
	bi80.txt	80 sentences with representative character bigram frequencies
	bi160.txt	160 sentences with representative character bigram frequencies
	bi320.txt	320 sentences with representative character bigram frequencies
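To check how closely one of these sets tracks a reference text, character bigram frequencies can be computed and compared. The sketch below is one way to do that in Python. It assumes each file holds one sentence per line in UTF-8 (check readme.txt for the actual format), lowercases the text before counting, and compares distributions by total variation distance. The function names and the distance measure are choices made for this illustration and are not necessarily how the sets were selected.

```python
from collections import Counter


def load_sentences(path):
    """Read one sentence per line, skipping blank lines (assumed file format)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def bigram_frequencies(sentences):
    """Return relative frequencies of character bigrams within each sentence.

    Bigrams are counted inside sentences only, never across sentence
    boundaries. Text is lowercased first (an assumption of this sketch).
    """
    counts = Counter()
    for sentence in sentences:
        text = sentence.strip().lower()
        counts.update(text[i:i + 2] for i in range(len(text) - 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bigram: n / total for bigram, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two bigram distributions (0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

For example, `total_variation(bigram_frequencies(load_sentences("bi40.txt")), reference)` gives a single number for how far the 40-sentence set is from a reference distribution `reference` computed the same way; smaller values mean a closer match.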

Memorable character combination set:

	mem_bi.txt	40 memorable sentences with representative character bigram frequencies