NOTE: Use the new word lists. The files on this page are deprecated.
I created several large English word lists by taking the words that appear in some number of 10 different source lists. I used the following sources:
- British National Corpus
- American National Corpus
- Gigaword newswire corpus (top 400K words)
- LM-CSR newswire corpus (top 400K words)
- Google corpus (top 400K words)
- Enron email corpus
- Wikipedia
- Moby word list
- CMU pronunciation dictionary
- 20 Newsgroups corpus
By varying the number of lists a word must appear in (from 1 to 10), I got word lists of varying size and "quality".
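The thresholding idea can be sketched roughly as follows. This is a minimal illustration, not the script actually used to build the lists: the function name `words_in_at_least`, the toy data, and the reading of "must appear in" as "appears in at least that many lists" are all assumptions.

```python
from collections import Counter

def words_in_at_least(source_lists, min_lists):
    """Return the sorted words found in at least `min_lists` of the sources.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for words in source_lists:
        # Use a set so a word repeated within one source counts only once.
        counts.update(set(words))
    return sorted(w for w, n in counts.items() if n >= min_lists)

# Toy stand-ins for the ten source lists (made-up data).
sources = [
    ["the", "cat", "sat"],
    ["the", "cat", "teh"],
    ["the", "dog"],
]

# A higher threshold gives a smaller, stricter list.
for k in range(1, len(sources) + 1):
    print(k, words_in_at_least(sources, k))
```

With a threshold of 1 this yields the union of all sources, and with a threshold equal to the number of sources it yields their strict intersection.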