An overview of the Natural Language Toolkit
Steven Bird, Ewan Klein, Edward Loper
nltk.org
Summary
NLTK is a suite of open source Python modules, data sets, and tutorials supporting research and development in natural language processing.
Download NLTK from nltk.org
Components of NLTK
1. Code: corpus readers, tokenizers, stemmers, taggers, chunkers, parsers, WordNet, ... (50k lines of code)
2. Corpora: >30 annotated data sets widely used in natural language processing (>300 MB of data)
3. Documentation: a 400-page book, articles, reviews, API documentation
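As a rough illustration of how several of the code components fit together, the following sketch tokenizes a sentence, stems each token, and counts the stems. It assumes a recent NLTK release is installed. The sample sentence is made up. Only tools that need no downloaded data are used: RegexpTokenizer, PorterStemmer, and FreqDist.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Hypothetical sample text, chosen to show several inflected forms of "run".
text = "The cats were running; a cat runs, and the runners ran."

# Tokenize: keep runs of word characters, dropping punctuation.
tokenizer = RegexpTokenizer(r"\w+")
tokens = [t.lower() for t in tokenizer.tokenize(text)]

# Stem: the Porter algorithm strips suffixes such as -s and -ing.
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]

# Count how often each stem occurs.
freq = FreqDist(stems)
print(freq["cat"], freq["run"])
```

The Porter stemmer strips suffixes rather than looking words up. "running" and "runs" therefore both reduce to "run", but the irregular past tense "ran" is left unchanged.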
1. Code
corpus readers
tokenizers
stemmers
taggers
parsers
WordNet
semantic interpretation
clusterers
evaluation metrics
…
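The parsers operate on grammars that can be written as plain strings. This is a minimal sketch and assumes a recent NLTK release. The toy grammar below is invented for illustration and is not one of NLTK's bundled grammars. It is deliberately ambiguous about where the prepositional phrase attaches, so the chart parser returns two trees for the same sentence.

```python
import nltk

# Toy grammar (hypothetical, for illustration): the PP can attach to the
# verb phrase (VP -> VP PP) or to the noun phrase (NP -> Det N PP).
grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'the' | 'a'
N -> 'dog' | 'telescope'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "I saw the dog with a telescope".split()

# parse() yields every tree the grammar licenses for the token list.
trees = list(parser.parse(sentence))
for tree in trees:
    print(tree)
```

In one tree the seeing is done with the telescope. In the other, the dog has the telescope.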
2. Corpora
Brown Corpus
Carnegie Mellon Pronouncing Dictionary
CoNLL 2000 Chunking Corpus
Project Gutenberg Selections
NIST 1999 Information Extraction: Entity Recognition Corpus
US Presidential Inaugural Address Corpus
Indian Language POS-Tagged Corpus
Floresta Portuguese Treebank
Prepositional Phrase Attachment Corpus
SENSEVAL 2 Corpus
Sinica Treebank Corpus Sample
Universal Declaration of Human Rights Corpus
Stopwords Corpus
TIMIT Corpus Sample
Treebank Corpus Sample
…
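Annotated corpora such as the Brown Corpus or the Treebank Corpus Sample are typically consumed as lists of tagged sentences, where each sentence is a list of (word, tag) pairs. This is the input format NLTK's statistical taggers train on. The sketch below uses a tiny hand-written stand-in instead of a downloaded corpus, so it runs offline. The sentences and Penn-style tags are illustrative only. With the data installed, `nltk.corpus.treebank.tagged_sents()` would supply real training sentences.

```python
from nltk.tag import DefaultTagger, UnigramTagger

# Hypothetical stand-in for an annotated corpus: lists of (word, tag) pairs.
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barked", "VBD")],
    [("a", "DT"), ("cat", "NN"), ("slept", "VBD")],
    [("the", "DT"), ("cat", "NN"), ("saw", "VBD"), ("a", "DT"), ("dog", "NN")],
]

# Words never seen in training fall back to the default tag "NN".
backoff = DefaultTagger("NN")
tagger = UnigramTagger(train_sents, backoff=backoff)

# A unigram tagger assigns each known word its most frequent training tag.
tagged = tagger.tag(["the", "cat", "barked", "at", "a", "bird"])
print(tagged)
```

The backoff chain is the design point here. The unigram model handles words it has seen, and the default tagger catches the rest ("at" and "bird" in this example).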
3. Documentation
a 400-page book about natural language processing in Python and NLTK
teaches Python and NLP
provides numerous examples and exercises
installation instructions
presentation slides for some of the book chapters
API documentation: describes every module, interface, class, and method
Adoption in NLP courses
Amsterdam, Ben-Gurion, Brown, Bryn Mawr, CDAC-Mumbai, Coruña, Edinburgh, Erlangen, Georgetown, Helsinki, IIT-Bombay, Iowa State, Konstanz, MIT, Macquarie, Magdeburg, Malta, Marquette, Melbourne, Nancy, Naval Postgraduate School, Northeastern, Ohio State, Pitt, San Diego State, Simon Fraser, Stanford, Syracuse University, Tsuda College, U Colorado, UC Berkeley, UMass Amherst, UNAM, U Penn, UT Austin, Warsaw
Contribute…
NLTK is an open source project
all code, data, and documentation are free
dozens of people have contributed over the past 6 years
please visit the website for project ideas
sign up for the NLTK-Announce mailing list to hear about new releases