NLTK Tutorial

CSC 485/2501
September 17, 2015

Krish Perumal

Based on slides by Katie Fraser and Sean Robertson

CDF

Computing Disciplines Facility

www.cdf.toronto.edu

Collection of computer labs and computing environments provided by the University

Admin office: Bahen

Most labs in Bahen, one in Gerstein

See CDF website for complete list
Should be able to access with T-card

CDF Account

Must be enrolled in CS course

Account name lookup:
http://www.cdf.toronto.edu/resources/cdf_username_lookup.html
Requires UTORid

Password will initially be student number, but you must change it on first log-in

For more information: Users Guide
http://www.cdf.toronto.edu/resources/general_student_guide_to_cdf.html

Accessing CDF outside the lab

Use ssh (on MacOS, Linux):

ssh -Y <CDF_login>@cdf.toronto.edu

NX Remote Access (on Windows, MacOS, Linux)

Can download and install NX client from CDF webpage --
http://www.cdf.utoronto.ca/using_cdf/remote_access_server.html

Step-by-step instructions provided -- https://www.cdf.toronto.edu/nx/nx.php

Use sshfs to mount file system locally on your machine

sshfs <CDF_login>@cdf.toronto.edu:<remote_filepath> <local_mount_path>

Submitting Assignments

From the command line:

submit -c <course> -a <assignment_name> -f <filename_1> <filename_n>

Can also submit from CDF Student Secure Website --
https://www.cdf.toronto.edu/students/

Python

High-level, general-purpose language

Readable code, clear syntax

Dynamic typing

Automatic garbage collection and memory management

Large standard library

Python Editors and IDEs

Installed on CDF:
emacs (powerful, but steep learning curve)
IDLE (X forwarding, comes with Python)

Others:
eclipse with Python plug-in (slow, but good)
Notepad++ (basic editor with highlighting)

Natural Language Toolkit (NLTK)

Python package that implements many standard NLP data structures and algorithms

First developed in 2001 as part of a computational linguistics (CL) course at the University of Pennsylvania
Many contributors since then, led by Steven Bird, Edward Loper, Ewan Klein

Open-source

http://www.nltk.org
Documentation also at this address

Goals of NLTK

GOALS:
Simplicity
Consistency
Extensibility
Modularity

NON-GOALS:
Encyclopedic coverage
Optimization/clever tricks

(Some) Modules in NLTK

Language Processing Task   NLTK module     Some functionalities
Accessing corpora          nltk.corpus     Standardized interfaces to corpora and lexicons
String processing          nltk.tokenize   Sentence and word tokenizers
                           nltk.stem       Stemmers
Part-of-speech tagging     nltk.tag        Various part-of-speech taggers
Classification             nltk.classify   Decision tree, maximum entropy
                           nltk.cluster    K-means
Chunking                   nltk.chunk      Regular expressions, named entity tagging

NLTK Book

Very useful resource

Can buy a physical copy (~$45 amazon.ca)

Also available for free online:
http://nltk.org/book/

Python/NLTK Versions

We will use:
Python 2.7
NLTK 2.0.4
(default on CDF)

Accessing Python and NLTK

Option 1: Log in to your CDF account

% python
>>> import nltk

Option 2: Install on your own machine (but make sure your code for assignments runs on CDF!)
Python 2.7 (https://www.python.org/)
pip (https://pip.pypa.io/en/latest/installing.html)
NLTK 2.0.4 (http://www.nltk.org/download)
pip install nltk
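
Because assignment code has to run on CDF, it can help to compare the interpreter and NLTK versions on your own machine against the ones listed above. The following is a minimal sketch, not part of the original slides: the helper name `describe_environment` is hypothetical, and it assumes only that the installed package exposes a `__version__` attribute, as NLTK does.

```python
import importlib
import sys


def describe_environment(package="nltk"):
    """Return (python_version, package_version); version is None if not installed."""
    python_version = "%d.%d" % sys.version_info[:2]
    try:
        module = importlib.import_module(package)
    except ImportError:
        # The package is not installed in this environment.
        return python_version, None
    return python_version, getattr(module, "__version__", "unknown")


python_version, nltk_version = describe_environment()
print("Python %s, NLTK %s" % (python_version, nltk_version))
```

Running the same script on CDF and locally makes a version mismatch visible before it causes a failed submission.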

Getting Started: Corpora
Task: Accessing corpora
NLTK module: nltk.corpus
Functionality: standardized interfaces to corpora and lexicons
Example:
>>> from nltk.corpus import gutenberg

>>> gutenberg.fileids()

>>> hamlet = gutenberg.words('shakespeare-hamlet.txt')

>>> hamlet[1:100]

Also: Brown, Reuters, chats, reviews, etc.
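
A corpus word list can be indexed and sliced like an ordinary Python list, so `hamlet[1:100]` starts at the second token and returns 99 items, while `hamlet[:100]` would return the first 100. The sketch below shows the same slicing on a small hand-written list; `words` is a hypothetical stand-in for the result of `gutenberg.words(...)`, so it runs without NLTK or any downloaded corpus data.

```python
# Hypothetical stand-in for a corpus word list such as
# gutenberg.words('shakespeare-hamlet.txt').
words = ["The", "quick", "brown", "fox", "jumps",
         "over", "the", "lazy", "dog", "."]

first_three = words[:3]       # indices 0, 1, 2
skipping_first = words[1:4]   # indices 1, 2, 3 (index 0 is left out)

# Lowercased vocabulary, ignoring punctuation tokens.
vocabulary = sorted(set(w.lower() for w in words if w.isalpha()))

print(first_three)
print(skipping_first)
print(len(vocabulary))
```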

Getting Started: String Processing
Task: string processing

Modules: nltk.tokenize, nltk.stem

Functionality: word tokenizers, sentence tokenizers, stemmers

Example:
>>> import nltk

>>> text = nltk.word_tokenize("The quick brown fox jumps over the lazy dog")

>>> text = nltk.sent_tokenize("The quick brown fox jumps over the lazy dog. What a lazy dog!")

>>> from nltk.stem.wordnet import WordNetLemmatizer

>>> WordNetLemmatizer().lemmatize('dogs', 'n')

>>> WordNetLemmatizer().lemmatize('jumps', 'v')
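
To make the idea of word and sentence tokenization concrete, here is a regular-expression sketch in plain Python. It is a deliberately simplified stand-in and not the algorithm behind `nltk.word_tokenize` or `nltk.sent_tokenize`; the function names `simple_word_tokenize` and `simple_sent_tokenize` are hypothetical.

```python
import re


def simple_word_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def simple_sent_tokenize(text):
    """Split text at whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


tokens = simple_word_tokenize("The quick brown fox jumps over the lazy dog")
sentences = simple_sent_tokenize(
    "The quick brown fox jumps over the lazy dog. What a lazy dog!"
)
print(tokens)
print(sentences)
```

This naive approach breaks on abbreviations such as "Dr." and splits contractions like "don't" into three pieces, which is one reason to prefer a library tokenizer for real text.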

Getting Started: Part-of-Speech Tagging
Task: Part-of-speech tagging
Module: nltk.tag
Functionality: Brill, HMM, TnT taggers
Example:
>>> text = nltk.word_tokenize("It was the best of times, it was the worst of times.")

>>> nltk.pos_tag(text)

(Penn Treebank tag set:
http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html)
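
`nltk.pos_tag` returns a list of (token, tag) pairs, which can be processed with ordinary list operations. The sketch below works on a hand-written list in that shape; the tags are illustrative Penn Treebank labels chosen by hand, not actual tagger output, and the helper name `group_by_tag` is hypothetical.

```python
from collections import defaultdict

# Hand-written (token, tag) pairs in the shape returned by nltk.pos_tag.
# Illustrative only: these tags were assigned by hand, not by a tagger.
tagged = [("It", "PRP"), ("was", "VBD"), ("the", "DT"),
          ("best", "JJS"), ("of", "IN"), ("times", "NNS")]


def group_by_tag(pairs):
    """Map each tag to the list of tokens that carry it, in order."""
    groups = defaultdict(list)
    for token, tag in pairs:
        groups[tag].append(token)
    return dict(groups)


by_tag = group_by_tag(tagged)

# Penn Treebank noun tags (NN, NNS, NNP, NNPS) all start with "NN".
nouns = [token for token, tag in tagged if tag.startswith("NN")]

print(by_tag)
print(nouns)
```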

List of Tutorials

General Python
http://docs.python.org/tutorial

NLTK-specific
http://www.nltk.org/book
