Python NLTK | nltk.tokenize.SyllableTokenizer()


Last Updated : 12 Jun, 2019

With the help of the nltk.tokenize.SyllableTokenizer() class, we are able to split a single word into its syllables by calling its tokenize() method. The tokenizer places syllable boundaries according to the Sonority Sequencing Principle, so a single word can yield one syllable or many.

Syntax : tokenize.SyllableTokenizer().tokenize(word)
Return : Returns the list of syllables in the given word.
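As a rough illustration of the sonority idea, the sketch below splits a word just before each dip in sonority that sits between two vowels. It is a toy, not NLTK's implementation: the four-level scale in `SONORITY` and the helper name `toy_syllabify` are hypothetical, and real syllabification handles many more cases.

```python
from typing import Dict, List

# Hypothetical four-level sonority scale (an assumption for illustration,
# not NLTK's internal table): higher numbers are more sonorous.
SONORITY: Dict[str, int] = {}
for letters, rank in (("aeiouy", 4), ("lmnrw", 3), ("zvsf", 2), ("bcdgtkpqxhj", 1)):
    for letter in letters:
        SONORITY[letter] = rank

VOWEL_RANK = 4


def toy_syllabify(word: str) -> List[str]:
    """Split one word before each sonority trough that separates two vowels."""
    if not word:
        return []
    # Characters outside the scale (digits, punctuation) get the lowest rank, 0.
    ranks = [SONORITY.get(ch.lower(), 0) for ch in word]
    syllables: List[str] = []
    start = 0
    for i in range(1, len(word) - 1):
        # A trough: less sonorous than the previous letter, not more than the next.
        is_trough = ranks[i - 1] > ranks[i] <= ranks[i + 1]
        # Only split if both resulting pieces still contain a vowel nucleus.
        has_vowel_before = VOWEL_RANK in ranks[start:i]
        has_vowel_after = VOWEL_RANK in ranks[i:]
        if is_trough and has_vowel_before and has_vowel_after:
            syllables.append(word[start:i])
            start = i
    syllables.append(word[start:])
    return syllables


print(toy_syllabify("Gametophyte"))  # ['Ga', 'me', 'to', 'phy', 'te']
print(toy_syllabify("banana"))       # ['ba', 'na', 'na']
```

On "Gametophyte" this toy happens to produce the same split shown in Example #2, but on longer words its boundaries can differ from NLTK's output.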

Example #1 : In this example we can see that by using the SyllableTokenizer().tokenize() method, we are able to extract the syllables from a single word.


# Import the SyllableTokenizer class from nltk.tokenize
from nltk.tokenize import SyllableTokenizer

# Create an instance of the SyllableTokenizer class
tk = SyllableTokenizer()

# Create a string input (a single word)
gfg = "Antidisestablishmentarianism"

# Split the word into syllables with the tokenize() method
geek = tk.tokenize(gfg)

print(geek)

Output :

['An', 'ti', 'dis', 'es', 'ta', 'blish', 'men', 'ta', 'ria', 'nism']

Example #2 :


# Import the SyllableTokenizer class from nltk.tokenize
from nltk.tokenize import SyllableTokenizer

# Create an instance of the SyllableTokenizer class
tk = SyllableTokenizer()

# Create a string input (a single word)
gfg = "Gametophyte"

# Split the word into syllables with the tokenize() method
geek = tk.tokenize(gfg)

print(geek)

Output :

['Ga', 'me', 'to', 'phy', 'te']
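Because tokenize() works on one word at a time, a sentence has to be split into words before it is syllabified. The sketch below shows that two-step pipeline; the helper names `syllabify_text` and `whole_word` are hypothetical, and the simple letters-only regex is an assumed stand-in for a real word tokenizer.

```python
import re
from typing import Callable, List


def syllabify_text(text: str, syllabify: Callable[[str], List[str]]) -> List[List[str]]:
    """Split text into words first, then syllabify each word separately."""
    # A simple regex stands in for a real word tokenizer (an assumption):
    # it keeps runs of ASCII letters and drops punctuation and digits.
    words = re.findall(r"[A-Za-z]+", text)
    return [syllabify(word) for word in words]


# A placeholder syllabifier so the sketch runs without NLTK installed.
def whole_word(word: str) -> List[str]:
    return [word]


print(syllabify_text("Hello, big world!", whole_word))
# [['Hello'], ['big'], ['world']]
```

With NLTK installed, `SyllableTokenizer().tokenize` can be passed as the `syllabify` argument. NLTK's own word_tokenize could replace the regex, though it may require downloading its tokenizer models first.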


Jitender_1998

