Chapters 5
Python has a convenient way of accessing characters near the right end of a string,
called negative indexing.
The idea is that the characters of a string are indexed with negative numbers going from right
to left, starting at -1 for the last character:
>>> s = 'apple'
>>> s[-1]
'e'
>>> s[-2]
'l'
>>> s[-3]
'p'
>>> s[-4]
'p'
>>> s[-5]
'a'
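A negative index -k refers to the same character as the positive index len(s) - k. A minimal sketch that checks this relationship for the string used above (the variable names last and first are only illustrative):

```python
s = 'apple'

# For each k from 1 to len(s), s[-k] and s[len(s) - k] name the same character.
for k in range(1, len(s) + 1):
    assert s[-k] == s[len(s) - k]

last = s[-1]          # the last character of s
first = s[-len(s)]    # the first character, reached by counting from the right
print(last, first)
```

An index smaller than -len(s), such as s[-6] here, raises an IndexError, just as s[5] does.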
Python Programming
Strings
Characters
Strings consist of characters, and each character has a corresponding character code
that you can find using the ord function; that is, ord() returns the number representing the
Unicode code point of the specified character.
>>> ord('a')
97
>>> ord('b')
98
>>> ord('c')
99
Given a character code number, you can retrieve its corresponding character using
the chr function:
>>> chr(97)
'a'
>>> chr(98)
'b'
>>> chr(99)
'c'
Character codes are assigned using Unicode, which is a large and complex standard for
encoding all the symbols and characters that occur in all the world’s languages.
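As a small illustration, ord and chr undo each other, and character codes are not limited to English letters. The two non-English characters below are examples chosen here, not taken from the slides:

```python
# ord and chr are inverses of each other.
for ch in 'abc':
    assert chr(ord(ch)) == ch

# Character codes also exist for characters outside the English alphabet.
code_e_acute = ord('é')   # code of the accented letter e
euro = chr(8364)          # the character with code 8364 is the euro sign
print(code_e_acute, ord(euro))
```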
Accessing characters with a for-loop
If you need to access every character of a string in sequence, a for-loop can be helpful.
Example:
# codesum.py
def codesum(s):
    """ Returns the sum of the
    character codes of s.
    """
    total = 0
    for c in s:
        total = total + ord(c)
    return total
At the beginning of each iteration the loop variable c is set to be the next character in s.
The indexing into s is handled automatically by the for-loop.
>>> ord('H')
72
>>> ord('i')
105
>>> ord(' ')
32
...
An alternative solution is illustrated here:
# codesum.py
def codesum(s):
    """ Returns the sum of the
    character codes of s.
    """
    total = 0
    for i in range(len(s)):
        total = total + ord(s[i])
    return total
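Both loops compute the same total. As a sketch, the same result can also be had from the built-in sum function; the name codesum_builtin is hypothetical and only used here for comparison:

```python
def codesum(s):
    """Returns the sum of the character codes of s."""
    total = 0
    for c in s:               # c takes each character of s in turn
        total = total + ord(c)
    return total

def codesum_builtin(s):
    """Same result, using sum with a generator expression."""
    return sum(ord(c) for c in s)

print(codesum('Hi'), codesum_builtin('Hi'))
```

The explicit loop makes the iteration visible, which is the point of this section; the sum version is simply shorter.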
The Rise of Unicode
In the 1960s, ’70s, and ’80s, the most popular character encoding scheme was ASCII (American Standard Code for Information Interchange). ASCII is far simpler than
Unicode, but its fatal flaw is that it can represent only 128 different characters (256 in its common 8-bit extensions): enough for English and, with the extensions, French and a few other similar languages, but nowhere near enough
to represent the huge variety of characters and symbols found in other languages. For instance, Chinese alone has thousands of ideograms that could appear in text
documents.
Essentially, Unicode provides a far larger set of character codes. Conveniently, Unicode mimics the ASCII code for the first 128 characters, so if you are only dealing with
English characters (as we are in this book), you’ll rarely need to worry about the details of Unicode. For more information, see the Unicode home page (www.unicode.org).
Escape characters
An escape character is written as a backslash (\) followed by a single character. The backslash
tells Python that the character after it is special, but the \ does not count as an extra
character when determining a string’s length.
>>> len('\\')
1
>>> len('a\nb\nc')
5
To handle whitespace and other unprintable characters, Python uses a special notation
called escape sequences, or escape characters.
The standard way in Python for ending a line is to use the \n character:
>>> print('one\ntwo\nthree')
one
two
three
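A few other escape sequences follow the same pattern. The sketch below is illustrative only; the raw-string form (the r prefix) is not covered in the text above and is included just for comparison:

```python
tab_separated = 'name\tage'   # \t is a single tab character
quoted = 'It\'s here'         # \' puts a single quote inside a single-quoted string
backslash = 'C:\\temp'        # \\ is one literal backslash
raw = r'C:\temp'              # a raw string leaves backslashes alone

# Each escape sequence counts as one character.
print(len(tab_separated), len(quoted), len(backslash))
```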
Slicing Strings
Slicing is how Python lets you extract a substring from a string. To slice a string, you
indicate both the first character you want and one past the last character you want.
s[begin:end] returns the substring starting at index begin and ending at index end - 1.
>>> food = 'apple pie'
>>> food[0:5]
'apple'
>>> food[6:9]
'pie'
If you leave out the begin index of a slice, then Python assumes you mean 0; and if you
leave off the end index, Python assumes you want everything to the end of the string. For
instance:
>>> food = 'apple pie'
>>> food[:5]
'apple'
>>> food[6:]
'pie'
>>> food[:]
'apple pie'
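Negative indexes can also be used as slice boundaries. A short sketch using the same food string (the variable names are illustrative):

```python
food = 'apple pie'

last_three = food[-3:]         # from the third-from-last character to the end
all_but_last_four = food[:-4]  # everything except the last four characters
middle = food[-3:-1]           # starts at index -3 and stops before index -1

print(last_three, all_but_last_four, middle)
```

As with positive indexes, the character at the end index itself is not included.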
Slicing Strings
# extension.py
def get_ext(fname):
    """ Returns the extension of file fname.
    """
    dot = fname.rfind('.')
    if dot == -1:
        return ''
    else:
        return fname[dot + 1:]
>>> get_ext('hello.text')
'text'
>>> get_ext('pizza.py')
'py'
>>> get_ext('pizza.old.py')
'py'
>>> get_ext('pizza')
''
The get_ext function works by determining the index position of the rightmost '.' (hence the
use of rfind to search for it from right to left). If there is no '.' in fname, the empty string is
returned; otherwise, all the characters after the '.' are returned.
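A few edge cases show what "rightmost '.'" means in practice. The file names below are made up for illustration:

```python
def get_ext(fname):
    """Returns the extension of file fname."""
    dot = fname.rfind('.')
    if dot == -1:
        return ''
    else:
        return fname[dot + 1:]

print(get_ext('archive.tar.gz'))   # only the part after the last dot
print(get_ext('.bashrc'))          # a leading dot is still the rightmost '.'
print(get_ext('notes.'))           # nothing follows the dot
```

Whether a name such as '.bashrc' should count as having an extension is a design choice; this version treats everything after the last dot as the extension.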
Slicing with negative indexes
Python strings come prepackaged with a number of useful functions; use dir on any
string (for example, dir('')) to see them all.
>>> s = 'apple'
>>> s.endswith('t')
False
>>> s.endswith('e')
True
>>> s.isnumeric()
False
>>> s.islower()
True
>>> s.isdecimal()
False
>>> s.isprintable()
True
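A few more of the testing functions listed by dir(''), shown as a sketch with results stored in illustrative variable names:

```python
s = 'apple'

starts = s.startswith('app')       # does s begin with 'app'?
only_letters = s.isalpha()         # are all characters letters?
all_upper = s.isupper()            # are all cased characters uppercase?
digits = '42'.isdecimal()          # does the string consist only of decimal digits?
mixed_lower = 'Apple'.islower()    # one uppercase letter is enough to fail

print(starts, only_letters, all_upper, digits, mixed_lower)
```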
Standard String Functions
There are several ways to find substrings within a string. The difference between the index and find
functions is what happens when they don’t find what they are looking for: find returns -1, while
index raises a ValueError.
>>> s = 'cheese'
>>> s.find('s')
4
>>> s.find('eee')
-1
>>> s.rfind('e')
5
>>> s.find('e')
2
>>> s.find('ee')
2
>>> s.find('c')
0
>>> s.index('eee')
Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    s.index('eee')
ValueError: substring not found
>>> s.index('ee')
2
>>> s.index('c')
0
>>> s.index('s')
4
In general, find and index return the smallest index where the passed-in string starts,
and rfind and rindex return the largest index where it starts.
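The two failure styles lead to two different ways of writing the calling code. A minimal sketch, with find_result and index_result as illustrative names:

```python
s = 'cheese'

# find signals failure with -1, so test the return value.
pos = s.find('eee')
if pos == -1:
    find_result = 'not found'
else:
    find_result = pos

# index signals failure by raising ValueError, so catch the exception.
try:
    index_result = s.index('eee')
except ValueError:
    index_result = 'not found'

print(find_result, index_result)
```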
Case-changing functions
For all these functions, Python creates and returns a new string; it doesn’t modify the original string.
>>> S
'a new design of a flying robot'
>>> S.center(40,'R')
'RRRRRa new design of a flying robotRRRRR'
>>> S.ljust(40,'R')
'a new design of a flying robotRRRRRRRRRR'
>>> S.zfill(40)
'0000000000a new design of a flying robot'
>>> S.count('a')
2
>>> '{1} likes {0}'.format('ice cream','Jack')
'Jack likes ice cream'
• The {0} and {1} placeholders are replaced by the values of the corresponding strings or variables
• You can also refer to the names of keyword parameters
>>> leet_table = ''.maketrans('EIOBT', '31087')
>>> 'BE COOL. SPEAK LEET!'.translate(leet_table)
'83 C00L. SP3AK L337!'
>>> ' '.join(['once', 'upon', 'a', 'time'])
'once upon a time'
>>> '-'.join(['once', 'upon', 'a', 'time'])
'once-upon-a-time'
>>> ''.join(['once', 'upon', 'a', 'time'])
'onceuponatime'
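The case-changing functions themselves behave the same way: each returns a new string and leaves the original untouched. The sketch below also shows format with keyword parameters; the parameter names who and what are made up for this example:

```python
S = 'a new design of a flying robot'

upper = S.upper()    # every letter uppercased
title = S.title()    # first letter of each word uppercased
# S itself still holds the original text at this point.

# Placeholders can name keyword parameters instead of positions.
sentence = '{who} likes {what}'.format(who='Jack', what='ice cream')

print(upper)
print(title)
print(S)
print(sentence)
```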
Regular Expressions *, +, ?, |
• Consider the string 'cat'. It represents a single string consisting of the letters c, a, and t. Now
consider the regular expression 'cats?'. Here, the ? does not mean an English question mark but
instead represents a regular expression operator, meaning that the character to its immediate left is
optional. Thus the regular expression 'cats?' describes a set of two strings: ['cat', 'cats']
• Another regular expression operator is |, which means “or.” For example, the regular
expression 'a|b|c' describes the set of three strings 'a', 'b', and 'c'.
• The regular expression 'a*' describes an infinite set of strings: '', 'a', 'aa', 'aaa', 'aaaa', 'aaaaa', and
so on. In other words, 'a*' describes the set of all strings consisting of a sequence of 0 or more 'a's.
The regular expression 'a+' is the same as 'a*' but excludes the empty string ''.
• Finally, within a regular expression you can use round brackets to indicate what substring an operator
ought to apply to. For example, the regular expression '(ha)+!' describes these
strings: 'ha!', 'haha!', 'hahaha!', and so on. In contrast, 'ha+!' describes a very different
set: 'ha!', 'haa!', 'haaa!', and so on.
• You can mix and match these (and many other) regular expression operators in any way you want.
This turns out to be a very useful way to describe many commonly occurring types of strings, such as
phone numbers or email addresses.
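The sets of strings described above can be checked with Python's re module. This is a sketch only: the helper name matches is hypothetical, and it uses re.fullmatch so that the whole string must be described by the regular expression:

```python
import re

def matches(regex, s):
    """Returns True if the whole string s is described by regex."""
    return re.fullmatch(regex, s) is not None

# ? makes the character to its left optional.
print(matches('cats?', 'cat'), matches('cats?', 'cats'), matches('cats?', 'catss'))
# | means "or".
print(matches('a|b|c', 'b'), matches('a|b|c', 'd'))
# * allows zero or more repetitions, + requires at least one.
print(matches('a*', ''), matches('a+', ''), matches('a+', 'aaa'))
# Round brackets control what the operator applies to.
print(matches('(ha)+!', 'haha!'), matches('ha+!', 'haha!'), matches('ha+!', 'haa!'))
```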
Matching with regular expressions
The first line of the second version imports Python’s standard regular expression library.
To match a regular expression, we use the re.match(regex, s) function, which
returns None if regex does not match s, and a special regular expression match
object otherwise.
Suppose we decide to add a few more possible stopping strings. For the regular
expression version, we just rewrite the regular expression string to be,
say, 'done|quit|over|finished|end|stop'. In contrast, to make the same change to the first
version, we’d need to include or s == for each string we added, which would make for a
very long line of code that would be hard to read.
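The two versions being compared are not reproduced in these notes. The following is only a sketch of what they might look like, assuming hypothetical function names (is_done1, is_done2, is_done_extended) and assuming the original stopping strings were 'done' and 'quit':

```python
import re

def is_done1(s):
    """Assumed first version: plain string comparisons."""
    return s == 'done' or s == 'quit'

def is_done2(s):
    """Assumed second version: one regular expression."""
    return re.match('done|quit', s) is not None

def is_done_extended(s):
    """Second version with the longer list of stopping strings from the text."""
    return re.match('done|quit|over|finished|end|stop', s) is not None

print(is_done1('quit'), is_done2('quit'), is_done_extended('stop'))
```

Note that re.match only checks the beginning of s, so a string such as 'done!' would also be accepted by the regular expression versions.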
Suppose you want to recognize funny strings, which consist of one or more 'ha' strings
followed immediately by one or more '!'s. For example, 'haha!', 'ha!!!!!',
and 'hahaha!!' are all funny strings. It’s easy to match these using regular expressions:
import re
def is_funny(s):
    # re.match returns None when the start of s does not match
    return re.match('(ha)+!+', s) is not None
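A quick check of is_funny on the examples from the text. Because re.match only looks at the beginning of the string, a funny prefix followed by other text is also accepted; the variant is_funny_strict is a hypothetical alternative that uses re.fullmatch to require the whole string to be funny:

```python
import re

def is_funny(s):
    return re.match('(ha)+!+', s) is not None

def is_funny_strict(s):
    """Hypothetical variant: the entire string must be a funny string."""
    return re.fullmatch('(ha)+!+', s) is not None

for text in ['haha!', 'ha!!!!!', 'hahaha!!', 'ha', 'haha! just kidding']:
    print(text, is_funny(text), is_funny_strict(text))
```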
Example 11
Use the anonymous (lambda) function inside the filter() built-in function to find all
the numbers divisible by 13 in the given list.
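The list for this exercise is not included here, so the sketch below assumes a hypothetical sample list named numbers; any list of integers can be substituted:

```python
# Hypothetical sample data; the exercise's own list is not given in these notes.
numbers = [12, 65, 54, 39, 102, 339, 221, 50, 70]

# filter keeps the items for which the lambda returns True.
divisible_by_13 = list(filter(lambda x: x % 13 == 0, numbers))

print(divisible_by_13)
```

filter returns an iterator, which is why the result is wrapped in list before printing.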