
Summer10 Strings Nup

Uploaded by

Tadesse Abate
Introduction to Programming in Python
Strings

Dr. Bill Young
Department of Computer Science
University of Texas at Austin

Last updated: June 4, 2021 at 11:04

Strings and Characters

A string is a sequence of characters. Python treats strings and characters in the same way. Use either single or double quote marks.

letter = 'A'           # same as letter = "A"
numChar = "4"          # same as numChar = '4'
msg = "Good morning"

(Many) characters are represented in memory by binary strings in the ASCII (American Standard Code for Information Interchange) encoding.

Texas Summer Discovery Slideset 10: 1 Strings Texas Summer Discovery Slideset 10: 2 Strings

Strings and Characters

A string is represented in memory by a sequence of ASCII character codes. So manipulating characters really means manipulating these numbers in memory.

 ...    ...
2000    01001010    Encoding for character 'J'
2001    01100001    Encoding for character 'a'
2002    01110110    Encoding for character 'v'
2003    01100001    Encoding for character 'a'
 ...    ...

ASCII

The following is part of the ASCII (American Standard Code for Information Interchange) representation for characters. Each row starts at the code shown on the left; add the column number to get a character's code.

       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 32        !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
 48    0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
 64    @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
 80    P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
 96    `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
112    p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~

The standard ASCII table defines 128 character codes (from 0 to 127). The first 32 codes and code 127 are non-printable control codes; the remaining 95 codes are printable characters (counting the blank at code 32).

Unicode

ASCII codes are only 7 bits (some are extended to 8 bits). 7 bits only allows 128 characters. There are many more characters than that in the world.

Unicode is an extension to ASCII that uses multiple bytes for character encodings. With Unicode you can have Chinese characters, Hebrew characters, Greek characters, etc.

Unicode was defined such that ASCII is a subset. So Unicode readers recognize ASCII.

Operating on Characters

Notice that:
    The lowercase letters have consecutive ASCII values (97...122); so do the uppercase letters (65...90).
    The uppercase letters have lower ASCII values than the lowercase letters, so "less" alphabetically.
    There is a difference of 32 between any lowercase letter and the corresponding uppercase letter.

To convert from upper to lower, add 32 to the ASCII value.
To convert from lower to upper, subtract 32 from the ASCII value.
To sort characters/strings, sort their ASCII representations.
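
The observations above can be checked with a short sketch. It uses the built-in sorted, together with ord (character to code) and chr (code to character). The helper name to_lower_ascii is hypothetical and only handles the ASCII letters A to Z.

```python
# Sorting strings orders them by character code, so uppercase sorts first.
words = ["banana", "Apple", "cherry", "apple"]
print(sorted(words))   # 'A' (65) precedes every lowercase letter

def to_lower_ascii(ch):
    """Hypothetical helper: convert one ASCII uppercase letter to lowercase."""
    if 'A' <= ch <= 'Z':
        return chr(ord(ch) + 32)   # add 32 to move from upper to lower
    return ch                      # leave every other character unchanged

print(to_lower_ascii('Q'))   # q
print(to_lower_ascii('?'))   # ?
```

In everyday code the built-in string methods do this job; the arithmetic version is only meant to make the 32-code gap visible.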


ord and chr

Two useful functions for characters:
    ord(c) : give the ASCII code for character c; returns a number.
    chr(n) : give the character with ASCII code n; returns a character.

>>> ord('a')
97
>>> ord('A')
65
>>> diff = (ord('a') - ord('A'))
>>> diff
32
>>> upper = 'R'
>>> lower = chr(ord(upper) + diff)    # upper to lower
>>> lower
'r'
>>> lower = 'm'
>>> upper = chr(ord(lower) - diff)    # lower to upper
>>> upper
'M'

Escape Characters

Some special characters wouldn't be easy to include in strings, e.g., single or double quotes.

>>> print("He said: "Hello"")
  File "<stdin>", line 1
    print("He said: "Hello"")
                     ^
SyntaxError: invalid syntax

What went wrong?

To include these in a string, we need an escape sequence.

Escape                      Escape
Sequence    Name            Sequence    Name
\n          linefeed        \'          single quote
\f          formfeed        \"          double quote
\b          backspace       \r          carriage return
\t          tab             \\          backslash
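
As a minimal sketch of the fix for the failing print call above, the inner quotes can be escaped, or the string can be delimited with the other kind of quote. The variable name line is only illustrative.

```python
# Escaping the inner double quotes lets them appear inside the string.
print("He said: \"Hello\"")    # He said: "Hello"

# Alternatively, delimit the string with single quotes.
print('He said: "Hello"')      # He said: "Hello"

# Each escape sequence stands for a single character.
line = "a\tb\\c\n"
print(len(line))               # 6
```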

Creating Strings

Strings are immutable, meaning that two instances of the same string can be represented by the same object, as happens here. (This sharing is an implementation detail; use == to test whether two strings are equal.)

>>> s1 = str("Hello")    # using the constructor function
>>> s2 = "Hello"         # alternative syntax
>>> s3 = str("Hello")
>>> s1 is s2             # are these the same object?
True
>>> s2 is s3
True

Functions on Strings

Some functions that are available on strings:

Function    Description
len(s)      return length of the string
min(s)      return char in string with lowest ASCII value
max(s)      return char in string with highest ASCII value

>>> s1 = "Hello, World!"
>>> len(s1)
13
>>> min(s1)
' '
>>> min("Hello")
'H'
>>> max(s1)
'r'

Why does it make sense for a blank to have lower ASCII value than any letter?
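
The following sketch ties min and max back to character codes, and shows why == rather than is should be used to compare strings. The variable names are illustrative, and joining the pieces at run time is just one way to obtain an equal string built separately.

```python
s = "Hello, World!"

# min and max compare characters by code, so they agree with ord.
lowest = chr(min(ord(c) for c in s))
highest = chr(max(ord(c) for c in s))
print(lowest == min(s), highest == max(s))      # True True

# The blank has code 32, below every digit and letter.
print(ord(' '), ord('0'), ord('A'), ord('a'))   # 32 48 65 97

# Equal strings compare equal with ==; whether they are also the same
# object (is) depends on the Python implementation.
a = "Hello"
b = "".join(["Hel", "lo"])
print(a == b)                                   # True
```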


Indexing into Strings

Strings are sequences of characters, which can be accessed via an index.

>>> s = "Hello, World!"
>>> s[0]
'H'
>>> s[6]
' '
>>> s[-1]
'!'
>>> s[-6]
'W'
>>> s[-6 + len(s)]
'W'

Indexes are 0-based, ranging from [0 ... len(s)-1].

You can also index using negatives; s[-i] means s[-i + len(s)].
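
The negative-index rule can be verified for every position, and an index outside the valid range raises IndexError. In this sketch the helper safe_char is hypothetical, not part of Python.

```python
s = "Hello, World!"

# Every negative index -i names the same position as len(s) - i.
for i in range(1, len(s) + 1):
    assert s[-i] == s[len(s) - i]

def safe_char(s, index):
    """Hypothetical helper: return s[index], or None if index is out of range."""
    try:
        return s[index]
    except IndexError:
        return None

print(safe_char(s, 12))   # !
print(safe_char(s, 13))   # None
```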

Slicing

Slicing means to select a contiguous subsequence of a sequence or string.

General Form:
    String[start : end]

>>> s = "Hello, World!"
>>> s[1 : 4]     # substring from s[1]...s[3]
'ell'
>>> s[ : 4]      # substring from s[0]...s[3]
'Hell'
>>> s[1 : -3]    # substring from s[1]...s[-4]
'ello, Wor'
>>> s[1 : ]      # same as s[1 : len(s)]
'ello, World!'
>>> s[ : 5]      # same as s[0 : 5]
'Hello'
>>> s[:]         # same as s
'Hello, World!'
>>> s[3 : 1]     # empty slice
''

Concatenation and Repetition

General Forms:
    s1 + s2
    s * n
    n * s

s1 + s2 means to create a new string of s1 followed by s2.
s * n or n * s means to create a new string containing n repetitions of s.

>>> s1 = "Hello"
>>> s2 = ", World!"
>>> s1 + s2      # + is not commutative
'Hello, World!'
>>> s1 * 3       # * is commutative
'HelloHelloHello'
>>> 3 * s1
'HelloHelloHello'

Notice that concatenation and repetition overload two familiar operators.
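
Slicing and concatenation fit together neatly: cutting a string at any index and gluing the halves back restores the original. The sketch below checks that, and uses repetition in a small helper (underline is a hypothetical name, not a built-in).

```python
s = "Hello, World!"

# Splitting at any index and concatenating the halves restores the string.
for i in range(len(s) + 1):
    assert s[:i] + s[i:] == s

def underline(title, mark="-"):
    """Hypothetical helper: return title with a rule of equal length below it."""
    return title + "\n" + mark * len(title)

print(underline("Strings"))
```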

in and not in operators

The in and not in operators allow checking whether one string is a contiguous substring of another.

General Forms:
    s1 in s2
    s1 not in s2

>>> s1 = "xyz"
>>> s2 = "abcxyzrls"
>>> s3 = "axbyczd"
>>> s1 in s2
True
>>> s1 in s3
False
>>> s1 not in s2
False
>>> s1 not in s3
True

Comparing Strings

In addition to equality comparisons, you can order strings using the relational operators: <, <=, >, >=.

For strings, this is lexicographic (or alphabetical) ordering using the ASCII character codes.

>>> "abc" < "abcd"
True
>>> "abcd" <= "abc"
False
>>> "Paul Jones" < "Paul Smith"
True
>>> "Paul Smith" < "Paul Smithson"
True
>>> "Paula Smith" < "Paul Smith"
False
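
To make the lexicographic rule concrete, here is a sketch of how a < b could be computed by hand for strings. The function lex_less is a hypothetical illustration; Python's own comparison is built in and needs no such helper.

```python
def lex_less(a, b):
    """Hypothetical re-implementation of a < b for strings."""
    for ca, cb in zip(a, b):
        if ca != cb:
            return ord(ca) < ord(cb)   # first differing character decides
    return len(a) < len(b)             # one is a prefix: the shorter is smaller

pairs = [("abc", "abcd"), ("Paul Jones", "Paul Smith"),
         ("Paula Smith", "Paul Smith"), ("abc", "abc")]
for a, b in pairs:
    print(repr(a), "<", repr(b), "is", lex_less(a, b))
```

The third pair is False because the fifth characters differ: 'a' (97) is compared with the blank (32).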

Iterating Over a String

Sometimes it is useful to do something to each character in a string, e.g., change the case (lower to upper and upper to lower).

DIFF = ord('a') - ord('A')

def swapCase(s):
    result = ""
    for ch in s:
        if ('A' <= ch <= 'Z'):
            result += chr(ord(ch) + DIFF)
        elif ('a' <= ch <= 'z'):
            result += chr(ord(ch) - DIFF)
        else:
            result += ch
    return result

print(swapCase("abCDefGH"))

> python StringIterate.py
ABcdEFgh

Strings are Immutable

You can't change a string by assigning at an index. You have to create a new string.

>>> s = "Pat"
>>> s[0] = 'R'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>> s2 = 'R' + s[1:]
>>> s2
'Rat'

Whenever you concatenate two strings or append something to a string, you create a new value.
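
The "create a new string" idiom can be wrapped in a small function. This is only a sketch: replace_at is a hypothetical helper, and it assumes 0 <= index < len(s).

```python
def replace_at(s, index, ch):
    """Hypothetical helper: return a copy of s with s[index] replaced by ch.

    Assumes 0 <= index < len(s); the original string is left untouched.
    """
    return s[:index] + ch + s[index + 1:]

s = "Pat"
t = replace_at(s, 0, 'R')
print(s, t)                # Pat Rat

# Another option: edit a list of characters, then join them back together.
chars = list(s)
chars[0] = 'C'
print("".join(chars))      # Cat
```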


Functions vs. Methods

Python is an Object Oriented Language; every data item is a member of a class. For example, integers are members of class int.

When you type 2 + 3, that's really syntactic shorthand for int.__add__(2, 3), calling method __add__ on the class int with arguments 2 and 3.

When you call len(lst), that's really shorthand for lst.__len__().

General form:
    item.method(args)

So many things that look like function calls in Python are really method invocations. That's not true of functions you write.

Useful Testing Methods

You have to get used to the syntax of method invocation.

Below are some useful methods on strings. Notice that they are methods, not functions, so called on string s.

Method              Description
s.isalnum():        nonempty alphanumeric string?
s.isalpha():        nonempty alphabetic string?
s.isdigit():        nonempty and contains only digits?
s.isidentifier():   follows rules for Python identifier?
s.islower():        contains a letter, and all its letters are lowercase?
s.isupper():        contains a letter, and all its letters are uppercase?
s.isspace():        nonempty and contains only whitespace?
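
The equivalences described under Functions vs. Methods can be checked directly. This sketch only uses built-in types; the variable names are illustrative.

```python
# The + operator on integers dispatches to int.__add__.
print(2 + 3 == int.__add__(2, 3))    # True

# len(x) asks the object for its length via __len__.
lst = [10, 20, 30]
print(len(lst) == lst.__len__())     # True

# item.method(args) is the same as calling the method through the class.
s = "abc"
print(s.upper() == str.upper(s))     # True
```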

Useful Testing Methods

>>> s1 = "abc123"
>>> isalpha(s1)              # wrong syntax
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'isalpha' is not defined
>>> s1.isalpha()
False
>>> "1234".isdigit()
True
>>> "abCD".isupper()
False
>>> "\n\t\b".isspace()
False
>>> "\n\t\t".isspace()
True

Substring Search

Python provides some string methods to see if a string contains another as a substring:

Method               Description
s.endswith(s1):      does s end with substring s1?
s.startswith(s1):    does s start with substring s1?
s.find(s1):          lowest index where s1 starts in s, -1 if not found
s.rfind(s1):         highest index where s1 starts in s, -1 if not found
s.count(s1):         number of non-overlapping occurrences of s1 in s
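
The find method also accepts an optional second argument giving the index at which to start searching. As a sketch of how that can be used, the hypothetical helper find_all below collects every starting position of a substring; because it resumes one position past each match, overlapping matches are included.

```python
def find_all(s, sub):
    """Hypothetical helper: list every index where sub starts in s."""
    positions = []
    start = s.find(sub)
    while start != -1:
        positions.append(start)
        start = s.find(sub, start + 1)   # resume one past the last match
    return positions

print(find_all("Hello, World!", "l"))    # [2, 3, 10]
print(find_all("ababababa", "aba"))      # [0, 2, 4, 6]
print(find_all("ababababa", "c"))        # []
```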


Substring Search

>>> s = "Hello, World!"
>>> s.endswith("d!")
True
>>> s.startswith("hello")    # case matters
False
>>> s.startswith("Hello")
True
>>> s.find('l')              # search from left
2
>>> s.rfind('l')             # search from right
10
>>> s.count('l')
3
>>> "ababababa".count('aba') # nonoverlapping occurrences
2

String Exercise

The string count method counts nonoverlapping occurrences of one string within another.

>>> "ababababa".count('aba')
2
>>> "ababababa".count('c')
0

Suppose we wanted to write a function that would count all occurrences, including possibly overlapping ones.

String Exercise

In file countOverlaps.py:

def countOverlaps(txt, s):
    """Count the occurrences of s in txt,
    including possible overlapping occurrences."""
    count = 0
    while len(txt) >= len(s):
        if txt.startswith(s):
            count += 1
        txt = txt[1:]
    return count

Running our code:

>>> from countOverlaps import *
>>> txt = "abababababa"
>>> s = "aba"
>>> countOverlaps(txt, s)
5
>>>

Converting Strings

Below are some additional methods on strings. Remember that strings are immutable, so these all make a new copy of the string.

Method                   Description
s.capitalize():          return a copy with first character capitalized
s.lower():               lowercase all letters
s.upper():               uppercase all letters
s.title():               capitalize all words
s.swapcase():            lowercase letters to upper, and vice versa
s.replace(old, new):     replace occurrences of old with new
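
One common use of the conversion methods is comparing text without regard to case. The sketch below assumes plain ASCII text; same_ignoring_case is a hypothetical helper name. It also shows that the original string is untouched by a conversion.

```python
def same_ignoring_case(a, b):
    """Hypothetical helper: compare two strings without regard to case."""
    return a.lower() == b.lower()

name = "python"
print(same_ignoring_case("Hello", "hELLO"))   # True
print(name.upper(), name)                     # PYTHON python
print("a-b-c".replace("-", "+"))              # a+b+c
```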


String Conversions

>>> "abcDEfg".upper()
'ABCDEFG'
>>> "abcDEfg".lower()
'abcdefg'
>>> "abc123".upper()         # only changes letters
'ABC123'
>>> "abcDEF".capitalize()
'Abcdef'
>>> "abcDEF".swapcase()      # only changes letters
'ABCdef'
>>> book = "introduction to programming using python"
>>> book.title()             # doesn't change book
'Introduction To Programming Using Python'
>>> book2 = book.replace("ming", "s")
>>> book2
'introduction to programs using python'
>>> book2.title()
'Introduction To Programs Using Python'
>>> book2.title().replace("Using", "With")
'Introduction To Programs With Python'

Stripping Whitespace

It's often useful to remove whitespace at the start, end, or both of string input. Use these methods:

Method          Description
s.lstrip():     return copy with leading whitespace removed
s.rstrip():     return copy with trailing whitespace removed
s.strip():      return copy with leading and trailing whitespace removed

>>> s1 = "   abc   "
>>> s1.lstrip()              # new string
'abc   '
>>> s1.rstrip()              # new string
'   abc'
>>> s1.strip()               # new string
'abc'
>>> "a b c".strip()
'a b c'
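
A typical use of stripping is cleaning up text typed by a user before comparing it. In this sketch normalize_answer is a hypothetical helper; repr is used so the remaining blanks are visible in the output.

```python
def normalize_answer(raw):
    """Hypothetical helper: trim surrounding whitespace and lowercase the rest."""
    return raw.strip().lower()

print(repr(normalize_answer("  Yes \n")))   # 'yes'
print(repr("  a b  ".lstrip()))             # 'a b  '
print(repr("  a b  ".rstrip()))             # '  a b'
```

Note that interior whitespace is never removed; only the ends of the string are trimmed.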

String Exercise

Exercise: Input a string from the user. Count and print out the number of lower case, upper case, and non-letters.

In file CountCases.py:

def countCases(txt):
    """For a text, count and return the number of lowercase,
    uppercase, and non-letter characters."""
    lowers = 0
    uppers = 0
    nonletters = 0
    # For each character in the text, see if lower, upper,
    # or non-letter and increment the count.
    for ch in txt:
        if ch.islower():
            lowers += 1
        elif ch.isupper():
            uppers += 1
        else:
            nonletters += 1
    # Return a triple of the counts.
    return lowers, uppers, nonletters


Calling countCases

def main():
    txt = input("Please enter a text: ")
    lc, uc, nl = countCases(txt)
    print("Contains:")
    print("  Lower case letters:", lc)
    print("  Upper case letters:", uc)
    print("  Non-letters:", nl)

main()

Here's a sample run:

> python CountCases.py
Please enter a text: abcXYZ784*&^def
Contains:
  Lower case letters: 6
  Upper case letters: 3
  Non-letters: 6
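
For comparison, the same counts can be computed more compactly. This is a sketch of one alternative, not the version from the slides: count_cases is a hypothetical name, and it treats every character that is neither lowercase nor uppercase as a non-letter, just as countCases does.

```python
def count_cases(txt):
    """Hypothetical variant of countCases built on sum()."""
    lowers = sum(1 for ch in txt if ch.islower())
    uppers = sum(1 for ch in txt if ch.isupper())
    nonletters = len(txt) - lowers - uppers   # everything that is left over
    return lowers, uppers, nonletters

print(count_cases("abcXYZ784*&^def"))   # (6, 3, 6)
```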
