David Wang, Computing Science and Information Technology: Info 1211 - Operating System's Principles and Applications

The document provides an overview of regular expressions, including their history, features, and usage in text processing and programming. It explains the mechanics of regular expressions, including various metacharacters, character classes, and practical examples. Additionally, it covers the differences between basic and extended regular expressions, highlighting their applications in tools like grep and egrep.


LECTURE 4:

INFO 1211 – OPERATING SYSTEM’S PRINCIPLES AND APPLICATIONS

DAVID WANG
COMPUTING SCIENCE AND INFORMATION TECHNOLOGY
OUTLINE
• Regular Expression
• grep Command
• sed Command
REGULAR EXPRESSION
• A regular expression (regex or regexp for short) is a
sequence of characters that defines a search pattern
• Similar to (but different from) shell wildcards
• More precise
• More complicated
• Terms
• Character
• basic unit of text
• letter, number, punctuation, space, etc.
• String
• a sequence of zero or more characters
HISTORY
• Regular Set
• The original form of the regular expression
• Introduced by mathematician Stephen Cole Kleene in 1956
• described regular languages using his mathematical notation
• Applied to theoretical computer science
• automata theory (models of computation)
• the description and classification of formal languages.
• SNOBOL language
• Other early implementations of pattern matching
• did not use regular expressions, but instead its own syntax
• Regular expressions became popular from 1968 in two uses:
• Pattern matching in a text editor
• I study at KPU, I work at KPU, I found my friends at KPU…
• I xxx at KPU (Pattern matching)
• Lexical analysis in a compiler
• C, Java …
HISTORY (CONT.)
• First appearances of regular expressions in program form
• Ken Thompson's pattern matching in the QED editor
• For speed, Thompson implemented regular expression
matching by just-in-time compilation (JIT) to IBM 7094 code
on the Compatible Time-Sharing System, an important early
example of JIT compilation.
• Many variations appeared in Unix programs at Bell Labs in the 1970s
• including vi, lex, sed, AWK and expr, and in other programs such as emacs
• standardized in POSIX.2 in 1992
• Today regular expressions are widely supported in
• programming languages (validate an email address input)
• text processing programs (e.g., lexers for lexical analysis)
• advanced text editors (search, replace)
• etc.
PATTERN
The pattern is itself an expression: a statement in a language
designed specifically to describe target text as concisely and
flexibly as possible, so that the processing of general text files,
specific textual forms, or arbitrary input strings can be automated.
• an expression to specify a set of strings required for a
particular purpose
• a simple way to specify a finite set of strings is to list its
elements or members
WORKING MECHANISM
• A regular expression processor typically translates a regular
expression into a nondeterministic finite automaton (NFA),
which is then used to recognize substrings that match the regular expression
• The picture shows an NFA that accepts any
binary string that contains at least one 00 or 11 as a
substring
WORKING MECHANISM
• An NFA that accepts all binary strings that end with 101.
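The idea behind these pictures, tracking every state the NFA could be in at once, can be sketched in a few lines of Python. This is an illustrative sketch only, not how grep is implemented: the state names q0 to q3 and the function nfa_accepts are hypothetical, and Python's re module, used here just as a cross-check, is a backtracking matcher rather than an NFA simulator.

```python
import re

# Hypothetical NFA for "binary strings that end with 101".
# q0 = start, q1 = read the 1 that may begin the final "101",
# q2 = read "10", q3 = read "101" (accepting).
TRANSITIONS = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},  # nondeterministic: keep scanning, or guess "101" starts here
    ("q1", "0"): {"q2"},
    ("q2", "1"): {"q3"},
}
START, ACCEPTING = "q0", {"q3"}


def nfa_accepts(text: str) -> bool:
    """Simulate the NFA by tracking the set of all states it could be in."""
    current = {START}
    for symbol in text:
        current = set().union(*(TRANSITIONS.get((s, symbol), set()) for s in current))
        if not current:  # no live states left, so the input is rejected
            return False
    return bool(current & ACCEPTING)


for s in ["101", "0101", "1101", "110", "10", ""]:
    # Second column: the NFA; third column: an equivalent regular expression.
    print(repr(s), nfa_accepts(s), bool(re.fullmatch(r"[01]*101", s)))
```

Keeping a set of states, instead of backtracking through one path at a time, is what makes each input symbol cost a bounded amount of work.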
REGULAR EXPRESSIONS FEATURES
• Regular expressions are interpreted by the command and
not by the shell
• Quoting ensures that the shell cannot interfere and
interpret the metacharacters in its own way.
• Some of the characters used by regular expressions are
also meaningful to the shell, which is reason enough to
quote these expressions.
• Categories of regular expressions
• Basic Regular Expressions (BREs)
• Extended Regular Expressions (EREs)
• Perl Regular Expressions (PREs)
• Python Regular Expressions
BASIC REGULAR EXPRESSIONS
• Oldest regular expression flavor still in use today
• POSIX BREs standardize a flavor similar to the one used by the
traditional UNIX grep command
• Some metacharacters, such as ( ) and { }, need a backslash to
take on their special meaning: \( \) and \{ \}
• Using a backslash to escape a character that is never a
metacharacter is an error
• Supports POSIX bracket expressions (classes such as [:digit:],
written inside brackets: [[:digit:]])
POSIX BRACKET EXPRESSIONS
POSIX       Description                                            ASCII             Java
[:alnum:]   Alphanumeric characters                                [a-zA-Z0-9]       \p{Alnum}
[:alpha:]   Alphabetic characters                                  [a-zA-Z]          \p{Alpha}
[:ascii:]   ASCII characters                                       [\x00-\x7F]       \p{ASCII}
[:blank:]   Space and tab                                          [ \t]             \p{Blank}
[:cntrl:]   Control characters                                     [\x00-\x1F\x7F]   \p{Cntrl}
[:digit:]   Digits                                                 [0-9]             \p{Digit}
[:graph:]   Visible characters (no spaces or control characters)   [\x21-\x7E]       \p{Graph}
POSIX BRACKET EXPRESSIONS (CONT.)
POSIX       Description                                            ASCII             Java
[:lower:]   Lowercase letters                                      [a-z]             \p{Lower}
[:print:]   Visible characters and spaces (no control characters)  [\x20-\x7E]       \p{Print}
[:punct:]   Punctuation and symbols                                [!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~]  \p{Punct}
[:space:]   All whitespace characters, including line breaks       [ \t\r\n\v\f]     \p{Space}
[:upper:]   Uppercase letters                                      [A-Z]             \p{Upper}
[:word:]    Word characters (letters, numbers and underscores)     [A-Za-z0-9_]
[:xdigit:]  Hexadecimal digits                                     [A-Fa-f0-9]       \p{XDigit}
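Python's re module does not understand POSIX class names, so a pattern such as [[:digit:]] has to be translated before it can be tried there. The following is a minimal sketch, assuming ASCII input and only the classes in the hypothetical POSIX_TO_ASCII dictionary (copied from the ASCII column of the table); an unknown class name raises KeyError.

```python
import re

# ASCII equivalents taken from the ASCII column of the table (a subset).
POSIX_TO_ASCII = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "blank": r" \t",
    "digit": "0-9",
    "lower": "a-z",
    "space": r" \t\r\n\v\f",
    "upper": "A-Z",
    "xdigit": "A-Fa-f0-9",
}


def posix_to_python(pattern: str) -> str:
    """Hypothetical helper: replace each [:name:] inside a bracket expression
    with its ASCII range, e.g. [[:digit:]] -> [0-9], [^[:alpha:]] -> [^a-zA-Z]."""
    return re.sub(r"\[:([a-z]+):\]", lambda m: POSIX_TO_ASCII[m.group(1)], pattern)


print(posix_to_python("[[:digit:]]+"))               # [0-9]+
print(posix_to_python("^[[:upper:]][[:lower:]]*$"))  # ^[A-Z][a-z]*$
print(re.findall(posix_to_python("[[:xdigit:]]+"), "FF 10 zz"))  # ['FF', '10']
```

The outer brackets stay in place because the class name only ever appears inside a bracket expression, which is also why grep requires [[:digit:]] rather than [:digit:].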
THE CHARACTER CLASS
• Single Character Matching
• Specify a group of characters enclosed within a pair of
square brackets [ ]; the class matches one character from the group
• Example
• [od] matches either o or d
• [od][de] matches four patterns
• od
• oe
• dd
• de
• To match woodhouse and wodehouse
• wo[od][de]house
NEGATING A CLASS
• Use a caret ^ as the first character inside the brackets to negate the character class
• Equivalent to ! in a shell wildcard class such as [!a-z]
• Example
• [^a-zA-Z] matches a single non-alphabetic character
• [^0-9] matches a single non-numeric character
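The class and negation examples can be tried with Python's re module, whose bracket syntax agrees with POSIX for simple classes like these. In this sketch, re.search looks for the pattern anywhere in a string, which approximates how grep tests each line, while re.fullmatch requires the whole string to match; the sample words are made up for illustration.

```python
import re

# re.search: the pattern may occur anywhere in the string (like grep on a line).
names = ["woodhouse", "wodehouse", "wood house", "woohouse"]
print([n for n in names if re.search(r"wo[od][de]house", n)])
# ['woodhouse', 'wodehouse']

# [od][de] on its own matches exactly four two-character strings.
pairs = ["od", "oe", "dd", "de", "oo", "ee"]
print([p for p in pairs if re.fullmatch(r"[od][de]", p)])
# ['od', 'oe', 'dd', 'de']

# Negated classes: every single character that is NOT a digit / NOT a letter.
print(re.findall(r"[^0-9]", "a1-b2"))     # ['a', '-', 'b']
print(re.findall(r"[^a-zA-Z]", "a1-b2"))  # ['1', '-', '2']
```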
*
• Use * to match the preceding pattern element zero or
more times
• applies only to the immediately preceding element
• Has nothing in common with the * wildcard of the shell
• Example
• e* matches the empty string, e, ee, eee, eeee, ...
• s*printf matches printf, sprintf, ssprintf, sssprintf, ...
• to match trueman and truman
• true*man
• to match wilcox and wilcocks
• wilco[cx]k*s*
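The * examples can be checked with a small Python sketch. The helper matches is hypothetical; it uses re.fullmatch so that each word must match completely, roughly like grep -x applied to one-word lines.

```python
import re


def matches(pattern, words):
    """Hypothetical helper: words the pattern matches in full (like grep -x)."""
    return [w for w in words if re.fullmatch(pattern, w)]


print(matches(r"e*", ["", "e", "ee", "eee", "f"]))                         # ['', 'e', 'ee', 'eee']
print(matches(r"s*printf", ["printf", "sprintf", "ssprintf", "print"]))    # ['printf', 'sprintf', 'ssprintf']
print(matches(r"true*man", ["truman", "trueman", "trueeman", "tueman"]))   # ['truman', 'trueman', 'trueeman']
print(matches(r"wilco[cx]k*s*", ["wilcox", "wilcocks", "wilcoks"]))        # ['wilcox', 'wilcocks']
```

Note that true*man also accepts trueeman: the * applies only to the single e before it, so any number of e's is allowed.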
.
• Use . to match any single character except a newline
• Equivalent to ? in a shell wildcard
• Within square brackets the dot is literal
• Can be escaped with \ to match a literal dot
• Example
• 2...
• matches a four-character pattern beginning with a 2
• chap..
• matches a six-character pattern beginning with chap
• \.[co]
• matches .c or .o
• [.][co]
• also matches .c or .o
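A short Python sketch makes the difference between an unescaped and a literal dot visible; the sample strings are illustrative only.

```python
import re

print(bool(re.fullmatch(r"2...", "2024")), bool(re.fullmatch(r"2...", "202")))         # True False
print(bool(re.fullmatch(r"chap..", "chap01")), bool(re.fullmatch(r"chap..", "chap1")))  # True False

# An unescaped dot matches any character, so it also finds "do" in "doc.c".
print(re.findall(r".[co]", "doc.c"))    # ['do', '.c']
# Escaping the dot, or putting it inside brackets, makes it literal.
print(re.findall(r"\.[co]", "doc.c"))   # ['.c']
print(re.findall(r"[.][co]", "doc.c"))  # ['.c']
```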
.*
• Use .* to signify any number of characters, including none
• similar to * in a shell wildcard
• Example
• p.*woodhouse matches each of
• p.j. woodhouse
• p. woodhouse
• p.j.woodhouse
• Matching is greedy: the longest possible string is
matched
• 03.*05 matches from the first 03 to the last 05 on
the line
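Greedy matching can be observed directly in Python, whose .* is greedy in the same way; the sample line below is made up for illustration.

```python
import re

names = ["p.j. woodhouse", "p. woodhouse", "p.j.woodhouse", "woodhouse"]
print([n for n in names if re.search(r"p.*woodhouse", n)])
# ['p.j. woodhouse', 'p. woodhouse', 'p.j.woodhouse']

# .* is greedy: the match runs from the first 03 to the LAST 05 on the line.
line = "03 start 05 middle 03 again 05 end"
print(re.search(r"03.*05", line).group())  # 03 start 05 middle 03 again 05
```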
^ AND $
• Most of the regular expression characters are used for
matching patterns
• Use ^ and $ to specify pattern locations
• ^ matches pattern at the beginning of a line
• $ matches pattern at the end of a line
• Example
• bash$
• bash at the end of line
• ^bash
• bash at the beginning of the line
• Find lines whose only content is bash
• ^bash$
• Find blank lines
• ^$
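The anchor examples can be simulated on a list of lines. The helper grep_lines is hypothetical; it keeps every line in which re.search finds the pattern, which is a rough stand-in for grep reading a file.

```python
import re

lines = ["bash", "bash shell", "my bash", "", "zsh"]


def grep_lines(pattern, lines):
    """Hypothetical helper: keep the lines containing a match, like grep."""
    return [line for line in lines if re.search(pattern, line)]


print(grep_lines(r"bash$", lines))   # ['bash', 'my bash']
print(grep_lines(r"^bash", lines))   # ['bash', 'bash shell']
print(grep_lines(r"^bash$", lines))  # ['bash']
print(grep_lines(r"^$", lines))      # ['']
```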
EXAMPLE
2365 :john woodcook :director :personnel :05/11/47 :120000
5678 :robert dylan :d.g.m :marketing 04/19/43 :85000
9876 :bill Johnson :director :production :03/12/50 :130000
2233 :charles harris :g.m. :sales :12/12/52 :90000
5423 :barry wood :chairman :admin :08/30/56 :160000

• Find lines beginning with 2
• ^2
• Find lines ending with a value from 80000 to 99999
• [89]....$
• Find lines that do not begin with 2
• ^[^2]
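Applying the three patterns to the sample records above gives a quick check of which lines each one selects. This Python sketch copies the five records verbatim; the helper ids is hypothetical and returns the first field (the ID) of every matching line.

```python
import re

emp = [
    "2365 :john woodcook :director :personnel :05/11/47 :120000",
    "5678 :robert dylan :d.g.m :marketing 04/19/43 :85000",
    "9876 :bill Johnson :director :production :03/12/50 :130000",
    "2233 :charles harris :g.m. :sales :12/12/52 :90000",
    "5423 :barry wood :chairman :admin :08/30/56 :160000",
]


def ids(pattern):
    """Hypothetical helper: IDs (first 4 characters) of the lines containing a match."""
    return [line[:4] for line in emp if re.search(pattern, line)]


print(ids(r"^2"))         # ['2365', '2233']
print(ids(r"[89]....$"))  # ['5678', '2233']
print(ids(r"^[^2]"))      # ['5678', '9876', '5423']
```

[89]....$ works because the character five positions from the end of the line must be 8 or 9, which is only true for the five-digit salaries 85000 and 90000.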
TRIPLE ROLES OF ^
• Beginning of a character class
• Negates every character of the class
• [^a-z]
• Beginning of the expression
• Pattern matched at the beginning of the line
• ^2
• Other locations
• Matches itself
• Discussion
• ^^^ and ^[^^]
ESCAPING
• Special characters sometimes need to be matched as literal text
• . and * lose their special meanings when placed inside a character class
• [.] matches .
• s[*] matches s*
• In a BRE, * is also matched literally if it is the first character
• *sta[re] matches *star and *stae
• Otherwise, use \ to escape a metacharacter
• g\* matches g*
• \[ matches [
• \.\* matches .*
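The escaping rules can be tried in Python, with one caveat shown at the end: Python's re is stricter than a BRE and rejects a leading *, so the portable habit is to escape it. This is a sketch of Python behaviour, not of grep's.

```python
import re

# Inside brackets, . and * are literal.
print(bool(re.fullmatch(r"[.]", ".")), bool(re.fullmatch(r"s[*]", "s*")))  # True True
# Outside brackets, a backslash makes them literal.
print(bool(re.fullmatch(r"g\*", "g*")), bool(re.fullmatch(r"\.\*", ".*")))  # True True
# re.escape adds the backslashes for you.
print(re.escape("*star.c"))  # \*star\.c

# Unlike a BRE, Python's re treats a leading * as an error, not a literal.
try:
    re.compile(r"*sta[re]")
except re.error as err:
    print("re.error:", err)
print(bool(re.fullmatch(r"\*sta[re]", "*star")))  # True
```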
PRACTICE
• "Handel", "Hundel", and "Haendel"
• can be specified by the pattern H[ua]e*ndel
• .at
• matches any three-character string ending with "at",
including "hat", "cat", and "bat".
• [hc]at
• matches "hat" and "cat".
• [^b]at
• matches all strings matched by .at except "bat".
• [^hc]at
• matches all strings matched by .at other than "hat" and
"cat".
PRACTICE
• ^[hc]at
• matches "hat" and "cat", but only at the beginning of the
string or line.
• [hc]at$
• matches "hat" and "cat", but only at the end of the string or
line.
• \[.\]
• matches any single character surrounded by "[" and "]"
since the brackets are escaped, for example: "[a]" and "[b]".
• s.*
• matches s followed by zero or more characters, for
example: "s" and "saw" and "seed".
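The practice patterns can be checked in Python. The helper full is hypothetical and uses re.fullmatch so each word must match completely; the last two lines use re.search to show the effect of the anchor and of the escaped brackets.

```python
import re


def full(pattern, words):
    """Hypothetical helper: words matched in full by the pattern."""
    return [w for w in words if re.fullmatch(pattern, w)]


print(full(r"H[ua]e*ndel", ["Handel", "Hundel", "Haendel", "Hendel"]))  # ['Handel', 'Hundel', 'Haendel']
words = ["hat", "cat", "bat", "mat"]
print(full(r".at", words))      # ['hat', 'cat', 'bat', 'mat']
print(full(r"[hc]at", words))   # ['hat', 'cat']
print(full(r"[^b]at", words))   # ['hat', 'cat', 'mat']
print(full(r"[^hc]at", words))  # ['bat', 'mat']

print(re.findall(r"\[.\]", "x[a] y[b] z[cd]"))  # ['[a]', '[b]']
print(bool(re.search(r"^[hc]at", "cat food")), bool(re.search(r"^[hc]at", "a cat")))  # True False
```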
CASE STUDY
• To find hi in a paragraph
• Regex: hi
• match all hi in the paragraph
• Problem
• Also matches
• him
• history
• high
• Solution
• \bhi\b
\b
• Matches a zero-width boundary between a word-class
character and either a non-word class character or an
edge
• In general, the delimiters between words are space,
punctuation, newline, etc. However, \b does not match
any of them; it only matches a position.
• Example
• er\b matches never, but doesn’t match verb
\B
• Matches a zero-width position that is not a word
boundary, e.g. between two word characters
• Example
• er\B matches verb, but doesn’t match never
\w
• Matches a word character: a letter, digit, or underscore "_"
• same as [A-Za-z0-9_] in ASCII

\W
• Matches any character that is not a word character (so "_" is not matched)
• same as [^A-Za-z0-9_] in ASCII
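The case study and the \b, \B, \w and \W examples behave the same way in Python's re module. One assumption to note: for non-ASCII text Python's \w is Unicode-aware unless the re.ASCII flag is given, so this sketch sticks to ASCII input.

```python
import re

text = "hi him, say hi. history is high"
print(re.findall(r"hi", text))      # five hits, including inside him, history and high
print(re.findall(r"\bhi\b", text))  # ['hi', 'hi']

print(bool(re.search(r"er\b", "never")), bool(re.search(r"er\b", "verb")))  # True False
print(bool(re.search(r"er\B", "verb")), bool(re.search(r"er\B", "never")))  # True False

print(re.findall(r"\w", "a_1!"), re.findall(r"\W", "a_1!"))  # ['a', '_', '1'] ['!']
```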
BRE SUMMARY
Pattern   Matches
*         zero or more occurrences of the previous character
.         any single character
[pqr]     a single character p, q, or r
[c1-c2]   a single character in the ASCII range from c1 to c2
[^pqr]    a single character other than p, q, or r
^pat      pattern pat at the beginning of a line
pat$      pattern pat at the end of a line
\b        a zero-width word boundary (a position, not a character)
\B        the negated version of \b
\w        a word character: letter, digit, or "_"
\W        any character other than a word character
Note: \b, \B, \w and \W are GNU extensions rather than POSIX features.
PRACTICE
STRING1 Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)
STRING2 Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)

Pattern    String   Result    Notes
in[du]     STRING1  match     Finds ind in Windows.
in[du]     STRING2  match     Finds inu in Linux.
x[0-9A-Z]  STRING1  no match  The tests are case sensitive: to find the xt in DigExt we would need [0-9a-z] or [0-9A-Zt]. The same format can test for upper and lower case, e.g. [Ff] matches lower- or upper-case F.
x[0-9A-Z]  STRING2  match     Finds x2 in Linux2.
[^A-M]in   STRING1  match     Finds Win in Windows.
[^A-M]in   STRING2  no match  The range A to M is excluded, so Linux is not found, but linux (if it were present) would be found.
PRACTICE
STRING1 Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)
STRING2 Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)

Pattern    String   Result    Notes
m          STRING1  match     Finds the m in compatible.
m          STRING2  no match  There is no lower-case m in this string. Searches are case sensitive unless you take special action.
a/4        STRING1  match     Found in Mozilla/4.0; any combination of characters can be used for the match.
a/4        STRING2  match     Found in the same place as in STRING1.
5 \[       STRING1  no match  The search looks for the pattern '5 [', which does NOT exist in STRING1. Spaces are valid in searches.
5 \[       STRING2  match     Found in Mozilla/4.75 [en]. Note: the \ (backslash) is an escape character and must be present because [ is a metacharacter (it opens a character class).
in         STRING1  match     Found in Windows.
in         STRING2  match     Found in Linux.
le         STRING1  match     Found in compatible.
le         STRING2  no match  There is an l and an e in this string, but they are not adjacent.
PRACTICE
STRING1 Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)
STRING2 Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)

Pattern    String   Result    Notes
[a-z]\)$   STRING1  match     Finds t) in DigExt). Note: the \ is an escape character required to treat ) as a literal (as in an ERE; in a BRE a plain ) is already literal and \) closes a group).
[a-z]\)$   STRING2  no match  There is a numeric value at the end of this string; we would need [0-9a-z]\)$ to find it.
.in        STRING1  match     Finds Win in Windows.
.in        STRING2  match     Finds Lin in Linux.
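All three practice tables can be re-run mechanically. This Python sketch encodes each row as (pattern, expected result for STRING1, expected result for STRING2) and asserts the table's verdicts; re.search is used because the table reports a match anywhere in the string. Python's syntax agrees with the table for these patterns, including \) as a literal parenthesis.

```python
import re

STRING1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)"
STRING2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)"

# (pattern, expected match in STRING1, expected match in STRING2)
rows = [
    (r"in[du]", True, True),
    (r"x[0-9A-Z]", False, True),
    (r"[^A-M]in", True, False),
    (r"m", True, False),
    (r"a/4", True, True),
    (r"5 \[", False, True),
    (r"in", True, True),
    (r"le", True, False),
    (r"[a-z]\)$", True, False),
    (r".in", True, True),
]

for pattern, exp1, exp2 in rows:
    m1, m2 = re.search(pattern, STRING1), re.search(pattern, STRING2)
    assert bool(m1) == exp1 and bool(m2) == exp2, pattern
    print(f"{pattern:12} {m1.group() if m1 else 'no match':10} {m2.group() if m2 else 'no match'}")
```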
EXTENDED REGULAR EXPRESSIONS
• "Extended" is relative to the original UNIX grep
• grep only had bracket, dot, caret, dollar and star
• POSIX EREs standardize a flavor similar to the one used by the UNIX
egrep command
• egrep did not maintain compatibility with grep
• a backslash suppresses the special meaning of a
metacharacter
• Adds ?, + and |, and removes the need to escape the
metacharacters ( ) and { }, which must be escaped in BRE.
+ AND ?
• Often used in place of * to restrict the matching scope
• + matches one or more occurrences of the previous character
• Example
• b+ matches b, bb, bbb, bbbb, ...
• does not match the empty string
• #include +<stdio.h> matches #include followed by one or more
spaces and <stdio.h>, e.g. "#include <stdio.h>" and "#include   <stdio.h>"
• ? matches zero or one occurrence of the previous character
• Example
• b? matches the empty string or b
• does not match bb, bbb, ...
• true?man matches trueman and truman
• Typical usage: # ?include +<stdio.h> (also matches "# include <stdio.h>")
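Python's + and ? have the same meaning as in an ERE, so the examples can be checked with a short sketch. The helper full is hypothetical; the include pattern escapes the dot so that it only matches a literal ".", a small tightening of the slide's version.

```python
import re


def full(pattern, s):
    """Hypothetical helper: does the pattern match the whole string?"""
    return bool(re.fullmatch(pattern, s))


print([full(r"b+", s) for s in ["", "b", "bbb"]])                          # [False, True, True]
print([full(r"b?", s) for s in ["", "b", "bb"]])                           # [True, True, False]
print([full(r"true?man", s) for s in ["truman", "trueman", "trueeman"]])  # [True, True, False]

include = r"# ?include +<stdio\.h>"
for line in ["#include <stdio.h>", "# include   <stdio.h>", "#include<stdio.h>"]:
    print(repr(line), full(include, line))
# The last line has no space before <, so " +" (one or more spaces) fails.
```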
| AND ()
• Use | to separate alternative patterns
• Example
• woodhouse|woodcock
• matches woodhouse or woodcock
• Use ( ) to group patterns
• combined with |, lets the common part of the alternatives be written once
• Example
• wood(house|cock)
• matches woodhouse or woodcock
• gr(a|e)y
• matches gray and grey
• @(samp|code)\{[^}]+\}
• matches @code{foo} and @samp{bar}
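Alternation and grouping work the same way in Python, which makes the examples easy to try. In the last line, finditer with group(0) is used because findall would return only the captured (samp|code) group rather than the whole match; the Texinfo-style sample text is made up.

```python
import re

for word in ["woodhouse", "woodcock", "woodpecker"]:
    print(word,
          bool(re.fullmatch(r"woodhouse|woodcock", word)),
          bool(re.fullmatch(r"wood(house|cock)", word)))  # both patterns agree

print([w for w in ["gray", "grey", "groy"] if re.fullmatch(r"gr(a|e)y", w)])  # ['gray', 'grey']

texinfo = "Use @code{foo} or @samp{bar}, not @var{baz}."
print([m.group(0) for m in re.finditer(r"@(samp|code)\{[^}]+\}", texinfo)])  # ['@code{foo}', '@samp{bar}']
```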
( ) IN BRE
• ( ) can also be used in basic regular expressions
• Requires \( \)
PRACTICE
• wilco[cx]k*s*|wood(house|cock)
• woodcock
• woodhouse
• wilcocwoodcock
• wicowood
• wilcocx
• woodcoxk
• wilcocxks
• Woodcoxks
• wilcocks
• wilcox
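To check answers to the exercise above, this Python sketch tests every candidate two ways: re.search, where the pattern may match anywhere in the line (what grep does by default), and re.fullmatch, where it must match the whole line (like grep -x). The two answer sets differ, which is worth noticing.

```python
import re

pattern = r"wilco[cx]k*s*|wood(house|cock)"
candidates = ["woodcock", "woodhouse", "wilcocwoodcock", "wicowood", "wilcocx",
              "woodcoxk", "wilcocxks", "Woodcoxks", "wilcocks", "wilcox"]

# grep semantics: the pattern may match anywhere in the line.
print([s for s in candidates if re.search(pattern, s)])
# ['woodcock', 'woodhouse', 'wilcocwoodcock', 'wilcocx', 'wilcocxks', 'wilcocks', 'wilcox']

# grep -x semantics: the pattern must match the whole line.
print([s for s in candidates if re.fullmatch(pattern, s)])
# ['woodcock', 'woodhouse', 'wilcocks', 'wilcox']
```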
PRACTICE
Are the following statements correct?
• a+ = aa*
• a? = (a|ε)
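One way to build intuition for these equivalences is a brute-force check: compare two patterns on every string over a small alphabet up to some length. The sketch below (the helper same_language is hypothetical) does that in Python, writing ε as an empty alternative (a|). Agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product


def same_language(p, q, alphabet="ab", max_len=6):
    """Hypothetical helper: do p and q fully match exactly the same strings up to max_len?"""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(p, s)) != bool(re.fullmatch(q, s)):
                return False
    return True


print(same_language(r"a+", r"aa*"))   # True
print(same_language(r"a?", r"(a|)"))  # True: the empty alternative plays the role of ε
print(same_language(r"a+", r"a*"))    # False: a* also matches the empty string
```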
REPETITION TIMES
• Use { } to specify how many times the preceding element must match
• {m} exactly m times
• {m,n} at least m and at most n times
• {m,} at least m times
• Example
• ca{1,3}ndy matches candy, caandy, caaandy, not caaaandy
• a{2} matches caandy and caaandy, but not candy
• a{1,3} matches candy, caandy, caaandy, caaaandy
• a{2,} matches caandy, caaandy, caaaandy
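The repetition examples can be checked with re.search, which, like grep, looks for a match anywhere in the word. Under that substring interpretation, a{2} also finds "aa" inside caaaandy; ca{1,3}ndy rejects it because c must be followed directly by one to three a's and then ndy.

```python
import re

words = ["candy", "caandy", "caaandy", "caaaandy"]
print([w for w in words if re.search(r"ca{1,3}ndy", w)])  # ['candy', 'caandy', 'caaandy']
print([w for w in words if re.search(r"a{2}", w)])        # ['caandy', 'caaandy', 'caaaandy']
print([w for w in words if re.search(r"a{1,3}", w)])      # all four words
print([w for w in words if re.search(r"a{2,}", w)])       # ['caandy', 'caaandy', 'caaaandy']
```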
{ } IN BRE
• { } can also be used in basic regular expressions
• Requires \{ \}
ERE SUMMARY
Pattern  Matches
+        one or more occurrences of the previous character
?        zero or one occurrence of the previous character
|        the alternation operator (logical OR), separating alternative patterns
(...)    grouping in regular expressions, as in arithmetic
{m}      exactly m occurrences
{m,n}    at least m and at most n occurrences
{m,}     at least m occurrences
EXAMPLE
• [hc]+at
• matches "hat", "cat", "hhat", "chat", "hcat", "cchchat", and
so on, but not "at".
• [hc]?at
• matches "hat", "cat", and "at".
• [hc]*at
• matches "hat", "cat", "hhat", "chat", "hcat", "cchchat", "at",
and so on.
• cat|dog
• matches "cat" or "dog".
PRACTICE
STRING1 Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)
STRING2 Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)

Pattern                  String   Result    Notes
\(.*l                    STRING1  match     Finds the ( and l in (compatible. The opening \ is an escape character indicating that the ( it precedes is a literal (search character), not a metacharacter.
\(.*l                    STRING2  no match  Mozilla contains ll, but it is not preceded by an open parenthesis, and Linux has an upper-case L.
W*in                     STRING1  match     Finds the Win in Windows.
W*in                     STRING2  match     Finds in in Linux, preceded by W zero times, so a match.
[xX][0-9a-z]{2}          STRING1  no match  Finds x in DigExt, but it is followed by only one matching character (t).
[xX][0-9a-z]{2}          STRING2  match     Finds X and 11 in X11.
PRACTICE
STRING1 Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)
STRING2 Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)

Pattern                  String   Result    Notes
^([L-Z]in)               STRING1  no match  The ^ is an anchor (because it lies outside any square brackets) indicating the first position. Win does not start the string, so no match.
^([L-Z]in)               STRING2  no match  The ^ is an anchor (because it lies outside any square brackets) indicating the first position. Linux does not start the string, so no match.
((4\.[0-3])|(2\.[0-3]))  STRING1  match     Finds the 4.0 in Mozilla/4.0. The \. sequence uses the escape metacharacter (\) to ensure that the . (dot) is used as a literal in the search.
((4\.[0-3])|(2\.[0-3]))  STRING2  match     Finds the 2.2 in Linux2.2.16-22.
(W|L)in                  STRING1  match     Finds Win in Windows.
(W|L)in                  STRING2  match     Finds Lin in Linux.
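As with the BRE practice, the ERE rows can be re-run in Python, whose syntax matches ERE for these patterns. Each row is (pattern, expected result for STRING1, expected result for STRING2); the loop asserts every verdict in the two tables and prints what was found.

```python
import re

STRING1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)"
STRING2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)"

# (pattern, expected match in STRING1, expected match in STRING2)
rows = [
    (r"\(.*l", True, False),
    (r"W*in", True, True),
    (r"[xX][0-9a-z]{2}", False, True),
    (r"^([L-Z]in)", False, False),
    (r"((4\.[0-3])|(2\.[0-3]))", True, True),
    (r"(W|L)in", True, True),
]

for pattern, exp1, exp2 in rows:
    m1, m2 = re.search(pattern, STRING1), re.search(pattern, STRING2)
    assert bool(m1) == exp1 and bool(m2) == exp2, pattern
    print(f"{pattern:26} {m1.group() if m1 else 'no match':22} {m2.group() if m2 else 'no match'}")
```

Note that \(.*l returns "(compatibl": the greedy .* stretches to the last l after the parenthesis.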
MORE REGULAR EXPRESSIONS
• https://en.wikipedia.org/wiki/Regular_expression
• http://www.regular-expressions.info/
• https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions
GREP: SEARCHING FOR A PATTERN
• Unix has a special family of commands for handling
search requirements
• grep scans its input for a pattern and displays:
• the lines containing the pattern
• optionally with their line numbers
• or only the names of the files where the pattern occurs
• Format
• $ grep options pattern filenames
• Example
• $ grep "sales" emp.lst
FILTER PROGRAMS
• grep and sed are filter programs
• By default they do not change a file "in place"
• They produce a new data stream
• to be piped into another command, or
• captured into a new file with output redirection
• grep patterns and sed scripts are interpreted at run time,
not compiled into a standalone program
• hence they can be slower than a compiled program
• but they are fast in terms of programming time
COMMAND FEATURES
Commands              Standard Input  Standard Output
mkdir, rmdir, cp, rm  No              No
ls, pwd, who          No              Yes
lp, lpr               Yes             No
cat, wc, gzip         Yes             Yes

• Commands in the fourth category are called filters
• dual stream-handling feature
• makes them powerful text manipulators
• Flexible usage
• $ wc < calc.txt > result.txt
• $ wc > result.txt < calc.txt
• $ wc>result.txt<calc.txt
• $ > result.txt < calc.txt wc
GREP AS A FILTER
• Because grep is also a filter, it is able to
• search standard input for the pattern
• store the output in a file
• Example
• $ who | grep henry > foo
• $ grep henry < namelist.txt > foo
SUPPRESS THE FILENAME
• When grep is used with multiple filenames, it displays the
filenames along with the output.
• $ grep 'director' emp1.lst emp2.lst
emp1.lst:1006:gordon lightfoot:director:sales:09/03/38:140000
emp1.lst:6521:derryk o'brien:director:marketing:09/26/45:125000
emp2.lst:9876:bill johnson:director:production:03/12/05:130000
emp2.lst:2365:john woodcook:director:personnel:05/11/47:120000
• To suppress the filenames:
• make grep ignorant of the source of its input
• $ cat emp[12].lst | grep 'director'
• $ grep 'director' < emp1.lst
• or pipe grep's output to cut to drop the first (filename) field
• $ grep 'director' emp1.lst emp2.lst | cut -d: -f2-
QUOTING IN GREP
• Do we need quoting in grep?
• $ grep "sales" emp.lst
• $ who | grep henry > foo
• $ grep 'director' emp1.lst emp2.lst
• Quoting is essential if the search string contains
• more than one word (a space in the search pattern)
• any of the shell's metacharacters
• Example
• $ grep gordon lightfoot emp1.lst
• grep: lightfoot: No such file or directory
• emp1.lst:1006:gordon lightfoot:director:sales:09/03/38:140000
• $ grep 'gordon lightfoot' emp1.lst
• 1006:gordon lightfoot:director:sales:09/03/38:140000
SINGLE OR DOUBLE QUOTE
• Principle
• single quotes protect double quotes
• double quotes protect single quotes
• double quotes allow command substitution and variable
evaluation
• Example
• $ grep 'neil o'bryan' emp1.lst
• > (the quotes are unbalanced, so the shell waits for more input)
• $ grep "neil o'bryan" emp1.lst
• 4290:neil o'bryan:executive:production:09/07/50:65000
• $ grep 'Ted "TK" Kim' emp1.lst
• 2210:Ted "TK" Kim:executive:research:03/01/60:135000
• $ grep "`echo name`" emp1.lst
• $ grep "$USERNAME" emp1.lst
WHEN GREP FAILS
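• grep simply returns the prompt when the pattern can't be
located
• $ grep president emp.lst
• $
• Like cmp, grep reports the outcome through its exit status
• 0: the pattern was found
• 1: the pattern was not found (the command itself still ran successfully)
• 2: an error occurred, e.g. an input file could not be read
• $ echo $? displays the exit status of the last command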
GREP OPTIONS
Options  Significance
-i       Ignores case when matching
-v       Displays lines that do not match the expression
-n       Displays line numbers along with lines
-c       Displays a count of the matching lines
-l       Displays only the names of files that contain a match
-e exp   Specifies expression exp; can be used multiple times; also used to match an expression beginning with a hyphen
GREP OPTIONS (CONT.)
Options  Significance
-x       Matches the pattern against the entire line (doesn't match embedded patterns)
-f file  Takes patterns from file, one per line
-E       Treats the pattern as an extended regular expression (ERE)
-F       Treats patterns as fixed strings rather than regular expressions (e.g. several fixed strings, one per line)
-C n     Displays matching lines with n lines of context before and after them (not in POSIX; GNU grep)
-A n     Displays matching lines and n lines after them (not in POSIX; GNU grep)
-B n     Displays matching lines and n lines before them (not in POSIX; GNU grep)
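To see how a few of these options interact, here is a toy grep-like filter in Python. It is a hypothetical sketch, not grep itself: mini_grep uses Python's re syntax rather than POSIX BRE, and only imitates -i, -v, -n, -c and -x. Reading from any iterable of lines (here io.StringIO) mirrors how grep works as a filter on standard input.

```python
import io
import re


def mini_grep(pattern, stream, ignore_case=False, invert=False,
              line_numbers=False, count=False, whole_line=False):
    """Hypothetical toy filter imitating grep -i, -v, -n, -c and -x."""
    matcher = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    match_fn = matcher.fullmatch if whole_line else matcher.search  # -x uses fullmatch
    selected = []
    for n, line in enumerate(stream, start=1):
        line = line.rstrip("\n")
        if bool(match_fn(line)) != invert:  # -v flips the decision
            selected.append(f"{n}:{line}" if line_numbers else line)
    return [str(len(selected))] if count else selected


data = "bill johnson:director\nneil o'bryan:executive\nBill Smith:director\n"
print(mini_grep("bill", io.StringIO(data)))                                     # ['bill johnson:director']
print(mini_grep("bill", io.StringIO(data), ignore_case=True, line_numbers=True))
# ['1:bill johnson:director', '3:Bill Smith:director']
print(mini_grep("director", io.StringIO(data), invert=True))                    # ["neil o'bryan:executive"]
print(mini_grep("director", io.StringIO(data), count=True))                     # ['2']
```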
-e
• Matching Multiple Patterns
• $ grep -e gordon -e derryk -e bill emp.lst
1006:gordon lightfoot:director:sales:09/03/38:140000
6521:derryk o'brien:director:marketing:09/26/45:125000
9876:bill johnson:director:production:03/12/05:130000
• Matching an expression beginning with a hyphen
• $ grep "-mtime" filename.txt
grep: invalid option - m
Usage: grep [OPTION] ... PATTERN [FILE]...
Try `grep --help` for more information.
• $ grep -e "-mtime" filename.txt
romeo:55 17 * * 4 find / -name core -mtime +30 -print
USING REGULAR EXPRESSIONS
• Regular expressions introduce efficient pattern matching
• Regular expressions are interpreted by the command, not by
the shell
• Quoting ensures that the shell isn't able to interfere
• grep
• supports basic regular expressions by default
• $ grep "expression" filenames
• supports extended regular expressions with the -E option
• $ grep -E "expression" filenames
• if grep doesn't support -E, use egrep instead
APPLICATIONS
• Listing Only Directories
• $ ls -l | grep "^d"
• Identifying files with write permission for the group
• $ ls -l | grep "^.....w"
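Why ^.....w finds group-writable files becomes clear when the mode string is read by position. This Python sketch applies both patterns to three hypothetical `ls -l` lines (the names and owners are made up) and prints the file name of each matching line.

```python
import re

# Hypothetical `ls -l` output; the first column is the mode string.
listing = [
    "drwxr-xr-x 2 henry staff 64 Jan 10 09:00 docs",
    "-rw-rw-r-- 1 henry staff 10 Jan 10 09:00 notes.txt",
    "-rw-r--r-- 1 henry staff 10 Jan 10 09:00 a.txt",
]

print([line.split()[-1] for line in listing if re.search(r"^d", line)])  # ['docs']
# Character 1 is the file type, 2-4 owner rwx, 5-7 group rwx: group write is character 6.
print([line.split()[-1] for line in listing if re.search(r"^.....w", line)])  # ['notes.txt']
```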
CUT COMMAND
• Review
• $ head -n 5 file.txt
• $ tail -n 3 file.txt
• head and tail slice a file horizontally
• In contrast, cut is the command that slices a file vertically
• cutting columns (character positions)
• $ cut -c1-4 file.txt
• get characters 1 to 4 of each line in the file
• cutting fields
• $ cut -d":" -f1,3 file.txt
• get the 1st and 3rd ":"-separated field of each line in the file
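The two ways of cutting can be sketched in Python to show the 1-based, inclusive numbering. Both helpers are hypothetical approximations: cut_fields outputs fields in increasing order and rejoins them with the delimiter, as cut does, but it does not reproduce cut's other behaviours such as the -s option.

```python
def cut_chars(line, start, end):
    """Hypothetical helper, like `cut -cSTART-END`: 1-based, inclusive positions."""
    return line[start - 1:end]


def cut_fields(line, delim, fields):
    """Hypothetical helper, like `cut -dDELIM -fN,M`: 1-based fields, output in file order."""
    parts = line.split(delim)
    return delim.join(parts[f - 1] for f in sorted(set(fields)) if f <= len(parts))


record = "2365:john woodcook:director:personnel:05/11/47:120000"
print(cut_chars(record, 1, 4))          # 2365
print(cut_fields(record, ":", [1, 3]))  # 2365:director
```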
PRACTICE
• What do these commands do?
• $ grep a b c
• finds a in files b and c
• $ grep <HTML> foo
• does not work: the shell treats < and > as redirection operators, so the pattern must be quoted
• $ grep "**" foo
• the first * is literal and the second means "zero or more", so it looks for zero or more *
• matches all lines
• $ grep *
• If * expands to multiple filenames, grep uses the first
filename as the pattern and searches the remaining files.
• If * expands to a single filename, grep uses it as the pattern and searches the
standard input.
SUMMARY
• grep is used to search its input for matching lines
• the search pattern is a regular expression
SED: ADDRESSING
• Addressing in sed is done in two ways:
• Line Addressing
• by one or two line numbers
• 3,7
• Content Addressing
• By specifying a pattern enclosed in slashes (/pattern/) that occurs in a line
• /From:/
LINE ADDRESSING
• Addressing by line numbers
• Example
• $ sed '3q' emp.lst
• q is the quit command
• prints lines 1 to 3, then quits
• $ sed -n '1,2p' emp.lst
• p is the print command
• prints lines 1 through 2
• -n suppresses sed's default printing of every line
• so that only the lines selected by p are printed
• $ sed -n '$p' emp.lst
• $ selects the last line
• prints the last line
LINE ADDRESSING (CONT.)
• More Examples
• $ sed -n '9,11p' emp.lst
• prints lines 9 through 11
• $ sed -n '1,2p; 7,9p; $p' emp.lst
• selects multiple groups of lines
• can also be written on multiple lines without semicolons
• $ sed -n '1,2p
> 7,9p
> $p' emp.lst
• $ sed -n '3,$!p' emp.lst
• ! negates the address, so the action applies to all other lines
• prints lines 1 through 2
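Line addressing is essentially a test on each line's number. This Python sketch imitates `sed -n` with p for numeric ranges; select_lines is hypothetical, "$" stands for the last line, and negate=True plays the role of !. Unlike sed, it prints a line only once even if several ranges select it.

```python
def select_lines(lines, ranges, negate=False):
    """Hypothetical helper imitating sed -n 'A,Bp' for 1-based line-number ranges."""
    last = len(lines)

    def resolve(x):
        return last if x == "$" else x

    def selected(n):
        return any(resolve(a) <= n <= resolve(b) for a, b in ranges)

    return [line for n, line in enumerate(lines, start=1) if selected(n) != negate]


lines = [f"line {i}" for i in range(1, 12)]  # 11 sample lines
print(select_lines(lines, [(1, 2)]))                      # sed -n '1,2p'
print(select_lines(lines, [(1, 2), (7, 9), ("$", "$")]))  # sed -n '1,2p; 7,9p; $p'
print(select_lines(lines, [(3, "$")], negate=True))       # sed -n '3,$!p'
```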
CONTEXT ADDRESSING
• Addressing by contexts (pattern matching)
• Example
• $ sed -n '/From:/p' $HOME/mbox
• prints lines containing From:
• Regular expressions can be used for the pattern matching
• $ sed -n '/^From:/p' $HOME/mbox
• $ sed -n '/wilco[cx]k*s*/p' emp.lst
• $ sed -n "/o'br[iy][ae]n/p;/lennon/p" emp.lst
• double quotes protect the single quote
• a semicolon separates multiple commands
• sed uses basic regular expressions by default (GNU sed accepts -E for extended ones)
CONTEXT ADDRESSING (CONT.)
• Using comma to select a group of contiguous lines
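• $ sed -n '/johnson/,/lightfoot/p' emp.lst
• prints from a line containing johnson through the next line containing lightfoot
• what if there are multiple johnson and lightfoot lines?
• once a range ends, the next line containing johnson starts a new one, so several blocks may be printed; if no lightfoot follows, the range runs to the end of the file
• $ sed -n '1,/woodcock/p' emp.lst
• line and context addresses can be mixed
• prints from the 1st line through the line containing woodcock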
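The behaviour of a /start/,/end/ range with repeated matches is easiest to see by simulating it. This Python sketch is a hypothetical approximation of `sed -n '/start/,/end/p'`: a range opens at a line matching the start pattern, the end pattern is checked only on the lines after that, and a new range can open later. It uses Python's re syntax, not sed's BRE.

```python
import re


def sed_range(lines, start_pat, end_pat):
    """Hypothetical helper imitating sed -n '/start/,/end/p'."""
    out, inside = [], False
    for line in lines:
        if not inside:
            if re.search(start_pat, line):
                out.append(line)
                inside = True  # the end pattern is only checked from the next line on
        else:
            out.append(line)
            if re.search(end_pat, line):
                inside = False
    return out


lines = ["a", "johnson 1", "b", "lightfoot 1", "c", "johnson 2", "lightfoot 2", "d"]
print(sed_range(lines, "johnson", "lightfoot"))
# ['johnson 1', 'b', 'lightfoot 1', 'johnson 2', 'lightfoot 2']  ("c" is not printed)
```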
SED: WRITING LINES TO A FILE
• Using w command to write selected lines to a file
• Example
• $ sed -n '/<FORM>/,/<\/FORM>/w forms.html' *.html
• extracts all FORMs from all HTML files
• writes these lines to forms.html
• -n suppresses printing the contents of *.html to standard output
SED: TEXT EDITING
• sed can insert text and change existing text in a file.
• i – insert
• a – append
• c – change
• d – delete
• Examples:
• $ sed '1i #include <stdio.h>' foo.c > $$; mv $$ foo.c
• 1i inserts the text before line number 1
• output is redirected to a temporary file named $$ (the shell's process ID)
• mv $$ foo.c then overwrites foo.c
• the same result without sed: $ echo "#include <stdio.h>" | cat - foo.c > $$; mv $$ foo.c
• Inserting multiple lines
• $ sed '1i\
> #include <stdio.h>\
> #include <unistd.h>
> ' foo.c > $$; mv $$ foo.c
SED: TEXT EDITING (CONT.)
• Examples:
• $ sed 'a\
>
> ' emp.lst
• a (append) used without a line address
• appends text after every line of the file
• here the text is empty, so a blank line is inserted after each line
• $ sed '
> /WORD/ c\
> new sentence
> ' emp.lst
• replaces each entire line containing WORD with the line "new sentence"
• $ sed '/^#/d' emp.lst
• deletes lines starting with #
SED: SUBSTITUTION
• Substitution is the most important feature of sed
• Usage
• $ sed '[address]s/expression1/expression2/flags' filename(s)
• Description
• expression1 is replaced with expression2 in all lines
specified by [address]
• If the address is not specified, the substitution is attempted
on every line
• if flags is set to g, all occurrences in a line are replaced; otherwise only
the first occurrence is replaced
SED: SUBSTITUTION EXAMPLE
• Example
• $ sed 's/:/|/' emp.lst
• replaces only the first : in each line
• $ sed 's/:/|/g' emp.lst
• uses the g (global) flag to replace every :
• $ sed 's/^/2/;s/$/.00/' emp.lst
• uses regular expression anchors to add 2 at the start and .00 at the end of each line
• $ sed 's/<I>/<EM>/g
> s/<EM>/<STRONG>/g' form.html
• uses multiple lines to perform multiple substitutions
• sed processes several instructions sequentially.
• Each instruction operates on the output of the previous
instruction, so <I> ends up as <STRONG>.
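Python's re.sub can reproduce these substitutions, which makes the role of the g flag and the sequential processing concrete. In this sketch, count=1 corresponds to sed's default of replacing only the first occurrence, and omitting count corresponds to the g flag; the sample record and HTML are illustrative.

```python
import re

record = "2365:john woodcook:director"
print(re.sub(":", "|", record, count=1))  # sed 's/:/|/'  -> 2365|john woodcook:director
print(re.sub(":", "|", record))           # sed 's/:/|/g' -> 2365|john woodcook|director
print(re.sub("$", ".00", re.sub("^", "2", record)))  # sed 's/^/2/;s/$/.00/' -> 22365:john woodcook:director.00

# Instructions run in order, each on the previous result.
html = "<I>old</I> and <EM>new</EM>"
step1 = re.sub("<I>", "<EM>", html)
step2 = re.sub("<EM>", "<STRONG>", step1)
print(step2)  # <STRONG>old</I> and <STRONG>new</EM>
```

Only the opening tags change because the patterns are the literal strings <I> and <EM>; closing tags such as </I> have a / after the <.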
SED OPTIONS
• -e
• lets you use multiple instructions
• $ sed -n -e '/<FORM>/,/<\/FORM>/w forms.html' \
-e '/<TABLE>/,/<\/TABLE>/w tables.html' \
-e '/<FRAME>/,/<\/FRAME>/w frames.html' \
*.html
• -f
• takes instructions from a file
• when you have a group of instructions to execute
• place them in a file and use sed with the -f option
PRACTICE
• Use sed to insert <HTML> and </HTML> at the beginning
and end of foo.html, respectively
• $ sed -e '1i\<HTML>' -e '$a\</HTML>' foo.html > $$; mv $$ foo.html
• Explain what will happen
• $ sed -e 's/compute/calculate/g' -e 's/computer/host/g' foo
• the first substitution turns computer into calculater, so the second never matches
• Solution
• $ sed -e 's/computer/host/g' -e 's/compute/calculate/g' foo
• How to sort a file that is double-spaced (even-numbered lines are
blank lines) and still preserve the blank lines?
• $ sort foo | sed -e '/^ *$/d' -e 'a\
>
> '
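The compute/computer exercise comes down to the order in which substitutions are applied. This Python sketch chains re.sub calls the way sed chains -e instructions; the helper run and the sample sentence are hypothetical.

```python
import re


def run(instructions, s):
    """Hypothetical helper: apply (pattern, replacement) pairs in order, like sed -e ... -e ... with g."""
    for pattern, replacement in instructions:
        s = re.sub(pattern, replacement, s)
    return s


text = "the computer can compute"
print(run([("compute", "calculate"), ("computer", "host")], text))  # the calculater can calculate
print(run([("computer", "host"), ("compute", "calculate")], text))  # the host can calculate
```

Putting the longer, more specific pattern first is the general fix whenever one pattern is a prefix of another.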
SUMMARY
• sed is a stream editor in UNIX
• sed uses line numbers and content to address lines
• sed supports print, insert, append, change,
delete, write and substitution operations
• sed writes its output to standard output
