Linux Regular Expression
As a Linux administrator, you'll need to work with text files. Different tools such as grep, awk
and sed are at your disposal to find files that contain a specific text string. Here I offer an
introduction to working with regular expressions to search for text in these files in a flexible
manner.
Let's consider an example where regular expressions play a role. If you run a command like
grep -r host /, you get a huge result, because every line that contains the string host
matches, including lines with words such as ghostscript. By using regular expressions, you can
be much more specific about what you are looking for. For example, you can tell grep to look
only for lines that start with the word host by using the regular expression '^host'.
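A minimal sketch of the difference, assuming a small sample file with made-up content in a scratch directory instead of a recursive search of the whole file system:

```shell
#!/bin/sh
# Scratch directory with a hypothetical sample file; nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'hosts: files dns' 'ghostscript is installed' 'localhost 127.0.0.1' > sample.txt

# Plain string search: every line that contains "host" anywhere matches (3 lines).
grep 'host' sample.txt

# Anchored search: only the line that begins with "host" matches.
grep '^host' sample.txt
```

The first command prints all three lines; the second prints only "hosts: files dns". Remove the scratch directory with rm -r "$workdir" when you are done.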
Regular expressions are not available for all commands; the command that you use must be
programmed to work with them. The most common examples of such commands are grep and vi,
and utilities such as sed and awk can also work with them. The tr utility does not use full
regular expressions, but it understands similar constructs such as character ranges.
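As a first example, consider the following command:
grep 'lin.x' *
The dot in the regular expression 'lin.x' has a special meaning: it matches any single
character at that position, so this command finds lines such as linux, linex or lin-x in all
files in the current directory. To prevent interpretation problems, I advise you to always put
regular expressions between single quotes. This way, you'll prevent the shell from
interpreting special characters in the regular expression before grep sees them. The grep
examples that follow use the -l option, which prints only the names of files that contain a
match, and the -s option, which suppresses error messages about nonexistent or unreadable
files; add the names of the files you want to search to the end of each command.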
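To see the dot at work, here is a short sketch that assumes a hypothetical word list, words.txt, created in an empty scratch directory so that the * wildcard expands to that one file:

```shell
#!/bin/sh
# Scratch directory with a hypothetical word list.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'linux' 'linex' 'lynx' 'lin-x' 'linnnx' > words.txt

# The dot matches exactly one character of any kind between "lin" and "x".
# Matches: linux, linex, lin-x. No match: lynx, linnnx.
grep 'lin.x' *

# With -l, grep prints only the names of the files that contain a match.
grep -l 'lin.x' *
```

Because only one file is searched, grep prints the matching lines without a file name prefix; the -l variant prints just words.txt.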
^: indicates that the text string has to be at the beginning of a line. So, to find only lines
that have the word "hosts" at the beginning of a line, use: grep -ls '^hosts'
$: refers to the end of a line. So, to find only lines that have the word "hosts" at the end of
the line, use: grep -ls 'hosts$'
You can combine ^ and $ in one regular expression. To find lines that contain only the word
"yes", use: grep -ls '^yes$'
.: a wildcard that refers to any character, with the exception of a newline character. To
find lines that contain tex, tux, tox or tix, use: grep -ls 't.x'
[ ]: indicates in a regular expression that the characters between the square brackets are
interpreted as alternatives for a single position. To find lines that contain the name pinda
or linda, use: grep -ls '[pl]inda'
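[^ ]: matches any single character except the ones listed between the square brackets after
the ^ sign. To find lines that contain inda preceded by a character other than p or l (such
as binda), use: grep -ls '[^pl]inda'. Note that a line that starts with inda does not match,
because [^pl] must match an actual character.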
-: specifies a range of characters. This is useful in commands like tr, where the following
is used to translate all lowercase letters into uppercase letters: tr a-z A-Z < mytext. In a
regular expression, a range goes between square brackets, so to find all files that have
lines that start with a number, use: grep -ls '^[0-9]'
\< and \>: match the beginning or the end of a word. To find lines that have words beginning
with "san", use: grep -ls '\<san'. To find lines that have words ending in "san", use:
grep -ls 'san\>'. Be aware that these expressions are not supported by all utilities; grep
and vi do support them.
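The operators above can be tried side by side with a script like the following. It is a sketch that assumes GNU grep (for \< and \>) and a hypothetical sample file, sample.txt, in a scratch directory; the patterns are run without -l so you can see which lines match, and the last grep shows what -l adds:

```shell
#!/bin/sh
# Scratch directory with a hypothetical sample file; nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' \
    'hosts file entry' \
    'the file is called hosts' \
    'yes' \
    'yes, please' \
    'text' \
    'tux' \
    'linda' \
    'pinda' \
    'binda' \
    'inda at line start' \
    '2 apples' \
    'sander is here' \
    'the sandbox' \
    'thousand' > sample.txt

grep '^hosts' sample.txt     # hosts at the start of a line: "hosts file entry"
grep 'hosts$' sample.txt     # hosts at the end of a line: "the file is called hosts"
grep '^yes$' sample.txt      # lines consisting only of "yes"
grep 't.x' sample.txt        # t, any one character, x: "text" and "tux"
grep '[pl]inda' sample.txt   # "linda" and "pinda"
grep '[^pl]inda' sample.txt  # "binda" only; "inda" at line start has no preceding character
grep '^[0-9]' sample.txt     # lines that start with a digit: "2 apples"
grep '\<san' sample.txt      # words starting with san: "sander", "sandbox", not "thousand"
grep 'sand\>' sample.txt     # words ending in sand: "thousand"
grep -ls '^hosts' *          # -l prints only the file name: "sample.txt"

# tr uses a-z style ranges to translate lowercase to uppercase.
printf 'hello world\n' | tr 'a-z' 'A-Z'   # HELLO WORLD
```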
These regular expressions help you find words that contain certain text
strings. You can also use regular expressions to specify how often the
preceding character or group must occur. For example, you can search for
lines that contain the username "linda" three times in a row. To do this, you
need to use regular expression repetition operators, and you need to make sure
that the entire regular expression is in quotes. Without the quotes, the shell
may interpret your repetition operator itself.
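A brief sketch of a repetition operator, assuming a hypothetical users.txt file in a scratch directory. The interval {3} counts consecutive repetitions of the preceding group; it is written (linda){3} in the extended syntax of grep -E and \(linda\)\{3\} in basic syntax:

```shell
#!/bin/sh
# Scratch directory with a hypothetical list of lines mentioning linda.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'lindalindalinda' 'linda and linda and linda' 'linda linda' > users.txt

# Extended syntax (-E): the group (linda) repeated three times in a row.
grep -E '(linda){3}' users.txt        # lindalindalinda

# Basic syntax: the same pattern with backslash-escaped parentheses and braces.
grep '\(linda\)\{3\}' users.txt       # lindalindalinda

# linda followed by anything, three times: at least three occurrences anywhere.
grep -E '(linda.*){3}' users.txt      # lindalindalinda, linda and linda and linda
```

Unquoted, the parentheses would be parsed by the shell as syntax, which is exactly why the whole expression belongs in single quotes.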
*: indicates that the preceding character or expression may occur once, more
than once or not at all. Caution: don't confuse it with the * in the shell.
In a shell environment, * stands for any string of characters, including an
empty one; in a regular expression, * only says how often the preceding item
may repeat. The regular expression equivalent of the shell's * is .*, which
matches any string of characters.
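?: indicates that the preceding character may occur once or not at all. For
example, to find lines that contain either color or colour, use:
grep -Els 'colou?r'. The ? operator belongs to the extended regular expression
syntax, which is why the -E option is needed; GNU grep also accepts \? in
basic regular expressions.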
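The following sketch contrasts *, ? and the dot, and also shows the shell's own * for comparison. It assumes a hypothetical spelling.txt file in an otherwise empty scratch directory:

```shell
#!/bin/sh
# Scratch directory with hypothetical spelling variants.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'color' 'colour' 'colouur' 'colr' > spelling.txt

# Regex *: zero or more of the preceding "u" -> color, colour, colouur
grep 'colou*r' spelling.txt

# Regex ? (extended syntax): zero or one "u" -> color, colour
grep -E 'colou?r' spelling.txt

# The dot needs exactly one character, so it misses "color" -> colour only
grep 'colo.r' spelling.txt

# Shell *: the shell expands *.txt to matching file names before echo runs.
echo *.txt        # spelling.txt
```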
Here you have been given an overview of how to work with regular expressions.
This allows you to do your work as an administrator more efficiently. Regular
expressions have much more to offer, including rather complicated operations.
However, before starting on that path, make sure you master the skills
discussed here. Regular expressions can be so complex that it can be easy to
get lost in them.
ABOUT THE AUTHOR: Sander van Vugt is an author and independent technical
trainer, specializing in Linux since 1994. Van Vugt is also a technical
consultant for high-availability (HA) clustering and performance
optimization, as well as an expert on SUSE Linux Enterprise Desktop 10 (SLED
10) administration.