Regex

This document provides a cheat sheet for using regular expressions in R. It summarizes common patterns used in regular expressions to match characters, lists regular expression functions in base R and the stringr package, and describes options for making regular expressions case insensitive, lazy, or using lookahead/lookbehind operations. It is a concise reference for working with regular expressions in R.

Uploaded by

Gary Goyle

Cheat Sheet

[[:digit:]] or \\d    Digits; [0-9]
\\D                   Non-digits; [^0-9]
[[:lower:]]           Lower-case letters; [a-z]
[[:upper:]]           Upper-case letters; [A-Z]
[[:alpha:]]           Alphabetic characters; [A-Za-z]
[[:alnum:]]           Alphanumeric characters; [A-Za-z0-9]
\\w                   Word characters; [A-Za-z0-9_]
\\W                   Non-word characters
[[:xdigit:]]          Hexadecimal digits; [0-9A-Fa-f]
[[:blank:]]           Space and tab
[[:space:]] or \\s    Space, tab, vertical tab, newline, form feed, carriage return
\\S                   Not space; [^[:space:]]
[[:punct:]]           Punctuation characters; !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
[[:graph:]]           Graphical characters; [[:alnum:][:punct:]]
[[:print:]]           Printable characters; [[:graph:]] plus the space character
[[:cntrl:]]           Control characters; \n, \r etc.

> string <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
> pattern <- "t.m"

grep(pattern, string)
    [1] 1 3
grep(pattern, string, value = TRUE)
    [1] "Hiphopopotamus"
    [2] "time for bottomless lyrics"
grepl(pattern, string)
    [1] TRUE FALSE TRUE
stringr::str_detect(string, pattern)
    [1] TRUE FALSE TRUE

regexpr(pattern, string)
    find starting position and length of first match
gregexpr(pattern, string)
    find starting position and length of all matches
stringr::str_locate(string, pattern)
    find starting and end position of first match
stringr::str_locate_all(string, pattern)
    find starting and end position of all matches

regmatches(string, regexpr(pattern, string))
    extract first match
    [1] "tam" "tim"
regmatches(string, gregexpr(pattern, string))
    extract all matches, outputs a list
    [[1]] "tam" [[2]] character(0) [[3]] "tim" "tom"
stringr::str_extract(string, pattern)
    extract first match
    [1] "tam" NA "tim"
stringr::str_extract_all(string, pattern)
    extract all matches, outputs a list
stringr::str_extract_all(string, pattern, simplify = TRUE)
    extract all matches, outputs a matrix
stringr::str_match(string, pattern)
    extract first match + individual character groups
stringr::str_match_all(string, pattern)
    extract all matches + individual character groups

sub(pattern, replacement, string)
    replace first match
gsub(pattern, replacement, string)
    replace all matches
stringr::str_replace(string, pattern, replacement)
    replace first match
stringr::str_replace_all(string, pattern, replacement)
    replace all matches

strsplit(string, pattern) or stringr::str_split(string, pattern)
    split each string at every match
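
The base R functions listed above can be tried on the sheet's own example data. The following is a minimal sketch, not part of the original sheet; the replacement string "_" and all variable names are assumptions chosen for illustration.

```r
# Example data from the cheat sheet
string  <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
pattern <- "t.m"

# Detect: which elements contain a match?
hits_index <- grep(pattern, string)     # 1 3
hits_lgl   <- grepl(pattern, string)    # TRUE FALSE TRUE

# Locate: start of the first match per element (-1 means no match);
# the match lengths are stored in the "match.length" attribute
first_pos <- regexpr(pattern, string)   # 10 -1 1

# Extract: first match per matching element, then all matches as a list
first_match <- regmatches(string, regexpr(pattern, string))   # "tam" "tim"
all_matches <- regmatches(string, gregexpr(pattern, string))

# Replace: "_" is a hypothetical replacement used only for this sketch
first_replaced <- sub(pattern, "_", string)
all_replaced   <- gsub(pattern, "_", string)

# Split: break each string at every match
pieces <- strsplit(string, pattern)

print(all_matches)
print(all_replaced)
print(pieces[[1]])   # "Hiphopopo" "us"
```

Note that regmatches() with regexpr() silently drops elements without a match, which is why it returns two values for three input strings.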

\n          New line
\r          Carriage return
\t          Tab
\v          Vertical tab
\f          Form feed

^           Start of the string
$           End of the string
\\b         Empty string at either edge of a word
\\B         NOT the edge of a word
\\<         Beginning of a word
\\>         End of a word

.           Any character except \n
|           Or, e.g. (a|b)
[…]         List permitted characters, e.g. [abc]
[a-z]       Specify character ranges
[^…]        List excluded characters
(…)         Grouping, enables back referencing using \\N where N is an integer

*           Matches at least 0 times
+           Matches at least 1 time
?           Matches at most 1 time; optional string
{n}         Matches exactly n times
{n,}        Matches at least n times
{n,m}       Matches between n and m times
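
The anchors, quantifiers and groups above can be checked with a few one-liners. This is a small sketch using base R only; the test vectors (words, codes) and variable names are made up for illustration.

```r
words <- c("cat", "concat", "catalog", "a cat sat")

# Anchors: ^ and $ tie the match to the start and end of the string
starts_with_cat <- grepl("^cat", words)       # TRUE FALSE TRUE FALSE
ends_with_cat   <- grepl("cat$", words)       # TRUE TRUE FALSE FALSE

# \\b matches the empty string at either edge of a word
whole_word_cat  <- grepl("\\bcat\\b", words)  # TRUE FALSE FALSE TRUE

codes <- c("ab", "aab", "aaab", "b")

# Quantifiers applied to the letter "a"
exactly_two  <- grepl("^a{2}b$", codes)       # FALSE TRUE FALSE FALSE
at_least_one <- grepl("^a+b$", codes)         # TRUE TRUE TRUE FALSE
optional_a   <- grepl("^a?b$", codes)         # TRUE FALSE FALSE TRUE
one_to_two   <- grepl("^a{1,2}b$", codes)     # TRUE TRUE FALSE FALSE

# Alternation and an excluded-character list
a_or_b   <- grepl("^(a|b)$", c("a", "b", "c"))   # TRUE TRUE FALSE
only_abc <- gsub("[^abc]", "", "a1b2c3")         # "abc"

# Grouping with back references: \\1 and \\2 refer to the first and second group
swapped <- sub("(\\w+) (\\w+)", "\\2 \\1", "hello world")   # "world hello"
doubled <- grepl("(.)\\1", c("letter", "word"))             # TRUE FALSE

print(swapped)
```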

(?=)               Lookahead (requires perl = TRUE), e.g. (?=xy): position followed by 'xy'
(?!)               Negative lookahead (perl = TRUE); position NOT followed by pattern
(?<=)              Lookbehind (perl = TRUE), e.g. (?<=xy): position following 'xy'
(?<!)              Negative lookbehind (perl = TRUE); position NOT following pattern
(?(if)then)        If-then-condition (perl = TRUE); use lookaheads, optional char. etc in if-clause
(?(if)then|else)   If-then-else-condition (perl = TRUE)
*see, e.g. http://www.regular-expressions.info/lookaround.html
           http://www.regular-expressions.info/conditional.html

By default R uses extended regular expressions. You can switch to PCRE regular expressions using perl = TRUE for base or by wrapping patterns with perl() for stringr (deprecated since stringr 1.0.0 in favour of regex(), which supports lookarounds).

All functions can be used with literal searches using fixed = TRUE for base or by wrapping patterns with fixed() for stringr.

The base functions grep, grepl, sub, gsub, regexpr and gregexpr can be made case insensitive by specifying ignore.case = TRUE.

Metacharacters (. * + etc.) can be used as literal characters by escaping them. Characters can be escaped using \\ or by enclosing them in \\Q...\\E.

By default the asterisk * is greedy, i.e. it always matches the longest possible string. It can be used in lazy mode by adding ?, i.e. *?.

Greedy mode can be turned off using (?U). This switches the syntax, so that (?U)a* is lazy and (?U)a*? is greedy.

Regular expressions can be made case insensitive using (?i). In backreferences, the strings can be converted to lower or upper case using \\L or \\U (e.g. \\L\\1). This requires perl = TRUE.

Regular expressions can conveniently be created using e.g. the packages rex or rebus.
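
The lookaround, escaping, greedy and case options above can be illustrated as follows. This is a hedged sketch in base R; the input strings and variable names are invented for the example, and perl = TRUE is passed wherever PCRE syntax is used.

```r
x <- "width 100px, height 20em"

# Lookahead: digits that are followed by "px" (the "px" is not part of the match)
before_px <- regmatches(x, gregexpr("[0-9]+(?=px)", x, perl = TRUE))[[1]]          # "100"

# Lookbehind: digits that follow "height "
after_height <- regmatches(x, gregexpr("(?<=height )[0-9]+", x, perl = TRUE))[[1]] # "20"

# Negative lookahead: a "q" that is NOT followed by "u"
q_without_u <- grepl("q(?!u)", c("quit", "Iraq"), perl = TRUE)   # FALSE TRUE

# Greedy vs. lazy matching; perl = TRUE so all three lines use the same engine
html   <- "<b>bold</b>"
greedy <- regmatches(html, regexpr("<.*>", html, perl = TRUE))       # "<b>bold</b>"
lazy   <- regmatches(html, regexpr("<.*?>", html, perl = TRUE))      # "<b>"
lazy_u <- regmatches(html, regexpr("(?U)<.*>", html, perl = TRUE))   # "<b>"

# Escaping: three ways to treat "." as a literal dot
dot_fixed   <- grepl(".", "abc", fixed = TRUE)          # FALSE
dot_escaped <- grepl("\\.", "a.c")                      # TRUE
dot_quoted  <- grepl("\\Q.\\E", "abc", perl = TRUE)     # FALSE

# Case insensitivity: function argument vs. inline modifier
ci_arg    <- grepl("hello", "HELLO", ignore.case = TRUE)   # TRUE
ci_inline <- grepl("(?i)hello", "HELLO", perl = TRUE)      # TRUE

# Case conversion in the replacement: upper-case the first letter of each word,
# lower-case the rest
title_case <- gsub("(\\w)(\\w*)", "\\U\\1\\L\\2", "hello WORLD", perl = TRUE)   # "Hello World"

print(title_case)
```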

CC BY Ian Kopacka • Updated: 10/18
