This document provides a cheat sheet for using regular expressions in R. It summarizes common patterns used in regular expressions to match characters, lists regular expression functions in base R and the stringr package, and describes options for making regular expressions case insensitive, lazy, or using lookahead/lookbehind operations. It is a concise reference for working with regular expressions in R.
Regex
Cheat Sheet

Example data used for the outputs shown in this section:
> string <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
> pattern <- "t.m"

Character classes
[[:digit:]] or \\d    Digits; [0-9]
\\D                   Non-digits; [^0-9]
[[:lower:]]           Lower-case letters; [a-z]
[[:upper:]]           Upper-case letters; [A-Z]
[[:alpha:]]           Alphabetic characters; [A-Za-z]
[[:alnum:]]           Alphanumeric characters; [A-Za-z0-9]
\\w                   Word characters; [A-Za-z0-9_]
\\W                   Non-word characters
[[:xdigit:]]          Hexadecimal digits; [0-9A-Fa-f]
[[:blank:]]           Space and tab
[[:space:]] or \\s    Space, tab, vertical tab, newline, form feed, carriage return
\\S                   Not space; [^[:space:]]
[[:punct:]]           Punctuation characters; !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
[[:graph:]]           Graphical characters; [[:alnum:][:punct:]]
[[:print:]]           Printable characters; [[:graph:]] plus the space character
[[:cntrl:]]           Control characters; \n, \r etc.

Detect patterns
grep(pattern, string)
    [1] 1 3
grep(pattern, string, value = TRUE)
    [1] "Hiphopopotamus"
    [2] "time for bottomless lyrics"
grepl(pattern, string)
    [1] TRUE FALSE TRUE
stringr::str_detect(string, pattern)
    [1] TRUE FALSE TRUE

Locate patterns
regexpr(pattern, string)
    find starting position and length of first match
gregexpr(pattern, string)
    find starting position and length of all matches
stringr::str_locate(string, pattern)
    find starting and end position of first match
stringr::str_locate_all(string, pattern)
    find starting and end position of all matches

Extract patterns
regmatches(string, regexpr(pattern, string))
    extract first match
    [1] "tam" "tim"
regmatches(string, gregexpr(pattern, string))
    extract all matches, outputs a list
    [[1]] "tam" [[2]] character(0) [[3]] "tim" "tom"
stringr::str_extract(string, pattern)
    extract first match
    [1] "tam" NA "tim"
stringr::str_extract_all(string, pattern)
    extract all matches, outputs a list
stringr::str_extract_all(string, pattern, simplify = TRUE)
    extract all matches, outputs a matrix
stringr::str_match(string, pattern)
    extract first match + individual character groups
stringr::str_match_all(string, pattern)
    extract all matches + individual character groups

Replace patterns
sub(pattern, replacement, string)
    replace first match
gsub(pattern, replacement, string)
    replace all matches
stringr::str_replace(string, pattern, replacement)
    replace first match
stringr::str_replace_all(string, pattern, replacement)
    replace all matches

Split a string using a pattern
strsplit(string, pattern) or stringr::str_split(string, pattern)
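
The detect, locate, extract, replace and split functions listed above can be tried out on the cheat sheet's own example data. The following is a minimal sketch in base R only; the variable names on the left-hand side (hits_index, first_pos and so on), the replacement text "___" and the string passed to strsplit() are illustrative choices, not part of the cheat sheet.

```r
# Example data and pattern as given in the cheat sheet
string  <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
pattern <- "t.m"

# Detect: which elements contain a match?
hits_index <- grep(pattern, string)    # indices of matching elements: 1 3
hits_lgl   <- grepl(pattern, string)   # one logical per element: TRUE FALSE TRUE

# Locate: start position of the first match per element (-1 means no match);
# the match lengths are stored in the "match.length" attribute
first_pos <- regexpr(pattern, string)

# Extract: first match of each matching element, then all matches per element
first_match <- regmatches(string, regexpr(pattern, string))    # "tam" "tim"
all_matches <- regmatches(string, gregexpr(pattern, string))   # a list of length 3

# Replace: first match only vs. every match ("___" is an arbitrary replacement)
first_replaced <- sub(pattern, "___", string)
all_replaced   <- gsub(pattern, "___", string)

# Split: break a string wherever the pattern matches (illustrative input)
pieces <- strsplit("a1b22c333d", "[[:digit:]]+")

print(all_matches)
print(all_replaced)
```

Note that regmatches() combined with regexpr() silently drops elements without a match, which is why it returns two values for three inputs, whereas stringr::str_extract() is listed above as returning NA in that position.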
Special metacharacters
\n      New line
\r      Carriage return
\t      Tab
\v      Vertical tab
\f      Form feed

Character classes and groups
.       Any character except \n
|       Or, e.g. (a|b)
[…]     List permitted characters, e.g. [abc]
[a-z]   Specify character ranges
[^…]    List excluded characters
(…)     Grouping, enables back referencing using \\N where N is an integer

Anchors
^       Start of the string
$       End of the string
\\b     Empty string at either edge of a word
\\B     NOT the edge of a word
\\<     Beginning of a word
\\>     End of a word

Quantifiers
*       Matches at least 0 times
+       Matches at least 1 time
?       Matches at most 1 time; optional string
{n}     Matches exactly n times
{n,}    Matches at least n times
{n,m}   Matches between n and m times
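
To see how anchors, quantifiers and back references behave in practice, here is a small base R sketch. The test vector x and all variable names are made up for illustration; only the regular expression syntax comes from the tables above.

```r
# Illustrative test strings (not from the cheat sheet)
x <- c("cat", "concat", "cat food", "caat", "ct")

# Anchors
starts_with_cat <- grepl("^cat", x)        # TRUE FALSE TRUE FALSE FALSE
ends_with_cat   <- grepl("cat$", x)        # TRUE TRUE FALSE FALSE FALSE
whole_word_cat  <- grepl("\\bcat\\b", x)   # TRUE FALSE TRUE FALSE FALSE

# Quantifiers applied to the letter "a"
one_or_more_a <- grepl("ca+t", x)          # TRUE TRUE TRUE TRUE FALSE
optional_a    <- grepl("^ca?t$", x)        # TRUE FALSE FALSE FALSE TRUE
one_or_two_a  <- grepl("^ca{1,2}t$", x)    # TRUE FALSE FALSE TRUE FALSE

# Grouping with back references: \\1 and \\2 refer to the first and second group
swapped <- sub("(\\w+) (\\w+)", "\\2 \\1", "hello world")   # "world hello"

# A back reference inside the pattern itself: any character repeated twice in a row
doubled <- grepl("(.)\\1", c("letter", "word"))             # TRUE FALSE

print(swapped)
```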
Lookarounds and conditionals*
(?=)              Lookahead (requires perl = TRUE), e.g. (?=xy): position followed by 'xy'
(?!)              Negative lookahead (perl = TRUE); position NOT followed by pattern
(?<=)             Lookbehind (perl = TRUE), e.g. (?<=xy): position following 'xy'
(?<!)             Negative lookbehind (perl = TRUE); position NOT following pattern
(?(if)then)       If-then-condition (perl = TRUE); use lookaheads, optional char. etc in if-clause
(?(if)then|else)  If-then-else-condition (perl = TRUE)
*see, e.g. http://www.regular-expressions.info/lookaround.html
http://www.regular-expressions.info/conditional.html

General modes
By default R uses extended regular expressions. You can switch to PCRE regular expressions using perl = TRUE for base or, in stringr versions before 1.0.0, by wrapping patterns with perl(). Later stringr versions deprecate perl() in favour of regex() and use ICU regular expressions.
All functions can be used with literal searches using fixed = TRUE for base or by wrapping patterns with fixed() for stringr.
All base functions can be made case insensitive by specifying ignore.case = TRUE.

Escaping characters
Metacharacters (. * + etc.) can be used as literal characters by escaping them. Characters can be escaped using \\ or by enclosing them in \\Q...\\E.

Greedy matching
By default the asterisk * is greedy, i.e. it always matches the longest possible string. It can be used in lazy mode by adding ?, i.e. *?.
Greedy mode can be turned off using (?U). This switches the syntax, so that (?U)a* is lazy and (?U)a*? is greedy.

Case conversions
Regular expressions can be made case insensitive using (?i). In backreferences, the strings can be converted to lower or upper case using \\L or \\U (e.g. \\L\\1). This requires perl = TRUE.

Note
Regular expressions can conveniently be created using e.g. the packages rex or rebus.
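
The lookaround, greedy/lazy, case and escaping options described above are easiest to compare side by side. The sketch below uses base R only; the input strings (price, html) and the variable names are invented for illustration. Note that the base R argument is spelled perl = TRUE in lower case.

```r
# Illustrative inputs (not from the cheat sheet)
price <- "cost: 100USD, weight: 20kg"
html  <- "<b>bold</b>"

# Lookahead: digits that are followed by "USD" ("USD" is not part of the match)
ahead <- regmatches(price, gregexpr("[0-9]+(?=USD)", price, perl = TRUE))[[1]]

# Lookbehind: digits that are preceded by "weight: "
behind <- regmatches(price, gregexpr("(?<=weight: )[0-9]+", price, perl = TRUE))[[1]]

# Greedy vs. lazy: .* grabs as much as possible, .*? as little as possible
greedy   <- sub("<.*>", "", html, perl = TRUE)       # removes the whole string
lazy     <- sub("<.*?>", "", html, perl = TRUE)      # removes only "<b>"
ungreedy <- sub("(?U)<.*>", "", html, perl = TRUE)   # (?U) makes .* lazy

# Case insensitivity: inline flag vs. function argument
ci_inline <- grepl("(?i)RHYME", "Rhymenoceros", perl = TRUE)
ci_arg    <- grepl("RHYME", "Rhymenoceros", ignore.case = TRUE)

# Case conversion in the replacement: upper-case group 1, lower-case group 2
title_case <- gsub("(\\w)(\\w*)", "\\U\\1\\L\\2", "tim TOM", perl = TRUE)

# Literal matching of the metacharacter "."
dot_regex   <- grepl(".", "abc")                        # TRUE: "." matches any character
dot_escaped <- grepl("\\.", "abc")                      # FALSE: no literal dot present
dot_fixed   <- grepl(".", "abc", fixed = TRUE)          # FALSE: literal search
dot_quoted  <- grepl("\\Q.\\E", "abc", perl = TRUE)     # FALSE: quoted literal

print(c(ahead, behind, lazy, title_case))
```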