RegEx in Python (4)

Uploaded by

Yash Verma


REGULAR EXPRESSIONS (REGEX) IN PYTHON:

Regular Expressions (RegEx) are a powerful tool for pattern matching and text manipulation. In Python, regex
functionality is implemented through the re module.

APPLICATIONS OF REGEX
● Data validation
● Data extraction
● Input sanitization (data cleaning)

This document explains regex basics, syntax, functions, and practical examples with improved clarity and structure.

What is a Regular Expression?

A Regular Expression is a sequence of characters that defines a search pattern. It can be used to match strings,
validate formats, or extract information.

COMMON USE CASES OF REGEX (SOME ARE COVERED IN DETAIL IN THIS ARTICLE):

● Extracting email addresses

● Extracting timestamps from logs
● Extracting URLs
● Validating phone numbers or dates
● Searching for words or patterns in text
● Validating passwords

Regex Syntax in Python

To use regex, you define a pattern made up of ordinary characters, special characters, and sequences that together
describe what to look for in a text.
Here are some of the most common components of regex syntax:

1. SPECIAL CHARACTERS
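Character Description
. Matches any single character except a newline (unless the re.DOTALL flag is set).
^ Matches the start of the string (and the start of each line with re.MULTILINE).
$ Matches the end of the string (and the end of each line with re.MULTILINE).
* Matches 0 or more repetitions of the preceding element.
+ Matches 1 or more repetitions of the preceding element.
? Matches 0 or 1 occurrence of the preceding element.
{n} Matches exactly n occurrences of the preceding element.
{n,} Matches n or more occurrences of the preceding element.
{n,m} Matches between n and m occurrences (inclusive) of the preceding element.
\ Escapes a special character so that it is matched literally.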
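
A short sketch of how these special characters behave in practice. The sample strings are made up for illustration and are not part of the original examples.

```python
import re

# '.' matches any single character except a newline (by default)
dot_matches = re.findall(r"c.t", "cat cot c\nt cut")

# '^' and '$' anchor the pattern to the start and end of the string
anchored = bool(re.search(r"^Hello.*world$", "Hello, regex world"))

# '*', '+', '?' and '{n,m}' control how often the preceding element repeats
star = re.findall(r"ab*", "a ab abbb")        # 'a' followed by zero or more 'b'
plus = re.findall(r"ab+", "a ab abbb")        # 'a' followed by one or more 'b'
optional = re.findall(r"colou?r", "color colour")
counted = re.findall(r"\b\d{2,3}\b", "7 42 123 4567")

# '\' escapes a special character so that it is matched literally
literal_dot = re.findall(r"3\.14", "3.14 3x14")

print(dot_matches, anchored, star, plus, optional, counted, literal_dot)
```

Without the backslash, the pattern 3.14 would also match 3x14, because an unescaped dot stands for any character.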

Created by: Anjali Garg | Data Scientist | Aspiring ML Engineer | https://www.linkedin.com/in/anjali-garg-2a7747222/

2. CHARACTER CLASSES
Syntax Description
[arn] Matches one character that is a, r, or n.
[a-n] Matches one lowercase letter from a to n.
[^arn] Matches one character that is NOT a, r, or n.
[0123] Matches one of the digits 0, 1, 2, or 3.
[0-9] Matches one digit from 0 to 9.
[0-5][0-9] Matches a two-digit number from 00 to 59.
[a-zA-Z] Matches one alphabetical character, lowercase or uppercase.
[+] Inside a set, most special characters (such as +, *, ., |, $) lose their special meaning, so this matches a literal '+'. The exceptions are ^ (negation when written first), - (range between two characters), ] and \.
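
The character classes above can be tried out with a few made-up strings. This is only an illustrative sketch; the inputs are assumptions chosen to make each class visible.

```python
import re

# [arn]: one character that is a, r, or n
one_of = re.findall(r"[arn]", "brain")

# [^arn]: one character that is anything except a, r, or n
none_of = re.findall(r"[^arn]", "brain")

# [0-5][0-9]: a two-digit number from 00 to 59 (word boundaries keep it to whole tokens)
minutes = re.findall(r"\b[0-5][0-9]\b", "00 07 59 60 75")

# [+]: inside a set, '+' is an ordinary character
plus_signs = re.findall(r"[+]", "1+1=2, a+b")

print(one_of, none_of, minutes, plus_signs)
```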

3. PREDEFINED SEQUENCES
Sequence Description
\A Matches only at the start of the string, e.g. r"\AThe".
\b Matches at a word boundary, i.e. where the specified characters are at the beginning or end of a word, e.g. r"\bain" or r"ain\b".
\B Matches where the specified characters are present but NOT at the beginning or end of a word, e.g. r"\Bain" or r"ain\B".
\d Matches a digit (0-9).
\D Matches any character that is not a digit.
\s Matches a whitespace character (space, tab, newline, etc.).
\S Matches any character that is not a whitespace character.
\w Matches a word character, i.e. a-z, A-Z, 0-9, and underscore (for str patterns, non-ASCII letters and digits also match unless re.ASCII is set).
\W Matches any character that is not a word character.
\Z Matches only at the end of the string, e.g. r"Spain\Z".
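
A brief sketch of the predefined sequences in use. The sentence and the other inputs are invented for this illustration.

```python
import re

sentence = "The rain in Spain stays mainly"

starts_with_the = bool(re.search(r"\AThe", sentence))   # anchored at the start
ends_with_mainly = bool(re.search(r"mainly\Z", sentence))  # anchored at the end

# 'ain' at the end of a word (rain, Spain) versus inside a word (mainly)
ain_at_word_end = re.findall(r"ain\b", sentence)
ain_inside_word = re.findall(r"ain\B", sentence)

digits = re.findall(r"\d+", "Room 101, floor 3")
words = re.findall(r"\w+", "snake_case, CamelCase-42!")
fields = re.split(r"\s+", "a  b\tc\nd")

print(starts_with_the, ends_with_mainly, ain_at_word_end, ain_inside_word)
print(digits, words, fields)
```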

4. GROUPING AND CAPTURING

Parentheses () are used to group parts of a regex pattern and capture matches. Capturing groups save the matched
content for later use, while non-capturing groups allow grouping without saving the matched content.

CAPTURING GROUP
A capturing group matches the specified pattern and saves the matched content for reference. For example:

import re

pattern = r"(\d{3})-(\d{2})-(\d{4})"
text = "123-45-6789"
match = re.match(pattern, text)
print(match.groups()) # Output: ('123', '45', '6789')

NON-CAPTURING GROUP
A non-capturing group groups the pattern without saving the matched content. Use (?:...) to create a
non-capturing group. For example:

pattern = r"(?:\d{3})-(\d{2})-(\d{4})"
text = "123-45-6789"
match = re.match(pattern, text)
print(match.groups()) # Output: ('45', '6789')
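
As a small illustration of what "saving the matched content for later use" can look like, the sketch below reads individual groups from a match and reuses a captured group in a replacement via the backreference \1. The input string and the masking format are assumptions made for this example.

```python
import re

text = "ID: 123-45-6789"

match = re.search(r"(\d{3})-(\d{2})-(\d{4})", text)
whole = match.group(0)    # the entire match
middle = match.group(2)   # content of the second capturing group

# Only the last block is captured; \1 puts it back into the replacement
masked = re.sub(r"\d{3}-\d{2}-(\d{4})", r"***-**-\1", text)

print(whole, middle, masked)
```

Group 0 always refers to the whole match, while groups 1, 2, ... follow the order of the opening parentheses.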


PRACTICAL EXAMPLES
1. MATCHING EMAIL ADDRESSES
Example: [email protected]
● The username part, i.e., the part before @:
Can contain letters a-z and A-Z, digits 0-9, dot ., and hyphen -. Some providers (unlike Gmail) also allow
underscore _ and other special characters such as +. Spaces are not allowed.
○ [email protected] : "[a-zA-Z0-9._+-]+" : one or more occurrences of these characters. The hyphen is written
last so that it is read as a literal hyphen and not as a range.
● The domain part, i.e., the part after @:
Can contain subdomains, a domain, and domain extensions, and must end with an extension of at least
2 letters.
○ [email protected] : "[a-zA-Z0-9.-]+"
○ [email protected] : "\.[a-zA-Z]{2,}"
# Complete regex:
r"[a-zA-Z0-9._+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
# Nearly equivalent regex:
r"[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}"
# (\w: any letter, digit, or underscore; {2,} means two or more occurrences.
# Note that \w also matches non-ASCII letters and digits unless re.ASCII is used.)
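
A minimal sketch for trying an email pattern against a few addresses. The addresses are hypothetical, and the hyphen is placed last inside each character class so that it is read literally rather than as a range. This is a practical approximation, not a full implementation of the email address standard.

```python
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Hypothetical sample addresses (not taken from the original document)
candidates = [
    "jane.doe+news@mail.example.com",
    "user_01@example.org",
    "john.doe@com",        # no dot-separated extension after the domain
    "noatsymbol.com",      # no @ at all
]

# fullmatch requires the whole string to be an email address
valid = [c for c in candidates if EMAIL_RE.fullmatch(c)]
print(valid)
```

Using fullmatch validates a single value; findall or search would instead extract addresses embedded in longer text.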

2. MATCHING QUESTIONS
Examples:
- Is this your final answer?
- "Python is a snake" - is this statement correct?
- Why is the sky blue during the day?
● Start of a question: one or more alphanumeric characters or quotation marks: r"[a-zA-Z0-9\"']+"
● Middle part of a question: r"[a-zA-Z0-9\"' ,_–+-]*"
(the hyphen is written last because a sequence such as ,-_ inside a set is read as a range of characters; you can
include more special characters if they are allowed in the questions, or you can use [^?\n] to match
every character except a question mark and a newline)
● Ending of a question: r"\?"

# Complete regex:
r"[\w\"']+[\w\"' ,+-]*\?"
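
The sketch below tries the [^?\n] alternative mentioned above, anchored to the start of each line with re.MULTILINE (an assumption added here so that a question is only recognised when it begins a line). It also shows why the position of the hyphen inside a set matters.

```python
import re

lines = """Is this your final answer?
"Python is a snake" - is this statement correct?
Is this even correct..
Why is the sky blue during the day?"""

# Start with a word character or quote, then anything up to the first '?'
QUESTION_RE = re.compile(r"^[\w\"'][^?\n]*\?", re.MULTILINE)
questions = QUESTION_RE.findall(lines)

# Inside a set, ',-_' is a range from ',' to '_', so it also matches ':'
colon_in_range = re.fullmatch(r"[,-_]", ":") is not None
# With the hyphen last, only ',', '_' and '-' themselves match
colon_in_list = re.fullmatch(r"[,_-]", ":") is not None

print(questions, colon_in_range, colon_in_list)
```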


3. MATCHING URLS
Examples:
- https://www.example.com?query_param1=value1&query_param2=value2

Components of a URL (in order): scheme, host (subdomain, domain, top-level domain), optional port, path, query parameters, and fragment.

Many special characters are allowed in a URL, but some are not. For example, a white space is
encoded as %20, and non-ASCII characters are percent-encoded as well, so an encoded URL contains only word characters and a limited set of special characters.

● Scheme (http/https) followed by ://: r"https?:\/\/"
● Subdomain, domain, top-level domain: r"(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}"
● Optional port number (non-capturing group): r"(?::[0-9]{1,5})?"
● Optional path (non-capturing group): r"(?:\/[^\s?#]*)?"
● Optional query separator and parameters (non-capturing group): r"(?:\?[a-zA-Z0-9%._\-~+=&]*)?"
● Optional fragment (non-capturing group): r"(?:#[^\s]*)?"

# Complete regex:
r"https?:\/\/(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(?::[0-9]{1,5})?(?:\/[^\s?#]*)?(?:\?[a-zA-Z0-9%._\-~+=&]*)?(?:#[^\s]*)?"
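
One way to keep a long pattern like this readable is to assemble it from adjacent raw strings, one per URL component. The sketch below does that and extracts URLs from a made-up sentence; the second URL (with a port and a fragment) is a hypothetical example added for illustration.

```python
import re

URL_RE = re.compile(
    r"https?://"                          # scheme
    r"(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}"   # subdomains, domain, top-level domain
    r"(?::[0-9]{1,5})?"                   # optional port
    r"(?:/[^\s?#]*)?"                     # optional path
    r"(?:\?[a-zA-Z0-9%._\-~+=&]*)?"       # optional query string
    r"(?:#[^\s]*)?"                       # optional fragment
)

text = (
    "Docs: https://www.example.com?query_param1=value1&query_param2=value2 "
    "and http://example.org:8080/resource#top but not ftp://wrong.protocol.com"
)

# All groups are non-capturing, so findall returns the full matches
urls = URL_RE.findall(text)
print(urls)
```

The forward slash has no special meaning in Python regex, so it is left unescaped here; escaping it as \/ is harmless and matches the same text.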

4. MATCHING IPV4 ADDRESSES

An IPv4 address consists of four octets separated by dots (.), where each octet is a number between 0 and 255.
Logic behind the regex to match a number between 0 and 255:
● Number between 0-9: [0-9]
● Number between 10-99: [1-9][0-9]
● Number between 0-99: [0-9][0-9]?
● Number between 0-199: [0-1]?[0-9][0-9]?
● Number between 200-249: 2[0-4][0-9]
● Number between 250-255: 25[0-5]
(A single 2[0-5][0-5] is not enough, because it misses values such as 206-209 and 216-219.)

Regex for a number between 0 and 255: r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9][0-9]?)"
The three-digit alternatives come first so that a value such as 255 is not cut short to 25.

# Complete regex:
r"(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9][0-9]?)"
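
A sketch of one way to check an octet pattern exhaustively and to validate whole addresses. The names OCTET, IPV4_RE and is_ipv4 are hypothetical helpers introduced here. This variant also rejects leading zeros such as 01, which is a design choice of the sketch rather than something stated in the text above.

```python
import re

# 250-255 | 200-249 | 100-199 | 0-99 without leading zeros
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
IPV4_RE = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")


def is_ipv4(candidate: str) -> bool:
    """Return True if the whole string is a dotted-quad IPv4 address."""
    return IPV4_RE.fullmatch(candidate) is not None


# Every value from 0 to 255 should be accepted, and nothing above it
all_octets_ok = all(re.fullmatch(OCTET, str(n)) for n in range(256))
out_of_range_rejected = not any(re.fullmatch(OCTET, str(n)) for n in (256, 300, 999))

results = {ip: is_ipv4(ip) for ip in
           ["192.168.1.1", "206.219.248.255", "256.256.256.256", "1.2.3"]}
print(all_octets_ok, out_of_range_rejected, results)
```

Testing the octet against every number from 0 to 255 is a cheap way to catch gaps in a hand-written numeric range.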


Python’s re Module
The re module provides built-in functions for regex operations.

COMMON FUNCTIONS
Function | Description | Syntax | Return Value (x)
re.findall | Returns a list containing all matches in the order they are found. If no match, empty list. | x = re.findall("regex_expression", text) | List of all matched strings
re.search | Returns a match object for the first match found. Returns None if no match is found. | x = re.search("regex_expression", text) | Match object (if found) or None
re.split | Splits a string into a list at each match. Optionally, limit the splits with maxsplit. | x = re.split("regex_expression", text, [maxsplit]) | List of separated strings
re.sub | Replaces one or more matches with a given string. Optionally limit replacements with count. | x = re.sub("regex_expression", "replacement_string", text, [count]) | A new string with substitutions applied
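
The four functions can be compared side by side on one input. The log line below is invented for this sketch; maxsplit and count are passed as keyword arguments for clarity.

```python
import re

log = "2024-01-15 ERROR disk full; 2024-01-16 WARN cpu high"

# findall: every match, in order
dates = re.findall(r"\d{4}-\d{2}-\d{2}", log)

# search: a match object for the first match (or None)
first = re.search(r"ERROR|WARN", log)
first_level = first.group() if first else None

# split: cut the string at each match, at most maxsplit times
parts = re.split(r";\s*", log, maxsplit=1)

# sub: replace matches, at most count times
masked = re.sub(r"\d", "#", log, count=4)

print(dates, first_level, parts, masked)
```

Because re.search can return None, checking the result before calling .group() avoids an AttributeError when nothing matches.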

CODE:
import re

# Sample text with correct and incorrect examples

sample_text = """
Correct Examples:
[email protected]
[email protected]
Is this your final answer?
"Python is a snake" - is this statement correct?
https://www.example.com?query_param1=value1&query_param2=value2
http://example.org/resource
192.168.1.1
127.0.0.1

Incorrect Examples:
john.doe@com
noatsymbol.com
Is this even correct..
ftp://wrong.protocol.com
256.256.256.256
999.999.999.999
"""

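# Regex patterns
patterns = {
    "Email Address": r"[a-zA-Z0-9._+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    # Inside the set, ,-_ is read as a range (comma through underscore), which also covers : / . and ?
    "Question": r"[a-zA-Z0-9\"'][a-zA-Z0-9\"',-_-+ ]*\?",
    "URL": r"https?:\/\/(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(?::[0-9]{1,5})?(?:\/[^\s?#]*)?(?:\?[a-zA-Z0-9%._\-~+=&]*)?(?:#[^\s]*)?",
    # Three-digit alternatives come first so that 200-255 are matched in full
    "IPv4 Address": r"(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9][0-9]?)",
}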


def test_regex(pattern_name, pattern, text):
    # Print every match of the pattern found in the text
    print(f"\nTesting: {pattern_name}")
    matches = re.findall(pattern, text)
    print("Matches:")
    for match in matches:
        print(f" - {match}")

# Testing all patterns
for name, regex in patterns.items():
    test_regex(name, regex, sample_text)

OUTPUT:
Testing: Email Address
Matches:
- [email protected]
- [email protected]
Testing: Question
Matches:
- Is this your final answer?
- "Python is a snake" - is this statement correct?
- https://www.example.com?
Testing: URL
Matches:
- https://www.example.com?query_param1=value1&query_param2=value2
- http://example.org/resource
Testing: IPv4 Address
Matches:
- 192.168.1.1
- 127.0.0.1

Note: the Question pattern also picks up https://www.example.com? because the sequence ,-_ inside its character set is a range that includes : / . and ?. Writing the hyphen last in the set prevents the unintended range, and anchoring the pattern to the start of a line (^ with re.MULTILINE) keeps it from matching inside a URL.

Theory References:
https://www.w3schools.com/python/python_regex.asp
https://www.geeksforgeeks.org/components-of-a-url/
