Python 3 – Regular Expressions
Python 3 is an open source programming language widely used for web development, data science, and machine learning. It offers a wide range of built-in functions and modules that ease common programming tasks. Regular expressions, often called regex, are among its most powerful tools for string manipulation.
In this article, we will explore Python 3’s regular expression module and its various use cases. We will start with the basics of regular expressions and syntax, then move on to practical examples and applications.
What are Regular Expressions?
A regular expression, also known as regex, is a sequence of characters that defines a search pattern, used to find, match, or replace text within a string. It is a powerful tool for data validation, string manipulation, and text processing.
Python provides the built-in re module for working with regular expressions. This module implements regular expression matching operations to search, replace or extract specific patterns from a string.
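As a minimal sketch of the three operations just mentioned, the snippet below uses re.search, re.findall, and re.sub from the standard library. The sample text is made up for illustration.

```python
import re

text = "Order 66 shipped on 2024-05-01"

# re.search returns a match object for the first match, or None.
first_number = re.search(r"\d+", text)
print(first_number.group())  # 66

# re.findall returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r"\d+", text)
print(all_numbers)  # ['66', '2024', '05', '01']

# re.sub returns a copy of the string with each match replaced.
masked = re.sub(r"\d", "#", text)
print(masked)  # Order ## shipped on ####-##-##
```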
Regular Expression Syntax
Before we dive into practical examples, let’s familiarize ourselves with how to write regular expressions.
Character Sets
A character set is a set of characters enclosed in square brackets []. It matches any single character in the set. For example, the regular expression [ab] matches any occurrence of either a or b.
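The following short sketch shows a character set in action. It also illustrates ranges such as [0-9] and negated sets such as [^0-9], which are standard regex features; the sample strings are arbitrary.

```python
import re

# [ab] matches a single character that is either "a" or "b".
matches = re.findall(r"[ab]", "cabbage")
print(matches)  # ['a', 'b', 'b', 'a']

# A hyphen inside the brackets defines a range of characters.
digits = re.findall(r"[0-9]", "a1b22")
print(digits)  # ['1', '2', '2']

# A leading ^ inside the brackets negates the set.
non_digits = re.findall(r"[^0-9]", "a1b22")
print(non_digits)  # ['a', 'b']
```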
Quantifiers
Quantifiers are used to indicate the number of occurrences of a character or pattern to match. They are represented by symbols such as *, +, and ?.
* – Matches zero or more occurrences of the preceding character or pattern.
+ – Matches one or more occurrences of the preceding character or pattern.
? – Matches zero or one occurrence of the preceding character or pattern.
For example, the regular expression a* matches zero or more occurrences of the letter a, while a+ matches one or more occurrences of the letter a.
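To make the difference between the three quantifiers concrete, here is a small sketch that tests the same strings against a*b, a+b, and a?b. It uses re.fullmatch so that the whole string has to match; the sample strings are chosen for illustration.

```python
import re

samples = ["b", "ab", "aaab"]

# a*b : zero or more "a" followed by "b"
# a+b : one or more "a" followed by "b"
# a?b : zero or one "a" followed by "b"
star = [s for s in samples if re.fullmatch(r"a*b", s)]
plus = [s for s in samples if re.fullmatch(r"a+b", s)]
optional = [s for s in samples if re.fullmatch(r"a?b", s)]

print(star)      # ['b', 'ab', 'aaab']
print(plus)      # ['ab', 'aaab']
print(optional)  # ['b', 'ab']
```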
Wildcard
The wildcard character . matches any single character except a newline (unless the re.DOTALL flag is set). It is often used as a placeholder for an unknown character.
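A brief sketch of the wildcard, including its behavior around newlines. The re.DOTALL flag is part of the standard re module; the sample strings are illustrative.

```python
import re

# "c.t" matches "c", then any one character, then "t".
words = re.findall(r"c.t", "cat cot c_t ct")
print(words)  # ['cat', 'cot', 'c_t']

# By default the dot does not match a newline.
no_match = re.search(r"a.b", "a\nb")
print(no_match)  # None

# With re.DOTALL the dot matches a newline as well.
with_dotall = re.search(r"a.b", "a\nb", flags=re.DOTALL)
print(with_dotall is not None)  # True
```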
Anchors
Anchors match a position within a string rather than a character. The two most common anchors are:
^ – Matches the beginning of a string.
$ – Matches the end of a string (or just before a newline at the very end).
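The effect of the two anchors can be sketched as follows. The variable names and sample words are chosen purely for illustration.

```python
import re

# Without anchors, the pattern may match anywhere in the string.
anywhere = bool(re.search(r"cat", "concatenate"))
# ^ requires the match to begin at the start of the string.
at_start = bool(re.search(r"^cat", "concatenate"))
# $ requires the match to finish at the end of the string.
at_end = bool(re.search(r"cat$", "bobcat"))
# Using both means the entire string has to be "cat".
whole = bool(re.search(r"^cat$", "bobcat"))

print(anywhere, at_start, at_end, whole)  # True False True False
```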
Grouping
Grouping is used to group a set of characters or patterns together. It is represented by parentheses (). Grouping defines precedence, so that a quantifier applies to the whole group rather than to a single character, and it simplifies pattern matching.
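The sketch below shows how parentheses change what a quantifier applies to. It also shows that a group captures the text it matched, which is standard behavior of match objects in the re module; the sample strings are illustrative.

```python
import re

# Without a group, + applies only to "b": an "a", then one or more "b".
ungrouped = re.fullmatch(r"ab+", "ababab")
print(ungrouped)  # None

# With a group, + applies to the whole "ab" sequence.
grouped = re.fullmatch(r"(ab)+", "ababab")
print(grouped is not None)  # True

# Each group captures the text it matched.
m = re.search(r"(\d{4})-(\d{2})", "released 2024-05")
year, month = m.group(1), m.group(2)
print(year, month)  # 2024 05
```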
Escape Sequence
Escape sequences are used to represent special characters such as tab, newline, and backslash. They are written as a backslash \ followed by a character. For example, \n represents a newline. A backslash also removes the special meaning of a metacharacter, so \. matches a literal period.
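As a small illustration of escaping, the sketch below compares an unescaped dot with an escaped one. It also uses re.escape, a standard library helper that escapes the metacharacters in a string; the sample strings are arbitrary.

```python
import re

# An unescaped dot matches any character, so "3x14" also matches.
loose = re.findall(r"3.14", "3.14 3x14")
print(loose)  # ['3.14', '3x14']

# Escaping the dot makes it match a literal period only.
strict = re.findall(r"3\.14", "3.14 3x14")
print(strict)  # ['3.14']

# re.escape builds a pattern that matches the given text literally.
pattern = re.escape("a+b")
found = re.search(pattern, "1 a+b 2")
print(found.group())  # a+b
```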
Practical Examples
Now that we have covered the basics of regular expression syntax, let’s take a look at some practical examples.
Example 1: Email Validation
Email validation is a common use case for regular expressions. Let’s write a regular expression that matches the most common email address formats.
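import re

# Local part, @, domain name, a dot, then a top-level domain of 2 to 5 letters.
email_regex = r'^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$'

def validate_email(email):
    # re.search returns a match object on success and None otherwise.
    if re.search(email_regex, email):
        return True
    return False

email = "spam@mycompany.com"
if validate_email(email):
    print("Valid email")
else:
    print("Invalid email")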
In this example, we used the regular expression ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$ to check the address. The first group matches one or more letters, digits, underscores, hyphens, or periods before the @ symbol. The second group matches the same set of characters after it, up to the final period. The third group matches the top-level domain (TLD), which must be two to five letters long. This covers common addresses but not every address the email standards allow, and it rejects TLDs longer than five letters.
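Because the pattern contains three groups, it can also split an address into its parts. The sketch below is one possible variation, not part of the original example: the names EMAIL_PATTERN and split_email are hypothetical, and it uses fullmatch instead of ^ and $ so that a trailing newline is not accepted.

```python
import re

# Same character classes as the pattern above, without ^ and $,
# because fullmatch already requires the whole string to match.
EMAIL_PATTERN = re.compile(r"([a-zA-Z0-9_\-.]+)@([a-zA-Z0-9_\-.]+)\.([a-zA-Z]{2,5})")

def split_email(email):
    """Return (local_part, domain, tld), or None if the address does not match."""
    match = EMAIL_PATTERN.fullmatch(email)
    return match.groups() if match else None

print(split_email("spam@mycompany.com"))    # ('spam', 'mycompany', 'com')
print(split_email("spam@mycompany.com\n"))  # None
print(split_email("not-an-email"))          # None
```

With re.search and a $ anchor, the second address would be accepted, because $ also matches just before a newline at the end of the string.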
Example 2: Phone Number Validation
Phone number validation is another common use case for regular expressions. Let’s write a regular expression that matches phone numbers written as digits and spaces with an optional leading plus sign.
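import re

# Optional +, then two or more digits or whitespace characters, then a final digit.
phone_regex = r'^\+?[\d\s]{2,}[0-9]$'

def validate_phone_number(phone):
    # re.search returns a match object on success and None otherwise.
    if re.search(phone_regex, phone):
        return True
    return False

phone = "+1 555 123 4567"
if validate_phone_number(phone):
    print("Valid phone number")
else:
    print("Invalid phone number")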
In this example, we used the regular expression ^\+?[\d\s]{2,}[0-9]$ to check the number. The pattern allows an optional plus + symbol at the start, followed by two or more digits or whitespace characters, and requires the last character to be a digit. It does not accept other separators such as hyphens or parentheses.
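To see what this pattern accepts and rejects, it helps to run it against a few inputs. The sample strings below are made up for illustration; note that the pattern is quite permissive about whitespace.

```python
import re

phone_regex = r'^\+?[\d\s]{2,}[0-9]$'

samples = ["+1 555 123 4567", "123", "12", "555-1234", "   1"]

# Map each sample to True or False depending on whether the pattern matches.
results = {s: bool(re.search(phone_regex, s)) for s in samples}
for sample, ok in results.items():
    print(repr(sample), ok)
# '+1 555 123 4567' True
# '123' True
# '12' False        (at least three characters are required)
# '555-1234' False  (hyphens are not allowed)
# '   1' True       (whitespace counts toward the length)
```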
Example 3: Password Validation
Password validation is another common use case for regular expressions. Let’s write a regular expression that enforces a simple password policy.
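import re

# At least one letter, at least one digit, and eight or more letters or digits in total.
password_regex = r'^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$'

def validate_password(password):
    # re.search returns a match object on success and None otherwise.
    if re.search(password_regex, password):
        return True
    return False

# This sample contains "@", which the pattern does not allow,
# so the script prints "Invalid password".
password = "MyP@ssw0rd"
if validate_password(password):
    print("Valid password")
else:
    print("Invalid password")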
In this example, we used the regular expression ^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$ to check the password. The two lookaheads (?=.*[A-Za-z]) and (?=.*\d) require at least one letter (of either case) and at least one digit. The rest of the pattern, [A-Za-z\d]{8,}, requires the whole password to consist of eight or more letters and digits. Because special characters are not in the allowed set, the sample password MyP@ssw0rd is reported as invalid.
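Running the pattern against several candidates makes its rules easier to see. The passwords below are invented for illustration.

```python
import re

password_regex = r'^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]{8,}$'

candidates = ["MyPassw0rd", "password1", "MyP@ssw0rd", "password", "12345678", "abc123"]

# Map each candidate to True or False depending on whether the pattern matches.
checks = {p: bool(re.search(password_regex, p)) for p in candidates}
for candidate, ok in checks.items():
    print(candidate, ok)
# MyPassw0rd True
# password1 True    (no uppercase letter is required)
# MyP@ssw0rd False  ("@" is outside the allowed characters)
# password False    (no digit)
# 12345678 False    (no letter)
# abc123 False      (shorter than eight characters)
```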
Conclusion
Python 3’s built-in re module provides a powerful tool for string manipulation and text processing. Regular expressions can be used for data validation, string manipulation, and pattern matching. In this article, we covered the basics of regular expression syntax and practical examples. With these skills, you can use the power of regular expressions to simplify and streamline your programming tasks.