
How to use regular expressions in Python

Jun 22, 2019 pm 01:18 PM
python regular expression

Strings are the most commonly used data type in programming, and the need to operate on them comes up almost everywhere. For example, to determine whether a string is a valid email address, you could programmatically extract the substrings before and after the @ and check that they are a valid name and a valid domain name, but this is tedious and the code is hard to reuse. Regular expressions are a powerful tool for matching strings. The idea is to use a descriptive language to define a rule for strings: any string that conforms to the rule is considered a "match", and any other string is not.


So the way to check whether a string is a valid email address is:

Create a regular expression that matches email addresses;

Use this regular expression to match the user's input and determine whether it is valid.

Because regular expressions are also represented by strings, we must first understand how to use characters to describe characters.

In regular expressions, a character given literally matches exactly that character. Use \d to match a digit and \w to match a letter, digit, or underscore, so:

'00\d' can match '007', but cannot match '00A';

'\d\d\d' can match '010';

'\w\w\d' can match 'py3';

. can match any single character (by default, anything except a newline), so:

'py.' can match 'pyc', 'pyo', 'py!', etc.
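The basic patterns above can be tried out with a short sketch. It uses re.match (covered later in this article) only as a way to test each pattern; the helper name matches is an illustrative assumption, not part of the re module.

```python
import re

def matches(pattern, text):
    """Hypothetical helper: True if pattern matches at the start of text."""
    return re.match(pattern, text) is not None

print(matches(r'00\d', '007'))    # True: \d matches the digit 7
print(matches(r'00\d', '00A'))    # False: A is not a digit
print(matches(r'\d\d\d', '010'))  # True: three digits
print(matches(r'\w\w\d', 'py3'))  # True: two word characters, then a digit
print(matches(r'py.', 'py!'))     # True: . matches any character except a newline
```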

To match variable-length text, regular expressions use * for any number of characters (including 0), + for at least one character, ? for 0 or 1 characters, {n} for exactly n characters, and {n,m} for n to m characters.

Let’s look at a more complex example: \d{3}\s+\d{3,8}.

Let’s interpret it from left to right:

\d{3} means matching 3 digits, such as '010';

\s matches a whitespace character (a space, tab, or other whitespace), so \s+ means at least one whitespace character, such as ' ' or '  ';

\d{3,8} means 3 to 8 digits, such as '1234567'.

Taken together, the regular expression above matches a phone number whose area code is separated from the number by one or more whitespace characters.

What if you want to match a number like '010-12345'? Outside of square brackets, '-' has no special meaning, so it can be written directly; escaping it as '\-' is also accepted and matches the same thing. The regular expression becomes \d{3}\-\d{3,8}.

However, '010 - 12345' still cannot be matched because of spaces. So we need more complex matching methods.
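As a quick check of the two phone number patterns discussed above, here is a small sketch. The variable names spaced and dashed are illustrative assumptions, and re.match is used ahead of its introduction later in the article.

```python
import re

spaced = r'\d{3}\s+\d{3,8}'  # area code, one or more whitespace characters, number
dashed = r'\d{3}-\d{3,8}'    # area code, a hyphen, number

print(bool(re.match(spaced, '010 12345')))    # True
print(bool(re.match(spaced, '010    12345')))  # True: \s+ accepts several spaces
print(bool(re.match(spaced, '01012345')))     # False: \s+ needs at least one space
print(bool(re.match(dashed, '010-12345')))    # True
print(bool(re.match(dashed, '010 - 12345')))  # False: spaces around the hyphen
```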


Advanced

To match more precisely, you can use [] to denote a range, for example:

[0-9a-zA-Z\_] matches a single digit, letter, or underscore;

[0-9a-zA-Z\_]+ matches a string of at least one digit, letter, or underscore, such as 'a100', '0_Z', 'Py3000', etc.;

[a-zA-Z\_][0-9a-zA-Z\_]* matches a string that starts with a letter or underscore, followed by any number of digits, letters, or underscores, which is the form of a valid Python variable name;

[a-zA-Z\_][0-9a-zA-Z\_]{0,19} limits the variable name more precisely to 1-20 characters (the first character plus up to 19 more).

A|B can match A or B, so (P|p)ython can match 'Python' or 'python'.

^ means the beginning of the line, ^\d means it must start with a number.

$ indicates the end of the line, \d$ indicates that it must end with a number.

You may have noticed that py can also match 'python', but adding ^py$ turns it into a whole line match, so it can only match 'py'.
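The range, alternation, and anchor rules above can be combined in a short sketch. The function name is_identifier is a hypothetical helper, and the pattern only covers ASCII names of up to 20 characters, as described in the text.

```python
import re

# A letter or underscore, then up to 19 more letters, digits, or underscores.
IDENTIFIER = r'^[a-zA-Z_][0-9a-zA-Z_]{0,19}$'

def is_identifier(name):
    """Hypothetical helper: True if name fits the pattern above."""
    return re.match(IDENTIFIER, name) is not None

print(is_identifier('a100'))      # True
print(is_identifier('_private'))  # True
print(is_identifier('3d_model'))  # False: starts with a digit
print(is_identifier('x' * 21))    # False: longer than 20 characters

# Alternation plus anchors: only the whole word is accepted.
print(bool(re.match(r'^(P|p)ython$', 'python')))   # True
print(bool(re.match(r'^(P|p)ython$', 'pythons')))  # False
```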

re module

With the preparatory knowledge, we can use regular expressions in Python. Python provides the re module, which contains all regular expression functions. Since Python's string itself is also escaped with \, special attention should be paid to:

s = 'ABC\\-001' # Python string
# The corresponding regular expression string becomes:
# 'ABC\-001'

Therefore we strongly recommend using Python's r prefix, so you don't have to worry about escaping:

s = r'ABC\-001' # Python string
# The corresponding regular expression string is unchanged:
# 'ABC\-001'
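To confirm that the two spellings really produce the same pattern, a minimal sketch follows. The variable names escaped and raw are illustrative only.

```python
import re

escaped = 'ABC\\-001'  # backslash doubled for Python's own string escaping
raw = r'ABC\-001'      # raw string: the backslash is kept as written

print(escaped == raw)                  # True: both hold the same 8 characters
print(len(raw))                        # 8
print(bool(re.match(raw, 'ABC-001')))  # True: \- matches a literal hyphen
```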

Let's first see how to determine whether the regular expression matches:

>>> import re
>>> re.match(r'^\d{3}\-\d{3,8}$', '010-12345')
<_sre.SRE_Match object; span=(0, 9), match='010-12345'>
>>> re.match(r'^\d{3}\-\d{3,8}$', '010 12345')
>>>

The match() function tries to match the pattern from the start of the string. If it succeeds, it returns a Match object; otherwise it returns None. A common way to test the result is:

test = 'The string entered by the user'
if re.match(r'regular expression', test):
    print('ok')
else:
    print('failed')
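Returning to the email example from the introduction, the same test can be wrapped in a function. This is only a sketch: EMAIL_PATTERN is a deliberately simplified assumption (letters, digits, underscores, and dots before the @, then a single-label domain and a suffix), not a complete email validator, and is_valid_email is a hypothetical helper name.

```python
import re

# Simplified, illustrative pattern; real email addresses allow far more forms.
EMAIL_PATTERN = r'^[0-9a-zA-Z_.]+@[0-9a-zA-Z]+\.[a-zA-Z]{2,}$'

def is_valid_email(text):
    """Hypothetical helper: True if text fits the simplified pattern above."""
    # re.match returns a Match object on success and None on failure.
    return re.match(EMAIL_PATTERN, text) is not None

print(is_valid_email('someone@example.com'))     # True
print(is_valid_email('bob.smith@example.org'))   # True
print(is_valid_email('no-at-sign.example.com'))  # False: no @
print(is_valid_email('a@b'))                     # False: no domain suffix
```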

Splitting strings

Splitting a string with a regular expression is more flexible than splitting on a fixed character. Here is the ordinary splitting code:

>>> 'a b   c'.split(' ')
['a', 'b', '', '', 'c']

Consecutive spaces are not treated as a single separator, so the result contains empty strings. Try a regular expression instead:

>>> re.split(r'\s+', 'a b   c')
['a', 'b', 'c']

The string is split correctly no matter how many spaces there are. Now add , to the separators:

>>> re.split(r'[\s\,]+', 'a,b, c  d')
['a', 'b', 'c', 'd']

Then add ; as well:

>>> re.split(r'[\s\,\;]+', 'a,b;; c  d')
['a', 'b', 'c', 'd']

If users enter a set of tags, remember that a regular expression can convert their irregular input into a clean array.
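A minimal sketch of that tag clean-up might look like the following. The function name parse_tags and the sample input are assumptions for illustration. Note that re.split returns empty strings when the input begins or ends with a separator, so they are filtered out.

```python
import re

def parse_tags(raw_input):
    """Hypothetical helper: split on runs of whitespace, commas, or semicolons."""
    parts = re.split(r'[\s,;]+', raw_input)
    # Leading or trailing separators produce empty strings; drop them.
    return [part for part in parts if part]

print(parse_tags(' python, regex;;  tutorial '))  # ['python', 'regex', 'tutorial']
```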

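Groups

Besides simply testing whether a string matches, regular expressions can also extract substrings. Parentheses () mark the groups to extract. For example:

^(\d{3})-(\d{3,8})$ defines two groups, so the area code and the local number can be extracted directly from the matched string:

>>> m = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345')
>>> m
<_sre.SRE_Match object; span=(0, 9), match='010-12345'>
>>> m.group(0)
'010-12345'
>>> m.group(1)
'010'
>>> m.group(2)
'12345'

If the regular expression defines groups, you can use the group() method on the Match object to extract the substrings.

Note that group(0) is always the entire matched string, while group(1), group(2), ... are the 1st, 2nd, ... captured substrings.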

Extracting substrings is very useful. Here is a more brutal example:

>>> t = '19:05:30'
>>> m = re.match(r'^(0[0-9]|1[0-9]|2[0-3]|[0-9])\:(0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|[0-9])\:(0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|[0-9])$', t)
>>> m.groups()
('19', '05', '30')

This regular expression recognizes valid times directly. Sometimes, however, a regular expression cannot do the whole validation, for example when recognizing dates:

'^(0[1-9]|1[0-2]|[0-9])-(0[1-9]|1[0-9]|2[0-9]|3[0-1]|[0-9])$'

For invalid dates such as '2-30' and '4-31', the regular expression still cannot reject them, or doing so is very hard to write, so the program has to help with the check.
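One way the program can help is sketched below: the regular expression extracts the month and day, and ordinary code checks them against the calendar. The function name is_valid_date and the default year of 2019 are assumptions for illustration; a different year changes whether '2-29' is accepted.

```python
import calendar
import re

# The month-day pattern from the text, compiled once.
MONTH_DAY = re.compile(r'^(0[1-9]|1[0-2]|[0-9])-(0[1-9]|1[0-9]|2[0-9]|3[0-1]|[0-9])$')

def is_valid_date(text, year=2019):
    """Hypothetical helper: regex for the shape, calendar for the day count."""
    m = MONTH_DAY.match(text)
    if m is None:
        return False
    month, day = int(m.group(1)), int(m.group(2))
    # The pattern's single-digit branch also lets 0 through, so check the range.
    if not 1 <= month <= 12:
        return False
    days_in_month = calendar.monthrange(year, month)[1]
    return 1 <= day <= days_in_month

print(is_valid_date('2-28'))   # True
print(is_valid_date('2-30'))   # False: February 2019 has 28 days
print(is_valid_date('4-31'))   # False: April has 30 days
print(is_valid_date('12-31'))  # True
```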

Greedy matching

Finally, note that regular expression matching is greedy by default, meaning it matches as many characters as possible. For example, try to capture the zeros at the end of a number:

>>> re.match(r'^(\d+)(0*)$', '102300').groups()
('102300', '')

Because \d+ is greedy, it consumes all the trailing zeros, so 0* can only match the empty string.

To capture the trailing zeros, \d+ must match non-greedily (that is, as few characters as possible). Adding a ? makes \d+ non-greedy:

>>> re.match(r'^(\d+?)(0*)$', '102300').groups()
('1023', '00')
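The same difference shows up whenever a pattern has a closing delimiter that can occur more than once. The following sketch uses a made-up tag-like string as an assumption for illustration; it is not a recommendation to parse HTML with regular expressions.

```python
import re

text = '<b>bold</b>'  # illustrative sample input

# Greedy: .+ runs to the last > in the string.
greedy = re.match(r'<.+>', text).group(0)
# Non-greedy: .+? stops at the first > that lets the pattern succeed.
lazy = re.match(r'<.+?>', text).group(0)

print(greedy)  # <b>bold</b>
print(lazy)    # <b>
```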

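Compilation

When we use a regular expression in Python, the re module does two things internally:

Compile the regular expression, raising an error if the pattern string itself is invalid;

Use the compiled regular expression to match the string.

If a regular expression is going to be used thousands of times, we can precompile it for efficiency, so that later uses skip the compilation step and go straight to matching:

>>> import re
# Compile:
>>> re_telephone = re.compile(r'^(\d{3})-(\d{3,8})$')
# Use:
>>> re_telephone.match('010-12345').groups()
('010', '12345')
>>> re_telephone.match('010-8086').groups()
('010', '8086')

Compiling produces a regular expression object. Because this object already contains the pattern, you do not pass the pattern string when calling its methods.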
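Both steps can be seen in a short sketch: a compiled pattern reused across several inputs, and the error raised at compile time for an invalid pattern. The list numbers, the results dictionary, and the compile_failed flag are illustrative assumptions.

```python
import re

# Compile once, then reuse the pattern object for every input.
re_telephone = re.compile(r'^(\d{3})-(\d{3,8})$')

numbers = ['010-12345', '010-8086', 'not a number']
results = {}
for number in numbers:
    m = re_telephone.match(number)
    # match() returns None when the input does not fit the pattern.
    results[number] = m.groups() if m else None
    print(number, results[number])

# An invalid pattern fails at compile time, before any matching happens.
compile_failed = False
try:
    re.compile(r'(\d{3}')  # unbalanced parenthesis
except re.error as exc:
    compile_failed = True
    print('invalid pattern:', exc)
```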

Parameters

[Figure: parameters]

Modifiers

[Figure: modifiers]

Patterns

[Figure: patterns]
