Home Web Front-end JS Tutorial Detailed explanation of the use of balance groups in regular expressions (with code)

Detailed explanation of the use of balance groups in regular expressions (with code)

Mar 29, 2018 pm 04:10 PM
use balance Detailed explanation

This time I will bring you a detailed explanation of the use of balance groups in regular rules (with code). What are the precautions for using regular balance groups? . The following is a practical case, let's take a look.

Is this article suitable for you?

To understand the essence of this article, you'd better have some foundation in regular matching principles. For example, ".*?" matches the text content "asp163". Anyone who knows a little bit about regular expression knows that it can be matched, but do you know its matching process? If you are not clear about it, then the following content may not be suitable for you. Perhaps, it is too difficult to read and you cannot understand the usage of the balance group. Therefore, I recommend that you first understand the matching principle of the regular expression NFA engine. It does take some time to put together an easy-to-understand and easy-to-describe article, but I don’t know if this content will achieve the effect I expected. Slowly improve it~ (Note: I wrote this in 2010. Now take it and read it as a reader when you have time. Correct the problematic areas and add some examples to make it as easy to understand as possible. .)

Introduction to balanced groups in the general regular tutorial

If you want to match a nestable hierarchical structure, you have to use balanced groups. For example, how to capture the content within the longest angle brackets in a string like "xx aa> yy"?

The following syntax structure needs to be used here:
(?<group>) Name the captured content as group and push it onto the stack
( ?<-group>) Pop the captured content named group last pushed onto the stack from the stack. If the stack is originally empty, the matching of this group fails
(?(group)yes |no) If there is a captured content named group on the stack, continue to match the expression of the yes part, otherwise continue to match the no part
(?!) The sequence negates the look around, because Without a suffix expression, trying to match always fails

If you are not a programmer (or you are a programmer who is not familiar with the concept of the stack), you can understand the above three syntaxes like this: No. One is to write (or write another) "group" on the blackboard, the second is to erase a "group" from the blackboard, and the third is to see if there is still "group" written on the blackboard. If so, Continue to match the yes part, otherwise match the no part.
What we need to do is every time we encounter a left bracket, write a "group" on the blackboard, every time we encounter a right bracket, erase one, and at the end, see if there is anything else on the blackboard - if there is that It proves that there are more left brackets than right brackets, so the match should fail (in order to see it more clearly, I used the syntax of (?'group')):

<         #最外层的左括号
 [^<>]*     #最外层的左括号后面的不是括号的内容
 (
  (
   (?'Open'<) #碰到了左括号,在黑板上写一个"Open"
   [^<>>]*   #匹配左括号后面的不是括号的内容
  )+
  (
   (?'-Open'>) #碰到了右括号,擦掉一个"Open"
   [^<>]*   #匹配右括号后面不是括号的内容
  )+
 )*
 (?(Open)(?!))  #在遇到最外层的右括号前面,判断黑板上还有没有没擦掉的"Open";如果有,则匹配失败
>         #最外层的右括号
Copy after login

Why did I write this Article

After reading the above introduction, do you understand? Before I understood the principle of regular expression matching, looking at the above introduction to balanced groups, I seemed to understand but not understand it, and it could only be remembered as a template, but could not be used flexibly. Therefore, I read a lot of information about regular expressions. I am especially grateful to lxcnn's technical documentation and the book "Mastering Regular Expressions", which gave me a deeper and more systematic understanding of regular expressions. Therefore, on the basis of them, Above, I will make a summary based on my own learning experience. Firstly, it will be archived as study notes. In addition, if it can solve your doubts, it will also be a happy thing.
I will not analyze the above code for now, but first explain the concepts and knowledge related to the balance group.
The following expression matching test tool is: Expresso, this site also provides its perfect cracked version for download.

The concept and function of the balance group

The balance group, as the name suggests, balance means symmetry. It mainly combines several regular grammar rules to provide solutions for pairings. Matching of nested structures. Balanced group has two definitions: narrow sense and broad sense. Balanced group in the narrow sense refers to (?Expression) grammar, while balanced group in the broad sense is not a fixed grammatical rule, but a comprehensive application of several grammatical rules. What we usually call The balanced group usually refers to the generalized balanced group. Unless otherwise specified in this article, the abbreviation of balance group refers to the generalized balance group.
The matching principle of the balanced group
The matching principle of the balanced group can be explained using the stack. First give an example, and then explain based on the example.

源字符串:a+(b*(c+d))/e+f-(g/(h-i))*j正则表达式:((?<Open>\()|(?<−Open>)|[^()])*(?(Open)(?!))\)
需求说明:匹配成对出现的()中的内容
输出:(b*(c+d)) 和 (g/(h-i))
我将上面正则表达式代码分行写,并加上注释,这样看起来有层次,而且方便

 \(        #普通字符“(”
  (       #分组构造,用来限定量词“*”修饰范围
   (?<Open>\() #命名捕获组,遇到开括弧“Open”计数加1
   |      #分支结构
   (?<-Open>\)) #狭义平衡组,遇到闭括弧“Open”计数减1
   |      #分支结构
   [^()]+    #非括弧的其它任意字符
  )*       #以上子串出现0次或任意多次
  (?(Open)(?!)) #判断是否还有“Open”,有则说明不配对,什么都不匹配
 \)       #普通闭括弧
Copy after login

对于一个嵌套结构而言,开始和结束标记都是确定的,对于本例开始为“(”,结束为“)”,那么接下来就是考察中间的结构,中间的字符可以划分为三类,一类是“(”,一类是“)”,其余的就是除这两个字符以外的任意字符。

那么平衡组的匹配原理就是这样的

1、先找到第一个“(”,作为匹配的开始。即上面的第1行,匹配了:a+(b*(c+d))/e+f-(g/(h-i))*j (红色显示部分)

2、在第1步以后,每匹配到一个“(”,就入栈一个Open捕获组,计数加1

3、在第1步以后,每匹配到一个“)”,就出栈最近入栈的Open捕获组,计数减1

也就是讲,上面的第一行正则“\(”匹配了:a+(b*(c+d))/e+f-(g/(h-i))*j (红色显示部分)
然后,匹配到c前面的“(”,此时,计数加1;继续匹配,匹配到d后面的“)”,计算减1;——注意喽:此时堆栈中的计数是0,正则还是会向前继续匹配的,但是,如果匹配到“)”的话,比如,这个例子中d))(红色显示的括号)——引擎此时将控制权交给(?(Open)(?!)),判断堆栈中是否为0,如果为0,则执行匹配“no”分支,由于这个条件判断结构中没有“no”分支,所以什么都不做,把控制权交给接下来的“\)”
这个正则表达式“\)”可匹配接下来的),即b))(红色显示的括号)

4、后面的 (?(Open)(?!))用来保证堆栈中Open捕获组计数是否为0,也就是“(”和“)”是配对出现的

5、最后的“)”,作为匹配的结束

匹配过程

首先匹配第一个“(”,然后一直匹配,直到出现以下两种情况之一时,把控制权交给(?(Open)(?!)):
a)堆栈中Open计数已为0,此时再遇到“)”
b)匹配到字符串结束符
这时控制权交给(?(Open)(?!)),判断Open是否有匹配,由于此时计数为0,没有匹配,那么就匹配“no”分支,由于这个条件判断结构中没有“no”分支,所以什么都不做,把控制权交给接下来的“\)”
如果上面遇到的是情况a),那么此时“\)”可以匹配接下来的“)”,匹配成功;
如果上面遇到的是情况b),那么此时会进行回溯,直到“\)”匹配成功为止,否则报告整个表达式匹配失败。
由于.NET中的狭义平衡组“(?<Close-Open>Expression)”结构,可以动态的对堆栈中捕获组进行计数,匹配到一个开始标记,入栈,计数加1,匹配到一个结束标记,出栈,计数减1,最后再判断堆栈中是否还有Open,有则说明开始和结束标记不配对出现,不匹配,进行回溯或报告匹配失败;如果没有,则说明开始和结束标记配对出现,继续进行后面子表达式的匹配。
需要对“(?!)”进行一下说明,它属于顺序否定环视,完整的语法是“(?!Expression)”。由于这里的“Expression”不存在,表示这里不是一个位置,所以试图尝试匹配总是失败的,作用就是在Open不配对出现时,报告匹配失败。

下面在看个例子:

<table>
<tr>
<td id="td1"> </td>
<td id="td2">
<table>
<tr>
<td>snhame</td>
<td>f</td>
</tr>
</table>
</td>
<td></td>
</tr> </table>
Copy after login

以上为部分的HTML代码.现在我们的问题是要提取出其的标签并将其删除掉,以往我们惯用的方法都是直接去取,像[\s\S]+?\,不过问题出来了,我们提取到的不是我们想要的内容,而是

<td id="td2">
<table>
<tr>
<td>snhame</td>
Copy after login

原因也很简单,它和离他最近的标签匹配上了,不过它不知道这个标签不是它的-_-,是不是就是?符号的原因呢,我们去掉让他无限制贪婪,可这下问题更大了,什么乱七八糟的东东它都匹配到了

<td id="td2">
<table>
<tr>
<td>snhame</td>
f


Copy after login

这个结果也不是我们想要的。那么我就用“平衡组”来解决吧。

]*>((?]*>)+|(?<-mm>)|[\s\S])*?(?(mm)(?!))

匹配的结果是

<td id="td2">
<table>
<tr>
<td>snhame</td>
f



Copy after login
Copy after login

这正是我们想要的
注意,我开始写成这样的方式

<td\s*id="td2"[^>]*>((?<mm><td[^>]*>)+|(?<-mm></td>)|[\s\S])*(?(mm)(?!))</td>
Copy after login

匹配的结果是

<td id="td2">
<table>
<tr>
<td>snhame</td>
f



Copy after login
Copy after login

一个问题
以下代码只是做为一个问题探讨
文本内容:e+f(-(g/(h-i))*j

正则表达式:

\(
 (
  (?<mm>\()
  |
  (?<-mm>\))
  |
  .
 )*?
 (?(mm)(?!))
\)
Copy after login

匹配的结果是:(-(g/(h-i))

相信看了本文案例你已经掌握了方法,更多精彩请关注php中文网其它相关文章!

推荐阅读:

正则的非捕获组与捕获组使用详解

正则表达式怎么匹配图片地址与img标签

The above is the detailed content of Detailed explanation of the use of balance groups in regular expressions (with code). For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

What software is crystaldiskmark? -How to use crystaldiskmark? What software is crystaldiskmark? -How to use crystaldiskmark? Mar 18, 2024 pm 02:58 PM

CrystalDiskMark is a small HDD benchmark tool for hard drives that quickly measures sequential and random read/write speeds. Next, let the editor introduce CrystalDiskMark to you and how to use crystaldiskmark~ 1. Introduction to CrystalDiskMark CrystalDiskMark is a widely used disk performance testing tool used to evaluate the read and write speed and performance of mechanical hard drives and solid-state drives (SSD). Random I/O performance. It is a free Windows application and provides a user-friendly interface and various test modes to evaluate different aspects of hard drive performance and is widely used in hardware reviews

How to download foobar2000? -How to use foobar2000 How to download foobar2000? -How to use foobar2000 Mar 18, 2024 am 10:58 AM

foobar2000 is a software that can listen to music resources at any time. It brings you all kinds of music with lossless sound quality. The enhanced version of the music player allows you to get a more comprehensive and comfortable music experience. Its design concept is to play the advanced audio on the computer The device is transplanted to mobile phones to provide a more convenient and efficient music playback experience. The interface design is simple, clear and easy to use. It adopts a minimalist design style without too many decorations and cumbersome operations to get started quickly. It also supports a variety of skins and Theme, personalize settings according to your own preferences, and create an exclusive music player that supports the playback of multiple audio formats. It also supports the audio gain function to adjust the volume according to your own hearing conditions to avoid hearing damage caused by excessive volume. Next, let me help you

How to use Baidu Netdisk app How to use Baidu Netdisk app Mar 27, 2024 pm 06:46 PM

Cloud storage has become an indispensable part of our daily life and work nowadays. As one of the leading cloud storage services in China, Baidu Netdisk has won the favor of a large number of users with its powerful storage functions, efficient transmission speed and convenient operation experience. And whether you want to back up important files, share information, watch videos online, or listen to music, Baidu Cloud Disk can meet your needs. However, many users may not understand the specific use method of Baidu Netdisk app, so this tutorial will introduce in detail how to use Baidu Netdisk app. Users who are still confused can follow this article to learn more. ! How to use Baidu Cloud Network Disk: 1. Installation First, when downloading and installing Baidu Cloud software, please select the custom installation option.

How to use NetEase Mailbox Master How to use NetEase Mailbox Master Mar 27, 2024 pm 05:32 PM

NetEase Mailbox, as an email address widely used by Chinese netizens, has always won the trust of users with its stable and efficient services. NetEase Mailbox Master is an email software specially created for mobile phone users. It greatly simplifies the process of sending and receiving emails and makes our email processing more convenient. So how to use NetEase Mailbox Master, and what specific functions it has. Below, the editor of this site will give you a detailed introduction, hoping to help you! First, you can search and download the NetEase Mailbox Master app in the mobile app store. Search for "NetEase Mailbox Master" in App Store or Baidu Mobile Assistant, and then follow the prompts to install it. After the download and installation is completed, we open the NetEase email account and log in. The login interface is as shown below

BTCC tutorial: How to bind and use MetaMask wallet on BTCC exchange? BTCC tutorial: How to bind and use MetaMask wallet on BTCC exchange? Apr 26, 2024 am 09:40 AM

MetaMask (also called Little Fox Wallet in Chinese) is a free and well-received encryption wallet software. Currently, BTCC supports binding to the MetaMask wallet. After binding, you can use the MetaMask wallet to quickly log in, store value, buy coins, etc., and you can also get 20 USDT trial bonus for the first time binding. In the BTCCMetaMask wallet tutorial, we will introduce in detail how to register and use MetaMask, and how to bind and use the Little Fox wallet in BTCC. What is MetaMask wallet? With over 30 million users, MetaMask Little Fox Wallet is one of the most popular cryptocurrency wallets today. It is free to use and can be installed on the network as an extension

Detailed explanation of obtaining administrator rights in Win11 Detailed explanation of obtaining administrator rights in Win11 Mar 08, 2024 pm 03:06 PM

Windows operating system is one of the most popular operating systems in the world, and its new version Win11 has attracted much attention. In the Win11 system, obtaining administrator rights is an important operation. Administrator rights allow users to perform more operations and settings on the system. This article will introduce in detail how to obtain administrator permissions in Win11 system and how to effectively manage permissions. In the Win11 system, administrator rights are divided into two types: local administrator and domain administrator. A local administrator has full administrative rights to the local computer

Detailed explanation of division operation in Oracle SQL Detailed explanation of division operation in Oracle SQL Mar 10, 2024 am 09:51 AM

Detailed explanation of division operation in OracleSQL In OracleSQL, division operation is a common and important mathematical operation, used to calculate the result of dividing two numbers. Division is often used in database queries, so understanding the division operation and its usage in OracleSQL is one of the essential skills for database developers. This article will discuss the relevant knowledge of division operations in OracleSQL in detail and provide specific code examples for readers' reference. 1. Division operation in OracleSQL

Teach you how to use the new advanced features of iOS 17.4 'Stolen Device Protection' Teach you how to use the new advanced features of iOS 17.4 'Stolen Device Protection' Mar 10, 2024 pm 04:34 PM

Apple rolled out the iOS 17.4 update on Tuesday, bringing a slew of new features and fixes to iPhones. The update includes new emojis, and EU users will also be able to download them from other app stores. In addition, the update also strengthens the control of iPhone security and introduces more "Stolen Device Protection" setting options to provide users with more choices and protection. "iOS17.3 introduces the "Stolen Device Protection" function for the first time, adding extra security to users' sensitive information. When the user is away from home and other familiar places, this function requires the user to enter biometric information for the first time, and after one hour You must enter information again to access and change certain data, such as changing your Apple ID password or turning off stolen device protection.

See all articles