Detailed explanation of a perfect HTML parsing engine (Jumony)

May 04, 2017 pm 02:57 PM

Many people may think that current HTML parsers are good enough, and that even simple regular expressions can meet the need to manipulate HTML documents. Indeed, the vast majority of HTML documents on the Internet largely conform to the XHTML specification, and parsing them does not require a powerful parser. But a powerful parser is one thing, and a perfect parser is another.

First of all, Jumony Core provides a nearly perfect HTML parsing engine whose results come very close to those of a browser. Whether it is elements without end tags, elements with optional end tags, flag attributes (attributes without a value), or CSS selectors and styles, however a browser parses a legal or illegal HTML document, Jumony parses it the same way. In other words, the result of Jumony's parsing matches the result of the browser's parsing, so you no longer have to worry about whether an HTML document can be recognized. If the browser can read it, Jumony can understand it.

There is only one step between powerful and perfect, but a perfect parser means you never have to care about the HTML source document.

The following is an incomplete list of features supported by the Jumony parser.

Attribute value forms accepted by the parser, for example <a href='#'>:
Attribute values in double quotes
Attribute values without quotation marks
Attribute values that are missing (but with an equals sign present)
Attribute values preceded by spaces


Jumony's API can not only parse HTML from text, it can also fetch documents directly from the Internet for analysis and automatically identify the encoding from the HTTP headers.
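As an illustration of what identifying the encoding from the HTTP headers involves, here is a minimal Python sketch. It is not Jumony's implementation or API. The function names `charset_from_content_type` and `decode_body` and the UTF-8 fallback are assumptions made for this example.

```python
import codecs
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value.

    Falls back to `default` (an assumption of this sketch) when the
    header names no charset or names one Python does not know.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset(default)
    try:
        codecs.lookup(charset)
    except LookupError:
        return default
    return charset


def decode_body(body, content_type):
    """Decode raw response bytes using the charset from the header."""
    charset = charset_from_content_type(content_type)
    # Undecodable bytes are replaced rather than raising an error.
    return body.decode(charset, errors="replace")


raw = "你好".encode("gb2312")
print(decode_body(raw, "text/html; charset=GB2312"))
```

A real fetcher would typically also look at the byte order mark and at any meta charset declaration inside the document when the header gives no charset.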

HtmlAgilityPack, the open source HTML parsing project that currently ranks second only to Jumony, has not been updated for a long time. After so many years, it still has problems parsing the most basic elements.

2. CSS style setting support

Merely parsing HTML perfectly does not bring much benefit. As mentioned above, most HTML documents can be handled by a second-rate parser, or even by simple regular expressions, so why do we need Jumony?

The answer is that an HTML engine is more than just parsing the DOM structure.

Consider this scenario: I need to set the display style of an element to none. In the browser, a simple element.style.display = "none" is all it takes. Now that we have obtained the DOM we need through the parser, do we still have to concatenate strings to set the style?

No. Jumony supports CSS style parsing and can even recognize some CSS shorthand rules. In Jumony, setting a style on an element is as simple as in the browser.

Let's look at another example. Given an element whose style sets the shorthand padding to 5px, what will happen if we set padding-left: 0px on this element?

In Jumony, the result will be:

<p style="padding-left: 0px; padding-right: 5px; padding-top:5px; padding-bottom: 5px"></p>

Look, the padding shorthand is expanded automatically into its four individual properties.
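The expansion described above can be sketched in a few lines of Python. This is only an illustration of the CSS shorthand rule, not Jumony's code. The helper names `parse_style`, `expand_padding` and `set_style` are hypothetical, and the order of the properties in the output differs from the order Jumony produces.

```python
def parse_style(style):
    """Parse 'name: value; name: value' into a dict, keeping order."""
    props = {}
    for decl in style.split(";"):
        if ":" in decl:
            name, value = decl.split(":", 1)
            props[name.strip().lower()] = value.strip()
    return props


def expand_padding(value):
    """Expand a 1 to 4 value padding shorthand into its four sides."""
    parts = value.split()
    if not 1 <= len(parts) <= 4:
        raise ValueError("padding takes one to four values")
    top = parts[0]
    right = parts[1] if len(parts) > 1 else top
    bottom = parts[2] if len(parts) > 2 else top
    left = parts[3] if len(parts) > 3 else right
    return {
        "padding-top": top,
        "padding-right": right,
        "padding-bottom": bottom,
        "padding-left": left,
    }


def set_style(style, name, value):
    """Set one property, expanding the padding shorthand if needed."""
    props = parse_style(style)
    if name.startswith("padding-") and "padding" in props:
        props.update(expand_padding(props.pop("padding")))
    props[name] = value
    return "; ".join(f"{k}: {v}" for k, v in props.items())


print(set_style("padding: 5px", "padding-left", "0px"))
# padding-top: 5px; padding-right: 5px; padding-bottom: 5px; padding-left: 0px
```

Working on a parsed property map rather than on the raw attribute string is what removes the need to concatenate strings by hand.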

3. CSS 3 selector support

CSS selectors are a popular query language in the HTML world. They are simple, powerful, and supported by many browsers. Jumony supports nearly the full set of CSS3 selectors (except runtime pseudo-classes and pseudo-elements). With the help of selectors, we can easily find the objects we are interested in within an HTML document. For example, to grab all the article titles on the home page of Blog Park:

new JumonyParser().LoadDocument( "www.php.cn/" ).Find( ".post_item a.titlelnk" )

Fetch, parse, select, all in one go. With just a little more code, we can print the captured data to the console:

 // Print the text of every matched title link
 foreach( var title in new JumonyParser().LoadDocument( "www.php.cn/" ).Find( ".post_item a.titlelnk" ) )
  Console.WriteLine( title.InnerText() );

HTML constructs handled by the Jumony parser (feature, followed by the example or the affected elements where the source gives one):

A stray < is parsed as text: < a should be parsed as < a
A stray > is parsed as text: > should be parsed as >
Flag attributes (attributes without a value)
Elements that are missing their end tag
Elements with optional end tags: "body", "colgroup", "dd", "dt", "head", "html", "li", "option", "p", "tbody", "td", "tfoot", "th", "thead", "tr"
Elements without end tags: "area", "base", "basefont", "br", "col", "frame", "hr", "img", "input", "isindex", "link", "meta", "param", "wbr", "bgsound", "spacer", "keygen"
CData elements: "script", "style", "textarea", "title", for example <script>if ( 1<a ) alert( "< p>" );</script>
Preformatted elements
Attribute values in single quotes
HTML declaration

List of CSS3 selectors supported by Jumony:

Selector            Description
*                   Selects all elements
p a                 Selects descendant elements
p>a                 Selects child elements
p+a                 Selects adjacent elements
p~a                 Selects subsequent elements
[attr]              Attribute exists
[attr=value]        Attribute value matches exactly
[attr~=value]       Attribute value approximate match (one of its space-separated words)
[attr^=value]       Attribute value starts with
[attr*=value]       Attribute value contains
[attr$=value]       Attribute value ends with
[attr!=value]       Attribute value negative match
:not                Negation pseudo-class
:only-child         Only-child pseudo-class
:only-of-type       Only-of-type pseudo-class
:empty              Empty element pseudo-class
:nth-child          Structural pseudo-class
:nth-last-child     Structural pseudo-class
:nth-of-type        Structural pseudo-class
:nth-last-of-type   Structural pseudo-class
:first-child        Structural pseudo-class
:last-child         Structural pseudo-class
:first-of-type      Structural pseudo-class
:last-of-type       Structural pseudo-class
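To make the attribute selectors in the list concrete, the following Python sketch shows how each operator can be evaluated against an element's attributes. It is an illustration of the matching rules only, not Jumony's implementation. The function `attr_matches` is hypothetical, and the meaning given to != (true when the attribute is absent or has a different value) is an assumption borrowed from jQuery, since != is not part of the CSS3 selector standard.

```python
def attr_matches(attrs, name, op=None, value=None):
    """Test one attribute selector against a dict of attributes.

    op is None for [attr], or one of "=", "~=", "^=", "*=", "$=", "!=".
    """
    if op == "!=":
        # Assumed jQuery-style meaning: absent or different value.
        return attrs.get(name) != value
    if name not in attrs:
        return False
    actual = attrs[name]
    if op is None:
        return True
    if op == "=":
        return actual == value
    if op == "~=":
        # Matches one whole word of a space-separated list.
        return value in actual.split()
    if op == "^=":
        return value != "" and actual.startswith(value)
    if op == "*=":
        return value != "" and value in actual
    if op == "$=":
        return value != "" and actual.endswith(value)
    raise ValueError(f"unknown operator: {op}")


link = {"class": "post item", "href": "https://example.com/a.html"}
print(attr_matches(link, "class", "~=", "item"))  # True
print(attr_matches(link, "class", "=", "item"))   # False
print(attr_matches(link, "href", "$=", ".html"))  # True
```

The difference between = and ~= is the one that most often matters in practice: a class attribute usually holds several words, so an exact match on a single class name fails where the word match succeeds.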


4. Powerful extensibility

Jumony Core 3 gives users the greatest possible extensibility. You can customize the HTML specification, implement your own parser, graft other DOM models onto the Jumony API, invent your own CSS selector pseudo-classes, or even switch to an API style of your own, such as a jQuery-style one.

Jumony Core has many derivative projects, covering tasks such as crawling websites, providing jQuery-style APIs, developing websites, producing MHT files, and adding CSS selector support to HAP parsing results. All of these projects benefit from the extensibility of Jumony Core.


