PHP DOM: Using XPath
Core points
- XPath is a syntax for querying XML documents. It offers a simpler, cleaner way to express queries and reduces the amount of code needed to query and filter XML data.
- An XPath query can be run with one of two functions: query() and evaluate(). Both perform the query; the difference is the type of result they return. query() always returns a DOMNodeList, while evaluate() returns a typed result where possible.
- Using XPath can make code more concise and efficient. In the comparison test, the pure XPath version was about 10% faster than the non-XPath version.
- PHP DOM lets you extend the standard XPath functions with custom ones by registering PHP functions, built-in or your own, for use inside XPath queries. This allows more complex queries than standard XPath alone.
This article will explore XPath in depth, including its features and how it is implemented in PHP. You will find that XPath can greatly reduce the amount of code required to query and filter XML data, and that it generally improves performance. I will demonstrate the PHP DOM XPath functionality using the same DTD and XML as in the previous post. For a quick review, here is what the DTD and XML look like:
<!DOCTYPE library [
  <!ELEMENT library (book*)>
  <!ELEMENT book (title, author, genre, chapter*)>
  <!ATTLIST book isbn ID #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
  <!ELEMENT chapter (chaptitle,text)>
  <!ATTLIST chapter position NMTOKEN #REQUIRED>
  <!ELEMENT chaptitle (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>
<?xml version="1.0" encoding="utf-8"?>
<library>
  <book isbn="isbn1234">
    <title>A Book</title>
    <author>An Author</author>
    <genre>Horror</genre>
    <chapter position="first">
      <chaptitle>chapter one</chaptitle>
      <text></text>
    </chapter>
  </book>
  <book isbn="isbn1235">
    <title>Another Book</title>
    <author>Another Author</author>
    <genre>Science Fiction</genre>
    <chapter position="first">
      <chaptitle>chapter one</chaptitle>
      <text>Sit Dolor Amet...</text>
    </chapter>
  </book>
</library>
Basic XPath query
XPath is a syntax for querying XML documents. The simplest form is to define the path to the element you want to access. Using the XML document above, the following XPath query returns a collection of all existing book elements:
//library/book
That's it. The two leading slashes tell XPath to look for library anywhere in the document (here it is the root element), and the single slash indicates that book is a child element of library. Very simple, isn't it? But what if you want a specific book? Suppose you want to return any book written by "An Author". The XPath will be:
//library/book/author[text() = "An Author"]/..
You can use text() inside square brackets to compare against the value of a node, and the trailing "/.." means we want the parent element (i.e. move one node upward).
An XPath query can be run with one of two functions: query() and evaluate(). Both perform the query, but they differ in the type of result they return. query() always returns a DOMNodeList, while evaluate() returns a typed result where possible. For example, if your XPath query returns the number of books written by a particular author rather than the books themselves, query() returns an empty DOMNodeList, whereas evaluate() returns the number directly, so you can use it immediately without having to extract data from a node.
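The difference between the two calls can be seen in a small sketch. It uses a trimmed copy of the sample library (chapters and genres omitted for brevity); the variable names are illustrative only.

```php
<?php
// Trimmed copy of the article's sample library (an assumption made for brevity).
$xml = <<<'XML'
<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author></book>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// A path expression yields a node set, so query() returns the matching nodes.
$books = $xpath->query('//library/book/author[text() = "An Author"]/..');

// count() yields a number: query() has no nodes to hand back,
// while evaluate() returns the number itself (as a float).
$countExpr   = 'count(//library/book/author[text() = "An Author"]/..)';
$viaQuery    = $xpath->query($countExpr);
$viaEvaluate = $xpath->evaluate($countExpr);

echo $books->length, "\n";     // number of matching <book> elements
echo $viaQuery->length, "\n";  // length of the list query() returned for count()
echo $viaEvaluate, "\n";       // the count reported by evaluate()
```

A reasonable rule of thumb is to use query() when you want nodes and evaluate() when the expression produces a number, string, or boolean.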
XPath's code and speed advantages
Let's make a quick demonstration, returning the number of books written by a specific author. We will first look at a viable approach, but it does not use XPath. This is to show you how to do this without using XPath and why XPath is so powerful.
<?php
// Method of the class that wraps the document; $this->domDocument holds the loaded DOMDocument.
public function getNumberOfBooksByAuthor($author) {
    $total = 0;
    $elements = $this->domDocument->getElementsByTagName("author");
    // Walk every <author> element and compare its text in PHP.
    foreach ($elements as $element) {
        if ($element->nodeValue == $author) {
            $total++;
        }
    }
    return $total;
}
?>
The next method achieves the same result, but uses XPath to select only the books written by a specific author:
<?php
public function getNumberOfBooksByAuthor($author) {
    // Let XPath do the filtering; PHP only reads the length of the result.
    $query = "//library/book/author[text() = '$author']/..";
    $xpath = new DOMXPath($this->domDocument);
    $result = $xpath->query($query);
    return $result->length;
}
?>
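Note that this time PHP no longer has to test the author values. However, we can go a step further and use the XPath function count() to count the occurrences of this path.
<?php
public function getNumberOfBooksByAuthor($author) {
    // count() returns a number, so evaluate() is used instead of query().
    $query = "count(//library/book/author[text() = '$author']/..)";
    $xpath = new DOMXPath($this->domDocument);
    return $xpath->evaluate($query);
}
?>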
We only need one line of XPath to retrieve the required information, with no laborious filtering in PHP. This is an easier and more concise way to write the feature. Note that evaluate() is used in the last example. This is because the function count() returns a typed result. Using query() would return a DOMNodeList, but you would find that it is an empty list.
XPath not only makes your code more concise, it also has a speed advantage. I found that version 1 has an average speed of 30% faster than version 2, but version 3 is about 10% faster than version 2 (about 15% faster than version 1). While these measurements vary with your server and query, using pure XPath often brings considerable speed benefits while also making your code easier to read and maintain.
XPath functions
XPath offers quite a lot of functions, and there are many excellent resources detailing them. If you find yourself iterating over DOMNodeLists or comparing nodeValues, there is probably an XPath function that eliminates a lot of that PHP code. You have already seen count() in use. Let's use the id() function to return the titles of the books with the given ISBNs. The XPath expression you need is:
id("isbn1234 isbn1235")/title
Note that the values to search for are enclosed in quotes and separated by spaces; no commas are needed between the terms.
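As a minimal sketch of how the id() expression might be used from PHP: the helper name findTitlesByIsbns() is hypothetical, the chapters are left out of the sample data, and the sketch assumes the document is validated on parse so that the isbn attribute declared as ID in the DTD is registered as one.

```php
<?php
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE library [
  <!ELEMENT library (book*)>
  <!ELEMENT book (title, author, genre, chapter*)>
  <!ATTLIST book isbn ID #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
  <!ELEMENT chapter (chaptitle,text)>
  <!ATTLIST chapter position NMTOKEN #REQUIRED>
  <!ELEMENT chaptitle (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>
<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author><genre>Horror</genre></book>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author><genre>Science Fiction</genre></book>
</library>
XML;

/**
 * Hypothetical helper: returns the titles of the books whose isbn is listed.
 * Assumes the ISBN values are trusted and contain no quotes.
 */
function findTitlesByIsbns(DOMDocument $doc, array $isbns): array
{
    // id() takes a single string with the IDs separated by spaces.
    $query = 'id("' . implode(' ', $isbns) . '")/title';
    $xpath = new DOMXPath($doc);

    $titles = [];
    foreach ($xpath->query($query) as $node) {
        $titles[] = $node->nodeValue;
    }
    return $titles;
}

$doc = new DOMDocument();
// Assumption: validate against the DTD while parsing so isbn is known to be an ID.
$doc->validateOnParse = true;
$doc->loadXML($xml);

$titles = findTitlesByIsbns($doc, ['isbn1234', 'isbn1235']);
print_r($titles);
```

Without a DTD, DOMElement::setIdAttribute() is another way to mark an attribute as an ID before calling id().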
Executing complex functions in XPath is relatively simple; the trick is to be familiar with the functions available.
Using PHP functions in XPath
Sometimes you may need more power than the standard XPath functions provide. Fortunately, PHP DOM also lets you use PHP's own functions inside XPath queries. Consider returning the number of words in the title of a book. In its simplest form, the method fetches the title with XPath and lets PHP's str_word_count() do the counting.
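A minimal sketch of that simple approach follows. The function takes the document as a parameter instead of living in a class, and the sample data is a trimmed copy of the library; both are assumptions made to keep the example self-contained.

```php
<?php
$xml = <<<'XML'
<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author></book>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

/**
 * Counts the words in the title of the book with the given ISBN.
 * Assumes $isbn is trusted and contains no double quote.
 */
function getNumberOfWords(DOMDocument $doc, string $isbn): int
{
    $xpath  = new DOMXPath($doc);
    $titles = $xpath->query('//library/book[@isbn = "' . $isbn . '"]/title');
    if ($titles->length === 0) {
        return 0; // no such book
    }
    // XPath locates the node; PHP counts the words.
    return str_word_count($titles->item(0)->nodeValue);
}

$doc = new DOMDocument();
$doc->loadXML($xml);

$words = getNumberOfWords($doc, 'isbn1235');
echo $words, "\n";
```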
However, we can also integrate str_word_count() directly into the XPath query. This takes a few steps. First, we must register a namespace with the XPath object. The namespace to define is http://php.net/xpath; it must be set to exactly this, as any other value will cause an error. Then we need to call registerPHPFunctions(), which tells PHP that any function called with "php:" as its namespace prefix should be handled by PHP. A PHP function call inside the query begins with "php:functionString", followed in parentheses by the name of the function you want to use and its arguments. The actual syntax for calling a function is:
php:functionString("nameOfFunction", arg, arg...)
Putting all of this together, getNumberOfWords() can be reimplemented so that the word count happens inside the XPath query.
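The steps above can be sketched as follows. This is one possible shape for the reimplementation, not necessarily the original listing: the function takes the document as a parameter, and the sample data is a trimmed copy of the library.

```php
<?php
$xml = <<<'XML'
<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author></book>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

/**
 * Counts the words in a book title entirely inside the XPath query.
 * Assumes $isbn is trusted and contains no double quote.
 */
function getNumberOfWords(DOMDocument $doc, string $isbn): int
{
    $xpath = new DOMXPath($doc);
    // Register the php namespace; the URI must be exactly this value.
    $xpath->registerNamespace('php', 'http://php.net/xpath');
    // Allow PHP functions to be called from within XPath.
    $xpath->registerPHPFunctions();

    $query = 'php:functionString("str_word_count", '
           . '//library/book[@isbn = "' . $isbn . '"]/title)';

    // The result comes back as an XPath number (a float in PHP).
    return (int) $xpath->evaluate($query);
}

$doc = new DOMDocument();
$doc->loadXML($xml);

$words = getNumberOfWords($doc, 'isbn1234');
echo $words, "\n";
```

registerPHPFunctions() also accepts a function name or an array of names, which limits what a query is allowed to call; passing 'str_word_count' here would be a stricter choice.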
Note that you do not need to call the XPath function text() to provide the text of the node; php:functionString converts node arguments to strings automatically. However, passing the text node explicitly is just as valid, for example:
php:functionString("str_word_count", //library/book[@isbn = "isbn1234"]/title/text())
Registered PHP functions are not limited to the functions that ship with PHP. You can define your own functions and make them available in XPath. The only difference is that you call them with "php:function" instead of "php:functionString". In addition, only plain functions or static methods can be used; calling instance methods is not supported. Let's demonstrate the basic functionality with a regular function defined outside any class. The function will be used to return only the books by "George Orwell". It must return true for each node you want to include in the result.
The argument passed to the function is an array of DOMElement objects. The function is responsible for iterating over the array and deciding whether the node being tested should be returned in the DOMNodeList. In this example, the node being tested is /book, and we use /author to make the decision. With that function in place, we can create the method getGeorgeOrwellBooks().
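A minimal sketch of the compare() callback and of getGeorgeOrwellBooks() might look like this. The sample library above contains no George Orwell title, so the data below is made up for illustration, and getGeorgeOrwellBooks() is written as a plain function taking the document as a parameter.

```php
<?php
// Illustrative data only; not the article's sample library.
$xml = <<<'XML'
<library>
  <book isbn="isbn1"><title>Animal Farm</title><author>George Orwell</author></book>
  <book isbn="isbn2"><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

/**
 * Called once for each <book> being tested. $nodes is the node set passed
 * from the XPath query, here the book's <author> elements.
 */
function compare(array $nodes): bool
{
    foreach ($nodes as $node) {
        if ($node->nodeValue === 'George Orwell') {
            return true; // keep this book in the result
        }
    }
    return false;
}

function getGeorgeOrwellBooks(DOMDocument $doc): DOMNodeList
{
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('php', 'http://php.net/xpath');
    $xpath->registerPHPFunctions();

    // php:function passes nodes, not strings, to the PHP callback.
    return $xpath->query('//library/book[php:function("compare", author)]');
}

$doc = new DOMDocument();
$doc->loadXML($xml);

$books = getGeorgeOrwellBooks($doc);
foreach ($books as $book) {
    echo $book->getElementsByTagName('title')->item(0)->nodeValue, "\n";
}
```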
If compare() is a static method, the function name in the XPath query must be qualified with its class name, in the form ClassName::compare.
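For the static case, a sketch under the same assumptions (made-up sample data, and a hypothetical class name AuthorFilter) could read:

```php
<?php
// Illustrative data only; not the article's sample library.
$xml = <<<'XML'
<library>
  <book isbn="isbn1"><title>Animal Farm</title><author>George Orwell</author></book>
  <book isbn="isbn2"><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

// Hypothetical class holding the callback as a static method.
final class AuthorFilter
{
    public static function compare(array $nodes): bool
    {
        foreach ($nodes as $node) {
            if ($node->nodeValue === 'George Orwell') {
                return true;
            }
        }
        return false;
    }
}

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions();

// The callback is named as "ClassName::method" inside the query.
$result = $xpath->query('//library/book[php:function("AuthorFilter::compare", author)]');
echo $result->length, "\n";
```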
In fact, this particular filter could easily be written in plain XPath, but the example shows how an XPath query can be extended to handle more complex logic. Instance methods cannot be called from XPath. If you find that you need to access object properties or methods to complete an XPath query, the best solution is to let XPath do the part it can, and then process the resulting DOMNodeList with whatever object methods or properties you need.
Summary
XPath is a great way to reduce the amount of code you write and to speed up execution when processing XML data. Although not part of the official DOM specification, the additional features provided by PHP DOM allow you to extend the standard XPath functions with custom functions. This is a very powerful feature, although as you become more familiar with the XPath functions you may find yourself relying on custom functions less and less.
Frequently Asked Questions (FAQs) about PHP DOM with XPath
What is XPath and how does it work in PHP DOM?
XPath (XML Path Language) is a query language used to select nodes from an XML document. In PHP DOM, XPath is used to navigate the elements and attributes of an XML document. It lets you find and select specific parts of the document in a variety of ways, such as selecting a node by name, by attribute value, or by its position in the document. This makes it a powerful tool for parsing and manipulating XML data in PHP.
How do I create an instance of DOMXPath?
To create an instance of DOMXPath, you first need an instance of the DOMDocument class. Once you have the DOMDocument object, you create a new DOMXPath object by passing the DOMDocument object to the DOMXPath constructor.
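A minimal sketch of these two steps, using a one-book fragment of the sample library as inline input:

```php
<?php
// Step 1: create and load the DOMDocument.
$doc = new DOMDocument();
$doc->loadXML('<library><book isbn="isbn1234"><title>A Book</title></book></library>');

// Step 2: pass the document to the DOMXPath constructor.
$xpath = new DOMXPath($doc);

// The object is now ready to run queries against the document.
$titles = $xpath->query('//book/title');
echo $titles->item(0)->nodeValue, "\n";
```

For a file on disk, DOMDocument::load() takes the place of loadXML().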
How do I use XPath to select a node?
You can select nodes using the query() method of the DOMXPath object. The query() method takes the XPath expression as a parameter and returns a DOMNodeList object containing all nodes matching the expression. For example:
$nodes = $xpath->query('//book/title');
This will select all <title> elements that are child elements of a <book> element.
What is the difference between the query() and evaluate() methods in DOMXPath?
Both the query() and evaluate() methods are used to evaluate XPath expressions. The difference is the type of result they return. The query() method returns a DOMNodeList of all nodes that match the XPath expression. The evaluate() method, on the other hand, returns a typed result, such as a boolean, number, or string, depending on the XPath expression. If the result of the expression is a node set, evaluate() returns a DOMNodeList.
How do I handle namespaces in an XPath query?
To handle namespaces in an XPath query, you need to register the namespace with the DOMXPath object using the registerNamespace() method. This method takes two parameters: the prefix and the namespace URI. After registering the namespace, you can use the prefix in your XPath queries. For example, once the prefix php has been registered for http://php.net/xpath, as in the section on PHP functions, a query can call php:functionString().
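The same mechanism applies to namespaced documents. The sketch below assumes a hypothetical namespace URI, http://example.com/library, and a hypothetical prefix lib; neither appears in the sample library used in this article.

```php
<?php
// A document whose elements live in a default namespace (hypothetical URI).
$xml = <<<'XML'
<library xmlns="http://example.com/library">
  <book><title>A Book</title></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Unprefixed names in XPath match only elements in no namespace.
$withoutPrefix = $xpath->query('//book/title');

// Bind a prefix of our choosing to the namespace URI, then use it in the query.
$xpath->registerNamespace('lib', 'http://example.com/library');
$withPrefix = $xpath->query('//lib:book/lib:title');

echo $withoutPrefix->length, "\n"; // matches found without the prefix
echo $withPrefix->length, "\n";    // matches found with the registered prefix
```

The prefix used in the query does not have to match the one used in the document; only the URI matters.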
How do I use XPath to select attributes?
You can select attributes in XPath using the @ symbol followed by the attribute name. For example, to select the href attribute of every <a> element, you can use the following XPath expression: //a/@href.
How do I use XPath functions in PHP DOM?
XPath provides many functions that can be used in XPath expressions. These functions can manipulate strings, numbers, node sets, and more. To use an XPath function in PHP DOM, simply include it in the XPath expression. For example, to select all <book> elements that have a price element with a value greater than 30, you can use the number() function as shown here: //book[number(price) > 30].
Can I use XPath with HTML documents in PHP DOM?
Yes, you can use XPath with HTML documents in PHP DOM. However, since HTML is not always well-formed XML, you may run into problems when trying to use XPath with HTML. To avoid these problems, load the HTML document with the loadHTML() method of the DOMDocument class. This method parses the HTML and corrects formatting errors, allowing you to use XPath with the resulting DOMDocument object.
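As a small sketch, the following loads a deliberately sloppy HTML fragment (the markup and URLs are invented for the example) and collects every href attribute with //a/@href. Parser warnings are collected internally rather than printed, which is a choice made for this example.

```php
<?php
// Invented markup with an unclosed <p>, to show that loadHTML() copes with it.
$html = '<html><body><p>Links:'
      . '<a href="https://example.com/">Home</a>'
      . '<a href="/docs">Docs</a>'
      . '</body></html>';

// Collect parser complaints internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// @href selects the attribute nodes themselves.
$hrefs = [];
foreach ($xpath->query('//a/@href') as $attribute) {
    $hrefs[] = $attribute->value;
}

print_r($hrefs);
```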
How do I handle errors when using XPath in PHP DOM?
When using XPath in PHP DOM, errors can occur for a number of reasons, such as a malformed XPath expression or an XML document that cannot be loaded. To handle these errors, you can enable user error handling with the libxml_use_internal_errors() function. This causes libxml errors to be stored internally so that you can process them in your code. You can then use the libxml_get_errors() function to retrieve the errors and handle them as needed.
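A minimal sketch of this pattern for a document that fails to load; the malformed input and the message format are made up for the example.

```php
<?php
// Store libxml errors internally; remember the previous setting to restore it.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Deliberately malformed: <book> is never closed.
$loaded = $doc->loadXML('<library><book></library>');

$messages = [];
if ($loaded === false) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError with level, code, line, and message.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
}

// Empty the error buffer and restore the earlier error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($messages as $message) {
    echo $message, "\n";
}
```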
Can I modify an XML document using XPath in PHP DOM?
While XPath itself does not provide a way to modify XML documents, you can use XPath together with the DOM API to do so. Use XPath to select the node you want to change, and then use the methods provided by the DOM API to change it. For example, you can use the removeChild() method of the DOMNode class to delete a node, or the setAttribute() method of the DOMElement class to change the value of an attribute.
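A short sketch of both operations on a trimmed copy of the sample library; the replacement ISBN isbn9999 is a made-up value.

```php
<?php
$xml = <<<'XML'
<library>
  <book isbn="isbn1234"><title>A Book</title><author>An Author</author></book>
  <book isbn="isbn1235"><title>Another Book</title><author>Another Author</author></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Select with XPath, then delete through the DOM API.
foreach ($xpath->query('//library/book[author = "Another Author"]') as $book) {
    $book->parentNode->removeChild($book);
}

// Select with XPath, then change an attribute through the DOM API.
foreach ($xpath->query('//library/book[@isbn = "isbn1234"]') as $book) {
    $book->setAttribute('isbn', 'isbn9999'); // hypothetical new value
}

$remaining = $xpath->query('//library/book')->length;
$newIsbn   = $xpath->evaluate('string(//library/book/@isbn)');

echo $doc->saveXML();
```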