Building a Mini Search Engine with PHP & XML (Part 1)
I. Getting to Know XML
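Many of you may still be unfamiliar with XML. I don't intend to explain systematically what XML is; I will only introduce the concepts this article uses. If you have already used XML, even as a beginner, you can skip this section.
Speaking of XML, let me first show you a piece of HTML code we are all familiar with: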
<html>
  <title>page title</title>
  <body>
    <p><center><font color="red">TEXT</font></center></p>
    <a href="www.yahoo.com"><img src="yahoo.gif"/></a>
  </body>
</html>
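Structurally, the code above already follows the rules of XML. It has these characteristics:
1. The same element is always written with consistent case. For example, <center></Center> does not conform.
2. Every attribute value (such as href="????") must be enclosed in quotation marks. For example, <a href=www.yahoo.com> is incorrect.
3. Every element must consist of an opening tag and a closing tag, in the form <body></body>, or be an empty element such as <img ... />. Note the trailing />: without the / the code is invalid.
4. Elements must be properly nested inside one another, like the loops in a program. All elements must also be nested inside a single root element; in the code above, everything is nested inside <html></html>.
5. Element names (body, a, p, img and so on above) should begin with a letter (an underscore is also allowed); in practice it is best to use an English word. Mind the case.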
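These rules are exactly what an XML parser checks. The following is a minimal sketch, assuming PHP 7.1+ with the standard xml extension. The hypothetical helper is_well_formed() runs a few snippets through PHP's parser: one follows the rules, and the others each break one of them. Only the first should be accepted.

```php
<?php
// Minimal sketch: is_well_formed() is a hypothetical helper, not part of PHP.
// It asks PHP's xml parser whether a string is well-formed XML.
function is_well_formed(string $xml): bool
{
    $parser = xml_parser_create();
    // true as the third argument marks this as the final (and only) chunk.
    // The parser is released when $parser goes out of scope.
    return xml_parse($parser, $xml, true) === 1;
}

$samples = [
    'consistent case'    => '<center>TEXT</center>',
    'mismatched case'    => '<center>TEXT</Center>',          // rule 1
    'unquoted attribute' => '<a href=www.yahoo.com>link</a>', // rule 2
    'unclosed empty tag' => '<p><img src="yahoo.gif"></p>',   // rule 3
    'two root elements'  => '<a/><b/>',                       // rule 4
];

foreach ($samples as $label => $xml) {
    printf("%-20s %s\n", $label, is_well_formed($xml) ? 'well-formed' : 'NOT well-formed');
}
```

A successful parse only proves that the text is well-formed. It says nothing about whether the elements are the ones your program expects.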
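See? XML is not that complicated. You can think of it as a tree-structured format that is well suited to holding data.
Now let's get familiar with the XML used in our program: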
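<links>网络狂飙 mini search engine, built with PHP and XML
  <web memo="memo1" url="">name1</web>
  <sub>Computers and Networking
    <web memo="memo2">name2</web>
    <sub>Programming Languages
      <web memo="memo3">name3</web>
      <sub>PHP
        <web url="http://www.phpbuilder.com/" memo="[English] PHP development resources.">www.phpbuilder.com</web>
        <web url="http://www.fokus.gmd.de" memo="[English] PHP development manual.">PHP Manual</web>
      </sub>
    </sub>
  </sub>
</links>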
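The structure is actually quite simple. The root element is links, each sub represents a category, and each web holds the information for one website. A web's text is the site's name, and its attributes describe the site: url is the site's link and memo is a descriptive note.
Add the XML declaration (<?xml ...?>) as line 1 (without it you will get an error), save the file as xyz.xml, and open it in IE5 or later.
As you can see, its tree structure is laid out at a glance.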
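The same tree can also be walked from PHP. The following is a minimal sketch, assuming PHP 7.1+ with the SimpleXML extension, which is a tree-style API that this article itself does not use. The hypothetical function list_sites() recurses through nested sub elements and prints every web with its category path. The embedded document is an ASCII copy of the sample above.

```php
<?php
// Minimal sketch using SimpleXML (a tree-based API); list_sites() is hypothetical.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<links>Mini search engine built with PHP and XML
<web memo="memo1" url="">name1</web>
<sub>Computers and Networking
<web memo="memo2">name2</web>
<sub>Programming Languages
<web memo="memo3">name3</web>
<sub>PHP
<web url="http://www.phpbuilder.com/" memo="[English] PHP development resources.">www.phpbuilder.com</web>
<web url="http://www.fokus.gmd.de" memo="[English] PHP development manual.">PHP Manual</web>
</sub>
</sub>
</sub>
</links>
XML;

// Returns one "category path | name | url" line per <web>, depth first.
function list_sites(SimpleXMLElement $node, array $path = []): array
{
    $where = $path ? implode(' > ', $path) : '(top level)';
    $lines = [];
    foreach ($node->web as $web) {
        $lines[] = $where . ' | ' . trim((string) $web) . ' | ' . $web['url'];
    }
    foreach ($node->sub as $sub) {
        // Casting a mixed-content element to string yields only its own
        // text nodes, i.e. the category label plus surrounding whitespace.
        $label = trim((string) $sub);
        $lines = array_merge($lines, list_sites($sub, array_merge($path, [$label])));
    }
    return $lines;
}

echo implode("\n", list_sites(simplexml_load_string($xml))), "\n";
```

Moving a sub block elsewhere in the file changes only the category path printed for its sites. No code has to change.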
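So why does our mini search engine use XML? The first reason is that I cannot yet use MySQL on 奥索网 (embarrassing, I know). Second, a small search engine holds very little data, so using a database would not necessarily be any faster. Most important, XML is very easy to maintain. It saves labor, and there is no need to write tedious database maintenance programs. For example, to add a category or a web page, we only have to edit the text file and add a <web>???</web> or <sub>????</sub> element. To move a category somewhere else, we just cut its <sub> element with Ctrl-X and paste it with Ctrl-V (it's a tree structure, after all).
In fact, I have only scratched the surface of what XML can do; I will offer you more in-depth articles later.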
II. How PHP Parses XML
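Note: this section is adapted, with modifications, from material on the NetEase (网易) Virtual Community (I was too lazy to type it all myself).
There are two basic types of XML parser:
Tree-based parsers convert an XML document into a tree structure. They analyze the whole document and provide an API for accessing each element of the resulting tree. The common standard is the DOM (Document Object Model). If you have used JavaScript, you may have come across XMLDOM.
Event-based parsers treat an XML document as a series of events. When a particular event occurs, the parser calls a function supplied by the developer to handle it.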
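An event-based parser takes a data-centric view of an XML document: it focuses on the document's data rather than its structure. These parsers process the document from beginning to end. They report events such as the start of an element, the end of an element and the start of character data to the application through callback functions. Here is a "Hello World" XML document as an example: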
<greeting>
Hello World
</greeting>
An event-based parser reports this document as three events:
Start element: greeting
Character data: Hello World
End element: greeting
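These three events can be observed directly. The following is a minimal sketch, assuming PHP 7.1+ with the standard xml extension. It records each callback in an array instead of acting on it. The labels 'start', 'text' and 'end' are names chosen for this illustration.

```php
<?php
// Minimal sketch: record the events reported for the "Hello World" document.
$xml = "<greeting>\nHello World\n</greeting>";
$events = [];

$parser = xml_parser_create();
// PHP upper-cases element names by default ("case folding"); switch that
// off so names are reported exactly as written.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = ['start', $name];
    },
    function ($parser, $name) use (&$events) {
        $events[] = ['end', $name];
    }
);

// One run of text may arrive in several pieces (for example, split at line
// breaks), so consecutive character-data callbacks are merged here.
xml_set_character_data_handler($parser, function ($parser, $data) use (&$events) {
    $last = count($events) - 1;
    if ($last >= 0 && $events[$last][0] === 'text') {
        $events[$last][1] .= $data;
    } else {
        $events[] = ['text', $data];
    }
});

xml_parse($parser, $xml, true);

foreach ($events as [$type, $value]) {
    echo $type, ': ', trim($value), "\n";
}
```

The script prints one line per event: the start of greeting, the text Hello World, and the end of greeting.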
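Unlike a tree-based parser, an event-based parser does not build a structure that describes the document. When it reports the character data, it does not tell you that the enclosing element is greeting.
However, it provides lower-level access, which uses resources more efficiently and gives faster access. The whole document does not have to be loaded into memory; in fact, the document can even be larger than the available memory.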
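Both points can be illustrated together. The following is a minimal sketch, assuming PHP 7.1+ with the standard xml extension. The hypothetical helper texts_with_parents() hands the parser only a few bytes at a time. It also keeps its own stack of open elements so that each piece of text can be attributed to its enclosing element, which the callbacks themselves do not report.

```php
<?php
// Minimal sketch: texts_with_parents() is hypothetical. It parses $xml in
// $chunkSize-byte pieces and returns [enclosing element, text] pairs.
function texts_with_parents(string $xml, int $chunkSize): array
{
    $stack  = [];  // names of the currently open elements
    $buffer = '';  // text collected since the last tag
    $texts  = [];  // [enclosing element, text] pairs

    // Record the buffered text under the innermost open element.
    $flush = function () use (&$stack, &$buffer, &$texts) {
        if (trim($buffer) !== '') {
            $texts[] = [end($stack), trim($buffer)];
        }
        $buffer = '';
    };

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$stack, $flush) {
            $flush();          // text so far belongs to the current parent
            $stack[] = $name;
        },
        function ($parser, $name) use (&$stack, $flush) {
            $flush();          // text so far belongs to the closing element
            array_pop($stack);
        }
    );
    // Text may be split across chunk boundaries, so pieces are buffered
    // and recorded at the next tag.
    xml_set_character_data_handler($parser, function ($parser, $data) use (&$buffer) {
        $buffer .= $data;
    });

    $chunks = str_split($xml, $chunkSize);
    $last = count($chunks) - 1;
    foreach ($chunks as $i => $chunk) {
        // Only the current chunk is handed over; the parser keeps its own
        // state between calls. The third argument flags the final chunk.
        if (!xml_parse($parser, $chunk, $i === $last)) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
        }
    }
    return $texts;
}

print_r(texts_with_parents('<sub>PHP<web>PHP Manual</web></sub>', 5));
```

With 5-byte chunks, "PHP Manual" arrives in several pieces, but the result is the same as parsing the document in a single call.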
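Preparation
The function that creates an XML parser instance is xml_parser_create(). The instance it returns is passed to every subsequent function, much like the link identifier used by PHP's MySQL functions. Before parsing a document, an event-based parser usually requires you to register callback functions, which are called when particular events occur. Expat, the parser whose interface PHP's XML functions follow, is no exception. It defines the following seven possible events: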
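Event                      Registration function                       Description
Element                    xml_set_element_handler()                   Start and end of an element
Character data             xml_set_character_data_handler()            Character data
External entity            xml_set_external_entity_ref_handler()       A reference to an external entity
Unparsed external entity   xml_set_unparsed_entity_decl_handler()      Declaration of an unparsed external entity
Processing instruction     xml_set_processing_instruction_handler()    A processing instruction
Notation declaration       xml_set_notation_decl_handler()             A notation declaration
Default                    xml_set_default_handler()                   Any other event with no handler of its own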
Every callback receives the parser instance as its first argument, followed by any other arguments.
See the PHP manual for details.
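The following is a minimal sketch, assuming PHP 7.1+ with the standard xml extension, that registers two of the handlers from the table on a small made-up document. The document and the processing-instruction target search-index are invented for illustration. Note that each callback takes the parser as its first argument.

```php
<?php
// Minimal sketch: register a processing-instruction handler and a default
// handler. The document and the "search-index" target are made up.
$log = [];

$parser = xml_parser_create();
// Keep names exactly as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// Called for <?target data?>; the parser comes first, as for every callback.
xml_set_processing_instruction_handler(
    $parser,
    function ($parser, $target, $data) use (&$log) {
        $log[] = "PI target=$target data=$data";
    }
);

// No element handler is registered here, and PHP has no separate comment
// handler, so the tags and the comment are passed to the default handler.
xml_set_default_handler($parser, function ($parser, $data) use (&$log) {
    $log[] = "default: $data";
});

xml_parse($parser, '<doc><?search-index rebuild?><!-- note --></doc>', true);
print_r($log);
```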
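The following example displays the XML element structure of a document. It is taken from the PHP manual. It forms the basic skeleton of our search engine, but I won't comment on it further here, because the next part covers it.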
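<?php
// Print the element structure of data.xml, indenting each element by depth.
$file = "data.xml";
// Current nesting depth. A plain counter is used because in PHP 8 the parser
// is an object and can no longer serve as an array key.
$depth = 0;

// Start-tag handler: print the element name, indented by the current depth.
// Names arrive upper-cased unless XML_OPTION_CASE_FOLDING is turned off.
function startElement($parser, $name, $attrs)
{
    global $depth;
    print str_repeat("  ", $depth) . "$name\n";
    $depth++;
}

// End-tag handler: end handlers receive only the parser and the name.
function endElement($parser, $name)
{
    global $depth;
    $depth--;
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}
// Feed the file in 4 KB chunks; the third argument of xml_parse() is true
// once the end of the file has been reached.
while (!feof($fp)) {
    $data = fread($fp, 4096);
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
fclose($fp);
xml_parser_free($xml_parser);
?>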
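Purely as a preview, here is one way the skeleton's event-driven approach could be extended toward a keyword search over the links format shown earlier. This is not the author's implementation, which the next part explains. The function search_links() and its result array are hypothetical, and the sketch assumes PHP 7.1+ with the standard xml extension and UTF-8 input.

```php
<?php
// Hypothetical sketch, not the author's code: scan the links XML with the
// event parser and return each <web> whose name or memo contains $keyword,
// together with its url and category (<sub>) path.
function search_links(string $xml, string $keyword): array
{
    $subs = [];   // labels of the open <sub> elements ('' until known)
    $web  = null; // attributes of the <web> being read, if any
    $text = '';   // character data since the last tag
    $hits = [];

    // A <sub>'s label is the text that precedes its first child tag.
    $takeLabel = function () use (&$subs, &$web, &$text) {
        $n = count($subs);
        if ($web === null && $n > 0 && $subs[$n - 1] === '' && trim($text) !== '') {
            $subs[$n - 1] = trim($text);
        }
        $text = '';
    };

    $parser = xml_parser_create();
    // Keep element and attribute names exactly as written in the file.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$subs, &$web, $takeLabel) {
            $takeLabel();
            if ($name === 'sub') {
                $subs[] = '';
            } elseif ($name === 'web') {
                $web = $attrs;
            }
        },
        function ($parser, $name) use (&$subs, &$web, &$text, &$hits, $keyword) {
            if ($name === 'web' && $web !== null) {
                $title = trim($text);
                $memo  = $web['memo'] ?? '';
                if (stripos("$title $memo", $keyword) !== false) {
                    $hits[] = [
                        'path' => implode(' > ', $subs),
                        'name' => $title,
                        'url'  => $web['url'] ?? '',
                    ];
                }
                $web = null;
            } elseif ($name === 'sub') {
                array_pop($subs);
            }
            $text = '';
        }
    );
    xml_set_character_data_handler($parser, function ($parser, $data) use (&$text) {
        $text .= $data;
    });

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $hits;
}

$xml = '<links>Mini search engine<sub>Programming Languages<sub>PHP'
     . '<web url="http://www.phpbuilder.com/" memo="[English] PHP development resources.">www.phpbuilder.com</web>'
     . '<web url="http://www.fokus.gmd.de" memo="[English] PHP development manual.">PHP Manual</web>'
     . '</sub></sub></links>';

print_r(search_links($xml, 'manual'));
```

Because the search runs while the file is streamed, it inherits the memory advantage of the event-based parser described above.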
