Home Backend Development PHP Tutorial 正则获取网页源码keyword和description ,蛋有点疼

正则获取网页源码keyword和description ,蛋有点疼

Jun 23, 2016 pm 02:00 PM

情况一:  

1

2

3

<META NAME="description" CONTENT="华尔街债券(bond.wswire.com)

是全球第一债券网站,为您提供全球债券市场最迅速最专业的债券资讯和全天候的债券理财、债券评级及报价服务,

华尔街债券覆盖交易所债券市场、银行间债券市场、银行同业拆借及公开市场等各方面的债券信息服务。华尔街债券汇聚多家顶级专业机构分析研究报告、每日两次的精确数据分析以及图文并茂的市况报道。"><META NAME="keywords" CONTENT="华尔街,电讯,华尔街电讯,全球债券,国债,债券,债市,企业债,企债,可转债,回购,正回购,赎回,债券公告,利率,金融债,央行,短期融资券,记账式国债,货币政策,财经,汇率,票据,公开市场,稳定收益,公债,柜台交易,银行间债市,同业拆借,债券资讯,融资债,债券理财,债券评级,银行间市场,交易所市场,海外市场,央行票据">

Copy after login


情况二:

1

<meta name=keywords content="微波炉使用高火档能耗低更节能(图),环保新知,,,微波炉,,,高火,,,节能,,,省电,,"><meta name=description content="微波炉使用高火档能耗低更节能(图)">

Copy after login



注意:可能大小写,还有就是name,和content属性[color=#FF6600] 位置 不一样[/color]

小弟试着写了一下,只能匹配一写网页,不知道有什么问题。大牛请解答,拜谢!
keyword:

1

2

1.preg_match("/<meta[\s]+name=[&#39;\"]keywords[&#39;\"] content=[&#39;\"](.*)[&#39;\"]/isU",$this->tmpHtml,$inarr);

2.preg_match("/<meta[\s] content=[&#39;\"](.*)[&#39;\"] name=[&#39;\"]keywords[&#39;\"]/isU",$this->tmpHtml,$inarr2);

Copy after login

1

2

1.preg_match("/<meta[\s]+name=[&#39;\"]description[&#39;\"] content=[&#39;\"](.*)[&#39;\"]/isU",$this->tmpHtml,$inarr);

2.preg_match("/<meta[\s]+content=[&#39;\"](.*)[&#39;\"] name=[&#39;\"]description[&#39;\"]/isU",$this->tmpHtml,$inarr2);

Copy after login

说明:一些网页能匹配,一些不能


回复讨论(解决方案)

哦对了,忘了说明了,有的网页 是这样的:




keywords 和description没有双引号。匹配不了、希望大哥们帮我完善一下,最好测试通过

不是有个get_meta_tags函数么

不是有个get_meta_tags函数么
+1,可以返回一个meta的数组的,再提取需要的就是了

呵呵,见笑了,恩谢谢啊,foolbirdflyfirst yangball

name在前面:

1

2

3

<meta(\s)name=(\&#39;|\"|)keywords(\&#39;|\"|)(\s*)content=(\&#39;|\"|)(.*)(\&#39;|\"|)(\s*)><meta(\s)name=(\&#39;|\"|)

keywords(\&#39;|\"|)(\s*)content=(\&#39;|\"|)|(\&#39;|\"|)(\s*)><meta(\s)name=(\&#39;|\"|)description(\&#39;|\"|)(\s*)

content=(\&#39;|\"|)(.*)(\&#39;|\"|)(\s*)><meta(\s)name=(\&#39;|\"|)description(\&#39;|\"|)(\s*)content=(\&#39;|\"|)|(\&#39;|\"|)(\s*)>

Copy after login
Copy after login


name在后面:

1

2

3

4

<meta(\s)content=(\&#39;|\"|)(.*)(\&#39;|\"|)(\s*)name=(\&#39;|\"|)keywords(\&#39;|\"|)(\s*)>

<meta(\s)content=(\&#39;|\"|)|(\&#39;|\"|)(\s*)name=(\&#39;|\"|)keywords(\&#39;|\"|)(\s*)>

<meta(\s)content=(\&#39;|\"|)(.*)(\&#39;|\"|)(\s*)name=(\&#39;|\"|)description(\&#39;|\"|)(\s*)>

<meta(\s)content=(\&#39;|\"|)|(\&#39;|\"|)(\s*)name=(\&#39;|\"|)description(\&#39;|\"|)(\s*)>

Copy after login

name在前面:

1

2

3

<meta(\s)name=(\&#39;|\"|)keywords(\&#39;|\"|)(\s*)content=(\&#39;|\"|)(.*)(\&#39;|\"|)(\s*)><meta(\s)name=(\&#39;|\"|)

keywords(\&#39;|\"|)(\s*)content=(\&#39;|\"|)|(\&#39;|\"|)(\s*)><meta(\s)name=(\&#39;|\"|)description(\&#39;|\"|)(\s*)

content=(\&#39;|\"|)(.*)(\&#39;|\"|)(\s*)><meta(\s)name=(\&#39;|\"|)description(\&#39;|\"|)(\s*)content=(\&#39;|\"|)|(\&#39;|\"|)(\s*)>

Copy after login
Copy after login

name在后面:

1

2

3

4

<meta(\s)content=(\&#39;|\"|)(.*)(\&#39;|\"|)(\s*)name=(\&#39;|\"|)keywords(\&#39;|\"|)(\s*)><meta(\s)

content=(\&#39;|\"|)|(\&#39;|\"|)(\s*)name=(\&#39;|\"|)keywords(\&#39;|\"|)(\s*)><meta(\s)content=(\&#39;|\"|)(.*)(\&#39;|\"|)(\s*)

name=(\&#39;|\"|)description(\&#39;|\"|)(\s*)><meta(\s)content=(\&#39;|\"|)|(\&#39;|\"|)(\s*)name=(\&#39;|\"|)

description(\&#39;|\"|)(\s*)>

Copy after login

根据楼上,进一步得出:
name在前:

1

2

3

4

5

6

<(\s*)(meta|META|Meta)(\s*)(name|NAME|Name)=(\&#39;|\"|)(keywords|KEYWORDS|Keywords)(\&#39;|\"|)(\s*)

(content|CONTENT|Content)=(\&#39;|\"|)(.*)(\&#39;|\"|)(\s*)><(\s*)(meta|META|Meta)(\s*)(name|NAME|Name)=(\&#39;|\"|)(

keywords|KEYWORDS|Keywords)(\&#39;|\"|)(\s*)(content|CONTENT|Content)=(\&#39;|\"|)|(\&#39;|\"|)(\s*)><(\s*)(meta|META|Meta)

(\s*)(name|NAME|Name)=(\&#39;|\"|)(description|DESCRIPTION|Description)(\&#39;|\"|)(\s*)

(content|CONTENT|Content)=(\&#39;|\"|)(.*)(\&#39;|\"|)(\s*)><(\s*)(meta|META|Meta)(\s*)(name|NAME|Name)=(\&#39;|\"|)

(description|DESCRIPTION|Description)(\&#39;|\"|)(\s*)(content|CONTENT|Content)=(\&#39;|\"|)|(\&#39;|\"|)(\s*)>

Copy after login

name在后:

1

<(\s*)(meta|META|Meta)(\s*)(content|CONTENT|Content)=(\&#39;|\"|)(.*)(\&#39;|\"|)(\s*)(name|NAME|Name)=(\&#39;|\"|)(keywords|KEYWORDS|Keywords)(\&#39;|\"|)(\s*)><(\s*)(meta|META|Meta)(\s*)(content|CONTENT|Content)=(\&#39;|\"|)|(\&#39;|\"|)(\s*)(name|NAME|Name)=(\&#39;|\"|)(keywords|KEYWORDS|Keywords)(\&#39;|\"|)(\s*)><(\s*)(meta|META|Meta)(\s*)(content|CONTENT|Content)=(\&#39;|\"|)(.*)(\&#39;|\"|)(\s*)(name|NAME|Name)=(\&#39;|\"|)(description|DESCRIPTION|Description)(\&#39;|\"|)(\s*)><(\s*)(meta|META|Meta)(\s*)(content|CONTENT|Content)=(\&#39;|\"|)|(\&#39;|\"|)(\s*)(name|NAME|Name)=(\&#39;|\"|)(description|DESCRIPTION|Description)(\&#39;|\"|)(\s*)>

Copy after login

 以上就是正则获取网页源码keyword和description ,蛋有点疼的内容,更多相关内容请关注PHP中文网(www.php.cn)!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

How to match multiple words or strings using Golang regular expression? How to match multiple words or strings using Golang regular expression? May 31, 2024 am 10:32 AM

Golang regular expressions use the pipe character | to match multiple words or strings, separating each option as a logical OR expression. For example: matches "fox" or "dog": fox|dog matches "quick", "brown" or "lazy": (quick|brown|lazy) matches "Go", "Python" or "Java": Go|Python |Java matches words or 4-digit zip codes: ([a-zA

How to replace a string starting with something with php regular expression How to replace a string starting with something with php regular expression Mar 24, 2023 pm 02:57 PM

PHP regular expressions are a powerful tool for text processing and conversion. It can effectively manage text information by parsing text content and replacing or intercepting it according to specific patterns. Among them, a common application of regular expressions is to replace strings starting with specific characters. We will explain this as follows

How to use regular expressions to remove Chinese characters in php How to use regular expressions to remove Chinese characters in php Mar 03, 2023 am 10:12 AM

How to remove Chinese in PHP using regular expressions: 1. Create a PHP sample file; 2. Define a string containing Chinese and English; 3. Use "preg_replace('/([\x80-\xff]*)/i', '',$a);" The regular method can remove Chinese characters from the query results.

How to use regular matching to remove html tags in php How to use regular matching to remove html tags in php Mar 21, 2023 pm 05:17 PM

In this article, we will learn how to remove HTML tags and extract plain text content from HTML strings using PHP regular expressions. To demonstrate how to remove HTML tags, let's first define a string containing HTML tags.

How to verify if a URL is HTTPS protocol using PHP regex How to verify if a URL is HTTPS protocol using PHP regex Jun 24, 2023 am 08:16 AM

Website security has attracted more and more attention, and using the HTTPS protocol to ensure the security of data transmission has become an important part of current website development. In PHP development, how to use regular expressions to verify whether the URL is HTTPS protocol? here we come to find out. Regular expression Regular expression is an expression used to describe rules. It is a powerful tool for processing text and is widely used in text matching, search and replacement. In PHP development, we can use regular expressions to match http in the URL

Sharing tips on using PHP regular expressions to implement Chinese replacement function Sharing tips on using PHP regular expressions to implement Chinese replacement function Mar 24, 2024 pm 05:57 PM

Sharing tips on using PHP regular expressions to implement the Chinese replacement function. In web development, we often encounter situations where Chinese content needs to be replaced. As a popular server-side scripting language, PHP provides powerful regular expression functions, which can easily realize Chinese replacement. This article will share some techniques for using regular expressions to implement Chinese substitution in PHP, and provide specific code examples. 1. Use the preg_replace function to implement Chinese replacement. The preg_replace function in PHP can be used

PHP regular replacement examples: quickly master replacement skills PHP regular replacement examples: quickly master replacement skills Feb 29, 2024 pm 06:33 PM

PHP regular replacement examples: quickly master replacement skills. With the development of the Internet, website development has become more and more common. In website development, it is often necessary to replace strings, and regular expressions are a very powerful tool that can quickly search and replace strings. This article will introduce how to use regular expressions in the PHP language to perform replacement operations, and provide specific code examples to help readers quickly master replacement techniques. 1.preg_replace function in PHP, you can use preg

An in-depth explanation of PHP regular expression escaping An in-depth explanation of PHP regular expression escaping Mar 21, 2023 pm 02:52 PM

​Regular expressions are a powerful tool for matching strings and can facilitate string manipulation. However, in the process of writing regular expressions, sometimes you may need to match some special characters, such as "\", "|", "{", etc. These characters have special meanings in regular expressions and need to be escaped.

See all articles