正则抓取天涯数据,递归失败,求解。。。。
前言:我的面向对象基础一般。。
我是抓的一个模块(鬼话)。
它的下一页是用时间戳来搞的。
我就想,在抓取第一页的所有标题URL之后,顺便抓取下一页的地址(‘可以抓取’);
现在是,第一页的所有标题的URL抓完了,下一页的URL也抓了,
我想递归100次,抓100页的所有标题的URL。
看代码吧。
public function getAllPage($url){ /** * curl_setopt($ch, CURLOPT_FAILONERROR, true);//记录错误信息设置 * curl_errno可以获得错误码,当然也包括错误的http状态码 curl_error可以获得错误信息 */ $ch = curl_init($url);//初始化一个句柄 curl_setopt($ch,CURLOPT_RETURNTRANSFER,true); curl_setopt($ch,CURLOPT_TIMEOUT,1111111); $html = curl_exec($ch); curl_close($ch); //修饰一下,从何时开始。 $length = strpos($html, 'class="mt5'); $newHtml = substr($html, $length); //修饰END $pattern = "#\/post-.*\.shtml#i"; //正则表达式 preg_match_all($pattern, $newHtml,$matches); //抓取下一页链接地址 $nextPagePattern = "#\<a href=\"(.*)\"\srel#"; preg_match($nextPagePattern, $newHtml,$nextPage); $nextPageUrl = "http://bbs.tianya.cn".$nextPage['1']; //下一页 END //链接。全。 foreach($matches['0'] as $k=>$v){ $matches[$k] = 'http://bbs.tianya.cn'.$v; } //之前的递归是在这里的,一运行直接死掉了。。。。。 return array( '0'=>$matches, '1'=>$nextPageUrl, ); }
我想问一下,这个思路有没有问题?
递归的代码方便给一个么= =!!
回复讨论(解决方案)
foreach($ matches['0'] as $k=>$v){
$matches[$k] = 'http://bbs.tianya.cn'.$v;
}
循环中修改数组的意义是什么呢?
你的递归部分的代码也贴出来
foreach($ matches['0'] as $k=>$v){
$matches[$k] = 'http://bbs.tianya.cn'.$v;
}
循环中修改数组的意义是什么呢?
你的递归部分的代码也贴出来
1:补全,抓取到的地址没有域名。
2:递归。。。我那个注释之前是这样写的,
for($i=0;$i<100;$i++){ $this->getAllPage($nextPageUrl) }
for($i=0;$i $this->getAllPage($nextPageUrl)
}
这么做并不是递归抓取100次。
而是循环了一百次,每次都在执行一个递归函数,而你的递归函数是没有出口的(没有跳出递归的出口,会导致无限递归),当然会死了。
for($i=0;$i $this->getAllPage($nextPageUrl)
}
这么做并不是递归抓取100次。
而是循环了一百次,每次都在执行一个递归函数,而你的递归函数是没有出口的(没有跳出递归的出口,会导致无限递归),当然会死了。
请赐教?
for($i=0;$i $this->getAllPage($nextPageUrl)
}
这么做并不是递归抓取100次。
而是循环了一百次,每次都在执行一个递归函数,而你的递归函数是没有出口的(没有跳出递归的出口,会导致无限递归),当然会死了。
出口就是判断么?
对你的需求,可以这么做:
public function getAllPage($url, $depth, &$result)
$depth控制递归的深度,初始为0。 引用型的$result 记录最终的匹配到的结果。
递归的跳出部分:
if($depth == 100){
return;
}
递归函数的递归部分:
$nextPageUrl = "http://bbs.tianya.cn".$nextPage['1'];
foreach($matches['0'] as $k=>$v){
$result[] = 'http://bbs.tianya.cn'.$v;
}
getAllPage($nextPageUrl,$depth+1,$result);
递归函数初始调用:
$result = array();
getAllPage($url,0,$result);
for($i=0;$i $this->getAllPage($nextPageUrl)
}
这么做并不是递归抓取100次。
而是循环了一百次,每次都在执行一个递归函数,而你的递归函数是没有出口的(没有跳出递归的出口,会导致无限递归),当然会死了。
出口就是判断么?
还真不是。你即使去掉循环,只用getAllPage(...)那一部分,也会死的。
对你的需求,可以这么做:
public function getAllPage($url, $depth, &$result)
$depth控制递归的深度,初始为0。 引用型的$result 记录最终的匹配到的结果。
递归的跳出部分:
if($depth == 100){
return;
}
递归函数的递归部分:
$nextPageUrl = "http://bbs.tianya.cn".$nextPage['1'];
foreach($matches['0'] as $k=>$v){
$result[] = 'http://bbs.tianya.cn'.$v;
}
getAllPage($nextPageUrl,$depth+1,$result);
递归函数初始调用:
$result = array();
getAllPage($url,0,$result);
多谢!!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics











In PHP, password_hash and password_verify functions should be used to implement secure password hashing, and MD5 or SHA1 should not be used. 1) password_hash generates a hash containing salt values to enhance security. 2) Password_verify verify password and ensure security by comparing hash values. 3) MD5 and SHA1 are vulnerable and lack salt values, and are not suitable for modern password security.

PHP type prompts to improve code quality and readability. 1) Scalar type tips: Since PHP7.0, basic data types are allowed to be specified in function parameters, such as int, float, etc. 2) Return type prompt: Ensure the consistency of the function return value type. 3) Union type prompt: Since PHP8.0, multiple types are allowed to be specified in function parameters or return values. 4) Nullable type prompt: Allows to include null values and handle functions that may return null values.

PHP is mainly procedural programming, but also supports object-oriented programming (OOP); Python supports a variety of paradigms, including OOP, functional and procedural programming. PHP is suitable for web development, and Python is suitable for a variety of applications such as data analysis and machine learning.

PHP and Python have their own advantages and disadvantages, and the choice depends on project needs and personal preferences. 1.PHP is suitable for rapid development and maintenance of large-scale web applications. 2. Python dominates the field of data science and machine learning.

Using preprocessing statements and PDO in PHP can effectively prevent SQL injection attacks. 1) Use PDO to connect to the database and set the error mode. 2) Create preprocessing statements through the prepare method and pass data using placeholders and execute methods. 3) Process query results and ensure the security and performance of the code.

PHP uses MySQLi and PDO extensions to interact in database operations and server-side logic processing, and processes server-side logic through functions such as session management. 1) Use MySQLi or PDO to connect to the database and execute SQL queries. 2) Handle HTTP requests and user status through session management and other functions. 3) Use transactions to ensure the atomicity of database operations. 4) Prevent SQL injection, use exception handling and closing connections for debugging. 5) Optimize performance through indexing and cache, write highly readable code and perform error handling.

PHP is used to build dynamic websites, and its core functions include: 1. Generate dynamic content and generate web pages in real time by connecting with the database; 2. Process user interaction and form submissions, verify inputs and respond to operations; 3. Manage sessions and user authentication to provide a personalized experience; 4. Optimize performance and follow best practices to improve website efficiency and security.

PHP is suitable for web development and rapid prototyping, and Python is suitable for data science and machine learning. 1.PHP is used for dynamic web development, with simple syntax and suitable for rapid development. 2. Python has concise syntax, is suitable for multiple fields, and has a strong library ecosystem.
