Home Backend Development PHP Tutorial PHP爬取糗事百科主页糗事

PHP爬取糗事百科主页糗事

Jun 13, 2016 pm 12:20 PM
find gt mysql quot

PHP爬取糗事百科首页糗事

突然想获取一些网上的数据来玩玩,因为有SAE的MySql数据库,让它在那呆着没有什么卵用!于是就开始用PHP编写一个爬取糗事百科首页糗事的小程序,数据都保存在MySql中,岂不是很好玩!

说干就干!首先确定思路

获取HTML源码--->解析HTML--->保存到数据库

没有什么难的

1、创建PHP文件“getDataToDB.php”,

2、获取指定URL的HTML源码

这里我用的是curl函数,详细内容参见PHP手册

代码为

<span style="font-family:Times New Roman;font-size:14px;">// 获取对应链接的HTMLCODEfunction GetHtmlCode($url) {	$ch = curl_init (); // 初始化一个cur对象	curl_setopt ( $ch, CURLOPT_URL, $url ); // 设置需要抓取的网页	curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, 1 ); // 设置crul参数,要求结果保存到字符串中还是输出到屏幕上	curl_setopt ( $ch, CURLOPT_CONNECTTIMEOUT, 1000 ); // 设置链接延迟	$HtmlCode = curl_exec ( $ch ); // 运行curl,请求网页	return $HtmlCode;}</span>
Copy after login
3、引入第三方文件’simple_html_dom.php‘来解析HTML

这里我没有能力使用正则表达式,就在网上海搜,终于找到这个,就像Java使用Jsoup(使用Jsoup解析滁州学院官网获取新闻列表)一样,具体参见BLOG

代码如下

<span style="font-family:Times New Roman;font-size:14px;">function getFmlDataToDB() {	$link = mysql_connect ( SAE_MYSQL_HOST_M . ':' . SAE_MYSQL_PORT, SAE_MYSQL_USER, SAE_MYSQL_PASS );	// 获取源码	$html = str_get_html ( GetHtmlCode ( "http://www.qiushibaike.com/" ) );		if ($link) {		mysql_select_db ( SAE_MYSQL_DB, $link );		mysql_query ( 'set names utf8' );		// class="article block untagged mb15"		foreach ( $html->find ( 'div[class=article block untagged mb15]' ) as $per ) {						$z = null;			$t = null;			$w = null;			$d = null;			$p = null;			$ds = null;			$ps = null;						// //作者			$author = $per->find ( 'div[class=author]' );			if ($author != null) {				$a = $author [0]->find ( 'a' );				$z = $a [1]->innertext;			} else {				$z = 'no author';			}						// 头像链接						if ($author != null) {				$icon = $author [0]->find ( 'a' );				$t = $icon [0]->src->innertext;			} else {				$t = '...............';			}						// 文章内容			$content = $per->find ( 'div[class=content]' );			$w = $content [0]->innertext;						// 点赞数			$vote1 = $per->find ( 'div[class=stats]' );			$vote2 = $vote1 [0]->find ( 'span[class=stats-vote]' );			$vote3 = $vote2 [0]->find ( 'i[class=number]' );						$d = $vote3 [0]->innertext;			// 评论数			$comments1 = $vote1 [0]->find ( 'span[class=stats-comments]' );			$comments2 = $comments1 [0]->find ( 'a[class=qiushi_comments]' );			$comments3 = $comments2 [0]->find ( 'i[class=number]' );			$p = $comments3 [0]->innertext;			// 顶 数			$up_down = $per->find ( 'div[class=stats-buttons bar clearfix]' );						$up_down1 = $up_down [0]->find ( 'ul' );			$li = $up_down1 [0]->find ( 'li' );			$up = $li [0]->find ( 'span[class=number hidden]' );			$ds = $up [0]->innertext;			// 拍 数			$down = $li [1]->find ( 'span[class=number hidden]' );			$ps = $down [0]->innertext;		}	} else {		echo '数据库链接KO';	}}</span>
Copy after login
这个代码写的有点纠结,我试了一下不能直接获取子节点的数据,只能从外层一层一层的剥开解析,如果有新的写法,我会更新,也请各位看官看看。

4、创建数据库,将数据插入到数据库中

这里我使用的SAE中的MySQL,具体的连接方发参见使用PHP连接SAE中的MySql数据库

需要注意的就是编码格式,区要在执行语句前加上这样一句话

<span style="font-family:Microsoft YaHei;font-size:14px;">mysql_query ( 'set names utf8' );</span>
Copy after login
核心代码如下:

<span style="font-family:Microsoft YaHei;font-size:14px;">			$sql = "INSERT INTO `app_bmhjqs`.`db_fml` (`id`, `author`, `icon_url`, `content`, `vote`, `comments`, `up`, `down`) VALUES (NULL, '$z', '$t', '$w', '$d', '$p', '$ds', '$ps');";			// 解决乱码			mysql_query ( 'set names utf8' );			$result = mysql_query ( $sql );</span>
Copy after login

这样一来,获取--->解析--->插入就完成了,效果就是运行一次PHP文件,数据库就添加了糗事百科首页上的糗事!我想可不可以写个定时器,每隔一定时间就运行一次代码,这一点在java我可以实现,在php我不会,毕竟是个没长毛的小鸟!百度吧。。。搜到这样的写法

<span style="font-family:Times New Roman;font-size:14px;">// 定时器// ignore_user_abort (); // run script. in background// set_time_limit ( 0 ); // run script. forever// $interval = 30; // do every 15 minutes..// do {// 	echo date ( 'Y-m-d H:i:s', time () );// 	echo '写入数据库';// 	//getFmlDataToDB ();	// } while ( true );</span>
Copy after login
在文件里加上这样的代码,正好在学校断网前,发布到了SAE上,我没有测试!只能等到第二天来查看结果了!

今天早上,我迫不及待的打开电脑,打开SAE数据库,情况如下:

额滴神!受不鸟了,赶紧把定时器关掉了,写了个按钮触发事件!这样下去,数据库会被挤满的!

好了,PHP爬取糗事百科首页糗事就此完成

如果你感觉这篇Blog对你有所帮助,就点个赞吧!



Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

MySQL's Role: Databases in Web Applications MySQL's Role: Databases in Web Applications Apr 17, 2025 am 12:23 AM

The main role of MySQL in web applications is to store and manage data. 1.MySQL efficiently processes user information, product catalogs, transaction records and other data. 2. Through SQL query, developers can extract information from the database to generate dynamic content. 3.MySQL works based on the client-server model to ensure acceptable query speed.

How to start mysql by docker How to start mysql by docker Apr 15, 2025 pm 12:09 PM

The process of starting MySQL in Docker consists of the following steps: Pull the MySQL image to create and start the container, set the root user password, and map the port verification connection Create the database and the user grants all permissions to the database

Laravel Introduction Example Laravel Introduction Example Apr 18, 2025 pm 12:45 PM

Laravel is a PHP framework for easy building of web applications. It provides a range of powerful features including: Installation: Install the Laravel CLI globally with Composer and create applications in the project directory. Routing: Define the relationship between the URL and the handler in routes/web.php. View: Create a view in resources/views to render the application's interface. Database Integration: Provides out-of-the-box integration with databases such as MySQL and uses migration to create and modify tables. Model and Controller: The model represents the database entity and the controller processes HTTP requests.

Solve database connection problem: a practical case of using minii/db library Solve database connection problem: a practical case of using minii/db library Apr 18, 2025 am 07:09 AM

I encountered a tricky problem when developing a small application: the need to quickly integrate a lightweight database operation library. After trying multiple libraries, I found that they either have too much functionality or are not very compatible. Eventually, I found minii/db, a simplified version based on Yii2 that solved my problem perfectly.

How to install mysql in centos7 How to install mysql in centos7 Apr 14, 2025 pm 08:30 PM

The key to installing MySQL elegantly is to add the official MySQL repository. The specific steps are as follows: Download the MySQL official GPG key to prevent phishing attacks. Add MySQL repository file: rpm -Uvh https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm Update yum repository cache: yum update installation MySQL: yum install mysql-server startup MySQL service: systemctl start mysqld set up booting

Laravel framework installation method Laravel framework installation method Apr 18, 2025 pm 12:54 PM

Article summary: This article provides detailed step-by-step instructions to guide readers on how to easily install the Laravel framework. Laravel is a powerful PHP framework that speeds up the development process of web applications. This tutorial covers the installation process from system requirements to configuring databases and setting up routing. By following these steps, readers can quickly and efficiently lay a solid foundation for their Laravel project.

MySQL and phpMyAdmin: Core Features and Functions MySQL and phpMyAdmin: Core Features and Functions Apr 22, 2025 am 12:12 AM

MySQL and phpMyAdmin are powerful database management tools. 1) MySQL is used to create databases and tables, and to execute DML and SQL queries. 2) phpMyAdmin provides an intuitive interface for database management, table structure management, data operations and user permission management.

Centos install mysql Centos install mysql Apr 14, 2025 pm 08:09 PM

Installing MySQL on CentOS involves the following steps: Adding the appropriate MySQL yum source. Execute the yum install mysql-server command to install the MySQL server. Use the mysql_secure_installation command to make security settings, such as setting the root user password. Customize the MySQL configuration file as needed. Tune MySQL parameters and optimize databases for performance.

See all articles