
Crawling the posts on the homepage of Qiushibaike (the Encyclopedia of Embarrassing Things) with PHP

Jul 13, 2016 am 09:53 AM


I suddenly wanted to grab some online data for fun. I have a MySQL database on SAE that was sitting idle, so I wrote a small PHP program that crawls the posts on the homepage of Qiushibaike (the Encyclopedia of Embarrassing Things) and saves them all to MySQL. Wouldn't that be fun!

Let's do it! First, settle on the approach:

Get HTML source code--->Parse HTML--->Save to database

Nothing difficult.

1. Create the PHP file "getDataToDB.php"

2. Get the HTML source code of the specified URL

I use the cURL functions here; see the PHP manual for details.

The code is:

<span new="" style="font-family:Times">// Fetch the HTML source of the given URL
function GetHtmlCode($url) {
	$ch = curl_init (); // initialize a cURL handle
	curl_setopt ( $ch, CURLOPT_URL, $url ); // the page to fetch
	curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, 1 ); // return the response as a string instead of printing it
	curl_setopt ( $ch, CURLOPT_CONNECTTIMEOUT, 1000 ); // connection timeout, in seconds
	$HtmlCode = curl_exec ( $ch ); // run the request; returns false on failure
	curl_close ( $ch ); // release the handle
	return $HtmlCode;
}</span>
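Because `curl_exec` returns `false` when a request fails, the fetch step benefits from an explicit failure check. The following is a minimal sketch of that idea, not the code used in this post: the helper name `fetchHtml`, its `$timeoutSeconds` parameter and the 10-second default are assumptions chosen for illustration.

```php
<?php
/**
 * Fetch a page and return its body, or null when the request fails.
 * fetchHtml() is a hypothetical helper, not part of the original script.
 */
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // give up connecting after this many seconds
        CURLOPT_TIMEOUT        => $timeoutSeconds, // cap the whole transfer as well
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() describes what went wrong (DNS failure, timeout, ...)
        error_log('cURL error: ' . curl_error($ch));
        return null;
    }
    return $body;
}
```

A caller can then skip parsing when `fetchHtml('http://www.qiushibaike.com/')` returns `null` instead of handing `false` to the HTML parser.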
3. Include the third-party file 'simple_html_dom.php' to parse the HTML

I am not good enough with regular expressions to parse the page that way, so I searched online and eventually found this library. It works much like Jsoup in Java (which I used earlier to parse the official website of Chuzhou University and extract its news list). For details, see the blog.

The code is as follows

<span new="" style="font-family:Times">function getFmlDataToDB() {
	$link = mysql_connect ( SAE_MYSQL_HOST_M . ':' . SAE_MYSQL_PORT, SAE_MYSQL_USER, SAE_MYSQL_PASS );
	// Fetch the page source and parse it
	$html = str_get_html ( GetHtmlCode ( 'http://www.qiushibaike.com/' ) );
	
	if ($link) {
		mysql_select_db ( SAE_MYSQL_DB, $link );
		mysql_query ( 'set names utf8' );
		// Each post is a div with class="article block untagged mb15"
		foreach ( $html->find ( 'div[class=article block untagged mb15]' ) as $per ) {
			
			$z = null; // author
			$t = null; // avatar URL
			$w = null; // post content
			$d = null; // vote count
			$p = null; // comment count
			$ds = null; // up count
			$ps = null; // down count
			
			// Author
			$author = $per->find ( 'div[class=author]' );
			if ($author != null) {
				$a = $author [0]->find ( 'a' );
				$z = $a [1]->innertext;
			} else {
				$z = 'no author';
			}
			
			// Avatar URL
			if ($author != null) {
				$icon = $author [0]->find ( 'a' );
				// Assumption: the first link wraps the avatar <img>, whose src is the URL
				$img = $icon [0]->find ( 'img' );
				$t = $img != null ? $img [0]->src : '';
			} else {
				$t = '...............';
			}
			
			// Post content
			$content = $per->find ( 'div[class=content]' );
			$w = $content [0]->innertext;
			
			// Vote count
			$vote1 = $per->find ( 'div[class=stats]' );
			$vote2 = $vote1 [0]->find ( 'span[class=stats-vote]' );
			$vote3 = $vote2 [0]->find ( 'i[class=number]' );
			
			$d = $vote3 [0]->innertext;
			// Comment count
			$comments1 = $vote1 [0]->find ( 'span[class=stats-comments]' );
			$comments2 = $comments1 [0]->find ( 'a[class=qiushi_comments]' );
			$comments3 = $comments2 [0]->find ( 'i[class=number]' );
			$p = $comments3 [0]->innertext;
			// Up count
			$up_down = $per->find ( 'div[class=stats-buttons bar clearfix]' );
			
			$up_down1 = $up_down [0]->find ( 'ul' );
			$li = $up_down1 [0]->find ( 'li' );
			$up = $li [0]->find ( 'span[class=number hidden]' );
			$ds = $up [0]->innertext;
			// Down count
			$down = $li [1]->find ( 'span[class=number hidden]' );
			$ps = $down [0]->innertext;

		}
	} else {
		echo 'Database connection failed';
	}
}</span>
This code is a bit clumsy. I tried but could not get the data of the nested child nodes directly, so I had to peel off the outer layers and parse them one level at a time. If I find a better way to write it, I will update this post.
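One alternative to peeling the layers by hand is to select nested nodes in a single query. The sketch below swaps simple_html_dom for PHP's built-in `DOMDocument` and `DOMXPath`, which is a different library from the one used in this post. The sample markup and the `extractPosts` helper are assumptions made for illustration, modelled on the class names in the code above rather than on the real page.

```php
<?php
// Hypothetical, simplified markup modelled on the class names used in this post.
$sampleHtml = <<<HTML
<div class="article block untagged mb15">
  <div class="author"><a href="#"><img src="avatar.jpg"></a><a href="#">someone</a></div>
  <div class="content">An embarrassing story</div>
  <div class="stats">
    <span class="stats-vote"><i class="number">120</i></span>
    <span class="stats-comments"><a class="qiushi_comments"><i class="number">7</i></a></span>
  </div>
</div>
HTML;

/** Extract content, vote count and comment count from every post block. */
function extractPosts(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect warnings silently
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html); // the prefix makes the parser read UTF-8
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $posts = [];
    foreach ($xpath->query('//div[@class="article block untagged mb15"]') as $post) {
        // Each expression reaches a nested node in one step, relative to the current post.
        $posts[] = [
            'content'  => trim($xpath->evaluate('string(.//div[@class="content"])', $post)),
            'vote'     => (int) $xpath->evaluate('string(.//span[@class="stats-vote"]//i[@class="number"])', $post),
            'comments' => (int) $xpath->evaluate('string(.//span[@class="stats-comments"]//i[@class="number"])', $post),
        ];
    }
    return $posts;
}

$posts = extractPosts($sampleHtml);
```

The `.//` prefix keeps each lookup inside the current post, so values from different posts are never mixed.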

4. Create the database table and insert the data

Here I use the MySQL service in SAE. For how to connect, see "Using PHP to connect to the MySql database in SAE".

Pay attention to the character encoding: run this statement before executing the query, otherwise the stored text will be garbled.

<span style="font-family:Microsoft">mysql_query ( 'set names utf8' );</span>
The core code is as follows:

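<span style="font-family:Microsoft">			// Avoid garbled text: set the connection charset first
			mysql_query ( 'set names utf8' );
			// Escape the scraped values so quotes in the text cannot break the statement
			list ( $z, $t, $w, $d, $p, $ds, $ps ) = array_map ( 'mysql_real_escape_string', array ( $z, $t, $w, $d, $p, $ds, $ps ) );
			$sql = "INSERT INTO `app_bmhjqs`.`db_fml` (`id`, `author`, `icon_url`, `content`, `vote`, `comments`, `up`, `down`) VALUES (NULL, '$z', '$t', '$w', '$d', '$p', '$ds', '$ps')";
			$result = mysql_query ( $sql );</span>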
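Scraped text often contains quotes, which makes an INSERT built by string interpolation fragile. A prepared statement avoids this by sending the values separately from the SQL. The following is a sketch only: it uses PDO rather than the `mysql_*` functions from this post, and an in-memory SQLite database stands in for the SAE MySQL instance so that the example runs anywhere. The `savePost` helper and the sample values are hypothetical; the column names follow the `db_fml` table above.

```php
<?php
// Assumption: in-memory SQLite replaces the SAE MySQL database for this sketch.
// With MySQL the DSN would instead start with "mysql:host=...".
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE db_fml (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    author TEXT, icon_url TEXT, content TEXT,
    vote INTEGER, comments INTEGER, up INTEGER, down INTEGER
)');

/** Insert one scraped post; savePost() is a hypothetical helper. */
function savePost(PDO $pdo, array $post): void
{
    // Named placeholders keep the values out of the SQL text entirely.
    $stmt = $pdo->prepare(
        'INSERT INTO db_fml (author, icon_url, content, vote, comments, up, down)
         VALUES (:author, :icon_url, :content, :vote, :comments, :up, :down)'
    );
    $stmt->execute($post);
}

// Sample values, including quotes that would break an interpolated statement.
savePost($pdo, [
    'author'   => "O'Brien",
    'icon_url' => 'http://example.com/avatar.jpg',
    'content'  => 'He said "oops" and it\'s still funny',
    'vote'     => 120,
    'comments' => 7,
    'up'       => 130,
    'down'     => 10,
]);

$count = (int) $pdo->query('SELECT COUNT(*) FROM db_fml')->fetchColumn();
$row   = $pdo->query('SELECT author, vote FROM db_fml')->fetch(PDO::FETCH_ASSOC);
```

The row is stored with its quotes intact and no manual escaping is needed.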

That completes Get--->Parse--->Insert. Each run of the PHP file adds the posts currently on the site's homepage to the database. Next I wondered whether I could write a timer to run the code at a fixed interval. I know how to do that in Java but not in PHP; I am still a beginner, after all. A Baidu search turned up this approach:

<span new="" style="font-family:Times">// Timer
// ignore_user_abort (); // keep running in the background after the client disconnects
// set_time_limit ( 0 ); // remove the execution time limit
// $interval = 30; // intended interval between runs, in seconds

// do {
// 	echo date ( 'Y-m-d H:i:s', time () );
// 	echo 'Writing to the database';
// 	//getFmlDataToDB ();
	
// } while ( true );</span>
I added this code to the file and deployed it to SAE just before the school cut off the network for the night, so I had no chance to test it. I could only wait until the next day to check the results!

This morning, I couldn’t wait to turn on my computer and open the SAE database. The situation is as follows:

Oh my god! I couldn't let that go on, so I quickly turned off the timer and wrote a button to trigger the crawl manually instead. At that rate the database would soon have been flooded!
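The timer loop as shown never calls `sleep()`, so `$interval` has no effect and each pass starts immediately after the previous one. One way to throttle it is sketched below. The helper `runPeriodically` and its parameters are hypothetical, and the closure stands in for `getFmlDataToDB()`; it also caps the number of runs, which the original loop does not.

```php
<?php
/**
 * Run $task up to $maxRuns times, pausing $intervalSeconds between runs.
 * runPeriodically() is a hypothetical helper for illustration.
 */
function runPeriodically(callable $task, int $intervalSeconds, int $maxRuns): int
{
    $runs = 0;
    while ($runs < $maxRuns) {
        $task();
        $runs++;
        if ($runs < $maxRuns) {
            sleep($intervalSeconds); // the pause that the original loop never takes
        }
    }
    return $runs;
}

// Usage: three runs, one second apart; the closure stands in for getFmlDataToDB().
$log = [];
$completed = runPeriodically(function () use (&$log) {
    $log[] = date('Y-m-d H:i:s');
}, 1, 3);
```

Even with the pause, every run inserts the whole homepage again, so the same post can be stored more than once unless duplicates are filtered.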

