


Using PHP to crawl the posts on the Qiushibaike ("Encyclopedia of Embarrassing Things") homepage
I suddenly felt like grabbing some online data for fun. SAE (Sina App Engine) comes with a MySQL database, and it's no use just leaving it sitting there! So I wrote a small PHP program that crawls the posts on the homepage of Qiushibaike (the "Encyclopedia of Embarrassing Things") and saves them all to MySQL. Wouldn't that be fun!
Just do it! First, the plan:
Fetch the HTML source ---> Parse the HTML ---> Save to the database
Nothing difficult.
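As a rough illustration of that plan (a sketch, not the author's code), the three steps can be kept as separate functions and wired together, so each stage can be swapped out or tested on its own. runPipeline() and its callable parameters are hypothetical names; in this post the stages would be GetHtmlCode(), the Simple HTML DOM loop, and the MySQL insert. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: runs one scrape by fetching the page, parsing it into
// records, and saving each record. Returns how many records were saved.
function runPipeline(string $url, callable $fetch, callable $parse, callable $save): int
{
    $html = $fetch($url);
    if ($html === false || $html === '') {
        return 0; // nothing fetched, so nothing to parse or save
    }
    $count = 0;
    foreach ($parse($html) as $record) {
        $save($record);
        $count++;
    }
    return $count;
}
```

Returning the count gives a cheap sanity check: a run that saves 0 records usually means the fetch failed or the selectors no longer match the page.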
1. Create the PHP file "getDataToDB.php".
2. Fetch the HTML source of the given URL
I use PHP's cURL functions here; see the PHP manual for details.
The code:
```php
// Fetch the HTML source of the given URL
function GetHtmlCode($url)
{
    $ch = curl_init();                            // initialize a cURL handle
    curl_setopt($ch, CURLOPT_URL, $url);          // the page to fetch
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // connection timeout, in seconds
    $HtmlCode = curl_exec($ch);                   // send the request; returns false on failure
    curl_close($ch);                              // release the handle
    return $HtmlCode;
}
```
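GetHtmlCode() returns false without saying why when a request fails, and many sites reject requests that lack a browser-like User-Agent. A more defensive variant might look like this sketch; fetchHtml(), the timeout defaults and the User-Agent string are assumptions, not part of the original post. It needs the cURL extension and PHP 7 or later.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and throw on failure instead of returning false.
function fetchHtml(string $url, int $connectTimeout = 10, int $totalTimeout = 30): string
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,            // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $connectTimeout, // max seconds to establish the connection
        CURLOPT_TIMEOUT        => $totalTimeout,   // max seconds for the whole transfer
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; demo-scraper)', // assumed UA string
    ));
    $body   = curl_exec($ch);
    $error  = curl_error($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 for non-HTTP URLs such as file://
    // The handle is released when $ch goes out of scope.

    if ($body === false) {
        throw new RuntimeException("Request to $url failed: $error");
    }
    if ($status >= 400) {
        throw new RuntimeException("Request to $url returned HTTP $status");
    }
    return $body;
}
```

Throwing an exception lets the caller skip the parse and insert steps entirely when the fetch fails, rather than handing false to str_get_html().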
3. Parse the HTML
I'm not good at regular expressions, so I searched online and found the PHP Simple HTML DOM Parser (simple_html_dom.php, which provides str_get_html() and CSS-style find() selectors). Using it feels just like using Jsoup in Java, which I used earlier to parse the Chuzhou University website for its news list; see my BLOG for details.
The code is as follows:
```php
include_once 'simple_html_dom.php'; // PHP Simple HTML DOM Parser, provides str_get_html()

function getFmlDataToDB()
{
    $link = mysql_connect(SAE_MYSQL_HOST_M . ':' . SAE_MYSQL_PORT, SAE_MYSQL_USER, SAE_MYSQL_PASS);
    // Fetch the page source and parse it
    $html = str_get_html(GetHtmlCode('http://www.qiushibaike.com/'));
    if ($link) {
        mysql_select_db(SAE_MYSQL_DB, $link);
        mysql_query('set names utf8');
        // Each post is a <div class="article block untagged mb15">
        foreach ($html->find('div[class=article block untagged mb15]') as $per) {
            $z = null; $t = null; $w = null; $d = null; $p = null; $ds = null; $ps = null;
            // Author: the second <a> inside div.author holds the name
            $author = $per->find('div[class=author]');
            if (!empty($author)) {
                $a = $author[0]->find('a');
                $z = $a[1]->innertext;
                // Avatar URL: the src attribute of the <img> inside div.author
                $img = $author[0]->find('img', 0);
                $t = $img ? $img->src : '';
            } else {
                $z = 'no author';
                $t = '...............';
            }
            // Post content
            $content = $per->find('div[class=content]');
            $w = $content[0]->innertext;
            // Vote count
            $vote1 = $per->find('div[class=stats]');
            $vote2 = $vote1[0]->find('span[class=stats-vote]');
            $vote3 = $vote2[0]->find('i[class=number]');
            $d = $vote3[0]->innertext;
            // Comment count
            $comments1 = $vote1[0]->find('span[class=stats-comments]');
            $comments2 = $comments1[0]->find('a[class=qiushi_comments]');
            $comments3 = $comments2[0]->find('i[class=number]');
            $p = $comments3[0]->innertext;
            // "Up" count
            $up_down = $per->find('div[class=stats-buttons bar clearfix]');
            $up_down1 = $up_down[0]->find('ul');
            $li = $up_down1[0]->find('li');
            $up = $li[0]->find('span[class=number hidden]');
            $ds = $up[0]->innertext;
            // "Down" count
            $down = $li[1]->find('span[class=number hidden]');
            $ps = $down[0]->innertext;
            // Insert $z, $t, $w, $d, $p, $ds, $ps into the database here (step 4)
        }
    } else {
        echo 'Database connection failed';
    }
}
```
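If you would rather not depend on a third-party library, PHP's built-in DOM extension (DOMDocument plus DOMXPath) can do the same job with XPath instead of CSS selectors. This is a swapped-in technique, not what the post uses: parsePosts() and its array keys are hypothetical, and the class names are copied from the selectors above, so they only match if the page still uses that markup. Note that textContent returns plain text, whereas innertext in Simple HTML DOM returns the inner HTML.

```php
<?php
// Hypothetical helper: extract posts with DOMDocument/DOMXPath (built into PHP's DOM extension).
function parsePosts(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect warnings silently
    // Declare UTF-8 up front so Chinese text is not decoded as ISO-8859-1
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    // Trimmed text of the first node matching $query under $context, or '' if there is none
    $first = function (string $query, DOMNode $context) use ($xpath): string {
        $nodes = $xpath->query($query, $context);
        return ($nodes !== false && $nodes->length > 0) ? trim($nodes->item(0)->textContent) : '';
    };

    $posts = array();
    foreach ($xpath->query('//div[@class="article block untagged mb15"]') as $per) {
        $posts[] = array(
            'author'   => $first('(.//div[@class="author"]//a)[2]', $per),   // second <a> holds the name
            'icon_url' => $first('.//div[@class="author"]//img/@src', $per), // avatar URL
            'content'  => $first('.//div[@class="content"]', $per),
            'vote'     => $first('.//span[@class="stats-vote"]//i[@class="number"]', $per),
            'comments' => $first('.//span[@class="stats-comments"]//i[@class="number"]', $per),
        );
    }
    return $posts;
}
```

The @class="..." tests are exact string matches, the same as the [class=...] selectors in the original, so a post whose class list changes order or gains an extra class will be skipped.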
4. Create the database table and insert the data
Here I use SAE's MySQL; for how to connect, see my post "Using PHP to connect to the MySQL database in SAE".
Pay attention to the character encoding: run this statement before executing your queries, otherwise the Chinese text comes out garbled:
```php
mysql_query('set names utf8');
```
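```php
// Avoid garbled Chinese text
mysql_query('set names utf8');
// Escape the scraped values so quotes inside a post cannot break the SQL
list($z, $t, $w, $d, $p, $ds, $ps) = array_map('mysql_real_escape_string', array($z, $t, $w, $d, $p, $ds, $ps));
$sql = "INSERT INTO `app_bmhjqs`.`db_fml` (`id`, `author`, `icon_url`, `content`, `vote`, `comments`, `up`, `down`)
        VALUES (NULL, '$z', '$t', '$w', '$d', '$p', '$ds', '$ps')";
$result = mysql_query($sql);
```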
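Building the SQL string by hand means every value has to be escaped correctly, and the mysql_* functions were removed in PHP 7.0. A sketch of the same insert with PDO prepared statements, which send the values separately from the SQL text, is shown here; saveRecord() and the record keys are hypothetical, and whether PDO is available on your SAE environment is an assumption. With PDO, the charset in the DSN takes the place of the separate "set names utf8" query.

```php
<?php
// Hypothetical helper: insert one scraped post with a prepared statement.
// Column names follow the INSERT above; `id` is omitted so the database assigns it
// (the original passes NULL for it).
function saveRecord(PDO $pdo, array $r): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO db_fml (author, icon_url, content, vote, comments, up, down)
         VALUES (:author, :icon_url, :content, :vote, :comments, :up, :down)'
    );
    // Values travel separately from the SQL text, so quotes in a post cannot break the query
    $stmt->execute(array(
        ':author'   => $r['author'],
        ':icon_url' => $r['icon_url'],
        ':content'  => $r['content'],
        ':vote'     => $r['vote'],
        ':comments' => $r['comments'],
        ':up'       => $r['up'],
        ':down'     => $r['down'],
    ));
}

// Connecting to SAE's MySQL (assumed to support PDO); charset=utf8 replaces "set names utf8":
// $pdo = new PDO('mysql:host=' . SAE_MYSQL_HOST_M . ';port=' . SAE_MYSQL_PORT
//                . ';dbname=' . SAE_MYSQL_DB . ';charset=utf8',
//                SAE_MYSQL_USER, SAE_MYSQL_PASS,
//                array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));
```

Preparing the statement once outside the foreach loop and calling execute() per post would also avoid re-parsing the same SQL for every record.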
That completes Fetch ---> Parse ---> Insert. Every time the PHP file runs, the posts on the Qiushibaike homepage are added to the database! Then I wondered whether I could write a timer to run the code at a fixed interval. I know how to do that in Java, but not in PHP; after all, I'm still a total rookie! After a quick Baidu search, I found this approach:
```php
// Timer
ignore_user_abort(true); // keep running after the browser disconnects (with no argument it only reads the setting)
set_time_limit(0);       // no execution time limit: run forever
$interval = 60 * 15;     // seconds between runs: every 15 minutes
do {
    echo date('Y-m-d H:i:s', time());
    echo 'Writing to database';
    getFmlDataToDB();
    sleep($interval);    // wait before the next run
} while (true);
```
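A timer loop like this needs an explicit sleep() between runs; without one, the job fires back to back and the same homepage posts are inserted over and over. The sketch below wraps the loop in a function with an optional run limit so it can be stopped and tested; runEvery() and $maxRuns are hypothetical additions, not part of the snippet the post found.

```php
<?php
// Hypothetical helper: call $job every $intervalSeconds seconds.
// $maxRuns = 0 means run forever, like the original do/while (true).
function runEvery(int $intervalSeconds, callable $job, int $maxRuns = 0): int
{
    ignore_user_abort(true); // keep running if the client disconnects
    set_time_limit(0);       // no execution time limit
    $runs = 0;
    while ($maxRuns === 0 || $runs < $maxRuns) {
        $job();
        $runs++;
        if ($maxRuns === 0 || $runs < $maxRuns) {
            sleep($intervalSeconds); // pause so the job does not fire back to back
        }
    }
    return $runs;
}

// Usage in the post's setting: runEvery(15 * 60, 'getFmlDataToDB');
```

Where the host offers a system scheduler such as cron, letting it call the script periodically avoids keeping one PHP request alive forever.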
This morning, I couldn’t wait to turn on my computer and open the SAE database. The situation is as follows:
Oh my god! I couldn't take it, so I quickly turned off the timer and added a button to trigger the crawl by hand instead. If it had kept running, the database would have been flooded!

