
How to grab BT Paradise movie data

Jul 30, 2016 pm 01:30 PM

I had some free time one evening and wanted to watch a couple of good movies.

I searched for a long time but couldn't find anything I wanted to watch.

Then I remembered that someone had once crawled Zhihu's user data, and I had a sudden idea:

why not crawl the movie information from BT Paradise, so that next time I can simply query the database?

I can only say that I was really bored, haha. At least I can still code ^_^


1. Grab the website's HTML source code

$url = "www.bttiantang.cc";
$html = shell_exec("curl -s " . escapeshellarg($url));
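As an alternative to shelling out to the curl binary, the fetch in step 1 can be sketched with PHP's cURL extension. This is a minimal sketch rather than the author's method: the helper names `buildPageUrl` and `fetchHtml`, the `http://` scheme, and the 10-second timeout are assumptions. The `PageNo` query parameter is the one used in the full source at the end of the article.

```php
<?php

// Hypothetical helper: build the listing URL for a given page number.
// The http:// scheme is an assumption; the article passes the bare host to curl.
function buildPageUrl(string $host, int $page): string
{
    return "http://" . $host . "/?" . http_build_query(['PageNo' => $page]);
}

// Hypothetical helper: fetch a URL with the cURL extension and return the body,
// or null when the request fails.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // assumed timeout in seconds
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Only the URL is built here; no request is sent.
echo buildPageUrl("www.bttiantang.cc", 3), "\n";
```

Calling `fetchHtml(buildPageUrl("www.bttiantang.cc", 1))` would then take the place of the `shell_exec` call, without starting a separate process for each page.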

2. Get the total number of pages and the total number of movies (regex matching)

preg_match("/<span class=\"pageinfo\">.*?<\/span>/", $html, $pageCount);
preg_match_all("/\d+/",$pageCount[0],$pageCount);
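To see how the two regular expressions in step 2 work together, here is a small self-contained sketch. The helper name `parsePageInfo` and the sample markup are invented for illustration; the real page may lay out the `pageinfo` span differently. The sketch assumes, as the article does, that the first number in the span is the page count and the second is the movie count.

```php
<?php

// Hypothetical helper: return [pageCount, movieCount] from the listing HTML,
// or null when the pageinfo span (or its two numbers) cannot be found.
function parsePageInfo(string $html): ?array
{
    if (!preg_match('/<span class="pageinfo">.*?<\/span>/s', $html, $span)) {
        return null;
    }
    // Collect every run of digits inside the span.
    preg_match_all('/\d+/', $span[0], $numbers);
    if (count($numbers[0]) < 2) {
        return null;
    }
    return [(int) $numbers[0][0], (int) $numbers[0][1]];
}

// Invented sample markup, for illustration only.
$sample = '<div><span class="pageinfo">Total <strong>812</strong> pages, '
    . '<strong>24350</strong> records</span></div>';

var_dump(parsePageInfo($sample));
```

Returning null instead of indexing blindly into the match arrays makes a layout change on the site show up as an explicit failure rather than as a PHP notice.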

3. Extract the movie information (regex matching)

preg_match("/\d{4}\/\d{2}\/\d{2}/" , $pageInfo[0][$i], $updateTime);

preg_match("/<font color=\"#FF6600\">(.*?)<i>/" , $pageInfo[0][$i], $movieName);

preg_match("/<strong>(\d{1})<\/strong>/" , $pageInfo[0][$i], $movieScore_int);

preg_match("/<em class=\"fm\">(\d{1})<\/em>/" , $pageInfo[0][$i], $movieScore_decimal);

preg_match("/href=\"(.*?)\"/" , $pageInfo[0][$i], $movieUrl);

preg_match("/<p class=\"des\">(.*?)<\/p>/" , $pageInfo[0][$i], $actor);
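The patterns above can be tried out on a single item without touching the network. The sketch below wraps them in a hypothetical `parseMovieItem` helper; the sample HTML is invented so that it matches the patterns, and the real markup on the site may differ. The two fallback title patterns are the ones used in the full source at the end of the article.

```php
<?php

// Hypothetical helper: turn one listing item into an array, or return null
// for items without an update date (the full source treats those as ads).
function parseMovieItem(string $item): ?array
{
    if (!preg_match('/\d{4}\/\d{2}\/\d{2}/', $item, $date)) {
        return null;
    }

    // Try the title patterns in order, from most to least specific.
    $name = '';
    $titlePatterns = ['/<font color="#FF6600">(.*?)<i>/', '/<b>(.*?)<i>/', '/<b>(.*?)<\/b>/'];
    foreach ($titlePatterns as $pattern) {
        if (preg_match($pattern, $item, $m)) {
            $name = $m[1];
            break;
        }
    }

    // The score is split into an integer digit and a decimal digit.
    $int = preg_match('/<strong>(\d)<\/strong>/', $item, $m) ? $m[1] : '0';
    $dec = preg_match('/<em class="fm">(\d)<\/em>/', $item, $m) ? $m[1] : '0';
    $url = preg_match('/href="(.*?)"/', $item, $m) ? $m[1] : '';
    $actor = preg_match('/<p class="des">(.*?)<\/p>/', $item, $m)
        ? str_replace(['<em>', '</em>'], '', $m[1])
        : '';

    return [
        'name'      => $name,
        'actor'     => $actor,
        'url'       => $url,
        'update_ts' => str_replace('/', '-', $date[0]),
        'score'     => (float) ($int . '.' . $dec),
    ];
}

// Invented sample item, for illustration only.
$sample = '<div class="item cl"><p class="tt cl"><span>2016/07/29</span>'
    . '<a href="/subject/12345.html"><b><font color="#FF6600">Sample Movie<i>/Alt Title</i></font></b></a></p>'
    . '<p class="rt"><strong>7</strong><em class="dian">.</em><em class="fm">6</em></p>'
    . '<p class="des">Starring: <em>Actor A</em> / <em>Actor B</em></p></div>';

print_r(parseMovieItem($sample));
```

Keeping the extraction in one function that returns either a complete record or null also makes it easy to test the patterns against saved copies of the page before running a full crawl.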

4. Insert into the database and you’re done
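For the insert step, a prepared statement is a safer way to write each row, because a title or actor list containing a quote character cannot break the SQL. The following is a minimal sketch under stated assumptions: it uses PDO, the helper name `insertMovie` is hypothetical, and the connection values in the comment are placeholders. The table and column names are the ones from the article's own insert statement.

```php
<?php

// Hypothetical helper: insert one movie row through a PDO prepared statement.
// $movie is expected to have the keys name, actor, url, update_ts and score.
function insertMovie(PDO $db, array $movie): void
{
    $stmt = $db->prepare(
        'INSERT INTO movie (name, actor, url, update_ts, score)
         VALUES (:name, :actor, :url, :update_ts, :score)'
    );
    $stmt->execute([
        ':name'      => $movie['name'],
        ':actor'     => $movie['actor'],
        ':url'       => $movie['url'],
        ':update_ts' => $movie['update_ts'],
        ':score'     => $movie['score'],
    ]);
}

// Usage with placeholder connection values (assumptions, not run here):
// $db = new PDO('mysql:host=localhost;dbname=movies;charset=utf8', 'user', 'password');
// insertMovie($db, [
//     'name' => "It's a Sample", 'actor' => 'Actor A / Actor B',
//     'url' => '/subject/12345.html', 'update_ts' => '2016-07-29', 'score' => 7.6,
// ]);
```

For a crawl of tens of thousands of rows, it would likely be worth preparing the statement once outside the loop and reusing it; the helper prepares it per call only to keep the sketch short.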

Overall, PHP crawls quite quickly: collecting more than 20,000 records took less than 4 minutes.

start: 01:22:54

end: 01:26:11
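The two timestamps can be turned into an elapsed time and a rough throughput figure. In this short sketch the helper name `clockToSeconds` is hypothetical, and 20,000 is used as a lower bound because the article only says "more than 20,000".

```php
<?php

// Convert an HH:MM:SS clock time into seconds since midnight.
function clockToSeconds(string $clock): int
{
    [$h, $m, $s] = array_map('intval', explode(':', $clock));
    return $h * 3600 + $m * 60 + $s;
}

$elapsed = clockToSeconds('01:26:11') - clockToSeconds('01:22:54');

// 20000 is the lower bound quoted in the article, so this is a minimum rate.
$perSecond = 20000 / $elapsed;

printf("%d seconds, at least %.1f records per second\n", $elapsed, $perSecond);
// prints "197 seconds, at least 101.5 records per second"
```

So the run took 3 minutes 17 seconds, or roughly a hundred records per second including the page downloads.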



Attached database screenshot:



Attached source code:

<?php

$url = "www.bttiantang.cc";
// Quote the URL so the shell does not interpret characters such as "?".
$html = shell_exec("curl -s " . escapeshellarg($url));

// The pageinfo span holds the page count followed by the movie count.
preg_match("/<span class=\"pageinfo\">.*?<\/span>/", $html, $pageCount);
preg_match_all("/\d+/",$pageCount[0],$pageCount);

$pageSize = intval($pageCount[0][0]);
$movieCount = $pageCount[0][1];

// The mysql_* functions were removed in PHP 7, so this version uses PDO.
$conn = new PDO('mysql:host=***;dbname=***;charset=utf8','***','');
// A prepared statement keeps quotes in names or actor lists from breaking the query.
$insert = $conn->prepare("insert into movie (name,actor,url,update_ts,score) values (?,?,?,?,?)");

for($j=1;$j<=$pageSize;$j++){
    $movieHtml = shell_exec("curl -s " . escapeshellarg("$url?PageNo=$j"));
    preg_match_all("/<div class=\"item cl\">.*?<\/div>/s", $movieHtml, $pageInfo);
    for($i=0;$i<count($pageInfo[0]);$i++){
        preg_match("/\d{4}\/\d{2}\/\d{2}/" , $pageInfo[0][$i], $updateTime);
        /******skip ads: they have no update date*****/
        if(empty($updateTime))continue;
        /*******************/
        $updateTime = str_replace('/','-',$updateTime[0]);

        preg_match("/<font color=\"#FF6600\">(.*?)<i>/" , $pageInfo[0][$i], $movieName);
        /*****fall back to the other title markups*****/
        if(empty($movieName))
            preg_match("/<b>(.*?)<i>/" , $pageInfo[0][$i], $movieName);
        if(empty($movieName))
            preg_match("/<b>(.*?)<\/b>/" , $pageInfo[0][$i], $movieName);
        /************************/
        $movieName = isset($movieName[1]) ? $movieName[1] : '';

        // The score is split across two tags: integer part and decimal part.
        preg_match("/<strong>(\d{1})<\/strong>/" , $pageInfo[0][$i], $movieScore_int);
        $movieScore_int = isset($movieScore_int[1]) ? $movieScore_int[1] : '0';
        preg_match("/<em class=\"fm\">(\d{1})<\/em>/" , $pageInfo[0][$i], $movieScore_decimal);
        $movieScore_decimal = isset($movieScore_decimal[1]) ? $movieScore_decimal[1] : '0';
        $movieScore = floatval($movieScore_int.'.'.$movieScore_decimal);

        preg_match("/href=\"(.*?)\"/" , $pageInfo[0][$i], $movieUrl);
        $movieUrl = isset($movieUrl[1]) ? $movieUrl[1] : '';

        preg_match("/<p class=\"des\">(.*?)<\/p>/" , $pageInfo[0][$i], $actor);
        $movieActor = isset($actor[1]) ? str_replace(array("<em>","</em>"),'',$actor[1]) : '';

        $insert->execute(array($movieName,$movieActor,$movieUrl,$updateTime,$movieScore));
    }

}

?>

This movie information is grabbed from BT Paradise and does not involve confidential information. Therefore, I do not bear any legal responsibility!

If any relevant movie information involves your copyright or intellectual property rights or other interests, please inform us and it will be deleted as soon as possible after confirmation.

Copyright Statement: This article is an original article by the blogger and may not be reproduced without the blogger's permission.
