How to grab BT Paradise movie data
I had a free evening and wanted to watch a couple of good movies.
I searched for a long time but couldn't find anything I wanted to watch.
Then I remembered that someone had once crawled Zhihu's user data, and I had a sudden idea:
why not crawl the movie information from BT Paradise, so that next time I can just query the database directly?
I can only say I must have been really bored, haha. At least I can still code
^_^
1. Grab the website's HTML source code
    $url = "www.bttiantang.cc";
    $html = shell_exec("curl $url");
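Shelling out to curl works, but the same fetch can also be done in-process with PHP's cURL extension. The following is only a sketch of that alternative: the helper names `buildPageUrl` and `fetchHtml`, the `http://` scheme, and the 10-second timeout are assumptions for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper: build the listing URL for a given page number,
// using the same "?PageNo=" query parameter as the full script.
function buildPageUrl(string $base, int $page): string
{
    return $base . '?PageNo=' . $page;
}

// Hypothetical helper: fetch a page with the cURL extension.
// Returns the response body, or null if the request failed.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed timeout in seconds
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Usage (needs network access, so it is left commented out):
// $html = fetchHtml(buildPageUrl('http://www.bttiantang.cc', 1));
```

Staying in-process avoids spawning one shell per page and makes a failed request easy to detect, since `fetchHtml` returns null rather than an empty string.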
2. Get the total number of pages and the total number of movies (regex matching)
    preg_match("/<span class=\"pageinfo\">.*?<\/span>/", $html, $pageCount);
    preg_match_all("/\d+/", $pageCount[0], $pageCount);
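To make the two-step match concrete, here is a small self-contained sketch run against a made-up pager string. The sample markup and its numbers are invented for illustration; only the `pageinfo` class name and the idea of taking the first two numbers come from the script.

```php
<?php
// Made-up pager markup; the text inside the real element may differ.
$html = '<div><span class="pageinfo">Total 1034 pages, 24810 movies</span></div>';

$pageSize = 0;
$movieCount = 0;

// Step 1: isolate the pager element. Step 2: pull every run of digits out of it.
if (preg_match('/<span class="pageinfo">.*?<\/span>/s', $html, $pager)) {
    preg_match_all('/\d+/', $pager[0], $numbers);
    $pageSize   = (int) ($numbers[0][0] ?? 0); // first number: total pages
    $movieCount = (int) ($numbers[0][1] ?? 0); // second number: total movies
}

echo "pages=$pageSize movies=$movieCount\n"; // prints "pages=1034 movies=24810"
```

Checking the return value of `preg_match` before indexing into the result keeps the script from failing with an undefined-index notice if the site changes its markup.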
3. Extract each movie's information (regex matching)
    preg_match("/\d{4}\/\d{2}\/\d{2}/", $pageInfo[0][$i], $updateTime);
    preg_match("/<font color=\"#FF6600\">(.*?)<i>/", $pageInfo[0][$i], $movieName);
    preg_match("/<strong>(\d)<\/strong>/", $pageInfo[0][$i], $movieScore_int);
    preg_match("/<em class=\"fm\">(\d)<\/em>/", $pageInfo[0][$i], $movieScore_decimal);
    preg_match("/href=\"(.*?)\"/", $pageInfo[0][$i], $movieUrl);
    preg_match("/<p class=\"des\">(.*?)<\/p>/", $pageInfo[0][$i], $actor);
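The patterns above can be wrapped into one function and tried on a sample item. This is a sketch under assumptions: the `parseItem` name and the sample HTML are invented for illustration, and the real page layout may differ from the sample.

```php
<?php
// Hypothetical helper: turn one '<div class="item cl">' block into a row.
// Returns null for blocks without an update date (the script treats those as ads).
function parseItem(string $item): ?array
{
    if (!preg_match('/\d{4}\/\d{2}\/\d{2}/', $item, $date)) {
        return null;
    }
    preg_match('/<font color="#FF6600">(.*?)<i>/', $item, $name);
    preg_match('/<strong>(\d)<\/strong>/', $item, $scoreInt);
    preg_match('/<em class="fm">(\d)<\/em>/', $item, $scoreDec);
    preg_match('/href="(.*?)"/', $item, $url);
    preg_match('/<p class="des">(.*?)<\/p>/', $item, $actor);

    return [
        'name'      => $name[1] ?? '',
        'actor'     => strip_tags($actor[1] ?? ''),        // drops the <em> wrappers
        'url'       => $url[1] ?? '',
        'update_ts' => str_replace('/', '-', $date[0]),    // 2016/05/20 -> 2016-05-20
        'score'     => (float) (($scoreInt[1] ?? '0') . '.' . ($scoreDec[1] ?? '0')),
    ];
}

// Made-up item markup for illustration only.
$sample = '<div class="item cl">'
    . '<p class="tt"><span>2016/05/20</span>'
    . '<a href="/subject/12345.html"><b><font color="#FF6600">Example Movie<i>/Alt Title</i></font></b></a></p>'
    . '<p class="rt"><strong>7</strong><em class="fm">5</em></p>'
    . '<p class="des">Actor A / <em>Actor B</em></p>'
    . '</div>';

$movie = parseItem($sample);
print_r($movie);
```

Regex scraping like this is brittle by nature, so the `?? ''` defaults are there to keep one oddly formatted item from aborting the whole run.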
4. Insert into the database and you’re done
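The insert step can be sketched with PDO and a prepared statement. Several things here are assumptions: an in-memory SQLite database stands in for the MySQL server so the example runs without credentials, the column types are guesses (only the column names come from the script's insert statement), and the row is made up.

```php
<?php
// Assumption: SQLite in memory replaces MySQL so the sketch is runnable as-is.
// For MySQL the DSN would have the form 'mysql:host=...;dbname=...;charset=utf8'.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Column names follow the script's insert statement; the types are assumed.
$pdo->exec('CREATE TABLE movie (name TEXT, actor TEXT, url TEXT, update_ts TEXT, score REAL)');

// Placeholders let the driver handle quoting of every value.
$stmt = $pdo->prepare(
    'INSERT INTO movie (name, actor, url, update_ts, score) VALUES (?, ?, ?, ?, ?)'
);

// Made-up row; the apostrophe would break a query built by string interpolation.
$stmt->execute(["Ocean's Example", 'Actor A / Actor B', '/subject/12345.html', '2016-05-20', 7.5]);

$count = (int) $pdo->query('SELECT COUNT(*) FROM movie')->fetchColumn();
$name  = $pdo->query('SELECT name FROM movie')->fetchColumn();
echo "$count row(s), first name: $name\n";
```

Preparing the statement once and executing it per row also avoids re-parsing the same SQL for each of the 20,000+ inserts.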
Overall, crawling with PHP is quite fast: collecting more than 20,000 records took under 4 minutes (3 minutes 17 seconds, going by the timestamps below).
start: 01:22:54
end: 01:26:11
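As a quick check of the timing, the two timestamps can be subtracted directly. The record count of 20,000 is taken as a lower bound from the text, and the date part is a placeholder, since only the time of day matters.

```php
<?php
// Placeholder date with the two times from the run; UTC avoids DST surprises.
$start = strtotime('1970-01-01 01:22:54 UTC');
$end   = strtotime('1970-01-01 01:26:11 UTC');

$elapsed = $end - $start;                    // 197 seconds
$records = 20000;                            // lower bound: "more than 20,000"
$perSecond = (int) floor($records / $elapsed);

printf(
    "%d s (%d min %d s), at least %d records/s\n",
    $elapsed,
    intdiv($elapsed, 60),
    $elapsed % 60,
    $perSecond
); // prints "197 s (3 min 17 s), at least 101 records/s"
```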
Attached database screenshot:
Attached source code:
    <?php
    // Note: the mysql_* functions used originally were removed in PHP 7,
    // so this listing uses mysqli with a prepared statement instead.
    $url = "www.bttiantang.cc";
    $html = shell_exec("curl $url");

    // Total pages and total movies both come from the pager element.
    preg_match("/<span class=\"pageinfo\">.*?<\/span>/", $html, $pageCount);
    preg_match_all("/\d+/", $pageCount[0], $pageCount);
    $pageSize = intval($pageCount[0][0]);
    $movieCount = $pageCount[0][1];

    $conn = new mysqli('***', '***', '', '***');
    $conn->set_charset('utf8');
    $stmt = $conn->prepare("insert into movie (name,actor,url,update_ts,score) values (?,?,?,?,?)");

    for ($j = 1; $j <= $pageSize; $j++) {
        // Quote the URL so the shell does not treat "?" as a wildcard.
        $movieHtml = shell_exec("curl '$url?PageNo=$j'");
        preg_match_all("/<div class=\"item cl\">.*?<\/div>/s", $movieHtml, $pageInfo);

        for ($i = 0; $i < count($pageInfo[0]); $i++) {
            preg_match("/\d{4}\/\d{2}\/\d{2}/", $pageInfo[0][$i], $updateTime);
            // Ad blocks have no update date; skip them.
            if (empty($updateTime)) continue;
            $updateTime = str_replace('/', '-', $updateTime[0]);

            preg_match("/<font color=\"#FF6600\">(.*?)<i>/", $pageInfo[0][$i], $movieName);
            // Fall back to the other title markups when the first pattern fails.
            if (empty($movieName)) preg_match("/<b>(.*?)<i>/", $pageInfo[0][$i], $movieName);
            if (empty($movieName)) preg_match("/<b>(.*?)<\/b>/", $pageInfo[0][$i], $movieName);
            $movieName = $movieName[1];

            // The score is split into an integer part and a decimal part.
            preg_match("/<strong>(\d)<\/strong>/", $pageInfo[0][$i], $movieScore_int);
            $movieScore_int = $movieScore_int[1];
            preg_match("/<em class=\"fm\">(\d)<\/em>/", $pageInfo[0][$i], $movieScore_decimal);
            $movieScore_decimal = $movieScore_decimal[1];
            $movieScore = floatval($movieScore_int . '.' . $movieScore_decimal);

            preg_match("/href=\"(.*?)\"/", $pageInfo[0][$i], $movieUrl);
            $movieUrl = $movieUrl[1];

            preg_match("/<p class=\"des\">(.*?)<\/p>/", $pageInfo[0][$i], $actor);
            $movieActor = str_replace("<em>", '', str_replace("</em>", '', $actor[1]));

            // Bound parameters keep quotes in titles from breaking the query.
            $stmt->bind_param('ssssd', $movieName, $movieActor, $movieUrl, $updateTime, $movieScore);
            $stmt->execute();
        }
    }
    ?>
This movie information was scraped from BT Paradise and involves no confidential information, so I do not bear any legal responsibility!
If any of it involves your copyright, intellectual property rights, or other interests, please let me know and it will be deleted as soon as possible after confirmation.
Copyright statement: this is an original article by the blogger and may not be reproduced without the blogger's permission.
