
Use Python to Crawl Video Information for All of Station B (Bilibili)

Feb 19, 2024 pm 11:45 PM

I think everyone is familiar with Station B (Bilibili). Searching for Station B crawlers turns up plenty of results, but what you learn from reading alone stays shallow; to really understand something you have to do it yourself, so here is my own attempt. In the end, the crawl collected about 7.6 million records.

Preparation

First open Station B, pick any video on the homepage, and click into it. As usual, open the browser's developer tools. The goal this time is to get video information from the API that Station B provides instead of parsing web pages, because parsing pages is too slow and makes it more likely that your IP address gets blocked.

Select the JS filter and press F5 to refresh the page.


This reveals the API address.


Copy it, strip the unnecessary parameters, and you get https://api.bilibili.com/x/web-interface/archive/stat?aid=15906633. Opening it in a browser returns JSON data.
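
The same request can be made from Python. This is a minimal sketch using only the standard library; the names `build_url` and `fetch_stat`, the User-Agent header, and the timeout are illustrative assumptions, and the endpoint may have changed or be rate-limited since the article was written.

```python
import json
import urllib.request

# Endpoint taken from the article; {aid} is the numeric video id.
API_URL = "https://api.bilibili.com/x/web-interface/archive/stat?aid={aid}"


def build_url(aid):
    """Return the stat URL for one video id (aid)."""
    return API_URL.format(aid=aid)


def fetch_stat(aid, timeout=10):
    """Request the stat endpoint for one aid and return the decoded JSON as a dict."""
    request = urllib.request.Request(
        build_url(aid),
        # Assumption: a browser-like User-Agent. The article does not say
        # which headers the original crawler sent.
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request when run as a script.
    print(fetch_stat(15906633))
```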


Hands-on coding

With that, we can start writing code. The data is collected by sending requests in a loop. To make the crawler more efficient, multi-threading can be used.

Core code
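
As a rough sketch of what such core code might look like (not the author's listing), the function below turns one decoded API response into a flat row. The field names `code`, `data`, `aid`, `view`, `danmaku`, `reply`, `favorite`, `coin`, and `share` are assumptions about the JSON layout, and `parse_stat` is a hypothetical helper name.

```python
# Assumed field names inside the "data" object of the response.
FIELDS = ("aid", "view", "danmaku", "reply", "favorite", "coin", "share")


def parse_stat(payload):
    """Turn one decoded API response into a row tuple, or None if it is unusable."""
    # Assumption: a "code" of 0 marks a successful response.
    if payload.get("code") != 0:
        return None
    data = payload.get("data") or {}
    if "aid" not in data:
        return None
    # Missing counters default to 0 so every row has the same width.
    return tuple(data.get(name, 0) for name in FIELDS)


if __name__ == "__main__":
    # Made-up sample payload, used only to show the shape of the result.
    sample = {"code": 0, "data": {"aid": 1, "view": 10, "reply": 3}}
    print(parse_stat(sample))
```

Keeping the parsing separate from the network request makes it possible to check the row format without touching the API.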


Iterative crawling
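
One way to iterate over video ids with multiple threads is a thread pool from the standard library. This is a sketch under assumptions, not the original code: `crawl` and `safe` are hypothetical names, the worker count is arbitrary, and `fetch` stands for whatever function requests and parses a single id.

```python
from concurrent.futures import ThreadPoolExecutor


def safe(fetch):
    """Wrap fetch so one failed request returns None instead of aborting the crawl."""
    def wrapper(aid):
        try:
            return fetch(aid)
        except Exception:
            return None
    return wrapper


def crawl(aids, fetch, workers=8):
    """Call fetch(aid) for every aid on a thread pool and collect the non-empty results."""
    rows = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input ids.
        for row in pool.map(fetch, aids):
            if row is not None:
                rows.append(row)
    return rows


if __name__ == "__main__":
    # Stand-in fetch function so the sketch runs without network access.
    def fake_fetch(aid):
        return (aid, aid * 10)

    print(len(crawl(range(1, 101), safe(fake_fetch))))
```

Threads suit this job because each request spends most of its time waiting on the network. A real crawl would likely also need a delay or retry policy, which is left out here.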


The most important part of the entire project is only about 20 lines of code, which is quite concise.

The output while running looks roughly like this; the number shows how many links have been crawled so far. In practice, the information for the entire site can be crawled in one or two days.


How you process the data after crawling is up to you. I first saved it as CSV files, then merged them and inserted the result into a database.
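
The CSV-then-database step could be sketched as follows. The article does not say which database was used, so SQLite stands in here as an assumption; the table name `video_stat`, the column names, and the helper names `save_csv` and `load_csv_into_db` are all hypothetical.

```python
import csv
import sqlite3

# Hypothetical column names for the crawled counters.
COLUMNS = ("aid", "views", "danmaku", "replies", "favorites", "coins", "shares")


def save_csv(rows, path):
    """Write crawled rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(rows)


def load_csv_into_db(path, connection):
    """Create the table if needed, insert every CSV row, and return the row count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS video_stat ("
        "aid INTEGER PRIMARY KEY, views INTEGER, danmaku INTEGER, "
        "replies INTEGER, favorites INTEGER, coins INTEGER, shares INTEGER)"
    )
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header line
        rows = [tuple(int(value) for value in line) for line in reader]
    # INSERT OR REPLACE keeps one row per aid if a file is loaded twice.
    connection.executemany(
        "INSERT OR REPLACE INTO video_stat VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    connection.commit()
    return len(rows)
```

Writing to CSV first means an interrupted crawl loses nothing already saved, and the database load can be repeated or moved to a different engine later.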

Database Table


Because I crawled this content a few months ago, the data is already somewhat out of date.

Total amount of data


Query the top ten videos


Check the top ten videos with the most replies
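
The queries in this part (total row count, top ten videos, top ten by replies) can be sketched against SQLite as below. The table name `video_stat` and the columns `views` and `replies` are hypothetical, ranking the plain "top ten" by views is an assumption, and the article's actual database and schema are not known.

```python
import sqlite3

# Columns that may be used for ranking; hypothetical names.
RANKABLE = ("views", "replies")


def total_rows(connection):
    """Return the total number of crawled videos in the table."""
    return connection.execute("SELECT COUNT(*) FROM video_stat").fetchone()[0]


def top_videos(connection, column, limit=10):
    """Return (aid, value) pairs for the videos with the highest value in column."""
    # Column names cannot be bound as SQL parameters, so check against a whitelist.
    if column not in RANKABLE:
        raise ValueError("unsupported column: " + str(column))
    query = (
        "SELECT aid, " + column + " FROM video_stat "
        "ORDER BY " + column + " DESC LIMIT ?"
    )
    return connection.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    # Made-up rows in an in-memory database, used only for demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE video_stat (aid INTEGER PRIMARY KEY, views INTEGER, replies INTEGER)"
    )
    conn.executemany(
        "INSERT INTO video_stat VALUES (?, ?, ?)", [(1, 50, 9), (2, 70, 1), (3, 60, 5)]
    )
    print(total_rows(conn))
    print(top_videos(conn, "views"))
    print(top_videos(conn, "replies"))
```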

