nodejs crawler framework superagent
This article walks through a small Node.js crawler built with superagent and the points to watch out for, using a real example.
Preface
I have heard of crawlers for a long time. I started learning Node.js a few days ago and wrote a crawler (https://github.com/leichangchun/node-crawlers/tree/master/superagent_cheerio_demo) that crawls the article titles, user names, read counts, recommendation counts and user avatars from the Blog Park homepage. This is a short summary.
It uses the following:
1. A Node.js core module -- File system
2. A third-party module for HTTP requests -- superagent
3. A third-party module for parsing the DOM -- cheerio
For detailed explanations and the full API of each module, refer to its own documentation. The demo only uses them in simple ways.
Dependencies are managed with npm, and the dependency information is stored in package.json.
Importing the required modules
Request + Parse page
To crawl the content of the Blog Park homepage, first request the homepage address and get the returned HTML. superagent is used here to make the HTTP request.
superagent's get() sends a GET request to the specified URL. If the request fails, an error is passed back (when there is no error, it is null or undefined), and res is the returned response.
After getting the HTML content, we use cheerio to parse the DOM and extract the data we want. cheerio must first load the target HTML and can then query it. Its API is very similar to jQuery's, so anyone familiar with jQuery can get started quickly.
Storing data
After parsing the DOM above, the required text content has been assembled and the image URLs have been collected. Now store them: write the text to a txt file in a specified directory, and download the images to a specified directory.
Create the directory first, using the Node.js core file system module.
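One way to do this, sketched below, uses `fs.mkdirSync` with the `recursive` option (available in recent Node.js versions). This is a simplification chosen for brevity and is not necessarily how the demo does it; the helper name `ensureDir` and the directory names are hypothetical.

```javascript
const fs = require('fs');

// Create the directory, and any missing parent directories, if needed.
// With { recursive: true }, mkdirSync does not throw when the directory
// already exists, so the call is safe to repeat.
function ensureDir(dir) {
  fs.mkdirSync(dir, { recursive: true });
  return dir;
}

// Usage (hypothetical layout):
// const imageDir = ensureDir('./data/images');
```

A synchronous call is acceptable here because it runs once at startup, before any requests are made.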
With the directory in place, you can write the data. The content of the txt file has already been assembled, so write it directly with writeFile().
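A minimal sketch of that write, assuming the text has already been built into a single string; the helper name `saveText` and the file name in the usage comment are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Write the assembled text into dir/fileName, replacing any existing file.
function saveText(dir, fileName, content, callback) {
  const filePath = path.join(dir, fileName);
  fs.writeFile(filePath, content, 'utf8', (err) => {
    if (err) {
      return callback(err);
    }
    callback(null, filePath);
  });
}

// Usage (hypothetical names):
// saveText('./data', 'posts.txt', content, (err, filePath) => {
//   if (err) return console.error(err);
//   console.log('saved to ' + filePath);
// });
```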
For the pictures we only have their links, so superagent is needed to download each picture and save it locally. superagent can return the response as a stream, which can then be piped, using Node.js streams, directly into a local file.
Effect
Run the demo to see the result: the data is crawled as expected.
This is a very simple demo and may not be very rigorous, but it is a first small step into Node.js.