Detailed explanation of the web request module for a Node.js crawler
This article introduces nodegrass, a web request module written for a Node.js crawler. The details are as follows:
Note: some methods have changed in the latest version of nodegrass, so the examples in this article may no longer work. See the examples at the open source address for details.
1. Why write such a module?
The author wanted to write a crawler in Node.js. The official Node.js API already makes requesting remote resources very simple (see http://nodejs.org/api/http.html). It provides two methods for HTTP requests: http.get(options, callback) and http.request(options, callback).
As the names suggest, get is for GET requests, while request accepts more parameters, such as other request methods and the port of the remote host. HTTPS requests work much the same way. The simplest example:
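A minimal sketch of such a request using only the built-in http module. The function name printResponse and its optional done callback (added only so the result can be checked) are illustrative, and www.example.com is a placeholder URL.

```javascript
const http = require('http');

// Official-API style: the response body arrives in pieces through 'data'
// events, so the chunks must be collected until 'end' fires.
function printResponse(url, done) {
  http.get(url, (res) => {
    console.log('STATUS: ' + res.statusCode);
    console.log('HEADERS: ' + JSON.stringify(res.headers));
    const chunks = [];
    res.on('data', (d) => {
      chunks.push(d); // d is one Buffer chunk, not necessarily the whole body
    });
    res.on('end', () => {
      const body = Buffer.concat(chunks).toString('utf8');
      console.log('BODY: ' + body);
      if (done) done(null, res.statusCode, body);
    });
  }).on('error', (e) => {
    console.error('Got error: ' + e.message);
    if (done) done(e);
  });
}

// Usage (real network request):
// printResponse('http://www.example.com/');
```

Even in this small case there are three callbacks nested inside each other: one for the response, one for each chunk and one for the end of the stream.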
In the code above, all we want is to request the remote host and get the response information: the status, the headers and the body. The second argument of get is a callback through which the response arrives asynchronously. Inside that callback, res listens for 'data' events, and the second argument of on is yet another callback, which finally receives d, a chunk of the response you requested. Doing anything with d will very likely introduce more callbacks, layer upon layer, until you feel dizzy. . . Asynchronous programming confuses many developers who are used to writing synchronous code. Of course, there are good libraries, at home and abroad, that let you write asynchronous code in a synchronous style, such as Lao Zhao's Wind.js... but that is getting off topic. What we ultimately want from get is the response itself; we don't care about the listening steps such as res.on. Being lazy, I didn't want to write res.on('data', func) every time, and so nodegrass, the module introduced today, was born.
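In current Node.js, the nesting described above is usually flattened with Promises and async/await rather than with a library like Wind.js. A sketch under that assumption: getText and crawl are hypothetical names, and the body is assumed to be UTF-8.

```javascript
const http = require('http');

// Wrap http.get in a Promise that resolves once the whole body has arrived,
// so callers never touch res.on('data') themselves.
function getText(url) {
  return new Promise((resolve, reject) => {
    http.get(url, (res) => {
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => resolve({
        status: res.statusCode,
        headers: res.headers,
        body: Buffer.concat(chunks).toString('utf8'),
      }));
      res.on('error', reject);
    }).on('error', reject);
  });
}

// Reads top to bottom like synchronous code, but does not block the event loop.
async function crawl(urls) {
  const pages = [];
  for (const url of urls) {
    const { status, body } = await getText(url);
    pages.push({ url, status, length: body.length });
  }
  return pages;
}
```

The pages are fetched one after another here; Promise.all could fetch them concurrently if the target site tolerates it.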
2. Requesting resources with nodegrass, like jQuery's $.get(url, func)
The simplest example:
At first glance it looks almost the same as the official get =.=! It just lacks the layer of event-listening callbacks from res.on('data', func). Believe it or not, it already feels much more comfortable to me. The second argument is again a callback: data is the response body, status is the response status code, and headers are the response headers. Once we have the response, we can extract whatever information we are interested in; this example simply prints it to the console. The third argument is the character encoding. Node.js's Buffer does not support gbk, so nodegrass uses iconv-lite internally to handle it. If the page you request is gbk-encoded, as Baidu's is, just pass this argument.
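To make the idea concrete, here is a hypothetical helper, simpleGet, with the same callback shape (data, status, headers) and an optional encoding argument. It is not nodegrass's source: instead of iconv-lite it uses the WHATWG TextDecoder built into Node.js, which accepts labels such as 'gbk' only when Node.js is built with full ICU.

```javascript
const http = require('http');

// Hypothetical helper in the spirit of the get method described above:
// callback(data, status, headers) is called once, with the whole body
// already decoded. This is NOT nodegrass's source code.
function simpleGet(url, callback, encoding = 'utf-8') {
  // Throws a RangeError up front if this Node build cannot decode the label.
  const decoder = new TextDecoder(encoding);
  return http.get(url, (res) => {
    const chunks = [];
    res.on('data', (chunk) => chunks.push(chunk));
    res.on('end', () => {
      // Decode only after concatenating: a multi-byte character
      // may be split across two chunks.
      const data = decoder.decode(Buffer.concat(chunks));
      callback(data, res.statusCode, res.headers);
    });
  });
}

// Usage (real network request; 'gbk' needs a full-ICU Node build):
// simpleGet('http://www.baidu.com/', (data, status, headers) => {
//   console.log(status);
//   console.log(headers);
//   console.log(data);
// }, 'gbk').on('error', (e) => console.error('Got error: ' + e.message));
```

Returning the request object lets the caller attach an 'error' listener, much as the official http.get does.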
So what about HTTPS requests? With the official API you have to require the https module, but its get method works much like the http one, so nodegrass integrates the two. Look at the example:
nodegrass automatically detects whether to use http or https from the URL. Of course, the URL must include the scheme: you cannot write just www.baidu.com/, you need http://www.baidu.com/.
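One way such detection can work, sketched with the standard URL class; clientFor and getAny are hypothetical names, and nodegrass may implement this differently. It also shows why the scheme is required: new URL() throws a TypeError for a bare www.baidu.com/.

```javascript
const http = require('http');
const https = require('https');

// Pick the http or https module from the URL's scheme.
// new URL() throws a TypeError when the scheme is missing.
function clientFor(url) {
  const { protocol } = new URL(url);
  if (protocol === 'http:') return http;
  if (protocol === 'https:') return https;
  throw new Error('Unsupported protocol: ' + protocol);
}

// Both modules expose get(url, callback) with the same shape,
// so the caller does not need to care which one was picked.
function getAny(url, onResponse) {
  return clientFor(url).get(url, onResponse);
}
```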
For POST requests, nodegrass provides the post method. See the example:
The code above is part of a Sina Weibo OAuth 2.0 request for an accessToken; it uses nodegrass's post to call the access_token API.
Compared with get, the post method takes two extra parameters, headers (the request headers) and options (the POST data), both of which are object literals:
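A sketch of a post helper that takes the headers and the data as object literals, built on http.request. The name simplePost and its parameter order are hypothetical, and form encoding (application/x-www-form-urlencoded) is assumed as the body format, which is what OAuth 2.0 token endpoints expect. The field names in the test (grant_type, code) are illustrative only.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical post helper: `headers` and `data` are plain object literals.
// The body is form-encoded; caller headers are merged last, so they can
// override the defaults (the body itself stays form-encoded).
function simplePost(url, callback, headers = {}, data = {}) {
  const body = new URLSearchParams(data).toString();
  const target = new URL(url);
  const client = target.protocol === 'https:' ? https : http;
  const req = client.request(target, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'Content-Length': Buffer.byteLength(body),
      ...headers,
    },
  }, (res) => {
    const chunks = [];
    res.on('data', (chunk) => chunks.push(chunk));
    res.on('end', () => {
      callback(Buffer.concat(chunks).toString('utf8'), res.statusCode, res.headers);
    });
  });
  req.end(body); // send the encoded data and finish the request
  return req;    // caller can attach an 'error' listener
}
```

Content-Length is computed with Buffer.byteLength rather than body.length so that non-ASCII data is measured in bytes.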
3. Using nodegrass as a proxy server?
Look at the example:
It’s that simple. Of course, a real proxy server is much more complicated and this is not one, but at least when you visit local port 8088, you see the blog page, don't you?
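The same trick can be sketched with the built-in http module alone: a local server that answers every request by fetching one fixed upstream page and streaming it back. The name createForwarder and the upstream URL are placeholders, and like the original it is not a real proxy, since it ignores the incoming path, method and body.

```javascript
const http = require('http');

// Every incoming request is answered by fetching `upstream` and piping
// its status, headers and body back to the client.
function createForwarder(upstream) {
  return http.createServer((clientReq, clientRes) => {
    http.get(upstream, (upRes) => {
      clientRes.writeHead(upRes.statusCode, upRes.headers);
      upRes.pipe(clientRes); // stream the body without buffering it
    }).on('error', (e) => {
      clientRes.writeHead(502, { 'Content-Type': 'text/plain' });
      clientRes.end('Upstream error: ' + e.message);
    });
  });
}

// Usage: createForwarder('http://www.example.com/').listen(8088);
// then open http://localhost:8088/ in a browser.
```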
The open source address of nodegrass: https://github.com/scottkiss/nodegrass