Sample code sharing of http crawler under node-JS Tutorial-php.cn

Home

Web Front-end

JS Tutorial

Sample code sharing of http crawler under node

小云云

Jan 13, 2018 am 09:10 AM

http node Example

This article mainly introduces the sample code of the http crawler based on node. The editor thinks it is quite good, so I will share it with you now and give it as a reference. Let’s follow the editor to take a look, I hope it can help everyone.

Every moment, whether you sleep or not, there will be massive data coming and going on the Internet, from customer service to server, and from server to server. The role completed by http's get and request is to obtain and submit data. Next, we start writing a simple small crawler to crawl the course interface of the chapter about node in the novice tutorial.

Crawl all the data on the home page of the Node.js tutorial

Create node-http.js, the code is as follows, there are detailed comments in the code, you can understand it by yourself Ha

var http=require(&#39;http&#39;);//获取http模块
var url=&#39;/nodejs/nodejs-tutorial.html&#39;;//定义node官网地址变量

http.get(url,function(res){
  var html=&#39;&#39;;

  // 这里将会触发data事件，不断触发不断跟新html直至完毕
  res.on(&#39;data&#39;,function(data){
    html +=data
  })

  // 当数据获取完成将会触发end事件，这里将会打印初node官网的html
  res.on(&#39;end&#39;,function(){
    console.log(html)
  })
}).on(&#39;error&#39;,function(){
  console.log(&#39;获取node官网相关数据出错&#39;)
})

Copy after login

In the terminal execution result, it was found that all the html of this page has been crawled down

G:\node\node-http> node node-http.js
<!Doctype html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta property="qc:admins" content="465267610762567726375" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Node.js 教程 | 菜鸟教程</title>
<link rel=&#39;dns-prefetch&#39; href=&#39;//s.w.org&#39; />
<link rel="canonical" href="http://www.php.cn/nodejs/nodejs-tutorial.html" />
<meta name="keywords" content="Node.js 教程,node,Node.js,nodejs">
<meta name="description" content="Node.js 教程  简单的说 Node.js 就是运行在服务端的 JavaScript。 Node.js 是一个基于Chrome JavaScript 运行时建立的一个平台
。 Node.js是一个事件驱动I/O服务端JavaScript环境，基于Google的V8引擎，V8引擎执行Javascript的速度非常快，性能非常好。  谁适合阅读本教程？ 如果你是一个前端程序员，你不懂得像PHP、Python或Ruby等动态编程语言，..">
<link rel="shortcut icon" href="//static.runoob.com/images/favicon.ico" rel="external nofollow" rel="external nofollow" mce_href="//static.runoob.com/images/favicon.ico" rel="external nofollow" rel="external nofollow" type="image/x-icon">
<link rel="stylesheet" href="/wp-content/themes/runoob/style.css?v=1.141" rel="external nofollow" type="text/css" media="all" />
<link rel="stylesheet" href="//cdn.bootcss.com/font-awesome/4.7.0/css/font-awesome.min.css" rel="external nofollow" media="all" />
<!--[if gte IE 9]><!-->
。。。。。。。。。。
这里只展示部分不然你半天看不到头

Copy after login

Of course, crawl it HTML is of no use to us. Now we need to do some filtering. For example, in this node tutorial, I want to know what the course catalog is, so that I can choose the ones I am interested in to learn. Let’s go directly to the code:

But before that we need to download the cheerio module (cheerio is nodejs’ page crawling module, specially customized for the server, a fast, flexible and implemented jQuery core implementation. Suitable for all kinds of Web crawler program.) You can search for details on your own. The use of cheerio is very similar to the use of jquery, so you don’t have to worry about getting started.

PS G:\node\node-http> npm install cheerio

Copy after login

Create node-http-more.js, the code is as follows:

var http=require(&#39;http&#39;);//获取http模块
var cheerio=require(&#39;cheerio&#39;);//引入cheerio模块
var url=&#39;http://www.php.cn/nodejs/nodejs-tutorial.html&#39;;//定义node官网地址变量
// filer node chapter
function filerNodeChapter(html){
  // 将爬取得HTML装载起来
  var $=cheerio.load(html);
  // 拿到左侧边栏的每个目录
  var nodeChapter=$(&#39;#leftcolumn a&#39;);
  //这里我希望我能获取的到的最终数据格式这个样子的,如此我们能知道每个目录的地址及标题
  /**
   * [{id:,title:}]
   */
  var chapterData=[];
  nodeChapter.each(function(item){
    // 获取每项的地址及标题
    var id=$(this).attr(&#39;href&#39;);
    var title=$(this).text();
    chapterData.push({
      id:id,
      title:title
    })
  })

  return chapterData;

}

//获取每个数据
function getChapterData(nodeChapter){
  nodeChapter.forEach(function(item){
    console.log(&#39; 【 &#39;+item.id+&#39; 】&#39;+item.title+&#39;\n&#39;)
  });
}

http.get(url,function(res){
  var html=&#39;&#39;;

  // 这里将会触发data事件，不断触发不断跟新html直至完毕
  res.on(&#39;data&#39;,function(data){
    html +=data
  })

  // 当数据获取完成将会触发end事件，这里将会打印初node官网的html
  res.on(&#39;end&#39;,function(){
    //console.log(html)
    // 过滤出node.js的课程目录
    var nodeChapter= filerNodeChapter(html);

    //循环打印所获取的数据
    getChapterData(nodeChapter)
  })
}).on(&#39;error&#39;,function(){
  console.log(&#39;获取node官网相关数据出错&#39;)
})

Copy after login

Terminal execution results and printing Out of the course catalog

G:\node\node-http> node node-http-more.js
 【 /nodejs/nodejs-tutorial.html 】
Node.js 教程

 【 /nodejs/nodejs-install-setup.html 】
Node.js 安装配置

 【 /nodejs/nodejs-http-server.html 】
Node.js 创建第一个应用

 【 nodejs-npm.html 】 NPM 使用介绍

 【 nodejs-repl.html 】 Node.js REPL

 【 nodejs-callback.html 】 Node.js 回调函数

 【 nodejs-event-loop.html 】 Node.js 事件循环

 【 nodejs-event.html 】 Node.js EventEmitter

 【 nodejs-buffer.html 】 Node.js Buffer

 【 nodejs-stream.html 】 Node.js Stream

 【 /nodejs/nodejs-module-system.html 】
Node.js 模块系统
。。。。。。。。。。。
这里就不全部给出，你可以自己尝试着运行操作查看所有结果

Copy after login

Related recommendations:

Detailed explanation of the web request module of Node.js crawler

Node.js Development Information Crawler Process Code Sharing

NodeJS Encyclopedia Crawler Example Tutorial

The above is the detailed content of Sample code sharing of http crawler under node. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

4 weeks ago By DDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

3 weeks ago By DDD

Where to find the Crane Control Keycard in Atomfall

4 weeks ago By DDD

Roblox: Dead Rails - How To Complete Every Challenge

1 months ago By DDD

How to fix KB5055523 fails to install in Windows 11?

2 weeks ago By DDD

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7737

Java Tutorial

1643

CakePHP Tutorial

1397

Laravel Tutorial

1290

PHP Tutorial

1233

Related knowledge

Understand common application scenarios of web page redirection and understand the HTTP 301 status code Feb 18, 2024 pm 08:41 PM

Understand the meaning of HTTP 301 status code: common application scenarios of web page redirection. With the rapid development of the Internet, people's requirements for web page interaction are becoming higher and higher. In the field of web design, web page redirection is a common and important technology, implemented through the HTTP 301 status code. This article will explore the meaning of HTTP 301 status code and common application scenarios in web page redirection. HTTP301 status code refers to permanent redirect (PermanentRedirect). When the server receives the client's

Oracle DECODE function detailed explanation and usage examples Mar 08, 2024 pm 03:51 PM

The DECODE function in Oracle is a conditional expression that is often used to return different results based on different conditions in query statements. This article will introduce the syntax, usage and sample code of the DECODE function in detail. 1. DECODE function syntax DECODE(expr,search1,result1[,search2,result2,...,default]) expr: the expression or field to be compared. search1,

Introduction to Python functions: Usage and examples of isinstance function Nov 04, 2023 pm 03:15 PM

Introduction to Python functions: Usage and examples of the isinstance function Python is a powerful programming language that provides many built-in functions to make programming more convenient and efficient. One of the very useful built-in functions is the isinstance() function. This article will introduce the usage and examples of the isinstance function and provide specific code examples. The isinstance() function is used to determine whether an object is an instance of a specified class or type. The syntax of this function is as follows

Go language indentation specifications and examples Mar 22, 2024 pm 09:33 PM

Indentation specifications and examples of Go language Go language is a programming language developed by Google. It is known for its concise and clear syntax, in which indentation specifications play a crucial role in the readability and beauty of the code. effect. This article will introduce the indentation specifications of the Go language and explain in detail through specific code examples. Indentation specifications In the Go language, tabs are used for indentation instead of spaces. Each level of indentation is one tab, usually set to a width of 4 spaces. Such specifications unify the coding style and enable teams to work together to compile

Pi Node Teaching: What is a Pi Node? How to install and set up Pi Node? Mar 05, 2025 pm 05:57 PM

Detailed explanation and installation guide for PiNetwork nodes This article will introduce the PiNetwork ecosystem in detail - Pi nodes, a key role in the PiNetwork ecosystem, and provide complete steps for installation and configuration. After the launch of the PiNetwork blockchain test network, Pi nodes have become an important part of many pioneers actively participating in the testing, preparing for the upcoming main network release. If you don’t know PiNetwork yet, please refer to what is Picoin? What is the price for listing? Pi usage, mining and security analysis. What is PiNetwork? The PiNetwork project started in 2019 and owns its exclusive cryptocurrency Pi Coin. The project aims to create a one that everyone can participate

HTTP 200 OK: Understand the meaning and purpose of a successful response Dec 26, 2023 am 10:25 AM

HTTP Status Code 200: Explore the Meaning and Purpose of Successful Responses HTTP status codes are numeric codes used to indicate the status of a server's response. Among them, status code 200 indicates that the request has been successfully processed by the server. This article will explore the specific meaning and use of HTTP status code 200. First, let us understand the classification of HTTP status codes. Status codes are divided into five categories, namely 1xx, 2xx, 3xx, 4xx and 5xx. Among them, 2xx indicates a successful response. And 200 is the most common status code in 2xx

http request 415 error solution Nov 14, 2023 am 10:49 AM

Solution: 1. Check the Content-Type in the request header; 2. Check the data format in the request body; 3. Use the appropriate encoding format; 4. Use the appropriate request method; 5. Check the server-side support.

Application and example analysis of PHP dot operator Mar 28, 2024 pm 12:06 PM

Application and example analysis of PHP dot operator In PHP, the dot operator (".") is an operator used to connect two strings. It is very commonly used and very flexible when concatenating strings. By using the dot operator, we can easily concatenate multiple strings to form a new string. The following will introduce the use of PHP dot operators through example analysis. 1. Basic usage First, let’s look at a basic usage example. Suppose there are two variables $str1 and $str2, which store two words respectively.

See all articles