


Node.js practical experience: controlling concurrency with the eventproxy module
Goal
Create a lesson4 project and write code in it.
The entry point of the code is app.js. When node app.js is run, it should output the title, link and first comment of every topic on the CNode (https://cnodejs.org/) community homepage, in JSON format.
Output example:
[ { "title": "【公告】发招聘帖的同学留意一下这里", "href": "http://cnodejs.org/topic/541ed2d05e28155f24676a12", "comment1": "呵呵呵呵" }, { "title": "发布一款 Sublime Text 下的 JavaScript 语法高亮插件", "href": "http://cnodejs.org/topic/54207e2efffeb6de3d61f68f", "comment1": "沙发!" } ]
Challenge
Based on the above goal, also output the author of comment1 and that author's score in the CNode community.
Example:
[ { "title": "【公告】发招聘帖的同学留意一下这里", "href": "http://cnodejs.org/topic/541ed2d05e28155f24676a12", "comment1": "呵呵呵呵", "author1": "auser", "score1": 80 }, ... ]
Knowledge Points
Experience the beauty of Node.js’ callback hell
Learn to use eventproxy, a tool to control concurrency
Course content
In this lesson we come to the most awesome part of Node.js: asynchronous concurrency.
In the previous lesson, we covered how to use superagent and cheerio to fetch the homepage content, which only required a single HTTP GET request. This time we need the first comment of each topic, so we have to request each topic's link and use cheerio to extract its first comment.
CNode currently shows 40 topics per page, so we need to make 1 + 40 requests to achieve the goal of this lesson.
We will initiate the latter 40 requests concurrently :), and we will not run into multi-threading or locks; Node.js's concurrency model is different from multi-threading, so set those concepts aside. As for deeper questions such as why asynchronous is asynchronous, or why Node.js can be single-threaded yet concurrent, I won't go into them here. For students interested in this, I highly recommend @puling's《九浅一深 Node.js》: http://book.douban.com/subject/25768396/.
Some more seasoned readers may have heard of concepts such as promises and generators. As for me, I will only talk about callbacks here, mainly because I personally just like callbacks.
We need to use three libraries for this course: superagent, cheerio, and eventproxy (https://github.com/JacksonTian/eventproxy).
You can do the scaffolding work yourself, and we will write the program together step by step.
At first, app.js should look like this:
var eventproxy = require('eventproxy');
var superagent = require('superagent');
var cheerio = require('cheerio');
// The url module is part of the Node.js standard library
// http://nodejs.org/api/url.html
var url = require('url');

var cnodeUrl = 'https://cnodejs.org/';

superagent.get(cnodeUrl)
  .end(function (err, res) {
    if (err) {
      return console.error(err);
    }
    var topicUrls = [];
    var $ = cheerio.load(res.text);
    // Get all the topic links on the homepage
    $('#topic_list .topic_title').each(function (idx, element) {
      var $element = $(element);
      // $element.attr('href') originally looks like /topic/542acd7d5d28233425538b04
      // We use url.resolve to build the full url, turning it into
      // https://cnodejs.org/topic/542acd7d5d28233425538b04
      // See the examples at http://nodejs.org/api/url.html#url_url_resolve_from_to
      var href = url.resolve(cnodeUrl, $element.attr('href'));
      topicUrls.push(href);
    });
    console.log(topicUrls);
  });
Run node app.js
The output is an array of all the topic URLs on the homepage.
OK, now we have the addresses of all the topics. Next, we crawl each of these addresses and we are done. Node.js is that simple.
Before crawling, we still need to introduce the eventproxy library.
Anyone who has written asynchronous JavaScript knows that if you want to fetch data from two or three addresses concurrently, and then use all of that data together once it has arrived, the conventional approach is to maintain a counter yourself.
First define var count = 0, then increment it every time a fetch succeeds. If you are fetching from three sources, then since you don't know which asynchronous operation will finish first, each time a fetch succeeds you check whether count === 3; when it is true, you call another function to carry on with the processing.
eventproxy plays the role of this counter: it keeps track of whether your asynchronous operations have completed, and once they all have, it automatically calls the handler you provided, passing the fetched data in as arguments.
If we use neither eventproxy nor a counter, fetching from three sources looks like this:
// Refer to jquery’s $.get method
$.get("http://data1_source", function (data1) { // something $.get("http://data2_source", function (data2) { // something $.get("http://data3_source", function (data3) { // something var html = fuck(data1, data2, data3); render(html); }); }); });
Everyone has written the above code. First get data1, then get data2, then get data3, then fuck them and output.
But as you have probably realized, the data from these three sources can actually be fetched in parallel: fetching data2 does not depend on data1 having completed, and likewise data3 does not depend on data2.
So we use a counter to write it, and it will be written like this:
(function () {
  var count = 0;
  var result = {};

  $.get('http://data1_source', function (data) {
    result.data1 = data;
    count++;
    handle();
  });
  $.get('http://data2_source', function (data) {
    result.data2 = data;
    count++;
    handle();
  });
  $.get('http://data3_source', function (data) {
    result.data3 = data;
    count++;
    handle();
  });

  function handle() {
    if (count === 3) {
      var html = fuck(result.data1, result.data2, result.data3);
      render(html);
    }
  }
})();
Even though it’s ugly, it’s not that ugly. The main thing is that the code I write looks good.
If we use eventproxy, it would be written like this:
var ep = new eventproxy();
ep.all('data1_event', 'data2_event', 'data3_event', function (data1, data2, data3) {
  var html = fuck(data1, data2, data3);
  render(html);
});
$.get('http://data1_source', function (data) {
  ep.emit('data1_event', data);
});
$.get('http://data2_source', function (data) {
  ep.emit('data2_event', data);
});
$.get('http://data3_source', function (data) {
  ep.emit('data3_event', data);
});
It looks much better, right? It’s just an advanced counter.
ep.all('data1_event', 'data2_event', 'data3_event', function (data1, data2, data3) {});
This single line listens for three events: data1_event, data2_event and data3_event. Every time the data from one source has been fetched, we call ep.emit() to tell ep that a certain event has completed.
As long as the three events have not all completed, a call to ep.emit() does nothing; once all three have completed, the callback at the end is invoked to process them all together.
eventproxy provides quite a few APIs for other scenarios, but by far the most common usage is the one above, namely (a runnable sketch follows this list):
First, var ep = new eventproxy(); to get an eventproxy instance.
Tell it which events you want to listen for, and give it a callback: ep.all('event1', 'event2', function (result1, result2) {}).
At the appropriate moment, call ep.emit('event_name', eventData).
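Put together, the three steps look like this. This is a minimal, self-contained sketch rather than code from the lesson: the event names are arbitrary and the setTimeout calls merely stand in for real asynchronous fetches.

var eventproxy = require('eventproxy');

// Step 1: get an instance
var ep = new eventproxy();

// Step 2: declare which events to wait for, and what to do once both have arrived
ep.all('data1_event', 'data2_event', function (data1, data2) {
  console.log('both sources done:', data1, data2);
});

// Step 3: emit each event when its asynchronous operation finishes
// (setTimeout stands in for a real fetch here)
setTimeout(function () {
  ep.emit('data1_event', 'result of source 1');
}, 100);
setTimeout(function () {
  ep.emit('data2_event', 'result of source 2');
}, 50);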
I have always felt that eventproxy's way of handling asynchronous concurrency is like the goto statement in assembly: the program logic jumps all over the code. Execution has already reached line 100 when suddenly the callback at line 80 starts running again; if your asynchronous logic is a bit more complex, once the function at line 80 finishes it activates yet another function at line 60. The problems of concurrency and nesting are solved, but the goto statement that our forebears spent decades eliminating is back.
As for whether this style is bad, I personally don't think so; once you are used to it, it reads quite clearly. Besides, JS is a messy language to begin with: variable hoisting (http://www.cnblogs.com/damonlan/archive/2012/07/01/2553425.html), no main function, odd variable scoping, data types often as simple as numbers, strings, hashes and arrays. Next to all that, this is no big deal.
As for the beauty or ugliness of programming languages, as long as we keep a Buddha in our hearts, it's all fine.
Back to the topic. We already have a topicUrls array of length 40 containing the link of every topic, which means we are about to issue 40 concurrent requests. For that we need eventproxy's #after API.
Please study this API on your own: https://github.com/JacksonTian/eventproxy#%E9%87%8D%E5%A4%8D%E5%BC%82%E6%AD%A5%E5%8D%8F%E4%BD%9C
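In case the linked section isn't handy, here is a minimal illustration of how #after behaves; it is only a sketch, assuming what the README above describes: after(event, times, callback) waits for the event to be emitted that many times and hands the collected values to the callback as an array, in emission order.

var eventproxy = require('eventproxy');
var ep = new eventproxy();

// Fire the callback only after 'item_done' has been emitted 5 times;
// `items` holds the 5 emitted values, in the order they were emitted
ep.after('item_done', 5, function (items) {
  console.log(items);
});

for (var i = 0; i < 5; i++) {
  (function (n) {
    // setTimeout stands in for a real asynchronous request
    setTimeout(function () {
      ep.emit('item_done', n);
    }, Math.random() * 100);
  })(i);
}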
I'll just paste the code directly.
// After we have topicUrls
// Get an eventproxy instance
var ep = new eventproxy();

// Tell ep to wait for the `topic_html` event to be emitted topicUrls.length times
// (here, 40 times) before acting
ep.after('topic_html', topicUrls.length, function (topics) {
  // topics is an array containing the 40 pairs from the 40 calls to ep.emit('topic_html', pair)
  // Time to act
  topics = topics.map(function (topicPair) {
    // From here on it is all jquery-style usage
    var topicUrl = topicPair[0];
    var topicHtml = topicPair[1];
    var $ = cheerio.load(topicHtml);
    return ({
      title: $('.topic_full_title').text().trim(),
      href: topicUrl,
      comment1: $('.reply_content').eq(0).text().trim(),
    });
  });

  console.log('final:');
  console.log(topics);
});

topicUrls.forEach(function (topicUrl) {
  superagent.get(topicUrl)
    .end(function (err, res) {
      console.log('fetch ' + topicUrl + ' successful');
      ep.emit('topic_html', [topicUrl, res.text]);
    });
});
The output is the final array of 40 topic objects, each with its title, href and comment1.
For the complete code, see the app.js file in the lesson4 directory.
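One thing worth pointing out: the .end callback above ignores its err parameter, so a single failed request would throw on res.text or leave ep waiting forever. A slightly more defensive version of the request loop, still just a sketch, might look like this (pairs whose html is null would then need to be skipped inside the ep.after callback before calling cheerio.load):

topicUrls.forEach(function (topicUrl) {
  superagent.get(topicUrl)
    .end(function (err, res) {
      if (err) {
        // Emit a null placeholder so ep.after can still complete
        console.error('fetch ' + topicUrl + ' failed:', err);
        return ep.emit('topic_html', [topicUrl, null]);
      }
      console.log('fetch ' + topicUrl + ' successful');
      ep.emit('topic_html', [topicUrl, res.text]);
    });
});

eventproxy's README also covers its own error-handling helpers, which are worth a look once the basic pattern is familiar.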
Summary
The eventproxy module introduced today is for controlling concurrency. Sometimes we need to send N HTTP requests at once and then use the collected data for further processing; when you need a convenient way to know that all of the data has been fetched, this module is exactly what you want. And it can be used not only on the server side but also on the client side.
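Finally, a few pointers on the challenge from the top of the lesson. One way to approach it is a second round of concurrent requests: while parsing each topic page, also grab the first reply's author, then fetch that author's user page and read the score there. The sketch below assumes it runs inside the ep.after('topic_html', ...) callback of the lesson code (so superagent, cheerio, url and cnodeUrl are in scope); the selectors '.reply_author' and '.user_profile .big', as well as the /user/<name> URL pattern, are guesses about CNode's markup, not verified values.

// Extend the map in the ep.after('topic_html', ...) callback to also pick up the author
topics = topics.map(function (topicPair) {
  var $ = cheerio.load(topicPair[1]);
  return {
    title: $('.topic_full_title').text().trim(),
    href: topicPair[0],
    comment1: $('.reply_content').eq(0).text().trim(),
    author1: $('.reply_author').eq(0).text().trim(), // hypothetical selector
  };
});

// Second round: one request per author's user page, coordinated by another eventproxy
var ep2 = new eventproxy();
ep2.after('user_html', topics.length, function (pairs) {
  pairs.forEach(function (pair) {
    var topic = pair[0];
    var $ = cheerio.load(pair[1]);
    topic.score1 = parseInt($('.user_profile .big').text().trim(), 10); // hypothetical selector
  });
  console.log(topics);
});

topics.forEach(function (topic) {
  superagent.get(url.resolve(cnodeUrl, '/user/' + topic.author1))
    .end(function (err, res) {
      if (err) { return console.error(err); }
      ep2.emit('user_html', [topic, res.text]);
    });
});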
