
How to control the number of concurrent requests using async and eventproxy

Jun 14, 2018, 02:53 PM
async js concurrent

Concurrency should be familiar to everyone. This article introduces how to use async and eventproxy to control the number of concurrent requests, walking through the topic in detail with sample code. It should be useful for study or work; friends who need it, let's learn together below.

Let’s talk about concurrency and parallelism

In an operating system, concurrency means that several programs are all in progress, between having started and having finished, within the same period of time; they all run on the same processor, but at any single point in time only one of them is actually executing on that processor.

Concurrency is something we run into all the time, whether on a web server or in an app. Many websites limit the number of concurrent connections, so if you send requests too quickly the response may come back empty or with an error. Worse, some sites may decide your requests are malicious because you opened too many concurrent connections, and block your IP.

Compared with concurrency, parallelism may be less familiar. Parallelism means a group of programs execute at independent, asynchronous speeds, which is not the same thing as merely overlapping in time; by increasing the number of CPU cores, multiple programs (tasks) can be carried out simultaneously. That's right: parallel tasks really do run at the same moment.

Use eventproxy to control the number of concurrent requests

eventproxy is a tool to which Pu Ling (朴灵) contributed heavily. It brings a change of thinking to event-based programming: it uses the event mechanism to decouple complex business logic, addresses the much-criticized coupling of callback functions, turns serial waiting into parallel waiting, and improves execution efficiency in scenarios with multiple cooperating asynchronous calls.

How do we use eventproxy to control the number of concurrent requests? Normally, without eventproxy we end up building our own counter. Suppose we fetch data from three sources.

The deeply nested, serial way:

var render = function (template, data) {
  _.template(template, data);
};
$.get("template", function (template) {
  // something
  $.get("data", function (data) {
    // something
    $.get("l10n", function (l10n) {
      // something
      render(template, data, l10n);
    });
  });
});

To get rid of this deep nesting, the conventional approach is to maintain a counter ourselves:

(function () {
  var count = 0;
  var result = {};

  $.get('template', function (data) {
    result.data1 = data;
    count++;
    handle();
  });
  $.get('data', function (data) {
    result.data2 = data;
    count++;
    handle();
  });
  $.get('l10n', function (data) {
    result.data3 = data;
    count++;
    handle();
  });

  function handle() {
    if (count === 3) {
      var html = fuck(result.data1, result.data2, result.data3);
      render(html);
    }
  }
})();

Here eventproxy can play the role of that counter. It keeps track of whether the asynchronous operations have completed, and once they all have, it automatically calls the handler you provided, passing in the captured data as arguments.

var EventProxy = require('eventproxy');

var ep = new EventProxy();
ep.all('data_event1', 'data_event2', 'data_event3', function (data1, data2, data3) {
  var html = fuck(data1, data2, data3);
  render(html);
});

$.get('http:example1', function (data) {
  ep.emit('data_event1', data);
});

$.get('http:example2', function (data) {
  ep.emit('data_event2', data);
});

$.get('http:example3', function (data) {
  ep.emit('data_event3', data);
});

eventproxy also provides APIs for many other scenarios; you can explore them in the eventproxy documentation.
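For example, eventproxy's after API waits for the same event to be emitted a given number of times, which is handy when fetching a list of resources. Below is a minimal sketch; the file list and the way read errors are collapsed to empty strings are illustrative assumptions, not part of the original article:

var EventProxy = require('eventproxy');
var fs = require('fs');

var ep = new EventProxy();
var files = ['a.txt', 'b.txt', 'c.txt']; // hypothetical file list

// call the handler once 'got_file' has been emitted files.length times;
// list collects the data passed to each emit
ep.after('got_file', files.length, function (list) {
  console.log(list.join('\n'));
});

files.forEach(function (file) {
  fs.readFile(file, 'utf-8', function (err, content) {
    // for brevity, treat a read error as empty content
    ep.emit('got_file', err ? '' : content);
  });
});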

Use async to control the number of concurrent requests

Suppose we have 40 requests to make. Many websites may decide you are making malicious requests because you opened too many concurrent connections, and block your IP. So we need to control the number of concurrent requests and work through those 40 links gradually.

Use mapLimit from async to cap the concurrency at 5, so that only 5 links are fetched at a time:

async.mapLimit(arr, 5, function (url, callback) {
  // something
}, function (error, result) {
  console.log("result: ");
  console.log(result);
});

We should first understand what concurrency is, why the number of concurrent requests needs to be limited, and what solutions are available; then we can go to the documentation and see how to use the API. The async documentation is a good place to learn this syntax.
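As an aside, mapLimit is not the only option in async: async.queue also caps concurrency by running a fixed number of workers over a task queue. Here is a minimal, self-contained sketch; the URLs and the simulated fetch are placeholders rather than anything from the original example:

var async = require('async');

// a queue whose worker "fetches" a url; at most 5 workers run at once
var q = async.queue(function (url, callback) {
  setTimeout(function () {
    callback(null, url + ' html content'); // pretend we fetched it
  }, 100);
}, 5);

['http://a', 'http://b', 'http://c'].forEach(function (url) {
  // each pushed task gets its own completion callback
  q.push(url, function (err, content) {
    console.log('fetched', url);
  });
});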

Let's simulate a set of data. The data returned here is fake, and the delay before it returns is random.

var concurrencyCount = 0;
var fetchUrl = function (url, callback) {
  // delay is a random integer under 2000 ms, used to simulate latency
  var delay = parseInt((Math.random() * 10000000) % 2000, 10);
  concurrencyCount++;
  console.log('current concurrency:', concurrencyCount, ', fetching:', url, ', delay: ' + delay + ' ms');
  setTimeout(function () {
    concurrencyCount--;
    callback(null, url + ' html content');
  }, delay);
};

var urls = [];
for (var i = 0; i < 30; i++) {
  urls.push('http://datasource_' + i);
}

Then we use async.mapLimit to crawl concurrently and obtain the results.

async.mapLimit(urls, 5, function (url, callback) {
  fetchUrl(url, callback);
}, function (err, result) {
  console.log('result: ');
  console.log(result);
});

The simulation code is adapted from alsotang.

After running it we get the following result: the concurrency count starts at 1 and climbs to 5, after which it stops increasing; as long as there are remaining tasks, fetching continues, and the number of concurrent requests stays capped at 5.

Completing a simple Node crawler

Because the tutorial example in alsotang's 《Node.js 包教不包会》 uses eventproxy to control the number of concurrent requests, let's complete a simple Node crawler that uses async to control concurrency instead.

The crawling target is the homepage of this site.

Step 1: we will use the following modules:

  • url: used for URL parsing; here url.resolve() is used to build a legal, fully qualified URL

  • async: a utility module that provides powerful functions for working with asynchronous JavaScript

  • cheerio: a fast, flexible implementation of the jQuery core designed for the server

  • superagent: a very convenient client-side HTTP request library for Node.js

Install the dependency modules through npm:
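For example (assuming npm is available and you are inside the project directory):

npm install async cheerio superagent --save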

Step 2: require the dependency modules and determine the crawl target URL:

var url = require("url");
var async = require("async");
var cheerio = require("cheerio");
var superagent = require("superagent");
var baseUrl = 'http://www.chenqaq.com';

Step 3: use superagent to request the target URL, use cheerio to parse the page at baseUrl and extract the URLs of the target content, and save them in the array arr:

superagent.get(baseUrl)
  .end(function (err, res) {
    if (err) {
      return console.error(err);
    }
    var arr = [];
    var $ = cheerio.load(res.text);
    // from here on it works just like jQuery
    $(".post-list .post-title-link").each(function (idx, element) {
      var $element = $(element);
      var _url = url.resolve(baseUrl, $element.attr("href"));
      arr.push(_url);
    });
    // verify the collected article links
    output(arr);
    // Step 4: traverse arr and parse the information we need from each page
  });

We need a function to verify the captured URLs. It's very simple: a function that traverses arr and prints each entry:

function output(arr) {
  for (var i = 0; i < arr.length; i++) {
    console.log(arr[i]);
  }
}

Step 4: traverse the URLs we obtained and parse the information we need from each page.

This is where async comes in to control the number of concurrent requests. If the previous step produced a large arr with many URLs to request, firing all the requests at once may get your IP blocked, because some sites will treat that as malicious behavior.

async.mapLimit(arr, 3, function (url, callback) {
  superagent.get(url)
    .end(function (err, mes) {
      if (err) {
        console.error(err);
        console.log('message info ' + JSON.stringify(mes));
        return callback(err);
      }
      console.log('「fetch」' + url + ' successful!');
      var $ = cheerio.load(mes.text);
      var jsonData = {
        title: $('.post-card-title').text().trim(),
        href: url,
      };
      callback(null, jsonData);
    });
}, function (error, results) {
  console.log('results ');
  console.log(results);
});

Take the array arr of URL addresses saved in the previous step, limit the maximum concurrency to 3, and then handle each item with a callback. This callback is special: it must be called inside the iteratee, in one of three ways (a sketch follows the list):

  • callback(null): the call succeeded

  • callback(null, data): the call succeeded, and data is appended to results

  • callback(data): the call failed; the loop stops and control jumps straight to the final callback
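A minimal sketch of these three conventions inside a mapLimit iteratee; doSomething here is a hypothetical asynchronous operation, not something from the original code:

function iteratee(url, callback) {
  doSomething(url, function (err, data) { // doSomething is hypothetical
    if (err) {
      return callback(err);   // failure: stop iterating, go straight to the final callback
    }
    if (!data) {
      return callback(null);  // success with nothing to append to results
    }
    callback(null, data);     // success: data is appended to results
  });
}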

Well, with that our simple little Node crawler is complete. Let's take a look at the result.

Ha, there is not much data on the homepage, but it worked.

References

《Node.js 包教不包会》 - alsotang

eventproxy

async

async documentation

The above is what I have put together for everyone; I hope it will be helpful to you.


The above is the detailed content of How to control the number of concurrent requests using async and eventproxy. For more information, please follow other related articles on the PHP Chinese website!
