


In the Go crawler framework Colly, how does the thread count setting of Queue and request delay affect the concurrent processing of requests?
Queue thread count and request delay in the Go crawler framework Colly
Efficient concurrent request processing is crucial when using the Go crawler framework Colly. This article examines how the thread count of Colly's queue and the request delay interact to shape concurrent processing, and answers a common question.
Problem: Interaction between number of threads and request delay
Suppose we set the queue's thread count to 2:
q, _ := queue.New(2, storage) // storage is a queue.Storage, e.g. &queue.InMemoryQueueStorage{MaxSize: 10000}
and add 3 requests. Meanwhile, colly.Limit() sets a 5-second delay for each request. We expect two requests to be issued almost simultaneously and to respond after 5 seconds, with the third request following 5 seconds later. The actual result, however, is:
- Two requests are created.
- After 5 seconds, the first request responds and a third request is created.
- After 5 seconds, the second request responds.
- After 5 seconds, the third request responds.
The requests are clearly not processed in parallel. Why does the queue's thread count seem to have no effect? Does colly.Limit() affect the queue's concurrency? And does the OnRequest callback only mean that a request has been created, not that it has actually been sent?
Analysis: Independence between number of threads and request delay
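Colly's queue manages how many requests are processed concurrently, while colly.Limit() installs a LimitRule that throttles requests to matching domains. The two are independent mechanisms, but their limits stack: a request needs both a free queue thread and a free LimitRule slot before it is sent.
The queue's thread count limits the number of requests processed simultaneously. The LimitRule has its own Parallelism field, and when it is not set the rule admits only one request at a time to matching domains. Its Delay is enforced between requests: after a response arrives, Colly waits for the delay before releasing the slot and running the response callbacks.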
In the above case:
- Both queue threads pick up a request and fire OnRequest, but the LimitRule admits only one of them.
- The first request is sent immediately. Colly then holds its slot for the 5-second delay, after which the first response is reported. The queue thread it used is released and picks up the third request.
- The second request, which has been waiting for the slot, is sent at the 5-second mark and reported at 10 seconds.
- The third request then waits for the slot in turn, is sent at 10 seconds and reported at 15 seconds.
Therefore, the request delay, held on a single LimitRule slot, masks the concurrency of the queue.
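The walkthrough above can be replayed without any timing noise. The sketch below computes the same schedule in virtual seconds, assuming zero network latency and first-come, first-served access to threads and slots; timeline, Event and its field names are hypothetical. OnRequest marks when a queue thread picks a request up, Sent when the request gets a limit slot, and OnResponse when the delay has elapsed.

```go
package main

import "fmt"

// Event holds the virtual times, in seconds, of one request in the model.
type Event struct {
	ID         int
	OnRequest  int // a queue thread picks the request up
	Sent       int // a limit slot is free, so the request goes out
	OnResponse int // the delay after the response has elapsed
}

// timeline schedules requests first-come, first-served on `threads` queue
// threads and `parallelism` limit slots (both must be at least 1). A request
// occupies its thread from pickup and its slot from sending, and releases
// both `delay` seconds after it is sent.
func timeline(threads, parallelism, requests, delay int) []Event {
	workerFree := make([]int, threads)   // when each thread is next idle
	slotFree := make([]int, parallelism) // when each slot is next free
	events := make([]Event, 0, requests)
	for id := 1; id <= requests; id++ {
		w, s := earliest(workerFree), earliest(slotFree)
		picked := workerFree[w]
		sent := picked
		if slotFree[s] > sent {
			sent = slotFree[s] // wait for the slot
		}
		done := sent + delay
		workerFree[w], slotFree[s] = done, done
		events = append(events, Event{ID: id, OnRequest: picked, Sent: sent, OnResponse: done})
	}
	return events
}

// earliest returns the index of the smallest value.
func earliest(times []int) int {
	best := 0
	for i, t := range times {
		if t < times[best] {
			best = i
		}
	}
	return best
}

func main() {
	for _, e := range timeline(2, 1, 3, 5) {
		fmt.Printf("#%d  OnRequest t=%ds  sent t=%ds  response t=%ds\n",
			e.ID, e.OnRequest, e.Sent, e.OnResponse)
	}
}
```

For 2 threads and 1 slot this prints OnRequest times of 0, 0 and 5 s and response times of 5, 10 and 15 s, the timeline reported in the question. Calling timeline(2, 2, 3, 5) instead gives responses at 5, 5 and 10 s, the timeline the question expected.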
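OnRequest callback and request issuance time
The OnRequest callback fires when a queue thread starts processing a request, before Colly waits for a LimitRule slot, not when the request is actually sent. It is intended for preprocessing, such as setting headers, before the request goes out. Its timing therefore shows when a request was created, which is why two requests appeared to be "created" immediately even though the second one was only sent after 5 seconds.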
Conclusion: Coordinate the number of threads and request delays
The delay of colly.Limit() works together with the LimitRule's Parallelism, and with the default of a single slot it limits the queue to one request at a time per matching domain, whatever its thread count. To achieve true concurrency, coordinate the two settings: set the LimitRule's Parallelism to at least the queue's thread count so that several requests can be in flight at once, each still followed by its delay. If high concurrency matters more than throttling, reduce or remove the delay. If you need to control the crawl speed, tune Parallelism, Delay and RandomDelay together rather than relying on the delay alone.
The above is the detailed content of In the Go crawler framework Colly, how does the thread count setting of Queue and request delay affect the concurrent processing of requests?. For more information, please follow other related articles on the PHP Chinese website!
