Performance comparison: collecting images with file_get_contents versus the curl functions in PHP

Jul 13, 2016 am 10:01 AM

Most of the car content in the backend of one of our company's automotive websites comes from Autohome, so our editors have had to copy car data over from Autohome by hand every day, which is a real chore. To change that, as the developer I was given the task of building a feature: paste in the corresponding Autohome URL, and the data is filled into our backend form automatically. The basic form filling already works, but the corresponding car photo albums are not yet being collected.

I have built image collection before, but most cars on Autohome have a large number of pictures. At first I planned to reuse my earlier approach: call file_get_contents on the page URL, match the image addresses in the returned HTML, call file_get_contents again on each image URL, and save the results locally. The code is as follows:

<?php
header('Content-type:text/html;charset=utf-8');
set_time_limit(0);

// Simple stopwatch used to time the whole script.
class runtime
{
    var $StartTime = 0;
    var $StopTime = 0;

    // Current Unix timestamp in seconds, with microsecond precision.
    function get_microtime()
    {
        list($usec, $sec) = explode(' ', microtime());
        return ((float)$usec + (float)$sec);
    }

    function start()
    {
        $this->StartTime = $this->get_microtime();
    }

    function stop()
    {
        $this->StopTime = $this->get_microtime();
    }

    // Elapsed time in milliseconds, rounded to one decimal place.
    function spent()
    {
        return round(($this->StopTime - $this->StartTime) * 1000, 1);
    }

}

$runtime = new runtime();
$runtime->start();

$url = 'http://car.autohome.com.cn/pic/series-s15306/289.html#pvareaid=102177';
$rs = file_get_contents($url);
// echo $rs;exit;
// Collect the links to the individual picture pages of this series.
preg_match_all('/(\/pic\/series-s15306\/289-\d+\.html)/', $rs, $urlArr);

$avalie = array_unique($urlArr[0]);
$count = array();
// Fetch each picture page and extract the image URLs it contains.
foreach ($avalie as $key => $ul) {
    $pattern = '/<img src="(http:\/\/car1\.autoimg\.cn\/upload\/\d+\/\d+\/\d+\/.*?\.jpg)"/';
    preg_match_all($pattern, file_get_contents('http://car.autohome.com.cn'.$ul), $imgSrc);
    $count = array_merge($count, $imgSrc[1]);
}

// Download the images one at a time; each call blocks until it finishes.
$data = array();
foreach ($count as $k => $v) {
    $data[$k] = file_get_contents($v);
}

// Save the images locally. $k is part of the name so that two files
// written in the same second cannot overwrite each other.
foreach ($data as $k => $v) {
    file_put_contents('./pic2/'.time().'_'.$k.'_'.rand(1, 10000).'.jpg', $v);
}

$runtime->stop();
echo "Page execution time: ".$runtime->spent()." ms";
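The link-extraction step in the script above can be tried in isolation, without touching the network. The following is a minimal sketch, assuming a hypothetical helper name `extractImageUrls` and a made-up HTML fragment; the pattern is the one used in the script, and duplicates are dropped with array_unique.

```php
<?php
// Hypothetical helper (not part of the original script): returns the unique
// image URLs matched by the same pattern the script uses.
function extractImageUrls($html)
{
    $pattern = '/<img src="(http:\/\/car1\.autoimg\.cn\/upload\/\d+\/\d+\/\d+\/.*?\.jpg)"/';
    preg_match_all($pattern, $html, $matches);
    // $matches[1] holds the first capture group, i.e. the bare URLs.
    return array_values(array_unique($matches[1]));
}

// Made-up sample markup, for illustration only.
$sampleHtml = '<img src="http://car1.autoimg.cn/upload/2015/1/2/a.jpg">'
            . '<img src="http://car1.autoimg.cn/upload/2015/1/2/a.jpg">'
            . '<img src="http://car1.autoimg.cn/upload/2015/1/3/b.jpg">'
            . '<img src="http://example.com/other.jpg">';

$imageUrls = extractImageUrls($sampleHtml);
print_r($imageUrls);
```

Only the two distinct car1.autoimg.cn addresses survive here: the repeated one is collapsed and the example.com image does not match the pattern.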

This approach turned out to be fine for a small number of pictures but very slow for a large number. It was painful even in local testing, never mind in production. After searching on Baidu, I switched to downloading the images with curl. Testing showed some improvement, but it still felt slow. If only PHP had multithreading...
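The single-handle curl download mentioned above is not shown in the article. As a rough sketch under stated assumptions, it could look like the following; the function name `fetchWithCurl`, the 30-second timeout, and the choice to follow redirects are illustrative assumptions, not the author's code.

```php
<?php
// Hypothetical helper: fetch one URL with a single curl handle.
// Returns the response body as a string, or null on any transfer error.
function fetchWithCurl($url, $timeout = 30)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, false);        // body only, no response headers
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);    // overall limit in seconds (assumed value)
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // assumption: follow redirects

    $body = curl_exec($ch);
    $failed = ($body === false) || (curl_errno($ch) !== 0);
    // The handle is released when $ch goes out of scope.
    return $failed ? null : $body;
}
```

Calling this in a loop still downloads one image at a time, so the total time remains roughly the sum of the individual transfers, which is consistent with the modest improvement described above.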

After some digging, I found that PHP's curl library can in fact simulate multithreading through the curl_multi_* family of functions. After rewriting, the code looked like this:

 

<?php
header('Content-type:text/html;charset=utf-8');
set_time_limit(0);

// Simple stopwatch used to time the whole script.
class runtime
{
    var $StartTime = 0;
    var $StopTime = 0;

    // Current Unix timestamp in seconds, with microsecond precision.
    function get_microtime()
    {
        list($usec, $sec) = explode(' ', microtime());
        return ((float)$usec + (float)$sec);
    }

    function start()
    {
        $this->StartTime = $this->get_microtime();
    }

    function stop()
    {
        $this->StopTime = $this->get_microtime();
    }

    // Elapsed time in milliseconds, rounded to one decimal place.
    function spent()
    {
        return round(($this->StopTime - $this->StartTime) * 1000, 1);
    }

}

$runtime = new runtime();
$runtime->start();


$url = 'http://car.autohome.com.cn/pic/series-s15306/289.html#pvareaid=102177';
$rs = file_get_contents($url);
// Collect the links to the individual picture pages of this series.
preg_match_all('/(\/pic\/series-s15306\/289-\d+\.html)/', $rs, $urlArr);

$avalie = array_unique($urlArr[0]);
$count = array();
// Fetch each picture page and extract the image URLs it contains.
foreach ($avalie as $key => $ul) {
    $pattern = '/<img src="(http:\/\/car1\.autoimg\.cn\/upload\/\d+\/\d+\/\d+\/.*?\.jpg)"/';
    preg_match_all($pattern, file_get_contents('http://car.autohome.com.cn'.$ul), $imgSrc);
    $count = array_merge($count, $imgSrc[1]);
}

$handle = curl_multi_init();

// Create one easy handle per image and register it with the multi handle.
$curl = array();
foreach ($count as $k => $v) {
    $curl[$k] = curl_init($v);
    curl_setopt($curl[$k], CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl[$k], CURLOPT_HEADER, 0);
    curl_setopt($curl[$k], CURLOPT_TIMEOUT, 30);
    curl_multi_add_handle($handle, $curl[$k]);
}

$active = null;

// Start all transfers.
do {
    $mrc = curl_multi_exec($handle, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    // This statement is critical in PHP 5.3 and later: without it,
    // curl_multi_select may keep returning -1 and the loop never ends.
    while (curl_multi_exec($handle, $active) === CURLM_CALL_MULTI_PERFORM);

    if (curl_multi_select($handle) != -1) {
        do {
            $mrc = curl_multi_exec($handle, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}

// Collect the bodies of the transfers that reported no error.
$data = array();
foreach ($curl as $k => $v) {
    if (curl_error($curl[$k]) == "") {
        $data[$k] = curl_multi_getcontent($curl[$k]);
    }
    curl_multi_remove_handle($handle, $curl[$k]);
    curl_close($curl[$k]);
}

// Save the images locally. $k is part of the name so that two files
// written in the same second cannot overwrite each other.
foreach ($data as $k => $v) {
    $file = time().'_'.$k.'_'.rand(1000, 9999).'.jpg';
    file_put_contents('./pic3/'.$file, $v);
}

curl_multi_close($handle);

$runtime->stop();
echo "Page execution time: ".$runtime->spent()." ms";
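The polling loop in the script above is written defensively for older PHP versions. A more compact variant of the same curl_multi_* technique might look like the sketch below. The function name `fetchAllConcurrently`, the one-second select timeout, and the use of curl_multi_info_read to detect failed transfers are assumptions for illustration, not the author's code.

```php
<?php
// Hypothetical helper: download several URLs concurrently with curl_multi_*.
// Returns an array keyed like $urls; failed transfers are left out.
function fetchAllConcurrently(array $urls, $timeout = 30)
{
    $multi = curl_multi_init();
    $handles = array();

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HEADER, false);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $active);
        if ($active && curl_multi_select($multi, 1.0) === -1) {
            // select() had nothing to wait on; pause briefly
            // instead of spinning at full CPU.
            usleep(1000);
        }
    } while ($active && $status === CURLM_OK);

    // Each finished transfer leaves one message with its result code.
    $results = array();
    while (($info = curl_multi_info_read($multi)) !== false) {
        $key = array_search($info['handle'], $handles, true);
        if ($key !== false && $info['result'] === CURLE_OK) {
            $results[$key] = curl_multi_getcontent($info['handle']);
        }
    }

    foreach ($handles as $ch) {
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $results;
}
```

With this shape, the download step of the script would reduce to a single call such as `$data = fetchAllConcurrently($count);`. For very large albums it may be worth passing the URLs in batches (for example with array_chunk) so that not every connection is opened at once.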

Simulated multi-threaded collection really is a breath of fresh air. In 5 comparison runs, the curl_multi version was faster than the file_get_contents version 4 times, with run times differing by a factor of 3 to 5. In short, I will use this approach wherever possible in future collection work to improve efficiency.
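A comparison like this can be repeated more systematically by timing each routine over several runs. The sketch below shows one possible harness; the helper name `timeInMs`, the number of runs, and the placeholder workload are assumptions for illustration, and a real measurement would call the two download routines instead.

```php
<?php
// Hypothetical helper: run $task $runs times and return each run's
// duration in milliseconds, rounded to one decimal place.
function timeInMs(callable $task, $runs = 5)
{
    $durations = array();
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true); // float seconds since the Unix epoch
        $task();
        $durations[] = round((microtime(true) - $start) * 1000, 1);
    }
    return $durations;
}

// Placeholder workload standing in for a download routine.
$durations = timeInMs(function () {
    usleep(2000); // pretend to work for about 2 ms
}, 5);

printf(
    "runs: %d, fastest: %.1f ms, slowest: %.1f ms\n",
    count($durations),
    min($durations),
    max($durations)
);
```

Because network conditions vary from run to run, looking at the fastest and slowest run side by side gives a fairer picture than a single measurement.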

Source: http://www.bkjia.com/PHPjc/971771.html