
Counting and Sorting Data in Large Files with PHP/Shell

Jun 08, 2016, 05:23 PM

This article presents a simple PHP/Shell program for counting and sorting the data in a large file, for your reference.


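Many large internet companies ask this interview question: given a 4 GB file and a machine with only 1 GB of memory, how do you find the number that occurs most often in the file (assume one number per line, for example a QQ number)? If the file were tiny, or only a few dozen megabytes, the simplest approach would be to read the whole file and count directly. But this file is 4 GB, and it could just as well be tens or even hundreds of gigabytes, so reading it in one go will not work.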

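PHP alone is certainly not enough for a file this large. My approach, however large the file, is to first split it into smaller files that the application can handle, then analyze the small files in batches or one after another, and finally merge the partial results into a final result that meets the requirement. This resembles the popular MapReduce model, whose core ideas are "Map" and "Reduce" plus distributed file processing. Admittedly, the only part I understand well enough to apply here is the Reduce-style processing.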

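Suppose there is a file of 1 billion lines, each holding a QQ number of 6 to 10 digits. The task is to find the 10 numbers that are repeated most often among these 1 billion QQ numbers. The PHP script below generates the file. With uniformly random numbers no value is likely to repeat many times, but we will assume the data contains repeats.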

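The code is as follows:

<?php
// Needs 64-bit PHP: the upper bound exceeds the 32-bit integer range.
$fp = fopen('qq.txt', 'w+');
for ($i = 0; $i < 1000000000; $i++) {
    // 100000 to 9999999999 gives the 6 to 10 digits described above.
    $str = mt_rand(100000, 9999999999) . "\n";
    fwrite($fp, $str);
}
fclose($fp);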

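Generating the file takes quite a while. On Linux, running the PHP script directly with the PHP command-line interpreter (php-cli) saves time; of course you can also generate the file some other way. The resulting file is about 11 GB.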

Then use the Linux split command to cut the file, at 1 million lines per output file.

The code is as follows:
split -l 1000000 -a 3 qq.txt qqfile
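
Before running this on the full file, the split step can be tried on a small sample. The sketch below wraps the same `split` options in a helper; the name `split_into_chunks` and the output directory argument are assumptions for illustration, not part of the original workflow.

```shell
#!/bin/sh
# split_into_chunks INPUT LINES_PER_CHUNK OUTDIR   (hypothetical helper)
# Writes OUTDIR/qqfileaaa, OUTDIR/qqfileaab, ... and prints the chunk count.
split_into_chunks() {
    mkdir -p "$3"
    # -l: lines per output file, -a 3: three-letter suffixes (aaa, aab, ...)
    split -l "$2" -a 3 "$1" "$3/qqfile"
    # Count the files that were written (tr strips padding some wc builds add)
    ls "$3" | wc -l | tr -d ' '
}
```

For the real data, a call such as `split_into_chunks qq.txt 1000000 /tmp/qq` would place the pieces in the directory that the PHP script further down reads from.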

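qq.txt is split into 1000 files named qqfileaaa through qqfilebml, each about 11 MB. At this size any processing method becomes fairly simple. I will stick with PHP for the counting (the script below expects the pieces to be in /tmp/qq):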

The code is as follows:

<?php
$results = array();
foreach (glob('/tmp/qq/*') as $file) {
    $fp = fopen($file, 'r');
    $arr = array();
    // Count how often each number occurs in this chunk.
    while (($qq = fgets($fp)) !== false) {
        $qq = trim($qq);
        if ($qq === '') {
            continue;
        }
        $arr[$qq] = isset($arr[$qq]) ? $arr[$qq] + 1 : 1;
    }
    fclose($fp);
    arsort($arr);
    // Known flaw: only each chunk's top 10 is carried over into the totals.
    // preserve_keys = true keeps the numbers themselves as array keys.
    foreach (array_slice($arr, 0, 10, true) as $qq => $times) {
        $results[$qq] = isset($results[$qq]) ? $results[$qq] + $times : $times;
    }
}
if ($results) {
    arsort($results);
    // Print the overall top 10 as "number<TAB>count".
    foreach (array_slice($results, 0, 10, true) as $qq => $times) {
        echo $qq . "\t" . $times . "\n";
    }
}

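This takes the top 10 of each sample and then merges them for the final tally. It cannot be ruled out that some number ranks 11th in every sample and yet belongs in the overall top 10, so the counting algorithm still needs improvement.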
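
One way to remove this flaw, sketched here rather than taken from the original script, is to split by value instead of by line position. If every copy of a number lands in the same bucket file, each bucket's counts are complete, and merging the per-bucket top 10 gives the exact overall top 10. The helper names `partition_by_value` and `bucket_top`, the `bucket_` file prefix, and the choice of modulo as the bucketing rule are all assumptions for illustration.

```shell
#!/bin/sh
# partition_by_value INPUT OUTDIR BUCKETS   (hypothetical helper)
# Sends each number to OUTDIR/bucket_<number mod BUCKETS>, so all copies
# of the same number end up in the same file.
partition_by_value() {
    mkdir -p "$2"
    awk -v dir="$2" -v n="$3" '
        NF { print $1 > (dir "/bucket_" ($1 % n)) }
    ' "$1"
}

# bucket_top OUTDIR N   (hypothetical helper)
# Prints "count number" for the N most frequent numbers overall.
bucket_top() {
    for f in "$1"/bucket_*; do
        # Exact counts for this bucket, highest first, keep its top N
        LC_ALL=C sort "$f" | uniq -c | LC_ALL=C sort -k1,1nr | head -n "$2"
    done | LC_ALL=C sort -k1,1nr | head -n "$2"
}
```

For the 11 GB file this would be something like `partition_by_value qq.txt /tmp/buckets 1000` followed by `bucket_top /tmp/buckets 10`. Note that awk keeps one output file open per bucket, so a large bucket count may hit the open-file limit on some awk implementations.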

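Some may say that the awk and sort commands in Linux can do the sorting. I tried it: it works for small files, but for an 11 GB file neither the memory use nor the running time is bearable. Below is an awk+sort script I adapted. Perhaps I wrote it wrong, and I would welcome advice from experts.

The code is as follows: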

awk -F '\@' '{name[$1]++ } END {for (count in name) print name[count],count}' qq.txt |sort -n > 123.txt
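
One likely reason this struggles is that the awk array `name` has to hold every distinct number in memory at once. A possible alternative, sketched here on the assumption that GNU sort is installed, is to let `sort` do the grouping: it spills to temporary files when its buffer fills, so memory use stays bounded. The function name `top_numbers` and the default buffer size and temporary directory are assumptions, not values from the original article.

```shell
#!/bin/sh
# top_numbers FILE N [BUFFER] [TMPDIR]   (hypothetical helper)
# Prints "count number" for the N most frequent lines of FILE.
top_numbers() {
    # -S caps sort's memory buffer, -T picks where temporary runs are written
    LC_ALL=C sort -S "${3:-512M}" -T "${4:-/tmp}" "$1" \
        | uniq -c \
        | LC_ALL=C sort -k1,1nr \
        | head -n "$2"
}
```

A call such as `top_numbers qq.txt 10 512M /data/tmp` would need enough free disk space in the temporary directory for the intermediate runs, which can be about as large as the input.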


The internet is growing geometrically, and demand for handling large files, and whatever big data may follow, will remain strong.
