在内存小于文件大小的情况下,大文件中快速查找定位一行
内存 大文件
比如有一个文件ABC 56
DEF 100
RET 300
...
文件有2列,第一列都是不重复的,第2列表示次数(当成一个数字就行了)。
如果文件大小为2G或者更大,内存只有1G的情况,如何快速定位到“ABC 56” 这一行。
请大拿们给个清晰点的解决方法。
回复讨论(解决方案)
没明白您是什么意思?
如果是打开文件想快速找到某一行的话,可以使用vi或者more将文件打开;
然后输入: /ABC 回车就好了
fopen,再fscanf。
一次读一行就好啊。内存不会成为限制因素的。
有没有人知道啊?
如果是一行一行读,那效率就不行啦。
还有没有更快速的方法呢?
我的思路是建一张哈希表,然后根据哈希算法,再用那个哈希碰撞的原理去排重。
不知道各位有什么好的意见没
建hash表的话,岂不是要先对文件的内容进行hash?
可以用其他的工具来处理,未必一定要用算法。
比如awk:
awk '/ABC\t56/{ print NR}' file
可以获取匹配行的行号。
建议lz说下具体的需求,如果仅仅是获取行号的话,方案很多。
但是如果还有其他需求的话,类似awk这么做未必是最佳方案。
有没有人知道啊?
如果是一行一行读,那效率就不行啦。
还有没有更快速的方法呢?
我的思路是建一张哈希表,然后根据哈希算法,再用那个哈希碰撞的原理去排重。
不知道各位有什么好的意见没 那你不也得先一行一行读出来再哈希吗?
嫌一行一行读太慢,可以一块一块读
有没有人知道啊?
如果是一行一行读,那效率就不行啦。
还有没有更快速的方法呢?
我的思路是建一张哈希表,然后根据哈希算法,再用那个哈希碰撞的原理去排重。
不知道各位有什么好的意见没 那你不也得先一行一行读出来再哈希吗?
嫌一行一行读太慢,可以一块一块读
是的读块 比较符合你的需求
楼主可参考:
http://www.fantxi.com/blog/archives/php-read-large-file/
http://sjolzy.cn/php-large-file-read-operation.html
建hash表的话,岂不是要先对文件的内容进行hash?
可以用其他的工具来处理,未必一定要用算法。
比如awk:
awk '/ABC\t56/{ print NR}' file
可以获取匹配行的行号。
建议lz说下具体的需求,如果仅仅是获取行号的话,方案很多。
但是如果还有其他需求的话,类似awk这么做未必是最佳方案。
需求就是怎么能快速找到? 比如我想知道ABC后面的数字,或者DEF后面的数字...
有没有人知道啊?
如果是一行一行读,那效率就不行啦。
还有没有更快速的方法呢?
我的思路是建一张哈希表,然后根据哈希算法,再用那个哈希碰撞的原理去排重。
不知道各位有什么好的意见没 那你不也得先一行一行读出来再哈希吗?
嫌一行一行读太慢,可以一块一块读
内存怎么一块一块读呢? 能给个例子吗?

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics











PHP and Python each have their own advantages, and choose according to project requirements. 1.PHP is suitable for web development, especially for rapid development and maintenance of websites. 2. Python is suitable for data science, machine learning and artificial intelligence, with concise syntax and suitable for beginners.

In PHP, password_hash and password_verify functions should be used to implement secure password hashing, and MD5 or SHA1 should not be used. 1) password_hash generates a hash containing salt values to enhance security. 2) Password_verify verify password and ensure security by comparing hash values. 3) MD5 and SHA1 are vulnerable and lack salt values, and are not suitable for modern password security.

PHP is widely used in e-commerce, content management systems and API development. 1) E-commerce: used for shopping cart function and payment processing. 2) Content management system: used for dynamic content generation and user management. 3) API development: used for RESTful API development and API security. Through performance optimization and best practices, the efficiency and maintainability of PHP applications are improved.

PHP handles file uploads through the $\_FILES variable. The methods to ensure security include: 1. Check upload errors, 2. Verify file type and size, 3. Prevent file overwriting, 4. Move files to a permanent storage location.

PHP is a scripting language widely used on the server side, especially suitable for web development. 1.PHP can embed HTML, process HTTP requests and responses, and supports a variety of databases. 2.PHP is used to generate dynamic web content, process form data, access databases, etc., with strong community support and open source resources. 3. PHP is an interpreted language, and the execution process includes lexical analysis, grammatical analysis, compilation and execution. 4.PHP can be combined with MySQL for advanced applications such as user registration systems. 5. When debugging PHP, you can use functions such as error_reporting() and var_dump(). 6. Optimize PHP code to use caching mechanisms, optimize database queries and use built-in functions. 7

PHP is still dynamic and still occupies an important position in the field of modern programming. 1) PHP's simplicity and powerful community support make it widely used in web development; 2) Its flexibility and stability make it outstanding in handling web forms, database operations and file processing; 3) PHP is constantly evolving and optimizing, suitable for beginners and experienced developers.

PHP type prompts to improve code quality and readability. 1) Scalar type tips: Since PHP7.0, basic data types are allowed to be specified in function parameters, such as int, float, etc. 2) Return type prompt: Ensure the consistency of the function return value type. 3) Union type prompt: Since PHP8.0, multiple types are allowed to be specified in function parameters or return values. 4) Nullable type prompt: Allows to include null values and handle functions that may return null values.

PHP and Python have their own advantages and disadvantages, and the choice depends on project needs and personal preferences. 1.PHP is suitable for rapid development and maintenance of large-scale web applications. 2. Python dominates the field of data science and machine learning.
