


Export database files under Linux for statistics + deduplication
This article mainly talks about how to implement database file statistics and deduplication in Linux. Friends who are interested can learn it!
1. Export the database table to a text file
mysql -h host -P port -u user -p password -A database -e "select email,domain,time from ent_login_01_000" > ent_login_01_000.txt
A total of logged-in users in the last 3 months will be counted, divided into tables by month, and there are 128 tables per month, all exported to files, a total of 80G
2. grep finds all 2018-12 2019-01 2019-02
find ./ -type f -name "ent_login_*" | xargs cat |grep "2018-12" > 2018-12.txt
find ./ -type f -name "ent_login_*" |xargs cat |grep "2019-01" > 2019-01.txt
find ./ -type f -name "ent_login_*" |xargs cat |grep "2019-02" > 2019-02.txt
3. Use awk sort and uniq to only remove the previous user, and First go to the duplicate lines
cat 2019-02.txt|awk -F " " '{print $1"@"$2}'|sort -T /mnt/public/phpdev/187_test/tmp/|uniq > 2019-02-awk-sort-uniq.txt
cat 2019-01.txt|awk -F " " '{print $1"@"$2}'|sort -T /mnt/public/ phpdev/187_test/tmp/|uniq > 2019-01-awk-sort-uniq.txt
cat 2018-12.txt|awk -F " " '{print $1"@"$2}'| sort -T /mnt/public/phpdev/187_test/tmp/|uniq > 2018-12-awk-sort-uniq.txt
uniq only removes consecutive duplicate lines, sort can arrange the lines into consecutive The -T is because the temporary directory of /tmp is occupied by default. The root directory is not enough for me, so I changed the temporary directory.
These files occupy more than 100 G
I want to learn more For Linux tutorials, please pay attention to Linux Video Tutorials on the PHP Chinese website!
The above is the detailed content of Export database files under Linux for statistics + deduplication. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

The five basic components of the Linux system are: 1. Kernel, 2. System library, 3. System utilities, 4. Graphical user interface, 5. Applications. The kernel manages hardware resources, the system library provides precompiled functions, system utilities are used for system management, the GUI provides visual interaction, and applications use these components to implement functions.

To view the Git repository address, perform the following steps: 1. Open the command line and navigate to the repository directory; 2. Run the "git remote -v" command; 3. View the repository name in the output and its corresponding address.

Oracle is not only a database company, but also a leader in cloud computing and ERP systems. 1. Oracle provides comprehensive solutions from database to cloud services and ERP systems. 2. OracleCloud challenges AWS and Azure, providing IaaS, PaaS and SaaS services. 3. Oracle's ERP systems such as E-BusinessSuite and FusionApplications help enterprises optimize operations.

Although Notepad cannot run Java code directly, it can be achieved by using other tools: using the command line compiler (javac) to generate a bytecode file (filename.class). Use the Java interpreter (java) to interpret bytecode, execute the code, and output the result.

The main uses of Linux include: 1. Server operating system, 2. Embedded system, 3. Desktop operating system, 4. Development and testing environment. Linux excels in these areas, providing stability, security and efficient development tools.

There are six ways to run code in Sublime: through hotkeys, menus, build systems, command lines, set default build systems, and custom build commands, and run individual files/projects by right-clicking on projects/files. The build system availability depends on the installation of Sublime Text.

MySQL efficiently manages structured data through table structure and SQL query, and implements inter-table relationships through foreign keys. 1. Define the data format and type when creating a table. 2. Use foreign keys to establish relationships between tables. 3. Improve performance through indexing and query optimization. 4. Regularly backup and monitor databases to ensure data security and performance optimization.

To install Laravel, follow these steps in sequence: Install Composer (for macOS/Linux and Windows) Install Laravel Installer Create a new project Start Service Access Application (URL: http://127.0.0.1:8000) Set up the database connection (if required)
