Mysql solution for counting 5 million+ daily table data?-PHP Tutorial-php.cn

Table of Contents

Reply content:

Home

Backend Development

PHP Tutorial

Mysql solution for counting 5 million+ daily table data?

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Aug 18, 2016 am 09:15 AM

mysql php

<code>请教：
现在有每天的日表数据（一天生成一张）， 每张表数据大概在500w左右。
需要从每天的日表数据中统计：根据appid统计ip数，同时ip需要去重。 
大概的sql是：</code>

Copy after login

select appid, count(distinct(ip)) from log0812_tb where iptype = 4 group by appid;

<code>然后将统计的appid 和 ip数，放入到另一张统计表中。 

1、直接执行sql的话，肯定超时了（系统仅配置了400ms读取时间）。
2、如果将数据都取出到内存中再做操作，内存又不足了，给的内存只有50M。。。（不为难程序员的需求不是好公司）
 
请问，还有优化的解决方案吗？
谢谢 </code>

Copy after login

Reply content:

<code>请教：
现在有每天的日表数据（一天生成一张）， 每张表数据大概在500w左右。
需要从每天的日表数据中统计：根据appid统计ip数，同时ip需要去重。 
大概的sql是：</code>

Copy after login

select appid, count(distinct(ip)) from log0812_tb where iptype = 4 group by appid;

<code>然后将统计的appid 和 ip数，放入到另一张统计表中。 

1、直接执行sql的话，肯定超时了（系统仅配置了400ms读取时间）。
2、如果将数据都取出到内存中再做操作，内存又不足了，给的内存只有50M。。。（不为难程序员的需求不是好公司）
 
请问，还有优化的解决方案吗？
谢谢 </code>

Copy after login

Let’s first talk about the possible optimizations in the table below:

Make a combined index (appid, ip)
IP stores integers, not strings

If it still times out, then try to read the data into the memory, but your memory is only 50M, then you can try to use HyperLogLog. The memory consumed is very small, but the statistical data will be slightly biased, about 2%

Finally, it is best not to store this kind of log data in sql. You can choose some nosql such as hbase and mongodb, which can meet your needs very well

@manong
Thank you, the two optimization solutions you mentioned are both good.

I built a joint index of typeid, appid, and ip, so that this statement is executed through the index query without returning the table, and the time is controlled below 1.5s, which is effective.

As for the HyperLogLog algorithm, I just roughly checked it and didn’t put it into practice, but thank you for the recommendation.

I use another method to process: schedule tasks to process these 5 million+ data in batches. After deduplicating the data taken twice, do array_diff to compare the second different data, and then sum it to get the total count. In this way, the time can also be controlled below 1s. A trick here is to convert the array of the first comparison into a string and then store it in the array. Convert the string to array for the second comparison. This will save a lot of memory, because after trying it, nested arrays are better than long characters. Arrays of string values consume memory.

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

How to fix KB5055612 fails to install in Windows 10?

4 weeks ago By DDD

Roblox: Grow A Garden - Complete Mutation Guide

3 weeks ago By DDD

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Nordhold: Fusion System, Explained

4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial

1670

CakePHP Tutorial

1428

Laravel Tutorial

1329

PHP Tutorial

1274

C# Tutorial

1256

Related knowledge

MySQL: The Database, phpMyAdmin: The Management Interface Apr 29, 2025 am 12:44 AM

MySQL and phpMyAdmin can be effectively managed through the following steps: 1. Create and delete database: Just click in phpMyAdmin to complete. 2. Manage tables: You can create tables, modify structures, and add indexes. 3. Data operation: Supports inserting, updating, deleting data and executing SQL queries. 4. Import and export data: Supports SQL, CSV, XML and other formats. 5. Optimization and monitoring: Use the OPTIMIZETABLE command to optimize tables and use query analyzers and monitoring tools to solve performance problems.

What is the significance of the session_start() function? May 03, 2025 am 12:18 AM

session_start()iscrucialinPHPformanagingusersessions.1)Itinitiatesanewsessionifnoneexists,2)resumesanexistingsession,and3)setsasessioncookieforcontinuityacrossrequests,enablingapplicationslikeuserauthenticationandpersonalizedcontent.

Steps to add and delete fields to MySQL tables Apr 29, 2025 pm 04:15 PM

In MySQL, add fields using ALTERTABLEtable_nameADDCOLUMNnew_columnVARCHAR(255)AFTERexisting_column, delete fields using ALTERTABLEtable_nameDROPCOLUMNcolumn_to_drop. When adding fields, you need to specify a location to optimize query performance and data structure; before deleting fields, you need to confirm that the operation is irreversible; modifying table structure using online DDL, backup data, test environment, and low-load time periods is performance optimization and best practice.

How to uninstall MySQL and clean residual files Apr 29, 2025 pm 04:03 PM

To safely and thoroughly uninstall MySQL and clean all residual files, follow the following steps: 1. Stop MySQL service; 2. Uninstall MySQL packages; 3. Clean configuration files and data directories; 4. Verify that the uninstallation is thorough.

How to use MySQL functions for data processing and calculation Apr 29, 2025 pm 04:21 PM

MySQL functions can be used for data processing and calculation. 1. Basic usage includes string processing, date calculation and mathematical operations. 2. Advanced usage involves combining multiple functions to implement complex operations. 3. Performance optimization requires avoiding the use of functions in the WHERE clause and using GROUPBY and temporary tables.

An efficient way to batch insert data in MySQL Apr 29, 2025 pm 04:18 PM

Efficient methods for batch inserting data in MySQL include: 1. Using INSERTINTO...VALUES syntax, 2. Using LOADDATAINFILE command, 3. Using transaction processing, 4. Adjust batch size, 5. Disable indexing, 6. Using INSERTIGNORE or INSERT...ONDUPLICATEKEYUPDATE, these methods can significantly improve database operation efficiency.

Composer: The Package Manager for PHP Developers May 02, 2025 am 12:23 AM

Composer is a dependency management tool for PHP, and manages project dependencies through composer.json file. 1) parse composer.json to obtain dependency information; 2) parse dependencies to form a dependency tree; 3) download and install dependencies from Packagist to the vendor directory; 4) generate composer.lock file to lock the dependency version to ensure team consistency and project maintainability.

Detailed explanation of the installation steps of MySQL on macOS system Apr 29, 2025 pm 03:36 PM

Installing MySQL on macOS can be achieved through the following steps: 1. Install Homebrew, using the command /bin/bash-c"$(curl-fsSLhttps://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)". 2. Update Homebrew and use brewupdate. 3. Install MySQL and use brewinstallmysql. 4. Start MySQL service and use brewservicesstartmysql. After installation, you can use mysql-u

See all articles