


Match the overlap between the csdn user database and official users and filter out the overlapping users_PHP tutorial
Process:
1. Obtain the user database of csdn and import it locally
When trying to open editplus, it prompts that there is insufficient memory and cannot find a way. My colleague checked it under Linux and the basic format is as follows:
Username # Password # Email
Username# Password# Email
Corresponding data structure:
CREATE TABLE IF NOT EXISTS `csdn_userdb` (
`id` int(10) NOT NULL auto_increment,
`username` varchar(50) character set gbk NOT NULL,
`password` varchar(50) character set gbk NOT NULL,
`email` varchar(50) character set gbk NOT NULL,
PRIMARY KEY (`id`),
KEY `username` (`username`),
KEY `email` (`email`)
) ENGINE=MyISAM DEFAULT CHARSET=gbk AUTO_INCREMENT=1;
I have always suspected that fopen opens files into the cache, but practice has proved that it is very fast, so there should be no Write to the cache, the following is the code for importing data
$ link = mysql_connect('localhost', 'root', 'admin', true);
mysql_select_db('csdn',$link);
$handle = fopen("C:UserszhudongDesktopwww.csdn.net.sql ", "r");
while (!feof($handle)){
$i++;
$buffer = fgets($handle);
list($u,$p,$ e) = explode(" # ",$buffer);
mysql_query("INSERT INTO csdn_userdb(username,password,email) VALUES ('$u','$p','$e')",$link );
if ($i%1000 == 0) echo $i."n";
}
fclose($handle);
?>
The efficiency of the above code is very poor, so the modified code is as follows:
$link = mysql_connect('localhost', 'root', 'admin', true);
mysql_select_db('csdn',$link);
$handle = fopen("C:UserszhudongDesktopwww.csdn .net.sql", "r");
$perpage = 50;
while (!feof($handle)){
$i++;
$buffer = fgets($handle);
list($u,$p,$e) = explode(" # ",$buffer);
$insertValue[] = "('$u','$p','$e') ";
if ($i% $perpage == 0){ $perpage == 0){
$insrtValueString = implode(',',$insertValue);
mysql_query("INSERT INTO csdn_userdb( username,password,email) VALUES $instrtValueString",$link);
echo $i."n";
$insertValue = array();
}
}
fclose($ handle);
In order to find out what factors affect the efficiency of imported data, we conducted tests based on different settings
Total CSDN user data 6428600
When $perpage=500 ;Data after import: 5,902,000; data loss 526600 Loss rate: 8%; Data table engine: MyISAM; Index: Yes; Total time taken: 15 minutes
When $perpage=200, total data after import: 6,210,200; data loss :218400; Loss rate: 3.3%; Data table engine: MYISAM; Index: Yes; Total time taken: 30 minutes
When $perpage=200, the total number of data after import: 6,210,200; Data loss: 218400; Loss rate: 3.3 %; Data table engine: INNODB; Index: Yes; Total time taken: 65 minutes
When $perpage=200, the total number of data after import: 6,210,200; Data loss: 218400; Loss rate: 3.3%; Data table engine: MYISAM ; Index: None; Total time taken: 14 minutes (rebuild the index separately after the data is imported)
When $perpage=50, the total number of data after import: 6,371,200; data loss: 57400, loss rate: 0.8%; data table engine : MYISAM; Index: None: Total time taken: 20 minutes
Based on the above situation, the summary is as follows:
1. Importing data first and then adding index is twice as efficient as adding index first and then importing data
2. The efficiency of InnoDB in single-process data insertion is much lower than that of MYISAM
3. When perpage=50, the data loss rate is less than 1%
Because there will be a timeout problem when executing through the browser, and the efficiency is low, it is run through the command line. I encountered a little trouble during the process and was delayed. It took less time
At first I executed the following code:
php.exe E:usrwwwimportcsdndb.php
But it kept reporting an error: call to undefined function mysql_connect
After much trouble, I found that php.ini was not loaded
Correct code For:
php.exe -c E:/usr/local/apache2/php.ini importcsdndb.php
2. Import the user data that needs to be matched to the local
command line Enter msyql (you don’t know how to use Baidu)
Then execute: mysql>source C:/Users/zhudong/Desktop/userdb.sql
3. Compare and filter users
After the comparison program is written, remember to add it in the command Run under the line:
$link = mysql_connect('localhost', 'root', 'admin', true);
mysql_select_db('csdn',$link);
$handle_username = fopen("E:/records_username.txt","a");
//$handle_email = fopen("E:/records_email.txt","a");
$username_num = $email_num = $uid = 0;
while ($uid<2181106) {
$nextuid=$uid+10000;
$query = mysql_query("SELECT * FROM pw_members WHERE uid>'$uid' AND uid<'$nextuid'");
while ($rt = mysql_fetch_array($query,MYSQL_ASSOC)) {
$username = $rt['username'];
$email = $rt['email'];
$query2 = mysql_query("SELECT * FROM scdn_userdb WHERE username='$username' OR email='$email'");
while ($rt2 = mysql_fetch_array($query2,MYSQL_ASSOC)) {
if ($rt['password'] = md5($rt2['password'])) {
if ($rt2['username'] == $username) {
$username_num++;
fwrite($handle_username,'OWN:'.$rt['uid'].'|'.$rt['username'].'|'.$rt['password'].'|'.$rt['email'].' CSDN:'.$rt2['username'].'|'.$rt2['password'].'|'.$rt2['email']."rn");
echo 'username_num='.$username_num."rn";
continue;
}
/*
if ($rt2['email'] == $email) {
$email_num++;
fwrite($handle_email,'OWN:'.$rt['uid'].'|'.$rt['username'].'|'.$rt['password'].'|'.$rt['email'].' CSDN:'.$rt2['username'].'|'.$rt2['password'].'|'.$rt2['email']."rn");
echo 'email_num='.$email_num."rn";
}
*/
}
}
mysql_free_result($query2);
}
$uid = $nextuid;
}
?>
您看到的以上的代码是非常蹩脚的,因为其效率特别低 ,几百万的数据,要跑10多个小时,怎么能忘记连表查询这么基本的东西呢,以下为修正后的方法
$link = mysql_connect('localhost', 'root', 'admin', true);
mysql_select_db('csdn',$link);
$handle_username = fopen("E:/records_username.txt","a");
while($uid<2181106) {//此处的数字为要对比用户库的最大ID
$nextuid= $uid+10000;
$query = mysql_query("SELECT m.uid,m.username,m.password,m.email,u.password as csdn_password,u.email as csdn_email FROM own_members m LEFT JOIN csdn_userdb u USING(username) WHERE m.uid>'$uid' AND m.uid<='$nextuid' AND u.username!=''");
while ($rt = mysql_fetch_array($query,MYSQL_ASSOC)) {
if ($rt['password'] == md5($rt['csdn_password'])) {
$username_num++;
fwrite($handle_username,'OWN:'.$rt['uid'].'|'.$rt['username'].'|'.$rt['password'].'|'.$rt['email'].' CSDN:'.$rt['username'].'|'.$rt['csdn_password'].'|'.$rt['csdn_email']."rn");
echo 'username_num='.$username_num."rn";
}
}
$uid = $nextuid;
echo 'uid='.$uid;
}
?>
总对比时间25分钟,相比较之前10多个小时的执行真是大有提升
总重名用户:34175
占总会员比例:1.7%
1.7%的重名用户还是挺严重的,希望本文对各位站长对比出本站的用户有所帮助

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics











Apple's latest releases of iOS18, iPadOS18 and macOS Sequoia systems have added an important feature to the Photos application, designed to help users easily recover photos and videos lost or damaged due to various reasons. The new feature introduces an album called "Recovered" in the Tools section of the Photos app that will automatically appear when a user has pictures or videos on their device that are not part of their photo library. The emergence of the "Recovered" album provides a solution for photos and videos lost due to database corruption, the camera application not saving to the photo library correctly, or a third-party application managing the photo library. Users only need a few simple steps

How to use MySQLi to establish a database connection in PHP: Include MySQLi extension (require_once) Create connection function (functionconnect_to_db) Call connection function ($conn=connect_to_db()) Execute query ($result=$conn->query()) Close connection ( $conn->close())

To handle database connection errors in PHP, you can use the following steps: Use mysqli_connect_errno() to obtain the error code. Use mysqli_connect_error() to get the error message. By capturing and logging these error messages, database connection issues can be easily identified and resolved, ensuring the smooth running of your application.

Using the database callback function in Golang can achieve: executing custom code after the specified database operation is completed. Add custom behavior through separate functions without writing additional code. Callback functions are available for insert, update, delete, and query operations. You must use the sql.Exec, sql.QueryRow, or sql.Query function to use the callback function.

Through the Go standard library database/sql package, you can connect to remote databases such as MySQL, PostgreSQL or SQLite: create a connection string containing database connection information. Use the sql.Open() function to open a database connection. Perform database operations such as SQL queries and insert operations. Use defer to close the database connection to release resources.

JSON data can be saved into a MySQL database by using the gjson library or the json.Unmarshal function. The gjson library provides convenience methods to parse JSON fields, and the json.Unmarshal function requires a target type pointer to unmarshal JSON data. Both methods require preparing SQL statements and performing insert operations to persist the data into the database.

Use the DataAccessObjects (DAO) library in C++ to connect and operate the database, including establishing database connections, executing SQL queries, inserting new records and updating existing records. The specific steps are: 1. Include necessary library statements; 2. Open the database file; 3. Create a Recordset object to execute SQL queries or manipulate data; 4. Traverse the results or update records according to specific needs.

PHP database connection guide: MySQL: Install the MySQLi extension and create a connection (servername, username, password, dbname). PostgreSQL: Install the PgSQL extension and create a connection (host, dbname, user, password). Oracle: Install the OracleOCI8 extension and create a connection (servername, username, password). Practical case: Obtain MySQL data, PostgreSQL query, OracleOCI8 update record.
