PHP+MYSQL implements full-text search and full-text search tools

May 26, 2018 pm 04:47 PM

How can you implement full-text search in PHP?
Several solutions come to mind right away, such as scanning files or using SQL LIKE statements, but these methods are quite inefficient.
This article introduces a more efficient way to implement full-text retrieval in PHP: MySQL's FULLTEXT index. However, MySQL's FULLTEXT index does not handle Chinese well, so this article also shows how to implement Chinese full-text search with PHP + MySQL.
First, you need SCWS, a Chinese word segmentation extension module for PHP. For installation and usage, see www.ftphp.com/scws (please leave a message if you have any questions).
Next, some background on MySQL's FULLTEXT index:
MySQL has supported full-text indexing and searching since version 3.23.23. A full-text index in MySQL is an index of type FULLTEXT.
FULLTEXT indexes are used on MyISAM tables and can be created on CHAR, VARCHAR, or TEXT columns, either at CREATE TABLE time or afterwards with ALTER TABLE or CREATE INDEX. For large data sets, it is much faster to load the data into a table that has no FULLTEXT index and then create the index with ALTER TABLE (or CREATE INDEX). Loading data into a table that already has a FULLTEXT index is very slow.

MySQL full-text search is performed with the MATCH() ... AGAINST() syntax.
The following is a simple example:
1. Create a new data table:

CREATE TABLE fulltext_sample (copy TEXT, FULLTEXT (copy)) ENGINE=MyISAM;

Here the copy column carries a FULLTEXT index. If the index was not defined when the table was created, it can be added afterwards with ALTER TABLE, for example:

ALTER TABLE fulltext_sample ADD FULLTEXT (copy);

2. Insert data:

INSERT INTO fulltext_sample VALUES
('It appears good from here'),
('The here and the past'),
('Why are we hear'),
('An all-out alert'),
('All you need is love'),
('A good alert');

3. Data retrieval:

SELECT * FROM fulltext_sample WHERE MATCH(copy) AGAINST('love');

That covers MySQL's built-in full-text search. Note that searches on a full-text index are case-insensitive by default.
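The article does not show how to run this query from PHP. The following is a minimal sketch, assuming a PDO connection to the database that holds fulltext_sample; the function name search_copy() and the connection details in the usage comment are hypothetical and not part of the original article.

```php
<?php
// Sketch only: search_copy() is a hypothetical helper, not from the article.
function search_copy(PDO $pdo, string $keyword): array
{
    // The keyword is bound as a parameter instead of being concatenated into the SQL string.
    $stmt = $pdo->prepare(
        'SELECT copy FROM fulltext_sample WHERE MATCH(copy) AGAINST(:kw)'
    );
    $stmt->execute([':kw' => $keyword]);

    // Return the matching copy values as a flat array of strings.
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// Usage (assumed DSN and credentials):
// $pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'secret');
// print_r(search_copy($pdo, 'love'));
```

Binding the keyword keeps user input out of the SQL text, which matters once the search term comes from a web form.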

Now let's look at how to implement Chinese full-text search.
A FULLTEXT index works on words, and words must be separated by spaces. Chinese sentences have no spaces between words, so the text has to be segmented first, which is why the Chinese word segmentation extension mentioned above is needed.
However, even after segmentation, MySQL still cannot perform full-text retrieval of Chinese through MATCH. The text has to be converted first. A relatively simple and practical approach (there are certainly better ones) is the following function, which converts each Chinese word into its urlencode form with the % signs removed.

function q_encode($str)
{
    // Split on spaces and drop empty items
    $data = array_filter(explode(" ", $str));
    // Remove duplicate words
    $data = array_flip(array_flip($data));
    $data_code = "";
    foreach ($data as $ss) {
        // Skip single-byte tokens
        if (strlen($ss) > 1) {
            $data_code .= str_replace("%", "", urlencode($ss)) . " ";
        }
    }
    return trim($data_code);
}

Save the converted content in the column that carries the FULLTEXT index. When querying, the search keywords must be converted in the same way.
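To make the conversion concrete, here is a small worked example. It is a sketch under stated assumptions: encode_terms() is a hypothetical stand-in that follows the same steps as the article's function (split on spaces, drop empties and duplicates, urlencode, strip %), and the input is assumed to be UTF-8 text that has already been segmented into space-separated words.

```php
<?php
// Hypothetical helper that follows the same steps as the article's encoding function.
function encode_terms(string $segmented): string
{
    // Split on spaces, drop empty items, and remove duplicate words.
    $words = array_unique(array_filter(explode(' ', $segmented), 'strlen'));

    $codes = [];
    foreach ($words as $word) {
        // Skip single-byte tokens, as the article's function does.
        if (strlen($word) > 1) {
            // urlencode() yields e.g. %E4%B8%AD; stripping % leaves plain hex letters and digits.
            $codes[] = str_replace('%', '', urlencode($word));
        }
    }
    return implode(' ', $codes);
}

// In UTF-8, the word 中文 is the bytes E4 B8 AD E6 96 87.
echo encode_terms('中文 检索 中文'), "\n";
// prints "E4B8ADE69687 E6A380E7B4A2"
```

Each Chinese word becomes one run of ASCII letters and digits, so MySQL's word-based FULLTEXT parser sees it as a single space-delimited token.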

How to implement UTF8 full-text search with PHP+Mysql

This part explains how to perform fast full-text search over large amounts of data. MySQL provides a full-text index feature: define a FULLTEXT index on a column, then search it with the MATCH ... AGAINST clause of a SELECT statement.

TouchUs - The Global Yellow Pages & Business Directory (www.touchus.org), a purely English-language site we developed, uses this MySQL feature and achieves an average full-text retrieval time of under 0.5 seconds over more than 100,000 records. However, when developing the Chinese site of TouchUs, City Yellow Pages (www.city39.cn), we ran into a new problem. In English text, words are separated by spaces, which FULLTEXT fully supports, but for Chinese and other East Asian scripts it is not so simple. Because Chinese has no explicit separator between words, MySQL cannot support full-text search over Chinese text directly.

How can MySQL be made to support Chinese full-text search? An idea came up by chance: after Chinese word segmentation, the Chinese words can be encoded as ASCII characters, establishing a fixed mapping between each Chinese word and an ASCII string, and full-text search can then run on the encoded text. Would that give us a full-text index for Chinese? After testing, the answer is yes. The following is the process as implemented on the City Yellow Pages site:

1. Create a separate index table. For example, for the members table we create a members_index table.

Members (members)        User information full-text index (members_index)
user_id                  user_id
user_name                index_intro
user_introduction

Add a FULLTEXT index on the index_intro column of the members_index table.
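The index table could be created along the following lines. This is only a sketch: the article names the columns but not their types, so the column types, the primary key, and the $pdo connection in the comment are assumptions.

```php
<?php
// Assumed column types; the article only gives the column names.
$createIndexTable = <<<SQL
CREATE TABLE members_index (
    user_id     INT UNSIGNED NOT NULL PRIMARY KEY,
    index_intro TEXT,
    FULLTEXT (index_intro)
) ENGINE=MyISAM
SQL;

// $pdo is an assumed, already-connected PDO instance:
// $pdo->exec($createIndexTable);
```

Keeping the encoded text in its own table leaves the original members rows untouched, so results can be displayed without decoding anything.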

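2. Run Chinese word segmentation on the contents of the user_introduction column of the user information table (members).

For the segmentation process, see the Simple Chinese Word Segmentation system (SCWS) at http://www.ftphp.com/scws/. On the City Yellow Pages site, we use the SCWS PHP extension module to perform segmentation. The extension is very easy to install: it only needs a simple compile and configure step before use. In our PHP code, we wrote the following function, which segments the text and joins the resulting words with spaces.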

// Chinese word segmentation function
function str_fc($str) {
    $so = scws_new();
    $so->set_charset('utf8');
    // set_dict and set_rule are not called here; SCWS automatically tries the
    // dictionary and rule files at the paths specified in the ini file
    $so->send_text($str);
    $mystr = '';
    while ($tmp = $so->get_result()) {
        foreach ($tmp as $ss) {
            $s = trim($ss['word']);
            if ($s) {
                $mystr .= $s . " ";
            }
            //echo urlencode(trim($ss['word'])) . " ";
        }
    }
    return $mystr;
}

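The function returns the segmented words joined by spaces.

3. Encode the segmentation result. Many encodings could be used, such as base64, urlencode, or converting Chinese characters to pinyin; for gb2312 you could even use the location-code (区位码) encoding. Considering storage space and convenience, we chose PHP's urlencode. Note that duplicate words can be removed before encoding to save storage space, and that the % characters must be removed from the result after encoding, because urlencode encodes according to RFC 1738 and produces many % characters, and % is a wildcard character in MySQL. The PHP code used for the encoding step is shown below: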

$data = str_fc($data);  // Chinese word segmentation
$data = array_filter(explode(" ", $data)); // remove empty array items
$data = array_flip(array_flip($data));  // remove duplicate items
// urlencode each segmented word
$data_code = "";
foreach ($data as $ss) {
    if (strlen($ss) > 1) {
        $data_code .= str_replace("%", "", urlencode($ss)) . " ";
    }
}

Here $data_code is the encoded result. Store it in the user information full-text index table (members_index), keyed by user_id.

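4. When handling a search, first apply the same segmentation and encoding to the keywords entered by the user, then run a fast full-text search with MySQL's SELECT ... MATCH ... AGAINST statement. The user_id values in the search results can then be used to fetch the original data from the user information table (members) for display, so there is no need to decode and reassemble the indexed text.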
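Steps 3 and 4 can be sketched together as follows. This is an illustration under assumptions rather than the site's actual code: encode_for_index() is a hypothetical helper that repeats the encoding step, its input is assumed to be text already segmented into space-separated words, the REPLACE statement assumes user_id is the primary key of members_index, and $pdo in the comments is an assumed PDO connection.

```php
<?php
// Hypothetical helper: encodes space-separated, already segmented text for the index.
function encode_for_index(string $segmented): string
{
    $codes = [];
    foreach (array_unique(array_filter(explode(' ', $segmented), 'strlen')) as $word) {
        if (strlen($word) > 1) {
            $codes[] = str_replace('%', '', urlencode($word));
        }
    }
    return implode(' ', $codes);
}

// Step 3: store the encoded introduction, keyed by user_id.
$storeSql = 'REPLACE INTO members_index (user_id, index_intro) VALUES (:id, :intro)';

// Step 4: match on the encoded text, but return rows from the original members table.
$searchSql = 'SELECT m.* FROM members_index AS i '
           . 'JOIN members AS m ON m.user_id = i.user_id '
           . 'WHERE MATCH(i.index_intro) AGAINST(:q)';

// The keywords go through exactly the same encoding as the indexed text.
$searchParams = [':q' => encode_for_index('中文 检索')];

// With an assumed PDO connection $pdo:
// $pdo->prepare($storeSql)->execute([':id' => 1, ':intro' => encode_for_index($segmentedIntro)]);
// $stmt = $pdo->prepare($searchSql);
// $stmt->execute($searchParams);
// $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

Joining back to members is what lets the page show the original Chinese text directly, as the article describes.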

That is the method for Chinese full-text search in MySQL with UTF-8.
