Table of Contents
Teach you step by step how to do a keyword matching project (search engine) ---- On the 20th day, teach you how to do it on the 20th day
Home Backend Development PHP Tutorial Teach you step by step how to do a keyword matching project (search engine) ---- Day 20, teach you how to do it on the 20th day_PHP Tutorial

Teach you step by step how to do a keyword matching project (search engine) ---- Day 20, teach you how to do it on the 20th day_PHP Tutorial

Jul 13, 2016 am 10:19 AM
- Do Key words match Hand in hand search engine teach you project

Teach you step by step how to do a keyword matching project (search engine) ---- On the 20th day, teach you how to do it on the 20th day

Guest appearance: Diaosi’s cheating form Things like artifacts and databases

Object-oriented sublimation: object-oriented understanding - first acquaintance with new students, object-oriented extras - sleepwalking of thoughts (1), object-oriented understanding - how to find classes

Load Balancing: Load Balancing - Concept Understanding, Load Balancing - Implementation Configuration (Nginx)

Complaints: Some people reported such information, saying that the article became harder to understand towards the end and could not keep up with the rhythm. Some people also asked why Xiao Shuai Shuai’s ability increased so fast, and whether I was stupid. Some people just read the text without looking at the code, because the code is too difficult to understand.

Actually, I have been thinking about this issue these days, so I had no choice but to launch some object-oriented courses. I hope it will be helpful to those who can't keep up. In fact, to be honest, if readers don't give feedback, I will have to carry out the course according to what I think Xiaoshuai Shuai is.

Day 20

Starting point: Teach you step by step how to do keyword matching project (search engine) ---- Day 1

Review: Teach you step by step how to do keyword matching project (search engine) ---- Day 19

It is said that Xiao Shuai Shuai wrote the first version in order to solve the word segmentation algorithm. When he showed it to Boss Yu, he was asked to rewrite it.

The reasons are as follows:

1. How to test and test data?

2. Has Splitter done too much?

3. What should I do if there are repeated phrases in dresses like xxl dresses?

Xiao Shuai Shuai took these questions and began to reconstruct.

First he discovered this, the judgment of Chinese, English and Chinese-English, and the calculation of length. He wrote this as a class:

<?<span>php

</span><span>class</span><span> UTF8 {

    </span><span>/*</span><span>*
     * 检测是否utf8
     * @param $char
     * @return bool
     </span><span>*/</span>
    <span>public</span> <span>static</span> <span>function</span> is(<span>$char</span><span>){
        </span><span>return</span> (<span>preg_match</span>("/^([".<span>chr</span>(228)."-".<span>chr</span>(233)."]{1}[".<span>chr</span>(128)."-".<span>chr</span>(191)."]{1}[".<span>chr</span>(128)."-".<span>chr</span>(191)."]{1}){1}/",<span>$char</span>) ||
            <span>preg_match</span>("/([".<span>chr</span>(228)."-".<span>chr</span>(233)."]{1}[".<span>chr</span>(128)."-".<span>chr</span>(191)."]{1}[".<span>chr</span>(128)."-".<span>chr</span>(191)."]{1}){1}$/",<span>$char</span>) ||
            <span>preg_match</span>("/([".<span>chr</span>(228)."-".<span>chr</span>(233)."]{1}[".<span>chr</span>(128)."-".<span>chr</span>(191)."]{1}[".<span>chr</span>(128)."-".<span>chr</span>(191)."]{1}){2,}/",<span>$char</span><span>));
    }

    </span><span>/*</span><span>*
     * 计算utf8字的个数
     * @param $char
     * @return float|int
     </span><span>*/</span>
    <span>public</span> <span>static</span> <span>function</span> length(<span>$char</span><span>) {

        </span><span>if</span>(self::is(<span>$char</span><span>))
            </span><span>return</span> <span>ceil</span>(<span>strlen</span>(<span>$char</span>)/3<span>);
        </span><span>return</span> <span>strlen</span>(<span>$char</span><span>);
    }

    </span><span>/*</span><span>*
     * 检测是否为词组
     * @param $word
     * @return bool
     </span><span>*/</span>
    <span>public</span> <span>static</span> <span>function</span> isPhrase(<span>$word</span><span>){

        </span><span>if</span>(self::length(<span>$word</span>)<=1<span>)
            </span><span>return</span> <span>false</span><span>;
        </span><span>return</span> <span>true</span><span>;
    }

}</span>
Copy after login

Xiao Shuai Shuai also considered that the source of the dictionary may come from multiple places, such as the test data I gave. This can solve the problem that Boss Yu mentioned that cannot be tested. Xiao Shuai Shuai took a cut of the source of the dictionary. Created a class, the class is as follows:

<?<span>php

</span><span>class</span><span> DBSegmentation {

    </span><span>public</span> <span>$cid</span><span>;

    </span><span>/*</span><span>*
     * 获取类目下分词的词组数据
     * @return array
     </span><span>*/</span>
    <span>public</span> <span>function</span><span> transferDictionary(){
        </span><span>$ret</span> = <span>array</span><span>();
        </span><span>$sql</span> = "select word from category_linklist where cid='<span>$this</span>->cid'"<span>;
        </span><span>$words</span> = DB::makeArray(<span>$sql</span><span>);
        </span><span>foreach</span>(<span>$words</span> <span>as</span> <span>$strWords</span><span>){
            </span><span>$words</span> = <span>explode</span>(",",<span>$strWords</span><span>);

            </span><span>foreach</span>(<span>$words</span> <span>as</span> <span>$word</span><span>){
                </span><span>if</span>(UTF8::isPhrase(<span>$word</span><span>)){
                    </span><span>$ret</span>[] = <span>$word</span><span>;
                }
            }
        }
        </span><span>return</span> <span>$ret</span><span>;
    }
} 

</span><span>class</span><span> TestSegmentation {
    
    </span><span>public</span> <span>function</span><span> transferDictionary(){
        </span><span>$words</span> = <span>array</span><span>(
            </span>"连衣裙,连衣",
            "XXL,xxl,加大,加大码",
            "X码,中码",
            "外套,衣,衣服,外衣,上衣",
            "女款,女士,女生,女性"<span>
        );

        </span><span>$ret</span> = <span>array</span><span>();
        </span><span>foreach</span>(<span>$words</span> <span>as</span> <span>$strWords</span><span>){
            </span><span>$words</span> = <span>explode</span>(",",<span>$strWords</span><span>);

            </span><span>foreach</span>(<span>$words</span> <span>as</span> <span>$word</span><span>){
                </span><span>if</span>(UTF8::isPhrase(<span>$word</span><span>)){
                    </span><span>$ret</span>[] = <span>$word</span><span>;
                }
            }
        }
        </span><span>return</span> <span>$ret</span><span>;

    }
}</span>
Copy after login

Then Splitter will focus on word segmentation. The code is as follows:

<span>class</span><span> Splitter {

    </span><span>public</span> <span>$keyword</span><span>;
    </span><span>private</span> <span>$dictionary</span> = <span>array</span><span>();

    </span><span>public</span> <span>function</span> setDictionary(<span>$dictionary</span> = <span>array</span><span>()){

        </span><span>usort</span>(<span>$dictionary</span>,<span>function</span>(<span>$a</span>,<span>$b</span><span>){
            </span><span>return</span> (UTF8::length(<span>$a</span>)>UTF8::length(<span>$b</span>))?1:-1<span>;
        });

        </span><span>$this</span>->dictionary = <span>$dictionary</span><span>;
    }

    </span><span>public</span> <span>function</span><span> getDictionary(){
        </span><span>return</span> <span>$this</span>-><span>dictionary;
    }

    </span><span>/*</span><span>*
     * 把关键词拆分成词组或者单词
     * @return KeywordEntity $keywordEntity
     </span><span>*/</span>
    <span>public</span> <span>function</span> <span>split</span><span>(){

        </span><span>$remainKeyword</span> = <span>$this</span>-><span>keyword;

        </span><span>$keywordEntity</span> = <span>new</span> KeywordEntity(<span>$this</span>-><span>keyword);

        </span><span>foreach</span>(<span>$this</span>->dictionary <span>as</span> <span>$phrase</span><span>){

            </span><span>$matchTimes</span> = <span>preg_match_all</span>("/<span>$phrase</span>/",<span>$remainKeyword</span>,<span>$matches</span><span>);
            </span><span>if</span>(<span>$matchTimes</span>>0<span>){
                </span><span>$keywordEntity</span>->addElement(<span>$phrase</span>,<span>$matchTimes</span><span>);

                </span><span>$remainKeyword</span> = <span>str_replace</span>(<span>$phrase</span>,"::",<span>$remainKeyword</span><span>);
            }
        }

        </span><span>$remainKeywords</span> = <span>explode</span>("::",<span>$remainKeyword</span><span>);
        </span><span>foreach</span>(<span>$remainKeywords</span> <span>as</span> <span>$splitWord</span><span>){

            </span><span>if</span>(!<span>empty</span>(<span>$splitWord</span><span>)){
                </span><span>$keywordEntity</span>->addElement(<span>$splitWord</span><span>);
            }
        }

        </span><span>return</span> <span>$keywordEntity</span><span>;

    }

}


</span><span>class</span><span> KeywordEntity {

    </span><span>public</span> <span>$keyword</span><span>;
    </span><span>public</span> <span>$elements</span> = <span>array</span><span>();

    </span><span>public</span> <span>function</span> __construct(<span>$keyword</span><span>){
        </span><span>$this</span>->keyword = <span>$keyword</span><span>;
    }

    </span><span>public</span> <span>function</span> addElement(<span>$word</span>,<span>$times</span>=1<span>){

        </span><span>if</span>(<span>isset</span>(<span>$this</span>->elements[<span>$word</span><span>])){
            </span><span>$this</span>->elements[<span>$word</span>]->times += <span>$times</span><span>;
        }</span><span>else</span>
            <span>$this</span>->elements[] = <span>new</span> KeywordElement(<span>$word</span>,<span>$times</span><span>);
    }

    </span><span>/*</span><span>*
     * @desc 计算UTF8字符串权重
     * @param string $word
     * @return float
     </span><span>*/</span>
    <span>public</span> <span>function</span> calculateWeight(<span>$word</span><span>)
    {
        </span><span>$element</span> = <span>$this</span>->elements[<span>$word</span><span>];
        </span><span>return</span> <span>ROUND</span>(<span>strlen</span>(<span>$element</span>->word)*<span>$element</span>->times / <span>strlen</span>(<span>$this</span>->keyword), 3<span>);
    }
}


</span><span>class</span><span> KeywordElement {
    </span><span>public</span> <span>$word</span><span>;
    </span><span>public</span> <span>$times</span><span>;

    </span><span>public</span> <span>function</span> __construct(<span>$word</span>,<span>$times</span><span>){
        </span><span>$this</span>->word = <span>$word</span><span>;
        </span><span>$this</span>->times = <span>$times</span><span>;
    }
}</span>
Copy after login

He also left the weight calculation to a class to handle specifically.

After Xiao Shuai Shuai finished writing, he also wrote a test example:

<?<span>php

</span><span>$segmentation</span> = <span>new</span><span> TestSegmentation();

</span><span>$splitter</span> = <span>new</span><span> Splitter();
</span><span>$splitter</span>->setDictionary(<span>$segmentation</span>-><span>transferDictionary());
</span><span>$splitter</span>->keyword = "连衣裙xxl裙连衣裙"<span>;
</span><span>$keywordEntity</span> = <span>$splitter</span>-><span>split</span><span>();

</span><span>var_dump</span>(<span>$keywordEntity</span>);
Copy after login

This way, even if your algorithm changes, it will be able to face it calmly.

Xiao Shuaishuai understands this. When you feel that a class does too many things, you can consider the single responsibility principle.

Single Responsibility Principle: A class has only one reason for its change. There should be only one responsibility. Each responsibility is an axis of change. If a class has more than one responsibility, these responsibilities are coupled together. This can lead to fragile designs. When one responsibility changes, it may affect other responsibilities. In addition, multiple responsibilities are coupled together, which affects reusability. For example: To achieve the separation of logic and interface. 【From Baidu Encyclopedia】

When Mr. Yu mentioned whether there are other word segmentation algorithms and whether we can use them, Xiao Shuaishuai was very happy because the code is so beautiful now.

How Xiao Shuai Shuai plays with third-party word segmentation extensions, please stay tuned for the next chapter’s breakdown: I’ll teach you step by step how to do keyword matching projects (search engines) ---- Day 21

www.bkjia.comtruehttp: //www.bkjia.com/PHPjc/873919.htmlTechArticleTeach you step by step how to do keyword matching projects (search engines) ---- On the 20th day, teach you how to do it Guest appearance on the 20th day: Diaosi’s deceptive form artifact, database thing, object-oriented sublimation: face...
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Nordhold: Fusion System, Explained
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Clair Obscur: Expedition 33 - How To Get Perfect Chroma Catalysts
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1677
14
PHP Tutorial
1278
29
C# Tutorial
1257
24
How to adjust aperture on Xiaomi Mi 14 Ultra? How to adjust aperture on Xiaomi Mi 14 Ultra? Mar 19, 2024 am 09:01 AM

Adjusting the aperture size has a crucial impact on the photo effect. Xiaomi Mi 14 Ultra provides unprecedented flexibility in camera aperture adjustment. In order to allow everyone to adjust the aperture smoothly and realize the free adjustment of the aperture size, the editor here brings you a detailed tutorial on how to set the aperture on Xiaomi Mi 14Ultra. How to adjust the aperture on Xiaomi Mi 14Ultra? Start the camera, switch to &quot;Professional Mode&quot;, and select the main camera - W lens. Click on the aperture, open the aperture dial, A is automatic, select f/1.9 or f/4.0 as needed.

Can AI conquer Fermat's last theorem? Mathematician gave up 5 years of his career to turn 100 pages of proof into code Can AI conquer Fermat's last theorem? Mathematician gave up 5 years of his career to turn 100 pages of proof into code Apr 09, 2024 pm 03:20 PM

Fermat's last theorem, about to be conquered by AI? And the most meaningful part of the whole thing is that Fermat’s Last Theorem, which AI is about to solve, is precisely to prove that AI is useless. Once upon a time, mathematics belonged to the realm of pure human intelligence; now, this territory is being deciphered and trampled by advanced algorithms. Image Fermat's Last Theorem is a "notorious" puzzle that has puzzled mathematicians for centuries. It was proven in 1993, and now mathematicians have a big plan: to recreate the proof using computers. They hope that any logical errors in this version of the proof can be checked by a computer. Project address: https://github.com/riccardobrasca/flt

How to set Chinese in Cheat Engine? How to set Chinese in ce modifier How to set Chinese in Cheat Engine? How to set Chinese in ce modifier Mar 18, 2024 pm 01:20 PM

Ce Modifier (CheatEngine) is a game modification tool dedicated to modifying and editing game memory. So how to set Chinese in CheatEngine? Next, the editor will tell you how to set Chinese in Ce Modifier. I hope it can Help friends in need. In the new software we download, it can be confusing to find that the interface is not in Chinese. Even though this software was not developed in China, there are ways to convert it to the Chinese version. This problem can be solved by simply applying the Chinese patch. After downloading and installing the CheatEngine (ce modifier) ​​software, open the installation location and find the folder named languages, as shown in the figure below

Teach you how to use the new advanced features of iOS 17.4 'Stolen Device Protection' Teach you how to use the new advanced features of iOS 17.4 'Stolen Device Protection' Mar 10, 2024 pm 04:34 PM

Apple rolled out the iOS 17.4 update on Tuesday, bringing a slew of new features and fixes to iPhones. The update includes new emojis, and EU users will also be able to download them from other app stores. In addition, the update also strengthens the control of iPhone security and introduces more "Stolen Device Protection" setting options to provide users with more choices and protection. "iOS17.3 introduces the "Stolen Device Protection" function for the first time, adding extra security to users' sensitive information. When the user is away from home and other familiar places, this function requires the user to enter biometric information for the first time, and after one hour You must enter information again to access and change certain data, such as changing your Apple ID password or turning off stolen device protection.

How to update Honor MagicOS 8.0 on Honor 90 GT? How to update Honor MagicOS 8.0 on Honor 90 GT? Mar 18, 2024 pm 06:46 PM

Honor 90GT is a cost-effective smartphone with excellent performance and excellent user experience. However, sometimes we may encounter some problems, such as how to update Honor MagicOS8.0 on Honor 90GT? This step may be different for different mobile phones and different models. So, let us discuss how to upgrade the system correctly. How to update Honor MagicOS 8.0 on Honor 90GT? According to news on February 28, Honor today pushed the MagicOS8.0 public beta update for its three mobile phones 90GT/100/100Pro. The package version number is 8.0.0.106 (C00E106R3P1) 1. Ensure your Honor The battery of the 90GT is fully charged;

Planet Mojo: Building a Web3 game metaverse from the auto-chess game Mojo Melee Planet Mojo: Building a Web3 game metaverse from the auto-chess game Mojo Melee Mar 14, 2024 pm 05:55 PM

Popular Metaverse game projects founded in the last crypto cycle are accelerating their expansion. On March 4, PlanetMojo, the Web3 game metaverse platform, announced a number of important developments in its game ecology, including the announcement of the upcoming parkour game GoGoMojo, the launch of the new season "Way of War" in the flagship auto-chess game MojoMelee, and the celebration of the new The first ETH series "WarBannerNFT" launched this season in cooperation with MagicEden. In addition, PlanetMojo also revealed that they plan to launch Android and iOS mobile versions of MojoMelee later this year. This project will be launched at the end of 2021. After nearly two years of hard work in the bear market, it will soon be completed.

DaVinci Resolve Studio now supports AV1 hardware encoding for AMD graphics cards DaVinci Resolve Studio now supports AV1 hardware encoding for AMD graphics cards Mar 06, 2024 pm 10:04 PM

Recent news, lackMagic has launched the 18.5PublicBeta2 public beta update of the DaVinci Resolve Studio video editing software, bringing AV1 encoding support to AMD Radeon graphics cards. After updating to the latest version, AMD graphics card users will be able to take advantage of hardware acceleration for AV1 encoding in DaVinci Resolve Studio. Although the official does not specify the supported architectures or models, it is expected that all AMD graphics card users can try this feature. In 2018, AOMedia released a new video coding standard AV1 (AOMediaVideoCodec1.0). AV1 is produced by a number of

How to set up Cheat Engine in Chinese? Cheat Engine setting Chinese method How to set up Cheat Engine in Chinese? Cheat Engine setting Chinese method Mar 13, 2024 pm 04:49 PM

CheatEngine is a game editor that can edit and modify the game's memory. However, its default language is non-Chinese, which is inconvenient for many friends. So how to set Chinese in CheatEngine? Today, the editor will give you a detailed introduction to how to set up Chinese in CheatEngine. I hope it can help you. Setting method one: 1. Double-click to open the software and click "edit" in the upper left corner. 2. Then click “settings” in the option list below. 3. In the opened window interface, click "languages" in the left column

See all articles