
How are Baidu Baike's keyword links implemented?

Jun 23, 2016 pm 02:14 PM

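Keyword links

Keywords in Baidu Baike articles are hyperlinked. With only a few keywords, simply replacing each one with a link would be enough. But Baidu has a huge number of keywords, possibly tens of thousands or even tens of millions. Running tens of thousands of replacements on an article would be far too inefficient. How is a feature like this actually implemented? Thanks!

(A screenshot was attached to the original post.)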


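Replies (solutions)

I'd like to know too.

Baike assigns keywords by category relevance, so a single article does not have that many candidate keywords.
Also, you assume the replace function has to be called many times, but that is only how it is usually done in PHP. In practice, a single pass over the whole article in C is enough, and that is far more efficient than the PHP approach.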

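dream1206, if one category has ten thousand keywords, an article gets ten thousand replacements. Do you think that is reasonable?

Quoting: "dream1206, if one category has ten thousand keywords, an article gets ten thousand replacements. Do you think that is reasonable?"
You still haven't understood my point. With the right algorithm, you only need to traverse the whole article once.
A replacement only touches one specific string in the article, and content that has already been checked does not need to be checked again. Do you see?
Of course, once you take other factors into account, such as conflicting keywords like 研究 and 研究生 (where one keyword is a prefix of the other), this feature does get fairly complicated.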
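
One way to picture the single-pass idea, including the 研究 / 研究生 conflict, is a regular-expression alternation with the longer keywords listed first. This is only a sketch, not how Baidu does it: the function name `linkKeywords` and the URLs are made up for illustration, it assumes PHP 7.4+ (arrow functions) and plain-text input, and a very large keyword list may run into PCRE pattern limits, in which case a trie is likely the better fit.

```php
<?php
// Hypothetical helper: link every keyword in one pass over the text.
// $keywords maps keyword => URL; both are illustrative values.
function linkKeywords(string $text, array $keywords): string
{
    if ($keywords === []) {
        return $text;
    }
    $words = array_map('strval', array_keys($keywords));
    // Longer keywords first, so 研究生 is tried before 研究 at the same position.
    usort($words, fn($a, $b) => strlen($b) <=> strlen($a));
    $quoted = array_map(fn($w) => preg_quote($w, '/'), $words);
    $pattern = '/' . implode('|', $quoted) . '/u';

    $result = preg_replace_callback(
        $pattern,
        fn($m) => '<a href="' . htmlspecialchars($keywords[$m[0]]) . '">' . $m[0] . '</a>',
        $text
    );
    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}

echo linkKeywords('研究生在研究', ['研究' => '/item/1', '研究生' => '/item/2']);
// prints: <a href="/item/2">研究生</a>在<a href="/item/1">研究</a>
```

The regex engine moves through the text once and, at each position, takes the first alternative that matches, which is why the ordering by length decides the conflict.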

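I'd like to know too. My teacher is pushing me to build this right now and I have no idea how...

For a small number of keywords, PHP has the strtr function.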
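
A minimal sketch of the `strtr` suggestion, assuming a small keyword-to-link map (the URLs here are invented for illustration). When given an array, `strtr()` tries the longest keys first and does not rescan text it has already replaced, so inserted links are never matched again.

```php
<?php
// Illustrative keyword => replacement map; the URLs are made up.
$links = [
    '秦始皇' => '<a href="/item/qinshihuang">秦始皇</a>',
    '洛阳'   => '<a href="/item/luoyang">洛阳</a>',
];

// One call replaces all keywords; replaced output is not searched again.
$html = strtr('秦始皇东巡洛阳', $links);
echo $html;
// prints: <a href="/item/qinshihuang">秦始皇</a>东巡<a href="/item/luoyang">洛阳</a>
```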

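Quoting: "dream1206, if one category has ten thousand keywords, an article gets ten thousand replacements. Do you think that is reasonable?"
Of course that is not reasonable!
But why not do it the other way around?
Copy the article out once and, for each word in it, check whether that word is in the keyword set. Wouldn't that be much faster?

I remember posting trie-based keyword matching code before.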
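
The "check the article against the keyword set" idea can be sketched without any word segmentation: at each position, try the longest candidate substring first and look it up in a hash set. This is a rough illustration under stated assumptions, not the poster's code; `markKeywords` is a hypothetical name, and the text is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: scan the text once, looking up candidate
// substrings in a hash set built from the keyword list.
function markKeywords(string $text, array $keywords): string
{
    // Split a UTF-8 string into an array of characters.
    $split = fn(string $s) => preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);

    $set = array_flip($keywords);   // keyword => index, used as a hash set
    $maxLen = 0;                    // longest keyword, in characters
    foreach ($keywords as $w) {
        $maxLen = max($maxLen, count($split($w)));
    }

    $chars = $split($text);
    $n = count($chars);
    $out = '';
    $i = 0;
    while ($i < $n) {
        $matched = false;
        // Try the longest candidate first so longer keywords win.
        for ($len = min($maxLen, $n - $i); $len >= 1; $len--) {
            $cand = implode('', array_slice($chars, $i, $len));
            if (isset($set[$cand])) {
                $out .= "<b>$cand</b>";
                $i += $len;
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            $out .= $chars[$i++];   // no keyword starts here: copy one character
        }
    }
    return $out;
}

echo markKeywords('秦始皇东巡洛阳', ['秦始皇', '洛阳']);
// prints: <b>秦始皇</b>东巡<b>洛阳</b>
```

The cost is roughly the text length times the longest keyword length, independent of how many keywords there are. A trie avoids rebuilding candidate substrings at every position.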

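Quoting anydy2008's reply on floor 3: "dream1206, if one category has ten thousand keywords, an article gets ten thousand replacements. Do you think that is reasonable?"
Of course that is not reasonable!
But why not do it the other way around?
Copy the article out once and, for each word in it, check whether that word is in the keyword set. Wouldn't that be much faster?

I remember posting trie-based keyword matching code before.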

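Moderator, but how am I supposed to know which parts of the article are words?
For example:

Article: 秦始皇东巡洛阳 ("Qin Shi Huang tours east to Luoyang")

Keyword set: 秦始皇, 洛阳

The program cannot tell that the 秦始皇 in the article should be matched against the keywords, because it does not know that 秦始皇 is a word.

That is simply a problem inherent to Chinese. Take the classic World of Warcraft item 黑色魔纹胸甲: segment it wrongly and you get 黑/色魔/纹胸/甲.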

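All right, I'll post it again.

```php
include 'TTrie.php';

class wordkey extends TTrie {
  // Action 'b': wrap the most recently matched keyword in <b> tags.
  function b() {
    $t = array_pop($this->buffer);
    $this->buffer[] = "<b>$t</b>";
  }
}

$p = new wordkey;
$p->set('秦始皇', 'b');
$p->set('洛阳', 'b');
$t = $p->match('秦始皇东巡洛阳');
echo join('', $t);
```

Output (the two keywords render in bold):
<b>秦始皇</b>东巡<b>洛阳</b>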

TTrie.php

```php
class TTrie {
  protected $buffer = array();
  protected $dict = array( array() );
  protected $doc = '';          // text being scanned (declared to avoid a dynamic property)
  protected $input = 0;         // current offset in the string
  protected $backtracking = 0;  // offset to backtrack to
  public $debug = 0;
  public $savematch = 1;

  function set($word, $action='') {
    if(is_array($word)) {
      foreach($word as $k=>$v) $this->set($k, $v);
      return;
    }
    $p = count($this->dict);
    $cur = 0; // current node id
    foreach(str_split($word) as $c) {
      if (isset($this->dict[$cur][$c])) { // already present: descend
        $cur = $this->dict[$cur][$c];
        continue;
      }
      $this->dict[$p] = array(); // create a new node
      $this->dict[$cur][$c] = $p; // record the child id in the parent
      $cur = $p; // the new node becomes the current node
      $p++;
    }
    $this->dict[$cur]['acc'] = $action; // end of a word: mark the accepting node
  }

  function getto($ch) {
    $i =& $this->input; // current offset in the string
    $p =& $this->backtracking; // offset to backtrack to
    $len = strlen($this->doc);
    $t = '';
    $this->input++;
    // String offsets use [] here; the {} offset syntax was removed in PHP 8.
    do {
      if($this->input >= $len) break;
      $t .= $this->doc[$this->input];
    } while($this->doc[$this->input++] != $ch);
    return trim($t);
  }

  function match($s) {
    $this->doc =& $s;
    $this->buffer = array();
    $cur = 0; // current node, starting at the root
    $i =& $this->input; // current offset in the string
    $p =& $this->backtracking; // offset to backtrack to
    $i = $p = 0;
    $s .= "\0"; // append a terminator
    $len = strlen($s);
    $buf = '';
    while($i < $len) {
      $c = $s[$i];
      if(isset($this->dict[$cur][$c])) { // this byte continues a keyword
        $cur = $this->dict[$cur][$c]; // move to the child node
        if($i + 1 < $len && isset($this->dict[$cur][$s[$i+1]])) { // next byte also matches: prefer the longer match
          $i++;
          continue;
        }
        if(isset($this->dict[$cur]['acc'])) { // accepting node: a keyword matched
          if($buf != '') {
            $this->buffer[] = $buf;
            $buf = '';
          }
          if($this->savematch) $this->buffer[] = substr($s, $p, $i - $p + 1); // save the matched word
          $ar = explode(',', $this->dict[$cur]['acc']);
          call_user_func_array( array($this, array_shift($ar)), $ar );
          $p = $i + 1; // next backtrack position
          $cur = 0; // back to the root
        }
      } else { // no match
        $buf .= $s[$p]; // keep the unmatched byte
        $cur = 0; // back to the root
        $i = $p; // rewind to the backtrack position
        $p = $i + 1; // next backtrack position
      }
      $i++; // next byte
    }
    if(trim($buf, "\0") !== '') $this->buffer[] = trim($buf, "\0");
    return $this->buffer;
  }

  // Fallback for actions that have no method; prints the scan state when $debug is set.
  function __call($method, $param) {
    if($this->debug) printf("offset: %d backtrack: %d\n", $this->input, $this->backtracking);
  }
}
```
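
For readers who want the core idea in a smaller form, here is a compact byte-level trie that emits links and remembers the last complete keyword seen, so a failed attempt at a longer keyword still falls back to a shorter one (研究 inside 研究生 when only 研究 and 研究生院 are keywords). It is a sketch under assumptions, not the TTrie API: the class name `KeywordTrie`, its methods, and the URLs are all hypothetical, and input is assumed to be valid UTF-8 plain text.

```php
<?php
// Hypothetical class, not part of TTrie: byte-level trie with longest-match fallback.
class KeywordTrie
{
    private $nodes = [[]];  // node id => [byte => child node id]
    private $urls = [];     // accepting node id => URL

    public function add(string $word, string $url): void
    {
        if ($word === '') {
            return;         // ignore empty keywords
        }
        $cur = 0;
        foreach (str_split($word) as $byte) {
            if (!isset($this->nodes[$cur][$byte])) {
                $this->nodes[] = [];
                $this->nodes[$cur][$byte] = count($this->nodes) - 1;
            }
            $cur = $this->nodes[$cur][$byte];
        }
        $this->urls[$cur] = $url;
    }

    public function link(string $text): string
    {
        $out = '';
        $len = strlen($text);
        $i = 0;
        while ($i < $len) {
            $cur = 0;
            $bestEnd = -1;  // last byte of the longest keyword starting at $i
            $bestUrl = '';
            for ($j = $i; $j < $len && isset($this->nodes[$cur][$text[$j]]); $j++) {
                $cur = $this->nodes[$cur][$text[$j]];
                if (isset($this->urls[$cur])) {
                    $bestEnd = $j;
                    $bestUrl = $this->urls[$cur];
                }
            }
            if ($bestEnd >= 0) {
                $word = substr($text, $i, $bestEnd - $i + 1);
                $out .= '<a href="' . $bestUrl . '">' . $word . '</a>';
                $i = $bestEnd + 1;
            } else {
                $out .= $text[$i++];  // no keyword starts here: copy one byte
            }
        }
        return $out;
    }
}

$trie = new KeywordTrie();
$trie->add('研究', '/item/a');
$trie->add('研究生院', '/item/b');
echo $trie->link('研究生');
// prints: <a href="/item/a">研究</a>生
```

Matching on bytes works for UTF-8 because a valid keyword can only match at a character boundary of valid text, so multibyte characters are never split by a link.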

The legendary PHP text highlighter. A very nice class...

Mark. I'm here to learn.
