
Two methods of converting Chinese characters to Pinyin in PHP, plus a PHP method of extracting Chinese characters

Jul 13, 2016, 10:29 AM

Method 1: Conversion by GB2312 code value. This approach cannot handle polyphonic characters (characters with more than one reading).

The GB2312 standard contains a total of 6763 Chinese characters. Characters outside this set cannot be converted, for example "镕" in the name of former Chinese Premier Zhu Rongji.

GB2312 divides its characters into zones, and each zone holds 94 characters or symbols. A character's zone number combined with its position inside the zone is called its zone-position code (区位码).
Zones 01-09: special symbols.
Zones 16-55: level-1 Chinese characters, sorted by pinyin (3755 characters).
Zones 56-87: level-2 Chinese characters, sorted by radical and stroke count (3008 characters).
Zones 10-15 and 88-94: unassigned.
Zones 16-87 span 72 × 94 = 6768 code points, of which the 5 code points D7FA-D7FE are vacant, leaving 6763 Chinese characters: 3755 level-1 and 3008 level-2. Because only the level-1 characters are ordered by pinyin, an algorithm of this kind can convert just those 3755 characters.
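To make the zone arithmetic concrete, here is a minimal sketch (not part of the original tutorial; the function names are hypothetical) that turns the two bytes of a GB2312 code into a zone/position pair and into the signed value `byte1 * 256 + byte2 - 65536` that Method 1 compares against. It assumes the standard 0xA0 offset GB2312 adds to zone and position numbers, and it brute-force counts the level-1 code points to check the 3755 figure.

```php
<?php
// Sketch: GB2312 zone/position arithmetic on raw byte values (no iconv needed).

// GB2312 stores zone and position each offset by 0xA0 (160).
function gb2312_zone_position(int $hi, int $lo): array {
    return [$hi - 0xA0, $lo - 0xA0];
}

// Same arithmetic as Method 1: $ascii * 256 + $ascii2 - 65536
function gb2312_signed_code(int $hi, int $lo): int {
    return $hi * 256 + $lo - 65536;
}

// Level-1 characters occupy zones 16-55, except the vacant D7FA-D7FE.
function gb2312_is_level1(int $hi, int $lo): bool {
    [$zone, $pos] = gb2312_zone_position($hi, $lo);
    if ($zone < 16 || $zone > 55 || $pos < 1 || $pos > 94) {
        return false;
    }
    return !($zone === 55 && $pos > 89); // zone 55, positions 90-94 are vacant
}

// Count level-1 code points over zones 16-55, positions 1-94.
$count = 0;
for ($hi = 0xA0 + 16; $hi <= 0xA0 + 55; $hi++) {
    for ($lo = 0xA1; $lo <= 0xFE; $lo++) {
        if (gb2312_is_level1($hi, $lo)) {
            $count++;
        }
    }
}

echo implode(', ', gb2312_zone_position(0xB0, 0xA1)), "\n"; // 16, 1 (B0A1 is 啊)
echo gb2312_signed_code(0xB0, 0xA1), "\n";                  // -20319
echo gb2312_signed_code(0xD7, 0xF9), "\n";                  // -10247
echo $count, "\n";                                          // 3755
```

The two signed values are exactly the lower and upper limits Method 1 uses for the level-1 block: B0A1 is the first level-1 code point and D7F9 the last.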


Advantages: No large text library is needed, so the file stays small; no regular expressions are used, so performance is relatively high. Initial-letter conversion is supported.
Disadvantages: Characters outside GB2312 (and the level-2 characters, which are not sorted by pinyin) cannot be converted, and polyphonic characters cannot be disambiguated.
(If your pinyin conversion requirements are modest, this method is recommended.)
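Initial-letter conversion works because the level-1 block is sorted by pinyin, so every initial letter covers one contiguous range of signed codes. The sketch below expresses those ranges as a table instead of a chain of `if` statements; the range starts are the same boundaries the class uses, while the constant and function names are hypothetical. Each range ends just before the next one starts, the last ends at -10247 (D7F9), and I, U and V get no range because no pinyin syllable begins with them.

```php
<?php
// Sketch: table-driven initial-letter lookup for a signed GB2312 code.
// Each entry is [first code of the range, initial letter], in ascending order.
const GB2312_INITIALS = [
    [-20319, 'a'], [-20283, 'b'], [-19775, 'c'], [-19218, 'd'], [-18710, 'e'],
    [-18526, 'f'], [-18239, 'g'], [-17922, 'h'], [-17417, 'j'], [-16474, 'k'],
    [-16212, 'l'], [-15640, 'm'], [-15165, 'n'], [-14922, 'o'], [-14914, 'p'],
    [-14630, 'q'], [-14149, 'r'], [-14090, 's'], [-13318, 't'], [-12838, 'w'],
    [-12556, 'x'], [-11847, 'y'], [-11055, 'z'],
];

function gb2312_initial(int $code): ?string {
    if ($code < -20319 || $code > -10247) {
        return null; // not a level-1 character
    }
    $letter = null;
    foreach (GB2312_INITIALS as [$start, $initial]) {
        if ($code < $start) {
            break; // ranges are ascending, so no later range can match
        }
        $letter = $initial;
    }
    return $letter;
}

echo gb2312_initial(-20319), "\n"; // a (B0A1)
echo gb2312_initial(-10247), "\n"; // z (D7F9)
```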

```php
<?php
// Converts by GB2312 code value, so polyphonic characters cannot be handled.
// GB2312 contains 6763 Chinese characters; characters outside it (for example
// "镕" in former Premier Zhu Rongji's name) cannot be converted.
// PHP 8 removed the $s{$i} string-offset syntax, so $s[$i] is used throughout.
class pinyin{
    public static function utf8_to($s, $isfirst = false) {
        return self::to(self::utf8_to_gb2312($s), $isfirst);
    }

    public static function utf8_to_gb2312($s) {
        return iconv('UTF-8', 'GB2312//IGNORE', $s);
    }

    // The string must be GB2312-encoded
    public static function to($s, $isfirst = false) {
        $res = '';
        $len = strlen($s);
        $pinyin_arr = self::get_pinyin_array();
        for($i=0; $i<$len; $i++) {
            $ascii = ord($s[$i]);
            if($ascii > 0x80 && $i + 1 < $len) {
                // Double-byte GB2312 character: combine both bytes into one signed code
                $ascii2 = ord($s[++$i]);
                $ascii = $ascii * 256 + $ascii2 - 65536;
            }

            if($ascii < 255 && $ascii > 0) {
                if(($ascii >= 48 && $ascii <= 57) || ($ascii >= 97 && $ascii <= 122)) {
                    $res .= $s[$i]; // 0-9 a-z
                }elseif($ascii >= 65 && $ascii <= 90) {
                    $res .= strtolower($s[$i]); // A-Z
                }else{
                    $res .= '_';
                }
            }elseif($ascii < -20319 || $ascii > -10247) {
                // Outside the level-1 block (B0A1-D7F9): no pinyin available
                $res .= '_';
            }else{
                // $pinyin_arr is sorted by code, descending: the first boundary <= $ascii wins
                foreach($pinyin_arr as $py=>$asc) {
                    if($asc <= $ascii) {
                        $res .= $isfirst ? $py[0] : $py;
                        break;
                    }
                }
            }
        }
        return $res;
    }

    public static function to_first($s) {
        if($s === '') {
            return false;
        }
        $ascii = ord($s[0]);
        if($ascii >= 0xE0) {
            // Three-byte UTF-8 character: convert it to GB2312 first
            $s = self::utf8_to_gb2312(substr($s, 0, 3));
        }elseif($ascii < 0x80) {
            if($ascii >= 65 && $ascii <= 90) {
                return strtolower($s[0]);
            }elseif($ascii >= 97 && $ascii <= 122) {
                return $s[0];
            }else{
                return false;
            }
        }

        if(strlen($s) < 2) {
            return false;
        }

        $asc = ord($s[0]) * 256 + ord($s[1]) - 65536;

        // Level-1 characters are sorted by pinyin, so each initial is one contiguous range
        if($asc>=-20319 && $asc<=-20284) return 'a';
        if($asc>=-20283 && $asc<=-19776) return 'b';
        if($asc>=-19775 && $asc<=-19219) return 'c';
        if($asc>=-19218 && $asc<=-18711) return 'd';
        if($asc>=-18710 && $asc<=-18527) return 'e';
        if($asc>=-18526 && $asc<=-18240) return 'f';
        if($asc>=-18239 && $asc<=-17923) return 'g';
        if($asc>=-17922 && $asc<=-17418) return 'h';
        if($asc>=-17417 && $asc<=-16475) return 'j';
        if($asc>=-16474 && $asc<=-16213) return 'k';
        if($asc>=-16212 && $asc<=-15641) return 'l';
        if($asc>=-15640 && $asc<=-15166) return 'm';
        if($asc>=-15165 && $asc<=-14923) return 'n';
        if($asc>=-14922 && $asc<=-14915) return 'o';
        if($asc>=-14914 && $asc<=-14631) return 'p';
        if($asc>=-14630 && $asc<=-14150) return 'q';
        if($asc>=-14149 && $asc<=-14091) return 'r';
        if($asc>=-14090 && $asc<=-13319) return 's';
        if($asc>=-13318 && $asc<=-12839) return 't';
        if($asc>=-12838 && $asc<=-12557) return 'w';
        if($asc>=-12556 && $asc<=-11848) return 'x';
        if($asc>=-11847 && $asc<=-11056) return 'y';
        if($asc>=-11055 && $asc<=-10247) return 'z';
        return false;
    }

    public static function get_pinyin_array() {
        static $py_arr;
        if(isset($py_arr)) return $py_arr;

        // 396 syllables and the signed GB2312 code at which each one starts
        $k = 'a|ai|an|ang|ao|ba|bai|ban|bang|bao|bei|ben|beng|bi|bian|biao|bie|bin|bing|bo|bu|ca|cai|can|cang|cao|ce|ceng|cha|chai|chan|chang|chao|che|chen|cheng|chi|chong|chou|chu|chuai|chuan|chuang|chui|chun|chuo|ci|cong|cou|cu|cuan|cui|cun|cuo|da|dai|dan|dang|dao|de|deng|di|dian|diao|die|ding|diu|dong|dou|du|duan|dui|dun|duo|e|en|er|fa|fan|fang|fei|fen|feng|fo|fou|fu|ga|gai|gan|gang|gao|ge|gei|gen|geng|gong|gou|gu|gua|guai|guan|guang|gui|gun|guo|ha|hai|han|hang|hao|he|hei|hen|heng|hong|hou|hu|hua|huai|huan|huang|hui|hun|huo|ji|jia|jian|jiang|jiao|jie|jin|jing|jiong|jiu|ju|juan|jue|jun|ka|kai|kan|kang|kao|ke|ken|keng|kong|kou|ku|kua|kuai|kuan|kuang|kui|kun|kuo|la|lai|lan|lang|lao|le|lei|leng|li|lia|lian|liang|liao|lie|lin|ling|liu|long|lou|lu|lv|luan|lue|lun|luo|ma|mai|man|mang|mao|me|mei|men|meng|mi|mian|miao|mie|min|ming|miu|mo|mou|mu|na|nai|nan|nang|nao|ne|nei|nen|neng|ni|nian|niang|niao|nie|nin|ning|niu|nong|nu|nv|nuan|nue|nuo|o|ou|pa|pai|pan|pang|pao|pei|pen|peng|pi|pian|piao|pie|pin|ping|po|pu|qi|qia|qian|qiang|qiao|qie|qin|qing|qiong|qiu|qu|quan|que|qun|ran|rang|rao|re|ren|reng|ri|rong|rou|ru|ruan|rui|run|ruo|sa|sai|san|sang|sao|se|sen|seng|sha|shai|shan|shang|shao|she|shen|sheng|shi|shou|shu|shua|shuai|shuan|shuang|shui|shun|shuo|si|song|sou|su|suan|sui|sun|suo|ta|tai|tan|tang|tao|te|teng|ti|tian|tiao|tie|ting|tong|tou|tu|tuan|tui|tun|tuo|wa|wai|wan|wang|wei|wen|weng|wo|wu|xi|xia|xian|xiang|xiao|xie|xin|xing|xiong|xiu|xu|xuan|xue|xun|ya|yan|yang|yao|ye|yi|yin|ying|yo|yong|you|yu|yuan|yue|yun|za|zai|zan|zang|zao|ze|zei|zen|zeng|zha|zhai|zhan|zhang|zhao|zhe|zhen|zheng|zhi|zhong|zhou|zhu|zhua|zhuai|zhuan|zhuang|zhui|zhun|zhuo|zi|zong|zou|zu|zuan|zui|zun|zuo';
        $v = '-20319|-20317|-20304|-20295|-20292|-20283|-20265|-20257|-20242|-20230|-20051|-20036|-20032|-20026|-20002|-19990|-19986|-19982|-19976|-19805|-19784|-19775|-19774|-19763|-19756|-19751|-19746|-19741|-19739|-19728|-19725|-19715|-19540|-19531|-19525|-19515|-19500|-19484|-19479|-19467|-19289|-19288|-19281|-19275|-19270|-19263|-19261|-19249|-19243|-19242|-19238|-19235|-19227|-19224|-19218|-19212|-19038|-19023|-19018|-19006|-19003|-18996|-18977|-18961|-18952|-18783|-18774|-18773|-18763|-18756|-18741|-18735|-18731|-18722|-18710|-18697|-18696|-18526|-18518|-18501|-18490|-18478|-18463|-18448|-18447|-18446|-18239|-18237|-18231|-18220|-18211|-18201|-18184|-18183|-18181|-18012|-17997|-17988|-17970|-17964|-17961|-17950|-17947|-17931|-17928|-17922|-17759|-17752|-17733|-17730|-17721|-17703|-17701|-17697|-17692|-17683|-17676|-17496|-17487|-17482|-17468|-17454|-17433|-17427|-17417|-17202|-17185|-16983|-16970|-16942|-16915|-16733|-16708|-16706|-16689|-16664|-16657|-16647|-16474|-16470|-16465|-16459|-16452|-16448|-16433|-16429|-16427|-16423|-16419|-16412|-16407|-16403|-16401|-16393|-16220|-16216|-16212|-16205|-16202|-16187|-16180|-16171|-16169|-16158|-16155|-15959|-15958|-15944|-15933|-15920|-15915|-15903|-15889|-15878|-15707|-15701|-15681|-15667|-15661|-15659|-15652|-15640|-15631|-15625|-15454|-15448|-15436|-15435|-15419|-15416|-15408|-15394|-15385|-15377|-15375|-15369|-15363|-15362|-15183|-15180|-15165|-15158|-15153|-15150|-15149|-15144|-15143|-15141|-15140|-15139|-15128|-15121|-15119|-15117|-15110|-15109|-14941|-14937|-14933|-14930|-14929|-14928|-14926|-14922|-14921|-14914|-14908|-14902|-14894|-14889|-14882|-14873|-14871|-14857|-14678|-14674|-14670|-14668|-14663|-14654|-14645|-14630|-14594|-14429|-14407|-14399|-14384|-14379|-14368|-14355|-14353|-14345|-14170|-14159|-14151|-14149|-14145|-14140|-14137|-14135|-14125|-14123|-14122|-14112|-14109|-14099|-14097|-14094|-14092|-14090|-14087|-14083|-13917|-13914|-13910|-13907|-13906|-13905|-13896|-13894|-13878|-13870|-13859|-13847|-13831|-13658|-13611|-13601|-13406|-13404|-13400|-13398|-13395|-13391|-13387|-13383|-13367|-13359|-13356|-13343|-13340|-13329|-13326|-13318|-13147|-13138|-13120|-13107|-13096|-13095|-13091|-13076|-13068|-13063|-13060|-12888|-12875|-12871|-12860|-12858|-12852|-12849|-12838|-12831|-12829|-12812|-12802|-12607|-12597|-12594|-12585|-12556|-12359|-12346|-12320|-12300|-12120|-12099|-12089|-12074|-12067|-12058|-12039|-11867|-11861|-11847|-11831|-11798|-11781|-11604|-11589|-11536|-11358|-11340|-11339|-11324|-11303|-11097|-11077|-11067|-11055|-11052|-11045|-11041|-11038|-11024|-11020|-11019|-11018|-11014|-10838|-10832|-10815|-10800|-10790|-10780|-10764|-10587|-10544|-10533|-10519|-10331|-10329|-10328|-10322|-10315|-10309|-10307|-10296|-10281|-10274|-10270|-10262|-10260|-10256|-10254';
        $key = explode('|', $k);
        $val = explode('|', $v);
        $py_arr = array_combine($key, $val);
        arsort($py_arr); // highest code first, as to() expects

        return $py_arr;
    }
}

var_dump(pinyin::utf8_to('PHP Chinese characters to pinyin'));
var_dump(pinyin::utf8_to('GB2312 standard contains a total of 6763 Chinese characters. Chinese characters that are not within the range cannot be converted, such as the word "镕" by former Chinese Prime Minister Zhu Rongji.'));
var_dump(pinyin::utf8_to('`1234567890-=QWERTYUIOP[]ASDFGHJKL;ZXCVBNM,./abcdefghijklmnopqrstuvwxyz'));
var_dump(pinyin::utf8_to('PHP Chinese character to pinyin type', 1));
var_dump(pinyin::utf8_to('The GB2312 standard contains a total of 6763 Chinese characters. Chinese characters that are not within the range cannot be converted, such as: the word "镕" by former Chinese Prime Minister Zhu Rongji.', 1));
var_dump(pinyin::utf8_to('`1234567890-=QWERTYUIOP[]ASDFGHJKL;ZXCVBNM,./abcdefghijklmnopqrstuvwxyz', 1));
var_dump(pinyin::to_first('PHP Chinese characters to pinyin'));
var_dump(pinyin::to_first('The GB2312 standard contains a total of 6763 Chinese characters. Chinese characters that are not within the range cannot be converted, such as: the word "镕" for former Chinese Prime Minister Zhu Rongji.'));
var_dump(pinyin::to_first('▂`1234567890-=QWERTYUIOP[]ASDFGHJKL;ZXCVBNM,./abcdefghijklmnopqrstuvwxyz'));
```
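`to_first()` handles a leading non-ASCII character by taking its first three bytes, which matches how the common CJK ideographs (U+4E00 to U+9FFF) are encoded in UTF-8: a lead byte in 0xE0-0xEF followed by two continuation bytes. If the input may start with characters of other lengths, the byte length can be read from the lead byte instead. The helper names below are hypothetical and not part of the tutorial; this is a minimal sketch that only inspects the lead byte and does not validate the continuation bytes.

```php
<?php
// Sketch: byte length of a UTF-8 character, derived from its lead byte.
function utf8_char_length(int $lead): int {
    if ($lead < 0x80) {
        return 1; // ASCII
    }
    if ($lead >= 0xC2 && $lead <= 0xDF) {
        return 2;
    }
    if ($lead >= 0xE0 && $lead <= 0xEF) {
        return 3; // includes the CJK block U+4E00-U+9FFF
    }
    if ($lead >= 0xF0 && $lead <= 0xF4) {
        return 4;
    }
    return 0; // continuation byte or invalid lead byte
}

// Cut off the first UTF-8 character of $s ('' if it cannot be determined).
function utf8_first_char(string $s): string {
    if ($s === '') {
        return '';
    }
    $len = utf8_char_length(ord($s[0]));
    return $len === 0 ? '' : substr($s, 0, $len);
}

$s = "\xE5\x95\x8A" . 'PHP'; // "啊" (U+554A) followed by ASCII
echo bin2hex(utf8_first_char($s)), "\n"; // e5958a
```

The character returned by `utf8_first_char()` can then be passed to `pinyin::utf8_to_gb2312()` in place of the fixed three-byte slice.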

Method 2: Array lookup based on (pinyin, code) pairs

```php
class pinyin{
    private $d=array(
       array("a",-20319),
       array("ai",-20317),
       array("an",-20304),
       array("ang",-20295),
       array("ao",-20292),
       array("ba",-20283),
       array("bai",-20265),
       array("ban",-20257),
       array("bang",-20242),
       array("bao",-20230),
       array("bei",-20051),
       array("ben",-20036),
       array("beng",-20032),
       array("bi",-20026),
       array("bian",-20002),
       array("biao",-19990),
       array("bie",-19986),
       array("bin",-19982),
       array("bing",-19976),
       array("bo",-19805),
       array("bu",-19784),
```
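Method 2 stores (syllable, code) pairs in ascending code order. In a table sorted like that, a character's syllable is the last entry whose code is less than or equal to the character's signed GB2312 code, which a binary search finds in O(log n) steps instead of scanning the whole table. The sketch below is an illustration, not the tutorial's own lookup code: it assumes the code is computed as in Method 1 (`byte1 * 256 + byte2 - 65536`), the constant and function names are hypothetical, and it includes only the 21 entries shown above, so every level-1 code at or above -19784 maps to "bu" here.

```php
<?php
// Sketch: binary search over an ascending table of [syllable, code] pairs.
const PINYIN_PAIRS = [
    ['a', -20319], ['ai', -20317], ['an', -20304], ['ang', -20295], ['ao', -20292],
    ['ba', -20283], ['bai', -20265], ['ban', -20257], ['bang', -20242], ['bao', -20230],
    ['bei', -20051], ['ben', -20036], ['beng', -20032], ['bi', -20026], ['bian', -20002],
    ['biao', -19990], ['bie', -19986], ['bin', -19982], ['bing', -19976], ['bo', -19805],
    ['bu', -19784],
];

function pinyin_lookup(int $code, array $table = PINYIN_PAIRS): ?string {
    // Codes outside the level-1 block (B0A1-D7F9) have no pinyin.
    if ($code < $table[0][1] || $code > -10247) {
        return null;
    }
    $lo = 0;
    $hi = count($table) - 1;
    // Invariant: $table[$lo][1] <= $code and the answer lies in [$lo, $hi].
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi + 1, 2); // round up so $lo always advances
        if ($table[$mid][1] <= $code) {
            $lo = $mid;
        } else {
            $hi = $mid - 1;
        }
    }
    return $table[$lo][0];
}

echo pinyin_lookup(-20319), "\n"; // a
echo pinyin_lookup(-20284), "\n"; // ao
echo pinyin_lookup(-19790), "\n"; // bo
```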