PHP can also implement lexical analysis and custom languages!
A previous project had this requirement: business staff wrote custom formulas in Chinese, and we needed to evaluate them on the back end and return the results to the interface. So we wrote this lexical analyzer based on a finite state machine. It is relatively simple, and I hope it inspires others.
1. Analyzing the requirements
Take a formula written in Chinese as input and return its result, for example:
现有薪资=10000; 个税起点=3000; 当前年份=2021; 如果(当前年份=2022){ 个税起点=5000; } 返回 (现有薪资-个税起点) * 0.2;
2. Implementing the requirements
My first idea was to use string replacement to swap the Chinese keywords for PHP keywords and then call eval to execute the result. That does work, but it felt inelegant and could not be parsed dynamically. So I decided to write a simple lexical analyzer myself and then, combined with an AST, convert the tokens into PHP code for execution. Wouldn't that be fun? The current version does not yet use an abstract syntax tree to generate code; everything is done by string concatenation.
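To see why plain keyword replacement feels fragile, here is a minimal sketch of that first idea. The keyword map and the formula are made up for illustration and are not taken from the project. Because the replacement has no notion of token boundaries, a variable name that merely contains a keyword gets rewritten too.

```php
<?php
// Hypothetical keyword map, for illustration only.
$keywords = ["如果" => "if", "返回" => "return"];

// "返回值" is meant to be a variable name, but it starts with the keyword "返回".
$formula = "返回值=1; 返回 返回值;";

// strtr() replaces every occurrence, with no idea where one token ends.
$naive = strtr($formula, $keywords);

echo $naive, PHP_EOL; // return值=1; return return值;
```

A lexer avoids this by first reading the whole run of Chinese characters as one word and only then checking whether that word is a keyword.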
<?php

/**
 * Class Lexer
 * @package Sett\OaLang
 * Lexical analyzer
 */
class Lexer
{
    // Built-in keyword map (custom keyword => PHP keyword)
    public $keywordList = [];
    // Built-in operators
    public $operatorList = [
        "+", "-", "*", "/", "=", ">", "<", "!", "(", ")", "{", "}", ",", ";"
    ];
    // Source code
    private $input;
    // Current character
    private $currChar;
    // Position of the current character
    private $currCharPos = 0;
    // End-of-input marker
    private $eof = "eof";
    // Current encoding
    private $currEncode = "UTF-8";

    // Token types
    public const VAR = "variable";
    public const STR = "string";
    public const KW = "keyword";
    public const OPR = "operator";
    public const INT = "integer";
    public const NIL = "null";

    /**
     * Lexer constructor.
     * @param string $input
     */
    public function __construct(string $input)
    {
        $this->input = $input;
        $this->currChar = mb_substr($this->input, $this->currCharPos, 1);
    }

    /**
     * @param array $keywordList
     */
    public function setKeywordList($keywordList)
    {
        $this->keywordList = $keywordList;
    }

    /**
     * @return array
     * @throws Exception
     */
    public function parseInput()
    {
        if ($this->input == "") {
            throw new Exception("code can not be empty");
        }
        $tokens = [];
        do {
            $token = $this->nextToken();
            if ($token["type"] != "eof") {
                $tokens[] = $token;
            }
            // Emit a blank after each keyword so that "return" and the
            // following variable do not run together in the generated code
            if ($token["type"] == self::KW) {
                $tokens[] = $this->makeToken(self::NIL, " ");
            }
        } while ($token["type"] != "eof");
        return $tokens;
    }

    /**
     * @return array
     */
    public function nextToken()
    {
        $this->skipBlankChar();
        if ($this->currChar == "") {
            $this->currChar = $this->eof;
        }
        if ($this->isCnLetter()) {
            $word = $this->matchUntilNextCharIsNotCn();
            if ($this->isKeyword($word)) {
                // Step back one character because currToken() advances again
                $this->currCharPos -= 1;
                return $this->currToken(static::KW, $word);
            }
            // Anything that is not a keyword is treated as a variable
            return $this->makeToken(static::VAR, $word);
        }
        // Operator
        if ($this->isOperator()) {
            return $this->currToken(static::OPR, $this->currChar);
        }
        // Number (one digit per token)
        if ($this->isNumber()) {
            return $this->currToken(static::INT, $this->currChar);
        }
        // String
        if ($str = $this->isStr()) {
            return $this->currToken(static::STR, $str);
        }
        // Variable
        if ($this->isVar()) {
            $word = $this->matchVar();
            if ($this->isKeyword($word)) {
                return $this->currToken(static::KW, $word);
            }
            return $this->makeToken(static::VAR, $word);
        }
        if ($this->currChar == $this->eof) {
            return $this->currToken('eof', $this->currChar);
        }
        return $this->currToken(static::VAR, $this->currChar);
    }

    /**
     * @param string $input
     * @return string
     */
    private function matchVar(string $input = "")
    {
        $word = $input ?: '';
        while ($this->isVar()) {
            $word .= $this->currChar;
            $this->nextChar();
        }
        return $word;
    }

    /**
     * @return bool
     * Whether the current character can be part of a plain variable
     */
    private function isVar()
    {
        return $this->isCnLetter() || $this->isEnLetter();
    }

    /**
     * Skip whitespace characters (LF, CR, space)
     */
    private function skipBlankChar()
    {
        while (ord($this->currChar) == 10 || ord($this->currChar) == 13 || ord($this->currChar) == 32) {
            $this->nextChar();
        }
    }

    /**
     * @param string $type
     * @param $word
     * @return array
     * Build a token for the current position, then advance to the next character
     */
    private function currToken(string $type, $word)
    {
        $token = $this->makeToken($type, $word);
        $this->nextChar();
        return $token;
    }

    /**
     * @param string $type
     * @param string $char
     * @return array
     */
    private function makeToken(string $type, string $char)
    {
        return ["type" => $type, "char" => $char, "pos" => $this->currCharPos];
    }

    /**
     * @return bool
     * Whether the current character is a lowercase English letter
     */
    private function isEnLetter()
    {
        if ($this->currChar == "" || $this->currChar == $this->eof) {
            return false;
        }
        $ord = mb_ord($this->currChar, $this->currEncode);
        // Inclusive bounds: the original comparison excluded 'a' and 'z'
        if ($ord >= ord('a') && $ord <= ord('z')) {
            return true;
        }
        return false;
    }

    /**
     * @return false|int
     * Whether the current character is a Chinese character
     */
    private function isCnLetter()
    {
        return preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $this->currChar);
    }

    /**
     * @return bool
     * Whether the current character is a digit
     */
    private function isNumber()
    {
        return is_numeric($this->currChar);
    }

    /**
     * @return string
     * The string literal at the current position, or "" if there is none
     */
    private function isStr()
    {
        return $this->matchCompleteStr();
    }

    /**
     * @return string
     * Match a complete double-quoted string
     */
    private function matchCompleteStr()
    {
        $char = "";
        if ($this->currChar == "\"") {
            $this->nextChar();
            // Also stop at end of input so an unterminated string cannot loop forever
            while ($this->currChar != "\"" && $this->currChar != $this->eof) {
                $char .= $this->currChar;
                $this->nextChar();
            }
            return $char;
        }
        return $char;
    }

    /**
     * @return bool
     * Whether the current character is an operator
     */
    private function isOperator()
    {
        return in_array($this->currChar, $this->operatorList);
    }

    /**
     * @return string
     * Match consecutive Chinese characters
     */
    private function matchUntilNextCharIsNotCn()
    {
        $char = "";
        while ($this->isCnLetter()) {
            $char .= $this->currChar;
            $this->nextChar();
        }
        return $char;
    }

    /**
     * @return void
     * Advance to the next character
     */
    private function nextChar()
    {
        $this->currCharPos += 1;
        $this->currChar = mb_substr($this->input, $this->currCharPos, 1);
        if ($this->currChar == "") {
            $this->currChar = $this->eof;
        }
    }

    /**
     * @param string $input
     * @return bool
     * Whether the word is a keyword
     */
    private function isKeyword(string $input)
    {
        return ($this->keywordList[$input] ?? "") != "";
    }

    /**
     * @param array $tokens
     * @return string
     * Concatenate the tokens into PHP code
     */
    public function convert(array $tokens)
    {
        $code = "";
        foreach ($this->lexerIterator($tokens) as $generator) {
            switch ($generator["type"]) {
                case static::KW:
                    $code .= $this->keywordList[$generator["char"]];
                    break;
                case static::VAR:
                    $code .= sprintf('$%s', $generator["char"]);
                    break;
                case static::OPR:
                    $code .= $this->replace($generator["char"]);
                    break;
                case static::INT:
                    $code .= $generator["char"];
                    break;
                case static::STR:
                    $code .= sprintf("\"%s\"", $generator["char"]);
                    break;
                default:
                    $code .= $generator["char"];
            }
        }
        return $code;
    }

    /**
     * @param string $char
     * @return string
     * Map "+" to PHP's "." operator (this also affects numeric addition)
     */
    private function replace(string $char)
    {
        return str_replace("+", ".", $char);
    }

    /**
     * @param array $tokens
     * @return \Generator
     */
    private function lexerIterator(array $tokens)
    {
        foreach ($tokens as $token) {
            yield $token;
        }
    }
}
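The replace() method at the end of the class maps every + to PHP's . operator, so numeric addition is turned into concatenation as well. One possible way around this, sketched below, is to look at the neighbouring tokens and emit . only when one of them is a string literal. The convertPlus() helper is hypothetical and not part of the original class; it only assumes the token shape that makeToken() produces.

```php
<?php
// Hypothetical helper, not part of the original Lexer class.
// Returns "." only when a neighbouring token is a string literal, otherwise "+".
function convertPlus(array $tokens, int $index): string
{
    $prev = $tokens[$index - 1]["type"] ?? null;
    $next = $tokens[$index + 1]["type"] ?? null;
    return ($prev === "string" || $next === "string") ? "." : "+";
}

// Tokens for 1+2
$numeric = [
    ["type" => "integer", "char" => "1"],
    ["type" => "operator", "char" => "+"],
    ["type" => "integer", "char" => "2"],
];

// Tokens for "我"+"爱"
$textual = [
    ["type" => "string", "char" => "我"],
    ["type" => "operator", "char" => "+"],
    ["type" => "string", "char" => "爱"],
];

echo convertPlus($numeric, 1), PHP_EOL; // +
echo convertPlus($textual, 1), PHP_EOL; // .
```

This is only a heuristic: two variables that both hold strings would still be joined with +, so a real fix would probably need type information or a separate concatenation operator in the custom language.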
3. How to use
<?php

require __DIR__ . "/vendor/autoload.php";

// Define a piece of code
$code = <<<EOF
姓名="腕豪";
问候="你好啊";
地址=(1+2) * 3;
如果(地址 > 3){
    地址=1;
}否则{
    地址="艾欧尼亚";
}
说话 = ("我"+"爱")+"你";
返回 姓名+年龄;
EOF;

$lexer = new Lexer($code);

// Define your own keywords
$kwMap = [
    "如果" => "if",
    "否则" => "else",
    "返回" => "return",
    "否则如果" => "elseif"
];
$lexer->setKeywordList($kwMap);

// These are the generated tokens
$tokens = $lexer->parseInput();

// Convert the tokens to PHP. You could also try php-parse to build an AST
// and generate PHP from that; here the code is simply concatenated.
var_dump($lexer->convert($tokens));
The generated tokens:
[
  {"type": "variable", "char": "姓名", "pos": 2},
  {"type": "operator", "char": "=", "pos": 2},
  {"type": "string", "char": "腕豪", "pos": 7},
  {"type": "operator", "char": ";", "pos": 8},
  {"type": "variable", "char": "问候", "pos": 13},
  {"type": "operator", "char": "=", "pos": 13},
  {"type": "string", "char": "你好啊", "pos": 17},
  {"type": "operator", "char": ";", "pos": 18},
  {"type": "variable", "char": "地址", "pos": 23},
  {"type": "operator", "char": "=", "pos": 23},
  {"type": "operator", "char": "(", "pos": 24},
  {"type": "integer", "char": "1", "pos": 25},
  {"type": "operator", "char": "+", "pos": 26},
  {"type": "integer", "char": "2", "pos": 27},
  {"type": "operator", "char": ")", "pos": 28},
  {"type": "operator", "char": "*", "pos": 30},
  {"type": "integer", "char": "3", "pos": 32},
  {"type": "operator", "char": ";", "pos": 33},
  {"type": "keyword", "char": "如果", "pos": 37},
  {"type": "null", "char": " ", "pos": 38},
  {"type": "operator", "char": "(", "pos": 38},
  {"type": "variable", "char": "地址", "pos": 41},
  {"type": "operator", "char": ">", "pos": 42},
  {"type": "integer", "char": "3", "pos": 44},
  {"type": "operator", "char": ")", "pos": 45},
  {"type": "operator", "char": "{", "pos": 46},
  {"type": "variable", "char": "地址", "pos": 55},
  {"type": "operator", "char": "=", "pos": 55},
  {"type": "integer", "char": "1", "pos": 56},
  {"type": "operator", "char": ";", "pos": 57},
  {"type": "operator", "char": "}", "pos": 60},
  {"type": "keyword", "char": "否则", "pos": 62},
  {"type": "null", "char": " ", "pos": 63},
  {"type": "operator", "char": "{", "pos": 63},
  {"type": "variable", "char": "地址", "pos": 72},
  {"type": "operator", "char": "=", "pos": 72},
  {"type": "string", "char": "艾欧尼亚", "pos": 78},
  {"type": "operator", "char": ";", "pos": 79},
  {"type": "operator", "char": "}", "pos": 82},
  {"type": "variable", "char": "说话", "pos": 87},
  {"type": "operator", "char": "=", "pos": 88},
  {"type": "operator", "char": "(", "pos": 90},
  {"type": "string", "char": "我", "pos": 93},
  {"type": "operator", "char": "+", "pos": 94},
  {"type": "string", "char": "爱", "pos": 97},
  {"type": "operator", "char": ")", "pos": 98},
  {"type": "operator", "char": "+", "pos": 99},
  {"type": "string", "char": "你", "pos": 102},
  {"type": "operator", "char": ";", "pos": 103},
  {"type": "keyword", "char": "返回", "pos": 107},
  {"type": "null", "char": " ", "pos": 108},
  {"type": "variable", "char": "姓名", "pos": 111},
  {"type": "operator", "char": "+", "pos": 111},
  {"type": "variable", "char": "年龄", "pos": 114},
  {"type": "operator", "char": ";", "pos": 114}
]
Output:
$姓名="腕豪";$问候="你好啊";$地址=(1.2)*3;if ($地址>3){$地址=1;}else {$地址="艾欧尼亚";}$说话=("我"."爱")."你";return $姓名.$年龄;
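Can it be executed? Of course. There are still some small bugs that I don't want to fix. For example, every + is translated to PHP's . concatenation operator, which is why (1+2) comes out as (1.2) in the output above.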
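As a minimal sketch of the execution step, the generated string can be passed to eval, which returns whatever the evaluated code returns. The string below is hand-written in the same shape as the converter output, using the salary formula from section 1; it is an assumption for illustration, not actual output from the Lexer. Since eval runs arbitrary PHP, this also assumes the formulas come from trusted users.

```php
<?php
// Hand-written stand-in for converter output (assumption: not produced by Lexer).
$generated = '$现有薪资=10000;$个税起点=3000;return ($现有薪资-$个税起点)*0.2;';

// eval() takes code without the opening <?php tag and
// returns the value of the evaluated code's return statement.
$result = eval($generated);

// (10000 - 3000) * 0.2, i.e. about 1400
var_dump($result);
```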
4. Usage Scenarios
What, some people actually say it's useless? An OA (office automation) system will always find a use for it.