Re-Implementing the Range Operator in PHP

Feb 15, 2025, 09:36 AM

Recommended SitePoint article: an improved implementation of the PHP range operator

This article is reproduced on SitePoint with the author's permission. It was written by Thomas Punt and presents an improved implementation of the range operator for PHP. If you are interested in PHP internals and in adding features to your favorite programming language, now is a good time to learn!

This article assumes that readers can build PHP from source. If that is not the case, please first read the "Building PHP" chapter of the PHP Internals Book.

In the previous article (hint: make sure you have read it), I showed one way to implement a range operator in PHP. Initial implementations are rarely the best ones, though, so this article looks at how the previous implementation can be improved.

Thanks once again to Nikita Popov for proofreading this article!

Key Points

  • Thomas Punt reimplements the range operator in PHP, moving the computation logic out of the Zend virtual machine so that the range operator can be used in constant expression contexts.
  • The new implementation can be evaluated at compile time (for literal operands) or at runtime (for dynamic operands). This brings a small benefit to Opcache users and, more importantly, allows constant expressions to use the range operator.
  • The reimplementation touches the lexer, parser, compilation stage, and Zend virtual machine. The lexer implementation is unchanged, and the parser changes only in its semantic action. The compilation stage requires no changes to Zend/zend_compile.c, which already contains the logic needed to handle binary operations. The Zend virtual machine is updated to handle execution of the ZEND_RANGE opcode at runtime.
  • In the third part of this series, Punt plans to build on this implementation by explaining how to overload the operator. This will allow objects to be used as operands and add proper support for strings.

Disadvantages of previous implementations

The initial implementation put all of the range operator's logic in the Zend virtual machine, which forced the computation to happen purely at runtime, when the ZEND_RANGE opcode was executed. This meant not only that the computation could not be shifted to compile time for literal operands, but also that some features simply did not work.

In this implementation, we move the range operator logic out of the Zend virtual machine so that the computation can be performed either at compile time (for literal operands) or at runtime (for dynamic operands). This brings a small benefit to Opcache users but, more importantly, it allows the range operator to be used in constant expressions.

Example:

// as a constant definition
const AN_ARRAY = 1 |> 100;

// as an initial property definition
class A
{
    private $a = 1 |> 2;
}

// as a default value for an optional parameter:
function a($a = 1 |> 2)
{
    //
}

So, without further ado, let's reimplement the range operator.

Update Lexical Analyzer

The lexer implementation remains completely unchanged. The token is first registered in Zend/zend_language_scanner.l (around line 1200):

<ST_IN_SCRIPTING>"|>" {
    RETURN_TOKEN(T_RANGE);
}
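
To make the effect of that rule concrete, here is a minimal hand-written sketch of the matching a generated scanner is expected to perform: the two-character operator "|>" has to win over the single "|". The token names other than the range token, and the scan_token function itself, are hypothetical and exist only for this illustration; the real scanner is generated from the .l file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; only TOK_RANGE mirrors T_RANGE. */
enum token { TOK_END, TOK_PIPE, TOK_BOOL_OR, TOK_OR_EQUAL, TOK_RANGE, TOK_OTHER };

/* Return the token at the start of `src` and store its length in `*len`.
 * Two-character operators are tested first, so "|>" is never split
 * into "|" followed by ">". */
enum token scan_token(const char *src, size_t *len)
{
    assert(src != NULL && len != NULL);

    if (src[0] == '\0') { *len = 0; return TOK_END; }
    if (strncmp(src, "|>", 2) == 0) { *len = 2; return TOK_RANGE; }
    if (strncmp(src, "||", 2) == 0) { *len = 2; return TOK_BOOL_OR; }
    if (strncmp(src, "|=", 2) == 0) { *len = 2; return TOK_OR_EQUAL; }
    if (src[0] == '|') { *len = 1; return TOK_PIPE; }

    *len = 1;
    return TOK_OTHER;
}
```

Calling scan_token on the text "|> 100" yields TOK_RANGE with a length of 2, whereas "| 1" yields TOK_PIPE with a length of 1.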

It is then declared in Zend/zend_language_parser.y (around line 220):

%token T_RANGE           "|> (T_RANGE)"

The tokenizer extension must then be regenerated by going into the ext/tokenizer directory and running the tokenizer_data_gen.sh script.

Update parser

The parser implementation is largely the same as before. Once again, we declare the precedence and associativity of the operator by adding the T_RANGE token to the end of the following line:

%nonassoc T_IS_EQUAL T_IS_NOT_EQUAL T_IS_IDENTICAL T_IS_NOT_IDENTICAL T_SPACESHIP T_RANGE
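
Declaring the token as non-associative means that an unparenthesized chain such as 1 |> 2 |> 3 is rejected by the grammar rather than being grouped to the left or to the right. The toy recognizer below sketches that behaviour for integer literals only. The function names are hypothetical, and it is not how the generated parser works internally.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Skip spaces, one non-negative integer literal, and trailing spaces.
 * Returns a pointer past the literal, or NULL if no literal is present. */
static const char *skip_int(const char *s)
{
    while (*s == ' ') s++;
    if (!isdigit((unsigned char)*s)) return NULL;
    while (isdigit((unsigned char)*s)) s++;
    while (*s == ' ') s++;
    return s;
}

/* Accept "int" or "int |> int". A second "|>" is left unconsumed, so a chain
 * such as "1 |> 2 |> 3" is rejected, which is the effect of non-associativity. */
int parses_as_range_expr(const char *s)
{
    assert(s != NULL);

    s = skip_int(s);
    if (s == NULL) return 0;

    if (s[0] == '|' && s[1] == '>') {
        s = skip_int(s + 2);
        if (s == NULL) return 0;
    }
    return *s == '\0';
}
```

With this sketch, "1 |> 3" is accepted and "1 |> 2 |> 3" is not.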

Then we update the expr_without_variable production rule again, but this time the semantic action (the code inside the braces) is slightly different. Update it with the following code (I put it just below the T_SPACESHIP rule, around line 930):

    |   expr T_RANGE expr
            { $$ = zend_ast_create_binary_op(ZEND_RANGE, $1, $3); }

This time, we use the zend_ast_create_binary_op function (rather than the zend_ast_create function), which creates a ZEND_AST_BINARY_OP node for us. zend_ast_create_binary_op takes an opcode name, which is used to distinguish between binary operations during the compilation phase.

Since we are now reusing the ZEND_AST_BINARY_OP node type, there is no need to define a new ZEND_AST_RANGE node type in the Zend/zend_ast.h file, as we did before.
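
The idea of one shared node kind that carries the opcode as an attribute can be sketched with a heavily simplified stand-in for the AST. Everything here (struct ast, the enum values, the helper names) is hypothetical and only loosely modelled on the engine; the opcode numbers are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the engine's types. */
enum ast_kind { AST_LITERAL, AST_BINARY_OP };
enum opcode { OP_ADD = 1, OP_RANGE = 2 };   /* numbers are invented */

struct ast {
    enum ast_kind kind;
    uint32_t attr;            /* for AST_BINARY_OP: which opcode to emit */
    long value;               /* for AST_LITERAL */
    struct ast *child[2];
};

struct ast *ast_literal(long value)
{
    struct ast *node = calloc(1, sizeof *node);
    assert(node != NULL);
    node->kind = AST_LITERAL;
    node->value = value;
    return node;
}

/* Every binary operator shares one node kind; only `attr` differs. */
struct ast *ast_binary_op(uint32_t opcode, struct ast *left, struct ast *right)
{
    struct ast *node = calloc(1, sizeof *node);
    assert(node != NULL);
    node->kind = AST_BINARY_OP;
    node->attr = opcode;
    node->child[0] = left;
    node->child[1] = right;
    return node;
}

void ast_free(struct ast *node)
{
    if (node == NULL) return;
    ast_free(node->child[0]);
    ast_free(node->child[1]);
    free(node);
}
```

In this model, adding a new binary operator needs a new opcode value but no new node kind, which is the saving the paragraph above describes.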

Update compilation phase

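This time, there is no need to update the Zend/zend_compile.c file, as it already contains the logic needed to handle binary operations. We simply reuse that logic by making our operator a ZEND_AST_BINARY_OP node.

The following is a simplified version of the zend_compile_binary_op function:

void zend_compile_binary_op(znode *result, zend_ast *ast) /* {{{ */
{
    zend_ast *left_ast = ast->child[0];
    zend_ast *right_ast = ast->child[1];
    uint32_t opcode = ast->attr;

    znode left_node, right_node;
    zend_compile_expr(&left_node, left_ast);
    zend_compile_expr(&right_node, right_ast);

    if (left_node.op_type == IS_CONST && right_node.op_type == IS_CONST) {
        if (zend_try_ct_eval_binary_op(&result->u.constant, opcode,
                &left_node.u.constant, &right_node.u.constant)
        ) {
            result->op_type = IS_CONST;
            zval_ptr_dtor(&left_node.u.constant);
            zval_ptr_dtor(&right_node.u.constant);
            return;
        }
    }

    do {
        // redacted code
        zend_emit_op_tmp(result, opcode, &left_node, &right_node);
    } while (0);
}
/* }}} */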

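As we can see, it is very similar to the zend_compile_range function we created last time. The two important differences are how the opcode type is obtained and what happens when both operands are literals.

The opcode type is taken from the AST node this time (rather than being hardcoded, as it was last time), because the ZEND_AST_BINARY_OP node stores this value (as shown in the semantic action of the new production rule) to distinguish between binary operations. When both operands are literals, the zend_try_ct_eval_binary_op function is called. This function looks like this:

static inline zend_bool zend_try_ct_eval_binary_op(zval *result, uint32_t opcode, zval *op1, zval *op2) /* {{{ */
{
    binary_op_type fn = get_binary_op(opcode);

    /* don't evaluate division by zero at compile-time */
    if ((opcode == ZEND_DIV || opcode == ZEND_MOD) &&
        zval_get_long(op2) == 0) {
        return 0;
    } else if ((opcode == ZEND_SL || opcode == ZEND_SR) &&
        zval_get_long(op2) < 0) {
        return 0;
    }

    fn(result, op1, op2);
    return 1;
}
/* }}} */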

This function obtains a callback from the get_binary_op function in Zend/zend_opcode.c according to the opcode type. This means we next need to update that function to cater for the ZEND_RANGE opcode. Add the following case statement to the get_binary_op function (around line 750):

        case ZEND_RANGE:
            return (binary_op_type) range_function;
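
The interplay between the callback lookup and the compile-time evaluation can be sketched in a self-contained model that works on plain long values instead of zvals. All names below (try_fold, get_op, the OP_* values) are hypothetical, and the model does not handle integer overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes and callback type for this sketch. */
enum opcode { OP_ADD, OP_DIV };
typedef int (*binary_op_fn)(long *result, long op1, long op2);

static int add_fn(long *result, long op1, long op2) { *result = op1 + op2; return 0; }
static int div_fn(long *result, long op1, long op2) { *result = op1 / op2; return 0; }

/* Map an opcode to its callback; a new operator needs one more case here. */
binary_op_fn get_op(enum opcode opcode)
{
    switch (opcode) {
        case OP_ADD: return add_fn;
        case OP_DIV: return div_fn;
    }
    return NULL;
}

/* Try to evaluate at "compile time". Returns 1 and writes *result on success,
 * or 0 when the operation must be left for runtime. */
int try_fold(long *result, enum opcode opcode, long op1, long op2)
{
    binary_op_fn fn = get_op(opcode);
    assert(result != NULL);

    if (fn == NULL) return 0;
    /* do not evaluate division by zero at compile time */
    if (opcode == OP_DIV && op2 == 0) return 0;

    fn(result, op1, op2);
    return 1;
}
```

Here try_fold(&r, OP_ADD, 2, 3) stores 5 in r and returns 1, while try_fold(&r, OP_DIV, 7, 0) returns 0 and leaves r untouched, so the error would surface at runtime instead.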

Now we have to define the range_function function. This is done in the Zend/zend_operators.c file, alongside all of the other operators.

The prototype of range_function uses two new macros: ZEND_API and ZEND_FASTCALL. ZEND_API controls the visibility of the function, making it available to extensions that are compiled as shared objects. ZEND_FASTCALL ensures that a more efficient calling convention is used, where the first two arguments are passed in registers rather than on the stack (this mainly matters for 32-bit x86 builds, since the 64-bit x86 calling conventions already pass the leading arguments in registers).
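
As a rough illustration of what such macros can expand to, the sketch below defines two hypothetical macros, MY_API and MY_FASTCALL, using GCC-compatible attributes. They are not copied from the PHP source, and the exact conditions used by the engine may differ. In this sketch the calling-convention attribute is applied only on 32-bit x86.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macros for illustration only. */
#if defined(__GNUC__)
# define MY_API __attribute__((visibility("default")))   /* export the symbol */
#else
# define MY_API
#endif

#if defined(__GNUC__) && defined(__i386__)
# define MY_FASTCALL __attribute__((fastcall))   /* leading arguments in registers */
#else
# define MY_FASTCALL                             /* expands to nothing elsewhere */
#endif

/* Prototype, as it would appear in a header. */
MY_API int MY_FASTCALL sum_into(int *result, int a, int b);

/* Definition: writes a + b into *result and returns 0 on success. */
MY_API int MY_FASTCALL sum_into(int *result, int a, int b)
{
    assert(result != NULL);
    *result = a + b;
    return 0;
}
```

Callers use sum_into like any other function; the macros only affect symbol visibility and how the arguments are passed.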

The function body is very similar to what we had in the Zend/zend_vm_def.h file in the previous article. The VM-specific parts are gone: the HANDLE_EXCEPTION macro calls have been replaced with return FAILURE;, and the ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION macro calls have been removed completely (that check and operation need to stay in the VM, so the macro will be invoked from the VM code later).

Another notable difference is that we now apply ZVAL_DEREF to both operands to ensure that references are handled correctly. This was previously done inside the VM by the GET_OPn_ZVAL_PTR_DEREF pseudo-macro, but has now been moved into this function. This was not done because it is needed at compile time (for compile-time evaluation both operands must be literals, which cannot be references), but because it allows range_function to be called safely from elsewhere in the code base without worrying about reference handling. This is why most operator functions (except where performance is critical) perform reference handling themselves, rather than in their VM opcode definitions.
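
The shape described above (dereference both operands first, report errors through a return code, contain no VM-specific macros) can be sketched in a self-contained model. This is not the article's range_function: the value and array types, the size cap, and the name range_model are all hypothetical, only integer operands are handled, and treating a minimum greater than the maximum as a failure is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SUCCESS 0
#define FAILURE -1
#define RANGE_MODEL_MAX 1000000UL   /* hypothetical cap on the number of elements */

enum value_type { T_LONG, T_REFERENCE, T_OTHER };

struct value {
    enum value_type type;
    long lval;              /* used when type == T_LONG */
    struct value *ref;      /* used when type == T_REFERENCE */
};

struct long_array { long *items; size_t count; };

/* Unwrap one level of reference, similar in spirit to ZVAL_DEREF. */
static const struct value *deref(const struct value *v)
{
    return v->type == T_REFERENCE ? v->ref : v;
}

/* Fill `result` with op1..op2 inclusive. Returns SUCCESS or FAILURE;
 * on FAILURE the result is left empty. The caller frees result->items. */
int range_model(struct long_array *result, const struct value *op1, const struct value *op2)
{
    unsigned long span;
    size_t i;

    assert(result != NULL && op1 != NULL && op2 != NULL);
    result->items = NULL;
    result->count = 0;

    op1 = deref(op1);
    op2 = deref(op2);

    if (op1->type != T_LONG || op2->type != T_LONG) return FAILURE;
    if (op1->lval > op2->lval) return FAILURE;   /* assumed error case */

    span = (unsigned long)op2->lval - (unsigned long)op1->lval;
    if (span >= RANGE_MODEL_MAX) return FAILURE;

    result->items = malloc((span + 1) * sizeof *result->items);
    if (result->items == NULL) return FAILURE;

    result->count = (size_t)span + 1;
    for (i = 0; i < result->count; i++) {
        result->items[i] = op1->lval + (long)i;
    }
    return SUCCESS;
}
```

Because the dereferencing happens inside range_model, a caller can pass either a plain value or a reference to one and get the same result, which is the design choice the paragraph above argues for.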

Finally, we have to add the range_function prototype to the Zend/zend_operators.h file:

ZEND_API int ZEND_FASTCALL range_function(zval *result, zval *op1, zval *op2);

Update Zend virtual machine

Now we have to update the Zend virtual machine again to handle execution of the ZEND_RANGE opcode at runtime. The handler definition goes at the bottom of Zend/zend_vm_def.h.

(Again, the opcode number must be one greater than the current highest opcode number, which can be found at the bottom of the Zend/zend_vm_opcodes.h file.)

The definition is much shorter this time, because all of the work is handled in range_function. We just need to call this function, passing in the result operand of the current opline to hold the computed value. The exception check and the skip to the next opcode, both removed from range_function, are still handled in the VM by a call to ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION. Furthermore, as mentioned earlier, we avoid handling references in the VM by using the GET_OPn_ZVAL_PTR pseudo-macros (rather than GET_OPn_ZVAL_PTR_DEREF).

Now regenerate the VM by executing the Zend/zend_vm_gen.php file.

Finally, the pretty printer in the Zend/zend_ast.c file needs updating again. Update the precedence table comment (around line 520), then insert a case statement into the zend_ast_export_ex function to handle the ZEND_RANGE opcode (around line 1300).
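
To show why a pretty printer needs a precedence for every operator, here is a minimal sketch of precedence-driven parenthesization. The expression type, the function names, and the numeric priorities (170 for |>, 200 for +) are assumptions chosen for this illustration; only the relative order matters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression tree: op == NULL marks an integer literal. */
struct expr {
    const char *op;
    int literal;
    const struct expr *left, *right;
};

/* Append `s` at `pos`; the sketch asserts instead of growing the buffer. */
static size_t emit(char *buf, size_t size, size_t pos, const char *s)
{
    size_t n = strlen(s);
    assert(pos + n < size);
    memcpy(buf + pos, s, n + 1);
    return pos + n;
}

/* Print `e`, adding parentheses when its priority is below what the
 * surrounding context requires. Returns the new write position. */
size_t export_expr(char *buf, size_t size, size_t pos, const struct expr *e, int min_priority)
{
    int p, pl, pr;

    if (e->op == NULL) {
        char num[32];
        snprintf(num, sizeof num, "%d", e->literal);
        return emit(buf, size, pos, num);
    }

    if (strcmp(e->op, "|>") == 0) {
        p = 170; pl = 171; pr = 171;   /* non-associative: both sides must bind tighter */
    } else {
        p = 200; pl = 200; pr = 201;   /* everything else is treated like left-associative "+" */
    }

    if (p < min_priority) pos = emit(buf, size, pos, "(");
    pos = export_expr(buf, size, pos, e->left, pl);
    pos = emit(buf, size, pos, " ");
    pos = emit(buf, size, pos, e->op);
    pos = emit(buf, size, pos, " ");
    pos = export_expr(buf, size, pos, e->right, pr);
    if (p < min_priority) pos = emit(buf, size, pos, ")");
    return pos;
}
```

With these priorities, a range whose left operand is a sum prints without parentheses, while a sum whose left operand is a range has that range wrapped in parentheses.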

Conclusion

This article has shown an alternative way to implement the range operator, in which the computation logic is moved out of the VM. This has the advantage of allowing the range operator to be used in constant expression contexts.

The third part of this series will build on this implementation by explaining how to overload the operator. This will allow objects to be used as operands (such as objects from the GMP library or objects that implement the __toString method). It will also show how to add proper support for strings (unlike what PHP's current range function offers). But for now, I hope this has been a good demonstration of some of the deeper aspects of the Zend Engine (ZE) when implementing operators in PHP.
