


A brief discussion on PHP automated code audit technology
Original source: exploit
0x00
Since there is not much to update on the blog, I will summarize what I have been working on and treat it as a post, focusing on some of the techniques used in my project. Several PHP automated audit tools are available, including open-source ones such as RIPS and Pixy and commercial ones such as Fortify. RIPS is currently only available in its first version. Because it does not support analysis of object-oriented PHP, its results are not very satisfactory. Pixy is based on data flow analysis but only supports PHP 4. Fortify is a closed commercial product, so its internals cannot be studied. In China, research on PHP automated auditing is mostly done by companies. Most current tools rely on simple token-stream analysis, or even cruder regular-expression matching, and their results are mediocre.
0x01
Today I want to describe an approach to PHP automated auditing based on static analysis, which is also the approach taken in my project. Effective variable analysis and taint analysis must cope with PHP's many flexible syntactic forms, and regular expressions are clearly inadequate for that. The approach I describe combines static code analysis with data flow analysis.
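The following small Python sketch makes the point about regular expressions concrete. It is not the author's code, since the project itself is written in PHP. It compares two hypothetical scanners. The first is a single regex that looks for user input inside a mysql_query call. The second is a toy line-by-line scanner that remembers which variables were assigned from tainted values. Both work on raw text rather than a real AST, and the PHP snippets are made up. The only point is that following assignments catches the indirect case that the pattern misses.

```python
import re

# Hypothetical PHP snippets, for illustration only.
DIRECT = 'mysql_query("SELECT * FROM t WHERE id=" . $_GET["id"]);'
INDIRECT = '$id = $_GET["id"];\n$sql = "SELECT * FROM t WHERE id=" . $id;\nmysql_query($sql);'
SAFE = '$id = 1;\nmysql_query($id);'

# Naive approach: flag the sink only if user input appears inside the call text.
SINK_WITH_INPUT = re.compile(r'mysql_query\s*\([^;]*\$_(GET|POST|COOKIE|REQUEST)')

def regex_scan(code: str) -> bool:
    return SINK_WITH_INPUT.search(code) is not None

# Toy data-flow approach: remember which variables hold tainted values.
ASSIGN = re.compile(r'^\s*(\$\w+)\s*=\s*(.+);\s*$')
VAR = re.compile(r'\$\w+')

def flow_scan(code: str) -> bool:
    tainted = {"$_GET", "$_POST", "$_COOKIE", "$_REQUEST"}
    for line in code.splitlines():
        m = ASSIGN.match(line)
        if m:
            target, rhs = m.groups()
            # The target becomes tainted if any variable on the right side is tainted.
            if any(v in tainted for v in VAR.findall(rhs)):
                tainted.add(target)
            continue
        call = re.search(r'mysql_query\s*\((.*)\)', line)
        if call and any(v in tainted for v in VAR.findall(call.group(1))):
            return True
    return False
```

The regex finds DIRECT but misses INDIRECT, while the assignment-tracking scanner finds both. A real tool replaces these line-level regexes with an AST, which is why the approach described here builds on a compiler front end.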
First, I think an effective audit tool should contain at least the following modules:
1. Compiler front-end module
The compiler front-end module uses the abstract syntax tree (AST) and control flow graph (CFG) construction techniques from compiler theory. It converts the source files into a form suitable for back-end static analysis.
2. Global information collection module
This module collects project-wide information about the analyzed source files. Examples are the classes defined in the audited project and, for each method, its name, its parameters, and the start and end line numbers of its body. This information speeds up the subsequent static analysis.
3. Data flow analysis module
This module differs from the classical data flow analysis algorithms in compiler theory because it focuses on the characteristics of the PHP language itself. When intraprocedural or interprocedural analysis discovers a call to a sensitive function, data flow analysis is performed on the sensitive arguments of that call. In other words, the changes to each variable are tracked in preparation for the subsequent taint analysis.
4. Vulnerable code analysis module
This module performs taint analysis based on the global variables, assignment statements and other information collected by the data flow analysis module. It targets the dangerous parameters of sensitive sinks, such as the first parameter of mysql_query, and obtains the corresponding data flow information by backtracking. If backtracking shows that the parameter may be user-controlled, this is recorded. If sanitization code is applied to the dangerous parameter, the sanitization must also be recorded. Taint analysis is completed by tracking and analyzing the data that reaches each dangerous parameter.
0x02
With these modules in place, how do we organize them into an effective automated audit process? The general flow of the analysis system I used is as follows:
1. Framework initialization
First, initialize the analysis framework. This mainly means collecting information about all user-defined classes in the project under analysis, including class names, class attributes, method names, and the paths of the files where the classes are defined.
These records are stored in the global context class Context. It uses the singleton pattern and stays resident in memory for use by the subsequent analysis.
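A minimal Python sketch of such a singleton registry follows. It assumes a simple record layout: ClassInfo and MethodInfo are hypothetical names, and the fields follow the information listed above plus the method line ranges from the global information module. PHP class and method names are case-insensitive, so lookups are normalized to lower case.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class MethodInfo:
    name: str
    params: list[str]
    start_line: int  # first line of the method body
    end_line: int    # last line of the method body

@dataclass
class ClassInfo:
    name: str
    file_path: str
    attributes: list[str] = field(default_factory=list)
    methods: list[MethodInfo] = field(default_factory=list)

class Context:
    """Process-wide registry of user-defined classes (singleton)."""
    _instance = None

    def __new__(cls):
        # Create the single instance on first use; later calls return the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.classes = {}
        return cls._instance

    def register(self, info: ClassInfo) -> None:
        # PHP class names are case-insensitive, so key them in lower case.
        self.classes[info.name.lower()] = info

    def find_method(self, class_name: str, method_name: str) -> MethodInfo | None:
        info = self.classes.get(class_name.lower())
        if info is None:
            return None
        for m in info.methods:
            if m.name.lower() == method_name.lower():
                return m
        return None
```

Keeping the method line ranges here lets later stages jump straight to a method body during interprocedural analysis without re-scanning files.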
2. Determining the Main File
Second, determine whether each PHP file is a Main File. PHP has no main function. Most PHP files in a web application fall into two types: calling files and definition files. Definition files define business classes, utility classes, utility functions and so on. They are not accessed directly by users but are used by the calling files. The files that actually handle user requests are the calling files, such as the global index.php. Static analysis mainly targets these calling files, which are the Main Files. The criterion is as follows:
After AST analysis, check whether the lines occupied by class and method definitions exceed a certain proportion of all lines in the file. If they do, the file is treated as a definition file. Otherwise it is a Main File and is added to the list of files to analyze.
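The rule can be sketched in Python as follows. The article does not give the actual proportion, so threshold=0.5 is a hypothetical default. The definition line ranges are assumed to come from the AST as inclusive (start, end) pairs. Lines are collected in a set so that a method nested inside a class is not counted twice.

```python
def is_main_file(total_lines: int,
                 definition_ranges: list[tuple[int, int]],
                 threshold: float = 0.5) -> bool:
    """Return True if the file looks like a Main (calling) file.

    definition_ranges: inclusive (start, end) lines of class/function definitions.
    threshold: hypothetical cut-off; the article does not specify the value.
    """
    if total_lines <= 0:
        return False
    covered: set[int] = set()
    for start, end in definition_ranges:
        covered.update(range(start, end + 1))  # overlapping ranges count once
    ratio = len(covered) / total_lines
    # Mostly definitions -> definition file; otherwise treat it as a Main File.
    return ratio <= threshold
```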
3. AST construction
This project is itself written in PHP. For AST construction it relies on an excellent existing implementation, PHP Parser.
PHP Parser is an open-source project, also written in PHP. It can parse most PHP constructs, such as if, while and switch statements, array declarations, method calls and global variables. It handles part of this project's compiler front-end work very well.
4. CFG construction
The CFG is built by the CFGBuilder method of the CFGGenerator class, defined as follows:
The basic idea is to build the CFG recursively. The input is the collection of nodes obtained by traversing the AST. During traversal, the type of each node is checked, for example whether it is a branch, jump or terminating statement, and the CFG is built according to that type.
The jump conditions of branch and loop statements should be stored on the edges of the CFG, where they are available to data flow analysis.
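A recursive builder of this kind might look like the Python sketch below. It is not the author's CFGBuilder. The node classes Stmt, If and While are hypothetical stand-ins for PHP Parser nodes, and conditions are kept as source text. As described above, branch and loop conditions are stored on the edges (the negated form on the fall-through edge), and nested bodies are handled by recursion.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Stmt:          # any straight-line statement
    text: str

@dataclass
class If:
    cond: str
    then: list
    orelse: list = field(default_factory=list)

@dataclass
class While:
    cond: str
    body: list

@dataclass
class Block:
    id: int
    stmts: list[str] = field(default_factory=list)

@dataclass
class Edge:
    src: int
    dst: int
    cond: str | None = None  # branch/loop condition stored on the edge

class CFG:
    def __init__(self) -> None:
        self.blocks: list[Block] = []
        self.edges: list[Edge] = []

    def new_block(self) -> Block:
        b = Block(len(self.blocks))
        self.blocks.append(b)
        return b

    def build(self, nodes: list, current: Block) -> Block:
        """Recursively add nodes starting in `current`; return the block where control continues."""
        for node in nodes:
            if isinstance(node, Stmt):
                current.stmts.append(node.text)
            elif isinstance(node, If):
                then_b, else_b, join = self.new_block(), self.new_block(), self.new_block()
                self.edges.append(Edge(current.id, then_b.id, node.cond))
                self.edges.append(Edge(current.id, else_b.id, f"!({node.cond})"))
                self.edges.append(Edge(self.build(node.then, then_b).id, join.id))
                self.edges.append(Edge(self.build(node.orelse, else_b).id, join.id))
                current = join
            elif isinstance(node, While):
                head, body_b, after = self.new_block(), self.new_block(), self.new_block()
                self.edges.append(Edge(current.id, head.id))
                self.edges.append(Edge(head.id, body_b.id, node.cond))
                self.edges.append(Edge(head.id, after.id, f"!({node.cond})"))
                body_end = self.build(node.body, body_b)
                self.edges.append(Edge(body_end.id, head.id))  # loop back edge
                current = after
            else:
                raise TypeError(f"unsupported node: {node!r}")
        return current
```

Usage: create an entry block with `cfg.new_block()` and call `cfg.build(nodes, entry)`. The returned block is where control continues after the last node.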
5. Collecting data flow information
For a code block, the most valuable information to collect is assignment statements, function calls, constants (const, define), and registered variables (extract, parse_str).
Assignment statements are collected for later variable tracking. In my implementation, a structure represents the assigned value and its location. Other information is identified and extracted from the AST. For a function call, for example, the analysis determines whether the variable is escaped or encoded, and whether the called function is a sink (such as mysql_query).
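The Python sketch below shows what such per-block records could look like. The statement tuples stand in for AST nodes, and the SINKS table with its "SQLI"/"CMDI" labels is hypothetical configuration. The sketch keeps each assignment's value together with its location, records constants, flags extract/parse_str calls as variable registration, and tags calls to known sinks.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical configuration: sink function -> vulnerability class.
SINKS = {"mysql_query": "SQLI", "exec": "CMDI", "system": "CMDI"}
REGISTER_FUNCS = {"extract", "parse_str"}

@dataclass(frozen=True)
class Location:
    file: str
    line: int

@dataclass
class Assignment:
    target: str
    value: str          # right-hand side, kept as source text here
    loc: Location

@dataclass
class Call:
    name: str
    args: list[str]
    loc: Location
    sink_type: str | None = None

@dataclass
class BlockSummary:
    assignments: list[Assignment] = field(default_factory=list)
    calls: list[Call] = field(default_factory=list)
    constants: dict[str, str] = field(default_factory=dict)
    registered_vars: list[Call] = field(default_factory=list)

def collect(stmts: list[tuple], file: str) -> BlockSummary:
    """stmts: (kind, name, payload, line) tuples standing in for AST nodes."""
    summary = BlockSummary()
    for kind, name, payload, line in stmts:
        loc = Location(file, line)
        if kind == "assign":
            summary.assignments.append(Assignment(name, payload, loc))
        elif kind == "define":
            summary.constants[name] = payload
        elif kind == "call":
            call = Call(name, list(payload), loc, SINKS.get(name.lower()))
            summary.calls.append(call)
            if name.lower() in REGISTER_FUNCS:
                summary.registered_vars.append(call)
    return summary
```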
6. Handling sanitization and encoding information
$clearsql = addslashes($sql);
In an assignment statement whose right-hand side calls a filter function (user-defined or built-in), the return value of the call is treated as sanitized. In this example, $clearsql receives the sanitization tag addslashes.
When a function call is found, check whether the function name is one of the safe functions listed in the configuration file.
If it is, add the sanitization tag to the corresponding location symbol.
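Below is a minimal Python sketch of sanitization tagging, assuming a hypothetical SAFE_FUNCTIONS configuration that maps each safe function to the vulnerability classes it neutralizes. The sketch models `$target = func($arg);`. A configured safe function adds its tag to the result. Any other function drops the tags, which is a deliberately conservative choice because a call such as base64_decode can undo earlier escaping.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical configuration: safe function -> vulnerability classes it neutralizes.
SAFE_FUNCTIONS = {
    "addslashes": {"SQLI"},
    "intval": {"SQLI", "XSS"},
    "htmlspecialchars": {"XSS"},
}

@dataclass
class Symbol:
    name: str
    sanitizers: list[str] = field(default_factory=list)  # e.g. ["addslashes"]

    def safe_for(self, vuln: str) -> bool:
        return any(vuln in SAFE_FUNCTIONS.get(f, set()) for f in self.sanitizers)

def assign_call(target: str, func: str, arg: Symbol) -> Symbol:
    """Model `$target = func($arg);` and return the tagged target symbol."""
    fname = func.lower()
    if fname in SAFE_FUNCTIONS:
        # Keep earlier tags and add this filter's tag.
        return Symbol(target, arg.sanitizers + [fname])
    # Unknown/non-filter function: conservatively treat the result as unsanitized.
    return Symbol(target, [])
```

For `$clearsql = addslashes($sql);` this yields a $clearsql symbol tagged addslashes, which counts as sanitized for SQL injection but not for XSS.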
7. Interprocedural analysis
If a call to a user-defined function is found during the audit, interprocedural analysis is required. The code block of that method must be located in the analyzed project, and the variables must be passed into it for analysis.
The difficulties include several problems. The analysis must backtrack variables, handle methods with the same name in different files, and support analysis of class method calls. It must also record user-defined sinks: for example, if myexec calls exec without effective sanitization, myexec should also be treated as a dangerous function. Finally, it must classify user-defined sinks by vulnerability type (SQLI, XSS, XPATH, etc.).
The processing flow is as follows:
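The myexec case can be sketched in Python as a fixed-point computation over function summaries. This is not the author's processing flow. CallSite, UserFunction and the BUILTIN_SINKS table are hypothetical, and each call site records which caller parameter reaches which callee argument and whether it was sanitized first. A parameter becomes a user-defined sink when it reaches a known sink unsanitized. The loop repeats until nothing changes, so wrappers of wrappers are also found and inherit the sink's vulnerability class.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical table: (function, argument index) -> vulnerability class.
BUILTIN_SINKS = {("exec", 0): "CMDI", ("mysql_query", 0): "SQLI"}

@dataclass(frozen=True)
class CallSite:
    callee: str
    arg_index: int          # which callee argument receives the value
    source_param: str       # caller parameter passed in that position
    sanitized: bool = False # True if an effective filter was applied first

@dataclass
class UserFunction:
    name: str
    params: list[str]
    calls: list[CallSite]

def infer_user_sinks(funcs: dict[str, UserFunction]) -> dict[tuple[str, int], str]:
    """Grow the sink table until no user function parameter is newly found to reach a sink."""
    sinks = dict(BUILTIN_SINKS)
    changed = True
    while changed:
        changed = False
        for f in funcs.values():
            for c in f.calls:
                vuln = sinks.get((c.callee, c.arg_index))
                if vuln is None or c.sanitized or c.source_param not in f.params:
                    continue
                key = (f.name, f.params.index(c.source_param))
                if key not in sinks:
                    sinks[key] = vuln  # classify the new sink like the one it reaches
                    changed = True
    return sinks
```

The loop terminates because the table only grows and the set of (function, parameter) pairs is finite.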
8. Taint analysis
After these steps, the last task is taint analysis. It focuses on the system's built-in dangerous functions and constructs, such as echo, which may cause XSS. The dangerous parameters of each such function must be analyzed effectively. This means determining whether effective sanitization has been applied (escaping, regular-expression matching, etc.). It also means designing algorithms to trace back the earlier assignments or other transformations of the variable. This is undoubtedly a test of a security researcher's engineering ability, and it is also the most important stage of automated auditing.
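A minimal Python sketch of the backtracking step follows. It assumes straight-line code with no branches, and the SOURCES and SAFE_FOR tables are hypothetical. Starting from the variable used at the sink, the sketch finds the most recent earlier assignment to it. A filter that is effective for this vulnerability class stops the search. Otherwise the search recurses into the variables read on the right-hand side until it reaches user input.

```python
from __future__ import annotations
from dataclasses import dataclass, field

SOURCES = {"$_GET", "$_POST", "$_COOKIE", "$_REQUEST"}
# Hypothetical: which vulnerability classes each filter is effective against.
SAFE_FOR = {"addslashes": {"SQLI"}, "htmlspecialchars": {"XSS"}, "intval": {"SQLI", "XSS"}}

@dataclass
class Assign:
    line: int
    target: str
    uses: list[str]                                      # variables read on the right side
    sanitizers: list[str] = field(default_factory=list)  # filters wrapping the right side

def is_vulnerable(var: str, line: int, vuln: str, assigns: list[Assign]) -> bool:
    """Backtrack from `var` used at `line`; True if an unsanitized path reaches user input."""
    if var in SOURCES:
        return True
    prior = [a for a in assigns if a.target == var and a.line < line]
    if not prior:
        return False  # unknown origin: treated as safe in this sketch
    a = max(prior, key=lambda x: x.line)  # most recent assignment before the use
    if any(vuln in SAFE_FOR.get(s, set()) for s in a.sanitizers):
        return False
    # Lines strictly decrease on each step, so the recursion terminates.
    return any(is_vulnerable(u, a.line, vuln, assigns) for u in a.uses)
```

Note that the same path can be safe for one vulnerability class and unsafe for another (addslashes does not stop XSS), which is why the sanitization check is keyed by vulnerability type.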
0x03
As the above shows, implementing your own automated audit tool involves many pitfalls. I ran into many difficulties myself, and static analysis does have inherent limitations. For example, the result of a string transformation is easy to obtain in dynamic analysis but hard to compute statically. This cannot be overcome technically; it stems from the limitations of static analysis itself. Therefore, if pure static analysis is to achieve low false positive and false negative rates, some dynamic ideas must be introduced. Examples include simulating the code passed to eval and evaluating string transformation functions and regular expressions. In addition, in MVC-based frameworks such as the CI (CodeIgniter) framework, the code is very scattered. For example, the data sanitization code may be placed in an extension of the input class. For PHP applications like these, I think a universal audit framework is hard to achieve, and they should be handled individually.
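One such dynamic idea can be sketched as constant folding over string functions. The Python sketch below uses hypothetical stand-ins for a few PHP string functions, which only approximate PHP's behavior. It evaluates a tiny expression tree made of string literals, ("call", name, arg) and ("concat", left, right) nodes. The goal is to recover what an obfuscated eval or dynamic call would actually run. Anything involving a variable or an unknown function stays unresolved and returns None.

```python
import base64
import codecs

# Python approximations of a few PHP string functions (hypothetical subset).
PHP_STRING_FUNCS = {
    "base64_decode": lambda s: base64.b64decode(s).decode("latin-1"),
    "strrev": lambda s: s[::-1],
    "str_rot13": lambda s: codecs.encode(s, "rot_13"),
    "strtolower": str.lower,
    "strtoupper": str.upper,
}

def fold(expr):
    """Evaluate a constant expression; return the string, or None if not statically known."""
    if isinstance(expr, str):
        return expr
    kind = expr[0]
    if kind == "concat":
        left, right = fold(expr[1]), fold(expr[2])
        return None if left is None or right is None else left + right
    if kind == "call":
        func = PHP_STRING_FUNCS.get(expr[1].lower())
        arg = fold(expr[2])
        return None if func is None or arg is None else func(arg)
    return None  # variables and anything else remain unknown
```

For example, `strrev("tre" . "ssa")` folds to "assert" and `base64_decode("c3lzdGVt")` folds to "system", so the analyzer can treat such expressions as calls to those sinks.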
The above is just a rough summary of my current attempts, which are not yet fully implemented. I am a college student, not a professional. I hope this inspires more security researchers to pay attention to this field.

