
Strings, encoding, UTF-8 in PHP

Jan 23, 2017 02:58 PM

I have recently read a lot of articles about encoding, so I split what I learned about PHP, strings, encoding, and UTF-8 into two blog posts. This post is the first of the two. It covers four topics: "The definition and use of strings", "String conversion", "The essence of PHP strings", and "Multi-byte strings". This first part is relatively basic.

The definition and use of strings

There are four ways to define a string in PHP:

Single-quoted strings

A single-quoted string is similar to a raw string in Python: it does not parse variables and does not process escape sequences (the only exceptions are \' for a literal single quote and \\ for a literal backslash). For example, in $str='hello\nworld', the \n is not a newline but the two characters \ and n.
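To make the difference concrete, here is a small sketch comparing the two kinds of literal. The variable names are just for illustration.

```php
<?php
$name = 'world';

// Single quotes: no escape processing, no interpolation (12 literal characters).
$single = 'hello\n$name';

// Double quotes: \n becomes a newline and $name is replaced by its value.
$double = "hello\n$name";

// The only two escapes single quotes honour are \' and \\.
$quote = 'It\'s a back\\slash';

var_dump($single, $double, $quote);
```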

Double-quoted strings

A double-quoted string supports the variable parsing and escape sequences that single-quoted strings lack.

I find the octal and hexadecimal escape sequences especially interesting, so I would like to add them here:

\[0-7]{1,3}           # octal notation
\x[0-9A-Fa-f]{1,2}    # hexadecimal notation
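A quick sketch of both escape forms in a double-quoted string. The last line assumes the file is saved as UTF-8, where "中" is the three bytes E4 B8 AD; escapes let you write those bytes directly.

```php
<?php
// Octal escape: backslash followed by 1-3 octal digits (0101 = 65 = 'A').
$octal = "\101\102\103";

// Hexadecimal escape: \x followed by 1-2 hex digits (0x41 = 65 = 'A').
$hex = "\x41\x42\x43";

// Escapes can produce arbitrary bytes, e.g. the UTF-8 bytes of "中".
$zhong = "\xE4\xB8\xAD";

var_dump($octal, $hex, $octal === $hex);
var_dump(strlen($zhong)); // 3 bytes
```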

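Heredoc

Heredoc syntax is similar to Python's triple-quoted strings and lets you define a string that spans multiple lines. Like a double-quoted string, it parses variables and escape sequences. Its syntax rules are strict, so be careful: for example, the closing identifier must start on a new line (and, before PHP 7.3, in the very first column).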

$str = <<<EOD
hello\n
world
EOD;
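Because a heredoc behaves like a double-quoted string, variables (including the curly-brace form) and escapes such as \t work inside it. A minimal sketch; $name and $items are illustrative:

```php
<?php
$name = 'PHP';
$items = ['a' => 'apple'];

// Escapes and variables are processed, just as in "...".
// The newline before the closing identifier is not part of the string.
$str = <<<EOD
Hello, $name!\tTab
Fruit: {$items['a']}
EOD;

echo $str, "\n";
```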

Nowdoc

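Nowdoc is to heredoc what single-quoted strings are to double-quoted strings: it does no variable parsing and processes no escape sequences. It is written like a heredoc, except that the opening identifier is enclosed in single quotes (<<<'EOD'). This makes it well suited to defining a large block of text that should be taken literally, without escaping special characters.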
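Putting a nowdoc next to a heredoc with the same body shows the difference. A minimal sketch with an illustrative $name variable:

```php
<?php
$name = 'PHP';

// Nowdoc: quoted identifier, so the body is taken literally.
$raw = <<<'EOD'
Hello, $name!\n
EOD;

// Heredoc: same body, but $name and \n are processed.
$cooked = <<<EOD
Hello, $name!\n
EOD;

var_dump($raw, $cooked);
```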

Variable parsing

Variable parsing is the most powerful feature of PHP strings. Variables are resolved at runtime according to their context (PHP is an interpreted language), which enables many clever uses.

There are two syntaxes. The simple syntax lets a string contain variables, array elements, and object properties directly. The complex syntax wraps an expression in curly braces {} so that more elaborate expressions can be embedded.

Here is an example that shows how powerful variable parsing is:

class beers {
    const softdrink = 'softdrink';
    public static $ale = 'ale';
    public $data = array(1, 3, "k" => 4);
}

$softdrink = "softdrink";
$ale = "ale";
$arr = array("arr1", "arr2", "arr3" => "arr4", "arr4" => array(1, 2));
$arr4 = "arr4";
$obj = new beers;

echo "line1:{$arr[1]}\n";              // line1:arr2
echo "line2:{$arr['arr4'][0]}\n";      // line2:1
echo "line3:{$obj->data[1]}\n";        // line3:3
echo "line4:{${$arr['arr3']}}\n";      // line4:arr4  ($arr['arr3'] is "arr4", so this reads $arr4)
echo "line5:{${$arr['arr3']}[1]}\n";   // line5:r     (second character of $arr4)
echo "line6:{${beers::softdrink}}\n";  // line6:softdrink  (reads $softdrink)
echo "line7:{${beers::$ale}}\n";       // line7:ale   (reads $ale)
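The example above uses only the curly-brace (complex) syntax. For comparison, here is a sketch of the simple syntax and of the cases where braces become necessary; the Fruit class and the variables are hypothetical.

```php
<?php
class Fruit {
    public $name = 'apple';
}

$drink = 'tea';
$arr = ['zero', 'one', 'key' => 'value'];
$obj = new Fruit;

// Simple syntax: variable, array element, object property written directly.
$s1 = "I like $drink";        // "I like tea"
$s2 = "second: $arr[1]";      // "second: one"
$s3 = "name: $obj->name";     // "name: apple"

// Complex syntax: needed for quoted keys, or to mark where the name ends
// ("$drinks" would look up a variable called $drinks).
$s4 = "key: {$arr['key']}";   // "key: value"
$s5 = "{$drink}s";            // "teas"

echo "$s1\n$s2\n$s3\n$s4\n$s5\n";
```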

String conversion

Another reason PHP is simpler to use than Python is implicit type conversion, which simplifies many operations. This section explains it through string conversion.

String type coercion

$var = 10;
$dvar = (string)$var;
echo $dvar . "_" . gettype($dvar);   // prints 10_string

The strval() function returns the string value of a variable:

$var = 10.2;
$dvar = strval($var);
echo gettype($var) . "_" . $dvar . "_" . gettype($dvar);   // prints double_10.2_string

The settype() function changes the type of a variable in place:

$str = "10hello";
settype($str, "integer");
echo $str;   // prints 10: conversion stops at the first non-numeric character

Conversion to a string follows fixed rules. For example, the boolean TRUE becomes the string "1", while FALSE becomes the empty string "". It is well worth learning these rules.
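Here is a sketch that prints a few of these rules side by side. It only covers scalar values and null; the $rules array is just a container for the demonstration.

```php
<?php
// A few of PHP's fixed conversion-to-string rules.
$rules = [
    'true'  => (string) true,   // "1"
    'false' => (string) false,  // "" (empty string)
    'null'  => (string) null,   // ""
    'int'   => (string) 42,     // "42"
    'float' => (string) 1.0,    // "1" (no trailing ".0")
];

foreach ($rules as $label => $value) {
    echo $label, ' => "', $value, "\"\n";
}
```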

Automatic type conversion

The two kinds of conversion above are explicit. Even more important to watch out for is automatic (implicit) type conversion: when an expression expects a value of a certain type, PHP converts the operands automatically. For example:

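$bool = true;
$num = 10 + "5";            // the numeric string "5" is converted to int: 15
echo $bool . "_" . $num;    // true is converted to "1": prints 1_15

Note that a non-numeric string such as "hello" behaves differently: before PHP 8 it counted as 0 in arithmetic (so 10 + "hello" gave 10, with a warning since PHP 7.1), while PHP 8 throws a TypeError.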
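Implicit conversion also happens when you pass values to string functions. A small sketch, assuming the default (coercive) typing mode; with declare(strict_types=1), the function-argument conversions below would raise a TypeError instead, while concatenation is unaffected.

```php
<?php
// int 12345 is treated as the string "12345" by strlen().
$len = strlen(12345);

// Concatenation converts the float 3.5 to "3.5".
$msg = 'Total: ' . 3.5;

// The numeric string "3" is converted to the int 3 for str_repeat().
$repeat = str_repeat('ab', '3');

var_dump($len, $msg, $repeat);
```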

The essence of PHP string

The PHP documentation explains it as follows:

In PHP, a string is implemented as an array of bytes plus an integer indicating the length of the buffer. It has no information about how those bytes translate to characters; that task is left to the programmer. There are no limitations on the values a string can be composed of; in particular, bytes with value 0 ("NUL bytes") are allowed anywhere in the string.

PHP does not assign an encoding to strings; how a string is encoded is up to the programmer. String literals are encoded in whatever encoding the PHP source file is saved in. For example, if your file is saved as GBK, the string literals in your code are GBK bytes.
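You can see the "string is just bytes" model directly. This sketch assumes the file is saved as UTF-8, so the literal "中" is stored as three bytes; saved as GBK, the same literal would be the two bytes D6 D0.

```php
<?php
// Assumes this file is saved as UTF-8.
$s = "中";

echo strlen($s), "\n";    // number of bytes, not characters
echo bin2hex($s), "\n";   // the raw bytes in hex

// Walk the string one byte at a time.
for ($i = 0; $i < strlen($s); $i++) {
    printf("byte %d: 0x%02X\n", $i, ord($s[$i]));
}
```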

A note on binary safety: a byte with value 0 (NUL) may appear anywhere in a string. However, some functions that are not binary-safe pass the string to underlying C library functions, which treat the NUL byte as the end of the string and ignore every character after it.
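For binary-safe functions, a NUL byte is just another byte. A minimal sketch using a few standard string functions:

```php
<?php
$s = "abc\0def";

var_dump(strlen($s));          // the NUL byte is counted
var_dump(strpos($s, "def"));   // searching continues past the NUL
var_dump(explode("\0", $s));   // NUL can even be used as a delimiter
var_dump(bin2hex($s));         // 61 62 63 00 64 65 66
```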

As long as the PHP file encoding is ASCII-compatible, string operations work well. Under the hood, however, string operations are still byte-based (whatever the file encoding), so keep the following in mind:

  • Some functions assume the string is in a single-byte encoding, but do not need to interpret the bytes as particular characters, for example substr(), strpos(), and strlen().

  • Many functions accept an explicit encoding parameter and, if it is omitted, take a default from php.ini; htmlentities() is one example.

  • Some functions depend on the current locale (set with setlocale()), yet still operate on single bytes.
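The first two points can be seen in a short sketch, assuming a UTF-8 source file. substr() counts bytes, so a "one character" slice of Chinese text yields a broken byte sequence, and htmlentities() is given its encoding explicitly instead of relying on php.ini.

```php
<?php
$word = "中文";   // 6 bytes in a UTF-8 file

// substr() counts bytes: taking 1 "character" really takes 1 byte,
// leaving an incomplete UTF-8 sequence.
$broken = substr($word, 0, 1);
$first  = substr($word, 0, 3);   // 3 bytes happen to be one whole character here

// Passing the encoding explicitly avoids depending on default_charset.
$html = htmlentities("<é>", ENT_QUOTES, 'UTF-8');

var_dump(bin2hex($broken), $first, $html);
```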

Generally speaking, although PHP has no built-in notion of Unicode characters, it works fine with UTF-8-encoded strings. Most of the time this causes no trouble, but the following situations are not handled out of the box:

  • Converting strings that are not UTF-8 encoded.

  • A web page is UTF-8 encoded, but a form is sometimes submitted in GBK (the browser does not follow the page's meta charset tag).

  • In a UTF-8 encoded PHP file, strlen("中国") ("China") returns 6, the number of bytes, rather than the actual number of characters (2).

So how do we solve these problems? PHP provides the mbstring extension!
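The strlen("中国") case from the list can be sketched as follows. It assumes the mbstring extension is loaded and the file is saved as UTF-8.

```php
<?php
// Requires the mbstring extension.
$s = "中国";

$bytes = strlen($s);                  // counts bytes
$chars = mb_strlen($s, 'UTF-8');      // counts characters
$first = mb_substr($s, 0, 1, 'UTF-8'); // slices by character, not by byte

var_dump($bytes, $chars, $first);
```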

Multi-byte strings

The mbstring extension is not enabled by default; you need to pass --enable-mbstring when compiling PHP.

First, let's look at the mbstring directives in php.ini. It took me quite a while to understand them.

  • mbstring.language: the default national language setting. I simply think of it as UTF-8.

  • mbstring.internal_encoding: this has nothing to do with the encoding of the PHP file. Most mbstring functions let you specify the encoding of the string being processed; if you do not, this value is used. In newer PHP versions (5.6 and later) this directive is deprecated in favor of default_charset.

  • mbstring.http_input: the default encoding of HTTP input (excluding GET parameters), which is usually the same as the encoding of the HTML page. It is likewise deprecated in favor of default_charset.

  • mbstring.http_output: this one confused me. What is "HTTP output"? Isn't PHP's output just the page? In fact, it is the encoding that the mb_output_handler() output-buffer handler converts output into.

  • mbstring.encoding_translation deserves the most attention. It is off by default. When it is on, PHP automatically converts POST variables and the names of uploaded files to the encoding given by mbstring.internal_encoding. I have not tested this myself; you could try it by uploading a file with a Chinese name. I recommend leaving it off and letting the programmer handle these conversions.
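The "default encoding when none is given" behaviour can be sketched with mb_internal_encoding(), which sets the same value at runtime. This assumes mbstring is loaded and the file is saved as UTF-8; '8bit' is mbstring's name for treating every byte as one character.

```php
<?php
// Set the encoding mbstring functions use when none is passed.
mb_internal_encoding('UTF-8');
$s = "中国";

var_dump(mb_internal_encoding());   // the current default
var_dump(mb_strlen($s));            // no encoding given: interpreted as UTF-8
var_dump(mb_strlen($s, '8bit'));    // same bytes, read one byte per character
```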

Next, let's look at a few functions from the mbstring extension:

  • mb_http_input(): detects the HTTP input character encoding. I think it is worth using when handling the names of uploaded files.

  • mb_convert_encoding(): a commonly used function. Pay attention to its third parameter, the source encoding: it may be a single encoding or a list of candidates, and if it is omitted the internal encoding is assumed.

  • mb_detect_order(): sets or gets the order in which character encodings are tried during detection.

  • mb_list_encodings(): returns an array of all encodings supported by mbstring.
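A sketch tying the last three functions together, assuming mbstring is loaded and the file is saved as UTF-8. It relies on mbstring accepting "GBK" as an alias of its CP936 encoding; in GBK, 中 is D6 D0 and 国 is B9 FA.

```php
<?php
$utf8 = "中国";

// UTF-8 -> GBK and back again; the third argument names the source encoding.
$gbk  = mb_convert_encoding($utf8, 'GBK', 'UTF-8');
$back = mb_convert_encoding($gbk, 'UTF-8', 'GBK');
var_dump(bin2hex($gbk), $back === $utf8);

// Set the detection order, then detect in strict mode using that order.
mb_detect_order(['ASCII', 'UTF-8']);
$detected = mb_detect_encoding($utf8, mb_detect_order(), true);
var_dump(mb_detect_order(), $detected);

// Check whether this build supports a given encoding.
var_dump(in_array('UTF-8', mb_list_encodings(), true));
```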

Important note: the encoding of a PHP source file must be ASCII-compatible.

So do not use BIG-5 as the encoding of your PHP files, especially when non-ASCII text appears in identifiers or string literals: the second byte of a BIG-5 character can fall in the ASCII range (for example, the backslash 0x5C), which confuses the parser. If your PHP files really are BIG-5 encoded, try to convert input and output to UTF-8.
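The classic illustration is the character 許 (U+8A31), which BIG-5 encodes as B3 5C; 0x5C is the ASCII backslash. A sketch, assuming mbstring is loaded and this file itself is saved as UTF-8:

```php
<?php
// Convert a single character to BIG-5 and inspect its bytes.
$big5 = mb_convert_encoding("許", 'BIG-5', 'UTF-8');

var_dump(bin2hex($big5));        // the two BIG-5 bytes
var_dump(strpos($big5, "\\"));   // a byte-oriented scan finds a "backslash"
```

Inside a string literal of a BIG-5 source file, that trailing 0x5C would be read by a byte-oriented parser as an escape character.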

Zend Multibyte

Finally, let's talk about Zend Multibyte, which I don't understand very deeply. First of all, don't confuse it with the mbstring extension. Zend Multibyte mode is off by default and can be turned on with the zend.multibyte ini directive. The encoding the parser should use for a file can then be specified with a declare(encoding='...') statement.

What is the point of this mode? As mentioned above, PHP files need an ASCII-compatible encoding, so what about encodings such as BIG-5 that are not ASCII-compatible? That is what this mode is for: with it enabled, the PHP parser reads the encoding from zend.script_encoding (mbstring.script_encoding before PHP 5.4) and uses that encoding to parse the PHP file.
