what is unicode
Unicode is a character encoding scheme that sets a unified and unique binary encoding for each character in each language to achieve cross-language and cross-platform text conversion and processing requirements
Unicode meaning
Unicode provides a unique number for each character, no matter what platform, no matter what program, no matter what language. It was officially announced in 1994 and is an industry standard in the computer field, including character sets, encoding schemes, etc. Unicode was created to solve the limitations of traditional character encoding schemes. It sets a unified and unique binary encoding for each character in each language to achieve cross-language and cross-platform text conversion and processing requirements.
The Development of Unicode Encoding
When designing computers, 8 bits are used as a byte. Therefore, one byte can represent up to 256 characters. In the early days, for Western countries that used English, one byte could store uppercase and lowercase English letters, mathematics, and some symbols, so one byte was used to make the code table (ASCII). Later, computers were spread to other countries, and many countries used their own languages, such as Chinese, Japanese, Korean... The languages were complicated. In order to solve this problem, each country formulated its own code table. China formulated GB2312 in 1980 In the Chinese character encoding character set, there are many more Chinese characters than English. One byte is obviously not enough, so 2 bytes are used for encoding. However, although the character encodings defined by different countries can be used, they are often incompatible between different countries. If the computer wants to handle multiple language environments (using Chinese or other languages), it may not be able to support multiple language environments at the same time. In order to unify the encoding of all texts, Unicode was created to unify all languages into one set of encodings so that there would be no garbled characters.
Unicode encoding representation
When representing Unicode characters, U is usually used followed by a set of hexadecimal digits Represents a character, encoding from U 0000 to U FFFF, supporting more than 60,000 characters in total. Characters other than BMP
need to be represented using 5-digit or 6-digit hexadecimal.
Currently Unicode characters are divided into 17 groups, 0x0000 to 0x10FFFF. Each group is called a plane. Each plane has 65536 code points, a total of 1114112.
Unicode is like a table. All characters are written into the table. Each character corresponds to a number, called a code point. This number is generally not used directly. It is usually used
Use different encoding methods
UTF-8, UTF-16, and UTF-32 are encoding schemes for converting numbers into program data. UTF is the abbreviation of "UnicodeTransformation Format", which can be translated into
Unicode character set conversion format, that is, how to convert numbers defined by Unicode into program data
Decimal |
Unicode encoding |
UTF-8 byte stream |
0-127 bits | 0x000000-0x00007F | 0xxxxxxx(7 digits) |
128-2047 digits |
0x000080-0x0007FF | 110xxxxx 10xxxxxx (11 digits) |
2048-65535 digits | 0x000800-0x00FFFF | 1110xxxx 10xxxxxx 10xxxxxx (16 digits) |
65536-1114111 bits | 0x010000-0x10FFFF | ##11110xxx 10xxxxxx 10xxxxxx 10xxxxxx(21 bits)
The above is the detailed content of what is unicode. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics











In-depth understanding of PHP: Implementation method of converting JSONUnicode to Chinese During development, we often encounter situations where we need to process JSON data, and Unicode encoding in JSON will cause us some problems in some scenarios, especially when Unicode needs to be converted When encoding is converted to Chinese characters. In PHP, there are some methods that can help us achieve this conversion process. A common method will be introduced below and specific code examples will be provided. First, let us first understand the Un in JSON

Unicode is a character encoding standard used to represent various languages and symbols. To convert Unicode encoding to Chinese characters, you can use Python's built-in functions chr() and ord().

Are you troubled by Chinese garbled characters in Eclipse? To try these solutions, you need specific code examples 1. Background introduction With the continuous development of computer technology, Chinese plays an increasingly important role in software development. However, many developers encounter garbled code problems when using Eclipse for Chinese development, which affects work efficiency. Then, this article will introduce some common garbled code problems and give corresponding solutions and code examples to help readers solve the Chinese garbled code problem in Eclipse. 2. Common garbled code problems and solution files

JSON (JavaScriptObjectNotation) is a lightweight data exchange format commonly used for data exchange between web applications. When processing JSON data, we often encounter Unicode-encoded Chinese characters (such as "u4e2du6587") and need to convert them into readable Chinese characters. In PHP, we can achieve this conversion through some simple methods. Next, we will detail how to convert JSONUnico

The differences between unicode and ascii include different encoding ranges, different storage spaces, and different compatibility. Detailed introduction: 1. The encoding range is different. The encoding range of ASCII is 0-127, which is mainly used to represent English letters. The encoding range of Unicode is much wider and can represent almost all language characters; 2. The storage space is different. ASCII usually Use 1 byte to store a character, while unicode may use 2 or more bytes to store a character; 3. Different compatibility, etc.

With the development of technologies such as big data and cloud computing, databases have become one of the important cornerstones of enterprise informatization. In applications developed in Java, connecting to MySQL database has become the norm. However, in this process, we often encounter a thorny problem - inconsistent Unicode character set encoding. This will not only affect our development efficiency, but also affect the performance and stability of the application. This article will introduce how to solve this problem and make Java connect to the MySQL database more smoothly. 1. Unicode

Sequential access Sequential access is a basic operation for processing strings in the Java language. Under this approach, each character in the input string is accessed sequentially from beginning to end, or sometimes from end to beginning. This section discusses seven technical examples of creating a 32-bit code point array from a string using sequential access methods and estimates their processing time. Example 1-1: Benchmark (no support for surrogate pairs) Listing 1 assigns a 16-bit char type value directly to a 32-bit code point value, without taking the surrogate pair into account at all: Listing 1. No support for surrogate pairs int[]toCodePointArray(Stringstr) {//Example1-1intlen=str.length();//t

During PHP development, processing JSON data is a very common operation. However, you may encounter some problems when processing JSON data containing Unicode characters, especially in data conversion and encoding conversion. This article will introduce some PHP programming techniques for efficiently handling Unicode conversion in JSON data, and provide specific code examples. When processing JSON data containing Unicode characters, it usually involves converting and encoding Unicode characters. in PHP
