What is Unicode

Jan 26, 2019 am 10:56 AM
unicode

Unicode is a character encoding standard that assigns a single, unique code to every character in every language, so that text can be converted and processed across languages and platforms.

Unicode meaning

Unicode provides a unique number for every character, regardless of platform, program, or language. Version 1.0 of the standard was published in 1991, and it is now an industry standard in computing that covers a character set, encoding schemes, and more. Unicode was created to overcome the limitations of traditional character encoding schemes.

The Development of Unicode Encoding

Computers were designed around the 8-bit byte, so one byte can represent at most 256 distinct values. For the English-speaking countries where computers were first used, that was enough: uppercase and lowercase letters, digits, and common symbols all fit in a single byte, and the ASCII code table (which defines 128 characters) was built on that basis.

As computers spread to other countries, they had to handle other languages, such as Chinese, Japanese, and Korean, and each country drew up its own code table. China, for example, published the GB2312 Chinese character set in 1980. Chinese has far more characters than English, so one byte is not enough and GB2312 uses 2 bytes for each Chinese character. These national encodings work on their own, but they are incompatible with one another: the same byte values stand for different characters in different code tables, so a system set up for one language often cannot handle text written in another, and the result is garbled characters. Unicode was created to bring all languages into a single set of codes so that this garbling no longer occurs.
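The incompatibility described above can be reproduced in a few lines. The following is a minimal Python sketch, not taken from the original article: it encodes a two-character Chinese string with GB2312 and then decodes the same bytes with Latin-1, a Western European code table chosen here only for illustration. All variable names are hypothetical.

```python
# "\u4e2d\u6587" is the two-character Chinese word for "Chinese (language)".
text = "\u4e2d\u6587"

# Encode with the GB2312 code table: 2 bytes per Chinese character.
gb_bytes = text.encode("gb2312")

# A program that assumes a different code table (Latin-1 here, as an
# illustrative assumption) reads the same bytes as four unrelated characters.
garbled = gb_bytes.decode("latin-1")

# Decoding with the code table that produced the bytes recovers the text.
restored = gb_bytes.decode("gb2312")

print(len(gb_bytes))       # number of bytes used for two characters
print(ascii(garbled))      # escaped form of the garbled result
print(restored == text)    # True when the matching code table is used
```

The bytes never change; only the code table used to interpret them does, which is why the reader needs to know which table the writer used.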

Unicode encoding representation

A Unicode character is usually written as U+ followed by a group of hexadecimal digits. The range U+0000 to U+FFFF is the Basic Multilingual Plane (BMP), which has room for more than 60,000 characters. Characters outside the BMP need 5 or 6 hexadecimal digits.
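As a small illustration of this notation, the Python sketch below formats a character's number as U+ followed by at least four hexadecimal digits and checks whether it falls inside the BMP. The helper names `u_notation` and `in_bmp` are made up for this example.

```python
def u_notation(ch: str) -> str:
    """Format one character as U+XXXX, padding to at least four hex digits."""
    return f"U+{ord(ch):04X}"


def in_bmp(ch: str) -> bool:
    """Return True if the character lies in the range U+0000 to U+FFFF."""
    return ord(ch) <= 0xFFFF


# A Latin letter, a Chinese character, and an emoji (written as escapes).
for ch in ("A", "\u4e2d", "\U0001F600"):
    print(u_notation(ch), "BMP" if in_bmp(ch) else "outside the BMP")
```

The first two characters print with four digits, while the emoji needs five because it lies outside the BMP.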

Unicode code points currently run from 0x0000 to 0x10FFFF and are divided into 17 groups, each called a plane. Each plane holds 65,536 code points, for a total of 1,114,112.
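The arithmetic behind these figures can be checked directly. This is a hedged Python sketch with hypothetical names (`PLANE_SIZE`, `plane_of`); it assumes only the numbers given in the paragraph above.

```python
PLANE_SIZE = 0x10000        # 65,536 code points per plane
PLANE_COUNT = 17            # planes 0 through 16
MAX_CODE_POINT = 0x10FFFF   # last code point of plane 16


def plane_of(code_point: int) -> int:
    """Return the plane number (0-16) that a code point belongs to."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Dropping the low 16 bits leaves the plane number.
    return code_point >> 16


total = PLANE_SIZE * PLANE_COUNT
print(total)                  # 1114112
print(plane_of(0x4E2D))       # 0, the BMP
print(plane_of(0x1F600))      # 1
print(plane_of(0x10FFFF))     # 16
```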

Unicode is like a table in which every character is written down. Each character corresponds to a number, called a code point. This number is generally not stored or transmitted directly; instead, it is converted to bytes using one of several encoding methods.
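The table lookup can be tried out in Python, whose built-in `ord()` and `chr()` convert between a character and its code point, and whose standard `unicodedata` module returns the name recorded for a character. This is only an illustrative sketch; the variable names are hypothetical.

```python
import unicodedata

# Character to code point.
code_point = ord("A")
print(code_point, hex(code_point))     # 65 0x41

# Code point back to character.
print(chr(code_point))                 # A

# The name recorded for each character in the Unicode table.
print(unicodedata.name("A"))           # LATIN CAPITAL LETTER A
print(unicodedata.name("\u4e2d"))      # CJK UNIFIED IDEOGRAPH-4E2D
```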

UTF-8, UTF-16, and UTF-32 are encoding schemes that convert these numbers into program data. UTF is the abbreviation of "Unicode Transformation Format", that is, a format that specifies how the numbers defined by Unicode are converted into program data.
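To see how the three schemes differ, the Python sketch below encodes the same characters with each one and counts the resulting bytes. The function name `encoded_sizes` is hypothetical, and the big-endian codec variants are used only so that Python does not prepend a byte order mark to the output.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes one character occupies in each scheme."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


# A Latin letter, a Chinese character, and an emoji (written as escapes).
for ch in ("A", "\u4e2d", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# The same code point, U+4E2D, as three different byte sequences.
print("\u4e2d".encode("utf-8").hex())       # e4b8ad
print("\u4e2d".encode("utf-16-be").hex())   # 4e2d
print("\u4e2d".encode("utf-32-be").hex())   # 00004e2d
```

UTF-32 always uses 4 bytes, UTF-16 uses 2 or 4, and UTF-8 uses 1 to 4, so the choice trades simplicity against size.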

Decimal          Unicode encoding       UTF-8 byte stream
0-127            0x000000-0x00007F      0xxxxxxx (7 bits)
128-2047         0x000080-0x0007FF      110xxxxx 10xxxxxx (11 bits)
2048-65535       0x000800-0x00FFFF      1110xxxx 10xxxxxx 10xxxxxx (16 bits)
65536-1114111    0x010000-0x10FFFF      11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (21 bits)
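The four rows of the UTF-8 table can be turned into code almost line for line. The following Python function is a hand-written sketch for illustration (the name `utf8_encode` is hypothetical); in practice the built-in `str.encode("utf-8")` does the same job. The rejection of the surrogate range 0xD800 to 0xDFFF is not shown in the table; it is added here because those values are reserved for UTF-16 and are not valid in UTF-8.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8, following the four table rows."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF:
        # Not in the table: surrogates are reserved for UTF-16.
        raise ValueError(f"surrogate cannot be encoded: {cp:#x}")
    if cp <= 0x7F:
        # 0xxxxxxx (7 bits)
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx (11 bits)
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx (16 bits)
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (21 bits)
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


print(utf8_encode(0x41).hex())      # 41
print(utf8_encode(0x4E2D).hex())    # e4b8ad
```

Each branch fills the x positions of its row with bits of the code point, highest bits first, six bits per continuation byte.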

