


[XML] Solution to garbled characters in UTF8 and GB2312 encoding conversion
The reviewed information must be generated as an XML file encoded in GB2312. Because many of the collected news websites use UTF-8, garbled characters appear during the conversion.
I recently worked on a small project and ran into this kind of problem, so I am writing it down here as a summary.
The project has two parts: one collects news data, and the other reviews the collected information and finally generates the XML file.
After users edit the collected data, it is exported as an Access file and then imported into the information review system. In the Access database, the field that stores the news content is of type ntext, while the corresponding field in the review system's database was of type varchar(max). After importing, some characters that looked like blanks came out garbled, appearing as question marks (?). Further testing showed that these were not ordinary blank (space) characters but a special character. What to do? After several tests, I found that changing the column type from varchar(max) to nvarchar(max) fixed it: the imported data no longer had this problem. The reason is that varchar stores text in the code page of the column's collation, so any character outside that code page is replaced with '?', whereas nvarchar stores Unicode text.
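To see why varchar produced question marks, here is a small Python simulation (the project itself is .NET; this only models the behavior). It assumes the review database uses a Chinese collation with code page 936, which Python's codec calls "gbk", and it assumes the "blank" character is U+00A0 (non-breaking space), a character common on web pages; the post does not name the character. `store_in_varchar` and `store_in_nvarchar` are hypothetical names.

```python
# Simulation only: a varchar column is modelled as a round trip through a
# non-Unicode code page (assumed code page 936, Python codec "gbk");
# an nvarchar column is modelled as keeping the Unicode text unchanged.


def store_in_varchar(text: str, codec: str = "gbk") -> str:
    """Round-trip text through a code page, as a varchar column would.

    Characters the code page cannot represent come back as '?',
    which is the symptom described above.
    """
    return text.encode(codec, errors="replace").decode(codec)


def store_in_nvarchar(text: str) -> str:
    """nvarchar stores Unicode, so the text is returned unchanged."""
    return text


# U+00A0 (non-breaking space) stands in for the "blank" special character.
news = "Breaking\u00a0news"
print(store_in_varchar(news))           # Breaking?news
print(store_in_nvarchar(news) == news)  # True
```

The same text survives the nvarchar path intact, which matches the fix of switching the column to nvarchar(max).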
However, in later testing I found that once an imported item was modified through the .NET program's editing feature, the information in the database was garbled again. After some research, it turned out the problem does not occur if the insert statement is written like this: insert into TableName (news) values (N'" + updatedValue + "'). Why add the N? The N prefix marks the string literal as Unicode (nvarchar). Without it, SQL Server treats the literal as varchar and converts it to the database's code page, which again turns characters outside that code page into '?'.
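The N prefix is easy to drop when a statement is built by string concatenation. The Python sketch below only shows what a correctly quoted T-SQL Unicode literal looks like; `nvarchar_literal` and `build_insert` are hypothetical helpers, and the table and column names are placeholders. In the actual C# code, passing the value through a `SqlParameter` of type `SqlDbType.NVarChar` is the more robust fix, since it avoids both the missing-N problem and SQL injection.

```python
def nvarchar_literal(value: str) -> str:
    """Return a T-SQL Unicode string literal, N'...', with quotes escaped.

    Hypothetical helper for illustration; a parameterized query is safer.
    """
    escaped = value.replace("'", "''")  # T-SQL escapes a quote by doubling it
    return "N'" + escaped + "'"


def build_insert(table: str, column: str, value: str) -> str:
    # table and column are assumed to be trusted identifiers, never user input
    return f"insert into {table} ({column}) values ({nvarchar_literal(value)})"


print(build_insert("News", "news", "Tom's news"))
# insert into News (news) values (N'Tom''s news')
```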
At this point I finally felt relieved, but the next problem left me frustrated...
The reviewed information must be generated in XML format encoded in GB2312. Because many of the collected news websites use UTF-8, garbled characters appeared during the conversion (again caused by the "blank" special character). What to do? People online said that converting UTF-8 to GB2312 is enough, but in practice that did not solve the problem. I worked on it all morning and still found no way out.
While frustrated, I suddenly thought of using the Visual Studio debugger to see what this special character actually was. I read the value of this field from the database, converted it into a character array with content.ToCharArray(), and looked at the characters one by one. The character causing the garbled output was ' '. Pay attention to the space inside the quotation marks: it is not an ordinary space but a special character that GB2312 cannot represent. Then it occurred to me: could I simply replace this character with an ordinary space? I tried it immediately, and sure enough, the garbled output was gone. I wasted half a day on such a silly thing.
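Encoding to GB2312 cannot fix a character that GB2312 has no code for: the encoder either raises an error or substitutes a replacement such as '?', which is why a plain UTF-8 to GB2312 conversion did not help. The character-by-character inspection done in the debugger can be automated. The Python sketch below lists every character in a string that GB2312 cannot encode, with its index and code point, so invisible characters become visible. `find_unencodable` is a hypothetical helper, and the U+00A0 in the sample is an assumption about what the "blank" character was.

```python
from typing import List, Tuple


def find_unencodable(text: str, codec: str = "gb2312") -> List[Tuple[int, str]]:
    """Return (index, "U+XXXX") for each character the codec cannot encode.

    Similar in spirit to stepping through content.ToCharArray() in the
    debugger, but it reports code points, so look-alike blanks stand out.
    """
    problems = []
    for index, ch in enumerate(text):
        try:
            ch.encode(codec)  # strict mode: raises if there is no mapping
        except UnicodeEncodeError:
            problems.append((index, f"U+{ord(ch):04X}"))
    return problems


sample = "标题\u00a0正文"  # U+00A0 is an assumed example of the "blank"
print(find_unencodable(sample))  # [(2, 'U+00A0')]
```

Running this over a few stored news items shows whether U+00A0 is the only offender or whether other characters also need handling.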
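Note: you must use the value obtained while debugging, because that is the actual special character causing the garbled output; an ordinary space typed on the keyboard will not match it. While debugging, copy that character (or note its code point) and use it in the code, as follows:
// The first argument must be the special character found while debugging.
// Assumption: it is written here as \u00A0 (non-breaking space); use the
// code point you actually see in the debugger.
content = content.Replace("\u00A0", " ");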
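Putting the pieces together, here is a Python sketch of the whole conversion: decode the collected UTF-8 bytes, replace the special blank with an ordinary space (the same idea as the Replace call above), and serialize the result as GB2312-encoded XML with the standard `xml.etree.ElementTree` module. The element names `news` and `content` are hypothetical, and treating the blank as U+00A0 is again an assumption.

```python
import xml.etree.ElementTree as ET

# Assumption: the "blank" special character is U+00A0 (non-breaking space).
SPECIAL_BLANK = "\u00a0"


def sanitize_for_gb2312(text: str) -> str:
    """Replace the special blank with an ordinary space, like content.Replace."""
    return text.replace(SPECIAL_BLANK, " ")


def build_gb2312_xml(utf8_body: bytes) -> bytes:
    """Decode collected UTF-8 bytes, clean them, and serialize GB2312 XML."""
    text = sanitize_for_gb2312(utf8_body.decode("utf-8"))
    root = ET.Element("news")  # hypothetical element names
    ET.SubElement(root, "content").text = text
    # For a non-UTF-8 encoding, ElementTree also writes an XML declaration.
    return ET.tostring(root, encoding="gb2312")


raw = "审核通过\u00a0的新闻".encode("utf-8")  # sample collected text
xml_bytes = build_gb2312_xml(raw)
# xml_bytes begins with b"<?xml version='1.0' encoding='gb2312'?>"
```

If the replacement step is skipped, ElementTree does not emit '?': it writes U+00A0 as the character reference `&#160;`, which an XML parser reads back as the original character. That is an alternative when the non-breaking space should be preserved. Whether the .NET XML writer used in the project behaves the same way is not covered by the original post, so test it before relying on it.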
The above is the detailed content of [XML] Solution to garbled characters in UTF8 and GB2312 encoding conversion. For more information, please follow other related articles on the PHP Chinese website!

