[XML] Solution to garbled characters in UTF8 and GB2312 encoding conversion

Y2J

Apr 22, 2017 pm 01:53 PM


The reviewed information must be exported as an XML file, and the XML must be encoded in GB2312. Because many of the news websites being collected use UTF-8, garbled characters appeared during the conversion.

I recently worked on a small project and ran into the problems described below, so I am recording them here as a summary.
The project has two parts: collecting news data, and reviewing the collected information, after which an XML file is generated.
After the user has edited the collected data, it is exported as an Access file and then imported into the review system. The field that stores the news text in the Access database is of type ntext, while the corresponding field in the review system's database was of type varchar(max). After the import, some characters that looked like blanks came out garbled and were displayed as question marks (?). Later testing showed that these were not ordinary space characters but a special character. After several tests, I found that changing the column from varchar(max) to nvarchar(max) solved it: the imported data no longer had the problem. This is presumably because a varchar column stores text in the database's code page, while an nvarchar column stores Unicode.
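The question marks can be reproduced outside the database. The following is a minimal Python sketch (not the article's .NET code) of what happens when text is forced through a legacy code page. The function name `store_in_legacy_column`, the choice of `gbk` as a stand-in for the database code page, and the use of U+00A0 (no-break space) as the special character are all assumptions made for illustration; the article does not name the character or the code page.

```python
def store_in_legacy_column(text: str, codepage: str = "gbk") -> str:
    """Simulate a lossy round trip through a non-Unicode code page.

    Hypothetical helper: characters the code page cannot represent
    are replaced with '?', much like the symptom described above.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


# Assumption: U+00A0 (no-break space) stands in for the "blank" special character.
original = "news\u00a0title"

print(store_in_legacy_column(original))              # the special character becomes '?'
print(store_in_legacy_column("新闻标题") == "新闻标题")  # ordinary Chinese text survives
```

A Unicode column type avoids the lossy step entirely, which matches the observation that switching to nvarchar(max) made the question marks disappear.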
However, later testing showed that after an imported record was modified (through the edit function of the .NET program), the data in the database was garbled again. After some research, I found that the problem does not occur if the insert statement is written like this: insert into table name (news) values (N'" + updated value + "'). Why add the N? The N prefix marks the string literal as Unicode (nvarchar); without it, the literal is first converted to the database's code page, and any character outside that code page is lost.
At this point I thought I was finally in the clear, but the next problem was even more frustrating...
The reviewed information must be generated as XML. Because many of the collected news websites use UTF-8, garbled characters appeared during the conversion (again caused by that "blank" special character). Advice on the Internet says that converting from UTF-8 to GB2312 is enough, but in practice that did not solve the problem. I spent a whole morning on it without success. Then it occurred to me to use the Visual Studio debugger to see what this special character actually was. I read the field's value from the database, converted it to a character array with content.ToCharArray(), and inspected the elements one by one. The character causing the garbling looked like ' '. Note that what appears between the quotation marks is not an ordinary space but a special character that GB2312 cannot represent. It then struck me: could I simply replace this character with an ordinary space? I tried it immediately and, sure enough, the garbling was gone. I had wasted half a day on this.
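The character-by-character inspection described above can also be automated. Below is a minimal Python sketch (an analogue of stepping through content.ToCharArray() in the debugger, not the author's code) that lists every character GB2312 cannot represent, along with its code point. The function name `find_unencodable` and the sample string containing U+00A0 are assumptions for illustration.

```python
import unicodedata


def find_unencodable(text: str, encoding: str = "gb2312"):
    """Return (index, code point, Unicode name) for each character
    that the target encoding cannot represent."""
    problems = []
    for index, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            name = unicodedata.name(ch, "UNKNOWN")
            problems.append((index, f"U+{ord(ch):04X}", name))
    return problems


# Assumption: the sample uses U+00A0 (no-break space) as the invisible character.
sample = "新闻\u00a0标题"
for index, code_point, name in find_unencodable(sample):
    print(index, code_point, name)
```

Printing the code point rather than the character itself avoids the trap described above, where the offending character is visually indistinguishable from a space.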
Note: the character to be replaced must be the value obtained from debugging, because that is the actual special character causing the garbling. While debugging, copy it and paste it into the code as follows:

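// The first argument is the special character pasted from the debugger (it only looks like a space);
// the second argument is an ordinary space.
content = content.Replace(" ", " ");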
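Because the pasted character is invisible, writing it as an escape sequence makes the intent easier to read. The following Python sketch mirrors the replacement step and then encodes the result as GB2312. It assumes the special character is U+00A0 (no-break space), which the article does not confirm; `SPECIAL_CHAR` and `clean_for_gb2312` are hypothetical names, and the value found in your own debugger should be substituted.

```python
# Assumption: the special character is U+00A0; replace this with the
# code point actually observed while debugging.
SPECIAL_CHAR = "\u00a0"


def clean_for_gb2312(content: str) -> str:
    """Replace the special character with an ordinary space."""
    return content.replace(SPECIAL_CHAR, " ")


content = clean_for_gb2312("新闻\u00a0标题")
xml_text = '<?xml version="1.0" encoding="GB2312"?><news>' + content + "</news>"

# Strict encoding: raises UnicodeEncodeError if any character still
# cannot be represented in GB2312, instead of silently writing '?'.
xml_bytes = xml_text.encode("gb2312")
print(xml_bytes.decode("gb2312"))
```

Encoding strictly is a deliberate choice here: a failure at this step points directly at any other unsupported character, rather than letting question marks reach the generated file.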
