How to Handle the BOM When Reading UTF-8 Files in Java?
Reading UTF-8 with BOM Marker: Understanding the Unexpected BOM Output
When a file is encoded in UTF-8 with a byte-order mark (BOM), the BOM can show up at the start of the string you read from it. The BOM is stored as the three bytes 0xEF, 0xBB, 0xBF at the beginning of the file, and Java's UTF-8 decoder does not discard them: it decodes them to the character U+FEFF, which then becomes the first character of the text.
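To make this concrete, here is a minimal sketch that decodes a small in-memory byte array instead of a real file. The class name BomDecodeDemo and the sample bytes are illustrative assumptions, not part of the original code.

```java
import java.nio.charset.StandardCharsets;

final class BomDecodeDemo {
    // Hypothetical sample: the bytes of a UTF-8 file that starts with a BOM, followed by "hi".
    static final byte[] SAMPLE = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};

    // Decodes the bytes as UTF-8. The decoder keeps the BOM as the character U+FEFF.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = decode(SAMPLE);
        System.out.println(text.length());              // 3: U+FEFF, 'h', 'i'
        System.out.println(text.charAt(0) == '\uFEFF'); // true
    }
}
```

The decoded string is one character longer than the visible text, which is why the BOM tends to surface as a stray character at the start of the first line.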
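The code in question reads the file through a FileReader wrapped in a BufferedReader. That works as long as the reader really decodes the file as UTF-8; a FileReader created without an explicit charset uses the platform default. The problematic line is the one that follows: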
text = new String(tmp.getBytes(), "UTF-8");
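This line encodes the already decoded string tmp back into bytes and then decodes those bytes as UTF-8. Because getBytes() is called without an argument, it encodes with the platform default charset, so the round trip is a no-op when that default is UTF-8 and can corrupt non-ASCII characters when it is not. Either way, the round trip does not strip the BOM: when everything is UTF-8, tmp starts with U+FEFF and so does text.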
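The effect of such an encode-then-decode round trip can be checked with a short sketch. The class RoundTripDemo and its roundTrip method are hypothetical names introduced for illustration; the charset is passed explicitly so the behavior does not depend on the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RoundTripDemo {
    // Encodes the string with the given charset, then decodes the bytes as UTF-8,
    // mirroring new String(tmp.getBytes(), "UTF-8") with the default charset made explicit.
    static String roundTrip(String tmp, Charset encodeWith) {
        return new String(tmp.getBytes(encodeWith), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A decoded string that starts with the BOM character and contains an accented e.
        String tmp = "\uFEFFcaf\u00E9";

        String same = roundTrip(tmp, StandardCharsets.UTF_8);
        String garbled = roundTrip(tmp, StandardCharsets.ISO_8859_1);

        System.out.println(same.equals(tmp));           // true: nothing changed
        System.out.println(same.charAt(0) == '\uFEFF'); // true: the BOM is still there
        System.out.println(garbled.equals(tmp));        // false: characters were damaged
    }
}
```

With UTF-8 on both sides the string comes back unchanged, BOM included. With a mismatched charset the text is altered, but not in a way that cleanly removes the BOM.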
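To strip the BOM from the output string, a slight adjustment to the code is necessary:
byte[] bytes = tmp.getBytes("UTF-8");
if (isUTF8WithBOM(bytes)) {
    // Skip the three BOM bytes and decode the rest as UTF-8.
    text = new String(bytes, 3, bytes.length - 3, "UTF-8");
} else {
    text = new String(bytes, "UTF-8");
}
The isUTF8WithBOM method checks whether the byte array begins with the UTF-8 BOM sequence (0xEF, 0xBB, 0xBF). If it does, the three BOM bytes are skipped by decoding from index 3, that is, from the fourth byte onward, so the resulting string no longer contains the BOM. Otherwise the whole array is decoded as UTF-8. Both branches name the charset explicitly so the result does not depend on the platform default. This assumes tmp was decoded correctly as UTF-8 in the first place, so that its first character is U+FEFF.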
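The body of isUTF8WithBOM is not shown in the original code, so the following is only one plausible sketch of it. The class name BomUtil and the two extra helpers, decodeWithoutBom and stripBom, are assumptions added for illustration.

```java
import java.nio.charset.StandardCharsets;

final class BomUtil {
    // Hypothetical body for the isUTF8WithBOM helper named in the text:
    // true when the array starts with the UTF-8 BOM bytes EF BB BF.
    static boolean isUTF8WithBOM(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    // Decodes UTF-8 bytes, skipping the three BOM bytes when they are present.
    static String decodeWithoutBom(byte[] bytes) {
        int offset = isUTF8WithBOM(bytes) ? 3 : 0;
        return new String(bytes, offset, bytes.length - offset, StandardCharsets.UTF_8);
    }

    // Alternative for an already decoded string: drop a leading U+FEFF character.
    static String stripBom(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }
}
```

When the text is already available as a correctly decoded String, stripBom avoids the detour through a byte array altogether; the byte-level check is mainly useful when you hold the raw file contents.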