Table of Contents
Question content
Workaround
Options, ideal first:
Home Java How to parse invalid (error/malformed) XML?

How to parse invalid (error/malformed) XML?

Feb 09, 2024 pm 11:20 PM
overflow

php editor Baicao introduces you how to parse invalid XML files. When processing XML files, you sometimes encounter invalid XML, perhaps because it is not well-formed or contains errors. Parsing invalid XML files is an important task to ensure that we get the required data correctly. To solve this problem, we can use PHP’s built-in functions and libraries to check and fix invalid XML. Below we will introduce in detail several commonly used methods to parse invalid XML files.

Question content

Currently, I'm working on a feature that involves parsing xml that we receive from other products. I decided to run some tests against some actual customer data and it looks like other products allow users to enter input that should be considered invalid. Anyway, I still have to try and figure out a way to parse it. We are using javax.xml.parsers.documentbuilder and I am getting the following error while typing.

<xml>
  ...
  <description>Example:Description:<THIS-IS-PART-OF-DESCRIPTION></description>
  ...
</xml>
Copy after login

As you may know, the description appears to contain an invalid tag (<this-is-part-of-description>). Now, this description tag is considered a leaf tag and should not have any nested tags inside. Regardless, this is still a problem and produces an exception on documentbuilder.parse(...)

I know this is invalid xml, but it is predictably invalid. Any ideas on ways to parse such input?

Workaround

"xml" is worse than invalid - it is not well-formed; see Well-formed and valid xml.

Informal assessments of the predictability of violations are not helpful. The text data is not xml. There is no consistent xml tool or library that can help you deal with it.

Options, ideal first:

  1. Let the provider resolve the issue themselves. Requires well-formed xml. (Technically, the term well-formed xml is redundant, but may help with emphasis.)

  2. Use tolerant tag parserFix issues before parsing to xml:

  3. 使用文本编辑器手动将数据处理为文本或 以编程方式使用字符/字符串函数。这样做 以编程方式可以从棘手到不可能作为 看起来是什么 可预测的往往不是——打破规则很少受到规则的约束

    • 对于无效字符错误,请使用正则表达式删除/替换无效字符:

      • php: preg_replace('/[^\x{0009}\x{000a}\x{000d} \x{0020}-\x{d7ff}\x{e000}-\x{fffd}]+/u', ' ', $s);
      • ruby: string.tr ("^\u{0009}\u{000a}\u{000d}\u{0020}-\u{d7ff}\u{e000‌​}-\u{fffd}", ' ')
      • javascript: inputstr.replace (/[^\x09\x0a\x0d\x20-\xff\x85\xa0-\ud7ff\ue000-\ufdcf\ufde0-\ufffd]/gm, '')
    • 对于与号,使用正则表达式将匹配项替换为 &amp;: 信用:blhsin演示

      &amp;(?!(?:#\d+|#x[0-9a-f]+|\w+);)
      Copy after login

      请注意,上述正则表达式不会接受注释或 cdata

      按照设计,标准 xml 解析器永远不会接受无效的 xml。

      您唯一的选择是在解析输入之前预处理输入以删除“可预见的无效”内容,或将其包装在 cdata 中。

      The above is the detailed content of How to parse invalid (error/malformed) XML?. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Nordhold: Fusion System, Explained
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1670
14
PHP Tutorial
1276
29
C# Tutorial
1256
24
Is H5 page production a front-end development? Is H5 page production a front-end development? Apr 05, 2025 pm 11:42 PM

Yes, H5 page production is an important implementation method for front-end development, involving core technologies such as HTML, CSS and JavaScript. Developers build dynamic and powerful H5 pages by cleverly combining these technologies, such as using the &lt;canvas&gt; tag to draw graphics or using JavaScript to control interaction behavior.

Why are the inline-block elements misaligned? How to solve this problem? Why are the inline-block elements misaligned? How to solve this problem? Apr 04, 2025 pm 10:39 PM

Regarding the reasons and solutions for misaligned display of inline-block elements. When writing web page layout, we often encounter some seemingly strange display problems. Compare...

How to use the clip-path attribute of CSS to achieve the 45-degree curve effect of segmenter? How to use the clip-path attribute of CSS to achieve the 45-degree curve effect of segmenter? Apr 04, 2025 pm 11:45 PM

How to achieve the 45-degree curve effect of segmenter? In the process of implementing the segmenter, how to make the right border turn into a 45-degree curve when clicking the left button, and the point...

How to control the top and end of pages in browser printing settings through JavaScript or CSS? How to control the top and end of pages in browser printing settings through JavaScript or CSS? Apr 05, 2025 pm 10:39 PM

How to use JavaScript or CSS to control the top and end of the page in the browser's printing settings. In the browser's printing settings, there is an option to control whether the display is...

How to customize the resize symbol through CSS and make it uniform with the background color? How to customize the resize symbol through CSS and make it uniform with the background color? Apr 05, 2025 pm 02:30 PM

The method of customizing resize symbols in CSS is unified with background colors. In daily development, we often encounter situations where we need to customize user interface details, such as adjusting...

How to compatible with multi-line overflow omission on mobile terminal? How to compatible with multi-line overflow omission on mobile terminal? Apr 05, 2025 pm 10:36 PM

Compatibility issues of multi-row overflow on mobile terminal omitted on different devices When developing mobile applications using Vue 2.0, you often encounter the need to overflow text...

The latest price of Bitcoin in 2018-2024 USD The latest price of Bitcoin in 2018-2024 USD Feb 15, 2025 pm 07:12 PM

Real-time Bitcoin USD Price Factors that affect Bitcoin price Indicators for predicting future Bitcoin prices Here are some key information about the price of Bitcoin in 2018-2024:

How to achieve segmentation effect with 45 degree curve border? How to achieve segmentation effect with 45 degree curve border? Apr 04, 2025 pm 11:48 PM

Tips for Implementing Segmenter Effects In user interface design, segmenter is a common navigation element, especially in mobile applications and responsive web pages. ...