Configuring the Xerces SAX parser to tolerate XML syntax errors

Posted 2024-09-10 22:06:22

I am getting this error when parsing an incorrectly-generated XML document:

org.xml.sax.SAXParseException: The value of attribute "bar" associated with an element type "foo" must not contain the '<' character.

I know what is causing the problem. It is this line:

<foo bar="x<y">42</foo>

It should have been

<foo bar="x&lt;y">42</foo>

I am aware that this is not well-formed XML, but my code has to download and parse similar files unattended, and for political reasons it might not be possible to persuade the supplier to fix the faulty program, especially when other programs are reading the file and tolerating this error.

Is there any way to configure Xerces to tolerate it? At present it treats it as a fatal error. Implementing an ErrorHandler to ignore it is not satisfactory because then the remainder of the document is not parsed.

Alternatively, can you suggest another stream-based parser that can be configured to tolerate this error? Using a DOM parser is not feasible as these documents run into hundreds of megabytes.
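A minimal sketch of the ErrorHandler approach ruled out above, using the JDK's bundled Xerces through JAXP. The handler's fatalError records the exception instead of rethrowing it (which DefaultHandler would do). As we understand Xerces' default behaviour, parse() still throws after the fatal error has been reported, so elements after the bad attribute are never delivered. The class and method names and the trailing <baz/> element are hypothetical, for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class IgnoringHandlerDemo {
    /** Hypothetical handler: records element names and swallows fatal errors. */
    static class LenientHandler extends DefaultHandler {
        final List<String> elements = new ArrayList<>();
        final List<SAXParseException> fatalErrors = new ArrayList<>();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            elements.add(qName);
        }

        @Override
        public void fatalError(SAXParseException e) {
            fatalErrors.add(e); // DefaultHandler would throw here; this handler does not
        }
    }

    /** Parses xml with the given handler; returns true if parse() still threw. */
    static boolean parseThrew(String xml, LenientHandler h) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), h);
            return false;
        } catch (SAXParseException e) {
            return true; // the parser aborts even though our handler did not throw
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        LenientHandler h = new LenientHandler();
        boolean threw = parseThrew("<root><foo bar=\"x<y\">42</foo><baz/></root>", h);
        System.out.println("threw=" + threw + " elements=" + h.elements
            + " fatalErrors=" + h.fatalErrors.size());
    }
}
```

On a standard JDK we would expect the element list to stop at root: foo is never reported (its attributes are scanned before startElement fires), and neither is anything after it.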


Comments (2)

独留℉清风醉 2024-09-17 22:06:22

... and for political reasons it might not be possible to persuade the supplier to fix the faulty program ...

For political reasons you ought to try your damnedest to get them to fix it. Wave the requirements specification in front of them that says that the input must be well-formed XML. Threaten to bill them for the cost of developing a bespoke parser. (OK, that probably won't work ...)

By giving up without a fight, you are just leaving the problem to trouble other people who have to deal with this supplier in the future.

吐个泡泡 2024-09-17 22:06:22

I don't think you will find any XML parsers that will tolerate this sort of error. The only thing I can suggest is that you pre-process the XML to remove errors that might occur.
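One way to do that pre-processing without giving up streaming is to wrap the input in a Reader that rewrites the offending character before Xerces sees it. The sketch below is a hypothetical AttrLtEscapingReader (not a library class): a three-state machine (text, inside a tag, inside a quoted attribute value) that replaces a raw '<' inside an attribute value with "&lt;". It assumes the input contains no comments, CDATA sections or DOCTYPE declarations, where a stray quote character would mislead it; treat it as a starting point rather than a general XML repair tool.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical streaming filter: escapes a raw '<' inside quoted attribute values. */
public class AttrLtEscapingReader extends Reader {
    private enum State { TEXT, TAG, ATTR }

    private final Reader in;
    private State state = State.TEXT;
    private char quote;           // quote character that opened the current attribute value
    private String pending = "";  // remaining characters of an "&lt;" replacement
    private int pendingPos = 0;

    public AttrLtEscapingReader(Reader in) { this.in = in; }

    /** Returns the next filtered character, or -1 at end of input. */
    private int next() throws IOException {
        if (pendingPos < pending.length()) return pending.charAt(pendingPos++);
        int c = in.read();
        if (c == -1) return -1;
        switch (state) {
            case TEXT:
                if (c == '<') state = State.TAG;               // start of a tag
                break;
            case TAG:
                if (c == '"' || c == '\'') { quote = (char) c; state = State.ATTR; }
                else if (c == '>') state = State.TEXT;         // end of the tag
                break;
            case ATTR:
                if (c == quote) state = State.TAG;             // closing quote
                else if (c == '<') {                           // the illegal character
                    pending = "lt;";
                    pendingPos = 0;
                    return '&';
                }
                break;
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int n = 0;
        int c;
        while (n < len && (c = next()) != -1) buf[off + n++] = (char) c;
        return n == 0 ? -1 : n;
    }

    @Override
    public void close() throws IOException { in.close(); }

    /** Convenience for small inputs: filters a whole string at once. */
    public static String escapeAll(String xml) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new AttrLtEscapingReader(new StringReader(xml))) {
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String bad = "<root><foo bar=\"x<y\">42</foo></root>";
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals("foo")) System.out.println("bar = " + atts.getValue("bar"));
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new AttrLtEscapingReader(new StringReader(bad))), handler);
    }
}
```

The filter holds at most three pending characters, so memory use stays constant regardless of document size. For a downloaded file you would wrap the stream in a BufferedReader (with the charset you expect, e.g. UTF-8, since a character stream bypasses the encoding declaration) so the per-character reads stay cheap.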
