Configuring the Xerces SAX parser to tolerate XML syntax errors
I am getting this error when parsing an incorrectly-generated XML document:
org.xml.sax.SAXParseException: The value of attribute "bar" associated with an element type "foo" must not contain the '<' character.
I know what is causing the problem. It is this line:
<foo bar="x<y">42</foo>
It should have been
<foo bar="x&lt;y">42</foo>
I am aware that this is not well-formed XML, but my code has to download and parse similar files unattended, and for political reasons it might not be possible to persuade the supplier to fix the faulty program, especially when other programs are reading the file and tolerating this error.
Is there any way to configure Xerces to tolerate it? At present it treats it as a fatal error. Implementing an ErrorHandler to ignore it is not satisfactory because then the remainder of the document is not parsed.
Alternatively, can you suggest another stream-based parser that can be configured to tolerate this error? Using a DOM parser is not feasible, as these documents run into hundreds of megabytes.
Comments (2)
For political reasons you ought to try your damnedest to get them to fix it. Wave the requirements specification in front of them that says that the input must be well-formed XML. Threaten to bill them for the cost of developing a bespoke parser. (OK, that probably won't work ...)
By giving up without a fight, you are just leaving the problem to trouble other people who have to deal with this supplier in the future.
I don't think you will find any XML parsers that will tolerate this sort of error. The only thing I can suggest is that you pre-process the XML to remove errors that might occur.
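To make the pre-processing idea concrete, here is a rough sketch of a streaming filter that could sit between the file and the SAX parser, so the document never has to be held in memory. The class name LtEscapingReader and its escape helper are made up for this illustration; they are not part of Xerces or the JDK. It assumes the only defect is a bare '<' inside a quoted attribute value, and that the input has no comments, CDATA sections or DOCTYPE internals containing quote characters, since this simple state machine would misread those.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: rewrites '<' inside attribute values as "&lt;" on the fly.
final class LtEscapingReader extends Reader {
    private enum State { TEXT, TAG, ATTR_VALUE }

    private final Reader in;
    private State state = State.TEXT;
    private char quote;          // quote character that opened the current attribute value
    private String pending = ""; // rest of a replacement not yet handed to the caller
    private int pendingPos = 0;

    LtEscapingReader(Reader in) {
        this.in = in;
    }

    // Returns the next output character, or -1 at end of input.
    private int next() throws IOException {
        if (pendingPos < pending.length()) {
            return pending.charAt(pendingPos++);
        }
        int c = in.read();
        if (c == -1) {
            return -1;
        }
        switch (state) {
            case TEXT:
                if (c == '<') state = State.TAG;
                break;
            case TAG:
                if (c == '"' || c == '\'') {
                    quote = (char) c;
                    state = State.ATTR_VALUE;
                } else if (c == '>') {
                    state = State.TEXT;
                }
                break;
            case ATTR_VALUE:
                if (c == quote) {
                    state = State.TAG;
                } else if (c == '<') {
                    pending = "lt;"; // emit '&' now, "lt;" on the following calls
                    pendingPos = 0;
                    return '&';
                }
                break;
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = 0;
        while (n < len) {
            int c = next();
            if (c == -1) break;
            buf[off + n++] = (char) c;
        }
        return (n == 0 && len > 0) ? -1 : n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    // Convenience for small inputs and tests: run a whole string through the filter.
    static String escape(String xml) throws IOException {
        StringBuilder out = new StringBuilder();
        try (Reader r = new LtEscapingReader(new StringReader(xml))) {
            char[] buf = new char[64];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String broken = "<foo bar=\"x<y\">42</foo>";
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new LtEscapingReader(new StringReader(broken))),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    System.out.println(qName + " bar=" + atts.getValue("bar"));
                }
            });
    }
}
```

For the real files you would wrap a BufferedReader over the download instead of a StringReader. Note that handing the parser a Reader means you have to choose the character encoding yourself, because the parser no longer sees the raw bytes to detect it from the XML declaration.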