How do I parse XML 1.1-compliant XML using Java and Xerces?

Posted on 2025-01-06 09:41:18

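I'm trying to parse a String containing XML that conforms to the XML 1.1 spec. The XML contains character references that are not allowed in XML 1.0 but are allowed in XML 1.1 (references that resolve to Unicode characters in the range U+0001 to U+001F).

According to the Xerces2 website, the Xerces2 parser supports parsing XML 1.1 documents. However, I cannot figure out how to tell it that the XML we are trying to parse is XML 1.1.

I'm using a DocumentBuilder to parse the XML (something like this):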

public Element parseString(String xmlString) {
    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder documentBuilder = dbf.newDocumentBuilder();

        InputSource source = new InputSource(new StringReader(xmlString));

        // Throws org.xml.sax.SAXParseException because of the invalid character refs
        Document doc = documentBuilder.parse(source);

        return doc.getDocumentElement();

    } catch (ParserConfigurationException pce) {
        // Handle the error
    } catch (SAXException se) {
        // Handle the error
    } catch (IOException ioe) {
        // Handle the error
    }
    // Reached only when a handler above swallows the exception;
    // without this return the method does not compile.
    return null;
}

I've tried setting the XML declaration to indicate that the XML conforms to the 1.1 spec...

xmlString = "<?xml version=\"1.1\" encoding=\"UTF-8\" ?>" + xmlString;

...but it is still parsed as XML 1.0 (it still throws the invalid character reference exceptions).

How can I configure the Xerces parser to parse the XML as XML 1.1? Is there an alternative parser that provides better support for XML 1.1?

无法回应 2025-01-13 09:41:18

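See here for a list of all the features Xerces supports. You may have to turn on the following 2 features.

http://xml.org/sax/features/unicode-normalization-checking

True: Perform Unicode normalization checking (as described in section 2.13 and Appendix B of the XML 1.1 Recommendation) and report normalization errors.

False: Do not report Unicode normalization errors.

http://xml.org/sax/features/xml-1.1

True: The parser supports both XML 1.0 and XML 1.1.
False: The parser supports XML 1.0 only.
Access: read-only
Since: Xerces-J 2.7.0
Note: The value of this feature depends on whether the parser configuration owned by the SAX parser supports XML 1.1.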
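
As a rough illustration of the listing above, here is a minimal sketch that queries the xml-1.1 feature instead of setting it (the listing marks it read-only) and then tries the same control-character reference under both version declarations. The class name Xml11Sketch and the helpers describeFeature and parseOrNull are hypothetical names made up for this example, and it assumes the JAXP implementation on the classpath is Xerces or a Xerces-derived parser such as the one bundled with the JDK. Other parsers may not recognize the feature URI at all, which the sketch reports as "unknown".

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper class, not part of Xerces or JAXP.
class Xml11Sketch {
    static final String XML_11_FEATURE = "http://xml.org/sax/features/xml-1.1";

    /** Returns "true" or "false" for a SAX feature, or "unknown" if the parser rejects the query. */
    static String describeFeature(String featureUri) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return String.valueOf(reader.getFeature(featureUri));
        } catch (ParserConfigurationException | SAXException e) {
            // SAXNotRecognizedException and SAXNotSupportedException are SAXExceptions.
            return "unknown";
        }
    }

    /** Parses the string and returns the root element, or null if the parser rejects it. */
    static Element parseOrNull(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // &#x1; resolves to U+0001, which XML 1.0 forbids and XML 1.1 allows as a reference.
        String body = "<root>&#x1;</root>";
        System.out.println("xml-1.1 feature: " + describeFeature(XML_11_FEATURE));
        System.out.println("1.1 accepted: " + (parseOrNull("<?xml version=\"1.1\"?>" + body) != null));
        System.out.println("1.0 accepted: " + (parseOrNull("<?xml version=\"1.0\"?>" + body) != null));
    }
}
```

If the 1.1 document is accepted here but the real input still fails, it may be worth checking whether the string already starts with its own XML declaration before another one is prepended, since only one declaration is allowed and it must come first.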


夜清冷一曲。 2025-01-13 09:41:18

Not sure how to do this with Xerces, but Woodstox supports XML 1.1 out of the box. Although it is primarily a StAX parser, it also implements the SAX API (as of version 3.2).
