Java XML unmarshalling fails on ampersand (&) using JAXB

Posted on 2024-09-05 12:42:26

I have the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<details>
  ...
  <address1>Test&amp;Address</address1>
  ...
</details>

When I try to unmarshal it using JAXB, it throws the following exception:

Caused by: org.xml.sax.SAXParseException: The reference to entity "Address" must end with the ';' delimiter.
        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
        at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:194)

But when I change the &amp; in the XML to &apos;, it works. It looks like the problem occurs only with the ampersand (&amp;), and I cannot understand why.

The code to unmarshal is:

JAXBContext context = JAXBContext.newInstance("some.package.name", this.getClass().getClassLoader());
Unmarshaller unmarshaller = context.createUnmarshaller();
obj = unmarshaller.unmarshal(new StringReader(xml));

Does anyone have any insight?

EDIT: I tried the solution suggested by @abhin4v below (i.e., adding a space after &amp;), but that doesn't work either. Here's the stack trace:

Caused by: org.xml.sax.SAXParseException: The entity name must immediately follow the '&' in the entity reference.
        at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
        at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
        at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:194)

Comments (4)

你穿错了嫁妆 2024-09-12 12:42:26

I've run into this too. On the first pass I simply replaced the & with a token string (AMPERSAND_TOKEN), sent the XML through JAXB, and then swapped the ampersand back in. Not ideal, but it was a quick fix.

On the second pass I made a lot of significant changes, so I'm not sure what exactly solved the problem. I suspect that giving JAXB access to the HTML DTDs made it much happier, but that's only a guess and could be specific to my project.
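A rough sketch of that token workaround, using only the JDK's DOM parser in place of JAXB so that it runs on its own. The class name, the method name, and the exact token value are made up for illustration. Note that masking every & also masks valid references such as &lt;, which then come back as literal text, so this only suits input that has no other entities.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AmpersandTokenDemo {
    // Hypothetical placeholder; it must never occur in the real data.
    static final String AMPERSAND_TOKEN = "__AMPERSAND_TOKEN__";

    /** Masks every '&', parses the XML, then restores '&' in the root element's text. */
    static String rootTextWithToken(String xml) {
        try {
            String masked = xml.replace("&", AMPERSAND_TOKEN);
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(masked)));
            String text = doc.getDocumentElement().getTextContent();
            return text.replace(AMPERSAND_TOKEN, "&");
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // The bare '&' would normally be a fatal parse error.
        System.out.println(rootTextWithToken("<address1>Test&Address</address1>"));
    }
}
```

With JAXB the same idea applies: mask the string before calling unmarshal, then restore the ampersand in the affected String fields of the resulting object.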

HTH

妄想挽回 2024-09-12 12:42:26

Xerces converts &amp; to & and then tries to resolve &Address, which fails because it does not end with ;. (I originally suggested putting a space between & and Address; I take that back.) Putting a space there will not work, because Xerces will then try to resolve the bare & and throw the second error given in the OP. Instead, you can wrap the text in a CDATA section, and Xerces will not try to resolve entities inside it.
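To illustrate the CDATA suggestion, here is a small sketch using the JDK's built-in DOM parser rather than JAXB (the class and method names are invented for the example). Inside a CDATA section the parser treats & as ordinary character data, so no entity lookup happens. The one restriction is that the wrapped text must not contain the sequence ]]>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataDemo {
    /** Parses the XML and returns the text content of the root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // The raw '&' is accepted because it sits inside <![CDATA[ ... ]]>.
        String xml = "<address1><![CDATA[Test&Address]]></address1>";
        System.out.println(rootText(xml));
    }
}
```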

北城挽邺 2024-09-12 12:42:26

It turns out that the problem is caused by the framework I'm using (the Mentawai framework). The XML in question comes from the POST body of an HTTP request.

Apparently, the framework decodes the character entities in the XML body, so &amp; becomes & and the unmarshaller fails to unmarshal the XML.
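If the raw, undecoded request body cannot be obtained from the framework, one stopgap is to re-escape any & that does not already start an entity or character reference before handing the string to the unmarshaller. The sketch below is a heuristic, not a real fix: the class name and regex are my own, and it cannot tell a decoded &amp;lt; apart from a genuine &lt;, so reading the body before it is decoded remains the better option.

```java
import java.util.regex.Pattern;

class BareAmpersandEscaper {
    // Matches '&' that does NOT begin a named (&name;), decimal (&#38;)
    // or hexadecimal (&#x26;) reference.
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    /** Rewrites every bare '&' as "&amp;" and leaves existing references alone. */
    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeBareAmpersands("<address1>Test&Address</address1>"));
        System.out.println(escapeBareAmpersands("<a>x &amp; y &#38; z & w</a>"));
    }
}
```

The escaped string can then be passed to unmarshaller.unmarshal(new StringReader(...)) as in the question.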

德意的啸 2024-09-12 12:42:26

I've found that adding amp; right after the & (that is, writing &amp;) fixes the unmarshalling error. You want it to look like this:

<address1>Test&amp;Address</address1>

I think this tells the unmarshaller that the ampersand should be read as a data value (text in this case) instead of the start of an entity reference. You can see from your errors that it's trying to treat "Address", which immediately follows the &, as an entity name.
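This is easy to check without JAXB. The sketch below (names are illustrative) feeds both variants to the JDK's built-in parser: the escaped form comes back as the text Test&Address, while the bare & is rejected with a SAXParseException like the one in the question. The parser's default error handler may also print a "[Fatal Error]" line to stderr for the second input.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class EscapedAmpersandDemo {
    /** Returns the root element's text, or null if the parser rejects the XML. */
    static String rootTextOrNull(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (SAXException e) {
            return null; // not well-formed, e.g. a bare '&'
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootTextOrNull("<address1>Test&amp;Address</address1>"));
        System.out.println(rootTextOrNull("<address1>Test&Address</address1>"));
    }
}
```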
