Meaning of JAXB error: Invalid byte 1 of 1-byte UTF-8 sequence

Posted 2024-09-05 10:33:31

We're parsing an XML document using JAXB and get this error:

[org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:315)

What exactly does this mean, and how can we resolve it?

We are executing the code as:

jaxbContext = JAXBContext.newInstance(Results.class);
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
unmarshaller.setSchema(getSchema());
results = (Results) unmarshaller.unmarshal(new FileInputStream(inputFile));

Update

The issue appears to be caused by this "funny" character in the XML file: ¿

Why would this cause such a problem?

Update 2

There are two of those weird characters in the file. They are around the middle of the file. Note that the file is created based on data in a database and those weird characters somehow got into the database.

Update 3

Here is the full XML snippet:

<Description><![CDATA[Mt. Belvieu ¿ Texas]]></Description>

Update 4

Note that there is no <?xml ...?> header.

The hex value of the special character is BF.

〆一缕阳光ご 2024-09-12 10:33:31

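So, your problem is that JAXB (through the underlying XML parser) treats XML files without a <?xml ...?> header as UTF-8, while your file uses some other encoding (probably ISO-8859-1 or Windows-1252, if the 0xBF byte is actually intended to mean ¿). In UTF-8, 0xBF can only appear as a continuation byte inside a multi-byte sequence, never on its own, hence "Invalid byte 1 of 1-byte UTF-8 sequence".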
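To make the diagnosis concrete, here is a minimal sketch that decodes the bytes of the snippet from the question under both encodings. It uses only java.nio.charset from the JDK; the class and method names (EncodingProbe, sampleBytes, strictDecode) are made up for illustration, and the sample bytes assume the file stores the odd character as a single 0xBF byte, as stated in Update 4.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodingProbe {
    // "Mt. Belvieu ? Texas" as assumed to be stored in the file:
    // plain ASCII plus a single 0xBF byte for the odd character.
    static byte[] sampleBytes() {
        byte[] prefix = "Mt. Belvieu ".getBytes(StandardCharsets.US_ASCII);
        byte[] suffix = " Texas".getBytes(StandardCharsets.US_ASCII);
        byte[] all = new byte[prefix.length + 1 + suffix.length];
        System.arraycopy(prefix, 0, all, 0, prefix.length);
        all[prefix.length] = (byte) 0xBF;
        System.arraycopy(suffix, 0, all, prefix.length + 1, suffix.length);
        return all;
    }

    // Decodes strictly: returns null instead of silently substituting
    // replacement characters when the bytes are not valid in the charset.
    static String strictDecode(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = sampleBytes();
        // A lone 0xBF is rejected by a strict UTF-8 decoder.
        System.out.println("valid UTF-8:      " + (strictDecode(bytes, StandardCharsets.UTF_8) != null));
        // ISO-8859-1 maps every byte to a character; 0xBF becomes U+00BF.
        System.out.println("valid ISO-8859-1: " + (strictDecode(bytes, StandardCharsets.ISO_8859_1) != null));
    }
}
```

Running the same check against the real file's bytes would be one way to confirm which encoding the producer actually used before changing any parsing code.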

If you can change the producer of the file, you can add a <?xml ...?> header that declares the actual encoding, or simply write the file as UTF-8.
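The effect of adding a header can be sketched without JAXB, using the DOM parser that ships with the JDK (JAXB hands the byte stream to an XML parser in a similar way). The class and method names below are hypothetical, and the choice of ISO-8859-1 is an assumption about what the producer writes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class DeclaredEncodingDemo {
    // The snippet from the question; \u00BF is the inverted question mark.
    static final String BODY =
            "<Description><![CDATA[Mt. Belvieu \u00BF Texas]]></Description>";

    // Parses raw bytes and lets the parser work out the encoding itself.
    // Returns the root element's text, or null if the parser rejects the input.
    static String parseBytes(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException | IOException e) {
            return null;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // ISO-8859-1 bytes with no header: the parser falls back to UTF-8.
    static byte[] withoutHeader() {
        return BODY.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Same bytes, preceded by a header that declares the real encoding.
    static byte[] withHeader() {
        return ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + BODY)
                .getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println("without header parsed: " + (parseBytes(withoutHeader()) != null));
        System.out.println("with header parsed:    " + (parseBytes(withHeader()) != null));
    }
}
```

The parser may also print its own fatal-error line to stderr for the header-less input; that is the default error handler, not part of the sketch.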

If you can't change the producer, you have to use an InputStreamReader with an explicit encoding, because (unfortunately) JAXB doesn't allow changing its default encoding:

results = (Results) unmarshaller.unmarshal(
   new InputStreamReader(new FileInputStream(inputFile), "ISO-8859-1"));

However, this solution is fragile: the parser reads characters straight from the Reader and disregards any encoding declared in a <?xml ...?> header, so it fails on input files whose header declares a different encoding.
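The fragility can be illustrated with a small sketch that forces ISO-8859-1 through a Reader, again using the JDK's DOM parser as a stand-in for the unmarshaller. All names here (ForcedEncodingDemo, parseAs, and the two sample builders) are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ForcedEncodingDemo {
    static final String TEXT = "Mt. Belvieu \u00BF Texas";
    static final String BODY = "<Description><![CDATA[" + TEXT + "]]></Description>";

    // Decodes the bytes with the given charset before the parser sees them,
    // mirroring unmarshal(new InputStreamReader(..., "ISO-8859-1")).
    // Returns the root element's text, or null if parsing fails.
    static String parseAs(byte[] xml, Charset forced) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(xml), forced)) {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader))
                    .getDocumentElement()
                    .getTextContent();
        } catch (SAXException | IOException e) {
            return null;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // The situation from the question: ISO-8859-1 bytes, no header.
    static byte[] latin1NoHeader() {
        return BODY.getBytes(StandardCharsets.ISO_8859_1);
    }

    // A well-behaved producer: UTF-8 bytes with a matching header.
    static byte[] utf8WithHeader() {
        return ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + BODY)
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset forced = StandardCharsets.ISO_8859_1;
        System.out.println("header-less file intact: "
                + TEXT.equals(parseAs(latin1NoHeader(), forced)));
        // The declared UTF-8 is ignored, so the two-byte character is misread.
        System.out.println("UTF-8 file intact:       "
                + TEXT.equals(parseAs(utf8WithHeader(), forced)));
    }
}
```

If both kinds of file can turn up, one option is to force the encoding only for files that have no header; that needs a peek at the first bytes and is left out of this sketch.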

生寂 2024-09-12 10:33:31

That's probably a Byte Order Mark (BOM), a special byte sequence at the start of a UTF file. They are, frankly, a pain in the arse, and seem particularly common when interacting with .NET systems.

Try rephrasing your code to use a Reader rather than an InputStream:

results = (Results) unmarshaller.unmarshal(new FileReader(inputFile));

A Reader decodes the bytes into characters itself (FileReader uses the platform default charset), and might make a better stab at it. More simply, pass the File directly to the Unmarshaller, and let JAXB worry about it:

results = (Results) unmarshaller.unmarshal(inputFile);
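Whether a BOM is involved can be checked by looking at the first few bytes of the file. The sketch below is a hypothetical helper (BomCheck and detectBom are not part of JAXB or the JDK) that recognises the UTF-8 and UTF-16 marks. Note that the question reports the 0xBF bytes in the middle of the file, so a BOM may well not be the cause there.

```java
final class BomCheck {
    // Returns the encoding implied by a leading BOM, or null if there is none.
    // FF FE is also the start of the UTF-32LE mark; that case is not
    // distinguished in this sketch.
    static String detectBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    // Compares unsigned byte values, since Java bytes are signed.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        byte[] withoutBom = {'<', 'a', '/', '>'};
        System.out.println(detectBom(withBom));
        System.out.println(detectBom(withoutBom));
    }
}
```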

千紇 2024-09-12 10:33:31

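It sounds as if your XML is encoded in UTF-16, but that encoding is not being passed on to the Unmarshaller. With a Marshaller you can set it using marshaller.setProperty(Marshaller.JAXB_ENCODING, "UTF-16"); but because an Unmarshaller is not required to support any properties, I'm not sure how to enforce this other than by making sure your XML document has encoding="UTF-16" in its initial <?xml ...?> declaration.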
