JAXB error: Invalid byte 1 of 1-byte UTF-8 sequence
We're parsing an XML document using JAXB and get this error:
[org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:315)
What exactly does this mean, and how can we resolve it?
We are executing the code as:
jaxbContext = JAXBContext.newInstance(Results.class);
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
unmarshaller.setSchema(getSchema());
results = (Results) unmarshaller.unmarshal(new FileInputStream(inputFile));
Update
The issue appears to be due to this "funny" character in the XML file: ¿
Why would this cause such a problem?
Update 2
There are two of those weird characters in the file. They are around the middle of the file. Note that the file is created based on data in a database and those weird characters somehow got into the database.
Update 3
Here is the full XML snippet:
<Description><![CDATA[Mt. Belvieu ¿ Texas]]></Description>
Update 4
Note that there is no <?xml ...?> header.
The hex value of the special character is BF.
Comments (3)
So, your problem is that JAXB treats XML files without an <?xml ...?> header as UTF-8, while your file uses some other encoding (probably ISO-8859-1 or Windows-1252, if the 0xBF character is actually intended to mean ¿).
If you can change the producer of the file, you can add an <?xml ...?> header that declares the actual encoding, or simply write the file in UTF-8.
If you can't change the producer, you have to wrap the stream in an InputStreamReader with an explicit encoding, because (unfortunately) JAXB doesn't let you change its default encoding. However, this solution is fragile - it fails on input files whose <?xml ...?> header specifies a different encoding.
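The effect described in this answer can be reproduced without JAXB. The following is a minimal sketch, assuming the JDK's built-in DOM parser behaves like the parser underneath JAXB for this input; the class name `ExplicitEncodingDemo`, its methods, and the in-memory sample are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ExplicitEncodingDemo {

    // Hypothetical sample: the snippet from the question encoded as ISO-8859-1,
    // so the inverted question mark becomes the single byte 0xBF. No <?xml ...?> header.
    public static byte[] sampleBytes() {
        String xml = "<Description><![CDATA[Mt. Belvieu \u00BF Texas]]></Description>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parses raw bytes. Without a header or BOM the parser assumes UTF-8,
    // where a lone 0xBF is not a valid sequence. Returns null if parsing fails.
    public static String parseFromBytes(byte[] data) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new ByteArrayInputStream(data)))
                    .getDocumentElement().getTextContent();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            System.err.println("Parsing raw bytes failed: " + e.getMessage());
            return null;
        }
    }

    // Decodes the bytes with an explicitly chosen charset before the parser sees them.
    // The parser then reads characters and ignores any encoding named in the document.
    public static String parseWithCharset(byte[] data, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(reader))
                    .getDocumentElement().getTextContent();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException("Parsing with " + charset + " failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = sampleBytes();
        System.out.println("Raw bytes:       " + parseFromBytes(data));
        System.out.println("Explicit reader: " + parseWithCharset(data, StandardCharsets.ISO_8859_1));
    }
}
```

With JAXB itself, the same kind of Reader can be handed to `Unmarshaller.unmarshal(Reader)`. The charset has to match what the producer actually wrote, which is why the approach breaks as soon as a file arrives in a different encoding.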
That's probably a Byte Order Mark (BOM), a special byte sequence at the start of a UTF file. They are, frankly, a pain in the arse, and seem particularly common when interacting with .NET systems.
Try changing your code to use a Reader rather than an InputStream. A Reader deals in characters rather than raw bytes, and might make a better stab at it. More simply, pass the File directly to the Unmarshaller, and let the JAXBContext worry about it.
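Whether a BOM is really involved can be checked by looking at the first bytes of the file. The following is a rough sketch of such a check; the class `BomCheck` and its methods are hypothetical names, and only the UTF-8 and UTF-16 marks are recognised (UTF-32 is deliberately left out).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

class BomCheck {

    // Returns the encoding implied by a leading BOM, or null if none is found.
    // Assumption: UTF-32 marks are not handled in this sketch.
    public static String detectBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    // Compares the first bytes of data against the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java BomCheck <file>");
            return;
        }
        byte[] head = new byte[3];
        int read;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // Read up to three bytes, which is enough for the marks checked above.
            read = in.readNBytes(head, 0, head.length);
        }
        String bom = detectBom(Arrays.copyOf(head, read));
        System.out.println(bom == null ? "No BOM found" : "BOM found: " + bom);
    }
}
```

A BOM can only appear at the very start of a file, so a stray 0xBF in the middle of the document, as described in the question's updates, points to an encoding mismatch rather than a BOM.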
It sounds as if your XML is encoded in UTF-16 but that encoding is not being passed to the Unmarshaller. With the Marshaller you can set it using marshaller.setProperty(Marshaller.JAXB_ENCODING, "UTF-16"); but because the Unmarshaller is not required to support any properties, I am not sure how to enforce it other than by ensuring your XML document has encoding="UTF-16" in the initial <?xml?> declaration.
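To illustrate how a declared encoding lets the parser pick the right decoder on its own, here is a small sketch. It again uses the JDK's built-in DOM parser rather than JAXB, on the assumption that encoding detection works the same way underneath; `DeclaredEncodingDemo` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

class DeclaredEncodingDemo {

    // Builds the question's snippet with an XML declaration naming the charset,
    // then encodes the whole document in that same charset.
    public static byte[] withDeclaration(Charset charset) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>"
                + "<Description><![CDATA[Mt. Belvieu \u00BF Texas]]></Description>";
        return xml.getBytes(charset);
    }

    // Parses raw bytes and lets the parser work out the encoding from the declaration.
    public static String parseFromBytes(byte[] data) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(data))
                    .getDocumentElement().getTextContent();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // The same document in two encodings, each declared in its own header.
        System.out.println(parseFromBytes(withDeclaration(StandardCharsets.ISO_8859_1)));
        System.out.println(parseFromBytes(withDeclaration(StandardCharsets.UTF_16)));
    }
}
```

The declaration only helps if it is truthful: a header that names one encoding on a file written in another leads back to the same kind of failure.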