Invalid byte 2 of 2-byte UTF-8 sequence

Posted on 2024-08-24 06:35:48

I am trying to parse an XML file that declares version 1.0 and encoding UTF-8 (<?version = 1.0, encoding = UTF-8>), but I ran into the error message "Invalid byte 2 of 2-byte UTF-8 sequence". Does anybody know what causes this problem?

Comments (7)

小伙你站住 2024-08-31 06:35:48

The most common cause is feeding the parser ISO-8859-x (Latin-x, such as Latin-1) data while it believes it is getting UTF-8. Certain sequences of Latin-1 characters (for example, two consecutive characters with accents or umlauts) are invalid as UTF-8: the first byte looks like the start of a 2-byte sequence, but the second byte does not have the high-order bits (10xxxxxx) that a continuation byte requires.

This can easily happen when some process dumps XML as Latin-1 but either forgets to output an XML declaration (in which case the XML parser must default to UTF-8, as per the XML specification) or claims the document is UTF-8 when it isn't.
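The mismatch described above can be reproduced without any XML parser. The following is a minimal sketch, not a definitive diagnostic tool: the class name Latin1AsUtf8Demo and the helper isValidUtf8 are made up for illustration. It encodes the same string as Latin-1 and as UTF-8 and asks a strict UTF-8 decoder whether each byte array is well-formed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class Latin1AsUtf8Demo {

    // Returns true if the bytes are well-formed UTF-8, false otherwise.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "\u00D8yvind"; // "Øyvind", written with an escape so the source file's encoding does not matter

        // In Latin-1, U+00D8 is the single byte 0xD8, which UTF-8 reads as the lead byte
        // of a 2-byte sequence; the next byte, 'y' (0x79), is not a continuation byte.
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);

        System.out.println("Latin-1 bytes well-formed as UTF-8: " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes well-formed as UTF-8:   " + isValidUtf8(utf8));
    }
}
```

The decoder is configured with CodingErrorAction.REPORT because the default behaviour of new String(bytes, charset) silently replaces malformed input, which would hide the problem.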

装迷糊 2024-08-31 06:35:48

You could try changing the default character encoding used by String.getBytes() to UTF-8 by passing the VM option -Dfile.encoding=utf-8.
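As an alternative to changing the JVM-wide default, the charset can be passed explicitly wherever bytes are produced. This is a small sketch of that idea rather than a fix for any particular code base; the class name ExplicitCharsetDemo and the helper toUtf8 are assumptions made for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class ExplicitCharsetDemo {

    // Encodes the string as UTF-8 regardless of the platform default.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u00E9"; // "é"

        // The no-argument getBytes() depends on this value, which varies by platform and JVM options.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Bytes with default charset: " + text.getBytes().length);

        // The explicit overload always gives the same result.
        System.out.println("Bytes with explicit UTF-8:  " + toUtf8(text).length);
    }
}
```

Passing the charset explicitly keeps the output independent of how the JVM was started, so the result does not change between machines.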

血之狂魔 2024-08-31 06:35:48

Either the parser is set to UTF-8 even though the file is encoded otherwise, or the file is declared as UTF-8 but actually isn't.

懒的傷心 2024-08-31 06:35:48

I had the same problem. In my case I had created a new XML file with JDOM and a FileWriter(xmlFile). The FileWriter did not produce a UTF-8 file, because it writes with the platform default encoding unless told otherwise.
Using a FileOutputStream(xmlFile) instead solved it.
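The same idea can be shown with the plain JDK, without JDOM: wrap a FileOutputStream in an OutputStreamWriter with an explicit charset so that the bytes on disk match the encoding named in the XML declaration. This is a sketch under stated assumptions; the class name Utf8XmlWriterDemo, the helpers writeUtf8 and roundTrip, and the sample document are all invented for illustration.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; the names are illustrative only.
class Utf8XmlWriterDemo {

    // Writes the text as UTF-8, independent of the platform default encoding.
    static void writeUtf8(Path file, String xml) {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            writer.write(xml);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes to a temporary file and returns the bytes that ended up on disk.
    static byte[] roundTrip(String xml) {
        try {
            Path tmp = Files.createTempFile("utf8-demo", ".xml");
            try {
                writeUtf8(tmp, xml);
                return Files.readAllBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>\u00C5ke</name>";
        byte[] onDisk = roundTrip(xml);
        System.out.println("Characters: " + xml.length() + ", bytes on disk: " + onDisk.length);
    }
}
```

The byte count is larger than the character count whenever the text contains non-ASCII characters, which is a quick sign that the file really was written as UTF-8.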

帅哥哥的热头脑 2024-08-31 06:35:48

I had the same problem when trying to import my .xml file into my Java tool, and I found a good solution for it:
1. Open the .xml file in Notepad++ and save it as an .rtf file, then open that file in WordPad.
2. Save the .rtf file as a .txt file, open it in Notepad, and save it as an .xml file again. When saving in Notepad, make sure to select the option "Encoding: UTF-8" near the bottom of the Save dialog.
It worked for me; I hope it is useful for you too.
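The manual round trip above amounts to re-encoding the file as UTF-8. The same thing can be sketched in a few lines of Java, assuming the file's real encoding is known; ISO-8859-1 is only an assumption here, and the class name TranscodeDemo and the helper transcode are invented for the example.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class; the names are illustrative only.
class TranscodeDemo {

    // Decodes the bytes with the source charset and re-encodes them with the target charset.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java TranscodeDemo.java <input.xml> <output.xml>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);

        // Assumption: the input really is ISO-8859-1. Substitute the actual source encoding.
        byte[] converted = transcode(Files.readAllBytes(in),
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        Files.write(out, converted);
    }
}
```

If the XML declaration inside the file names the old encoding, it has to be updated as well, otherwise the converted file will be declared incorrectly in the other direction.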

泅人 2024-08-31 06:35:48

For anyone who still gets this error:

Since the document is being read as UTF-8, check your XML for accented Latin letters and similar non-ASCII characters. I had the same problem, and the cause was this element:

<n:name>Åke Jógvan Øyvind</n:name>

Hope this helps.
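To find where such characters sit in a large file, a strict UTF-8 decoder can report the offset of the first byte sequence it rejects. This is a minimal sketch of that approach, not part of the original answer; the class name Utf8ErrorLocator and the method firstMalformedOffset are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class Utf8ErrorLocator {

    // Returns the byte offset of the first malformed UTF-8 sequence, or -1 if there is none.
    static int firstMalformedOffset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never decodes to more chars than it has bytes, so this buffer cannot overflow.
        CharBuffer out = CharBuffer.allocate(bytes.length + 1);

        // On error, the input buffer's position is left at the start of the offending bytes.
        CoderResult result = decoder.decode(in, out, true);
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        String xml = "<n>\u00C5ke</n>"; // "<n>Åke</n>"
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);

        System.out.println("Latin-1 input, first bad offset: " + firstMalformedOffset(latin1));
        System.out.println("UTF-8 input, first bad offset:   " + firstMalformedOffset(utf8));
    }
}
```

The returned offset can be opened in a hex viewer to see which character was written in the wrong encoding.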

眼泪也成诗 2024-08-31 06:35:48

Switching the encoding used for the input may help in this case:

// Pass the encoding that the input stream (in) actually uses.
XMLEventReader eventReader =
        inputFactory.createXMLEventReader(in, "utf-8"); // or, e.g., "windows-1251"
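For context, here is a self-contained sketch of how that call might be used with the JDK's built-in StAX implementation. The class name StaxEncodingDemo, the helper readText, and the sample document are assumptions for illustration. The sample deliberately has no XML declaration, so how a conflicting declaration would be handled is not demonstrated here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class; the names are illustrative only.
class StaxEncodingDemo {

    // Parses the bytes with the given encoding and returns all character data concatenated.
    static String readText(byte[] xmlBytes, String encoding) {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        StringBuilder text = new StringBuilder();
        try {
            InputStream in = new ByteArrayInputStream(xmlBytes);
            XMLEventReader eventReader = inputFactory.createXMLEventReader(in, encoding);
            while (eventReader.hasNext()) {
                XMLEvent event = eventReader.nextEvent();
                if (event.isCharacters()) {
                    text.append(event.asCharacters().getData());
                }
            }
            eventReader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Parsing failed: " + e.getMessage(), e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Latin-1 bytes without an XML declaration.
        byte[] latin1 = "<name>\u00D8yvind</name>".getBytes(StandardCharsets.ISO_8859_1);

        // Reading with the encoding the bytes were actually written in.
        System.out.println("Parsed text length: " + readText(latin1, "ISO-8859-1").length());

        // Reading the same bytes as UTF-8 is expected to fail with a malformed-sequence error.
        try {
            readText(latin1, "utf-8");
            System.out.println("Parsed as UTF-8 without error");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Overriding the encoding this way only helps when the real encoding of the input is known; fixing whatever produced the file is usually the more durable solution.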