Invalid byte 2 of 2-byte UTF-8 sequence
I am trying to parse an XML file with <?version = 1.0, encoding = UTF-8> but ran into the error message "invalid byte 2 of 2-byte UTF-8 sequence". Does anybody know what caused this problem?
Comments (7)
Most commonly this happens when the input is ISO-8859-x (Latin-x, such as Latin-1) but the parser thinks it is getting UTF-8. Certain sequences of Latin-1 characters (for example, two consecutive characters with accents or umlauts) form something that is invalid as UTF-8: the first byte looks like the start of a multi-byte sequence, but the second byte does not have the high-order bits that a continuation byte must have.
This can easily occur when some process dumps out XML using Latin-1 but either forgets to output the XML declaration (in which case the XML parser must default to UTF-8, as per the XML spec), or claims the content is UTF-8 even when it isn't.
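To make the mismatch concrete, here is a minimal sketch that encodes the same text as Latin-1 and as UTF-8 and then checks each byte array with a strict UTF-8 decoder. The class name, the helper methods, and the sample element are made up for illustration; they are not from the original post.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Latin1AsUtf8Demo {
    /** Decodes strictly: malformed input raises an exception instead of being replaced. */
    static String strictDecode(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    /** Returns true only if the bytes form well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            strictDecode(bytes, StandardCharsets.UTF_8);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // \u00C4 is A-umlaut. In Latin-1 it is the single byte 0xC4, which UTF-8
        // reads as the lead byte of a 2-byte sequence. The next byte, 'r' (0x72),
        // is not a continuation byte (10xxxxxx), so byte 2 is invalid.
        String xml = "<name>\u00C4rger</name>";

        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);

        System.out.println(isValidUtf8(latin1)); // false
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```

A check like this can help confirm whether a file that claims to be UTF-8 really is, before handing it to the XML parser.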
You could try changing the default character encoding used by String.getBytes() to UTF-8 by passing the VM option -Dfile.encoding=utf-8.
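An alternative that does not depend on the JVM default is to name the charset explicitly when converting the string to bytes. The following is only a sketch under that assumption; the class and method names are hypothetical, and it uses the JDK's built-in DOM parser rather than whatever parser the question involved.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ExplicitCharsetDemo {
    /** Parses XML held in a String, encoding it as UTF-8 regardless of file.encoding. */
    static Document parseXmlString(String xml) throws Exception {
        // getBytes(Charset) ignores the platform default, unlike the no-argument getBytes().
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>\u00C4rger</name>";
        Document doc = parseXmlString(xml);
        // The text survives because the bytes match the declared encoding.
        System.out.println(doc.getDocumentElement().getTextContent().length()); // 5
    }
}
```

This keeps the bytes consistent with the encoding named in the XML declaration even on machines where the default charset is not UTF-8.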
Either the parser is set to UTF-8 even though the file is encoded differently, or the file is declared as UTF-8 but really isn't.
I had the same problem. In my case I had created a new XML file with jdom and FileWriter(xmlFile). The FileWriter did not produce a UTF-8 file, most likely because it writes using the platform default encoding.
Using FileOutputStream(xmlFile) instead solved it.
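A minimal sketch of the same idea without jdom: wrap the FileOutputStream in an OutputStreamWriter with an explicit charset, so the bytes on disk are UTF-8 whatever the platform default is. The class name, method name, and sample content are assumptions for illustration only.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8XmlWriterDemo {
    /** Writes the given XML text to the file as UTF-8 bytes. */
    static void writeXml(File xmlFile, String xml) throws IOException {
        // The charset is stated explicitly instead of relying on the platform default.
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(xmlFile), StandardCharsets.UTF_8)) {
            writer.write(xml);
        }
    }

    public static void main(String[] args) throws IOException {
        File xmlFile = File.createTempFile("demo", ".xml");
        xmlFile.deleteOnExit();
        writeXml(xmlFile, "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>\u00C4rger</name>");
        System.out.println("Wrote " + xmlFile.length() + " bytes to " + xmlFile);
    }
}
```

The point is that the encoding named in the XML declaration and the encoding actually used to write the bytes are the same.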
I had the same problem when trying to import my .xml file into my Java tool, and I found a workaround:
1. Open the .xml file with Notepad++ and save it as an .rtf file. Then open that file in WordPad.
2. Save the .rtf file as a .txt file, open it with Notepad, and save it as an .xml file again. When saving in Notepad, make sure to choose the option "Encoding: UTF-8" near the bottom of the save dialog.
It worked for me; I hope it is useful for you too.
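The re-saving steps above amount to transcoding the file to UTF-8. As a rough programmatic equivalent, the sketch below reads a file in a known source encoding and writes it back out as UTF-8. It assumes you already know the real source encoding (Latin-1 in the usage example); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TranscodeToUtf8 {
    /** Reads source using sourceCharset and writes the same text to target as UTF-8. */
    static void transcode(Path source, Charset sourceCharset, Path target) throws IOException {
        String text = new String(Files.readAllBytes(source), sourceCharset);
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("latin1", ".xml");
        Path target = Files.createTempFile("utf8", ".xml");
        // Simulate a file that some tool wrote as Latin-1.
        Files.write(source, "<name>\u00C4rger</name>".getBytes(StandardCharsets.ISO_8859_1));

        transcode(source, StandardCharsets.ISO_8859_1, target);
        System.out.println(Files.size(source) + " -> " + Files.size(target) + " bytes");
    }
}
```

If the file carries an XML declaration that names the old encoding, that declaration would need to be updated as well; this sketch does not do that.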
For those who still get this error:
Since UTF-8 is being used, check your XML document for any Latin letters or the like:
I had the same problem, and the reason was that I had this:
Hope this helps.
Switching the encoding of the input might help in this case:
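The snippet that originally followed is not available here. As one possible way to do this (an assumption, not necessarily what the poster used), the input bytes can be decoded with the encoding they were actually written in and handed to the parser as a character stream. The class and method names below are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SwitchInputEncodingDemo {
    /** Parses XML from a byte stream, decoding it with the charset supplied by the caller. */
    static Document parse(InputStream in, Charset actualCharset) throws Exception {
        // With a character stream, the parser reads already-decoded characters
        // and does not guess the encoding from the bytes.
        InputSource source = new InputSource(new InputStreamReader(in, actualCharset));
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Latin-1 bytes with no XML declaration: a byte-stream parse would assume UTF-8.
        byte[] latin1 = "<name>\u00C4rger</name>".getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent().length()); // 5
    }
}
```

This only helps when the real encoding of the input is known; it does not fix a file whose content is genuinely corrupted.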