Java reads a strange character at the beginning of the file that is not there
I have a simple XML file on my hard drive.
When I open it with Notepad++, this is what I see:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<content>
... more stuff here ...
</content>
But when I read it using a FileInputStream, I get:
?<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<content>...
I'm using JAXB to parse the XML, and it throws a "content not allowed in prolog" exception because of that "?" character.
What is this extra "?" character? Why is it there, and how do I get rid of it?
That extra character is a byte order mark (BOM), a special Unicode character (U+FEFF) that tells the XML parser the byte order (little endian or big endian) of the bytes in the file.
Normally, your XML parser should be able to understand this. (If it doesn't, I would regard that as a bug in the XML parser.)
As a workaround, make sure that the program that produces this XML leaves off the BOM.
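If the producing program cannot be changed, another option is to drop the BOM on the reading side before the stream reaches the parser. The following is only a minimal sketch under stated assumptions: the file is UTF-8, and the class name `BomSkipper` and method `skipUtf8Bom` are hypothetical names introduced here for illustration, not part of JAXB or the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: wraps a stream and drops a leading UTF-8 BOM if present.
final class BomSkipper {
    // U+FEFF encoded as UTF-8 is the three bytes EF BB BF.
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];

        // Read up to three bytes; a single read() call may return fewer.
        int read = 0;
        while (read < head.length) {
            int n = pushback.read(head, read, head.length - read);
            if (n < 0) {
                break; // end of stream
            }
            read += n;
        }

        boolean isBom = read == UTF8_BOM.length
                && (head[0] & 0xFF) == UTF8_BOM[0]
                && (head[1] & 0xFF) == UTF8_BOM[1]
                && (head[2] & 0xFF) == UTF8_BOM[2];

        // No BOM: push the bytes back so the caller sees the stream unchanged.
        if (!isBom && read > 0) {
            pushback.unread(head, 0, read);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<content/>".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[xml.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, withBom, 3, xml.length);

        InputStream in = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println((char) in.read()); // prints "<"
    }
}
```

The returned stream could then be handed to whatever does the parsing in place of the raw FileInputStream. This sketch only covers the UTF-8 BOM; UTF-16 and UTF-32 files would need their own checks.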
Check the encoding of the file. I've seen a similar thing: the file looked fine when opened in most editors, but it turned out to be encoded as UTF-8 without BOM (or with one, I can't recall off the top of my head). Notepad++ should be able to switch between the two.
You can use Notepad++ to show all symbols from the
View > Show Symbol > Show All Characters
menu. It will show you any extra characters present at the beginning of the file. The extra bytes may be a byte order mark; if they are, this approach will not help. In that case, you will need to download a hex editor or, if you have Cygwin installed, follow the steps in the last paragraph of this answer. Once you can see the file as hex codes, look at the first two or three bytes. Do they match one of the codes listed at http://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding ?
If they are indeed a byte order mark, or if you are unable to determine the cause of the error, just try this:
From the menu, select
Encoding > Encoding in UTF-8 without BOM
and then save the file.
(On Linux, you can use command-line tools to check what is at the beginning of the file, e.g.
xxd -g1 filename | head
or
od -t cx1 filename | head
)
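The same check can be done from Java without a hex editor. This is a minimal sketch, assuming the file path is passed as the first command-line argument; the class name `HeadBytes` and method `firstBytesHex` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: prints the first bytes of a file as hex, similar to `xxd | head`.
final class HeadBytes {
    // Returns up to `count` leading bytes as space-separated uppercase hex pairs.
    static String firstBytesHex(InputStream in, int count) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            int b = in.read();
            if (b < 0) {
                break; // file is shorter than `count` bytes
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of the XML file to inspect.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(firstBytesHex(in, 4));
        }
    }
}
```

For a UTF-8 file that starts with a BOM followed by `<?xml`, the output would be `EF BB BF 3C`; without a BOM the first byte would already be `3C`.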
You might have a newline at the start of the file. Delete it.
Select
View > Show Symbol > Show All Characters
in Notepad++ to see what's happening.
This is not a JAXB problem; the problem lies in the way you read the XML. Try using an InputStream.
Besides the FileInputStream, a ByteArrayInputStream also worked for me:
=> No more unmarshalling error.