Does Java have a method to get the various byte order marks?
I am looking for a utility method or constant in Java that returns the bytes of the byte order mark (BOM) for a given encoding, but I can't seem to find one. Is there one? I would really like to do something like:
byte[] bom = Charset.forName( CharEncoding.UTF8 ).getByteOrderMark();
where CharEncoding comes from Apache Commons.
Comments (5)
Java does not recognize byte order marks for UTF-8. See bugs 4508058 and 6378911.
The gist is that support was added, broke backwards compatibility, and was rolled back. You'll have to do BOM recognition in UTF-8 yourself.
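A minimal sketch of doing that recognition by hand, assuming the input arrives as an InputStream. The class and method names (Utf8BomSkipper, skipUtf8Bom) are made up for illustration and are not part of the JDK. The idea is to peek at the first three bytes and push them back if they are not EF BB BF:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Arrays;

final class Utf8BomSkipper {
    // The UTF-8 encoding of U+FEFF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /**
     * Returns a stream positioned after a leading UTF-8 BOM, if one is present.
     * If there is no BOM, every byte that was peeked at is pushed back.
     */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];

        // Read up to three bytes; a single read() may return fewer, so loop.
        int n = 0;
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break; // end of stream before three bytes were available
            }
            n += r;
        }

        boolean hasBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: restore what was consumed
        }
        return pushback;
    }
}
```

The returned stream can then be wrapped in an InputStreamReader with the UTF-8 charset as usual.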
Apache Commons IO contains what you are looking for; see org.apache.commons.io.ByteOrderMark.
You can generate the BOM like this:
If you wish to create the BOMs for other encodings using this method, make sure you use the version of the encoding that does not automatically insert the BOM or it will be repeated. This technique only applies to Unicode encodings and will not produce meaningful results for others (like Windows-1252).
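A minimal sketch of the technique this answer describes, assuming it amounts to encoding the character U+FEFF with the target charset. The names BomBytes, bomFor, and hex are made up for illustration. It also shows the two caveats above: the plain "UTF-16" charset writes its own BOM, so the mark appears twice, and a non-Unicode charset only yields a replacement byte:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BomBytes {
    /** Encodes U+FEFF, the code point used as the byte order mark, with the given charset. */
    static byte[] bomFor(Charset charset) {
        return "\uFEFF".getBytes(charset);
    }

    /** Formats bytes as space-separated upper-case hex, e.g. "EF BB BF". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(bomFor(StandardCharsets.UTF_8)));      // EF BB BF
        System.out.println(hex(bomFor(StandardCharsets.UTF_16BE)));   // FE FF
        System.out.println(hex(bomFor(StandardCharsets.UTF_16LE)));   // FF FE

        // "UTF-16" inserts a BOM by itself, so the mark is repeated.
        System.out.println(hex(bomFor(StandardCharsets.UTF_16)));     // FE FF FE FF

        // Not a Unicode encoding: U+FEFF is unmappable and becomes '?'.
        System.out.println(hex(bomFor(StandardCharsets.ISO_8859_1))); // 3F
    }
}
```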
There isn't anything in the JDK as far as I can see, nor in any of the Apache projects.
However, Eclipse EMF has an enum that provides support:
org.eclipse.emf.ecore.resource.ContentHandler.ByteOrderMark
I'm not sure whether that's of any help to you?
There's some more info here on the BOMs for each encoding type; you could write a simple helper class or enum for this:
http://mindprod.com/jgloss/bom.html
Hope that helps. To be honest, I'm surprised this isn't in Commons I/O.
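A sketch of what such a helper enum might look like, using the standard BOM byte sequences for the Unicode encodings. The enum name Bom and its methods are hypothetical, not an existing JDK or Apache API:

```java
import java.nio.charset.Charset;
import java.util.Optional;

enum Bom {
    UTF_8("UTF-8", 0xEF, 0xBB, 0xBF),
    UTF_16BE("UTF-16BE", 0xFE, 0xFF),
    UTF_16LE("UTF-16LE", 0xFF, 0xFE),
    UTF_32BE("UTF-32BE", 0x00, 0x00, 0xFE, 0xFF),
    UTF_32LE("UTF-32LE", 0xFF, 0xFE, 0x00, 0x00);

    private final String charsetName;
    private final byte[] bytes;

    Bom(String charsetName, int... octets) {
        this.charsetName = charsetName;
        this.bytes = new byte[octets.length];
        for (int i = 0; i < octets.length; i++) {
            this.bytes[i] = (byte) octets[i];
        }
    }

    /** Returns a copy so callers cannot modify the stored mark. */
    byte[] getBytes() {
        return bytes.clone();
    }

    /** Looks up the BOM by canonical charset name; empty for charsets without one. */
    static Optional<Bom> forCharset(Charset charset) {
        for (Bom bom : values()) {
            if (bom.charsetName.equalsIgnoreCase(charset.name())) {
                return Optional.of(bom);
            }
        }
        return Optional.empty();
    }
}
```

With this in place, something close to the call in the question becomes `Bom.forCharset(StandardCharsets.UTF_8).map(Bom::getBytes)`, which is empty for encodings such as ISO-8859-1 that have no BOM.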
It is worth noting that many encodings don't use a byte order mark at all. For example, an empty string in UTF-8 is just an empty byte[]. While a BOM is specified for UTF-8, it is rarely used in Java and is not always supported.