Does Java have a way to get the byte order mark for each encoding?

Posted 2024-07-15 19:31:18

I am looking for a utility method or constant in Java that will return me the bytes that correspond to the appropriate byte order mark for an encoding, but I can't seem to find one. Is there one? I really would like to do something like:

// hypothetical: java.nio.charset.Charset has no getByteOrderMark() method
byte[] bom = Charset.forName(CharEncoding.UTF_8).getByteOrderMark();

Where CharEncoding comes from Apache Commons.

何以笙箫默 2024-07-22 19:31:18

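Java's UTF-8 decoder does not recognize the byte order mark: a leading EF BB BF is decoded as the character U+FEFF instead of being skipped. See bugs 4508058 and 6378911.

The gist is that support was added, it broke backward compatibility, and it was rolled back. You have to detect and strip a UTF-8 BOM yourself.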
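Doing that yourself usually means peeking at the first three bytes of the stream. The sketch below is one way to do it with only the JDK; the class and method names (Utf8BomSkipper, skipUtf8Bom) are hypothetical, and it assumes Java 9+ for readAllBytes() in the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: not part of the JDK.
public class Utf8BomSkipper {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns a stream positioned after a leading UTF-8 BOM, if one is present. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r == -1) break;
            n += r;
        }
        boolean hasBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!hasBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: push the bytes back for the caller
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // hi
    }
}
```

Wrapping the result in an InputStreamReader with StandardCharsets.UTF_8 then gives text without the leading U+FEFF; input that has no BOM passes through unchanged.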

傲影 2024-07-22 19:31:18

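Apache Commons IO contains what you are looking for: see org.apache.commons.io.ByteOrderMark, whose constants (ByteOrderMark.UTF_8, ByteOrderMark.UTF_16LE, and so on) expose the mark's bytes through getBytes().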
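If adding Commons IO is not an option, a small JDK-only lookup can stand in for the UTF-8 and UTF-16 cases. This is a minimal sketch, not Commons IO's API: the Boms class and its forCharset/detect methods are hypothetical names, the table covers only three encodings, and Map.of assumes Java 9+.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

// Hypothetical helper: not part of the JDK or Commons IO.
public final class Boms {
    private Boms() {}

    // BOM byte sequences for a few Unicode encodings
    private static final Map<Charset, byte[]> BOMS = Map.of(
            StandardCharsets.UTF_8, new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF},
            StandardCharsets.UTF_16BE, new byte[] {(byte) 0xFE, (byte) 0xFF},
            StandardCharsets.UTF_16LE, new byte[] {(byte) 0xFF, (byte) 0xFE});

    /** Returns a copy of the BOM for cs, or an empty array if this table has none. */
    public static byte[] forCharset(Charset cs) {
        byte[] bom = BOMS.get(cs);
        return bom == null ? new byte[0] : bom.clone();
    }

    /** Returns the charset whose BOM starts data, or null if none in the table matches. */
    public static Charset detect(byte[] data) {
        for (Map.Entry<Charset, byte[]> e : BOMS.entrySet()) {
            byte[] bom = e.getValue();
            if (data.length >= bom.length && Arrays.equals(Arrays.copyOf(data, bom.length), bom)) {
                return e.getKey();
            }
        }
        return null;
    }
}
```

Because Charset equality is based on the canonical name, Boms.forCharset(Charset.forName("UTF-8")) finds the same entry as StandardCharsets.UTF_8, which is close to the call the question asks for. Extending the table to UTF-32 would require checking the longer marks first, since the UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE one.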

人事已非 2024-07-22 19:31:18

You can generate the BOM like this:

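import java.nio.charset.StandardCharsets;

byte[] utf8_bom = "\uFEFF".getBytes(StandardCharsets.UTF_8);       // EF BB BF
byte[] utf16le_bom = "\uFEFF".getBytes(StandardCharsets.UTF_16LE); // FF FE; same as "UnicodeLittleUnmarked"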

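If you want to create BOMs for other encodings this way, use the variant of the encoding that does not insert a BOM automatically, or the mark will be repeated. For example, Java's "UTF-16" charset writes its own big-endian BOM, so "\uFEFF".getBytes("UTF-16") yields FE FF FE FF. This technique only applies to Unicode encodings and produces nothing meaningful for others such as Windows-1252, which cannot represent U+FEFF, so getBytes substitutes a replacement byte.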
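The sketch below runs the technique from this answer, shows the doubled mark that "UTF-16" produces, and uses the same idea to write UTF-8 output with a BOM. The class and method names (BomDemo, hex, utf8WithBom) are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class BomDemo {
    /** Formats bytes as space-separated uppercase hex, e.g. "EF BB BF". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b)); // %X prints a negative byte as its unsigned value
        }
        return sb.toString();
    }

    /** Encodes text as UTF-8 preceded by a BOM, for consumers that expect one. */
    static byte[] utf8WithBom(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write('\uFEFF'); // the UTF-8 encoder never adds a BOM, so write U+FEFF explicitly
            w.write(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hex("\uFEFF".getBytes(StandardCharsets.UTF_8)));    // EF BB BF
        System.out.println(hex("\uFEFF".getBytes(StandardCharsets.UTF_16LE))); // FF FE
        // "UTF-16" writes its own big-endian BOM before the encoded U+FEFF
        System.out.println(hex("\uFEFF".getBytes(StandardCharsets.UTF_16)));   // FE FF FE FF
        System.out.println(hex(utf8WithBom("A")));                             // EF BB BF 41
    }
}
```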