How do I detect the character encoding of a file?
Our application receives files from our users, and those files must be validated to confirm they use one of the encodings we support (i.e. UTF-8, Shift-JIS, EUC-JP). Once a file is validated, we also need to save it in our system, along with its encoding as metadata.
Currently we're using JCharDet (a Java port of Mozilla's character detector), but there are some Shift-JIS characters that it seems to fail to detect as valid Shift-JIS.
Any ideas what else we can use?
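Since the goal here is to accept only a fixed whitelist of encodings, strict decoding with the JDK's own `CharsetDecoder` may be enough on its own, independent of any detector library: a file either decodes cleanly under a candidate charset or it doesn't. A minimal sketch (class and method names are my own, not from any library):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class EncodingValidator {

    // True if the bytes decode cleanly under the named charset.
    // A clean decode proves the file is *valid* in that encoding, not that
    // it is the encoding the user intended: many byte sequences are valid
    // in several encodings at once, so several candidates may pass.
    static boolean isValid(byte[] data, String charsetName) {
        try {
            Charset.forName(charsetName)
                   .newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Check a sample against each whitelisted encoding.
        byte[] data = "テスト".getBytes("UTF-8");
        for (String cs : new String[] {"UTF-8", "Shift_JIS", "EUC-JP"}) {
            System.out.println(cs + " valid: " + isValid(data, cs));
        }
    }
}
```

If more than one whitelisted encoding accepts the bytes, you would still need a detector (or a policy such as "prefer UTF-8") to decide which one to record as metadata.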
2 Answers
ICU4J's CharsetDetector will help you.
By the way, what kind of characters caused the error, and what kind of error was it? I suspect ICU4J could have the same problem, depending on the characters and the error.
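For reference, basic usage of ICU4J's detector looks roughly like this (a sketch assuming the `com.ibm.icu:icu4j` jar is on the classpath; `detectAll()` returns candidates ranked by a 0-100 confidence score, which lets you check whether Shift-JIS appears at all rather than trusting only the top guess):

```java
import com.ibm.icu.text.CharsetDetector;
import com.ibm.icu.text.CharsetMatch;

public class Icu4jDetectorDemo {
    public static void main(String[] args) throws Exception {
        // Sample Japanese text encoded as Shift_JIS; in practice the bytes
        // would come from the uploaded file.
        byte[] data = "これはテストです。文字コード判定のサンプル文です。".getBytes("Shift_JIS");

        CharsetDetector detector = new CharsetDetector();
        detector.setText(data);

        // detect() returns only the best match; detectAll() lists every
        // candidate with its confidence.
        for (CharsetMatch match : detector.detectAll()) {
            System.out.println(match.getName()
                    + " (confidence " + match.getConfidence() + ")");
        }
    }
}
```

Note that detection on short inputs is statistical guesswork in any library, so confidence scores on small files should be treated with caution.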
Apache Tika is a content analysis toolkit that is mainly useful for determining file types, as opposed to encoding schemes, but it does return content-encoding information for text file types. I don't know if its algorithms are as advanced as JCharDet's, but it might be worth a try...
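If it helps, Tika ships its own copy of a charset detector under `org.apache.tika.parser.txt`, with an API very similar to ICU4J's. A sketch, assuming the Tika parsers jar is on the classpath:

```java
import org.apache.tika.parser.txt.CharsetDetector;
import org.apache.tika.parser.txt.CharsetMatch;

public class TikaEncodingDemo {
    public static void main(String[] args) throws Exception {
        // In practice the bytes would come from the uploaded file.
        byte[] data = "日本語のサンプルテキストです。".getBytes("EUC-JP");

        CharsetDetector detector = new CharsetDetector();
        detector.setText(data);

        // Best single guess; detectAll() is also available for the
        // full ranked list of candidates.
        CharsetMatch match = detector.detect();
        System.out.println(match.getName()
                + " (confidence " + match.getConfidence() + ")");
    }
}
```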