Converting a file with a known encoding to UTF-8

Posted on 2024-10-07 04:45:10

I need to read a text file into a String, which I then have to pass as an input parameter (of type InputStream) to IFile.create (Eclipse).
I have been looking for an example of how to do that but still cannot figure it out, so I need your help!

Just for testing, I tried to convert the original text file to UTF-8 with this code:

FileInputStream fis = new FileInputStream(FilePath);
InputStreamReader isr = new InputStreamReader(fis);

Reader in = new BufferedReader(isr);
StringBuffer buffer = new StringBuffer();

int ch;
while ((ch = in.read()) > -1) {
    buffer.append((char)ch);
}
in.close();


FileOutputStream fos = new FileOutputStream(FilePath+".test.txt");
Writer out = new OutputStreamWriter(fos, "UTF8");
out.write(buffer.toString());
out.close();

But even though the final *.test.txt file is UTF-8 encoded, the characters inside are corrupted.

Comments (1)

娇女薄笑 2024-10-14 04:45:10

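You need to specify the encoding of the InputStreamReader using the Charset parameter. Without it, the reader decodes the bytes with the platform default charset, which may not match the file.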

// Use whatever the input's encoding actually is
Charset inputCharset = Charset.forName("ISO-8859-1");
InputStreamReader isr = new InputStreamReader(fis, inputCharset);

This also works:

InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
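Putting the fix together with the code from the question, the whole conversion could look roughly like the sketch below. The class name `Utf8Converter`, the method `convertToUtf8`, and the sample text are made up for illustration, and ISO-8859-1 is only an assumed source encoding; substitute whatever the file really uses.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8Converter {

    // Decodes `source` with the charset it was actually written in and
    // re-encodes the text as UTF-8 into `target`.
    static void convertToUtf8(Path source, Charset sourceCharset, Path target) throws IOException {
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(Files.newInputStream(source), sourceCharset));
             BufferedWriter out = new BufferedWriter(
                 new OutputStreamWriter(Files.newOutputStream(target), StandardCharsets.UTF_8))) {
            char[] chunk = new char[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical demo data: a temp file written as ISO-8859-1.
        Path source = Files.createTempFile("latin1-", ".txt");
        Path target = Files.createTempFile("utf8-", ".txt");
        Files.write(source, "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1));

        convertToUtf8(source, StandardCharsets.ISO_8859_1, target);
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

The try-with-resources block closes both streams even if reading or writing fails, which the original snippet does not guarantee.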

See also:

Stack Overflow search on detecting a file's encoding: https://stackoverflow.com/search?q=java+detect+encoding


You can get the default charset, which comes from the system the JVM is running on, at runtime via Charset.defaultCharset().
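Because that default differs between systems, it is usually safer to name the charset explicitly when turning the String back into bytes as well. The sketch below prints the default and builds a UTF-8 InputStream from a String; `Utf8Streams` and `toUtf8Stream` are hypothetical names, and the Eclipse call is only mentioned in a comment since it needs the Eclipse runtime.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf8Streams {

    // Encodes the text as UTF-8 and exposes the bytes as an InputStream.
    static InputStream toUtf8Stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // What `new InputStreamReader(fis)` falls back to when no charset is given.
        System.out.println("Default charset: " + Charset.defaultCharset());

        InputStream contents = toUtf8Stream("caf\u00e9");
        // In an Eclipse plug-in, `contents` would be the stream handed to
        // IFile.create(...); that call is not made here.
        System.out.println("Stream type: " + contents.getClass().getSimpleName());
    }
}
```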
