Error reading a UTF-8 file in Java

Posted on 2024-09-11 04:00:23

I am trying to read in some sentences from a file that contains Unicode characters. It does print out a string, but for some reason it messes up the Unicode characters.

This is the code I have:

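import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public static String readSentence(String resourceName) {
    InputStream refStream = ClassLoader.getSystemResourceAsStream(resourceName);
    if (refStream == null) {
        // getSystemResourceAsStream returns null when the resource is missing.
        throw new RuntimeException("Resource not found: " + resourceName);
    }
    // Decode the bytes explicitly as UTF-8; try-with-resources closes the reader.
    try (BufferedReader br = new BufferedReader(new InputStreamReader(
            refStream, Charset.forName("UTF-8")))) {
        String sentence = br.readLine();
        // readLine() returns null when the resource is empty.
        return sentence == null ? null : sentence.trim();
    } catch (IOException e) {
        // Keep the original exception as the cause so the stack trace is not lost.
        throw new RuntimeException("Cannot read sentence: " + resourceName, e);
    }
}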

嘦怹 2024-09-18 04:00:23

The problem is probably in the way that the string is being output.

I suggest that you confirm that you are correctly reading the Unicode characters by doing something like this:

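for (char ch : sentence.toCharArray()) {
    // Each char is a UTF-16 code unit; for characters in the Basic Multilingual Plane it equals the code point.
    System.err.println("char '" + ch + "' is unicode codepoint " + ((int) ch));
}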

and see if the Unicode codepoints are correct for the characters that are being messed up. If they are correct, then the problem is on the output side; if not, it is on the input side.
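A variant of the same check, as a minimal sketch: it prints each code point in hex (the usual U+XXXX notation), so the result does not depend on how the console renders the character itself. The class name `CodePointDump` and the sample string are made up for illustration. One pattern to look for: if a character such as é (U+00E9) shows up as the pair U+00C3 U+00A9, the UTF-8 bytes were most likely decoded as Latin-1 somewhere on the input side.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class CodePointDump {
    // Formats every Unicode code point of the string as "U+XXXX".
    // Unlike iterating over char values, codePoints() also handles
    // characters outside the Basic Multilingual Plane correctly.
    static List<String> codePoints(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    public static void main(String[] args) {
        // The accented letter is written as an escape so that the
        // encoding of this source file cannot affect the demo.
        String sentence = "caf\u00e9";
        for (String cp : codePoints(sentence)) {
            System.err.println(cp);
        }
    }
}
```

In practice you would pass the string returned by readSentence instead of the hard-coded sample.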

小ぇ时光︴ 2024-09-18 04:00:23

First, you could create the InputStreamReader as

new InputStreamReader(refStream, "UTF-8")

Also, you should verify if the resource really contains UTF-8 content.
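One way to do that verification, sketched under the assumption that you can get the raw bytes of the resource (for example with InputStream.readAllBytes() on Java 9 or later): decode them with a strict decoder. An InputStreamReader silently replaces malformed input with U+FFFD, whereas a CharsetDecoder configured with CodingErrorAction.REPORT throws instead. The class name `Utf8Check` and the sample data are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class Utf8Check {
    // Returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown on the first byte sequence that is not valid UTF-8.
            return false;
        }
    }

    public static void main(String[] args) {
        // The same text encoded two ways: only the first is valid UTF-8.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 bytes valid:   " + isValidUtf8(utf8));
        System.out.println("Latin-1 bytes valid: " + isValidUtf8(latin1));
    }
}
```

Note that a file can pass this check and still not be what you expect (pure ASCII is valid UTF-8, for instance), so treat it as a sanity check, not proof.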

烟织青萝梦 2024-09-18 04:00:23

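One of the most annoying causes could be... your IDE settings.

If your IDE's default console encoding is something like latin1, you will struggle for a long time with different variations of the Java code, and nothing will help until you set the relevant IDE options correctly.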
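To see what the JVM is doing on the output side, a small sketch like the following may help. It prints the JVM's default charset and writes through a PrintStream that is forced to encode as UTF-8. The class name `Utf8Console` and the sample text are made up for illustration, and this only controls the bytes Java emits: the IDE console or terminal still has to be set to interpret those bytes as UTF-8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical helper for illustration only.
class Utf8Console {
    // Wraps a stream in an auto-flushing PrintStream that always encodes as UTF-8,
    // regardless of the platform default charset.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // If this is not UTF-8, a plain System.out may garble non-ASCII text.
        System.err.println("default charset: " + Charset.defaultCharset());

        PrintStream out = utf8Stream(System.out);
        out.println("caf\u00e9");
    }
}
```

If the text is still garbled with this stream, the remaining suspect is the console encoding option in the IDE.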

About the author

記憶穿過時間隧道
