Error reading a UTF-8 file in Java
I am trying to read some sentences from a file that contains Unicode characters. It does print out a string, but for some reason the Unicode characters come out garbled.
This is the code I have:
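import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public static String readSentence(String resourceName) {
    String sentence = null;
    // Open the classpath resource and decode it explicitly as UTF-8;
    // try-with-resources closes the stream and reader afterwards.
    try (InputStream refStream = ClassLoader.getSystemResourceAsStream(resourceName);
         BufferedReader br = new BufferedReader(
                 new InputStreamReader(refStream, Charset.forName("UTF-8")))) {
        sentence = br.readLine();
    } catch (IOException e) {
        // Keep the original exception as the cause so the stack trace is not lost.
        throw new RuntimeException("Cannot read sentence: " + resourceName, e);
    }
    return sentence.trim();
}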
Comments (3)
The problem is probably in the way that the string is being output.
I suggest that you confirm that you are reading the Unicode characters correctly by printing the code point of each character in the string, and checking whether the code points are correct for the characters that are being messed up. If they are correct, the problem is on the output side; if not, it is on the input side.
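The code that originally accompanied this answer is not preserved here. As a minimal sketch of the kind of check described, the following prints each code point in `U+XXXX` form. The class name `CodePointDump` and the helper `describeCodePoints` are hypothetical names chosen for illustration, and the sample string stands in for whatever `readSentence` returns.

```java
import java.util.stream.Collectors;

public class CodePointDump {

    /** Formats every code point of the string as U+XXXX, separated by spaces. */
    static String describeCodePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Stand-in for the value returned by readSentence(...).
        // The accented letter is written as an escape so that the encoding
        // of this source file cannot influence the result.
        String sentence = "caf\u00e9";
        System.out.println(describeCodePoints(sentence));
    }
}
```

Because the output is plain ASCII, it reads the same regardless of the console encoding, which is what makes it useful for separating an input problem from an output problem.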
First, you could create the InputStreamReader as
Also, you should verify if the resource really contains UTF-8 content.
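The snippet that went with this answer is missing, so the following is only a sketch under stated assumptions, not the answer's original code. It shows one way to construct the reader using the `StandardCharsets.UTF_8` constant, and one way to check whether a byte array is well-formed UTF-8 by decoding it strictly. `Utf8Check`, `utf8Reader`, and `isValidUtf8` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    /** Wraps a stream in a reader that decodes it as UTF-8. */
    static BufferedReader utf8Reader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    /** Returns true if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // Report errors instead of silently substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The same text encoded two ways; only the first is valid UTF-8.
        byte[] asUtf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] asLatin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(asUtf8));
        System.out.println(isValidUtf8(asLatin1));
    }
}
```

To apply the check to the actual resource, the raw bytes would have to be read first, for example with `InputStream.readAllBytes()` on Java 9 or later. Note that a file can pass this check and still be in another encoding if it happens to contain only ASCII.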
One of the most annoying causes could be your IDE settings.
If your IDE's default console encoding is something like latin1, you will struggle for a long time with different variations of Java code, and nothing will help until you set the IDE's encoding options correctly.
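To see why the console encoding matters, the sketch below encodes the same string through a `PrintStream` with two different charsets and compares the resulting bytes. `ConsoleEncodingDemo` and `encodeWith` are hypothetical names. Wrapping `System.out` in a UTF-8 `PrintStream`, as `main` does, only makes the Java side explicit; it is an assumption that the console is also configured to interpret those bytes as UTF-8, and that still has to be set in the IDE.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ConsoleEncodingDemo {

    /** Returns the bytes a PrintStream with the given charset emits for the text. */
    static byte[] encodeWith(String text, String charsetName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(buffer, true, charsetName);
            out.print(text);
            out.flush();
            return buffer.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String sentence = "caf\u00e9";

        // The accented letter takes two bytes in UTF-8 but one in Latin-1,
        // so a console expecting one encoding misreads the other.
        System.out.println(encodeWith(sentence, "UTF-8").length);
        System.out.println(encodeWith(sentence, "ISO-8859-1").length);

        // Write to standard output as UTF-8 regardless of the JVM default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(sentence);
    }
}
```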