Reading files with accented characters in Java

Posted 2024-11-04 10:43:32

I came across two special characters that do not seem to be covered by the ISO-8859-1 character set, i.e. they don't make it through to my program:

The German ß
and the Norwegian ø

I'm reading the files as follows:

FileInputStream inputFile = new FileInputStream(corpus[i]);
InputStreamReader ir = new InputStreamReader(inputFile, "ISO-8859-1") ;

Is there a way for me to read these characters without having to apply manual replacement as a workaround?

[EDIT]

This is how it looks on screen. Note that I have no problems with other accented characters, e.g. è and the like.

enter image description here

Comments (3)

暮年慕年 2024-11-11 10:43:32

Both characters are present in ISO-Latin-1 (check my name to see why I've looked into this).

If the characters are not read in correctly, the most likely cause is that the text in the file is not saved in that encoding, but in something else.

Depending on your operating system and the origin of the file, possible encodings could be UTF-8 or a Windows code page like 850 or 437.

The easiest way to find out is to look at the file with a hex editor and report back the exact byte values that are saved for these two characters.
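If no hex editor is at hand, the same check can be done from Java. The following is a minimal sketch, not part of the original answer: the class name `HexPeek` and its helpers are made up for illustration. It prints the bytes that ß and ø occupy in ISO-8859-1 and in UTF-8, and optionally dumps the first bytes of a file given on the command line so they can be compared by eye.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper class, named here only for illustration.
class HexPeek {
    // Format bytes as space-separated, upper-case hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Read at most `limit` raw bytes from the start of the file (no decoding).
    static byte[] head(Path file, int limit) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[limit];
            int n = 0;
            int r;
            while (n < limit && (r = in.read(buf, n, limit - n)) != -1) {
                n += r;
            }
            return Arrays.copyOf(buf, n);
        }
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep the example independent of the source file's own encoding.
        String eszett = "\u00DF"; // ß
        String oSlash = "\u00F8"; // ø
        System.out.println("ß ISO-8859-1: " + toHex(eszett.getBytes(StandardCharsets.ISO_8859_1))); // DF
        System.out.println("ß UTF-8:      " + toHex(eszett.getBytes(StandardCharsets.UTF_8)));      // C3 9F
        System.out.println("ø ISO-8859-1: " + toHex(oSlash.getBytes(StandardCharsets.ISO_8859_1))); // F8
        System.out.println("ø UTF-8:      " + toHex(oSlash.getBytes(StandardCharsets.UTF_8)));      // C3 B8
        if (args.length > 0) {
            System.out.println("file head:    " + toHex(head(Paths.get(args[0]), 64)));
        }
    }
}
```

If the file shows `DF` and `F8` where the two characters should be, it is ISO-8859-1 (or a compatible encoding) and the problem lies elsewhere; `C3 9F` and `C3 B8` point to UTF-8; other values suggest a different code page.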

梦里南柯 2024-11-11 10:43:32

ISO-8859-1 covers both ß and ø, so the file is probably saved in a different encoding. You should pass the file's actual encoding to new InputStreamReader().
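As a rough illustration of that point, the sketch below reads the same bytes twice with different charsets. The class `CharsetRead` and the method `readAll` are hypothetical names, and the temporary file stands in for the real corpus file, whose encoding is not known from the question. It uses the `InputStreamReader(InputStream, Charset)` constructor, which avoids the checked `UnsupportedEncodingException` of the string-based variant.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, named here only for illustration.
class CharsetRead {
    // Read the whole file, decoding its bytes with the given charset.
    static String readAll(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (InputStream in = Files.newInputStream(file);
             BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption for the demo: the file on disk was written as UTF-8.
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        try {
            Files.write(tmp, "Stra\u00DFe, \u00F8l".getBytes(StandardCharsets.UTF_8));
            // Matching charset: the text comes back intact.
            System.out.println(readAll(tmp, StandardCharsets.UTF_8));
            // Mismatched charset: each two-byte UTF-8 sequence turns into two wrong characters.
            System.out.println(readAll(tmp, StandardCharsets.ISO_8859_1));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The reader never fails in the mismatched case; it simply produces the wrong characters, which is why the encoding has to be known or verified rather than guessed.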

守护在此方 2024-11-11 10:43:32

Assuming that your file is probably UTF-8 encoded, try this:

InputStreamReader ir = new InputStreamReader(inputFile, "UTF-8");
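
One caveat with simply trying UTF-8: an InputStreamReader silently replaces byte sequences it cannot decode with U+FFFD instead of failing, so a wrong guess can go unnoticed. A possible way to test the assumption first is a strict decoder, sketched below. `Utf8Check` and `isValidUtf8` are hypothetical names introduced for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here only for illustration.
class Utf8Check {
    // Returns true if the bytes form well-formed UTF-8, false otherwise.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "Stra\u00DFe".getBytes(StandardCharsets.UTF_8);        // ß stored as C3 9F
        byte[] latin1 = "Stra\u00DFe".getBytes(StandardCharsets.ISO_8859_1); // ß stored as DF
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false: DF is not followed by a continuation byte
    }
}
```

A passing check does not prove the file is UTF-8 (pure ASCII text passes too), but a failing one rules UTF-8 out and suggests trying one of the other encodings mentioned above.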