Reading a file with accented characters in Java
I came across two special characters that seem not to be covered by the ISO-8859-1 character set, i.e. they don't make it through to my program:
The German ß
and the Norwegian ø
I'm reading the files as follows:
FileInputStream inputFile = new FileInputStream(corpus[i]);
InputStreamReader ir = new InputStreamReader(inputFile, "ISO-8859-1");
Is there a way for me to read these characters without having to apply manual replacement as a workaround?
[EDIT]
This is how it looks on screen. Note that I have no problems with other accents, e.g. è and the like.
Comments (3)
Both characters are present in ISO-Latin-1 (check my name to see why I've looked into this).
If the characters are not read in correctly, the most likely cause is that the text in the file is not saved in that encoding, but in something else.
Depending on your operating system and the origin of the file, possible encodings could be UTF-8 or a Windows code page like 850 or 437.
The easiest way to find out is to open the file in a hex editor and report back the exact byte values saved for these two characters.
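To know what to look for in the hex editor, it can help to print the bytes each candidate encoding would produce for ß and ø. The following is only a rough sketch: the class name `EncodingBytes` and the helper `hex` are made up for illustration, and the code page names `IBM850` and `IBM437` are assumed to be available in your JDK (they are skipped if not).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Encodes the text in the given charset and returns the bytes
    // as space-separated, two-digit uppercase hex values.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Candidate encodings mentioned in the answer above.
        String[] names = {"ISO-8859-1", "UTF-8", "IBM850", "IBM437"};
        // U+00DF is ß, U+00F8 is ø; escapes avoid depending on the source file encoding.
        String[] chars = {"\u00DF", "\u00F8"};
        for (String ch : chars) {
            for (String name : names) {
                if (!Charset.isSupported(name)) {
                    continue; // not every runtime ships the IBM code pages
                }
                String label = String.format("U+%04X", (int) ch.charAt(0));
                System.out.println(label + " in " + name + ": "
                        + hex(ch, Charset.forName(name)));
            }
        }
    }
}
```

In ISO-8859-1 each of the two characters is a single byte (DF for ß, F8 for ø), while in UTF-8 each becomes two bytes starting with C3. If the hex editor shows a C3 byte in front of these characters, the file is very likely UTF-8. Note that a character missing from a code page is encoded as a replacement byte, so a 3F in the output means "not representable", not a real match.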
ISO-8859-1 covers ß and ø, so the file is probably saved in a different encoding. You should pass the file's actual encoding to new InputStreamReader().

Assuming that your file is probably UTF-8 encoded, try this:
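The snippet that originally followed is not preserved here, so what follows is only a minimal sketch of the idea, assuming the file really is UTF-8. The class name `ReadUtf8` and the method `readAll` are hypothetical; the only change relative to the code in the question is the charset passed to `InputStreamReader`.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadUtf8 {
    // Reads the whole file at the given path, decoding its bytes as UTF-8.
    // Each line is returned terminated by a single '\n'.
    static String readAll(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ReadUtf8 <file>
        System.out.print(readAll(args[0]));
    }
}
```

If the characters still look wrong when printed, the console's own encoding may be the culprit rather than the reader, so it is worth checking the decoded string's code points (ß should be U+00DF, ø should be U+00F8) before changing the input side again.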