UTF-8 to String in Java
I am having a little problem with the UTF-8 charset. I have a UTF-8 encoded file which I want to load and analyze. I am using BufferedReader to read the file line by line.
BufferedReader buffReader = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), "UTF-8"));
My problem is that the normal String methods (trim() and equals(), for example) do not work as expected on the lines I read from the BufferedReader in each iteration of the loop that reads its whole content.
For example, in the encoded file I have < menu >
which I want my program to treat as is. For now, however, it is seen as ?? < m e n u >
mixed with some other strange characters.
I want to know if there is a way to remove all the charset encoding and keep just the plain text, so I can use all the methods of the String class without complications.
Thank you
If your JDK is not too old (1.5 or newer), you can do it like this:
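A minimal sketch of reading a UTF-8 file line by line with Scanner might look like the following. The class name `Utf8ScannerSketch`, the helper `readTrimmedLines`, and the temporary demo file are hypothetical and used only for illustration. It also assumes Java 7 or later for `Path` and try-with-resources; on JDK 1.5 the `Scanner(File, String)` constructor would be the closest equivalent.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class Utf8ScannerSketch {

    // Hypothetical helper: reads every line of a UTF-8 file and returns the trimmed lines.
    static List<String> readTrimmedLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        // The charset name is passed explicitly so the platform default is not used.
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine().trim());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Demo input written as UTF-8 to a temporary file (illustrative only).
        Path file = Files.createTempFile("utf8-sketch", ".txt");
        Files.write(file, "  < menu >  \ncaf\u00e9\n".getBytes(StandardCharsets.UTF_8));
        for (String line : readTrimmedLines(file)) {
            // Once the bytes are decoded correctly, equals() and trim() behave normally.
            System.out.println(line.equals("< menu >") ? "menu tag found" : "other line");
        }
        Files.delete(file);
    }
}
```

This only helps if the file really is UTF-8. If the charset name passed to the reader does not match the file's actual encoding, the decoded strings will still contain unexpected characters.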
The scanner can also use delimiters other than whitespace. This example reads several items in from a string:
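As a rough sketch of the delimiter idea, the snippet below tokenizes a string on a regular-expression delimiter with `Scanner.useDelimiter`. The class name `ScannerDelimiterSketch` and the helper `tokens` are hypothetical, and the input string is chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerDelimiterSketch {

    // Hypothetical helper: splits the input on a delimiter regex and collects the tokens.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input).useDelimiter(delimiterRegex)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The word "fish", with any surrounding whitespace, acts as the delimiter.
        System.out.println(tokens("1 fish 2 fish red fish blue fish", "\\s*fish\\s*"));
        // prints [1, 2, red, blue]
    }
}
```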
See the documentation for java.util.Scanner.