UTF-8 to String in Java

Posted 11-04 10:02 · 518 words · 8 views · 0 comments

I am having a little problem with the UTF-8 charset. I have a UTF-8 encoded file which I want to load and analyze. I am using BufferedReader to read the file line by line.

BufferedReader buffReader = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), "UTF-8"));

My problem is that the normal String methods in Java (trim() and equals(), for example) do not work properly on the lines read from the BufferedReader in each iteration of the loop I wrote to read all of its content.
For example, the encoded file contains < menu >, which I want my program to treat as is; however, for now it is seen as ?? < m e n u >, mixed with some other strange characters.
I want to know if there is a way to remove all the charset encoding and keep just the plain text, so that I can use all the methods of the String class without complications.
Thank you

野却迷人 2024-11-11 10:02:23

If your JDK is not too old (1.5 or later), you can do it like this:

Locale frLocale = new Locale("fr", "FR");
Scanner scanner = new Scanner(new FileInputStream(file), "UTF-8");
scanner.useLocale(frLocale);

// Read the file line by line, counting lines as we go.
String line;
int numLine = 0;
for (; scanner.hasNextLine(); numLine++) {
    line = scanner.nextLine();
}
scanner.close();
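
For readers who want something runnable, here is a minimal self-contained sketch of the same idea. It assumes Java 7 or later (for try-with-resources and java.nio.file.Files), and the class name Utf8LineReader and method readTrimmedLines are hypothetical names chosen for illustration. It also assumes that the stray leading characters might be a UTF-8 byte order mark (U+FEFF); the question does not confirm this. Java's UTF-8 decoder does not remove the mark and trim() does not strip it, so the sketch drops it explicitly before comparing strings.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class Utf8LineReader {

    // Reads every line of a file decoded as UTF-8, drops an optional
    // leading byte order mark (an assumed cause, not confirmed by the
    // question), and trims surrounding whitespace from each line.
    static List<String> readTrimmedLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (InputStream in = Files.newInputStream(path);
             Scanner scanner = new Scanner(in, "UTF-8")) {
            boolean first = true;
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                // U+FEFF can only appear as a BOM at the very start of the file.
                if (first && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                first = false;
                lines.add(line.trim());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Build a small UTF-8 test file that starts with a BOM.
        Path tmp = Files.createTempFile("menu", ".txt");
        Files.write(tmp, "\uFEFF  <menu>  \nsecond line\n".getBytes(StandardCharsets.UTF_8));
        for (String line : readTrimmedLines(tmp)) {
            // equals() behaves as expected once the BOM and whitespace are gone.
            System.out.println(line.equals("<menu>") + " " + line);
        }
        Files.delete(tmp);
    }
}
```

If the output still shows characters separated by gaps, the file may not be UTF-8 at all (UTF-16 is one possibility), in which case the charset name passed to Scanner would need to match the file's real encoding.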

The scanner can also use delimiters other than whitespace. This example reads several items in from a string:

         String input = "1 fish 2 fish red fish blue fish";
         Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
         System.out.println(s.nextInt());
         System.out.println(s.nextInt());
         System.out.println(s.next());
         System.out.println(s.next());
         s.close(); 

prints the following output:

         1
         2
         red
         blue

See the documentation for Scanner.
