Setting UTF-8 in Java and CSV files

Posted on 2024-10-02 06:58:53

I am using this code to add Persian words to a CSV file via OpenCSV:

String[] entries="\u0645 \u062E\u062F\u0627".split("#");
try{
    CSVWriter writer=new CSVWriter(new OutputStreamWriter(new FileOutputStream("C:\\test.csv"), "UTF-8"));

    writer.writeNext(entries);
    writer.close();
}
catch(IOException ioe){
    ioe.printStackTrace();
}

When I open the resulting CSV file in Excel, it contains "ứỶờịỆ". Other programs such as notepad.exe don't have this problem, but all of my users use MS Excel.

Replacing OpenCSV with SuperCSV does not solve this problem.

When I type Persian characters into a CSV file manually, I don't have any problems.

Comments (3)

私藏温柔 2024-10-09 06:58:53

It took me some time, but I found a solution to your problem.

First, I opened Notepad and wrote the following line: שלום, hello, привет
Then I saved it as the file he-en-ru.csv using UTF-8.
Then I opened it with MS Excel and everything worked well.

Next, I wrote a simple Java program that prints this line to a file as follows:

    PrintWriter w = new PrintWriter(new OutputStreamWriter(os, "UTF-8"));
    w.print(line);
    w.flush();
    w.close();

When I opened this file using Excel, I saw gibberish.

Then I read the raw content of the two files and (as expected) saw that the file generated by Notepad starts with a 3-byte prefix:

    239 (0xEF)
    187 (0xBB)
    191 (0xBF)
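
The check described above can be sketched in a few lines of Java. This is only an illustration, not the answerer's actual code: the class name BomCheck and the method hasUtf8Bom are made up for this example, and it assumes Java 11 or later for InputStream.readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reports whether a file starts with the UTF-8 BOM.
final class BomCheck {

    /** Returns true if the file begins with the bytes EF BB BF. */
    static boolean hasUtf8Bom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            // Reads up to 3 bytes; shorter files yield a shorter array.
            byte[] head = in.readNBytes(3);
            return head.length == 3
                    && (head[0] & 0xFF) == 0xEF
                    && (head[1] & 0xFF) == 0xBB
                    && (head[2] & 0xFF) == 0xBF;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass one or more file names on the command line.
        for (String name : args) {
            boolean bom = hasUtf8Bom(Paths.get(name));
            System.out.println(name + ": " + (bom ? "starts with UTF-8 BOM" : "no UTF-8 BOM"));
        }
    }
}
```

Running it against the Notepad file and the Java-generated file should show which of the two carries the prefix.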

So, I modified my code to print this prefix first and the text after that:

    String line = "שלום, hello, привет";
    OutputStream os = new FileOutputStream("c:/temp/j.csv");
    // Write the UTF-8 byte order mark (EF BB BF) before any text.
    os.write(239);
    os.write(187);
    os.write(191);

    PrintWriter w = new PrintWriter(new OutputStreamWriter(os, "UTF-8"));

    w.print(line);
    w.flush();
    w.close();

And it worked! I opened the file using Excel and saw the text I expected.

Bottom line: write these 3 bytes before writing the content. This prefix is the UTF-8 byte order mark (BOM); it marks the content as 'UTF-8 with BOM' (otherwise it is just 'UTF-8 without BOM').
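
For reference, the same idea can be written with try-with-resources and StandardCharsets. This is a sketch using only the standard library; the class name BomCsvWriter, the method writeWithBom, and the default file name bom-test.csv are invented for illustration. The Writer created here could presumably be handed to the CSVWriter constructor from the question instead of being written to directly, but that is not tested here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: writes text as UTF-8 preceded by the BOM.
final class BomCsvWriter {

    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static void writeWithBom(Path file, String content) throws IOException {
        try (OutputStream os = Files.newOutputStream(file);
             Writer w = new OutputStreamWriter(os, StandardCharsets.UTF_8)) {
            // The BOM goes straight to the stream, before the writer emits anything.
            os.write(UTF8_BOM);
            w.write(content);
        } // closing the writer flushes the text after the BOM
    }

    public static void main(String[] args) throws IOException {
        // Output path is taken from the first argument, else a default name.
        Path out = Paths.get(args.length > 0 ? args[0] : "bom-test.csv");
        writeWithBom(out, "\u0645 \u062E\u062F\u0627\n");
    }
}
```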

最笨的告白 2024-10-09 06:58:53

Unfortunately, CSV is a very ad hoc format with no metadata and no real standard that would mandate a flexible encoding. As long as you use CSV, you can't reliably use any characters outside of ASCII.

Your alternatives:

  • Write to XML (which does have encoding metadata if you do it right) and have the users import the XML into Excel.
  • Use Apache POI to create actual Excel documents.
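
As a rough illustration of the XML option, the sketch below writes cell values into an XML document whose prolog declares its encoding, using the JDK's StAX API. The element names rows, row, and cell and the class name XmlExport are arbitrary choices for this example; how Excel maps such a file on import is not covered here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: serializes one row of cells as UTF-8 encoded XML.
final class XmlExport {

    static byte[] toXml(String[] cells) throws XMLStreamException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        XMLStreamWriter xml =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        // The XML declaration carries the encoding, which plain CSV cannot do.
        xml.writeStartDocument("UTF-8", "1.0");
        xml.writeStartElement("rows");
        xml.writeStartElement("row");
        for (String cell : cells) {
            xml.writeStartElement("cell");
            xml.writeCharacters(cell); // escapes <, > and & as needed
            xml.writeEndElement();
        }
        xml.writeEndElement(); // row
        xml.writeEndElement(); // rows
        xml.writeEndDocument();
        xml.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws XMLStreamException {
        byte[] bytes = toXml(new String[] {"\u0645 \u062E\u062F\u0627", "hello"});
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```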

悲凉≈ 2024-10-09 06:58:53

Excel doesn't use UTF-8 to open CSV files. That's a known problem. The actual encoding used depends on the locale settings of Microsoft Windows. With a German locale, for example, Excel would open a CSV file with CP1252.

You could create an Excel file containing some Persian characters and save it as a CSV file. Then write a small Java program to read this file and test some common encodings. That's how I figured out the correct encoding for German umlauts in CSV files.
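
Such a test program might look like the sketch below. It reads the raw bytes of a file and decodes them with several candidate charsets so the results can be compared by eye. The class name EncodingProbe, the method decodeWith, and the list of candidates are assumptions made for this example; windows-1256 is included as a guess for Persian text and is skipped if the JVM does not support it.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: decodes the same bytes with several charsets.
final class EncodingProbe {

    /** Maps each supported charset name to the text it produces for the data. */
    static Map<String, String> decodeWith(byte[] data, List<String> charsetNames) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String name : charsetNames) {
            if (Charset.isSupported(name)) { // skip charsets this JVM lacks
                results.put(name, new String(data, Charset.forName(name)));
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path of the CSV file that Excel saved.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        List<String> candidates =
                Arrays.asList("UTF-8", "UTF-16LE", "windows-1252", "windows-1256");
        decodeWith(data, candidates)
                .forEach((name, text) -> System.out.println(name + " -> " + text));
    }
}
```

The charset whose output shows the original characters intact is a likely candidate for what Excel used when saving the file.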
