Open an xls file with Java and save it as a tsv file, plus UTF-16LE to UTF-8 conversion

Posted 2025-01-07 15:14:17 · 156 words · 3 views · 0 comments


I have two questions:

Is there a way through which we can open an xls file and save it as a tsv file through Java?
EDIT:
Or is there a way through which we can convert an xls file into a tsv file through Java?

Is there a way in which we can convert a UTF-16LE file to UTF-8 using Java?

Thank you


对不⑦ 2025-01-14 15:14:17


I've two questions:

On StackOverflow you should split that into two different questions...

I'll answer your second question:

Is there a way in which we can convert a UTF-16LE file to UTF-8 using Java?

Yes of course. And there's more than one way.

Basically you want to read your input file specifying the input encoding (UTF-16LE) and then write the file specifying the output encoding (UTF-8).

Say you have some UTF-16LE encoded file:

$ file testInput.txt 
testInput.txt: Little-endian UTF-16 Unicode character data

You then basically could do something like this in Java (it's just an example: you'll still want to decide how to handle or propagate IOException, maybe discard the BOM if any, etc.):

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.nio.charset.StandardCharsets;

    // Read as UTF-16LE, write as UTF-8; try-with-resources flushes and
    // closes both streams, even if an IOException is thrown.
    try (BufferedReader br = new BufferedReader(new InputStreamReader(
                 new FileInputStream("/home/.../testInput.txt"), StandardCharsets.UTF_16LE));
         BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
                 new FileOutputStream("/home/.../testOutput.txt"), StandardCharsets.UTF_8))) {
        String line;
        boolean first = true;
        while ((line = br.readLine()) != null) {
            if (!first) {
                bw.newLine();   // newline between lines only, so no extra one at the end
            }
            bw.write(line);
            first = false;
        }
    }

This creates a UTF-8 encoded file:

$ file testOutput.txt 
testOutput.txt: UTF-8 Unicode (with BOM) text

The BOM can clearly be seen using, for example, hexdump:

 $ hexdump testOutput.txt -C
00000000  ef bb bf ... (snip)

The BOM is encoded on three bytes in UTF-8 (ef bb bf), while it's encoded on two bytes in UTF-16. In UTF-16LE the BOM looks like this:

$ hexdump testInput.txt -C
00000000  ff fe ... (snip)

Note that UTF-8 encoded files may or may not have a "BOM" (byte order mark); both are totally valid. A BOM in a UTF-8 file is not that silly: you don't care about byte order, but it can help quickly identify a text file as being UTF-8 encoded. UTF-8 files with a BOM are fully legit according to the Unicode specs, and hence readers unable to deal with UTF-8 files starting with a BOM are broken. Plain and simple.

If for whatever reason you're working with broken UTF-8 readers unable to cope with BOMs, then you may want to remove the BOM from the first String before writing it to disk.
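Stripping the BOM is simple once the text is decoded: it shows up as the single character U+FEFF at the start of the first line read. A minimal sketch (the variable name and sample string are just for illustration):

```java
// The BOM, once decoded, is the character U+FEFF at the start of the text.
// Drop it from the first line before writing anything out.
String firstLine = "\uFEFFsome text";  // e.g. the first line read from the file
if (!firstLine.isEmpty() && firstLine.charAt(0) == '\uFEFF') {
    firstLine = firstLine.substring(1);
}
System.out.println(firstLine);  // prints "some text", BOM removed
```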

More info on BOMs here:

http://unicode.org/faq/utf_bom.html
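As noted above, there's more than one way: with java.nio.file (Java 7+) the whole file can be re-encoded in a couple of lines. This sketch reads the entire file into memory, so it only suits reasonably small files; the paths are the same placeholders as above:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

Path in  = Paths.get("/home/.../testInput.txt");   // placeholder path
Path out = Paths.get("/home/.../testOutput.txt");  // placeholder path
// Decode the raw bytes as UTF-16LE, then write the text back out as UTF-8.
// (A BOM character, if present, passes through unchanged.)
String content = new String(Files.readAllBytes(in), StandardCharsets.UTF_16LE);
Files.write(out, content.getBytes(StandardCharsets.UTF_8));
```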

情域 2025-01-14 15:14:17


There is a library called jexcelapi that allows you to open/edit/save .xls files.
Once you have read the .xls file it would not be hard to write something that would output it as .tsv.
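A rough sketch of how that could look with jexcelapi (the `jxl` package); the file names are placeholders and exception handling is omitted:

```java
import java.io.File;
import java.io.PrintWriter;
import jxl.Sheet;
import jxl.Workbook;

Workbook workbook = Workbook.getWorkbook(new File("input.xls"));  // placeholder name
Sheet sheet = workbook.getSheet(0);                               // first sheet
try (PrintWriter out = new PrintWriter("output.tsv", "UTF-8")) {  // placeholder name
    for (int row = 0; row < sheet.getRows(); row++) {
        StringBuilder line = new StringBuilder();
        for (int col = 0; col < sheet.getColumns(); col++) {
            if (col > 0) line.append('\t');                       // tab between cells
            line.append(sheet.getCell(col, row).getContents());   // cell text as String
        }
        out.println(line);
    }
}
workbook.close();
```

Note that getContents() gives the cell's displayed text; cells that themselves contain tabs or newlines would need extra escaping to produce well-formed TSV.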
