Open an xls file and save it as a tsv file using Java, and convert UTF-16LE to UTF-8
I have two questions:
Is there a way to open an xls file and save it as a tsv file from Java?
EDIT:
Or is there a way to convert an xls file into a tsv file from Java?
Is there a way to convert a UTF-16LE file to UTF-8 using Java?
Thank you
Comments (2)
On StackOverflow you should split that into two different questions...
I'll answer your second question:
Yes, of course. And there's more than one way.
Basically you want to read your input file specifying the input encoding (UTF-16LE) and then write the file specifying the output encoding (UTF-8).
Say you have some UTF-16LE encoded file.
You then basically could do something like this in Java (it's just an example: you'll want to fill in missing exception handling code, maybe not put a last newline at the end, maybe discard the BOM if any, etc.):
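A minimal sketch of that read-then-write approach, not a definitive implementation. The class name `Utf16LeToUtf8` and the method `convert` are made up for illustration, and the input and output paths are whatever the caller passes in. It copies line by line and writes `\n` after every line, so it does add a final newline, and it does not discard a leading BOM:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class Utf16LeToUtf8 {

    // Reads `in` as UTF-16LE text and writes the same characters to `out` as UTF-8.
    // A leading BOM (U+FEFF) in the input is treated as an ordinary character
    // and is therefore copied to the output as well.
    static void convert(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_16LE);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.write('\n'); // always terminates the line, including the last one
            }
        }
    }

    // Usage: java Utf16LeToUtf8 <input file> <output file>
    public static void main(String[] args) throws IOException {
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The try-with-resources block closes both files even if reading or writing fails; any `IOException` (including one caused by malformed input) is simply propagated to the caller.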
This creates a UTF-8 encoded file.
If the input started with a BOM, it can clearly be seen in the output using, for example, hexdump.
The BOM is encoded on three bytes in UTF-8 (ef bb bf), while it's encoded on two bytes in UTF-16. In UTF-16LE the BOM is the two bytes ff fe.
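If hexdump is not at hand, the same byte sequences can be checked from Java by encoding the BOM character (U+FEFF) in each charset. This is only an illustrative sketch; the class name `BomBytes` and the helper `hex` are hypothetical:

```java
import java.nio.charset.StandardCharsets;

class BomBytes {

    // Formats bytes as lowercase, space-separated hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bom = "\uFEFF";
        System.out.println(hex(bom.getBytes(StandardCharsets.UTF_8)));    // ef bb bf
        System.out.println(hex(bom.getBytes(StandardCharsets.UTF_16LE))); // ff fe
    }
}
```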
Note that UTF-8 encoded files may or may not have a "BOM" (byte order mark); both are totally valid. A BOM in a UTF-8 file is not that silly: you don't care about the byte order, but it can help quickly identify a text file as being UTF-8 encoded. UTF-8 files with a BOM are fully legitimate according to the Unicode specs, and hence readers unable to deal with UTF-8 files starting with a BOM are broken. Plain and simple.
If for whatever reason you're working with broken UTF-8 readers unable to cope with BOMs, then you may want to remove the BOM from the first String before writing it to disk.
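Removing the BOM could look roughly like this, assuming the text has already been decoded into a Java `String` so that the BOM shows up as the single character U+FEFF at index 0. The names `BomStripper` and `stripBom` are hypothetical:

```java
class BomStripper {

    // Returns `s` without a leading U+FEFF; any other string is returned unchanged.
    static String stripBom(String s) {
        if (s != null && !s.isEmpty() && s.charAt(0) == '\uFEFF') {
            return s.substring(1);
        }
        return s;
    }
}
```

Only the first string read from the file needs this check, since a U+FEFF anywhere else is not a byte order mark.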
More information on BOMs here:
http://unicode.org/faq/utf_bom.html
There is a library called jexcelapi that allows you to open/edit/save .xls files.
Once you have read the .xls file it would not be hard to write something that would output it as .tsv.
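The output half might look something like the sketch below. It deliberately leaves out the jexcelapi part and assumes the cell texts have already been read into lists of strings, one list per row. The names `TsvWriter`, `clean` and `toTsv` are hypothetical. Replacing tabs and line breaks inside a cell with a space is also just an assumption, since TSV has no single agreed escaping rule:

```java
import java.util.List;

class TsvWriter {

    // Makes a cell safe for TSV by collapsing tabs and line breaks into one space.
    // Null cells become empty strings.
    static String clean(String cell) {
        return cell == null ? "" : cell.replaceAll("[\\t\\r\\n]+", " ");
    }

    // Joins each row's cells with tabs and terminates every row with '\n'.
    static String toTsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    sb.append('\t');
                }
                sb.append(clean(row.get(i)));
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

The resulting string can then be written to disk with an explicit encoding, for example UTF-8.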