Character encoding in Excel spreadsheets (and which Java charset to use to decode it)

Posted on 2024-12-04 21:17:47

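I'm using the JExcel library to read Excel spreadsheets. Each cell in a spreadsheet may contain a localized string in any one of 44 languages (English, Portuguese, French, Chinese, and so on). Today I don't tell the API anything about the encoding it should use. It handles Chinese fine, but it always mangles Portuguese and German. Somehow the default encoding (MacRoman on my dev box, UTF-8 in production) fails to correctly interpret the strings extracted from the Excel workbook. There must be something wrong with the way JExcel interprets the file's character encoding.

That being said...

Are all strings in an Excel workbook encoded with the same character set?

Is there workbook metadata I can query to find out what that character set is (I haven't found any yet)?

If I run all of the cells through something like jchardet (http://jchardet.sourceforge.net/), is it likely to be able to predict the character encoding for the whole workbook (this depends largely on the answer to the first question being "yes, all strings in a given workbook are encoded using the same character set")?

So many questions, so little time.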

紫﹏色ふ单纯 2024-12-11 21:17:47

Well, I didn't get an answer directly, but Matt's discovery of a spec points the way towards an actual answer: http://sc.openoffice.org/excelfileformat.pdf

In the meantime, my problem went away once I set the encoding to always be "Cp1252". I'm not sure exactly why, but I'm not looking a gift horse in the mouth, so to speak, and am moving on.

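    import jxl.Workbook;
    import jxl.WorkbookSettings;
    // Tell JExcel to use Cp1252 rather than the platform default encoding.
    WorkbookSettings workbookSettings = new WorkbookSettings();
    workbookSettings.setEncoding("Cp1252");
    Workbook workbook = Workbook.getWorkbook(theFile, workbookSettings);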

I'll call this one answered.
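
The answer leaves the "why" open. One plausible explanation, offered as an assumption and not confirmed anywhere in this thread, is that accented Western European text is stored in the workbook as one byte per character, so decoding those bytes with the platform default (MacRoman or UTF-8) mangles them, while Chinese text is stored as 16-bit characters and is unaffected. The sketch below does not read an Excel file. It only illustrates the general effect of decoding the same bytes with different charsets. `DecodeDemo` and `decode` are hypothetical names, and the byte values are hand-picked for illustration.

```java
import java.nio.charset.Charset;

public class DecodeDemo {
    // Decode the same raw bytes under the named charset.
    // Malformed input is replaced with U+FFFD rather than throwing.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The Portuguese word "acao" written with c-cedilla (U+00E7) and
        // a-tilde (U+00E3), encoded as Cp1252: one byte per character.
        byte[] raw = {(byte) 0x61, (byte) 0xE7, (byte) 0xE3, (byte) 0x6F};

        // Decoding with the charset the bytes were written in recovers the text.
        String good = decode(raw, "Cp1252");

        // Decoding as UTF-8 fails on the two accented bytes, because
        // 0xE7 and 0xE3 are not valid stand-alone UTF-8 sequences.
        String bad = decode(raw, "UTF-8");

        System.out.println("Cp1252 round-trips: " + good.equals("a\u00e7\u00e3o"));
        System.out.println("UTF-8 contains U+FFFD: " + (bad.indexOf('\uFFFD') >= 0));
        System.out.println("JVM default charset: " + Charset.defaultCharset());
    }
}
```

The last line prints the charset a library would fall back to when no encoding is configured, which would account for the same workbook behaving differently on a MacRoman dev box and a UTF-8 production server.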
