How to parse UTF-8 characters in an Excel file using POI

Posted on 2025-01-03 23:46:30, 875 characters, 1 view, 0 comments

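I have been using POI to parse XLS and XLSX files successfully. However, I cannot correctly extract special characters, such as Chinese or Japanese text, from an Excel spreadsheet. I have figured out how to extract data from a UTF-8 encoded CSV or tab-delimited file, but have had no luck with Excel files. Can anyone help?

(Edit: code snippet from the comments)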

HSSFSheet sheet = workbook.getSheet(worksheet); 
HSSFEvaluationWorkbook ewb = HSSFEvaluationWorkbook.create(workbook); 
while (rowCtr <= lastRow && !rowBreakOut) 
{ 
    Row row = sheet.getRow(rowCtr);//rows.next(); 
    for (int col=firstCell; col<lastCell && !breakOut; col++) { 
      Cell cell; 
      cell = row.getCell(col,Row.RETURN_BLANK_AS_NULL); 
      if (ctype == Cell.CELL_TYPE_STRING) { 
         sValue = cell.getStringCellValue(); 
         log.warn("String value = "+sValue); 
         String encoded = URLEncoder.encode(sValue, "UTF-8"); 
         log.warn("URL-encoded with UTF-8: " + encoded); 
         ....


酒儿 2025-01-10 23:46:30

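I ran into the same problem when extracting Persian text from an Excel file. I was using Eclipse, and simply going to Project -> Properties and changing "Text file encoding" to UTF-8 solved it.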
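If the text only looks wrong once it is printed or logged, the cause may be the output encoding rather than POI, which could be why the Eclipse setting helps. As a minimal sketch for running outside Eclipse, output can be wrapped in a `PrintStream` with an explicit UTF-8 encoding. The class and method names (`Utf8Console`, `utf8Stream`) are hypothetical and not part of POI:

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class Utf8Console {
    /** Wraps a stream in a PrintStream that always encodes text as UTF-8. */
    static PrintStream utf8Stream(OutputStream out) {
        try {
            // autoFlush = true, explicit encoding instead of the platform default
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // The terminal itself must also be set to UTF-8 for this to display correctly.
        out.println("سلام");
    }
}
```

Whether this helps depends on the terminal or log viewer also interpreting the bytes as UTF-8.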


森林迷了鹿 2025-01-10 23:46:30

In POI you can do it like this:

Workbook wb = new HSSFWorkbook();
Sheet sheet = wb.createSheet("new sheet");

// Create a row and put some cells in it. Rows are 0 based.
Row row = sheet.createRow(1);

// Create a new font and alter it.
Font font = wb.createFont();
font.setCharSet(FontCharset.ARABIC.getValue());
font.setFontHeightInPoints((short)24);
font.setFontName("B Nazanin");
font.setItalic(true);
font.setStrikeout(true);

// Fonts are set into a style so create a new one to use.
CellStyle style = wb.createCellStyle();
style.setFont(font);

// Create a cell and put a value in it.
Cell cell = row.createCell(1);
cell.setCellValue("سلام");
cell.setCellStyle(style);

// Write the output to a file; try-with-resources closes the stream even if write() fails.
try (FileOutputStream fileOut = new FileOutputStream("workbook.xls")) {
    wb.write(fileOut);
}

You can also use another character set from FontCharset.


两仪 2025-01-10 23:46:30

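The solution is simple. To read cell string values that contain non-English characters, use:

sValue = cell.getRichStringCellValue().getString();

instead of:

sValue = cell.getStringCellValue();

This works for characters such as Chinese, Arabic, or Japanese.

P.S. If you use the command-line utility nullpunkt/excel-to-json, which is built on the Apache POI library, modify converter/ExcelToJsonConverter.java by replacing the occurrences of "getStringCellValue()" so that non-English characters are not read as "???".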
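The "???" symptom is often a sign that the string was encoded somewhere with a charset that cannot represent the characters. A minimal sketch using only the JDK (the class and method names, `QuestionMarkDemo` and `roundTrip`, are hypothetical) shows how that replacement happens:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class QuestionMarkDemo {
    /** Encodes text with the given charset, then decodes it with the same charset. */
    static String roundTrip(String text, Charset charset) {
        // getBytes replaces characters the charset cannot represent with '?'
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "中文";
        // UTF-8 can represent the text, so it survives unchanged
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));
        // US-ASCII cannot, so each character comes back as '?'
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII));
    }
}
```

If a value read from a cell already prints as question marks, checking each place where it is converted to bytes (log appenders, file writers, console) with a round trip like this can help locate the lossy step.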


满天都是小星星 2025-01-10 23:46:30

Get the bytes as UTF-8 like this:

cell.getStringCellValue().getBytes(Charset.forName("UTF-8"));
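To check what those bytes actually contain, they can be dumped as hex. The following is a small sketch, assuming the string has already been read from the cell; `Utf8Bytes` and `toUtf8Hex` are hypothetical names, and `StandardCharsets.UTF_8` is used in place of `Charset.forName("UTF-8")` since it avoids the name lookup:

```java
import java.nio.charset.StandardCharsets;

final class Utf8Bytes {
    /** Returns the UTF-8 encoding of the text as space-separated hex bytes. */
    static String toUtf8Hex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            // Mask to 0..255 so negative bytes print as two hex digits
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // In real code the argument would be cell.getStringCellValue()
        System.out.println(toUtf8Hex("中"));
    }
}
```

If the dump shows `3F` (the byte for `?`) where a multi-byte sequence is expected, the characters were lost before this point.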
