Reading a string value from Excel with HSSF, but it's a double
I'm using HSSF-POI to read Excel data. The problem is that I have values in a cell that look like numbers but really are strings. If I look at Format Cells in Excel, it says the type is "Text". Still, the HSSF Cell thinks it's numeric. How can I get the value as a string?
If I try to use cell.getRichStringValue, I get an exception; if I use cell.toString, it's not the exact same value as in the Excel sheet.
Edit: until this gets resolved, I'll use
new BigDecimal(cell.getNumericCellValue()).toString()
Comments (8)
The class you're looking for in POI is DataFormatter
When Excel writes the file, some cells are stored as literal Strings, while others are stored as numbers. For the latter, a floating point value representing the cell is stored in the file, so when you ask POI for the value of the cell that's what it actually has.
Sometimes though, especially when doing Text Extraction (but not always), you want to make the cell value look like it does in Excel. It isn't always possible to get that exactly in a String (non full space padding for example), but the DataFormatter class will get you close.
If you're after a String of the cell, looking much as it did in Excel, create a DataFormatter and call formatCellValue(cell) on it.
The formatter will return String cells as-is, and for Numeric cells it will apply the formatting rules from the cell's style to the cell's number.
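To illustrate why the stored floating point value and the text shown in Excel can differ, here is a minimal sketch that uses only the Java standard library. It does not call POI: java.text.DecimalFormat stands in for the idea of applying a cell's number format to the stored double, and the class name, method names, and patterns are made up for the example.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class StoredDoubleDemo {

    // What a naive string conversion of the stored double produces.
    public static String raw(double stored) {
        return String.valueOf(stored);
    }

    // Applies a display pattern to the stored double
    // (a stand-in for the number format attached to a cell's style).
    public static String formatted(double stored, String pattern) {
        DecimalFormat format =
                new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        return format.format(stored);
    }

    public static void main(String[] args) {
        double stored = 12345678.0;
        System.out.println(raw(stored));                   // 1.2345678E7
        System.out.println(formatted(stored, "0"));        // 12345678
        System.out.println(formatted(stored, "#,##0.00")); // 12,345,678.00
    }
}
```

The same double yields three different strings, which is why a formatter that knows the cell's format gets closer to what Excel displays than a plain string conversion does.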
If the documents you are parsing are always in a specific layout, you can change the cell type to "string" on the fly and then retrieve the value. For example, if column 2 should always be string data, set its cell type to string and then read it with the string-type get methods.
In my testing, changing the cell type did not modify the contents of the cell, but did allow it to be retrieved with either of the string-type getters, getStringCellValue() or getRichStringCellValue().
Without an example of a value that is not converting properly, it is difficult to know whether this will behave any differently from the cell.toString() approach you described in the question.
You mean HSSF-POI says cell.getCellType() == Cell.CELL_TYPE_NUMERIC and not Cell.CELL_TYPE_STRING, as it should be? I would think it's a bug in POI, but every cell contains a Variant, and a Variant has a type. It's kind of hard to introduce a bug there, so instead I think Excel uses some extra data or a heuristic to report the field as text. The usual MS way, alas.
P.S. You cannot use any getString() on a Variant containing a numeric value, as the binary representation of the Variant data depends on its type, and trying to get a string from what is actually a number would result in garbage, hence the exception.
The code below works fine for reading any cell type, provided the cell contains a numeric value,
e.g.
where the variable gss is of type BigDecimal.
Excel will convert any string that looks like a number, date, or time into that type. See the MS Knowledge Base article, which basically suggests entering the number with an extra character that makes it a string.
You are probably dealing with an Excel problem. When you create the spreadsheet, the default cell format is General. With this format, Excel guesses the type based on the input, and this type is saved with each cell.
When you later change the cell format to Text, you are just changing the default. Excel doesn't change every cell's type automatically. I haven't found a way to do this automatically.
To confirm this, you can go to Excel and retype one of the numbers and see if it's text in HSSF.
You can also look at the real cell type by using the worksheet function =CELL("type", A1), where A1 is the cell holding the number. It shows "l" for text and "v" for numbers.
The problem with Excel is that the default format is General. With this format, Excel stores numbers entered in a cell as numeric values. You have to change the format to Text before entering the values. Re-entering the values after changing the format will also work.
Doing so leads to little green triangles in the upper-left corner of the cells if the content looks like a number to Excel. If this is the case, the value is really stored as text.
With new BigDecimal(cell.getNumericCellValue()).toString() you will still have a lot of problems. For example if you have identifying numbers (e.g. part numbers or classification numbers) you probably have cases that have leading zeros which will be a problem with the getNumericCellValue() approach.
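The two problems with the BigDecimal workaround can be reproduced without a spreadsheet. The sketch below is only an illustration under assumptions: Double.parseDouble simulates Excel keeping a typed-in value as a number, and the class and method names are invented for the example.

```java
import java.math.BigDecimal;

class NumericCellPitfalls {

    // The workaround from the question: new BigDecimal(double) yields the exact
    // decimal expansion of the binary double, not the digits that were typed.
    public static String viaNewBigDecimal(double stored) {
        return new BigDecimal(stored).toString();
    }

    public static void main(String[] args) {
        // A part number typed as "00123" into a numeric cell is kept as 123.0
        // (simulated here; no Excel file is read).
        double partNumber = Double.parseDouble("00123");
        System.out.println(viaNewBigDecimal(partNumber)); // 123

        // A fractional value exposes the binary representation.
        System.out.println(viaNewBigDecimal(0.1));
        // 0.1000000000000000055511151231257827021181583404541015625
    }
}
```

Once the value has been stored as a number, the leading zeros no longer exist anywhere in the data, so no string conversion on the reading side can bring them back unless the expected width is known in advance.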
I try to thoroughly explain how to create the Excel file correctly to the party producing the files I have to handle with POI. If the files are uploaded by end users and I know the columns in advance, I have even created a validation program that checks for the expected cell types. As a by-product, you can also check various other aspects of the supplied files (e.g. whether the right columns or mandatory values are provided).
"The problem is I have values in a cell that look like a number" => look like number when viewed in Excel?
"but really are strings" => what does that mean? How do you KNOW that they really are strings?
"If I look at the format cell" => what's "the format cell"???
'... in Excel, it says the type is "text"' => Please explain.
"Still the HSSF Cell thinks it's numeric." => do you mean that the_cell.getCellType() returns Cell.CELL_TYPE_NUMERIC?
"How can I get the value as a string?" => if it's NUMERIC, get the numeric value using the_cell.getNumericCellValue(), then format it as a string any way you want to.
"If I try to use cell.getRichStringValue, I get exception;" => so it's not a string.
"if cell.toString, it's not the exact same value as in Excel sheet." => so cell.toString() doesn't format it the way that Excel formats it.
Whatever heuristic Excel uses to determine type is irrelevant to you. It's the RESULT of that decision as stored in the file and revealed by getCellType() that matters.
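As one possible reading of "format it as a string any way you want to", here is a small sketch that turns a numeric value into plain digits, with no exponent and no trailing ".0". The helper name plain is hypothetical, and the input is assumed to be the double returned by getNumericCellValue(); POI itself is not used in the sketch.

```java
import java.math.BigDecimal;

class NumericToText {

    // Renders a double without exponent notation and without a trailing ".0".
    // Assumes a finite value: NaN and infinities make BigDecimal.valueOf throw
    // NumberFormatException.
    public static String plain(double value) {
        return BigDecimal.valueOf(value).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(plain(42.0));          // 42
        System.out.println(plain(12345678901.0)); // 12345678901
        System.out.println(plain(1.5));           // 1.5
    }
}
```

BigDecimal.valueOf goes through Double.toString, so it keeps the shortest decimal form of the double rather than its full binary expansion.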