Parsing a large xlsx file, date format case
Because an xlsx file may contain millions of rows, I decided to use the POI event model, starting from the ExampleEventUserModel example, but I ran into a problem parsing date-formatted cells.
For example, cell A15 shows the date 01.10.2011 in Excel, but the XML contains:
<c r="A15" s="11"><v>40817</v></c>
The value is not in any date format, and it is not a millisecond timestamp either.
How do I parse dates with the event model?
Best regards.
Comments (2)
Excel stores dates as a floating-point number of days (and fractions of a day) from a fixed starting point. The value 40817 is the number of days from that starting point to 1 October 2011.
POI's HSSFCell has a getDateCellValue() method that returns a Java Date, and it will do what you want.
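As a rough illustration of the arithmetic, here is a minimal sketch in plain Java with no POI dependency. It assumes the workbook uses the default 1900 date system, in which serial values after the phantom 29 February 1900 line up with an epoch of 30 December 1899. The class and method names (ExcelSerialDate, toLocalDate, toLocalDateTime) are made up for this example. In a POI project, the static DateUtil.getJavaDate(double) performs an equivalent conversion on the raw value read by the event handler.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical helper, not part of POI: converts an Excel serial number
// to java.time values, assuming the 1900 date system.
final class ExcelSerialDate {
    // Effective day 0 for serials of 61 and above (assumption: 1900 date system).
    private static final LocalDate EPOCH = LocalDate.of(1899, 12, 30);
    private static final long SECONDS_PER_DAY = 86_400L;

    // The whole part of the serial is the day count from the epoch.
    static LocalDate toLocalDate(double serial) {
        return EPOCH.plusDays((long) Math.floor(serial));
    }

    // The fractional part is the time of day, rounded here to whole seconds.
    static LocalDateTime toLocalDateTime(double serial) {
        long wholeDays = (long) Math.floor(serial);
        long seconds = Math.round((serial - wholeDays) * SECONDS_PER_DAY);
        return EPOCH.plusDays(wholeDays).atStartOfDay().plusSeconds(seconds);
    }
}
```

For the cell in the question, ExcelSerialDate.toLocalDate(40817) gives 2011-10-01, and a value of 40817.5 would map to noon on the same day.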
In relation to your follow-up question about how to determine whether a cell contains a date value: there isn't any easy way.
The cell element, <c>, has a type attribute, t, but it isn't used for this Excel serial date type. As such, the only thing that distinguishes a cell containing a date encoded as 40817 from a cell containing the number 40817 is the format applied to the number (referenced via the cell's s attribute).
To determine that format, you have to look up the numFmtId attribute of the referenced style xf in the associated styles.xml and try to work out whether that format is a date format. And to do that, you have to apply some heuristics.
All in all, this is a huge pain when you wish to differentiate dates from other numeric data, and to me it is an oversight in the SpreadsheetML format.
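One possible shape for such a heuristic, sketched in plain Java. It assumes you have already resolved the cell's s attribute (an index into the cellXfs list in styles.xml, not a format id itself) to its xf entry, read that entry's numFmtId, and, for custom ids, fetched the matching formatCode from numFmts. Ids 14 to 22 and 45 to 47 are the built-in date and time formats. The custom-format check (strip literals, then look for d, m, y, h or s) is a simplification assumed here and will misclassify some unusual formats. The class and method names are hypothetical. POI's own DateUtil.isADateFormat(int, String) serves the same purpose if the library is on the classpath.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: guesses whether a number format
// denotes a date or time.
final class DateFormatHeuristic {
    private static final Pattern DATE_TOKEN = Pattern.compile("[dmyhs]");

    // Built-in format ids that are predefined as date/time formats.
    static boolean isBuiltinDateFormat(int numFmtId) {
        return (numFmtId >= 14 && numFmtId <= 22)
            || (numFmtId >= 45 && numFmtId <= 47);
    }

    // formatCode may be null when the id is built-in and has no numFmts entry.
    static boolean looksLikeDateFormat(int numFmtId, String formatCode) {
        if (isBuiltinDateFormat(numFmtId)) {
            return true;
        }
        if (formatCode == null || formatCode.isEmpty()) {
            return false;
        }
        String stripped = formatCode
            .replaceAll("\"[^\"]*\"", "")      // quoted literals such as "days"
            .replaceAll("\\\\.", "")           // backslash-escaped characters
            .replaceAll("\\[[^\\]]*\\]", "");  // sections such as [Red] or [$-409]
        // Any remaining date/time letter is taken as evidence of a date format.
        return DATE_TOKEN.matcher(stripped.toLowerCase(Locale.ROOT)).find();
    }
}
```

With this sketch, looksLikeDateFormat(164, "dd.mm.yyyy") returns true, while looksLikeDateFormat(166, "[Red]0.00") returns false because the bracketed colour section is stripped before the letters are checked.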
P.S. The OOXML standard does define a d (date) value for the t type attribute, but that is only used for ISO 8601 dates, which aren't actually produced by Excel. (Update: this date format is now used by Excel 2013 in strict mode.)