Determining if an xlsx cell is date formatted for Excel 2007 spreadsheets
I'm working with some code that reads data from xlsx files by parsing the XML. It is all pretty straightforward, with the exception of date cells.
Dates are stored as integers and have an "s" attribute that is an index into the stylesheet, which can be used to get a date formatting string. Here are some examples from a previous stackoverflow question that is linked below:
19 = 'h:mm:ss AM/PM';
20 = 'h:mm';
21 = 'h:mm:ss';
22 = 'm/d/yy h:mm';
These are the built-in date formatting strings from the OOXML standard; however, Excel seems to prefer custom format strings over the built-ins. Here is an example format from an Excel 2007 spreadsheet. A numFmtId of 164 or greater indicates a custom format.
<numFmt formatCode="MM/DD/YY" numFmtId="165"/>
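The lookup described above (cell "s" attribute, then the style record, then numFmtId, then formatCode) might be sketched in Python roughly as follows. This is only an illustration: `SAMPLE_STYLES` is a made-up, stripped-down styles.xml, `format_code_for_style` is a hypothetical helper name, and `BUILTIN_FORMATS` holds just the four built-in codes quoted above rather than the full table.

```python
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by styles.xml.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Only the four built-in codes quoted in the question; the real table is larger.
BUILTIN_FORMATS = {
    19: "h:mm:ss AM/PM",
    20: "h:mm",
    21: "h:mm:ss",
    22: "m/d/yy h:mm",
}

# Hypothetical, minimal styles.xml content for demonstration only.
SAMPLE_STYLES = """<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <numFmts count="1"><numFmt formatCode="MM/DD/YY" numFmtId="165"/></numFmts>
  <cellXfs count="3">
    <xf numFmtId="0"/>
    <xf numFmtId="165"/>
    <xf numFmtId="22"/>
  </cellXfs>
</styleSheet>"""


def format_code_for_style(styles_xml, style_index):
    """Return the format code for a cell's "s" attribute, or None if unknown."""
    root = ET.fromstring(styles_xml)
    # Custom formats declared in this workbook, keyed by numFmtId.
    custom = {
        int(node.get("numFmtId")): node.get("formatCode")
        for node in root.findall("m:numFmts/m:numFmt", NS)
    }
    # The cell's "s" attribute is a zero-based index into cellXfs.
    xfs = root.findall("m:cellXfs/m:xf", NS)
    num_fmt_id = int(xfs[style_index].get("numFmtId", "0"))
    return custom.get(num_fmt_id, BUILTIN_FORMATS.get(num_fmt_id))
```

With the sample above, a cell carrying s="1" resolves to the custom code MM/DD/YY, and s="2" resolves to the built-in code for id 22.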
Determining if a cell should be formatted as a date is difficult because the only indicator I can find is the formatCode. This one is obviously a date, but cells could be formatted any number of ways. My initial attempt is to look for Ms, Ds, and Ys in the formatCode, but that seems problematic.
Has anybody had any luck with this problem? It seems like the standard excel reading libraries are lacking in xlsx support at this time. I've read through the standards and have dug through a lot of xlsx files without much luck.
The best information seems to come from this stackoverflow question:
what indicates an office open xml cell contains a date time value
Thanks!
Comments (2)
In the Excel data model, there is really no such thing as an integer. Everything is a float. Dates and datetimes are floats, representing days and a fraction since a variable epoch. Times are fractions of a day.
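As a rough illustration of "days and a fraction since a variable epoch", here is a minimal Python sketch. The helper name `serial_to_datetime` is hypothetical. It assumes the two usual workbook date systems (1900-based and 1904-based), and in the 1900 system it is only meant for serial values of 61 and above, because Excel counts a non-existent 29 February 1900.

```python
from datetime import datetime, timedelta


def serial_to_datetime(serial, date1904=False):
    """Convert an Excel serial date (a float) to a datetime.

    Assumption: in the 1900 date system the result is only correct for
    serials >= 61, since Excel treats 1900 as a leap year.
    """
    if date1904:
        epoch = datetime(1904, 1, 1)
    else:
        # Two days before 1900-01-01: one because serial 1 is 1900-01-01,
        # one to absorb the fictitious 1900-02-29.
        epoch = datetime(1899, 12, 30)
    # The integer part is whole days, the fraction is the time of day.
    return epoch + timedelta(days=serial)
```

For example, `serial_to_datetime(36526.5)` gives noon on 1 January 2000 under the 1900 system.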
google("xlsxrd"). To keep up to date, join the python-excel group.
Edit: I see that you have already asked a question there. If you had asked a question there as specific as this one, or responded to my request for clarification, you would have had this info over two weeks ago.
Have a look at the xlrd documentation. Up the front there is a discussion on Excel dates. All of it applies to Excel 2007 as well as earlier versions. In particular: it is necessary to parse custom formats. It is necessary to have a table of "standard" format indexes which are for date formats. "Standard" formats listed in some places don't include the formats used in CJK locales.
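A table of "standard" date format indexes could look something like the sketch below. The ranges 14 to 22 and 45 to 47 are the commonly listed built-in date/time ids. The ranges 27 to 36 and 50 to 58 are an assumption here, included to cover the CJK locale formats mentioned above, and should be checked against the xlrd source before being relied on. The names `DATE_FORMAT_ID_RANGES` and `is_builtin_date_format` are hypothetical.

```python
# Inclusive (low, high) ranges of built-in numFmtId values treated as dates.
DATE_FORMAT_ID_RANGES = (
    (14, 22),  # common built-in date and time formats
    (27, 36),  # assumption: CJK locale date formats
    (45, 47),  # built-in elapsed/short time formats
    (50, 58),  # assumption: CJK locale date formats
)


def is_builtin_date_format(num_fmt_id):
    """Return True if a built-in numFmtId falls in a known date range."""
    return any(low <= num_fmt_id <= high for low, high in DATE_FORMAT_ID_RANGES)
```

Custom ids (164 and up) never match this table, so their format codes still have to be parsed.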
Options for you:
(1) Borrow from the xlrd source code, including the xldate_as_tuple function.
(2) Option (1) + Get the xlsxrd bolt-on kit and borrow from its source code.
(3) [Recommended] Get the xlsxrd bolt-on kit and use it ... you get a set of APIs that operate across Excel versions 2.0 to 2007 and Python versions 2.1 to 2.7.
It isn't enough simply to look for Ms, Ds, and Ys in the number format code. A perfectly valid number format can contain both Y and D without being a date format; color names in square brackets, such as [Red] and [Yellow], are one way that can happen. I specifically test for any of the standard date/time formatting characters ('y', 'm', 'd', 'H', 'i', 's') that are outside of square brackets ('[' ']').
Even then, I was finding that a few false positives were slipping through, mainly associated with accounting and currency formats. Because these typically begin with either an underscore ('_') or a space followed by a zero (' 0'), neither of which I've ever encountered in a date format, I explicitly filter these values out.
A part of my (PHP) code for determining if a format mask is a date or not:
I'm sure that there may still be exceptions that I'm missing, but (if so) they are probably extreme cases.
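The heuristic described in this answer might be sketched in Python as follows. This is not the PHP code the answer refers to. The function name `looks_like_date_format` is hypothetical, the character set is narrowed to y, m, d, h and s matched case-insensitively, and skipping double-quoted literal text is an extra assumption that is not part of the answer.

```python
import re

# Assumption: these letters, in either case, signal a date/time token.
DATE_CHARS = set("ymdhs")


def looks_like_date_format(format_code):
    """Guess whether an Excel number format code describes a date or time."""
    # Accounting and currency masks tend to start this way; filter them out.
    if format_code.startswith("_") or format_code.startswith(" 0"):
        return False
    # Drop bracketed sections such as [Red] or [$-409].
    stripped = re.sub(r"\[[^\]]*\]", "", format_code)
    # Extra assumption: also drop quoted literal text such as "days".
    stripped = re.sub(r'"[^"]*"', "", stripped)
    return any(ch in DATE_CHARS for ch in stripped.lower())
```

A format such as `MM/DD/YY` passes the check, while `[Red]#,##0 ;[Yellow](#,##0)` does not, because its only Y and D characters sit inside square brackets.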