我正在尝试使用 SharpZipLib 从 .xlsx 文件中读取数据以将其解压(在内存中)并读取内部 xml 文件。一切都很好,但要识别日期 - 它们以 julean 格式存储,我需要以某种方式识别数字是日期还是只是数字。在另一个主题中(不幸的是它死了,我需要快速回答)我从马克·贝克那里了解了一些事情,但这仍然不够......
“Excel 将日期存储为浮点值...整数部分是自 1900 年 1 月 1 日(或 1904 年 1 月 1 日,具体取决于所使用的日历)以来的天数,小数部分是比例一天的时间(即时间部分)...由于 1900 年被认为是闰年,这一事实变得更加尴尬。
唯一区分数据和数字的是数字格式掩码。如果您可以读取格式掩码,则可以使用它将该值识别为日期而不是数字...然后根据基准日期计算日期值/格式。”
“但是日期属性“s”的值不是始终为“1”吗?我知道它定义了样式,但也许吧?;)”
s 属性引用 styles.xml 中的样式 xf 条目,并且它并不总是日期的条目 1...这完全取决于工作簿中使用了多少种不同的样式。样式 xf 又引用数字格式掩码。要识别包含日期的单元格,您需要执行样式 xf ->数字格式查找,然后确定该数字格式掩码是否是日期/时间数字格式掩码(而不是百分比或会计数字格式掩码)
“还有一个问题 - 我现在正在查看 style.xml 的内容,在该部分中我看到如下元素:“”等,但没有 部分...是否有任何“标准”格式?或者我只是错过了什么?”
有人可以帮我吗 ? ?提前致谢。
I'm trying to read data from the .xlsx files using SharpZipLib to unpack it (in memory) and reading the inner xml files. Everything is fine but recognizing the dates - they're stored in julean format and I need to somehow recognize if a number is a date or only a number. In another topic (unfortunately it died and I need quick answer) I got to know some things from Mark Baker, but it's still not enough...
"Excel stores dates as a float value... the integer part being the number of days since 1/1/1900 (or 1/1/1904 depending on which calendar is being used), the fractional part being the proportion of a day (ie the time part)... made slightly more awkward by the fact that 1900 is considered a leap year.
The only thing that differentiates a data from a number is the number format mask. If you can read the format mask, you can use that to identify the value as a date rather than a number... then calculate the date value/formatting from the base date."
"But doesn't the attribute "s" for dates has always the value of "1"? I know it defines style, but maybe? ;)"
The s attribute references a style xf entry in styles.xml, and it won't always be entry 1 for dates... it all depends how many different styles are being used in the workbook. The style xf in turn references a number format mask. To identify a cell that contains a date, you need to perform the style xf -> numberformat lookup, then identify whether that numberformat mask is a date/time numberformat mask (rather than, for example, a percentage or an accounting numberformat mask)
"One more question - I'm now looking at the style.xml's content and in the section I see elements like: "<xf numFmtId="14" ... applyNumberFormat="1" />", "<xf numFmtId="1" ... applyNumberFormat="1" />", etc. but there is no <numFmts> section... Are there any "standard" formats? Or am I just missing something?"
Can someone please help me out? Thanks in advance.
发布评论
评论(4)
您应该在 style.xml 顶部附近找到 numFmts 部分,作为 styleSheet 元素的一部分
EDIT
我一直在仔细检查我的 xlsx 阅读器代码(自从我深入研究以来已经很长一段时间了)图书馆的那一部分);并且有内置格式。小于 164 的数字格式代码 (numFmtId) 是“内置”的。
我的清单不完整:
You should find the numFmts section somewhere near the top of style.xml, as part of the styleSheet element
EDIT
I've been double-checking my xlsx reader code (it's been a long while since I delved into that part of the library); and there are built-in formats. Number format codes (numFmtId) less than 164 are "built-in".
The list that I have is incomplete:
单元格可能有样式。这些是在 styleSheet 中索引 cellXfs 的 uint。每个 cellXfs 项都包含一组属性。最重要的是 NumberFormatID。如果其值在 14-22 范围内,则它是“标准”日期。如果它落在 165 - 180 范围内,则它是“格式化”日期,并且将具有相应的 NumberingFormat 属性。
标准日期
[x:cr="A2" s="2"][x:v]38046[/x:v][/x:c]
[x:xf numFmtId="14" fontId="0" fillId= "0" borderId="0" xfId="0" applyNumberFormat="1" /] (序数位置 2)
格式化日期
[x:cr="A4" s="4"][x:v]38048[/x :v][/x:c]
[x:xf numFmtId="166" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1" /](序号位置 4)
[x:numFmt numFmtId="166" formatCode="m/d;@" /]
此代码提取与这些日期格式相对应的样式 ID 列表。
Cells may have styles. These are uints that index cellXfs in the styleSheet. Each cellXfs item contains a set of attributes. The most important is NumberFormatID. If its value falls in the range 14-22 it is a "standard" date. If it falls in the range 165 - 180, it is a "formatted" date and will have a corresponding NumberingFormat attribute.
Standard Date
[x:c r="A2" s="2"][x:v]38046[/x:v][/x:c]
[x:xf numFmtId="14" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1" /] (ordinal position 2)
Formatted Date
[x:c r="A4" s="4"][x:v]38048[/x:v][/x:c]
[x:xf numFmtId="166" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1" /](ordinal position 4)
[x:numFmt numFmtId="166" formatCode="m/d;@" /]
This code extracts a list of style IDs that correspond to these date formats.
我建议 numFmtId="14" 应被视为“Windows 短日期格式”,因为在澳大利亚,此格式将日期显示为“dd/mm/yy”,而不是“mm/dd/yy”。
I would suggest that numFmtId="14" should be considered to be "Windows Short Date format" as in Australia this format will display a date as, "dd/mm/yy", and not "mm/dd/yy".
有两种方法可以获取单元格的日期格式。
首先获取“s”或 StyleIndex。请注意下面的数字原始格式的日期 (40667):
cells 节点中的“s”属性指向从 0 开始的 styles.xml 节点的从零开始的数组。这是定位日期格式(如果有)的关键,映射到原始数字日期数据。您会看到 s=1,它指向 Excel 工作簿的以下单元格格式 styles.xml 部分中的第二个 xf 节点:
在第二个节点中,您会看到 numFmtId="14" 值。这就是 numberFormatID。它告诉您,这是确定日期数字应以什么形式显示所需的 id。但是该数字指向日期格式的两个可能位置。如果其数字在 14-22 范围内,则它是日期的内置样式。如果超出该范围,则(可能)是 Excel 文件所有者添加的自定义日期格式。在检查这两个地方之前你不会知道。
在第一种情况下,如果其值是 14-22,则需要将其映射到每个 Excel 文件具有的预构建日期格式之一(mm-dd-yy 等)。您可以在 OpenXML SDK 中找到该表。以下是 numFmtId 映射到内置日期格式的示例...
此时您知道它是一个日期以及它以什么格式呈现。如果它不是这些值之一,则它可能是自定义的数字。现在,您必须再次搜索 styles.xml 文件以查找具有匹配 numFmtId 值的样式节点。这些节点将包含自定义日期格式,如下所示:
请注意,如果您的 numFmtId 为 164,则您会找到其自定义日期格式。因此,要捕获所有这些疯狂的日期格式(自定义和内置),最好的选择是维护一系列可接受的“格式”作为字符串,找到您的 formatCode,然后查看它是否与代码中可接受的格式之一匹配。
祝你好运!
There are two ways to get the date format for a cell.
You start by grabbing the "s" or StyleIndex. Note the date in numeric raw format below (40667):
The "s" attribute in the cells nodes points to a zero-based array of styles.xml nodes starting at 0. This is the key to locating the date format, if any, that maps to the raw numeric date data. You see s=1, that points to the 2nd xf node in the following cell formatting styles.xml section of your Excel workbook:
In the second node you see the numFmtId="14" value. That is the numberFormatID. It tells you that that is the id needed to determine what your date number should be presented in. But that number points to two possible places for the date format. If its number is in the range 14-22 its a built in style for date. If its outside that range its (possibly) a custom date format added by the excel file owner. You wont know until you check both places.
In the first case, if its a value 14-22, you will need to map it to one of the pre-built date formats every excel file has (mm-dd-yy, etc.). You can locate that table in the OpenXML SDK. Here is a sample of those with the numFmtId mapped to the built-in date formats....
At this point you know its a date and what format its to be presented in. If its not one of those values, its likely a custom number. And you now have to search the styles.xml file again for a style node with a matching numFmtId value. Those nodes will contain the custom date format as follows:
Note that if your numFmtId was 164, you found its custom date format. So to catch all these crazy date formats, custom and built in, your best bet is to maintain a range of acceptable "formats" as strings, locate your formatCode, then see if it matches one of the acceptable ones in your code.
Good Luck!