什么表明 Office Open XML 单元格包含日期/时间值?

发布于 10-13 00:46 字数 1108 浏览 6 评论 0原文

我正在使用 Office Open XML SDK 并且对读取日期/时间值感到困惑。我的一个电子表格有此标记(由 Excel 2010 生成),

<x:row r="2" spans="1:22" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:c r="A2" t="s">
    <x:v>56</x:v>
  </x:c>
  <x:c r="B2" t="s">
    <x:v>64</x:v>
  </x:c>
  .
  .
  .
  <x:c r="J2" s="9">
    <x:v>17145</x:v>
  </x:c>

单元格 J2 中有一个日期序列值和一个样式属性 s="9"。然而,Office Open XML 规范表示 9 对应于后面的超链接。这是ECMA-376,第二版,第 1 部分 - 基础知识和标记语言参考.pdf 第 4,999 页的屏幕截图。

alt text

规范中包含的presetCellStyles.xml 文件也引用了builtinId 9,如下所示超级链接。

<followedHyperlink builtinId="9">

规范中的所有样式都只是视觉格式样式,而不是数字样式。数字样式在哪里定义?如何区分样式引用 s="9" 和指示单元格格式(视觉)样式与数字样式?

显然,我在错误的位置将单元格上的样式与其数字格式相匹配。在哪里可以找到此信息?

I'm reading an .xlsx file using the Office Open XML SDK and am confused about reading Date/Time values. One of my spreadsheets has this markup (generated by Excel 2010)

<x:row r="2" spans="1:22" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:c r="A2" t="s">
    <x:v>56</x:v>
  </x:c>
  <x:c r="B2" t="s">
    <x:v>64</x:v>
  </x:c>
  .
  .
  .
  <x:c r="J2" s="9">
    <x:v>17145</x:v>
  </x:c>

Cell J2 has a date serial value in it and a style attribute s="9". However, the Office Open XML Specification says that 9 corresponds to a followed hyperlink. This is a screen shot from page 4,999 of ECMA-376, Second Edition, Part 1 - Fundamentals And Markup Language Reference.pdf.

alt text

The presetCellStyles.xml file included with the spec also refers to builtinId 9 as a followed hyperlink.

<followedHyperlink builtinId="9">

All of the styles in the spec are simply visual formatting styles, not number styles. Where are the number styles defined and how does one differentiate a style reference s="9" from indicating a cell formatting (visual) style vs a number style?

Obviously I'm looking in the wrong place to match styles on cells with their number formats. Where's the right place to find this information?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

江湖正好2024-10-20 00:46:41

s 属性引用 styles.xml 中的 style xf 条目。样式 xf 又引用数字格式掩码。要识别包含日期的单元格,您需要执行样式 xf ->数字格式查找,然后确定该数字格式掩码是否是日期/时间数字格式掩码(而不是百分比或会计数字格式掩码)。

style.xml 文件具有如下元素:

<xf numFmtId="14" ... applyNumberFormat="1" />
<xf numFmtId="1" ... applyNumberFormat="1" />

这些是 xf 条目,它们反过来为您提供引用数字格式掩码的 numFmtId。

您应该在 style.xml 顶部附近找到 numFmts 部分,作为 styleSheet 元素的一部分。

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?> 
    <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
        <numFmts count="3">
            <numFmt numFmtId="164" formatCode="[$-414]mmmm\ yyyy;@" /> 
            <numFmt numFmtId="165" formatCode="0.000" /> 
            <numFmt numFmtId="166" formatCode="#,##0.000" /> 
        </numFmts>

数字格式 id 可能位于此处,也可能是内置格式之一。小于 164 的数字格式代码 (numFmtId) 是“内置”的。

我的列表不完整:

0 = 'General';
1 = '0';
2 = '0.00';
3 = '#,##0';
4 = '#,##0.00';

9 = '0%';
10 = '0.00%';
11 = '0.00E+00';
12 = '# ?/?';
13 = '# ??/??';
14 = 'mm-dd-yy';
15 = 'd-mmm-yy';
16 = 'd-mmm';
17 = 'mmm-yy';
18 = 'h:mm AM/PM';
19 = 'h:mm:ss AM/PM';
20 = 'h:mm';
21 = 'h:mm:ss';
22 = 'm/d/yy h:mm';

37 = '#,##0 ;(#,##0)';
38 = '#,##0 ;[Red](#,##0)';
39 = '#,##0.00;(#,##0.00)';
40 = '#,##0.00;[Red](#,##0.00)';

44 = '_("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_)';
45 = 'mm:ss';
46 = '[h]:mm:ss';
47 = 'mmss.0';
48 = '##0.0E+0';
49 = '@';

27 = '[$-404]e/m/d';
30 = 'm/d/yy';
36 = '[$-404]e/m/d';
50 = '[$-404]e/m/d';
57 = '[$-404]e/m/d';

59 = 't0';
60 = 't0.00';
61 = 't#,##0';
62 = 't#,##0.00';
67 = 't0%';
68 = 't0.00%';
69 = 't# ?/?';
70 = 't# ??/??';

缺失值主要与东亚变体格式有关。

The s attribute references a style xf entry in styles.xml. The style xf in turn references a number format mask. To identify a cell that contains a date, you need to perform the style xf -> numberformat lookup, then identify whether that numberformat mask is a date/time numberformat mask (rather than, for example, a percentage or an accounting numberformat mask).

The style.xml file has elements like:

<xf numFmtId="14" ... applyNumberFormat="1" />
<xf numFmtId="1" ... applyNumberFormat="1" />

These are the xf entries, which in turn give you a numFmtId that references the number format mask.

You should find the numFmts section somewhere near the top of style.xml, as part of the styleSheet element

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?> 
    <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
        <numFmts count="3">
            <numFmt numFmtId="164" formatCode="[$-414]mmmm\ yyyy;@" /> 
            <numFmt numFmtId="165" formatCode="0.000" /> 
            <numFmt numFmtId="166" formatCode="#,##0.000" /> 
        </numFmts>

The number format id may be here, or it may be one of the built-in formats. Number format codes (numFmtId) less than 164 are "built-in".

The list that I have is incomplete:

0 = 'General';
1 = '0';
2 = '0.00';
3 = '#,##0';
4 = '#,##0.00';

9 = '0%';
10 = '0.00%';
11 = '0.00E+00';
12 = '# ?/?';
13 = '# ??/??';
14 = 'mm-dd-yy';
15 = 'd-mmm-yy';
16 = 'd-mmm';
17 = 'mmm-yy';
18 = 'h:mm AM/PM';
19 = 'h:mm:ss AM/PM';
20 = 'h:mm';
21 = 'h:mm:ss';
22 = 'm/d/yy h:mm';

37 = '#,##0 ;(#,##0)';
38 = '#,##0 ;[Red](#,##0)';
39 = '#,##0.00;(#,##0.00)';
40 = '#,##0.00;[Red](#,##0.00)';

44 = '_("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_)';
45 = 'mm:ss';
46 = '[h]:mm:ss';
47 = 'mmss.0';
48 = '##0.0E+0';
49 = '@';

27 = '[$-404]e/m/d';
30 = 'm/d/yy';
36 = '[$-404]e/m/d';
50 = '[$-404]e/m/d';
57 = '[$-404]e/m/d';

59 = 't0';
60 = 't0.00';
61 = 't#,##0';
62 = 't#,##0.00';
67 = 't0%';
68 = 't0.00%';
69 = 't# ?/?';
70 = 't# ??/??';

The missing values are mainly related to east asian variant formats.

画▽骨i2024-10-20 00:46:41

我想添加我整理的解决方案,以确定双值 FromOADate 是否确实是一个日期。原因是我的 Excel 文件中也有一个邮政编码。如果是文本,numberingFormat 将为 null。

或者,您可以使用 numberingFormatId 并检查 Excel 用于日期的 Id 列表。

就我而言,我已经明确确定了客户端所有字段的格式。

    /// <summary>
    /// Creates the datatable and parses the file into a datatable
    /// </summary>
    /// <param name="fileName">the file upload's filename</param>
    private void ReadAsDataTable(string fileName)
    {
        try
        {
            DataTable dt = new DataTable();
            using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(string.Format("{0}/{1}", UploadPath, fileName), false))
            {
                WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
                IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
                string relationshipId = sheets.First().Id.Value;
                WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
                Worksheet workSheet = worksheetPart.Worksheet;
                SheetData sheetData = workSheet.GetFirstChild<SheetData>();
                IEnumerable<Row> rows = sheetData.Descendants<Row>();

                var cellFormats = workbookPart.WorkbookStylesPart.Stylesheet.CellFormats;
                var numberingFormats = workbookPart.WorkbookStylesPart.Stylesheet.NumberingFormats;

                // columns omitted for brevity

                // skip first row as this row is column header names
                foreach (Row row in rows.Skip(1))
                {
                    DataRow dataRow = dt.NewRow();

                    for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
                    {
                        bool isDate = false;
                        var styleIndex = (int)row.Descendants<Cell>().ElementAt(i).StyleIndex.Value;
                        var cellFormat = (CellFormat)cellFormats.ElementAt(styleIndex);

                        if (cellFormat.NumberFormatId != null)
                        {
                            var numberFormatId = cellFormat.NumberFormatId.Value;
                            var numberingFormat = numberingFormats.Cast<NumberingFormat>()
                                .SingleOrDefault(f => f.NumberFormatId.Value == numberFormatId);

                            // Here's yer string! Example: $#,##0.00_);[Red]($#,##0.00)
                            if (numberingFormat != null && numberingFormat.FormatCode.Value.Contains("mm/dd/yy"))
                            {
                                string formatString = numberingFormat.FormatCode.Value;
                                isDate = true;
                            }
                        }

                        // replace '-' with empty string
                        string value = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i), isDate);
                        dataRow[i] = value.Equals("-") ? string.Empty : value;
                    }

                    dt.Rows.Add(dataRow);
                }
            }

            this.InsertMembers(dt);
            dt.Clear();
        }
        catch (Exception ex)
        {
            LogHelper.Error(typeof(MemberUploadApiController), ex.Message, ex);
        }
    }

    /// <summary>
    /// Reads the cell's value
    /// </summary>
    /// <param name="document">current document</param>
    /// <param name="cell">the cell to read</param>
    /// <returns>cell's value</returns>
    private string GetCellValue(SpreadsheetDocument document, Cell cell, bool isDate)
    {
        string value = string.Empty;

        try
        {
            SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
            value = cell.CellValue.InnerXml;

            if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
            {
                return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
            }
            else
            {
                // check if this is a date or zip.
                // integers will be passed into this else statement as well. 
                if (isDate)
                {
                    value = DateTime.FromOADate(double.Parse(value)).ToString();
                }

                return value;
            }
        }
        catch (Exception ex)
        {
            LogHelper.Error(typeof(MemberUploadApiController), ex.Message, ex);
        }

        return value;
    }

Thought I'd add my solution that I've put together to determine if the double value FromOADate is really a date or not. Reason being is I have a zip code in my excel file as well. The numberingFormat will be null if it's text.

Alternatively you could use the numberingFormatId and check against a list of Ids that Excel uses for dates.

In my case I've explicitly determined the formatting of all fields for the client.

    /// <summary>
    /// Creates the datatable and parses the file into a datatable
    /// </summary>
    /// <param name="fileName">the file upload's filename</param>
    private void ReadAsDataTable(string fileName)
    {
        try
        {
            DataTable dt = new DataTable();
            using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(string.Format("{0}/{1}", UploadPath, fileName), false))
            {
                WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
                IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
                string relationshipId = sheets.First().Id.Value;
                WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
                Worksheet workSheet = worksheetPart.Worksheet;
                SheetData sheetData = workSheet.GetFirstChild<SheetData>();
                IEnumerable<Row> rows = sheetData.Descendants<Row>();

                var cellFormats = workbookPart.WorkbookStylesPart.Stylesheet.CellFormats;
                var numberingFormats = workbookPart.WorkbookStylesPart.Stylesheet.NumberingFormats;

                // columns omitted for brevity

                // skip first row as this row is column header names
                foreach (Row row in rows.Skip(1))
                {
                    DataRow dataRow = dt.NewRow();

                    for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
                    {
                        bool isDate = false;
                        var styleIndex = (int)row.Descendants<Cell>().ElementAt(i).StyleIndex.Value;
                        var cellFormat = (CellFormat)cellFormats.ElementAt(styleIndex);

                        if (cellFormat.NumberFormatId != null)
                        {
                            var numberFormatId = cellFormat.NumberFormatId.Value;
                            var numberingFormat = numberingFormats.Cast<NumberingFormat>()
                                .SingleOrDefault(f => f.NumberFormatId.Value == numberFormatId);

                            // Here's yer string! Example: $#,##0.00_);[Red]($#,##0.00)
                            if (numberingFormat != null && numberingFormat.FormatCode.Value.Contains("mm/dd/yy"))
                            {
                                string formatString = numberingFormat.FormatCode.Value;
                                isDate = true;
                            }
                        }

                        // replace '-' with empty string
                        string value = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i), isDate);
                        dataRow[i] = value.Equals("-") ? string.Empty : value;
                    }

                    dt.Rows.Add(dataRow);
                }
            }

            this.InsertMembers(dt);
            dt.Clear();
        }
        catch (Exception ex)
        {
            LogHelper.Error(typeof(MemberUploadApiController), ex.Message, ex);
        }
    }

    /// <summary>
    /// Reads the cell's value
    /// </summary>
    /// <param name="document">current document</param>
    /// <param name="cell">the cell to read</param>
    /// <returns>cell's value</returns>
    private string GetCellValue(SpreadsheetDocument document, Cell cell, bool isDate)
    {
        string value = string.Empty;

        try
        {
            SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
            value = cell.CellValue.InnerXml;

            if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
            {
                return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
            }
            else
            {
                // check if this is a date or zip.
                // integers will be passed into this else statement as well. 
                if (isDate)
                {
                    value = DateTime.FromOADate(double.Parse(value)).ToString();
                }

                return value;
            }
        }
        catch (Exception ex)
        {
            LogHelper.Error(typeof(MemberUploadApiController), ex.Message, ex);
        }

        return value;
    }
眼波传意2024-10-20 00:46:41

所选择的答案是正确的,但请注意,Excel 定义了一些与 OpenXML 规范不同的数字格式 (numFmt) 代码。根据 Open XML SDK 2.5 生产力工具文档(位于NumberingFormat 类的“实施者注释”选项卡):

标准定义了内置格式ID 14:“mm-dd-yy”; 22:“月/日/年时:毫米”; 37: "#,##0;(#,##0)"; 38: "#,##0;[红色]"; 39: "#,##0.00;(#,##0.00)"; 40: "#,##0.00;[红色]"; 47:“mmss.0”;韩国 fmt 55:“yyyy-mm-dd”。

Excel定义内置格式ID
14:“年/月/日”
22:“年月日时:分”
37: "#,##0_);(#,##0)"
38: "#,##0_);[红色]"
39: "#,##0.00_);(#,##0.00)"
40: "#,##0.00_);[红色]"
47:“毫米:ss.0”
55:“年/月/日”

大多数都是细微的变化,但#14 是一个很棒的变化。我浪费了几个小时来解决为什么不将前导零添加到个位数的月份和日期(例如 01/05/14 与 1/5/14)。

The chosen answer is spot-on, but note that Excel defines some number format (numFmt) codes differently from the OpenXML spec. Per the Open XML SDK 2.5 Productivity Tool's documentation (on the "Implementer Notes" tab for the NumberingFormat class):

The standard defines built-in format ID 14: "mm-dd-yy"; 22: "m/d/yy h:mm"; 37: "#,##0 ;(#,##0)"; 38: "#,##0 ;[Red]"; 39: "#,##0.00;(#,##0.00)"; 40: "#,##0.00;[Red]"; 47: "mmss.0"; KOR fmt 55: "yyyy-mm-dd".

Excel defines built-in format ID
14: "m/d/yyyy"
22: "m/d/yyyy h:mm"
37: "#,##0_);(#,##0)"
38: "#,##0_);[Red]"
39: "#,##0.00_);(#,##0.00)"
40: "#,##0.00_);[Red]"
47: "mm:ss.0"
55: "yyyy/mm/dd"

Most are minor variations, but #14 is a doozy. I wasted a couple of hours troubleshooting why leading zeros weren't being added to single-digits months and days (e.g. 01/05/14 vs. 1/5/14).

天涯沦落人2024-10-20 00:46:41

在 styles.xml 中查看是否有 numFmt 节点。我认为这将保存 numFmtId 为“9”,这将与所使用的日期格式相关。

我不知道它在 ECMA 中的位置,但如果您搜索 numFmt,您可能会找到它。

In styles.xml see if there is a numFmt node. I think that will hold a numFmtId of "9" which will relate to the date format that's used.

I don't know where that is in the ECMA, but if you search for numFmt, you might find it.

橘亓2024-10-20 00:46:41

我不清楚如何可靠地确定单元格是否具有日期/时间值。经过一段时间的实验后,我想出了代码 (参见帖子) 将查找内置和自定义日期/时间格式。

It was unclear to me how to reliably determine whether a cell has date/time value. After spending some time experimenting I had come up with the code (see post) that would look for both built-in and custom date/time formats.

三人与歌2024-10-20 00:46:41

如果其他人对此遇到困难,这就是我所做的:

1)创建一个新的 Excel 文件并在单元格 A1 中放入日期时间字符串

2)将单元格上的格式更改为您想要的任何格式,然后保存文件。

3) 运行以下 powershell 脚本从 .xlxs 中提取样式表

[Reflection.Assembly]::LoadWithPartialName("DocumentFormat.OpenXml")

$xlsx = (ls C:\PATH\TO\FILE.xlsx).FullName
$package = [DocumentFormat.OpenXml.Packaging.SpreadsheetDocument]::Open($xlsx, $true)

[xml]$style = $package.WorkbookPart.WorkbookStylesPart.Stylesheet.OuterXml
Out-File -InputObject $style.OuterXml -FilePath "style.xml"

style.xml 现在包含可以注入到 DocumentFormat.OpenXml.Spreadsheet.Stylesheet(string outerXml) 的信息>,导致

4)使用提取的文件构建excel对象模型

var style = File.ReadAllText(@"c:\PATH\TO\EXTRACTED\Style.xml");
var stylesheetPart = WorkbookPart_REFERENCE.AddNewPart<WorkbookStylesPart>();
stylesheetPart.Stylesheet = new Stylesheet(style);
stylesheetPart.Stylesheet.Save();

In case anyone else is having a hard time with this, here is what I've done:

1) Create a new excel file and put in a date time string in cell A1

2) Change formatting on the cell to whatever you want, then save file.

3) Run following powershell script to extract out the stylesheet from .xlxs

[Reflection.Assembly]::LoadWithPartialName("DocumentFormat.OpenXml")

$xlsx = (ls C:\PATH\TO\FILE.xlsx).FullName
$package = [DocumentFormat.OpenXml.Packaging.SpreadsheetDocument]::Open($xlsx, $true)

[xml]$style = $package.WorkbookPart.WorkbookStylesPart.Stylesheet.OuterXml
Out-File -InputObject $style.OuterXml -FilePath "style.xml"

style.xml now contains the information that you can inject to DocumentFormat.OpenXml.Spreadsheet.Stylesheet(string outerXml), leading to

4) Use the extracted file to construct excel object model

var style = File.ReadAllText(@"c:\PATH\TO\EXTRACTED\Style.xml");
var stylesheetPart = WorkbookPart_REFERENCE.AddNewPart<WorkbookStylesPart>();
stylesheetPart.Stylesheet = new Stylesheet(style);
stylesheetPart.Stylesheet.Save();
落花随流水2024-10-20 00:46:41

@RobScott 对您的代码片段的引用
我发现特定单元格的样式索引始终为空,

              var styleIndex = (int)row.Descendants<Cell>().ElementAt(i).StyleIndex.Value;

我需要阅读下面提到的 Excel 并将行和列数据转换为 json。

excel 参考

StockInvoiceNoStockInvoiceOn名称描述
DC332001298923-01-2021 00:00:00:00项目 1描述
DC332001298924-01-2021 00:00:00:00项目 2描述
DC332001298925-01-2021 00:00:00:00项目3描述

@RobScott reference to your code snippet
I have found always null in style index of a particular Cell

              var styleIndex = (int)row.Descendants<Cell>().ElementAt(i).StyleIndex.Value;

my requirement to read below mentioned excel and transfrom the row and column data to the json.

excel reference

StockInvoiceNoStockInvoiceOnNameDescription
DC332001298923-01-2021 00:00:00:00item1description
DC332001298924-01-2021 00:00:00:00item2description
DC332001298925-01-2021 00:00:00:00item3description
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文