Find the rightmost cell in an Excel document in .NET
I'm reading an Excel document via the DocumentFormat.OpenXml library. Is there a good way to find out how many columns it has?
The current code, which I've just come across while investigating a bug, does this:
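public string getMaxColumnName(SheetData aSheetData)
{
    // Takes the last cell in document order, which is not necessarily the rightmost one.
    string lLastCellReference = aSheetData.Descendants<Cell>().Last().CellReference.InnerText;
    // IndexOfAny returns an int: the position of the first digit, where the row number starts.
    int lRowNumberIndex = lLastCellReference.IndexOfAny(new char[] { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' });
    return lLastCellReference.Substring(0, lRowNumberIndex);
}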
In English: find the last cell in the sheet, get its cell reference (like "CB99"), and retrieve everything before the first digit. The problem is that the last cell in the sheet is not necessarily in the rightmost column.
I have a test sheet that is a neat, rectangular table. It has 1000 rows filling columns A through M, so the function is supposed to return the string "M". But because there is an extraneous space character in cell C1522, that's counted as the last cell, so the function reports the max column as "C".
My initial impulse was to just replace that Last() call with some kind of Max(columnNumber). However, Cell apparently does not expose an actual column number, only this composite CellReference string. I don't think I want to be doing string-splitting inside a predicate there.
Is there a way to find the sheet's rightmost column, without having to parse the CellReference of every single cell?
Comments (2)
As I understand the format, there are various cases:
If the file is not generated by Excel and the worksheet contains data with no blank rows and no blank columns within a row, but not necessarily the same number of columns in every row (which may be the case here):
You are pretty much screwed. The format allows row and cell references to be omitted in this case. You have to count all the cells in each row to get the maximum.
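As a rough illustration of that counting, here is a minimal sketch that reads the worksheet part's raw XML with System.Xml.Linq instead of going through the SDK. The class and method names are hypothetical, and it assumes the worksheet XML is already available as a string.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class RowWidth
{
    // Main SpreadsheetML namespace used by worksheet parts.
    private static readonly XNamespace Main =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns the largest number of <c> (cell) elements found in any <row>,
    // or 0 when the sheet has no rows. Hypothetical helper for illustration.
    public static int MaxCellsPerRow(string worksheetXml)
    {
        return XDocument.Parse(worksheetXml)
            .Descendants(Main + "row")
            .Select(row => row.Elements(Main + "c").Count())
            .DefaultIfEmpty(0)
            .Max();
    }
}
```

This only gives a meaningful column count under the assumption stated for this case, namely that rows have no gaps between cells.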
If the file is not generated by Excel, but cells are sparsely populated (which apparently is not the case):
The last cell of each row holds, in its "r" attribute, the reference of the column it belongs to. You will have to convert the reference, though.
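The conversion mentioned here is a base-26 reading of the leading letters. A minimal sketch, with a hypothetical class and method name, might look like this:

```csharp
using System;

public static class CellReferences
{
    // Converts the column letters of a reference such as "CB99" into a
    // 1-based column number (A = 1, Z = 26, AA = 27).
    public static int ColumnIndex(string cellReference)
    {
        if (cellReference == null) throw new ArgumentNullException(nameof(cellReference));

        int index = 0;
        foreach (char c in cellReference)
        {
            char upper = char.ToUpperInvariant(c);
            if (upper < 'A' || upper > 'Z') break; // the row number starts here
            index = index * 26 + (upper - 'A' + 1);
        }

        if (index == 0) throw new FormatException("No column letters in '" + cellReference + "'.");
        return index;
    }
}
```

With a numeric column available, comparing the last cell of each row becomes a plain integer maximum rather than a string comparison.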
If the file is generated by Excel:
Usually (and I haven't found an Excel-generated file that doesn't), the worksheet part has a child named dimension, which has a "ref" attribute with the cell range used by the worksheet, e.g. "A1:M1001". It is then just a matter of reading the columns from it. Of course, it works only if the extraneous character does not sit in a column to the right of the table.
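A minimal sketch of pulling the last column out of such a "ref" value follows. It works on the plain string, so how you obtain it is left open; in the SDK it should be reachable through the worksheet's SheetDimension element and its Reference property, but treat that as an assumption to verify against your SDK version. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class SheetDimensions
{
    // Extracts the column letters of the bottom-right cell from a dimension
    // reference such as "A1:M1001". A single-cell reference like "A1" has no
    // colon and is treated as its own bottom-right cell.
    public static string LastColumnName(string dimensionRef)
    {
        if (string.IsNullOrEmpty(dimensionRef))
            throw new ArgumentException("Dimension reference is empty.", nameof(dimensionRef));

        string lastCell = dimensionRef.Substring(dimensionRef.LastIndexOf(':') + 1);
        return new string(lastCell.TakeWhile(ch => char.IsLetter(ch)).ToArray());
    }
}
```

Because the element is optional in the format, calling code would still need a fallback for files that omit it.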
Alternatively, every row usually has (and every Excel-generated file I have seen has it) an attribute called "spans" that holds the columns that row uses. The "spans" attribute format is numeric, so in your example it would have a value "1:13" for every row in the table. Maybe you only have to check the first row this way.
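Reading the last column out of a "spans" value can be sketched as below. The attribute is a space-separated list of first:last pairs, so the sketch takes the largest end value; the class and method names are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class RowSpans
{
    // Returns the highest column number named in a spans value such as
    // "1:13" or "1:3 5:13", or 0 when the value is null or empty.
    public static int LastColumn(string spans)
    {
        if (string.IsNullOrWhiteSpace(spans)) return 0;

        return spans
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(span => int.Parse(span.Substring(span.IndexOf(':') + 1),
                                      CultureInfo.InvariantCulture))
            .Max();
    }
}
```

Since "spans" is optional, a missing attribute (reported as 0 here) should be treated as "unknown" rather than as an empty row.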
I have concluded that this is the wrong thing to do in the first place. The consuming code is never actually looking for the rightmost cell in the whole sheet. Generally, what it wants is the number of cells in a particular row: either row 1, or a known table header location.
In fact, with the possible exception of rendering or printing, I can't come up with any situation where getting the whole sheet's max cell is necessary.
Therefore, I need to refactor slightly. I'm changing the function so it takes a sheet and a row index, and returns the column of the rightmost cell in that row. That is, it will now look like:
For the implementation of that, I can check the Row.Spans property when it exists, or else parse the cell reference of Row.ChildElements.Last().
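The refactored function itself did not survive in this copy of the post, so the following is only a sketch of the approach described, not the author's code. The class name, method name, and the choice to return a 1-based column number are assumptions; it requires the DocumentFormat.OpenXml package.

```csharp
using System;
using System.Globalization;
using System.Linq;
using DocumentFormat.OpenXml.Spreadsheet;

public static class RowColumns
{
    // Hypothetical helper: returns the 1-based column number of the rightmost
    // cell in the given row, or 0 if the row is missing or has no usable cells.
    public static int GetLastColumnIndex(SheetData sheetData, uint rowIndex)
    {
        Row row = sheetData.Elements<Row>()
            .FirstOrDefault(r => r.RowIndex != null && r.RowIndex.Value == rowIndex);
        if (row == null) return 0;

        // Preferred source: the optional spans attribute, e.g. "1:13".
        string spans = row.Spans?.InnerText;
        if (!string.IsNullOrWhiteSpace(spans))
        {
            return spans
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Max(s => int.Parse(s.Substring(s.IndexOf(':') + 1),
                                    CultureInfo.InvariantCulture));
        }

        // Fallback: parse the column letters of the row's last cell reference.
        string reference = row.Elements<Cell>().LastOrDefault()?.CellReference?.Value;
        if (string.IsNullOrEmpty(reference)) return 0;

        int index = 0;
        foreach (char c in reference.TakeWhile(ch => char.IsLetter(ch)))
        {
            index = index * 26 + (char.ToUpperInvariant(c) - 'A' + 1);
        }
        return index;
    }
}
```

Only one cell reference is parsed per call, which avoids the per-cell string handling the question wanted to stay away from.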