Excel column data type (excluding the header row)
Using SpreadsheetGear, is there any way to get the "probable" data type for a column, excluding the header row (if one exists) and reasonably tolerant of sparse data, without having to sample the column myself? Is there already a built-in way to do this?
So, for example, if I had an Excel sheet like
| Customers | Sales Item | Sale Date | Contact | Quantity |
| IBM | Keyboard | 28-10-2011 | | 2 |
| MS | Mouse | 27-09-2011 | joe | 5 |
I would expect to see
String, String, DateTime, String, Numeric
EDIT
So I ended up having to sample, as @Tim Anderson suggested, but I needed to handle sparse data and default to string when a column contains conflicting types. (This is called in a loop that walks the columns; I can't post that loop as it contains some IP.) DataTypeValue is just a local enum, and rowcount is the number of rows to sample. Because I am already sampling, I simply ignore row 0 in case it's a header row.
    using System.Collections.Generic;
    using System.Linq;
    using SpreadsheetGear;

    private DataType GetDataTypeFromColRange(IRange range, int rowcount, int col)
    {
        // Materialize once so the lazy iterator is not re-run by Count and First.
        var dtlist = GetValueTypes(range, rowcount, col).Distinct().ToList();

        // If there are conflicting types for the col (or no typed cells at all), default to string.
        if (dtlist.Count != 1)
        {
            return new DataType(DataTypeValue.String);
        }

        return new DataType(dtlist[0]);
    }

    private IEnumerable<DataTypeValue> GetValueTypes(IRange range, int rowcount, int col)
    {
        // Start at row 1: row 0 is skipped in case it is a header row.
        for (int i = 1; i < rowcount; i++)
        {
            IRange cell = range[i, col];
            switch (cell.ValueType)
            {
                case SpreadsheetGear.ValueType.Text:
                    yield return DataTypeValue.String;
                    break;
                case SpreadsheetGear.ValueType.Number:
                    // Dates are stored as numbers; only the number format tells them apart.
                    if (cell.NumberFormatType == NumberFormatType.Date || cell.NumberFormatType == NumberFormatType.DateTime)
                    {
                        yield return DataTypeValue.Date;
                    }
                    else
                    {
                        yield return DataTypeValue.Numeric;
                    }
                    break;
                case SpreadsheetGear.ValueType.Logical:
                    yield return DataTypeValue.Bool;
                    break;
                default: // ignore empty or errored cells.
                    continue;
            }
        }
    }
I am sure this can be improved further, so please feel free to post improvements, but it does what I need for now.
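One possible refinement of the all-or-nothing rule above is to pick the most frequent type when it clearly dominates the sample, so a single stray text cell does not turn a numeric column into a string column. The following is only a sketch under assumptions: the `DataTypeValue` members are a stand-in for the local enum mentioned in the question, and `ColumnTypeGuesser`, `MostLikely` and the `threshold` parameter are hypothetical names that do not come from SpreadsheetGear.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the local enum mentioned in the question (assumed members).
public enum DataTypeValue { String, Numeric, Date, Bool }

public static class ColumnTypeGuesser
{
    // Returns the most frequent sampled type if it makes up at least
    // `threshold` of the samples; otherwise falls back to String.
    // Empty or errored cells are expected to be filtered out beforehand.
    public static DataTypeValue MostLikely(IEnumerable<DataTypeValue> samples, double threshold = 0.8)
    {
        var list = samples.ToList();
        if (list.Count == 0)
        {
            return DataTypeValue.String; // nothing sampled: same default as the code above
        }

        // Group identical types and take the largest group.
        var top = list.GroupBy(t => t)
                      .OrderByDescending(g => g.Count())
                      .First();

        double share = (double)top.Count() / list.Count;
        return share >= threshold ? top.Key : DataTypeValue.String;
    }
}
```

With this sketch, the result of `GetValueTypes(range, rowcount, col)` could be passed straight to `ColumnTypeGuesser.MostLikely` instead of calling `Distinct()`. The 0.8 default is an arbitrary choice; a threshold of 1.0 reproduces the original "any conflict means string" behavior.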
Comments (3)
No helper method or other API exists in SpreadsheetGear to automatically return the “probable data type” for a column of values. It would not be very difficult to implement something like this to meet your own particular requirements, although there’s no way to do this without “sampling” the data. Below is a very simple method that accepts the range to check and a Boolean that indicates whether or not the range includes a header row. All it does is check the first row of data to determine the type; you might want to build something a little more robust:
One thing you should be aware of, however, is that SpreadsheetGear uses the same basic internal data types as Excel and returns these types when you check IRange.ValueType (Empty, Error, Logical, Number and Text). Note that there is no DateTime. In your example this affects the value type returned for the “Sale Date” column, because dates and times are stored in Excel and SpreadsheetGear as doubles representing a date/time serial number. Such a value returns Number, not something like DateTime. The fact that it shows up as a “date” in the cell is simply a function of the cell's NumberFormat.
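To illustrate the serial-number point, here is a small sketch that uses only the .NET methods `DateTime.FromOADate` and `DateTime.ToOADate`. It assumes the workbook uses the default 1900 date system and that the dates fall after February 1900, where OLE Automation dates and Excel serial numbers agree; `SerialDateDemo` and its method names are hypothetical.

```csharp
using System;

public static class SerialDateDemo
{
    // Converts a date/time serial number (days since 1899-12-30,
    // with the time of day as the fractional part) to a DateTime.
    public static DateTime FromSerial(double serial)
    {
        return DateTime.FromOADate(serial);
    }

    // Converts a DateTime back to its serial number.
    public static double ToSerial(DateTime value)
    {
        return value.ToOADate();
    }
}
```

For example, `SerialDateDemo.ToSerial(new DateTime(2011, 10, 28))` yields 40844, which is the number a cell showing 28-10-2011 holds underneath its date format.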
I never use SpreadsheetGear, but in Excel I use this UDF:
I believe it could be adapted.
[]'s
Here is another attempt, based on an error handler and VBA type conversion:
It can be tested with this sub:
You could generalize this with all the conversion functions of Excel VBA (see this link on ozgrid).
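The same "try each conversion and see which one succeeds" idea can be sketched in C#, the language used in the question, with the .NET `TryParse` methods in place of a VBA error handler. This is a swapped-in analogue, not the VBA from this answer: `InferredType`, `ConversionSniffer` and `Infer` are hypothetical names, and the use of the invariant culture is an assumption, so day-first strings such as 28-10-2011 are treated as plain text here.

```csharp
using System;
using System.Globalization;

// Hypothetical result type for this sketch.
public enum InferredType { String, Numeric, Date, Bool }

public static class ConversionSniffer
{
    // Tries the conversions from most to least specific and
    // reports the first one that succeeds; String is the fallback.
    public static InferredType Infer(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return InferredType.String;
        }

        bool b;
        if (bool.TryParse(text, out b))
        {
            return InferredType.Bool;
        }

        double d;
        if (double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out d))
        {
            return InferredType.Numeric;
        }

        DateTime dt;
        if (DateTime.TryParse(text, CultureInfo.InvariantCulture, DateTimeStyles.None, out dt))
        {
            return InferredType.Date;
        }

        return InferredType.String;
    }
}
```

The order of the checks matters: numbers are tested before dates so that a plain "5" is not picked up by the more permissive date parser.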