How to force ADO.NET to use only the System.String data type in the reader's TableSchema

Posted on 2024-08-27 08:01:58

I am using an OleDbConnection to query an Excel 2007 spreadsheet. I want to force the OleDbDataReader to use only string as the column data type.

The system is looking at the first 8 rows of data and inferring the data type to be Double. The problem is that on row 9 I have a string in that column and the OleDbDataReader is returning a Null value since it could not be cast to a Double.

I have used these connection strings:

Provider=Microsoft.ACE.OLEDB.12.0;Data Source="ExcelFile.xlsx";Persist Security Info=False;Extended Properties="Excel 12.0;IMEX=1;HDR=No"

Provider=Microsoft.Jet.OLEDB.4.0;Data Source="ExcelFile.xlsx";Persist Security Info=False;Extended Properties="Excel 8.0;HDR=No;IMEX=1"
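For anyone reproducing this, here is a minimal sketch of assembling the first (ACE) connection string in code, with the Extended Properties value quoted as a single unit. The names `ExcelConnectionStrings.Build`, `hasHeaderRow`, and `importMode` are hypothetical and not part of the original post; the provider and property values are copied from the string above. As far as I know, IMEX=1 only asks the provider to read columns it has already detected as mixed as text; it does not change how many rows are sampled.

```csharp
using System;

public static class ExcelConnectionStrings
{
    // Builds an ACE OLE DB connection string for an Excel file.
    // importMode adds IMEX=1; hasHeaderRow controls HDR=Yes/No.
    public static string Build(string path, bool hasHeaderRow, bool importMode)
    {
        // Extended Properties holds several ';'-separated values,
        // so the whole value has to be wrapped in double quotes.
        string extended = "Excel 12.0;"
            + (importMode ? "IMEX=1;" : "")
            + "HDR=" + (hasHeaderRow ? "Yes" : "No");

        return "Provider=Microsoft.ACE.OLEDB.12.0;"
            + "Data Source=\"" + path + "\";"
            + "Persist Security Info=False;"
            + "Extended Properties=\"" + extended + "\"";
    }
}
```

Calling `ExcelConnectionStrings.Build("ExcelFile.xlsx", false, true)` yields the same text as the ACE connection string shown above.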

Looking at reader.GetSchemaTable().Rows[7].ItemArray[5], its DataType is Double.

Row 7 in this schema corresponds to the specific Excel column I am having issues with. ItemArray[5] is its DataType column.
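Indexing the schema row by position (ItemArray[5]) is brittle. Here is a minimal sketch that reads the inferred types by name instead, assuming the reader's schema table exposes the standard "ColumnName" and "DataType" columns. `SchemaInspector` and `DescribeColumns` are illustrative names, not from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SchemaInspector
{
    // Returns (column name, inferred CLR type) pairs from the reader's
    // schema table, looking fields up by name rather than by ordinal.
    public static List<KeyValuePair<string, Type>> DescribeColumns(IDataReader reader)
    {
        var columns = new List<KeyValuePair<string, Type>>();
        DataTable schema = reader.GetSchemaTable();
        if (schema == null)
        {
            return columns; // no schema available for this reader
        }

        foreach (DataRow row in schema.Rows)
        {
            columns.Add(new KeyValuePair<string, Type>(
                (string)row["ColumnName"],
                (Type)row["DataType"]));
        }
        return columns;
    }
}
```

With an OleDbDataReader, calling `SchemaInspector.DescribeColumns(reader)` before `dataTable.Load(reader)` would list the type the provider guessed for each column, which makes it easier to spot the column that was inferred as Double.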

Is it possible to create a custom TableSchema for the reader so when accessing the ExcelFiles, I can treat all cells as text instead of letting the system attempt to infer the datatype?


I found some good info at this page: Tips for reading Excel spreadsheets using ADO.NET

The main quirk about the ADO.NET interface is how datatypes are handled. (You'll notice I've been carefully avoiding the question of which datatypes are returned when reading the spreadsheet.) Are you ready for this? ADO.NET scans the first 8 rows of data, and based on that guesses the datatype for each column. Then it attempts to coerce all data from that column to that datatype, returning NULL whenever the coercion fails!

Thank you,
Keith


Here is a reduced version of my code:

using (OleDbConnection connection = new OleDbConnection(BuildConnectionString(dataMapper).ToString()))
{
    connection.Open();
    using (OleDbCommand cmd = new OleDbCommand())
    {
        cmd.Connection = connection;
        cmd.CommandText = "SELECT * FROM [Sheet1$]";
        using (OleDbDataReader reader = cmd.ExecuteReader())
        {
            using (DataTable dataTable = new DataTable("TestTable"))
            {
                dataTable.Load(reader);
                base.SourceDataSet.Tables.Add(dataTable);
            }
        }
    }
}

Comments (4)

说谎友 2024-09-03 08:01:58

As you have discovered, OLEDB uses Jet which is limited in the manner in which it can be tweaked. If you are set on using an OleDbConnection to read from an Excel file, then you need to set the HKLM\...\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows value to zero so that the system will scan the entire resultset.

That said, if you are open to using an alternative engine to read from an Excel file, you might consider trying the ExcelDataReader. It reads all columns as strings but will let you use dataReader.Getxxx methods to get typed values. Here's a sample that fills a DataSet:

DataSet result;
const string path = @"....\Test.xlsx";
using ( var fileStream = new FileStream( path, FileMode.Open, FileAccess.Read ) )
{
    using ( var excelReader = ExcelReaderFactory.CreateOpenXmlReader( fileStream ) )
    {
        excelReader.IsFirstRowAsColumnNames = true;
        result = excelReader.AsDataSet();
    }
}
伏妖词 2024-09-03 08:01:58

Note: on a 64-bit OS, the key is located here:

My Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel
夏の忆 2024-09-03 08:01:58

Check out the final answer on this page.


Just noticed the page you refer to says the same thing ...


Update:

The problem seems to be with the JET engine itself and not ADO. Once JET decides on the type, it sticks to it. Anything done after that has no effect; for example, casting the values to string in the SQL (e.g. CStr([Column])) just results in an empty string being returned.

At this point (if there are no other answers) I'd opt for other methods: modifying the spreadsheet; modifying the registry (not ideal, since you will be messing with the settings for every other app that uses JET); Excel automation; or a third-party component that does not use JET.

If the Automation option is too slow, then maybe just use it to save the spreadsheet in a different format that is easier to handle.

你与清晨阳光 2024-09-03 08:01:58

I have faced the same issue and determined that this is something that many people commonly experience. Here are a number of solutions that have been suggested, many of which I have attempted to implement:


  1. Add the following to your connection string (Source):

TypeGuessRows=0;ImportMixedTypes=Text

  2. Add the following to your connection string (Source, More Discussion, Even More):

IMEX=1;HDR=NO;

  3. Edit the following registry settings: disable "TypeGuessRows" and set "ImportMixedTypes" to "Text" (Source, Not Recommended, More Documentation):

Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/TypeGuessRows
Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/ImportMixedTypes

  4. Consider using an alternative library for reading the Excel file.

  5. Format all data in the source file as Text (at least the first 8 rows), though I understand that's typically impractical (Source; this is in relation to SSIS, but the same concepts apply).

  6. Use a Schema.ini file to define the data types before importing the file. I found this in relation to using "Jet.OleDb" directly, and it may require you to modify your connection string. This may only be applicable to CSVs; I have not tried this approach. (Source, Related Post)
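To make the Schema.ini option a bit more concrete, here is a rough sketch that generates the text of such a file, declaring every column as Text. It assumes the data has been exported to CSV (Schema.ini is read by the text driver, and the file has to sit in the same folder as the data file). `SchemaIniWriter`, the file name, and the F1..Fn column names are placeholders for illustration, and this approach is untested against the setup in the question.

```csharp
using System;
using System.Text;

public static class SchemaIniWriter
{
    // Builds Schema.ini content that declares columnCount columns,
    // all typed as Text, for the given CSV file name.
    public static string Build(string csvFileName, int columnCount, bool hasHeaderRow)
    {
        var sb = new StringBuilder();
        sb.Append("[").Append(csvFileName).Append("]\r\n");   // section = data file name
        sb.Append("Format=CSVDelimited\r\n");
        sb.Append("ColNameHeader=").Append(hasHeaderRow ? "True" : "False").Append("\r\n");

        // One ColN entry per column: "ColN=<name> <type>".
        for (int i = 1; i <= columnCount; i++)
        {
            sb.Append("Col").Append(i).Append("=F").Append(i).Append(" Text\r\n");
        }
        return sb.ToString();
    }
}
```

The returned string would be saved as a file named schema.ini next to the CSV, for example with `File.WriteAllText`.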


None of these have worked for me (though I believe they have worked for others). I share the opinion expressed by @Asher that there is really no good solution to this problem. In my software I simply display an error message to the user (if any required column contains empty values) instructing them to format all columns as "Text".

Honestly, I think this book is more applicable to the situation. The issue, already stated multiple times, is:
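The fallback check described above could look roughly like this: after loading the sheet into a DataTable, look for required columns that contain null or empty cells and report them to the user. This is a sketch only; `RequiredColumnCheck`, `FindColumnsWithEmptyValues`, and the column names passed in are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class RequiredColumnCheck
{
    // Returns the names of required columns that contain at least one
    // null (DBNull) or blank cell. Throws ArgumentException if a
    // required column name does not exist in the table.
    public static List<string> FindColumnsWithEmptyValues(
        DataTable table, IEnumerable<string> requiredColumns)
    {
        var offending = new List<string>();
        foreach (string name in requiredColumns)
        {
            foreach (DataRow row in table.Rows)
            {
                object value = row[name];
                if (value == DBNull.Value ||
                    string.IsNullOrWhiteSpace(Convert.ToString(value)))
                {
                    offending.Add(name);
                    break; // one empty cell is enough to flag the column
                }
            }
        }
        return offending;
    }
}
```

If the returned list is not empty, the application can show the message asking the user to format those columns as "Text" and re-import. Note that this cannot tell a genuinely empty cell apart from one that was nulled by the type guess.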

  • "The data type at the destination is varchar but the assumed data
    type of "double" nullifies any data that doesn't fit."(Source)

  • "But the problem is actually with the OLEDBDataReader. The problem
    is that if it sees mostly numbers in a column, it assumes everything
    is a number - if a row item being read is not a number, it simply
    sets it to null! Ouch!"(Source)

  • "The problem seems to be with the JET engine itself and not ADO. Once
    JET decides on the type, it sticks to it."(@Asher)

While I haven't found any of this documented in an official capacity, I think it's very clear that this is an intentional design decision and simply how the Jet Database Library works. I hesitate to call this library entirely useless, because I think some of these solutions do work for many people. But so far, for my project, I have come to the conclusion that this library cannot read multiple data types in a single column and is ill-suited for general data retrieval.
