使用 DataReader 和 OLEDB Jet 数据提供程序读取 CSV 文件时,如何控制列数据类型?

发布于 2024-07-05 10:39:10 字数 746 浏览 6 评论 0原文

在我的 C# 应用程序中,我使用 Microsoft Jet OLEDB 数据提供程序来读取 CSV 文件。 连接字符串如下所示:

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\Data;Extended Properties="text;HDR=Yes;FMT=Delimited

我使用该连接字符串打开 ADO.NET OleDbConnection,并使用以下命令从 CSV 文件中选择所有行:

select * from Data.csv

当我打开 OleDbDataReader 并检查它返回的列的数据类型时,我发现堆栈中的某些内容尝试根据文件中的第一行数据猜测数据类型。 例如,假设 CSV 文件包含:

House,Street,Town
123,Fake Street,Springfield
12a,Evergreen Terrace,Springfield

调用 House 列的 OleDbDataReader.GetDataTypeName 方法将显示该列已被赋予数据类型“DBTYPE_I4”,因此从该列读取的所有值都被解释为整数。 我的问题是 House 应该是一个字符串 - 当我尝试从第二行读取 House 值时,OleDbDataReader 返回 null。

我如何告诉 Jet 数据库提供程序或 OleDbDataReader 将列解释为字符串而不是数字?

In my C# application I am using the Microsoft Jet OLEDB data provider to read a CSV file. The connection string looks like this:

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\Data;Extended Properties="text;HDR=Yes;FMT=Delimited

I open an ADO.NET OleDbConnection using that connection string and select all the rows from the CSV file with the command:

select * from Data.csv

When I open an OleDbDataReader and examine the data types of the columns it returns, I find that something in the stack has tried to guess at the data types based on the first row of data in the file. For example, suppose the CSV file contains:

House,Street,Town
123,Fake Street,Springfield
12a,Evergreen Terrace,Springfield

Calling the OleDbDataReader.GetDataTypeName method for the House column will reveal that the column has been given the data type "DBTYPE_I4", so all values read from it are interpreted as integers. My problem is that House should be a string - when I try to read the House value from the second row, the OleDbDataReader returns null.

How can I tell either the Jet database provider or the OleDbDataReader to interpret a column as strings instead of numbers?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

撞了怀 2024-07-12 10:39:10

您可以创建一个架构文件,它会告诉 ADO.NET 如何解释 CSV - 实际上给它一个结构。

试试这个: http://www .aspdotnetcodes.com/Importing_CSV_Database_Schema.ini.aspx

或最新的 MS 文档

There's a schema file you can create that would tell ADO.NET how to interpret the CSV - in effect giving it a structure.

Try this: http://www.aspdotnetcodes.com/Importing_CSV_Database_Schema.ini.aspx

Or the most recent MS Documentation

瞄了个咪的 2024-07-12 10:39:10

请检查

http://kbcsv.codeplex.com/

using (var reader = new CsvReader("data.csv"))
{
    reader.ReadHeaderRecord();
    foreach (var record in reader.DataRecords)
    {
        var name = record["Name"];
        var age = record["Age"];
    }
}

Please check

http://kbcsv.codeplex.com/

using (var reader = new CsvReader("data.csv"))
{
    reader.ReadHeaderRecord();
    foreach (var record in reader.DataRecords)
    {
        var name = record["Name"];
        var age = record["Age"];
    }
}
笑着哭最痛 2024-07-12 10:39:10

为了扩展 Marc 的答案,我需要创建一个名为 Schema.ini 的文本文件,并将其放在与 CSV 文件相同的目录中。 除了列类型之外,此文件还可以指定文件格式、日期时间格式、区域设置和列名称(如果文件中未包含这些内容)。

为了使我在问题中给出的示例起作用,架构文件应该如下所示:

[Data.csv]
ColNameHeader=True
Col1=House Text
Col2=Street Text
Col3=Town Text

我还可以尝试让数据提供者在尝试猜测数据类型之前检查文件中的所有行:

[Data.csv]
ColNameHeader=true
MaxScanRows=0

在现实生活中,我的应用程序从具有动态名称的文件导入数据,因此我必须动态创建一个 Schema.ini 文件,并将其写入与 CSV 文件相同的目录,然后再打开连接。

更多详细信息可以在这里找到 - http://msdn.microsoft .com/en-us/library/ms709353(VS.85).aspx - 或通过在 MSDN 库中搜索“Schema.ini 文件”。

To expand on Marc's answer, I need to create a text file called Schema.ini and put it in the same directory as the CSV file. As well as column types, this file can specify the file format, date time format, regional settings, and the column names if they're not included in the file.

To make the example I gave in the question work, the Schema file should look like this:

[Data.csv]
ColNameHeader=True
Col1=House Text
Col2=Street Text
Col3=Town Text

I could also try this to make the data provider examine all the rows in the file before it tries to guess the data types:

[Data.csv]
ColNameHeader=true
MaxScanRows=0

In real life, my application imports data from files with dynamic names, so I have to create a Schema.ini file on the fly and write it to the same directory as the CSV file before I open my connection.

Further details can be found here - http://msdn.microsoft.com/en-us/library/ms709353(VS.85).aspx - or by searching the MSDN Library for "Schema.ini file".

素手挽清风 2024-07-12 10:39:10

您需要告诉驱动程序扫描所有行以确定架构。 否则,如果前几行是数字,其余行是字母数字,则字母数字单元格将为空白。

就像Rory一样,我发现我需要动态创建一个schema.ini文件,因为没有办法以编程方式告诉驱动程序扫描所有行。 (Excel 文件不是这种情况)

您的 schema.ini 中必须有 MaxScanRows=0

这是一个代码示例:

    public static DataTable GetDataFromCsvFile(string filePath, bool isFirstRowHeader = true)
    {
        if (!File.Exists(filePath))
        {
            throw new FileNotFoundException("The path: " + filePath + " doesn't exist!");
        }

        if (!(Path.GetExtension(filePath) ?? string.Empty).ToUpper().Equals(".CSV"))
        {
            throw new ArgumentException("Only CSV files are supported");
        }
        var pathOnly = Path.GetDirectoryName(filePath);
        var filename = Path.GetFileName(filePath);
        var schemaIni =
            $"[{filename}]{Environment.NewLine}" +
            $"Format=CSVDelimited{Environment.NewLine}" +
            $"ColNameHeader={(isFirstRowHeader ? "True" : "False")}{Environment.NewLine}" +
            $"MaxScanRows=0{Environment.NewLine}" +
            $" ; scan all rows for data type{Environment.NewLine}" +
            $" ; This file was automatically generated";
        var schemaFile = pathOnly != null ? Path.Combine(pathOnly, "schema.ini") : "schema.ini";
        File.WriteAllText(schemaFile, schemaIni);

        try
        {
            var sqlCommand = $@"SELECT * FROM [{filename}]";

            var oleDbConnString =
                $"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={pathOnly};Extended Properties=\"Text;HDR={(isFirstRowHeader ? "Yes" : "No")}\"";

            using (var oleDbConnection = new OleDbConnection(oleDbConnString))
            using (var adapter = new OleDbDataAdapter(sqlCommand, oleDbConnection))
            using (var dataTable = new DataTable())
            {
                adapter.FillSchema(dataTable, SchemaType.Source);
                adapter.Fill(dataTable);
                return dataTable;
            }
        }
        finally
        {
            if (File.Exists(schemaFile))
            {
                File.Delete(schemaFile);
            }
        }
    }

如果您在同一目录上运行它,则需要进行一些修改同时在多个线程中。

You need to tell the driver to scan all rows to determine the schema. Otherwise if the first few rows are numeric and the rest are alphanumeric, the alphanumeric cells will be blank.

Like Rory, I found that I needed to create a schema.ini file dynamically because there is no way to programatically tell the driver to scan all rows. (this is not the case for excel files)

You must have MaxScanRows=0 in your schema.ini

Here's a code example:

    public static DataTable GetDataFromCsvFile(string filePath, bool isFirstRowHeader = true)
    {
        if (!File.Exists(filePath))
        {
            throw new FileNotFoundException("The path: " + filePath + " doesn't exist!");
        }

        if (!(Path.GetExtension(filePath) ?? string.Empty).ToUpper().Equals(".CSV"))
        {
            throw new ArgumentException("Only CSV files are supported");
        }
        var pathOnly = Path.GetDirectoryName(filePath);
        var filename = Path.GetFileName(filePath);
        var schemaIni =
            $"[{filename}]{Environment.NewLine}" +
            $"Format=CSVDelimited{Environment.NewLine}" +
            $"ColNameHeader={(isFirstRowHeader ? "True" : "False")}{Environment.NewLine}" +
            $"MaxScanRows=0{Environment.NewLine}" +
            $" ; scan all rows for data type{Environment.NewLine}" +
            $" ; This file was automatically generated";
        var schemaFile = pathOnly != null ? Path.Combine(pathOnly, "schema.ini") : "schema.ini";
        File.WriteAllText(schemaFile, schemaIni);

        try
        {
            var sqlCommand = $@"SELECT * FROM [{filename}]";

            var oleDbConnString =
                $"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={pathOnly};Extended Properties=\"Text;HDR={(isFirstRowHeader ? "Yes" : "No")}\"";

            using (var oleDbConnection = new OleDbConnection(oleDbConnString))
            using (var adapter = new OleDbDataAdapter(sqlCommand, oleDbConnection))
            using (var dataTable = new DataTable())
            {
                adapter.FillSchema(dataTable, SchemaType.Source);
                adapter.Fill(dataTable);
                return dataTable;
            }
        }
        finally
        {
            if (File.Exists(schemaFile))
            {
                File.Delete(schemaFile);
            }
        }
    }

You'll need to do some modification if you are running this on the same directory in multiple threads at the same time.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文