将逗号分隔文本文件读取到 C# DataTable，列被截断为 255 个字符

发布于 2024-07-25 09:37:02 字数 1557 浏览 10 评论 0原文

我们正在从 CSV 导入到 SQL。为此，我们读取 CSV 文件并使用 schema.ini 写入临时 .txt 文件。（我还不确定为什么要写入这个临时文件，但这就是代码当前的工作方式）。从那里，我们使用以下连接字符串（对于 ASCII 文件）通过 OleDB 加载 DataTable。

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + sPath + ";Extended Properties=\"text;HDR=Yes;FMT=Delimited\"";

我们遇到的问题是超过 255 个字符的字段会被截断。我在网上阅读过有关此问题的信息，似乎默认情况下，文本字段会因此被截断。

我在 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel 中设置了注册表设置 ImportMixedTypes=Majority Type 和 TypeGuessRows=0 ，希望 mycolumns 不再被解释为文本。执行此操作后，可以从 CSV 文件正确写入临时 txt 文件，但是当我调用 dataAdapter.Fill 时，生成的 DataTable 仍然具有被截断的值。

这是有问题的列定义。 CommaDelimited#txt 注释 2 假 234 真 130 0 0

任何帮助，将不胜感激。目前，我对使用任何 3d 方代码来解决这个问题不感兴趣，必须有一种使用内置工具的方法。

这是表定义：

<Columns> 
    <TABLE_NAME>CommaDelimited#txt</TABLE_NAME> 
    <COLUMN_NAME>Notes</COLUMN_NAME> 
    <ORDINAL_POSITION>2</ORDINAL_POSITION> 
    <COLUMN_HASDEFAULT>false</COLUMN_HASDEFAULT> 
    <COLUMN_FLAGS>234</COLUMN_FLAGS> 
    <IS_NULLABLE>true</IS_NULLABLE> 
    <DATA_TYPE>130</DATA_TYPE> 
    <CHARACTER_MAXIMUM_LENGTH>0</CHARACTER_MAXIMUM_LENGTH> 
    <CHARACTER_OCTET_LENGTH>0</CHARACTER_OCTET_LENGTH> 
</Columns>

谢谢，

Greg

我尝试编辑 schema.ini 指定宽度的文本，但这没有帮助（之前已设置为备忘录）

[CommaDelimited.txt] 格式=CSVDelimited 小数符号=. Col1=注释文本宽度 5000

原文

We are importing from CSV to SQL. To do so, we are reading the CSV file and writing to a temporary .txt file using a schema.ini. (I'm not sure yet exactly why are are writing to this temporary file, but that's how the code currently works). From there, we are loading a DataTable via OleDB using the following connection string (for ASCII files).

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + sPath + ";Extended Properties=\"text;HDR=Yes;FMT=Delimited\"";

The problem we are having is that fields with more than 255 characters get truncated. I've read online about this problem and it seems that by default, text fields get truncated thusly.

I set my registry settings ImportMixedTypes=Majority Type and TypeGuessRows=0 in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
, hoping that mycolumns will no longer be interpreted as text. After doing that, the temporary txt file is being written correctly from the CSV file, but when I call dataAdapter.Fill, the resulting DataTable still has a truncated value.

Here is the column definition in question.

CommaDelimited#txt Notes 2 false 234 true 130 0 0

Any help would be appreciated. At this time, I'm not interested in using any 3d party code to solve this problem, there must be a way using built in tools.

Here is the table definition:

<Columns> 
    <TABLE_NAME>CommaDelimited#txt</TABLE_NAME> 
    <COLUMN_NAME>Notes</COLUMN_NAME> 
    <ORDINAL_POSITION>2</ORDINAL_POSITION> 
    <COLUMN_HASDEFAULT>false</COLUMN_HASDEFAULT> 
    <COLUMN_FLAGS>234</COLUMN_FLAGS> 
    <IS_NULLABLE>true</IS_NULLABLE> 
    <DATA_TYPE>130</DATA_TYPE> 
    <CHARACTER_MAXIMUM_LENGTH>0</CHARACTER_MAXIMUM_LENGTH> 
    <CHARACTER_OCTET_LENGTH>0</CHARACTER_OCTET_LENGTH> 
</Columns>

Thanks,

Greg

I tried editing the schema.ini specifying text with a width, and that did not help (it was set to memo before)

[CommaDelimited.txt]
Format=CSVDelimited
DecimalSymbol=.
Col1=Notes Text Width 5000

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

吃不饱 2024-08-01 09:37:10

我的 .Net 代码也遇到了类似的问题，只需将注册表设置更改为多数类型（如原始帖子中所述）即可使其正常工作。我在我的案例中所做的不同之处是：
因为我们尝试从 CSV 而不是 Excel 导入。所以我必须更改注册表中文本的设置（不是 Excel）。

尝试这样做，我认为它应该有效

回复收藏 0 原文

站稳脚跟 2024-08-01 09:37:09

我认为最好的方法是使用以下博客中的 CSVReader：
http://ronaldlemmen.blogspot.com/2008/ 03/stopping-and-continuing-save-event.html

回复收藏 0 原文

橙幽之幻 2024-08-01 09:37:08

我倾向于在读取 CSV 文件时直接创建 DataTable，而不是执行将数据写入不同文本文件的额外步骤，然后再次将其读回内存。

就这个问题而言，最终如何将数据从 DataTable 获取到 SQL 数据库中？如果您只是循环访问 DataTable 并执行一堆 INSERT 语句，为什么不跳过两个中间人并在最初读取 CSV 文件时调用相同的 INSERT 语句呢？

回复收藏 0 原文

ゞ花落谁相伴 2024-08-01 09:37:07

您可以通过正确指定 schema.ini 文件。我相信这两个选项是将列设置为备忘录类型，或者将宽度设置为“备注”。 255.

回复收藏 0 原文

ゝ偶尔ゞ 2024-08-01 09:37:06

这是一个简单的类，用于读取分隔文件并返回不截断字符串的 DataTable（所有字符串）。它有一个重载方法来指定列名（如果它们不在文件中）。也许你可以用它？

导入的命名空间

using System;
using System.Text;
using System.Data;
using System.IO;

代码

/// <summary>
/// Simple class for reading delimited text files
/// </summary>
public class DelimitedTextReader
{
    /// <summary>
    /// Read the file and return a DataTable
    /// </summary>
    /// <param name="filename">File to read</param>
    /// <param name="delimiter">Delimiting string</param>
    /// <returns>Populated DataTable</returns>
    public static DataTable ReadFile(string filename, string delimiter)
    {
        return ReadFile(filename, delimiter, null);
    }
    /// <summary>
    /// Read the file and return a DataTable
    /// </summary>
    /// <param name="filename">File to read</param>
    /// <param name="delimiter">Delimiting string</param>
    /// <param name="columnNames">Array of column names</param>
    /// <returns>Populated DataTable</returns>
    public static DataTable ReadFile(string filename, string delimiter, string[] columnNames)
    {
        //  Create the new table
        DataTable data = new DataTable();
        data.Locale = System.Globalization.CultureInfo.CurrentCulture;

        //  Check file
        if (!File.Exists(filename))
            throw new FileNotFoundException("File not found", filename);

        //  Process the file line by line
        string line;
        using (TextReader tr = new StreamReader(filename, Encoding.Default))
        {
            //  If column names were not passed, we'll read them from the file
            if (columnNames == null)
            {
                //  Get the first line
                line = tr.ReadLine();
                if (string.IsNullOrEmpty(line))
                    throw new IOException("Could not read column names from file.");
                columnNames = line.Split(new string[] { delimiter }, StringSplitOptions.RemoveEmptyEntries);
            }

            //  Add the columns to the data table
            foreach (string colName in columnNames)
                data.Columns.Add(colName);

            //  Read the file
            string[] columns;
            while ((line = tr.ReadLine()) != null)
            {
                columns = line.Split(new string[] { delimiter }, StringSplitOptions.None);
                //  Ensure we have the same number of columns
                if (columns.Length != columnNames.Length)
                {
                    string message = "Data row has {0} columns and {1} are defined by column names.";
                    throw new DataException(string.Format(message, columns.Length, columnNames.Length));
                }
                data.Rows.Add(columns);
            }
        }
        return data;

    }
}

所需的命名空间

using System;
using System.Data;
using System.Windows.Forms;
using System.Data.SqlClient;
using System.Diagnostics;

以下是调用它并上传到 SQL 数据库的示例：

        Stopwatch sw = new Stopwatch();
        TimeSpan tsRead;
        TimeSpan tsTrunc;
        TimeSpan tsBcp;
        int rows;
        sw.Start();
        using (DataTable dt = DelimitedTextReader.ReadFile(textBox1.Text, "\t"))
        {
            tsRead = sw.Elapsed;
            sw.Reset();
            rows = dt.Rows.Count;
            string connect = @"Data Source=.;Initial Catalog=MyDB;Integrated Security=SSPI";
            using (SqlConnection cn = new SqlConnection(connect))
            using (SqlCommand cmd = new SqlCommand("TRUNCATE TABLE dbo.UploadTable", cn))
            using (SqlBulkCopy bcp = new SqlBulkCopy(cn))
            {
                cn.Open();
                sw.Start();
                cmd.ExecuteNonQuery();
                tsTrunc = sw.Elapsed;
                sw.Reset();

                sw.Start();
                bcp.DestinationTableName = "dbo.UploadTable";
                bcp.ColumnMappings.Add("Column A", "ColumnA");
                bcp.ColumnMappings.Add("Column D", "ColumnD");
                bcp.WriteToServer(dt);
                tsBcp = sw.Elapsed;
                sw.Reset();
            }
        }

        string message = "File read:\t{0}\r\nTruncate:\t{1}\r\nBcp:\t{2}\r\n\r\nTotal time:\t{3}\r\nTotal rows:\t{4}";
        MessageBox.Show(string.Format(message, tsRead, tsTrunc, tsBcp, tsRead + tsTrunc + tsBcp, rows));

Here's a simple class for reading a delimited file and returning a DataTable (all strings) that doesn't truncate strings. It has an overloaded method to specify column names if they're not in the file. Maybe you can use it?

Imported Namespaces

using System;
using System.Text;
using System.Data;
using System.IO;

Code

/// <summary>
/// Simple class for reading delimited text files
/// </summary>
public class DelimitedTextReader
{
    /// <summary>
    /// Read the file and return a DataTable
    /// </summary>
    /// <param name="filename">File to read</param>
    /// <param name="delimiter">Delimiting string</param>
    /// <returns>Populated DataTable</returns>
    public static DataTable ReadFile(string filename, string delimiter)
    {
        return ReadFile(filename, delimiter, null);
    }
    /// <summary>
    /// Read the file and return a DataTable
    /// </summary>
    /// <param name="filename">File to read</param>
    /// <param name="delimiter">Delimiting string</param>
    /// <param name="columnNames">Array of column names</param>
    /// <returns>Populated DataTable</returns>
    public static DataTable ReadFile(string filename, string delimiter, string[] columnNames)
    {
        //  Create the new table
        DataTable data = new DataTable();
        data.Locale = System.Globalization.CultureInfo.CurrentCulture;

        //  Check file
        if (!File.Exists(filename))
            throw new FileNotFoundException("File not found", filename);

        //  Process the file line by line
        string line;
        using (TextReader tr = new StreamReader(filename, Encoding.Default))
        {
            //  If column names were not passed, we'll read them from the file
            if (columnNames == null)
            {
                //  Get the first line
                line = tr.ReadLine();
                if (string.IsNullOrEmpty(line))
                    throw new IOException("Could not read column names from file.");
                columnNames = line.Split(new string[] { delimiter }, StringSplitOptions.RemoveEmptyEntries);
            }

            //  Add the columns to the data table
            foreach (string colName in columnNames)
                data.Columns.Add(colName);

            //  Read the file
            string[] columns;
            while ((line = tr.ReadLine()) != null)
            {
                columns = line.Split(new string[] { delimiter }, StringSplitOptions.None);
                //  Ensure we have the same number of columns
                if (columns.Length != columnNames.Length)
                {
                    string message = "Data row has {0} columns and {1} are defined by column names.";
                    throw new DataException(string.Format(message, columns.Length, columnNames.Length));
                }
                data.Rows.Add(columns);
            }
        }
        return data;

    }
}

Required Namespaces

using System;
using System.Data;
using System.Windows.Forms;
using System.Data.SqlClient;
using System.Diagnostics;

Here's an example of calling it and uploading to a SQL Database:

        Stopwatch sw = new Stopwatch();
        TimeSpan tsRead;
        TimeSpan tsTrunc;
        TimeSpan tsBcp;
        int rows;
        sw.Start();
        using (DataTable dt = DelimitedTextReader.ReadFile(textBox1.Text, "\t"))
        {
            tsRead = sw.Elapsed;
            sw.Reset();
            rows = dt.Rows.Count;
            string connect = @"Data Source=.;Initial Catalog=MyDB;Integrated Security=SSPI";
            using (SqlConnection cn = new SqlConnection(connect))
            using (SqlCommand cmd = new SqlCommand("TRUNCATE TABLE dbo.UploadTable", cn))
            using (SqlBulkCopy bcp = new SqlBulkCopy(cn))
            {
                cn.Open();
                sw.Start();
                cmd.ExecuteNonQuery();
                tsTrunc = sw.Elapsed;
                sw.Reset();

                sw.Start();
                bcp.DestinationTableName = "dbo.UploadTable";
                bcp.ColumnMappings.Add("Column A", "ColumnA");
                bcp.ColumnMappings.Add("Column D", "ColumnD");
                bcp.WriteToServer(dt);
                tsBcp = sw.Elapsed;
                sw.Reset();
            }
        }

        string message = "File read:\t{0}\r\nTruncate:\t{1}\r\nBcp:\t{2}\r\n\r\nTotal time:\t{3}\r\nTotal rows:\t{4}";
        MessageBox.Show(string.Format(message, tsRead, tsTrunc, tsBcp, tsRead + tsTrunc + tsBcp, rows));

回复收藏 0 原文