SqlBulkCopy 像糖蜜一样慢

发布于 2024-08-30 02:14:29 字数 2064 浏览 5 评论 0原文

我正在寻找通过 C# 加载批量数据的最快方法。我有这个脚本可以完成这项工作,但速度很慢。我读到的证词表明 SqlBulkCopy 是最快的。
1000条记录2.5秒。文件包含接近 5000 到 250k 的记录 有哪些事情可以减慢它的速度?

表定义:

CREATE TABLE [dbo].[tempDispositions](
    [QuotaGroup] [varchar](100) NULL,
    [Country] [varchar](50) NULL,
    [ServiceGroup] [varchar](50) NULL,
    [Language] [varchar](50) NULL,
    [ContactChannel] [varchar](10) NULL,
    [TrackingID] [varchar](20) NULL,
    [CaseClosedDate] [varchar](25) NULL,
    [MSFTRep] [varchar](50) NULL,
    [CustEmail] [varchar](100) NULL,
    [CustPhone] [varchar](100) NULL,
    [CustomerName] [nvarchar](100) NULL,
    [ProductFamily] [varchar](35) NULL,
    [ProductSubType] [varchar](255) NULL,
    [CandidateReceivedDate] [varchar](25) NULL,
    [SurveyMode] [varchar](1) NULL,
    [SurveyWaveStartDate] [varchar](25) NULL,
    [SurveyInvitationDate] [varchar](25) NULL,
    [SurveyReminderDate] [varchar](25) NULL,
    [SurveyCompleteDate] [varchar](25) NULL,
    [OptOutDate] [varchar](25) NULL,
    [SurveyWaveEndDate] [varchar](25) NULL,
    [DispositionCode] [varchar](5) NULL,
    [SurveyName] [varchar](20) NULL,
    [SurveyVendor] [varchar](20) NULL,
    [BusinessUnitName] [varchar](25) NULL,
    [UploadId] [int] NULL,
    [LineNumber] [int] NULL,
    [BusinessUnitSubgroup] [varchar](25) NULL,
    [FileDate] [datetime] NULL
) ON [PRIMARY]

这是代码

    private void BulkLoadContent(DataTable dt)
    {
        OnMessage("Bulk loading records to temp table");
        OnSubMessage("Bulk Load Started");
        using (SqlBulkCopy bcp = new SqlBulkCopy(conn))
        {
            bcp.DestinationTableName = "dbo.tempDispositions";
            bcp.BulkCopyTimeout = 0;
            foreach (DataColumn dc in dt.Columns)
            {
                bcp.ColumnMappings.Add(dc.ColumnName, dc.ColumnName);
            }
            bcp.NotifyAfter = 2000;
            bcp.SqlRowsCopied += new SqlRowsCopiedEventHandler(bcp_SqlRowsCopied);
            bcp.WriteToServer(dt);
            bcp.Close();
        }
    }

I'm looking for the fastest way to load bulk data via c#. I have this script that does the job but slow. I read testimonies that SqlBulkCopy is the fastest.
1000 records 2.5 seconds. files contain anywhere near 5000 records to 250k
What are some of the things that can slow it down?

Table Def:

CREATE TABLE [dbo].[tempDispositions](
    [QuotaGroup] [varchar](100) NULL,
    [Country] [varchar](50) NULL,
    [ServiceGroup] [varchar](50) NULL,
    [Language] [varchar](50) NULL,
    [ContactChannel] [varchar](10) NULL,
    [TrackingID] [varchar](20) NULL,
    [CaseClosedDate] [varchar](25) NULL,
    [MSFTRep] [varchar](50) NULL,
    [CustEmail] [varchar](100) NULL,
    [CustPhone] [varchar](100) NULL,
    [CustomerName] [nvarchar](100) NULL,
    [ProductFamily] [varchar](35) NULL,
    [ProductSubType] [varchar](255) NULL,
    [CandidateReceivedDate] [varchar](25) NULL,
    [SurveyMode] [varchar](1) NULL,
    [SurveyWaveStartDate] [varchar](25) NULL,
    [SurveyInvitationDate] [varchar](25) NULL,
    [SurveyReminderDate] [varchar](25) NULL,
    [SurveyCompleteDate] [varchar](25) NULL,
    [OptOutDate] [varchar](25) NULL,
    [SurveyWaveEndDate] [varchar](25) NULL,
    [DispositionCode] [varchar](5) NULL,
    [SurveyName] [varchar](20) NULL,
    [SurveyVendor] [varchar](20) NULL,
    [BusinessUnitName] [varchar](25) NULL,
    [UploadId] [int] NULL,
    [LineNumber] [int] NULL,
    [BusinessUnitSubgroup] [varchar](25) NULL,
    [FileDate] [datetime] NULL
) ON [PRIMARY]

and here's the code

    private void BulkLoadContent(DataTable dt)
    {
        OnMessage("Bulk loading records to temp table");
        OnSubMessage("Bulk Load Started");
        using (SqlBulkCopy bcp = new SqlBulkCopy(conn))
        {
            bcp.DestinationTableName = "dbo.tempDispositions";
            bcp.BulkCopyTimeout = 0;
            foreach (DataColumn dc in dt.Columns)
            {
                bcp.ColumnMappings.Add(dc.ColumnName, dc.ColumnName);
            }
            bcp.NotifyAfter = 2000;
            bcp.SqlRowsCopied += new SqlRowsCopiedEventHandler(bcp_SqlRowsCopied);
            bcp.WriteToServer(dt);
            bcp.Close();
        }
    }

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

戏舞 2024-09-06 02:14:29

该表上有任何索引、触发器或约束吗?

这将导致插入速度减慢——尤其是聚集索引会受到损害。
当您正在处理大量数据时,最好先删除索引,然后再重新应用它们。

关于它的一个好帖子在这里:
在SQL Server(C#客户端)中批量插入大量数据最快的方法是什么

Do you have any indexes, triggers or constraints on that table?

That will cause slowdowns on insert - especially a clustered index would hurt.
When blasting the amounts of data you're doing, it's best to drop indexes first, and re-apply them afterwards.

A good post about it is here:
What's the fastest way to bulk insert a lot of data in SQL Server (C# client)

一抹微笑 2024-09-06 02:14:29

如果您有大量数据,将批量大小设置为相当大的数字可能会有所帮助:

bcp.BatchSize = 10000;

If you have lots of data, setting the batchsize to a reasonably large number might help:

bcp.BatchSize = 10000;
愿与i 2024-09-06 02:14:29

可能会减慢批量复制速度的因素:
-表格全文索引
-插入时触发
-外键约束

Things that can slow down the bulk copy :
-Full text indexes on the table
-Triggers on Insert
-Foreign-Key constraints

小姐丶请自重 2024-09-06 02:14:29

我注意到,尝试刷新大型数据集最初要快得多,但随着时间的推移,速度会大大减慢。我发现使用缓冲方法可以适度提高性能,在同一连接下一次只提供批量复制几千条记录。随着时间的推移,它似乎会降低每批事务的时间,这(随着时间的推移)会提高性能。在我的解决方案中,我注意到,相同的未缓冲方法将在该方法将大约 7,500,000 条相同类型的记录保存到同一数据库的时间内保存大约 5,000,000 条记录。
希望这对某人有帮助。

public void flush_DataTable(DataTable dt, string tableName)//my incoming DTs have a million or so each and slow down over time to nothing. This helps.
    {  int bufferSize = 10000;
        int bufferHigh = bufferSize;
        int lowBuffer = 0;
        if (dt.Rows.Count >= bufferSize)
        {  using (SqlConnection conn = getConn())
            {   conn.Open();
                while (bufferHigh < dt.Rows.Count)
                {
                    using (SqlBulkCopy s = new SqlBulkCopy(conn))
                    {   s.BulkCopyTimeout = 900;
                        s.DestinationTableName = tableName;
                        s.BatchSize = bufferSize;

                        s.EnableStreaming = true;
                        foreach (var column in dt.Columns)
                            s.ColumnMappings.Add(column.ToString(), column.ToString());
                        DataTable bufferedTable = dt.Clone();
                        for (int bu = lowBuffer; bu < bufferHigh; bu++)
                        {
                            bufferedTable.ImportRow(dt.Rows[bu]);
                        }
                        s.WriteToServer(bufferedTable);
                        if (bufferHigh == dt.Rows.Count)
                        {
                            break;
                        }
                        lowBuffer = bufferHigh;
                        bufferHigh += bufferSize;

                        if (bufferHigh > dt.Rows.Count)
                        {
                            bufferHigh = dt.Rows.Count;
                        }
                    }
                }
                conn.Close();
            }
        }
        else
        {
            flushDataTable(dt, tableName);//perofrm a non-buffered flush (could just as easily flush the buffer here bu I already had the other method 
        }
    }

I've noticed that trying to flush large datasets is initially much faster, but slows down substantially over time. I've found a modest increase in performance using a buffered approach, feeding bulkcopy just a few thousand records at a time under the same connection. It seems to keep the per-batch transaction time down over time, which (over time), improves performance. On my solution, I've noted that the same method un-buffered will save about 5,000,000 records in the time it takes this method to save about 7,500,000 records of the same type to the same DB.
Hope this helps someone.

public void flush_DataTable(DataTable dt, string tableName)//my incoming DTs have a million or so each and slow down over time to nothing. This helps.
    {  int bufferSize = 10000;
        int bufferHigh = bufferSize;
        int lowBuffer = 0;
        if (dt.Rows.Count >= bufferSize)
        {  using (SqlConnection conn = getConn())
            {   conn.Open();
                while (bufferHigh < dt.Rows.Count)
                {
                    using (SqlBulkCopy s = new SqlBulkCopy(conn))
                    {   s.BulkCopyTimeout = 900;
                        s.DestinationTableName = tableName;
                        s.BatchSize = bufferSize;

                        s.EnableStreaming = true;
                        foreach (var column in dt.Columns)
                            s.ColumnMappings.Add(column.ToString(), column.ToString());
                        DataTable bufferedTable = dt.Clone();
                        for (int bu = lowBuffer; bu < bufferHigh; bu++)
                        {
                            bufferedTable.ImportRow(dt.Rows[bu]);
                        }
                        s.WriteToServer(bufferedTable);
                        if (bufferHigh == dt.Rows.Count)
                        {
                            break;
                        }
                        lowBuffer = bufferHigh;
                        bufferHigh += bufferSize;

                        if (bufferHigh > dt.Rows.Count)
                        {
                            bufferHigh = dt.Rows.Count;
                        }
                    }
                }
                conn.Close();
            }
        }
        else
        {
            flushDataTable(dt, tableName);//perofrm a non-buffered flush (could just as easily flush the buffer here bu I already had the other method 
        }
    }
本王不退位尔等都是臣 2024-09-06 02:14:29

设置 BatchSize 或 NotifyAfter 都会导致速度变慢。

如果您有数据读取器,则可以将其与行计数数据读取器包装在一起以获取进度。

using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
{
    bulkCopy.EnableStreaming = true;
    bulkCopy.BulkCopyTimeout = 0;
    using (var reader = new MyRowCountDataReader(dataReader))
    {
        reader.NotifyAfter = 1000;
        reader.RowRead += (sender, args) => { ... };
        bulkCopy.WriteToServer(reader);
    }
}

Setting BatchSize or NotifyAfter will both cause a slow down.

If you have a data reader, you can wrap it with a row-counting data reader to get the progress.

using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
{
    bulkCopy.EnableStreaming = true;
    bulkCopy.BulkCopyTimeout = 0;
    using (var reader = new MyRowCountDataReader(dataReader))
    {
        reader.NotifyAfter = 1000;
        reader.RowRead += (sender, args) => { ... };
        bulkCopy.WriteToServer(reader);
    }
}
剧终人散尽 2024-09-06 02:14:29

我刚刚有过类似的经历。 SqlBulkCopy 对于一个表来说没问题,但对于另一个表来说,复制相同数量的记录至少要慢 10 倍。

慢速表确实有非聚集索引和触发器,但禁用它们并没有产生任何明显的区别。

事实证明,慢速表有六个 NVARCHAR(MAX) 列。我实际插入的数据都相当短,因此我将列更改为 NVARCHAR(500)

瞧瞧!慢速表的性能得到改善,可以与其他表相匹配 - 速度至少提高 10 倍。

I just had a similar experience. SqlBulkCopy was fine with one table, but with another it was at least 10x slower to copy the same number of records.

The slow table did have non-clustered indexes and a trigger, but disabling them didn't make any appreciable difference.

It turns out the slow table had six NVARCHAR(MAX) columns. The data I was actually inserting was all fairly short, so I changed the columns to NVARCHAR(500).

Et voilà! The performance of the slow table improved to match the other table - at least a 10x speed improvement.

丑丑阿 2024-09-06 02:14:29

我在这里建议的 IDataReader 实现 如何实现 IDataReader? 也许对您有帮助。我将它与 SqlBulkCopy 一起使用,如下所示:

using (MyFileDataReader reader = new MyFileDataReader(@"C:\myfile.txt"))
 {
      SqlBulkCopy bulkCopy = new SqlBulkCopy(connection);
      bulkCopy.DestinationTableName = "[my_table]";
      bulkCopy.BatchSize = 10000;

      bulkCopy.WriteToServer(reader);

      bulkCopy.Close();

 } 

The IDataReader implementation I sugested here How to implement IDataReader? maybe helps you. I used it with SqlBulkCopy as follows:

using (MyFileDataReader reader = new MyFileDataReader(@"C:\myfile.txt"))
 {
      SqlBulkCopy bulkCopy = new SqlBulkCopy(connection);
      bulkCopy.DestinationTableName = "[my_table]";
      bulkCopy.BatchSize = 10000;

      bulkCopy.WriteToServer(reader);

      bulkCopy.Close();

 } 
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文