将数千行数据插入 Azure SQL DB 的最高效方法？

发布于 2025-01-14 14:19:23 字数 972 浏览 1 评论 0原文

我正在开发一个导入器，它生成需要进入 Azure SQL DB 的数据（配置为具有 4vCore 的无服务器）。

我插入的对象只有几列。

Id : INT
Type : NVARCHAR
Uid : NVARCHAR
Json : NVARCHAR

Json 数据的平均大小约为 1.5kb。我每次运行导入大约 300 万行。

我当前的方法是每插入 2000 行使用一个事务（代码摘录如下）：

using var transaction = sqlConnection.BeginTransaction (System.Data.IsolationLevel.ReadUncommitted);

cmd = new SqlCommand (insertNodeQuery, sqlConnection) {
    CommandTimeout = 600
};

cmd.Transaction = transaction;
cmd.Parameters.AddWithValue ("@Type", node.Type);
[...]

var insertTask = cmd.ExecuteNonQueryAsync ();
tasks.Add (insertTask);

然后我使用 Task.WhenAll(tasks) 等待事务完成。

在我的 PC 上，这可以让我每秒对 (localdb) 进行 20 次插入。然而，对于 Azure DB，我每秒只看到大约 5 次插入。据我了解，这会涉及网络延迟，但即使在同一区域的 Azure 数据中心 (AKS) 中运行代码，我也无法接近本地数据库的速度。

这让我想知道是否有更好的方式来运行导入？如果我将生成的内存数据发送到数据库，SqlBulkCopy 是否是一个选项？

我还能如何优化性能？

原文

I'm working on an importer that generates data which needs to go into an Azure SQL DB (configured as serverless with 4vCores).

The objects I insert have only a handful of columns

Id : INT
Type : NVARCHAR
Uid : NVARCHAR
Json : NVARCHAR

The Json data is about 1.5kb in size on average. I'm importing around 3 million rows per run.

My current approach is to use one transaction per 2000 rows to be inserted (code excerpt below):

using var transaction = sqlConnection.BeginTransaction (System.Data.IsolationLevel.ReadUncommitted);

cmd = new SqlCommand (insertNodeQuery, sqlConnection) {
    CommandTimeout = 600
};

cmd.Transaction = transaction;
cmd.Parameters.AddWithValue ("@Type", node.Type);
[...]

var insertTask = cmd.ExecuteNonQueryAsync ();
tasks.Add (insertTask);

Then I use Task.WhenAll(tasks) to wait for the transaction to be completed.

On my PC, this gives me 20 inserts per second against (localdb). However, against Azure DB I'm only seeing around 5 inserts per second. I understand, there's network latency involved but even when running the code in an Azure Data Center (AKS) in the same region I'm not getting near the speed of the local DB.

This makes me wonder if there's a better way of running the import? Would SqlBulkCopy be an option, given I'm sending generated in-memory data to the DB?

How else can I optimize the performance?

分享到QQ

分享到微博