How does batch size affect bulk insert performance?
I am doing bulk inserts into a Sybase database by grouping insert queries and sending them to the database in batches, where the batch size is configurable. The code looks somewhat like this:
public static void InsertModelValueInBulk(DataSet modelValueData, int clsaId)
{
    int batchSize = Convert.ToInt32(ConfigurationManager.AppSettings["BatchSize"].ToString());
    IList<string> queryBuffer = new List<string>();
    using (var connection = GetAseConnection())
    {
        connection.Open();
        var tran = connection.BeginTransaction();
        try
        {
            for (int i = 0; i < modelValueData.Tables[0].Rows.Count; i++)
            {
                var row = modelValueData.Tables[0].Rows[i];
                // fyeStr and value are derived from the current row; that code is elided here
                var insertItem = string.Format(@"select '{0}',{1},{2},{3},'{4}','{5}','{6}',{7}",
                    row["ModelValueID"], Convert.ToInt32(row["StockModelID"]), Convert.ToInt32(row["ModelItemID"]),
                    fyeStr, row["Period"], value, row["UpdatedUser"], clsaId);
                queryBuffer.Add(insertItem);
                if (queryBuffer.Count == batchSize)
                {
                    var finalQuery = @"INSERT INTO InsertTable (ModelValueID, StockModelID, ModelItemID, FYE, Period, Value, UpdatedUser, id)
" + String.Join(" union ", queryBuffer);
                    using (var cmd = new AseCommand(finalQuery, connection, tran))
                    {
                        cmd.ExecuteNonQuery();
                    }
                    queryBuffer.Clear();
                }
            }
            // flush any leftover rows that did not fill a complete batch
            if (queryBuffer.Count > 0)
            {
                var finalQuery = @"INSERT INTO InsertTable (ModelValueID, StockModelID, ModelItemID, FYE, Period, Value, UpdatedUser, id)
" + String.Join(" union ", queryBuffer);
                using (var cmd = new AseCommand(finalQuery, connection, tran))
                {
                    cmd.ExecuteNonQuery();
                }
            }
            tran.Commit();
        }
        catch
        {
            tran.Rollback();
            throw;
        }
        finally
        {
            tran.Dispose();
        }
    }
}
With this approach, the time taken to insert 20,000 rows, plotted against batch size, forms a J-curve. Sample data looks somewhat like:
batch size   10 => 30 sec
batch size   50 => 20 sec
batch size  100 => 10 sec
batch size  200 => 20 sec
batch size  500 => 30 sec
batch size 1000 => 1 min
I would like to understand the reason behind this J-curve. Is it something to do with app server memory, some database server setting, or something else? What makes 100 the optimum, and can it be tweaked further?
Answers (2)
A bulk insert locks the table for the duration of the batch. Locks have a basic overhead, so small batches won't benefit nearly as much, but they do let other operations happen against the table in between batches.
So larger batches are good, up to a point. Because it's a transaction, the data is not committed until the current batch is complete, and that means writing to the log file. Really large batches will cause the log to grow, which is IO-intensive, and it also increases contention, since more of your log will be in use.
Something along those lines.
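To make the log point concrete: the code in the question opens one transaction and commits only after every batch has run, so the log must retain all 20,000 rows until the end. Below is a minimal sketch of committing per batch instead, which bounds log usage to roughly one batch at a time. BuildBatchInsert is a hypothetical helper standing in for the query-building code in the question, and note that per-batch commits give up all-or-nothing atomicity:

// Sketch: one transaction per batch instead of one for the whole load.
// Committed log records become reclaimable, so log usage stays small.
foreach (var batch in batches)               // batches: the buffered "select ..." fragments, chunked
{
    using (var tran = connection.BeginTransaction())
    {
        var sql = BuildBatchInsert(batch);   // hypothetical helper: "INSERT INTO ..." + the batch's selects
        using (var cmd = new AseCommand(sql, connection, tran))
        {
            cmd.ExecuteNonQuery();
        }
        tran.Commit();                       // this batch is durable; its log space can be reused
    }
}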
Edit: two other things:
1) Use parameterized inputs.
2) If you don't do #1, note that "union" causes a DISTINCT. Use "union all".
I see quite a few issues with your existing code. For example, on your Commit I would not assume that commits will always succeed.
I would wrap all code that has the potential to fail or blow up in a try/catch: Commit, Rollback, cmd.ExecuteNonQuery.
I would look at the select statement; personally I would create a stored procedure, and if you can't do that, I would make the select string a const.
Personally, I would name my transactions, but that's up to you.
Does this line have the potential to change on every method call?
int batchSize = Convert.ToInt32(ConfigurationManager.AppSettings["BatchSize"].ToString());
If not, I would read it once into a static field rather than calling it every time you enter the method.
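For instance, that last suggestion could be a static field initialized once per process (a minimal sketch; note that a static readonly field means a changed config value is only picked up on restart):

// Read BatchSize from configuration once, not on every method call.
private static readonly int BatchSize =
    Convert.ToInt32(ConfigurationManager.AppSettings["BatchSize"]);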