哪个更快:多个单 INSERT 或一个多行 INSERT?

发布于 2024-08-12 06:58:54 字数 83 浏览 10 评论 0原文

我正在尝试优化将数据插入 MySQL 的代码的一部分。我应该链接 INSERT 来制作一个巨大的多行 INSERT 还是多个单独的 INSERT 更快?

I am trying to optimize one part of my code that inserts data into MySQL. Should I chain INSERTs to make one huge multiple-row INSERT or are multiple separate INSERTs faster?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(13

月下凄凉 2024-08-19 06:58:54

https://dev.mysql.com/doc/refman/8.0 /en/insert-optimization.html

插入一行所需的时间由以下因素决定,其中数字表示大致比例:

  • 连接:(3)
  • 向服务器发送查询:(2)
  • 解析查询:(2)
  • 插入行:(1 × 行大小)
  • 插入索引:(1 × 索引数量)
  • 结束:(1)

由此可见发送一条大语句将为每个插入语句节省 7 的开销。这给我们带来了文档,其中指出:

如果要同时从同一客户端插入多行,请使用具有多个 VALUES 列表的 INSERT 语句一次插入多行。这比使用单独的单行 INSERT 语句要快得多(在某些情况下快很多倍)。

https://dev.mysql.com/doc/refman/8.0/en/insert-optimization.html

The time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:

  • Connecting: (3)
  • Sending query to server: (2)
  • Parsing query: (2)
  • Inserting row: (1 × size of row)
  • Inserting indexes: (1 × number of indexes)
  • Closing: (1)

From this it is clear that sending one large statement will save you an overhead of 7 per insert statement. This brings us to the documentation that states:

If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.

浮云落日 2024-08-19 06:58:54

我知道我在这个问题被问到两年半后才回答这个问题,但我只是想提供一些来自我现在正在从事的项目的硬数据,这些数据表明每次插入确实执行多个 VALUE 块<比连续的单个 VALUE 块 INSERT 语句快得多。

我在 C# 中为此基准测试编写的代码使用 ODBC 将数据从 MSSQL 数据源读入内存(约 19,000 行,在开始任何写入之前读取所有行),并使用 MySql .NET 连接器 (Mysql.Data.*) 将数据读取到内存中。通过准备好的语句将内存中的数据插入 MySQL 服务器上的表中。它的编写方式允许我动态调整每个准备好的 INSERT 的 VALUE 块的数量(即一次插入 n 行,我可以在运行之前调整 n 的值。)我还运行了测试每个 n 多次。

执行单个 VALUE 块(例如,一次 1 行)运行需要 5.7 - 5.9 秒。其他值如下:

一次 2 行:3.5 - 3.5 秒
一次 5 行:2.2 - 2.2 秒
一次 10 行:1.7 - 1.7 秒
一次 50 行:1.17 - 1.18 秒
一次 100 行:1.1 - 1.4 秒
一次 500 行:1.1 - 1.2 秒
一次 1000 行:1.17 - 1.17 秒

所以是的,即使只是将 2 或 3 次写入捆绑在一起也能显着提高速度(运行时间缩短 n 倍),直到达到 n = 5 和 n = 10 之间的某个位置,此时改进显着下降,并且在 n = 10 到 n = 50 范围内的某个位置,改进变得可以忽略不计。

希望这可以帮助人们决定 (a) 是否使用 multiprepare 想法,以及 (b) 每个语句创建多少个 VALUE 块(假设您想要处理可能足够大以将查询推过最大查询大小的数据)对于 MySQL,我相信在很多地方默认情况下是 16MB,可能更大或更小,具体取决于服务器上设置的 max_allowed_pa​​cket 的值。)

I know I'm answering this question almost two and a half years after it was asked, but I just wanted to provide some hard data from a project I'm working on right now that shows that indeed doing multiple VALUE blocks per insert is MUCH faster than sequential single VALUE block INSERT statements.

The code I wrote for this benchmark in C# uses ODBC to read data into memory from an MSSQL data source (~19,000 rows, all are read before any writing commences), and the MySql .NET connector (Mysql.Data.*) stuff to INSERT the data from memory into a table on a MySQL server via prepared statements. It was written in such a way as to allow me to dynamically adjust the number of VALUE blocks per prepared INSERT (ie, insert n rows at a time, where I could adjust the value of n before a run.) I also ran the test multiple times for each n.

Doing single VALUE blocks (eg, 1 row at a time) took 5.7 - 5.9 seconds to run. The other values are as follows:

2 rows at a time: 3.5 - 3.5 seconds
5 rows at a time: 2.2 - 2.2 seconds
10 rows at a time: 1.7 - 1.7 seconds
50 rows at a time: 1.17 - 1.18 seconds
100 rows at a time: 1.1 - 1.4 seconds
500 rows at a time: 1.1 - 1.2 seconds
1000 rows at a time: 1.17 - 1.17 seconds

So yes, even just bundling 2 or 3 writes together provides a dramatic improvement in speed (runtime cut by a factor of n), until you get to somewhere between n = 5 and n = 10, at which point the improvement drops off markedly, and somewhere in the n = 10 to n = 50 range the improvement becomes negligible.

Hope that helps people decide on (a) whether to use the multiprepare idea, and (b) how many VALUE blocks to create per statement (assuming you want to work with data that may be large enough to push the query past the max query size for MySQL, which I believe is 16MB by default in a lot of places, possibly larger or smaller depending on the value of max_allowed_packet set on the server.)

柠檬色的秋千 2024-08-19 06:58:54

一个主要因素是您是否使用事务引擎以及是否启用了自动提交。

自动提交默认处于打开状态,您可能希望将其保留为打开状态;因此,您所做的每个插入都会执行自己的事务。这意味着,如果您对每行执行一次插入,那么您将为每一行提交一个事务。

假设是单线程,这意味着服务器需要将每一行的一些数据同步到光盘。它需要等待数据到达持久存储位置(希望是 RAID 控制器中的电池支持的 RAM)。这本质上是相当慢的,并且可能成为这些情况下的限制因素。

我当然假设您正在使用事务引擎(通常是 innodb)并且您没有调整设置以降低耐用性。

我还假设您使用单个线程来执行这些插入。使用多个线程会使事情变得有点混乱,因为某些版本的 MySQL 在 innodb 中具有工作组提交 - 这意味着多个线程执行自己的提交可以共享对事务日志的单个写入,这很好,因为这意味着与持久存储的同步更少。

另一方面,结果是,您真的想使用多行插入。

存在一个限制,超过该限制就会产生反作用,但在大多数情况下,它至少为 10,000 行。因此,如果您将它们批量处理最多 1,000 行,那么您可能是安全的。

如果您使用 MyISAM,还有很多其他的事情,但我不会让您厌烦这些。和平。

A major factor will be whether you're using a transactional engine and whether you have autocommit on.

Autocommit is on by default and you probably want to leave it on; therefore, each insert that you do does its own transaction. This means that if you do one insert per row, you're going to be committing a transaction for each row.

Assuming a single thread, that means that the server needs to sync some data to disc for EVERY ROW. It needs to wait for the data to reach a persistent storage location (hopefully the battery-backed ram in your RAID controller). This is inherently rather slow and will probably become the limiting factor in these cases.

I'm of course assuming that you're using a transactional engine (usually innodb) AND that you haven't tweaked the settings to reduce durability.

I'm also assuming that you're using a single thread to do these inserts. Using multiple threads muddies things a bit because some versions of MySQL have working group-commit in innodb - this means that multiple threads doing their own commits can share a single write to the transaction log, which is good because it means fewer syncs to persistent storage.

On the other hand, the upshot is, that you REALLY WANT TO USE multi-row inserts.

There is a limit over which it gets counter-productive, but in most cases it's at least 10,000 rows. So if you batch them up to 1,000 rows, you're probably safe.

If you're using MyISAM, there's a whole other load of things, but I'll not bore you with those. Peace.

轮廓§ 2024-08-19 06:58:54

以下是我做的一个小型 PHP 工作台的结果:

我尝试使用 PHP 8.0、MySQL 8.1 (mysqli) 以 3 种不同的方式插入 3000 条记录

多个插入查询,具有多个事务:

$start = microtime(true);
for($i = 0; $i < 3000; $i++)
{
    mysqli_query($res, "insert into app__debuglog VALUE (null,now(), 'msg : $i','callstack','user','debug_speed','vars')");
}
$end = microtime(true);
echo "Took " . ($end - $start) . " s\n";

做了 5 次,平均:< strong>11.132s (+/- 0.6s)

多个插入查询,单个事务:

$start = microtime(true);
mysqli_begin_transaction($res, MYSQLI_TRANS_START_READ_WRITE);
for($i = 0; $i < 3000; $i++)
{
    mysqli_query($res, "insert into app__debuglog VALUE (null,now(), 'msg : $i','callstack','user','debug_speed','vars')");
}
mysqli_commit($res);
$end = microtime(true);
echo "Took " . ($end - $start) . " ms\n";

5 次测试的结果:0.48s (+/- 0.04s)

单个聚合插入查询

$start = microtime(true);

$values = "";

for($i = 0; $i < 3000; $i++)
{
    $values .= "(null,now(), 'msg : $i','callstack','user','debug_speed','vars')";
    if($i !== 2999)
        $values .= ",";
}
mysqli_query($res, "insert into app__debuglog VALUES $values");

$end = microtime(true);
echo "Took " . ($end - $start) . " ms\n";

结果5 次测试:0.085s (+/- 0.05s)

因此,对于 3000 行插入,看起来像:

  • 在单个写入事务中使用多个查询比使用多个查询进行多个查询快约 22 倍每个插入的事务。
  • 使用单个聚合插入语句仍然比使用单个写入事务的多个查询快约 6 倍

Here are the results of a little PHP bench I did :

I'm trying to insert 3000 records in 3 different ways, using PHP 8.0, MySQL 8.1 (mysqli)

Multiple insert queries, with multiple transaction :

$start = microtime(true);
for($i = 0; $i < 3000; $i++)
{
    mysqli_query($res, "insert into app__debuglog VALUE (null,now(), 'msg : $i','callstack','user','debug_speed','vars')");
}
$end = microtime(true);
echo "Took " . ($end - $start) . " s\n";

Did it 5 times, average : 11.132s (+/- 0.6s)

Multiple insert queries, with single transaction :

$start = microtime(true);
mysqli_begin_transaction($res, MYSQLI_TRANS_START_READ_WRITE);
for($i = 0; $i < 3000; $i++)
{
    mysqli_query($res, "insert into app__debuglog VALUE (null,now(), 'msg : $i','callstack','user','debug_speed','vars')");
}
mysqli_commit($res);
$end = microtime(true);
echo "Took " . ($end - $start) . " ms\n";

Result with 5 tests : 0.48s (+/- 0.04s)

Single aggregated insert query

$start = microtime(true);

$values = "";

for($i = 0; $i < 3000; $i++)
{
    $values .= "(null,now(), 'msg : $i','callstack','user','debug_speed','vars')";
    if($i !== 2999)
        $values .= ",";
}
mysqli_query($res, "insert into app__debuglog VALUES $values");

$end = microtime(true);
echo "Took " . ($end - $start) . " ms\n";

Result with 5 tests : 0.085s (+/- 0.05s)

So, for a 3000 row insert, looks like :

  • Using multiple queries in a single write transaction is ~22 times faster than making a multiple queries with multiple transactions for each insert.
  • Using a single aggregated insert statement is still ~6 times faster than using multiple queries with a single write transaction
待"谢繁草 2024-08-19 06:58:54

一次通过线路发送尽可能多的插入内容。实际插入速度应该是相同的,但您会看到网络开销减少带来的性能提升。

Send as many inserts across the wire at one time as possible. The actual insert speed should be the same, but you will see performance gains from the reduction of network overhead.

合久必婚 2024-08-19 06:58:54

一般来说,对数据库的调用次数越少越好(意味着更快、更高效),因此尝试以最小化数据库访问的方式对插入进行编码。请记住,除非您使用连接池,否则每次数据库访问都必须创建连接,执行sql,然后拆除连接。相当大的开销!

In general the less number of calls to the database the better (meaning faster, more efficient), so try to code the inserts in such a way that it minimizes database accesses. Remember, unless your using a connection pool, each databse access has to create a connection, execute the sql, and then tear down the connection. Quite a bit of overhead!

虐人心 2024-08-19 06:58:54

我刚刚做了一个小基准测试,看来对于很多行来说它并没有更快。这是我插入 280 000 行的结果:

  • by 10 000 : 164.96 秒
  • by 5 000 : 37 秒
  • by 1000 : 12.56 秒
  • by 600 : 12.59 秒
  • by 500 : 13.81 秒
  • by 250 : 17.96 秒
  • by 400 : 14.75 秒
  • by 10 0 : 27 秒

它看来 1000 x 1000 是最好的选择。

I just did a small benchmark and it appears that for a lot of line it's not faster. Here my result to insert 280 000 rows :

  • by 10 000 : 164.96 seconds
  • by 5 000 : 37seconds
  • by 1000 : 12.56 seconds
  • by 600 : 12.59 seconds
  • by 500 : 13.81 seconds
  • by 250 : 17.96 seconds
  • by 400 : 14.75 seconds
  • by 100 : 27seconds

It appears that 1000 by 1000 is the best choice.

别闹i 2024-08-19 06:58:54

您可能想要:

  • 检查自动提交是否已关闭
  • 打开连接
  • 在单个事务中发送多批插入(大小约为 4000-10000 行?你看)
  • 关闭连接

取决于服务器的扩展程度(它绝对可以使用 < code>PostgreSQl、OracleMSSQL),使用多个线程和多个连接执行上述操作。

You might want to :

  • Check that auto-commit is off
  • Open Connection
  • Send multiple batches of inserts in a single transaction (size of about 4000-10000 rows ? you see)
  • Close connection

Depending on how well your server scales (its definitively ok with PostgreSQl, Oracle and MSSQL), do the thing above with multiple threads and multiple connections.

三岁铭 2024-08-19 06:58:54

一般来说,由于连接开销,多次插入会变慢。一次执行多个插入将减少每次插入的开销成本。

根据您使用的语言,您可以在访问数据库之前用编程/脚本语言创建一个批处理并将每个插入添加到该批处理中。然后,您将能够使用一个连接操作执行一大批操作。 这里是 Java 中的一个示例。

In general, multiple inserts will be slower because of the connection overhead. Doing multiple inserts at once will reduce the cost of overhead per insert.

Depending on which language you are using, you can possibly create a batch in your programming/scripting language before going to the db and add each insert to the batch. Then you would be able to execute a large batch using one connect operation. Here's an example in Java.

紫南 2024-08-19 06:58:54

MySQL 5.5
一条 SQL 插入语句花费了大约 300 到大约 450 毫秒。
而以下统计数据适用于内联多个插入语句。

(25492 row(s) affected)
Execution Time : 00:00:03:343
Transfer Time  : 00:00:00:000
Total Time     : 00:00:03:343

我想说内联是一种可行的方法:)

MYSQL 5.5
One sql insert statement took ~300 to ~450ms.
while the below stats is for inline multiple insert statments.

(25492 row(s) affected)
Execution Time : 00:00:03:343
Transfer Time  : 00:00:00:000
Total Time     : 00:00:03:343

I would say inline is way to go :)

漆黑的白昼 2024-08-19 06:58:54

Mysql 和 MariaDB 在插入方面的优化有多糟糕,真是可笑。
我测试了 mysql 5.7 和 mariadb 10.3,没有真正的区别。

我已经在具有 NVME 磁盘、70,000 IOPS、1.1 GB/秒 seq 吞吐量的服务器上对此进行了测试,并且可以实现全双工(读取和写入)。
该服务器也是高性能服务器。
给它 20 GB 的内存。
数据库完全是空的。

在进行多行插入时,我收到的速度是每秒 5000 次插入(尝试使用 1MB 到 10MB 的数据块)

现在是线索:
如果我添加另一个线程并插入到相同的表中,我突然有 2x5000 /秒。
多一个线程,我总共有 15000 个/秒

考虑一下:当执行一个线程插入时,这意味着您可以按顺序写入磁盘(索引除外)。
使用线程时,实际上会降低可能的性能,因为它现在需要进行更多的随机访问。
但现实检查表明 mysql 的优化非常糟糕,以至于线程有很大帮​​助。

这种服务器的实际性能可能是每秒数百万,CPU 空闲,磁盘空闲。
原因很明显,mariadb 就像 mysql 一样有内部延迟。

It's ridiculous how bad Mysql and MariaDB are optimized when it comes to inserts.
I tested mysql 5.7 and mariadb 10.3, no real difference on those.

I've tested this on a server with NVME disks, 70,000 IOPS, 1.1 GB/sec seq throughput and that's possible full duplex (read and write).
The server is a high performance server as well.
Gave it 20 GB of ram.
The database completely empty.

The speed I receive was 5000 inserts per second when doing multi row inserts (tried it with 1MB up to 10MB chunks of data)

Now the clue:
If I add another thread and insert into the SAME tables I suddenly have 2x5000 /sec.
One more thread and I have 15000 total /sec

Consider this: When doing ONE thread inserts it means you can sequentially write to the disk (with exceptions to indexes).
When using threads you actually degrade the possible performance because it now needs to do a lot more random accesses.
But reality check shows mysql is so badly optimized that threads help a lot.

The real performance possible with such a server is probably millions per second, the CPU is idle the disk is idle.
The reason is quite clearly that mariadb just as mysql has internal delays.

厌味 2024-08-19 06:58:54

我会添加以下信息:一次过多的行(取决于其内容)可能会导致 得到的数据包大于“max_allowed_pa​​cket”

也许考虑使用像 PHP 的 array_chunk 这样的函数来为您的大文件执行多次插入数据集。

I would add the information that too many rows at a time depending on their contents could lead to Got a packet bigger than 'max_allowed_packet'.

Maybe consider using functions like PHP's array_chunk to do multiple inserts for your big datasets.

我不是你的备胎 2024-08-19 06:58:54

多次插入速度更快,但它应该有线程。另一个要点是临时禁用约束检查,使插入速度更快。你的桌子上有没有它并不重要。例如,测试禁用外键并享受速度:

SET FOREIGN_KEY_CHECKS=0;

当然,您应该在插入后将其重新打开:

SET FOREIGN_KEY_CHECKS=1;

这是插入大量数据的常见方法。
数据完整性可能会被破坏,因此您应该在禁用外键检查之前注意这一点。

multiple inserts are faster but it has thredshould. another thrik is disabling constrains checks temprorary make inserts much much faster. It dosn't matter your table has it or not. For example test disabling foreign keys and enjoy the speed:

SET FOREIGN_KEY_CHECKS=0;

offcourse you should turn it back on after inserts by:

SET FOREIGN_KEY_CHECKS=1;

this is common way to inserting huge data.
the data integridity may break so you shoud care of that before disabling foreign key checks.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文