Forking MySQL INSERT INTO (InnoDB)
I'm trying to insert about 500 million rows of garbage data into a database for testing. Right now I have a PHP script looping through a few SELECT/INSERT statements, each inside a TRANSACTION. Clearly this isn't the best solution. The tables are InnoDB (row-level locking).

I'm wondering: if I (properly) fork the process, will this speed up the INSERT process? At the rate it's going, it will take 140 hours to complete. I'm concerned about two things:

1) If INSERT statements must acquire a write lock, will that render forking useless, since multiple processes can't write to the same table at the same time?
2) I'm using SELECT...LAST_INSERT_ID() (inside a TRANSACTION). Will this logic break when multiple processes are INSERTing into the database? I could create a new database connection for each fork, so I hope this would avoid the problem.

How many processes should I be using? The queries themselves are simple, and I have a regular dual-core dev box with 2GB RAM. I set up InnoDB to use 8 threads (innodb_thread_concurrency=8), but I'm not sure if I should be using 8 processes, or if this is even the correct way to think about matching.

Thanks for your help!
Comments (2)
The MySQL documentation has a discussion on efficiently inserting a large number of records. The clear winner appears to be the LOAD DATA INFILE command, followed by INSERT statements that carry multiple VALUES lists.
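As a rough illustration of the second option (multiple VALUES lists per INSERT), here is a minimal Python sketch that only builds the statements and does not talk to a database. The table name test_data, its columns, and the batch size are made up for the example; the %s placeholder style is the one the common Python MySQL drivers accept, and table and column names are assumed to be trusted input.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def multi_row_insert(table, columns, batch):
    """Build one parameterized INSERT with one VALUES list per row.

    Returns (sql, params). Table and column names are interpolated
    directly, so they must come from trusted code, not user input.
    """
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join([row_placeholder] * len(batch)),
    )
    # Flatten the rows so the parameters line up with the placeholders.
    params = [value for row in batch for value in row]
    return sql, params


# Hypothetical garbage rows for a hypothetical table test_data(id, payload).
rows = ((i, "junk-%d" % i) for i in range(5))
statements = [
    multi_row_insert("test_data", ["id", "payload"], batch)
    for batch in chunked(rows, 2)
]
```

Each (sql, params) pair would then go to a driver's cursor.execute(sql, params), ideally with many rows per statement and one commit per batch rather than per row. What batch size works best is something to measure, not something this sketch can tell you.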
1) Yes, there will be lock contention, but InnoDB is designed to handle multiple threads trying to insert. Sure, they won't insert simultaneously, but it will handle serializing the inserts for you. Just make sure you explicitly end (commit) your transactions, and do it as soon as possible. This will ensure you get the best possible insert performance.
2) No, this logic will not break provided you have one connection per thread, since LAST_INSERT_ID() is connection-specific.
3) This is one of those things you just need to benchmark to figure out. Actually, I would make the program self-adjust: run 100 inserts with 8 threads and record the execution time, then try again with half as many and twice as many threads. Whichever is fastest, benchmark more thread counts around that number.
In general, you should just go ahead and benchmark this kind of thing to see which is faster. In the time it takes to think about it and write it up, you could probably already have preliminary numbers.
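The self-adjusting idea in point 3 could look something like the Python sketch below. It is one possible reading, not a definitive implementation: it keeps halving and doubling around the current best worker count until nothing improves, which is a slight variation on "benchmark around that number". The names time_run, tune_workers, and insert_batch are invented for the example, threads stand in for forked processes, and in a real run insert_batch would open its own database connection per worker (per point 2) and commit promptly (per point 1).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_run(insert_batch, workers, batches):
    """Return wall-clock seconds for `batches` calls spread over `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every call and re-raises any worker exception.
        list(pool.map(insert_batch, range(batches)))
    return time.perf_counter() - start


def tune_workers(measure, start=8, max_rounds=5):
    """Try half and double the current best worker count until it stops changing.

    `measure(n)` must return elapsed seconds for n workers.
    Returns (best_worker_count, {worker_count: seconds}).
    """
    timings = {start: measure(start)}
    best = start
    for _ in range(max_rounds):
        for candidate in (max(1, best // 2), best * 2):
            if candidate not in timings:
                timings[candidate] = measure(candidate)
        new_best = min(timings, key=timings.get)
        if new_best == best:
            break  # neither neighbour was faster
        best = new_best
    return best, timings
```

A call might look like tune_workers(lambda n: time_run(insert_batch, n, 100)), where insert_batch is your own function that inserts one batch. Timings on a busy dev box are noisy, so repeating each measurement a few times before trusting the result is probably worthwhile.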