当索引不适合 key_buffer 时快速 MySQL 批量加载
这里有一个问题是如何正确配置 mysql (myisam) 以便快速执行批量插入 (load data infile)。
有 6 GB 文本文件需要导入,1500 万行,16 列(一些 int、一些 varchar(255)、1 个 varchar(40)、1 个 char(1)、一些日期时间、1 个mediumtext)。
相对 my.conf 设置:
key_buffer = 800M
max_allowed_packet = 160M
thread_cache_size = 80
myisam_sort_buffer_size = 400M
bulk_insert_buffer_size = 400M
delay_key_write = ON
delayed_insert_limit = 10000
共有三个索引 - 一个主索引 (autincrement int)、一个唯一 int 和一个唯一 varchar(40)。
问题是,执行 load data infile 命令后,前 3 GB 数据快速导入(基于 table.myd 大小的增加 - 5-8 mb/s),但超过 3020 Mb 时限制了导入速度大大减少 - table.myd 的大小增长了 0.5mb/s。我注意到,当 Key_blocks_unused
耗尽到零时,导入过程会变慢。这些是 mysql> 的输出在导入开始时显示类似 '%key%';
的状态:
mysql> show status like '%key%';
+------------------------+---------+
| Variable_name | Value |
+------------------------+---------+
| Com_preload_keys | 0 |
| Com_show_keys | 0 |
| Handler_read_key | 0 |
| Key_blocks_not_flushed | 57664 |
| Key_blocks_unused | 669364 |
| Key_blocks_used | 57672 |
| Key_read_requests | 7865321 |
| Key_reads | 57672 |
| Key_write_requests | 2170158 |
| Key_writes | 4 |
+------------------------+---------+
10 rows in set (0.00 sec)
这就是 3020Mb 限制之后的样子,即当 key_blocks_unused
降至零时,此时批量插入过程变得非常慢:
mysql> show status like '%key%';
+------------------------+-----------+
| Variable_name | Value |
+------------------------+-----------+
| Com_preload_keys | 0 |
| Com_show_keys | 0 |
| Handler_read_key | 0 |
| Key_blocks_not_flushed | 727031 |
| Key_blocks_unused | 0 |
| Key_blocks_used | 727036 |
| Key_read_requests | 171275179 |
| Key_reads | 1163091 |
| Key_write_requests | 41181024 |
| Key_writes | 436095 |
+------------------------+-----------+
10 rows in set (0.00 sec)
据我了解,问题非常清楚 - 索引存储在缓存中,但是一旦缓存填满,索引就会一一写入磁盘,这很慢,因此所有过程都会变慢向下。如果我禁用基于 varchar(40) 列的唯一索引,因此所有索引都适合 Key_blocks_used
(我猜这是直接依赖于 key_buffer
的变量,不是不是吗?),全部批量导入成功。所以,我很好奇,如何让mysql一次性将所有Key_blocks_used
数据放入磁盘,并释放Key_blocks_used
?我知道它可能会进行一些即时排序,但我仍然认为它应该可以进行一些缓存的 RAM 磁盘同步,以便成功管理索引,即使它们并不全部适合内存缓存。所以我的问题是“如何配置 mysql,以便批量插入可以避免在(几乎)每个索引上写入磁盘,即使所有索引都不适合缓存?”最后也是最重要的-delay_key_write对于给定表设置为 1,但与禁用时相比,它没有增加任何加速。
提前感谢您的任何想法、想法、解释和 RTM! (:
还有一个小问题 - 在 Key_blocks_unused
达到 0 之前,我如何计算有多少 varchar(40) 索引适合缓存?
PS 使用 $myisamchk --keys-used= 禁用索引0 -rq /path/to/db/tbl_name
然后使用 $myisamchk -rq /path/to/db/tbl_name
重新启用它们,如 Mysql docs 是一个已知的解决方案,它可以工作,但仅当批量时 -插入空表时,当表中已经有一些数据时,索引唯一性检查是必要的,因此禁用索引不是解决方案。
have an issue here of how to configure mysql (myisam) properly for the bulk insert (load data infile) to be performed fast.
There is 6 Gb text file to be imported, 15 mln rows, 16 columns (some int, some varchar(255), one varchar(40), one char(1) some datetime, one mediumtext).
relative my.conf settings:
key_buffer = 800M
max_allowed_packet = 160M
thread_cache_size = 80
myisam_sort_buffer_size = 400M
bulk_insert_buffer_size = 400M
delay_key_write = ON
delayed_insert_limit = 10000
There are three indexes - one primary (autincrement int), one unique int and one unique varchar(40).
The problem is that after executing the load data infile command, the first 3 gigs of data are imported quickly (based on the increasing size of table.myd - 5-8 mb/s), but uppon crossing the 3020 Mb limit the import speed decreases greatly - the size of table.myd is growing 0,5mb/s. I've noticed, that the import process slows down upon the Key_blocks_unused
gets drained to zero. These are the output of mysql> show status like '%key%';
in the beginning of import:
mysql> show status like '%key%';
+------------------------+---------+
| Variable_name | Value |
+------------------------+---------+
| Com_preload_keys | 0 |
| Com_show_keys | 0 |
| Handler_read_key | 0 |
| Key_blocks_not_flushed | 57664 |
| Key_blocks_unused | 669364 |
| Key_blocks_used | 57672 |
| Key_read_requests | 7865321 |
| Key_reads | 57672 |
| Key_write_requests | 2170158 |
| Key_writes | 4 |
+------------------------+---------+
10 rows in set (0.00 sec)
and this is what how it looks after the 3020Mb limit, i.e. when key_blocks_unused
gets down to zero, and that's when the bulk insert process get really slow:
mysql> show status like '%key%';
+------------------------+-----------+
| Variable_name | Value |
+------------------------+-----------+
| Com_preload_keys | 0 |
| Com_show_keys | 0 |
| Handler_read_key | 0 |
| Key_blocks_not_flushed | 727031 |
| Key_blocks_unused | 0 |
| Key_blocks_used | 727036 |
| Key_read_requests | 171275179 |
| Key_reads | 1163091 |
| Key_write_requests | 41181024 |
| Key_writes | 436095 |
+------------------------+-----------+
10 rows in set (0.00 sec)
The problem is pretty clear, to my understanding - indexes are being stored in cache, but once the cache fills in, the indexes get written to disk one by one, which is slow, therefore all the process slows down. If i disable the unique index based on varchar(40) column and, therefore, all the indexes fit into Key_blocks_used
(i guess this is the variable directly dependant on key_buffer
, isn't it?), all the bulk import is successfull. So, i'm curious, how to make mysql put all the Key_blocks_used
data to disk at once, and free up the Key_blocks_used
?. I understand that it might be doing some sorting on-the-fly, but still, i guess it should be available to do some cached RAM-disk synchronization in order to successfully manage indexes even when they don't all fit into the memory cache. So my question is "how to configure mysql so that bulk inserting would avoid writing to disk on (almost)each index, even when all indexes don't fit into a cache?" last not least - delay_key_write is set to 1 for a given table, though it didn't add any speed-up, in comparison to when it was disabled.
Thanks for any thoughts, ideas, explanations and RTMs in advance ! (:
One more little question - how would i calculate how many varchar(40) indexes would fit into cache before Key_blocks_unused
gets to 0?
P.S. disabling indexes with $myisamchk --keys-used=0 -rq /path/to/db/tbl_name
and then re-enabling them with $myisamchk -rq /path/to/db/tbl_name
, as described in Mysql docs is a known solution, which works, but only when bulk-inserting into an empty table. When there are some data in a table already, the index uniqueness checking is necessary, therefore disabling indexes is not a solution.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
当您使用“load data infile”导入数据时,我认为 mysql 会逐一执行插入,并且每次插入时,它也会尝试更新索引文件 .MYI,这可能会减慢您的导入速度,因为它会消耗机器人 I/O以及每个单独插入的 CPU 资源。
您可以做的是将 4 个文件添加到导入文件中以禁用表的键并在插入语句末尾启用它,您应该会看到差异。
如果您不想编辑数据文件,请尝试使用 mysqldump 获取正确的转储文件,并且导入数据时不应遇到这种缓慢的情况。
希望这有帮助!
When you import data with "load data infile", I think mysql perform the insert one by one and with each insert, it tries to update the index file .MYI as well and this could slow down your import as it consume bot I/O and CPU resources for each individual insert.
What you could do is add 4 files to your import file to disable the keys of your table and enable it at the end of the insert statement and you should see the difference.
If you don't want to edit your data file, try to use mysqldump to get a proper dump file and you shouldn't run into this slowness with import data.
Hope this helps!
我不确定您提到的
key_buffer
是否与key_buffer_size
相同。我也遇到过类似的问题。通过将
key_buffer_size
值提高到 1GB 之类的值,我的问题得到了解决。 此处查看我的问题。I am not sure the
key_buffer
you mention is same askey_buffer_size
.I had faced similar problem. My problem was resolved by bumping up the
key_buffer_size
value to something like 1GB. Check my question here.