提高 mysql 加载数据 infile 的性能
我正在尝试使用 LOAD DATA INFILE (来自 CSV)将大约 12m 记录批量加载到(本地)mysql 中的 InnoDB 表中,并发现它需要很长时间才能完成。
主键类型是 UUID,并且键在数据文件中未排序。
我已将数据文件拆分为包含 100000 条记录的文件,并将其导入为:
mysql -e 'ALTER TABLE customer DISABLE KEYS;'
for file in *.csv
mysql -e "SET sql_log_bin=0;SET FOREIGN_KEY_CHECKS=0; SET UNIQUE_CHECKS=0;
SET AUTOCOMMIT=0;LOAD DATA INFILE '${file}' INTO TABLE table
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'; COMMIT"
这对于前几十万条记录来说效果很好,但随后每个后续加载的插入时间似乎不断增长(从大约 7 秒到大约 2 分钟/ 我在
一台 8GB RAM 的机器上运行,并将 InnoDB 参数设置为:
innodb_buffer_pool_size =1024M
innodb_additional_mem_pool_size =512M
innodb_log_file_size = 256M
innodb_log_buffer_size = 256M
我还尝试加载包含所有行的单个 CSV,但没有成功 - 这在我之前运行了超过 2 小时杀了它。
是否还有其他方法可以加快速度,因为仅加载 1200 万条记录似乎时间过长?
I'm trying to bulk load around 12m records into a InnoDB table in a (local) mysql using LOAD DATA INFILE (from CSV) and finding it's taking a very long time to complete.
The primary key type is UUID and the keys are unsorted in the data files.
I've split the data file into files containing 100000 records and import it as:
mysql -e 'ALTER TABLE customer DISABLE KEYS;'
for file in *.csv
mysql -e "SET sql_log_bin=0;SET FOREIGN_KEY_CHECKS=0; SET UNIQUE_CHECKS=0;
SET AUTOCOMMIT=0;LOAD DATA INFILE '${file}' INTO TABLE table
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'; COMMIT"
This works fine for the first few hundred thousand records but then the insert time for each subsequent load seems to keep growing (from around 7 seconds to around 2 minutes per load before I killed it.)
I'm running on a machine with 8GB RAM and have set the InnoDB parameters to:
innodb_buffer_pool_size =1024M
innodb_additional_mem_pool_size =512M
innodb_log_file_size = 256M
innodb_log_buffer_size = 256M
I've also tried loading a single CSV containing all rows with no luck - this ran in excess of 2 hours before I killed it.
Is there anything else that could speed this up as this seems like an excessive time to only load 12m records?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
如果您知道数据是“干净的”,那么您可以在导入之前删除受影响表上的索引,然后在导入完成后重新添加它们。
否则,每条记录都会导致索引重新计算,如果您有一堆索引,这可能会真正减慢速度。
If you know the data is "clean", then you can drop indexes on the affected tables prior to the import and then re-add them after it is complete.
Otherwise, each record causes an index-recalc, and if you have a bunch of indexes, this can REALLY slow things down.
总是很难说出性能问题的原因是什么,但这些是我的 2 美分:
您的密钥是一个 uuid ,它是随机分布的,这使得维护索引变得困难。原因是密钥按范围存储在文件系统块中,因此随机 uuid 彼此跟随会使操作系统在不利用缓存的情况下读取和写入文件系统块。我不知道您是否可以更改密钥,但您可以对输入文件中的 uuid 进行排序,看看是否有帮助。
仅供参考,为了更好地理解这个问题,我会看看这篇博客文章< /a> 也许可以阅读这本书 mysql高性能 它有一个关于innodb聚集索引的很好的章节。
祝你好运!
Its always hard to tell what is the cause of performance issues but these are my 2 cents:
Your key being a uuid is randomly distributed which makes it hard to maintain an index. The reason being that keys are stored by range in a file system block, so having random uuids follow each other makes the OS read and write blocks to the file system without leveraging the cache. I don't know if you can change the key, but you could maybe sort the uuids in the input file and see if that helps.
FYI, to understand this issue better I would take a look at this blog post and maybe read this book mysql high performance it has a nice chapter about innodb clustered index.
Good Luck!