Speeding up slow data insertion in MySQL

Posted on 2024-08-20 20:58:59

Background:

We have large flat files totaling around 60GB that we are inserting into a database. Insert performance degrades progressively as the import proceeds.

  • We have 174 million records and expect another 50 million to be inserted
  • We have split the main table into 1000+ tables based on the first two characters of the entity name
    e.g. entity_aa, entity_ab ... entity_zz
  • During each insertion, three queries are run: (a) a range-based search against another table, (b) a check whether the record has already been inserted, and (c) an insert into the detail (entity_briefs) table
  • We added entity_briefs to handle frequent search queries, but found that inserts slow down gradually regardless of whether we run ALTER TABLE entity (or entity_briefs) DISABLE KEYS (or ENABLE KEYS).
  • The machine has 4 CPUs, gigabytes of disk space, and 2GB RAM. The operating system is Linux CentOS (5.4) 32-bit
  • We found that not all 4 CPUs are being utilized
  • We ran 4 import scripts at once, but overall performance is still not satisfactory

Problematic Table

CREATE TABLE `entity_briefs` (
`entity_brief_id` bigint(11) NOT NULL auto_increment,
`entity_id` bigint(11) default NULL,
`entity_table_prefix` char(2) default NULL,
`string_1` varchar(255) default NULL,
`string_2` varchar(255) default NULL,
`zip` varchar(25) default NULL,
`phone` bigint(11) default NULL,
PRIMARY KEY  (`entity_brief_id`),
KEY `idx_entity_id` (`entity_id`),
KEY `idx_entity_table_prefix` (`entity_table_prefix`),
KEY `idx_zip` (`zip`),
KEY `idx_string_1` (`string_1`),
KEY `idx_string_2` (`string_2`),
KEY `idx_phone` (`phone`)
);

mysqltuner.pl output:

 >>  MySQLTuner 1.1.1 - Major Hayden <[email protected]>
 >>  Bug reports, feature requests, and downloads at http://mysqltuner.com/
 >>  Run with '--help' for additional options and output filtering
Please enter your MySQL administrative login: xxxxx
Please enter your MySQL administrative password:xxxxx

-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.0.85-community
[OK] Operating on 32-bit architecture with less than 2GB RAM

-------- Storage Engine Statistics -------------------------------------------
[--] Status: +Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 101M (Tables: 1344)
[!!] InnoDB is enabled but isn't being used
[!!] Total fragmented tables: 1

-------- Security Recommendations  -------------------------------------------
ERROR 1142 (42000) at line 1: SELECT command denied to user 'xxxx'@'localhost' for table 'user'
[OK] All database users have passwords assigned

-------- Performance Metrics -------------------------------------------------
[--] Up for: 5d 15h 53m 55s (2M q [4.395 qps], 9K conn, TX: 1B, RX: 425M)
[--] Reads / Writes: 51% / 49%
[--] Total buffers: 34.0M global + 2.7M per thread (500 max threads)
[OK] Maximum possible memory usage: 1.3G (67% of installed RAM)
[OK] Slow queries: 0% (9/2M)
[OK] Highest usage of available connections: 1% (5/500)
[!!] Key buffer size / total MyISAM indexes: 8.0M/105.3M
[!!] Key buffer hit rate: 94.1% (72M cached / 4M reads)
[!!] Query cache is disabled
[OK] Temporary tables created on disk: 7% (101 on disk / 1K total)
[!!] Thread cache is disabled
[!!] Table cache hit rate: 0% (64 open / 277K opened)
[OK] Open file limit used: 0% (127/18K)
[OK] Table locks acquired immediately: 99% (2M immediate / 2M locks)
[!!] Connections aborted: 38%

-------- Recommendations -----------------------------------------------------
General recommendations:
    Add skip-innodb to MySQL configuration to disable InnoDB
    Run OPTIMIZE TABLE to defragment tables for better performance
    Enable the slow query log to troubleshoot bad queries
    Set thread_cache_size to 4 as a starting value
    Increase table_cache gradually to avoid file descriptor limits
    Your applications are not closing MySQL connections properly
Variables to adjust:
    key_buffer_size (> 105.3M)
    query_cache_size (>= 8M)
    thread_cache_size (start at 4)
    table_cache (> 64)

Requirement:
What optimization strategies can be used to speed up the insertion?


Comments (1)

说好的呢 2024-08-27 20:58:59

A few general suggestions, as I don't have a silver bullet for you:

I don't think you can expect inserts not to slow down at all as the table sizes grow. Database insert times generally scale with database size; the trick is to make the overall performance acceptable given this expectation.

If things are slowing down and the CPU isn't pegged, then you are probably I/O bound on database access. If you find that this is the case, you may want to try faster drives, RAID 0, faster drive controllers, etc. You may even want to consider building the database on a solid-state drive and then copying it to a traditional hard drive after creation. SSDs should be much faster for the random-access behaviour you can expect from MySQL on a filesystem, though I understand you will 'wear them out' over time. Still, you can get a terabyte of solid-state storage for under $10k.
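On Linux, one rough way to test the I/O-bound hypothesis is to sample the aggregate cpu line of /proc/stat twice during the import and look at the share of time spent in iowait. The following is only a sketch: the function names are hypothetical, and it assumes the usual field order of that line (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_share(before, after):
    """Fraction of elapsed CPU time spent waiting on I/O between two samples."""
    # Only the first eight counters are summed; later fields (guest time)
    # are already included in 'user' on newer kernels.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return deltas[4] / total  # index 4 is iowait


def sample_iowait(seconds=10, path="/proc/stat"):
    """Take two samples `seconds` apart and return the iowait share."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return iowait_share(before, after)
```

A high iowait share while the CPUs sit mostly idle would be consistent with the disks being the bottleneck; a low one suggests looking elsewhere, for example at the per-row queries themselves.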

Also take a good look at optimizing your insert procedure. Disabling indexes during the inserts, as you mention, won't stop the gradual slowdown, but it should speed up the overall procedure significantly. I take from your description that you have some sort of insert script logic that does selects and inserts, not a simple LOAD of a flat file. You are doing three different queries per insert, possibly round-tripping the data multiple times between your client and the database. Look especially at that ranged select and make sure that this query alone doesn't scale badly with table size.
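One way to cut the per-row round trips is to send rows in multi-row INSERT statements and to wrap the whole load in ALTER TABLE ... DISABLE KEYS / ENABLE KEYS, which on MyISAM defers maintenance of the non-unique indexes. The following is a minimal Python sketch under stated assumptions, not the asker's actual script: `cursor` is any DB-API cursor that uses `%s` placeholders (as MySQLdb and PyMySQL do), the helper names `batched`, `build_insert` and `bulk_load` are hypothetical, and the column list is taken from the entity_briefs definition in the question. It leaves out the range lookup and the duplicate check.

```python
from itertools import islice

# Columns of entity_briefs other than the auto_increment primary key.
COLUMNS = ("entity_id", "entity_table_prefix", "string_1",
           "string_2", "zip", "phone")


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_insert(chunk):
    """Build one multi-row INSERT and its flat parameter list."""
    row_placeholder = "(" + ", ".join(["%s"] * len(COLUMNS)) + ")"
    sql = ("INSERT INTO entity_briefs (" + ", ".join(COLUMNS) + ") VALUES "
           + ", ".join([row_placeholder] * len(chunk)))
    params = [value for row in chunk for value in row]
    return sql, params


def bulk_load(cursor, rows, batch_size=1000):
    """Insert rows in batches with non-unique index updates deferred."""
    total = 0
    cursor.execute("ALTER TABLE entity_briefs DISABLE KEYS")
    try:
        for chunk in batched(rows, batch_size):
            sql, params = build_insert(chunk)
            cursor.execute(sql, params)
            total += len(chunk)
    finally:
        # Rebuild the indexes even if a batch fails part-way through.
        cursor.execute("ALTER TABLE entity_briefs ENABLE KEYS")
    return total
```

The batch size is a guess; each statement has to stay below the server's max_allowed_packet, so it may need to be lowered for wide rows.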

Another possibility may be throwing a lot more RAM at the problem and using it as a disk cache. If that "other table" that you are running those range selects on isn't being modified during your insertfest, perhaps you can get that in memory to cut down on drive seeking, if you determine that seek time is indeed the performance bound here.
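If the lookup table is small enough, the same idea can be applied on the client side: load the ranges once and answer each range search from memory instead of querying the database per row. The sketch below is hypothetical throughout, since the question does not show that table; it assumes the ranges are inclusive, non-overlapping `(start, end, value)` tuples and that the table does not change during the import.

```python
from bisect import bisect_right


class RangeCache:
    """In-memory lookup over inclusive, non-overlapping ranges."""

    def __init__(self, ranges):
        # Sort once by range start so lookups can use binary search.
        ordered = sorted(ranges, key=lambda r: r[0])
        self._starts = [r[0] for r in ordered]
        self._ends = [r[1] for r in ordered]
        self._values = [r[2] for r in ordered]

    def lookup(self, key):
        """Return the value of the range containing `key`, or None."""
        # Rightmost range whose start is <= key.
        i = bisect_right(self._starts, key) - 1
        if i >= 0 and key <= self._ends[i]:
            return self._values[i]
        return None
```

Each lookup is a binary search over a sorted list, so it costs no disk seek and no round trip to the server.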
