即使对于非并发插入/更新,大型 MyISAM 表也会很慢
我有一个包含约 50'000'000 条记录的 MyISAM 表(网络爬虫的任务):
CREATE TABLE `tasks2` (
`id` int(11) NOT NULL auto_increment,
`url` varchar(760) character set latin1 NOT NULL,
`state` varchar(10) collate utf8_bin default NULL,
`links_depth` int(11) NOT NULL,
`sites_depth` int(11) NOT NULL,
`error_text` text character set latin1,
`parent` int(11) default NULL,
`seed` int(11) NOT NULL,
`random` int(11) NOT NULL default '0',
PRIMARY KEY (`id`),
UNIQUE KEY `URL_UNIQUE` (`url`),
KEY `next_random_task` (`state`,`random`)
) ENGINE=MyISAM AUTO_INCREMENT=61211954 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
每隔几秒就会发生一次以下操作之一(但不会同时发生):
- INSERT ... VALUES(500 行)- 插入新任务
- UPDATE ... WHERE id IN(最多 10 个 id) - 更新批量任务的状态
- SELECT ... WHERE(通过 next_random_task 索引) - 加载批量任务进行处理
我的问题是插入和更新非常慢 - 在大约几十秒,有时甚至超过一分钟。不过,选择速度很快。为什么会发生这种情况以及如何提高性能?
I have a MyISAM table with ~50'000'000 records (tasks for web crawler):
CREATE TABLE `tasks2` (
`id` int(11) NOT NULL auto_increment,
`url` varchar(760) character set latin1 NOT NULL,
`state` varchar(10) collate utf8_bin default NULL,
`links_depth` int(11) NOT NULL,
`sites_depth` int(11) NOT NULL,
`error_text` text character set latin1,
`parent` int(11) default NULL,
`seed` int(11) NOT NULL,
`random` int(11) NOT NULL default '0',
PRIMARY KEY (`id`),
UNIQUE KEY `URL_UNIQUE` (`url`),
KEY `next_random_task` (`state`,`random`)
) ENGINE=MyISAM AUTO_INCREMENT=61211954 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
Once every few seconds one of the following operations occur (but never simultaneously):
- INSERT ... VALUES (500 rows) - inserts new tasks
- UPDATE ... WHERE id IN (up to 10 ids) - updates state for batch of tasks
- SELECT ... WHERE (by next_random_task index) - loads batch of tasks for processing
My problem is that inserts and updates are very slow - running on the order of tens of seconds, sometimes over a minute. Selects are fast, though. Why could this happen and how to improve performance?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
在普通硬件上~50M 是一个不错的数字。
请仔细阅读关于 sf 的这个问题(尽管它是为 InoDB 编写,MyISAM 有类似的参数)
的周期,
~50M on a regular hardware is a decent number.
Please go through this question on sf (even though it is written for InoDB, there are similar parameters for MyISAM)
After that you should start the cycle of
EXPLAIN
针对完整表的UPDATE
示例,以确保使用主键索引。考虑将
state
更改为TINYINT
或ENUM
以使其索引更小。 (ENUM
实际上可能不会这样做)。您需要
url
上的唯一密钥吗?这会减慢插入速度。EXPLAIN
a sampleUPDATE
against the full table to ensure the primary key index is being used.Consider changing
state
to aTINYINT
orENUM
to make its index smaller. (ENUM
might not actually do this).Do you need the unique key on
url
? This will slow down inserts.