Slow INSERTs into an InnoDB table with random PRIMARY KEY column values
For my website I use the PHP API for Flickr (http://www.flickr.com/services/api/). This API provides several useful methods for getting photos around a particular GPS position.
A call to an API method looks like a URL with specific parameters such as latitude, longitude, API key, radius, sorting, etc. For example: http://api.flickr.com/method?lat=0.0&lon=0.0&radius=10
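Such a call can be assembled from a dictionary of parameters. Here is a minimal Python sketch that sorts the parameters by name so the same logical request always produces the same URL string. The endpoint is the illustrative one from the example above, and `build_request_url` is a hypothetical helper, not part of the Flickr API:

```python
from urllib.parse import urlencode

# Illustrative endpoint copied from the example URL; not a real Flickr method.
BASE_URL = "http://api.flickr.com/method"


def build_request_url(params: dict) -> str:
    """Build an API call URL, sorting parameters so equal requests give equal URLs."""
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}?{query}"


url = build_request_url({"radius": 10, "lon": 0.0, "lat": 0.0})
print(url)  # http://api.flickr.com/method?lat=0.0&lon=0.0&radius=10
```

Sorting matters for caching. Without it, `lat=..&lon=..` and `lon=..&lat=..` would be stored as two different entries.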
My website makes over 200,000 calls to the API to generate several pages with pictures from Flickr. That puts a heavy load on the API, so I created a results cache in a MySQL database.
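A cache like this usually works in read-through fashion. The code looks up the request first and calls the API only on a miss or when the entry has expired. Below is a minimal Python sketch that uses an in-memory dict as a stand-in for the MySQL table. `cached_call`, `fake_fetch` and the 2-day TTL are illustrative assumptions, not the site's actual code:

```python
import time
from typing import Callable, Dict, Optional, Tuple

# In-memory stand-in for the MySQL cache table: key -> (expires_at, response).
CacheStore = Dict[str, Tuple[float, str]]

# Assumption: a 2-day lifetime, matching the "2-3 days" expiry in the schema.
TTL_SECONDS = 2 * 24 * 3600


def cached_call(store: CacheStore, key: str, fetch: Callable[[], str],
                now: Optional[float] = None) -> str:
    """Return the cached response if it has not expired; otherwise fetch and store it."""
    now = time.time() if now is None else now
    hit = store.get(key)
    if hit is not None and hit[0] > now:
        return hit[1]  # cache hit: the API is not called
    response = fetch()  # miss or expired entry: call the API once
    store[key] = (now + TTL_SECONDS, response)
    return response


# Usage with a fake fetch function that counts how often the "API" is hit.
calls = []


def fake_fetch() -> str:
    calls.append(1)
    return "serialized-response"


store: CacheStore = {}
cached_call(store, "k", fake_fetch, now=0.0)
cached_call(store, "k", fake_fetch, now=10.0)               # served from the cache
cached_call(store, "k", fake_fetch, now=TTL_SECONDS + 1.0)  # expired: fetched again
print(len(calls))  # 2
```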
A simplified schema of the InnoDB cache table:
char(32) request
datetime expires  // 2-3 days
text response     // serialized data from the API response
where request is the PRIMARY KEY and holds the MD5 hash of the request URI. The other fields are quite simple :)
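The key itself is the 32-character hex MD5 digest of the URI, which is why it fits exactly into CHAR(32). Here is a minimal Python sketch (`cache_key` is a hypothetical helper name) showing that two nearly identical URIs produce unrelated keys:

```python
import hashlib


def cache_key(request_uri: str) -> str:
    """MD5 hex digest of the request URI: always 32 hex characters, so it fits CHAR(32)."""
    return hashlib.md5(request_uri.encode("utf-8")).hexdigest()


a = cache_key("http://api.flickr.com/method?lat=0.0&lon=0.0&radius=10")
b = cache_key("http://api.flickr.com/method?lat=0.0&lon=0.0&radius=11")
# Nearly identical URIs give unrelated keys, so when this value is the
# PRIMARY KEY, consecutive inserts land at unrelated positions in the index.
print(a)
print(b)
```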
The problem arises when the table becomes large enough, say over 100,000 rows. New INSERTs take up to 2 seconds (and up to 6 (!) seconds with 1,000,000 rows).
As far as I understand, the problem is the PRIMARY KEY combined with the InnoDB engine. InnoDB stores rows in primary-key order. Because MD5(request) is an essentially random value, every new row has to go into a random position in the index tree, so the engine keeps splitting pages and moving data around.
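You can illustrate the effect of random keys on a clustered index without a database. The Python sketch below uses a sorted list as a rough stand-in for primary-key order and records where each new key lands. Sequential (auto-increment style) keys always append at the end. MD5 keys land anywhere in the existing range, and in InnoDB that means writes scattered across many leaf pages. This is a simplified model, not how InnoDB is implemented, and `fraction_appended` is a hypothetical helper:

```python
import bisect
import hashlib


def fraction_appended(keys) -> float:
    """Insert keys one by one into a sorted list and return the
    fraction that landed at the very end (pure appends)."""
    ordered = []
    appends = 0
    for k in keys:
        pos = bisect.bisect_left(ordered, k)
        if pos == len(ordered):
            appends += 1
        ordered.insert(pos, k)
    return appends / len(keys)


n = 2000
sequential = list(range(1, n + 1))  # like AUTO_INCREMENT ids
random_md5 = [hashlib.md5(str(i).encode()).hexdigest() for i in range(n)]

print(fraction_appended(sequential))  # 1.0
print(fraction_appended(random_md5))  # close to 0: most inserts land mid-list
```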
So... the question is whether there is a better way to cache such requests. Or maybe I should switch to the MyISAM engine? Or maybe I should try pseudo-partitioning and split the data across several tables? Or maybe just use a HASH index instead of a BTREE?
Any ideas are welcome!
Edit:
OK, I tried altering the table as Furicane and Johan suggested, but still no luck: INSERTs take up to 3 seconds. The request field is now a normal non-unique index, and a new id column has been added as an auto-increment PRIMARY KEY. I also tried adding 4 partitions to this table, with the same results.
I think the index on the request field is still the bottleneck. The only option I currently see is to determine all possible parameters, add them as columns to the table and then create an index on them.
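Splitting the request into columns, as proposed above, starts with parsing its query string. Here is a minimal Python sketch. It assumes only the `lat`, `lon` and `radius` parameters from the example URL become columns, and that radius is an integer. A real Flickr request carries more parameters, and each would need its own column. `request_columns` is a hypothetical helper:

```python
from urllib.parse import parse_qsl, urlsplit


def request_columns(url: str) -> dict:
    """Map a request URL to typed column values for a per-parameter index."""
    params = dict(parse_qsl(urlsplit(url).query))
    return {
        "lat": float(params["lat"]),
        "lon": float(params["lon"]),
        "radius": int(params["radius"]),  # assumption: radius is an integer
    }


row = request_columns("http://api.flickr.com/method?lat=0.0&lon=0.0&radius=10")
print(row)  # {'lat': 0.0, 'lon': 0.0, 'radius': 10}
```

These columns could then carry a composite index, for example on (lat, lon, radius). Whether that beats a single hash column depends on the lookup pattern.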
Any other ideas? :)
Edit 2:
Salman A said in the comments below that his similar table performs much better (~0.03 for an INSERT). So the problem may be the I/O load on the system, though I can't see any high load on it.
iostat results:
avg-cpu:  %user   %nice  %system  %iowait  %steal   %idle
          22.94    0.71     8.42     8.50    0.00   59.43

Device:   tps     Blk_read/s  Blk_wrtn/s  Blk_read    Blk_wrtn
sda       38.01   151.04      114.32      1383655437  1047309046
iotop results:
Total DISK READ: 152.91 K/s | Total DISK WRITE: 197.67 K/s
MySQL is at the top of the list for both writes and reads. Maybe my disks are almost dead? How can I check disk performance?
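One rough way to check whether the disk itself is slow is to time small synchronous writes. With the default innodb_flush_log_at_trx_commit=1, each InnoDB commit waits for its redo log write to be flushed to disk. Below is a minimal Python sketch, not a substitute for a proper benchmarking tool. The file location, block size and iteration count are arbitrary assumptions, and `fsync_latency_ms` is a hypothetical helper:

```python
import os
import tempfile
import time


def fsync_latency_ms(path: str, iterations: int = 100, block: int = 4096) -> float:
    """Write `block` bytes and fsync, `iterations` times; return mean latency in ms."""
    data = b"\0" * block
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, data)
            os.fsync(fd)  # force the write through to the storage device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed / iterations * 1000.0


# The temp dir may sit on a different (or in-memory) filesystem; for a meaningful
# number, point `path` at the filesystem holding the MySQL data directory.
with tempfile.TemporaryDirectory() as d:
    print(f"{fsync_latency_ms(os.path.join(d, 'probe.bin')):.2f} ms per fsync")
```

Consistently high numbers here (tens of milliseconds per fsync or more) would point at the storage rather than at the table design.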
Answer:
InnoDB does not support HASH keys, only BTREE. MyISAM is infamous for being unreliable.
I think your problem is that you use the MD5 value for the primary key.
The primary key is included in every secondary key.
And the PK is forced to be a unique key.
Set an integer autoincrement primary key and set your MD5 value as a normal index.
It does not even need to be unique, because that's a huge part of what's slowing you down.
After this your inserts should run much faster.
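You can try out the suggested layout with Python's built-in sqlite3 module: a surrogate auto-increment key plus a plain, non-unique index on the MD5 column. SQLite is only a stand-in here. It does not reproduce InnoDB's clustered-index behaviour, and the table and index names are hypothetical. In MySQL the same idea would be an INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY plus an ordinary KEY on request:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE api_cache (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, always increasing
        request  CHAR(32) NOT NULL,                  -- MD5 of the request URI
        expires  DATETIME NOT NULL,
        response TEXT
    )
    """
)
# Plain (non-unique) secondary index used for lookups by hash.
conn.execute("CREATE INDEX idx_api_cache_request ON api_cache (request)")

key = "0cc175b9c0f1b6a831c399e269772661"  # example MD5 value
conn.execute(
    "INSERT INTO api_cache (request, expires, response) "
    "VALUES (?, datetime('now', '+2 days'), ?)",
    (key, "serialized-response"),
)

# Without a UNIQUE constraint the same hash can appear more than once,
# so take the newest unexpired row.
row = conn.execute(
    "SELECT response FROM api_cache "
    "WHERE request = ? AND expires > datetime('now') "
    "ORDER BY id DESC LIMIT 1",
    (key,),
).fetchone()
print(row)  # ('serialized-response',)
```

Dropping the uniqueness constraint shifts deduplication to the read side, which the ORDER BY id DESC LIMIT 1 handles. Expired duplicates would still need periodic cleanup.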