MyISAM 密钥长度限制,试图通过字段的 md5 来加快速度
我在小型 MyISAM 表上工作 - 大约 30k 条目,大小 - 10mb。其中一个字段是 varchar(500+),因为我使用 utf8_unicode_ci 我无法索引该字段(我达到 1000 字节限制),同时我需要基于该字段执行大量“get_or_create”查询。我正在尝试优化数据库,但速度仍然很慢。
创建附加字段(该字段将保存 varchar 值的 md5 并对其进行索引/用于查找)是一个很好的解决方案吗?有人尝试过这种方法吗?
I work on small MyISAM table - around 30k entries, size - 10mb. One of the fields is varchar(500+), because I use utf8_unicode_ci I can't index this field(I hit 1000 bytes limit) and at the same time I need to perform lots of "get_or_create" queries based on this field. I am trying to optimize database but things are still to slow.
Is it a good solution to create additional field, which will hold md5 of varchar's value and index it/use for lookup? Has anyone tried this approach?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
对我来说,使用如此宽的列作为键似乎是一个坏主意,但除此之外,你绝对可以做你建议的事情。你甚至不需要使用MD5,你所需要的只是一个产生很少冲突的哈希函数,但唯一性不是必需的。
CRC32
产生的值很小,而且速度非常快。假设您的表如下所示:
将其更改为:
当您插入行时,您
会执行以下操作: 然后您可以像这样检索行:
查询将使用
text_hash
上的索引,但由于CRC32
不会生成唯一值,您仍然需要检查lots_of_text
字段是否相等 - 但查询仍然很快,因为最多有几行具有相同的哈希值。其变体是使用前 50 个字符左右作为散列,具有相同前 50 个字符的行数很可能很少。
To me it seems like a bad idea to use such a wide column as a key, but that aside you can definitely do something like what you suggest. You don't even need to use MD5, all you need is a hash function that produces few collisions, but uniqueness is not necessary.
CRC32
produces a small value, and is very fast.Say your table looks like this:
change it to this:
and when you insert rows you do:
and then you can retrieve rows like this:
the query will use the index on
text_hash
, but sinceCRC32
will not produce unique values you still need to check thelots_of_text
field for equality -- but the query will still be quick since at most a few rows will have the same hash.A variant of this is to use the first 50 characters or so as a hash, the number of rows having the same first 50 characters is very likely to be low.