MyISAM 密钥长度限制,试图通过字段的 md5 来加快速度

发布于 2024-10-08 04:18:56 字数 239 浏览 2 评论 0原文

我在小型 MyISAM 表上工作 - 大约 30k 条目,大小 - 10mb。其中一个字段是 varchar(500+),因为我使用 utf8_unicode_ci 我无法索引该字段(我达到 1000 字节限制),同时我需要基于该字段执行大量“get_or_create”查询。我正在尝试优化数据库,但速度仍然很慢。

创建附加字段(该字段将保存 varchar 值的 md5 并对其进行索引/用于查找)是一个很好的解决方案吗?有人尝试过这种方法吗?

I work on small MyISAM table - around 30k entries, size - 10mb. One of the fields is varchar(500+), because I use utf8_unicode_ci I can't index this field(I hit 1000 bytes limit) and at the same time I need to perform lots of "get_or_create" queries based on this field. I am trying to optimize database but things are still to slow.

Is it a good solution to create additional field, which will hold md5 of varchar's value and index it/use for lookup? Has anyone tried this approach?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

你的心境我的脸 2024-10-15 04:18:56

对我来说,使用如此宽的列作为键似乎是一个坏主意,但除此之外,你绝对可以做你建议的事情。你甚至不需要使用MD5,你所需要的只是一个产生很少冲突的哈希函数,但唯一性不是必需的。 CRC32 产生的值很小,而且速度非常快。

假设您的表如下所示:

CREATE TABLE data (lots_of_text VARCHAR(500));

将其更改为:

CREATE TABLE data (text_hash INT, lots_of_text VARCHAR(500), INDEX (text_hash));

当您插入行时,您

INSERT INTO data (lots_of_text, text_hash) 
VALUES ("lots and lots of text", CRC32("lots and lots of text"));

会执行以下操作: 然后您可以像这样检索行:

SELECT lots_of_text FROM data
WHERE text_hash = CRC32("lots and lots of text")
AND lots_of_text = "lots and lots of text";

查询将使用 text_hash 上的索引,但由于 CRC32 不会生成唯一值,您仍然需要检查 lots_of_text 字段是否相等 - 但查询仍然很快,因为最多有几行具有相同的哈希值。

其变体是使用前 50 个字符左右作为散列,具有相同前 50 个字符的行数很可能很少。

To me it seems like a bad idea to use such a wide column as a key, but that aside you can definitely do something like what you suggest. You don't even need to use MD5, all you need is a hash function that produces few collisions, but uniqueness is not necessary. CRC32 produces a small value, and is very fast.

Say your table looks like this:

CREATE TABLE data (lots_of_text VARCHAR(500));

change it to this:

CREATE TABLE data (text_hash INT, lots_of_text VARCHAR(500), INDEX (text_hash));

and when you insert rows you do:

INSERT INTO data (lots_of_text, text_hash) 
VALUES ("lots and lots of text", CRC32("lots and lots of text"));

and then you can retrieve rows like this:

SELECT lots_of_text FROM data
WHERE text_hash = CRC32("lots and lots of text")
AND lots_of_text = "lots and lots of text";

the query will use the index on text_hash, but since CRC32 will not produce unique values you still need to check the lots_of_text field for equality -- but the query will still be quick since at most a few rows will have the same hash.

A variant of this is to use the first 50 characters or so as a hash, the number of rows having the same first 50 characters is very likely to be low.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文