Sequential UID set generation for MySQL Char() or other field

Posted 2024-10-02 13:37:05

Tried Googling but:

Question:
What is the best way to externally generate sequential UID values for a MySQL field that must be representable as a string?

Reason:
I want generic, sequential, UUID-like values so that inserts happen in on-disk order (appending to the last page), which helps write performance, plus a date prefix for read speed when searching the field's index from char[0] forward. The column will be indexed; I am looking for the value format that gives the best index-read and table-write performance, rather than a plain old UUID.

My initial thought is to take the date at some granularity (possibly a zero-padded epoch) and prepend it to, or have it replace part of, a UUIDv4-generated string, i.e. [Unix epoch][remaining UUID4], stored in a fixed-width CHAR field. However, I am unsure whether this would give the desired in-page/on-disk ordering and index-search behavior. An example would be:

12904645950049bceba1cc24e80806dd
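Here is a minimal PHP sketch of the layout proposed above. It assumes a 10-digit, zero-padded Unix epoch followed by 22 random hex characters, for 32 characters total like the example value. The helper name `epochPrefixedId` is hypothetical. The random tail comes from `random_bytes()` rather than from a real UUIDv4, so it carries no version or variant bits.

```php
<?php
// Hypothetical helper: [10-digit Unix epoch][22 random hex chars] = 32 chars,
// the same shape as the example value above. The tail is random_bytes() output,
// not a real UUIDv4 (no version/variant bits).
function epochPrefixedId(?int $epoch = null): string
{
    $epoch = $epoch ?? time();
    // Zero-pad to a fixed width so string order matches time order.
    $prefix = str_pad((string) $epoch, 10, '0', STR_PAD_LEFT);
    // 11 random bytes -> 22 lowercase hex characters.
    $suffix = bin2hex(random_bytes(11));
    return $prefix . $suffix;
}

echo epochPrefixedId(), PHP_EOL;
```

Values created within the same second share a prefix but are ordered randomly among themselves. With one-second granularity, inserts are therefore only approximately sequential.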

The values must be independent of MySQL itself, hence using UUIDs and timestamps rather than some variation of auto-incrementing.

Does anyone who knows the internals of MySQL indexes have any suggestions (for InnoDB tables)?

Aiden

Comments (4)

柳若烟 2024-10-09 13:37:05

This might be a bit off-topic, but have a look at Twitter's Snowflake. They say it is:

  • (Roughly) Time Ordered (helps a lot to avoid expensive random primary key BTREE updates)
  • Directly Sortable
  • Compact

Not to mention other features (HA, etc.). You can either nick their algorithm or just use it as it stands.

The whole UID uses at most 64 bits of space, so I would guess it is quite efficient to index; see http://www.mysqlperformanceblog.com/2006/10/03/long-primary-key-for-innodb-tables/ (a counterexample).
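Below is a rough PHP sketch of a Snowflake-style generator. It assumes the commonly described bit layout: 41 bits of milliseconds since a custom epoch, 10 bits of worker ID, and 12 bits of per-millisecond sequence. The class name `SnowflakeSketch` is hypothetical and this is not Twitter's implementation. It needs 64-bit PHP 7.4+, runs in a single process, and does not coordinate worker IDs between machines.

```php
<?php
// Sketch only: Snowflake-*style* IDs, assuming the layout
// 41-bit ms timestamp | 10-bit worker ID | 12-bit sequence (fits a signed 64-bit int).
final class SnowflakeSketch
{
    // Any fixed instant in the past works; this is the epoch Twitter's Snowflake used.
    private const EPOCH_MS = 1288834974657;

    private int $workerId;
    private int $lastMs = -1;
    private int $sequence = 0;

    public function __construct(int $workerId)
    {
        if ($workerId < 0 || $workerId > 0x3FF) {
            throw new InvalidArgumentException('workerId must fit in 10 bits');
        }
        $this->workerId = $workerId;
    }

    public function next(): int
    {
        // Never go backwards within this process, even if the wall clock steps back.
        $ms = max(self::nowMs(), $this->lastMs);
        if ($ms === $this->lastMs) {
            // Same millisecond: bump the 12-bit sequence; if it wraps, wait for the next ms.
            $this->sequence = ($this->sequence + 1) & 0xFFF;
            if ($this->sequence === 0) {
                while ($ms <= $this->lastMs) {
                    $ms = self::nowMs();
                }
            }
        } else {
            $this->sequence = 0;
        }
        $this->lastMs = $ms;
        return (($ms - self::EPOCH_MS) << 22) | ($this->workerId << 12) | $this->sequence;
    }

    // Zero-padded to 19 digits (the width of PHP_INT_MAX) so string order matches numeric order.
    public static function toFixedWidthString(int $id): string
    {
        return str_pad((string) $id, 19, '0', STR_PAD_LEFT);
    }

    private static function nowMs(): int
    {
        return (int) floor(microtime(true) * 1000);
    }
}

$gen = new SnowflakeSketch(1);
$id = $gen->next();
echo $id, ' ', SnowflakeSketch::toFixedWidthString($id), PHP_EOL;
```

Because the question requires a string, the fixed-width form could go in a CHAR(19) column. Storing the integer in a BIGINT is the more compact option.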

南汐寒笙箫 2024-10-09 13:37:05

I think you may need to be more specific about what you are trying to solve (what is the actual problem, why not auto_increment, what is your proposed schema, etc.).
To answer your internals question:

  • InnoDB stores data in an index (the clustered index), in 16K pages.

The risks of not inserting sequentially are at least twofold:

  1. If your data does not fit in memory, you may need to do random I/O to load a page from disk before you can insert the value into it.

  2. There might not be space remaining in the page (InnoDB fills 93% and leaves a small gap for updates), which could result in the page needing to be split. More split pages = fragmentation / less optimal use of things such as memory.

So I think that as long as your inserts are approximately sequential, (1) at least isn't a concern for the primary key index (it could still be one for any unique secondary indexes). You just need to worry about (2).
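To make point (2) concrete, here is a toy PHP model. It uses a hypothetical `averagePageFill` function and is not InnoDB's real split logic. Pages hold at most a fixed number of keys. A full page is split 50/50 unless the new key is the largest so far, in which case a new page is started. The model ignores the free space InnoDB deliberately leaves in each page and any heuristics InnoDB uses to detect sequential inserts.

```php
<?php
// Toy model of page splits, NOT InnoDB's actual algorithm.
// Leaf "pages" hold at most $capacity keys. An overflowing page is split 50/50,
// except when the new key is the largest seen so far, where a new page is started.
function averagePageFill(array $keys, int $capacity): float
{
    $pages = [[]]; // each page is a sorted list of keys; pages are kept in key order
    foreach ($keys as $key) {
        // Pick the last page whose first key is <= $key (page 0 if none).
        $p = 0;
        for ($i = count($pages) - 1; $i > 0; $i--) {
            if ($pages[$i][0] <= $key) {
                $p = $i;
                break;
            }
        }
        $page = $pages[$p];
        $page[] = $key;
        sort($page);
        if (count($page) <= $capacity) {
            $pages[$p] = $page;
        } elseif ($p === count($pages) - 1 && $page[count($page) - 1] === $key) {
            // Append case: leave the full page alone and open a new rightmost page.
            array_pop($page);
            $pages[$p] = $page;
            $pages[] = [$key];
        } else {
            // Insert into the middle of a full page: split it into two half-full pages.
            $half = intdiv(count($page), 2);
            array_splice($pages, $p, 1, [array_slice($page, 0, $half), array_slice($page, $half)]);
        }
    }
    return count($keys) / (count($pages) * $capacity);
}

$sequential = range(1, 10000);
$random = $sequential;
shuffle($random);

printf("sequential fill: %.2f\n", averagePageFill($sequential, 64));
printf("random fill:     %.2f\n", averagePageFill($random, 64));
```

In this model, sequential keys leave almost every page full. Shuffled keys leave many pages partly empty after splits, so the same rows occupy more pages and more memory.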


The reason I said understanding the problem is important is that there are many ways to do this besides long GUIDs. For one, a BIGINT in MySQL (8 bytes) is smaller than any data type you will probably be using, yet has a range of about 18 quintillion. You could allocate "chunks" of key space, N thousand at a time, to worker nodes and guarantee no duplicates. If a worker node crashes without using all of the chunk it was allocated, so what? It doesn't matter.
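A minimal sketch of handing out key-space chunks follows. It assumes a single in-memory allocator, and `ChunkAllocator` and `Worker` are hypothetical names. In a real deployment, the allocator would have to be a shared counter advanced atomically, so that two workers can never receive the same range.

```php
<?php
// Hypothetical sketch: one allocator hands out disjoint ranges of N keys;
// each worker issues IDs from its own range without coordinating per ID.
final class ChunkAllocator
{
    private int $chunkSize;
    private int $nextStart = 1;

    public function __construct(int $chunkSize)
    {
        $this->chunkSize = $chunkSize;
    }

    /** Reserve the next [start, end] range; ranges never overlap. */
    public function allocate(): array
    {
        $start = $this->nextStart;
        $this->nextStart += $this->chunkSize;
        return [$start, $start + $this->chunkSize - 1];
    }
}

final class Worker
{
    private ChunkAllocator $allocator;
    private int $next = 1;
    private int $end = 0; // empty range: the first call fetches a chunk

    public function __construct(ChunkAllocator $allocator)
    {
        $this->allocator = $allocator;
    }

    public function nextId(): int
    {
        if ($this->next > $this->end) {
            [$this->next, $this->end] = $this->allocator->allocate();
        }
        return $this->next++;
    }
}

$allocator = new ChunkAllocator(1000);
$a = new Worker($allocator);
$b = new Worker($allocator);
echo $a->nextId(), ' ', $b->nextId(), ' ', $a->nextId(), PHP_EOL;
```

Within one worker the IDs increase. Across workers they interleave by chunk, so inserts stay approximately sequential rather than strictly ordered. A crashed worker simply leaves a gap.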

灰色世界里的红玫瑰 2024-10-09 13:37:05

Check out this question. It may not detail exactly how MySQL uses indexes, but it does give you some performance data and the code to generate sequential UIDs.

It seems MySQL indexing benefits greatly from sequential IDs, and according to the MySQL documentation, indexing relies on disk ordering (see the section "B-Tree Index Characteristics") to find the relevant results.

From memory, MySQL indexing (for string indexes at least) relies first on the alphanumeric ordering of the field, i.e. "Oh, it begins with an A? I have data that begins with an A, I'll fetch it for you...", rather than doing a full-text scan of every value.

Entering the UIDs sequentially means the index does not have to reorder the results "alphabetically" first, or at least spends far less time doing so, hence the performance benefits mentioned above.
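The following small PHP illustration is not MySQL code. It shows why the "alphabetical" ordering matters for time-prefixed IDs. When strings are compared from the first character onward (roughly what a binary collation does), numeric prefixes sort chronologically only if they are fixed-width. The function name `sortedAsStrings` is hypothetical.

```php
<?php
// Illustration only: compare values byte by byte from the first character.
// PHP's plain sort() would compare numeric strings as numbers, so strcmp is passed explicitly.
function sortedAsStrings(array $values): array
{
    usort($values, 'strcmp');
    return $values;
}

$unpadded = ['999', '1000', '1001'];
$padded = array_map(fn (string $v): string => str_pad($v, 10, '0', STR_PAD_LEFT), $unpadded);

print_r(sortedAsStrings($unpadded)); // 1000, 1001, 999: not in numeric order
print_r(sortedAsStrings($padded));   // 0000000999, 0000001000, 0000001001
```

This is one reason the zero-padded epoch in the question matters. Without a fixed width, a later timestamp with more digits can sort before an earlier one.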

(Not really a solution, but an answer at least.)

初心 2024-10-09 13:37:05

What I do is use a fixed-width character field and append a unique ID string (from PHP's uniqid()) to the current time, in seconds plus microseconds. This is nice because even if your server is hit twice within the same microsecond, the ID will still (most likely) be unique. Under a massive server load this could produce duplicate IDs, but if that is a concern you could check whether a row with this ID has already been created.

PHP:

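$date = new DateTime(); // current time; includes microseconds on PHP 7.1+
// Prefix: 'U' = Unix seconds (10 digits), 'u' = microseconds (6 digits).
// The two lines below are alternatives; keep whichever length you need.
$UUID = uniqid($date->format('Uu'), false); // shorter: prefix + 13 hex chars
$UUID = uniqid($date->format('Uu'), true);  // longer: prefix + 23 chars (extra entropy)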

This is what I use on my (rarely used) server, but it should hold up under bigger loads. As I said, to cover the slight chance that two identical keys are created, check whether the key has already been used and, if so, assign a new one (this shouldn't happen often).
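Here is a minimal sketch of the check-and-regenerate step described above. `generateUniqueId` and its `$isTaken` callback are hypothetical. Against MySQL, the callback might run a `SELECT` on the ID column. Here an in-memory array stands in for the table.

```php
<?php
// Sketch of the check-then-regenerate step. $isTaken is a hypothetical callback
// that reports whether an ID is already in use.
function generateUniqueId(callable $isTaken, int $maxAttempts = 5): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        // Same format as the code above: seconds + microseconds prefix, then uniqid().
        $id = uniqid((new DateTime())->format('Uu'), true);
        if (!$isTaken($id)) {
            return $id;
        }
    }
    throw new RuntimeException("No unused ID found after $maxAttempts attempts");
}

// Usage with an in-memory set standing in for the table.
$used = [];
$id = generateUniqueId(function (string $candidate) use (&$used): bool {
    return isset($used[$candidate]);
});
$used[$id] = true;
echo $id, PHP_EOL;
```

A check followed by a separate INSERT can still race with a concurrent request. A UNIQUE index on the column is what ultimately rejects a duplicate, so catching the duplicate-key error and retrying is a common alternative to checking first.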
