URL-compact representation of a GUID/UUID?

Posted 2024-09-02 18:08:04

I need to generate a GUID and save it via a string representation. The string representation should be as short as possible as it will be used as part of an already-long URL string.

Right now, instead of using the normal abcd-efgh-... representation, I use the raw bytes generated and base64-encode them instead, which results in a somewhat shorter string.

But is it possible to make it even shorter?

I'm OK with losing some degree of uniqueness and keeping a counter, but scanning all existing keys is not an option. Suggestions?
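
For reference, the baseline described in the question (Base64 over the 16 raw bytes) can be sketched as follows. This is a minimal illustration, not the asker's actual code: the method name `uuid_to_url_key` is hypothetical, and it assumes the URL-safe alphabet (`-` and `_` instead of `+` and `/`) with the `==` padding dropped, which takes the key from 24 to 22 characters.

```ruby
require 'securerandom'

# Hypothetical helper: Base64 over the raw UUID bytes, made URL-safe.
def uuid_to_url_key(uuid)
  raw = [uuid.delete('-')].pack('H*')      # 32 hex digits -> 16 raw bytes
  padded = [raw].pack('m0')                # standard Base64, 24 chars ending in "=="
  padded.tr('+/', '-_').delete('=')        # URL-safe alphabet, padding removed
end

uuid = SecureRandom.uuid
puts "#{uuid} (#{uuid.size} chars)"
puts "#{uuid_to_url_key(uuid)} (#{uuid_to_url_key(uuid).size} chars)"
```

The padding can be dropped safely because the decoded length is always 16 bytes, so it carries no information.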

Comments (7)

素罗衫 2024-09-09 18:08:05

(It has been a long time, but I ran into the same need today.)

UUIDs are 128 bits long, represented as 32 hex digits plus 4 hyphens.
If we use a dictionary of 64 (2^6) printable ASCII characters, it is just a matter of converting from 32 groups of 4 bits (one hex digit each) to 22 groups of 6 bits.

Here is a UUID shortener. Instead of 36 characters you get 22, without losing any of the original bits.

https://gist.github.com/tomlobato/e932818fa7eb989e645f2e64645cf7a5

class UUIDShortner
    IGNORE = '-'
    BASE6_SLAB = ' ' * 22 # 22 output characters, 6 bits each (i.e. base 64)

    # 64-item (6-bit) dictionary of URL-safe characters
    DICT = 'a'.upto('z').to_a +
        'A'.upto('Z').to_a +
        '0'.upto('9').to_a +
        ['_', '-']

    def self.uuid_to_base6 uuid
        # Accumulate the 32 hex digits into a single 128-bit integer
        uuid_bits = 0

        uuid.each_char do |c|
            next if c == IGNORE
            uuid_bits = (uuid_bits << 4) | c.hex
        end

        base6 = BASE6_SLAB.dup

        # Emit one character per 6 bits, least significant group first
        base6.size.times { |i|
            base6[i] = DICT[uuid_bits & 0b111111]
            uuid_bits >>= 6
        }

        base6
    end
end

# Examples:

require 'securerandom'
uuid = ARGV[0] || SecureRandom.uuid
short = UUIDShortner.uuid_to_base6 uuid
puts "#{uuid}\n#{short}"

# ruby uuid_to_base6.rb
# c7e6a9e5-1fc6-4d5a-b889-4734e42b9ecc
# m75kKtZrjIRwnz8hLNQ5hd
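
The answer only shows the encoding direction. A possible inverse is sketched below, assuming the same alphabet and the same least-significant-group-first ordering as the class above. The module name `UUIDExpander` and the method `base64_to_uuid` are illustrative and not part of the original gist.

```ruby
# Hypothetical companion to UUIDShortner: turns the 22-character form
# back into the canonical 36-character UUID string.
module UUIDExpander
  # Same 64-symbol alphabet as the encoder: a-z, A-Z, 0-9, '_', '-'
  DICT = [*'a'..'z', *'A'..'Z', *'0'..'9', '_', '-'].freeze
  INDEX = DICT.each_with_index.to_h.freeze

  def self.base64_to_uuid(short)
    raise ArgumentError, 'expected 22 characters' unless short.size == 22

    bits = 0
    short.each_char.with_index do |c, i|
      value = INDEX.fetch(c) { raise ArgumentError, "invalid character #{c.inspect}" }
      bits |= value << (6 * i)             # character i holds bits 6i..6i+5
    end
    # 22 * 6 = 132 bits, so the top 4 bits must be zero for a real UUID
    raise ArgumentError, 'value exceeds 128 bits' unless (bits >> 128).zero?

    hex = bits.to_s(16).rjust(32, '0')
    [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join('-')
  end
end

puts UUIDExpander.base64_to_uuid('m75kKtZrjIRwnz8hLNQ5hd')
```

Run against the sample pair above, this should print the original UUID again.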
无法言说的痛 2024-09-09 18:08:05

You could approach this from the other direction. Produce the shortest possible string representation and map it into a Guid.

Generate the key using a defined alphabet as below:

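In C# (a sketch):

using System.Security.Cryptography;
using System.Text;

string RandomString(char[] alphabet, int length)
{
  var result = new StringBuilder(length);
  for (int i = 0; i < length; i++)
    // The upper bound of GetInt32 is exclusive, so every index is valid.
    result.Append(alphabet[RandomNumberGenerator.GetInt32(alphabet.Length)]);

  return result.ToString();
}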

If you keep the string to at most 16 single-byte characters, you can simply hex encode the result, pad it to 32 hex digits, and pass it to the Guid constructor to parse.
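
The mapping from short key to GUID can be sketched in Ruby as follows. The method names `random_key` and `short_key_to_uuid` are hypothetical, and zero-padding on the right is an assumption; the answer does not say how to fill the unused bytes. The result is only 128 bits written in UUID syntax, so it will generally not carry valid version or variant bits.

```ruby
require 'securerandom'

# Hypothetical helper: random key drawn from a caller-supplied alphabet.
def random_key(alphabet, length)
  Array.new(length) { alphabet[SecureRandom.random_number(alphabet.size)] }.join
end

# Hypothetical helper: hex-encode an ASCII key and pad it to 32 hex digits.
def short_key_to_uuid(key)
  raise ArgumentError, 'key must be ASCII' unless key.ascii_only?
  raise ArgumentError, 'key must be at most 16 bytes' if key.bytesize > 16

  hex = key.unpack1('H*').ljust(32, '0')   # assumption: pad with zero bytes
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join('-')
end

alphabet = [*'a'..'z', *'0'..'9']
key = random_key(alphabet, 8)
puts "#{key} -> #{short_key_to_uuid(key)}"
```

Only the short key needs to appear in the URL; the GUID form can be rebuilt from it whenever the backend needs one.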

泡沫很甜 2024-09-09 18:08:05

This was not for exactly the same problem, but a very close one: I have used CRC64 and then Base64-encoded the result, which gives you 11 characters. CRC64 has been tested (not proven) to not produce duplicates on a wide range of strings.

And since it is 64 bits long by definition, you get a key that is half the size.

To directly answer the original question: you can run CRC64 over any representation of your GUIDs.

Or just run CRC64 on the business key and you will have a 64-bit value, unique in practice, that you can then Base64-encode.
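
A rough sketch of this idea in Ruby follows. Ruby's standard library has no CRC64, so the checksum is computed bit by bit here. The answer does not say which CRC64 variant was used; the CRC-64/ECMA-182 polynomial below is an assumption, and the names `crc64` and `crc64_key` are illustrative. Collisions remain possible with any 64-bit checksum, which matches the question's willingness to give up some uniqueness.

```ruby
CRC64_POLY = 0x42F0E1EBA9EA3693  # assumption: CRC-64/ECMA-182 polynomial
MASK64     = 0xFFFFFFFFFFFFFFFF

# Bit-at-a-time CRC64 (MSB first, zero initial value, no final XOR).
def crc64(data)
  crc = 0
  data.each_byte do |byte|
    crc ^= byte << 56
    8.times do
      crc = (crc & (1 << 63)).zero? ? (crc << 1) : ((crc << 1) ^ CRC64_POLY)
      crc &= MASK64
    end
  end
  crc
end

# 8 checksum bytes -> 11 URL-safe Base64 characters (padding removed).
def crc64_key(data)
  [[crc64(data)].pack('Q>')].pack('m0').tr('+/', '-_').delete('=')
end

puts crc64_key('c7e6a9e5-1fc6-4d5a-b889-4734e42b9ecc')
```

The same `crc64_key` call works on a business key instead of a GUID string, as the answer suggests.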

内心旳酸楚 2024-09-09 18:08:04

I used an Ascii85 encoding to write a Guid to a database column in 20 ASCII characters. I've posted the C# code in case it is useful. The specific character set may need to be different for use in a URL, but you can pick whichever characters suit your application. It's available here: "What is the most efficient way to encode an arbitrary GUID into readable ASCII (33-127)?"

终难遇 2024-09-09 18:08:04

Sure, just use a base larger than 64. You'll have to encode them using a custom alphabet, but you should be able to find a few more "url-safe" printable ASCII characters.

Base64 encodes 6 bits in each 8-bit character, so a 16-byte GUID value becomes 22 characters once encoded (without padding). You may be able to reduce that by a character or two, but not much more.
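
The "character or two" estimate can be checked with a few lines of Ruby. The helper name `chars_needed` is illustrative; it finds the smallest number of digits in a given base that can represent every 128-bit value, using exact integer arithmetic.

```ruby
# Smallest n such that base**n >= 2**bits, i.e. the number of characters
# needed to represent any value of the given bit width in that base.
def chars_needed(base, bits = 128)
  n = 0
  n += 1 while base**n < 2**bits
  n
end

# 66 = the RFC 3986 "unreserved" set (A-Z, a-z, 0-9, '-', '.', '_', '~')
# 94 = every printable ASCII character except space
[16, 36, 62, 64, 66, 85, 94].each do |base|
  puts "base #{base}: #{chars_needed(base)} characters"
end
```

With only the 66 unreserved URL characters the result stays at 22, and even base 85 or base 94 only reaches 20, at the cost of characters that are reserved in URLs or need percent-encoding.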

顾挽 2024-09-09 18:08:04

I found this discussion interesting: https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/

Basically, you take the 36 characters and turn them into 16 bytes of binary, but first reorder the three time fields with a stored function so that the most significant time bits come first:

set @uuid:= uuid();
select @uuid;
+--------------------------------------+
| @uuid                                |
+--------------------------------------+
| 59f3ac1e-06fe-11e6-ac3c-9b18a7fcf9ed |
+--------------------------------------+

CREATE DEFINER=`root`@`localhost`
    FUNCTION `ordered_uuid`(uuid BINARY(36))
    RETURNS binary(16) DETERMINISTIC
    RETURN UNHEX(CONCAT(SUBSTR(uuid, 15, 4),SUBSTR(uuid, 10, 4),SUBSTR(uuid, 1, 8),SUBSTR(uuid, 20, 4),SUBSTR(uuid, 25)));

select hex(ordered_uuid(@uuid));
+----------------------------------+
| hex(ordered_uuid(@uuid))         |
+----------------------------------+
| 11e606fe59f3ac1eac3c9b18a7fcf9ed |
+----------------------------------+
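
The same reordering can be done outside the database. The Ruby sketch below mirrors the `ordered_uuid` function above; the method names `ordered_uuid_bytes` and `url_key` are illustrative, and Base64-encoding the result for use in a URL is an addition that the linked article does not cover.

```ruby
# Reorder a version-1 UUID as time_hi, time_mid, time_low, clock_seq, node
# and return the 16 raw bytes, like the SQL ordered_uuid function.
def ordered_uuid_bytes(uuid)
  time_low, time_mid, time_hi, clock_seq, node = uuid.split('-')
  [time_hi + time_mid + time_low + clock_seq + node].pack('H*')
end

# Assumption: URL-safe Base64 without padding for the URL form (22 chars).
def url_key(bytes)
  [bytes].pack('m0').tr('+/', '-_').delete('=')
end

bytes = ordered_uuid_bytes('59f3ac1e-06fe-11e6-ac3c-9b18a7fcf9ed')
puts bytes.unpack1('H*')
puts url_key(bytes)
```

The reordering does not shorten anything by itself; it keeps keys generated close together in time adjacent in an index, while the length saving still comes from the binary or Base64 form.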
贱贱哒 2024-09-09 18:08:04

I'm not sure if this is feasible, but you could put all the generated GUIDs in a table and use only the index of the GUID in the table in the URL.

You could also reduce the length of the GUID: for example, use 2 bytes for the number of days since 2010 and 4 bytes for the number of milliseconds since the start of the current day. You will then have collisions only between 2 GUIDs generated in the same millisecond. You could also add 2 more random bytes, which will make this even better.
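
The byte layout described above (2 bytes of days, 4 bytes of milliseconds, 2 random bytes) could look like this in Ruby. The module name `ShortTimeId`, the use of UTC, and the URL-safe Base64 step are assumptions made for the sketch; the answer only specifies the fields.

```ruby
require 'securerandom'

# Hypothetical generator for the 8-byte identifier described above.
module ShortTimeId
  EPOCH = Time.utc(2010, 1, 1)

  def self.generate(now = Time.now)
    now = now.getutc                                   # assumption: count days in UTC
    days = ((now - EPOCH) / 86_400).floor              # whole days since 2010-01-01
    ms_of_day = (now.hour * 3600 + now.min * 60 + now.sec) * 1000 +
                now.nsec / 1_000_000
    raise RangeError, 'date outside the 2-byte day range' unless (0..0xFFFF).cover?(days)

    # 'n' = 16-bit big-endian, 'N' = 32-bit big-endian, then 2 random bytes
    bytes = [days, ms_of_day].pack('nN') + SecureRandom.random_bytes(2)
    [bytes].pack('m0').tr('+/', '-_').delete('=')      # 8 bytes -> 11 characters
  end
end

puts ShortTimeId.generate
```

At 8 bytes the key encodes to 11 characters, half the length of a Base64-encoded GUID. Two bytes of days cover roughly 179 years from 2010.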
