Generate/compress unique keys

Posted on 2024-12-28 21:00:51

In my work I have many users, and each user has a set of files in their home directory. Following some predefined rules, I have given each file a UID (unique identifier) based on the file's content and its creation time. I have now learned that the number of files in a user account cannot exceed about 1 million. The current UID is about 32 characters long. Is there any way to bring my UID down to about 6 characters (ideally), or failing that to about 10-12 characters? The current uidl uses a lot of space in my NoSQL database.

The current uidl looks like
timestamp.process_whichcreated_it.size

EDIT
Let me rephrase the problem. What I actually need is a compression algorithm.
For example:

I have a list of 1,000,000 strings (each unique), each 32 characters long. I need a compression function f such that f(string) = s2, where s2 is 10 characters long and every s2 is unique.

Comments (3)

错々过的事 2025-01-04 21:00:51

Sort your UIDs and replace each old UID with a new one: its index in the sorted array of old UIDs.

Simplified pseudocode would look like this:

sorted <- sort(UIDs)
for each file:
  file.UID <- sorted.indexOf(file.UID)
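
The pseudocode above could be sketched in Python roughly as follows. This is only an illustration, not the answerer's implementation: the names `encode_base62` and `build_short_ids` are hypothetical, and the base-62 text encoding is an added assumption for turning the index into a short string. Since 62^4 = 14,776,336 exceeds 1,000,000, four characters are enough for the file count in the question.

```python
import string

# 62 symbols: 0-9, a-z, A-Z (an assumed alphabet, chosen for illustration)
ALPHABET = string.digits + string.ascii_letters


def encode_base62(n: int, width: int = 4) -> str:
    """Encode a non-negative integer as a fixed-width base-62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    text = "".join(reversed(digits))
    if len(text) > width:
        raise ValueError("value does not fit in the requested width")
    # Left-pad with the zero symbol so every ID has the same length.
    return text.rjust(width, ALPHABET[0])


def build_short_ids(uids):
    """Map each distinct long UID to the base-62 form of its sorted rank."""
    return {
        uid: encode_base62(rank)
        for rank, uid in enumerate(sorted(set(uids)))
    }
```

Note that a rank depends on the full set of UIDs present when sorting, so the resulting mapping would need to be stored; re-sorting after new files arrive would change existing indexes.
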
星光不落少年眉 2025-01-04 21:00:51

It is very difficult to take a unique ID, compress it, and keep it unique. You tend to run into collisions.

@amit's suggestion really is the best one, though perhaps his implementation was a bit glib.

How about creating a table with an auto-incrementing integer "ID" column and a string/varchar "OldGUID" column? INSERT all your old/current GUIDs into the table, and you have a 1-to-1 match between each GUID and a shorter "ID". As you create new GUIDs, just INSERT them into the table and the 1-to-1 match continues to hold, so you can switch back and forth between the long and short versions.
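
A minimal sketch of such a mapping table, using Python's built-in `sqlite3` module purely for illustration (the question's actual store is a NoSQL database, so this is an assumption). The table name `uid_map`, the column names, and the helper functions are all hypothetical.

```python
import sqlite3


def open_mapping(path=":memory:"):
    """Open (or create) the long-to-short mapping table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uid_map ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " old_guid TEXT NOT NULL UNIQUE)"
    )
    return conn


def short_id_for(conn, old_guid):
    """Return the integer ID for old_guid, inserting a row if it is new."""
    row = conn.execute(
        "SELECT id FROM uid_map WHERE old_guid = ?", (old_guid,)
    ).fetchone()
    if row is not None:
        return row[0]
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO uid_map (old_guid) VALUES (?)", (old_guid,)
        )
    return cur.lastrowid


def old_guid_for(conn, short_id):
    """Look up the long GUID for a short ID, or None if it is unknown."""
    row = conn.execute(
        "SELECT old_guid FROM uid_map WHERE id = ?", (short_id,)
    ).fetchone()
    return row[0] if row else None
```

Because the short ID is just a counter, uniqueness is guaranteed by construction rather than by probability; the cost is the extra lookup table.
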

怎樣才叫好 2025-01-04 21:00:51

If you only need a unique identifier, my first thought is a UUID.

However, a generic UUID consumes 16 bytes and is in binary format. It does not meet your requirement of 6 characters. Compared to your current method using 32 characters, it "only" saves 50% of the space.

Therefore, a milder scheme would be to use a 64-bit UID (8 bytes) produced by a general-purpose hash function. With a good hash, the probability of collision remains fairly reasonable as long as the total number of UIDs generated is below 100 million. If that seems acceptable, 8 bytes seems pretty close to your space requirement.
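
To make the 64-bit suggestion concrete, here is a rough sketch. The answer does not name a hash function, so BLAKE2b from Python's `hashlib` is an assumed choice, and the function names are hypothetical. The collision estimate uses the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / 2^(bits+1)).

```python
import base64
import hashlib
import math


def uid64(long_uid: str) -> bytes:
    """Return an 8-byte (64-bit) digest of the long UID."""
    return hashlib.blake2b(long_uid.encode("utf-8"), digest_size=8).digest()


def uid64_text(long_uid: str) -> str:
    """Render the 8-byte digest as 11 URL-safe base64 characters."""
    encoded = base64.urlsafe_b64encode(uid64(long_uid))
    return encoded.rstrip(b"=").decode("ascii")


def collision_probability(n: int, bits: int = 64) -> float:
    """Approximate chance of at least one collision among n random IDs."""
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


if __name__ == "__main__":
    for n in (10**6, 10**8):
        print(n, collision_probability(n))
```

Under this approximation the risk is on the order of 1e-8 for 1 million IDs and 1e-4 for 100 million. Unlike a lookup table, a hash cannot guarantee uniqueness, so a duplicate check on insert would still be prudent.
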
