Python - Why use anything other than uuid4() for unique strings?
I see quite a few implementations of unique string generation for things like uploaded image names, session IDs, and so on, and many of them use hashes such as SHA1.
I'm not questioning the legitimacy of using custom methods like this, but rather just the reason. If I want a unique string, I just say this:
>>> import uuid
>>> uuid.uuid4()
UUID('07033084-5cfd-4812-90a4-e4d24ffb6e3d')
And I'm done with it. I wasn't very trusting before I read up on uuid, so I did this:
>>> import uuid
>>> s = set()
>>> for i in range(5000000):  # That's 5 million!
...     s.add(str(uuid.uuid4()))
...
>>> len(s)
5000000
Not one repeat (I wouldn't expect one now, considering the odds are like 1.108e+50, but it's comforting to see it in action). You could even cut the odds further by building your string from two uuid4()s combined.
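For what it's worth, the odds can also be estimated instead of measured. The sketch below uses the standard birthday-bound approximation and assumes a version 4 UUID carries 122 random bits (128 bits minus the fixed version and variant bits). The function name `collision_probability` is just something made up for illustration.

```python
import math

# A version 4 UUID has 128 bits, of which 4 (version) and 2 (variant) are fixed.
UUID4_RANDOM_BITS = 122


def collision_probability(n: int, bits: int = UUID4_RANDOM_BITS) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2 ** bits
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-n * (n - 1) / (2 * space))


# Chance of at least one repeat among 5 million uuid4() values.
p = collision_probability(5_000_000)
print(f"{p:.3e}")
```

Under that assumption the result is vanishingly small, which fits with seeing no repeats in the 5 million run.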
So, with that said, why do people spend time on random() and other approaches for unique strings? Is there an important security issue, or some other problem, with uuid?
Using a hash to uniquely identify a resource allows you to generate a 'unique' reference from the object itself. For instance, Git uses SHA hashing to make a unique hash that represents the exact changeset of a single commit. Since hashing is deterministic, you'll get the same hash for the same file every time.
Two people across the world could make the same change to the same repo independently, and Git would know they made the same change. UUID v1, v2, and v4 can't support that since they have no relation to the file or the file's contents.
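A minimal sketch of that difference, using only the standard library. The helper `content_id` and the sample bytes are made up for illustration, and this is not Git's actual object format, just a plain SHA-1 over the raw content.

```python
import hashlib
import uuid


def content_id(data: bytes) -> str:
    """Derive an identifier from the content itself (hypothetical helper)."""
    return hashlib.sha1(data).hexdigest()


change = b"print('hello')\n"

# Two independent parties hashing the same bytes get the same identifier.
same_a = content_id(change)
same_b = content_id(change)
print(same_a == same_b)

# uuid4() has no relation to the content, so two calls give different values.
random_a = uuid.uuid4()
random_b = uuid.uuid4()
print(random_a == random_b)
```

The first comparison is always true. The second is false for all practical purposes, since each uuid4() is drawn independently at random.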
Well, sometimes you want collisions. If someone uploads the exact same image twice, maybe you'd rather tell them it's a duplicate than make another copy under a new name.
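Roughly, that could look like the sketch below. `UploadStore` and its `save` method are hypothetical names, the store is just an in-memory dict, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib


class UploadStore:
    """Toy in-memory store keyed by content hash (illustrative only)."""

    def __init__(self) -> None:
        self._by_digest: dict[str, str] = {}

    def save(self, filename: str, data: bytes) -> tuple[str, bool]:
        """Return (digest, is_new); is_new is False for a duplicate upload."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._by_digest:
            return digest, False  # same bytes seen before: report a duplicate
        self._by_digest[digest] = filename
        return digest, True


store = UploadStore()
first = store.save("cat.png", b"fake image bytes")
second = store.save("cat-copy.png", b"fake image bytes")
print(first[1], second[1])
```

With a uuid4() name the second upload would simply become a second copy, because nothing ties the name to the bytes.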
One possible reason is that you want the unique string to be human-readable. UUIDs just aren't easy to read.
UUIDs are long and meaningless (for instance, if you order by UUID, you get a meaningless result).
And, because it's too long, I wouldn't want to put it in a URL or expose it to the user in any shape or form.
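If length is the main complaint, one possible workaround is to keep the UUID but encode its 16 bytes more compactly. The sketch below uses URL-safe base64, which brings the usual 36-character form down to 22 characters. `short_id` and `from_short_id` are made-up helper names.

```python
import base64
import uuid


def short_id(u: uuid.UUID) -> str:
    """Encode the 16 raw bytes as URL-safe base64 without padding."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def from_short_id(s: str) -> uuid.UUID:
    """Reverse short_id by restoring the two stripped padding characters."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(s + "=="))


u = uuid.uuid4()
token = short_id(u)
print(len(str(u)), len(token))
print(from_short_id(token) == u)
```

This only shortens the string. It is still meaningless to a reader and still does not sort in any useful order.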
In addition to the other answers, hashes are really good for things that should be immutable. The name is unique and can be used to check the integrity of whatever it is attached to at any time.
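As a small sketch of the integrity check: if the name is the hash of the content, anyone holding both can recompute the hash and compare. `name_for` and `is_intact` are hypothetical helpers, and SHA-256 is an arbitrary choice.

```python
import hashlib
import hmac


def name_for(data: bytes) -> str:
    """Name an immutable blob after the SHA-256 of its content."""
    return hashlib.sha256(data).hexdigest()


def is_intact(name: str, data: bytes) -> bool:
    """Recompute the hash and compare it with the stored name."""
    return hmac.compare_digest(name, name_for(data))


original = b"release-1.0 contents"
name = name_for(original)

print(is_intact(name, original))
print(is_intact(name, b"release-1.0 contents, tampered"))
```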
Also note that other kinds of UUID may be appropriate. For example, if you want your identifier to be orderable, UUID1 is based in part on a timestamp. It really all comes down to your application's requirements...
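A quick sketch of ordering by that timestamp. One caveat: the plain string form of a version 1 UUID does not sort chronologically, because the low bits of the timestamp come first, so the example sorts on the `time` attribute instead. `uuid1_datetime` is a made-up helper, and the conversion assumes the usual definition of the field as 100-nanosecond intervals since 1582-10-15.

```python
import uuid
from datetime import datetime, timedelta, timezone

# UUID timestamps count 100 ns intervals from the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_datetime(u: uuid.UUID) -> datetime:
    """Convert the 60-bit timestamp of a version 1 UUID to a datetime."""
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


ids = [uuid.uuid1() for _ in range(5)]

# Sort on the embedded timestamp, not on the string representation.
by_time = sorted(ids, key=lambda u: u.time)
for u in by_time:
    print(uuid1_datetime(u).isoformat(), u)
```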