哈希函数-sql studio Express
我需要在我的表上创建一个哈希键以确保唯一性,有人向我提到了 md5。但我读过有关校验和和二进制和的内容;这不是达到同样的目的吗?确保特定字段中不重复。 现在我设法实现了这一点,并且我在表中看到了 hask 键。 自从我使用这些哈希键创建新索引键以来,我是否需要更改最初创建的索引键?另外我需要换钥匙吗? 如何更改示例 SELECT 语句的查询? 我想我仍然不确定哈希键除了唯一性之外对查询有何真正帮助?
I need to create a hash key on my tables for uniqueness and someone mentioned to me about md5. But I have read about checksum and binary sum; would this not serve the same purpose? To ensure no duplicates in a specific field.
Now I managed to implement this and I see the hask keys in my tables.
Do I need to alter index keys originally created since I created a new index key with these hash keys? Also do I need to change the keys?
How do I change my queries for example SELECT statements?
I guess I am still unsure how hash keys really help in queries other than uniqueness?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
如果您的目标是确保特定字段中没有重复项,为什么不直接对该字段应用唯一索引并让数据库引擎执行其预期的操作呢?
If your goal is to ensure no duplicates in a specific field, why not just apply a unique index to that field and let the database engine do what it was meant to do?
编写唯一函数来替换 SQL Server 唯一约束/索引是没有意义的。
您将如何确保哈希值是唯一的?有约束吗?
如果您对其进行索引(由于确定性,这可能是不允许的),那么优化器会将其视为非唯一的。还有杀戮表演。
而你只有区区 100,000 行。花生。
如果有时间,我可以提出更多论点,但我会总结一下:不要这样做
It makes no sense to write a unique function to replace SQL Server unique constraints/indexes.
How are you going to ensure the hash is unique? With a constraint?
If you index it (which may not be allowed because of determinism), then the optimiser will treat it as non-unique. As well as killing performance.
And you only have a few 100,000 rows. Peanuts.
Given time I could come up with more arguments, but I'll summarise: Don't do it
总有 HashBytes() 函数。它支持 md5,但如果您不喜欢它,可以选择 sha1。
至于这如何帮助查询:一个简单的例子是,如果您有一个很大的 varchar 列(可能是 varchar max),并且在查询中您想知道该列的内容是否与特定字符串匹配。如果您必须将搜索与每条记录进行比较,那么速度可能会很慢。但是,如果您对搜索字符串进行散列并使用它,事情会变得更快,因为现在它只是一个非常短的二进制比较。
There's always the HashBytes() function. It supports md5, but if you don't like it there's an option for sha1.
As for how this can help queries: one simple example is if you have a large varchar column — maybe varchar max — and in your query you want to know if the contents of this column match a particular string. If you have to compare your search with every single record it could be slow. But if you hash your search string and use that, things can go much faster since now it's just a very short binary compare.
以加密方式保存的哈希函数是一种单向函数,与非加密方式安全的函数相比,它们会消耗更多的资源(CPU 周期)。如果您只需要函数作为哈希键,则不需要这样的属性。您所需要的只是与均匀性相关的低碰撞概率。尝试使用 CRC 或者如果您有字符串或数字的模数。
http://en.wikipedia.org/wiki/Hash_function
Cryptographically save Hash functions are one way functions and they consume more resources (CPU cycles) that functions that are not cryptographically secure. If you just need function as hash key you do not need such property. All you need is low probability for collisions what is related whit uniformity. Try whit CRC or if you have strings or modulo for numbers.
http://en.wikipedia.org/wiki/Hash_function
为什么不使用默认为 NEWSEQUENTIALID() 的 GUID ..不要使用 NEWID() 因为它对于集群来说很糟糕,请参见此处: 最佳实践:使用 NewId 时不要在 UniqueIdentifier 上集群
将此列设为主键,你就差不多完成了
why don't you use a GUID with a default of NEWSEQUENTIALID() ..don't use NEWID() since it is horrible for clustering, see here: Best Practice: Do not cluster on UniqueIdentifier when you use NewId
make this column the primary key and you are pretty much done