A hashing method that allows increasing the number of buckets without disturbing the existing data mapping
Is there an algorithm/method that lets me increase the number of buckets without rebuilding the data / re-hashing?
The problem in practice:
Say you have a bunch of users that are identified by a string, "username".
Then you hash these "usernames" to a list of buckets.
This is done by something like:
String username = "user";
int index = username.hashCode(); // Java's String has no hash(); hashCode() is the actual method
int bucketIndex = Math.floorMod(index, bucketlist.size()); // floorMod keeps the index non-negative even when hashCode() is negative
So in this scheme, if one wanted to increase the number of "buckets", one would also need to move the data in the buckets so that it matches the new bucket index obtained by taking the modulo with a different number.
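To see how much data such a change actually moves, one can count, for a batch of made-up usernames, how many indices differ between two bucket counts. (The class and the "user" + i names below are purely illustrative.) For 8 → 9 buckets, a key stays put only when its hash mod 72 is below 8, so roughly 8 out of every 9 keys move:

```java
// Counts how many of `users` hypothetical usernames land in a different
// bucket when the modulus changes from oldBuckets to newBuckets.
class ModuloMoves {
    static int moved(int users, int oldBuckets, int newBuckets) {
        int moved = 0;
        for (int i = 0; i < users; i++) {
            int h = ("user" + i).hashCode() & 0x7fffffff; // mask keeps it non-negative
            if (h % oldBuckets != h % newBuckets) moved++;
        }
        return moved;
    }
}
```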
This is really just a mapping: where to find the bucket that belongs to a given user.
Possible dumb solutions:
Keep both the old bucket size and the new bucket size, and try to look in both candidate buckets.
Then slowly move all the users so that they match the index given by the new bucketlist.size(). This would not require a total stop while hashing and moving.
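For what it's worth, that two-lookup idea can be sketched roughly as below. All names are illustrative, it assumes the bucket list only grows (newSize > oldSize), and the policy of migrating one old bucket at a time is just one way to avoid a total stop:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the two-lookup migration: reads check the new index first
// and fall back to the old one until migration finishes.
class MigratingBuckets {
    private final List<Set<String>> buckets = new ArrayList<>();
    private final int oldSize, newSize;
    private int migrated = 0; // old buckets already rehashed

    MigratingBuckets(int oldSize, int newSize) {
        this.oldSize = oldSize;
        this.newSize = newSize;
        for (int i = 0; i < newSize; i++) buckets.add(new HashSet<>());
    }

    private int index(String user, int size) {
        return (user.hashCode() & 0x7fffffff) % size;
    }

    void add(String user) { // new writes always use the new size
        buckets.get(index(user, newSize)).add(user);
    }

    void addAtOldIndex(String user) { // simulates pre-existing data
        buckets.get(index(user, oldSize)).add(user);
    }

    boolean contains(String user) {
        if (buckets.get(index(user, newSize)).contains(user)) return true;
        // while migrating, also look where the old modulus would have put it
        return migrated < oldSize && buckets.get(index(user, oldSize)).contains(user);
    }

    // Move the stragglers of one old bucket; callable incrementally,
    // so the whole structure never has to stop.
    void migrateOne() {
        if (migrated >= oldSize) return;
        List<String> toMove = new ArrayList<>();
        for (String u : buckets.get(migrated))
            if (index(u, newSize) != migrated) toMove.add(u);
        for (String u : toMove) {
            buckets.get(migrated).remove(u);
            buckets.get(index(u, newSize)).add(u);
        }
        migrated++;
    }
}
```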
What's needed:
It is really the moving of all users that is bad, and looking in many buckets to find the correct one is also not ideal.
And the whole point is to be able to pinpoint which bucket in the list to use just by using an algorithm.
And it is not possible to have the size of the bucket list as part of the username.
It does not have to be hashing like it is done here, as long as it does roughly the same thing.
I don't know if there is any sensible answer to this...
Comments (2)
Any way to pre-size your hash set to something that will fit your data - thus eliminating or nearly eliminating the need to rehash? Also, even if you get some overlap, hashing with linked lists per node or something like it will not hurt too bad as long as the collisions don't get too deep.
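The pre-sizing suggestion is directly supported by the JDK collections: `HashSet` takes an initial capacity, and with the default load factor of 0.75 a capacity of ceil(expected / 0.75) means the table never resizes (and thus never rehashes) while holding that many elements. The helper name below is made up:

```java
import java.util.HashSet;
import java.util.Set;

// Build a HashSet big enough that inserting `expected` elements
// never triggers an internal resize/rehash (default load factor 0.75).
class Presize {
    static <T> Set<T> presizedSet(int expected) {
        return new HashSet<>((int) Math.ceil(expected / 0.75));
    }
}
```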
I think that you're looking for linear hashing.
You can also consider any of the many kinds of balanced binary trees. They have the nice property that you can continue to grow them without rearranging the world at any point.
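Linear hashing, as suggested above, grows the table one bucket at a time and rehashes only the single bucket being split, so there is never a global pause. A minimal sketch follows; the initial size of 4 and the split-when-average-load-exceeds-2 policy are arbitrary choices, not part of the algorithm itself:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal linear-hashing sketch: String keys, chaining inside buckets,
// split one bucket whenever average load exceeds 2 entries per bucket.
class LinearHash {
    private final List<List<String>> buckets = new ArrayList<>();
    private final int initial = 4; // bucket count at level 0
    private int level = 0;         // how many full doublings have completed
    private int split = 0;         // next bucket to be split
    private int size = 0;

    LinearHash() {
        for (int i = 0; i < initial; i++) buckets.add(new ArrayList<>());
    }

    private int indexFor(String key) {
        int h = key.hashCode() & 0x7fffffff;
        int n = initial << level;
        int idx = h % n;
        if (idx < split) idx = h % (n * 2); // already split: use next level's modulus
        return idx;
    }

    void add(String key) {
        buckets.get(indexFor(key)).add(key);
        if (++size > 2 * buckets.size()) splitOne();
    }

    boolean contains(String key) {
        return buckets.get(indexFor(key)).contains(key);
    }

    // Grow by ONE bucket: only the entries of bucket `split` are rehashed.
    private void splitOne() {
        int n = initial << level;
        buckets.add(new ArrayList<>()); // new bucket lands at index split + n
        List<String> keep = new ArrayList<>();
        for (String k : buckets.get(split)) {
            if ((k.hashCode() & 0x7fffffff) % (n * 2) == split) keep.add(k);
            else buckets.get(split + n).add(k);
        }
        buckets.set(split, keep);
        if (++split == n) { split = 0; level++; } // a whole doubling finished
    }

    int bucketCount() { return buckets.size(); }
}
```

Lookups stay O(1): `indexFor` decides purely from the hash, the split pointer, and the level which modulus applies, which is exactly the "pinpoint the bucket just by an algorithm" property the question asks for.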