How to partition a keyspace algorithmically?
This is related to consistent hashing and while I conceptually understand what I need to do, I'm having a hard time translating this into code.
I'm trying to divide a given keyspace (say, 128 bits) into equal sized partitions. I want the upper bound (highest key) of each partition.
Basically, how would I complete this?

    #define KEYSPACE_BYTE_SIZE 16
    #define KEYSPACE_BIT_SIZE (KEYSPACE_BYTE_SIZE * 8)

    typedef struct _key
    {
        char byte[KEYSPACE_BYTE_SIZE];
    } key;

    key * partition_keyspace( int num_partitions )
    {
        key * partitions = malloc( sizeof(key) * num_partitions );
        // ...
    }

Edit:
I suppose another way of saying this is:

    for (i = 0; i < num_partitions; i++)
    {
        partitions[i] = ((2 ^ KEYSPACE_BIT_SIZE) / num_partitions) * i;
    }

Of course the problem is 2 ^ 128 is a very large number and can't be contained in any single integer variable in C with which to do the math (hence the char[16] struct).
I really don't want to use a large number library (or any library) for this.
Edit:
Although, in actuality, the numbers I'm looking for are:

    for (i = 0; i < num_partitions; i++)
    {
        partitions[i] = (((2 ^ KEYSPACE_BIT_SIZE) / num_partitions) * (i + 1)) - 1;
    }

Comments (3)
The highest key in any particular partition will obviously consist of all 1-bits. If you have the lower n bits for your keys, and the upper m bits for your partition ids, then all you need to do is run an m-bit counter and concatenate it with n ones.

To illustrate, assume an 8-bit keyspace with the upper 2 bits for the partitions (so num_partitions = 2^2 = 4) and the lower 6 for the keys. The highest key in each partition will be these four: 00111111, 01111111, 10111111, 11111111. To generate them, all you need to do is count from 0 to 3 in the upper 2 bits and set the lower 6 bits to ones.

Of course, this assumes num_partitions is a power of two.

Naturally, for a key-space as large as yours it won't be as simple as the above, since you can't fit everything into a single variable. Still, the principle remains the same. As long as your num_partitions is small enough, you can fit the counter into an ordinary int variable, copy it into the upper bits, and then filling the rest with ones is trivial.

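A minimal sketch of the counter-plus-ones idea for the 128-bit case, assuming the key is stored big-endian (byte[0] most significant) and the partition id fits in the top m bits of the first byte (1 <= m <= 8). The names key128 and upper_bound_pow2 are made up for illustration, and unsigned char is used instead of the question's char to keep the bit operations well defined.

```c
#include <assert.h>
#include <string.h>

#define KEYSPACE_BYTE_SIZE 16

/* Hypothetical key type: big-endian, byte[0] is the most significant byte. */
typedef struct {
    unsigned char byte[KEYSPACE_BYTE_SIZE];
} key128;

/* Highest key of partition `id` when the top `m` bits (1..8) of the key
 * hold the partition id: the id in the top m bits, ones everywhere else. */
key128 upper_bound_pow2(unsigned id, unsigned m)
{
    key128 k;

    assert(m >= 1 && m <= 8);
    assert(id < (1u << m));

    /* Fill the whole key with ones first. */
    memset(k.byte, 0xFF, sizeof k.byte);

    /* Top byte: id shifted into the upper m bits, ones in the lower 8 - m. */
    k.byte[0] = (unsigned char)((id << (8 - m)) | (0xFFu >> m));
    return k;
}
```

With m = 2, calling upper_bound_pow2 for ids 0 through 3 gives top bytes 0x3F, 0x7F, 0xBF and 0xFF, which matches the 8-bit illustration; the remaining 15 bytes are always 0xFF.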
I am not sure I understand the context of your question - I've not studied consistent hashing.
The question almost amounts to, "how can I sort without sorting".
Another approach might be to do this:
This is in linear time. However, it requires no a priori knowledge of the key space except that there is some order which nextIter obeys.
If you are partitioning [0, 2^128] -> {values}, e.g., you're doing some distributed computing or what have you, you're in much better luck, since integers are well-structured.
I would suggest the slightly silly idea of having 4 32-bit ints in a struct and writing your own bigint routine that solves what you need to solve.
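One way the "4 32-bit ints in a struct" idea could look, as a rough sketch rather than a finished bigint routine. It evaluates the question's formula ((2^128 / n) * (i + 1)) - 1 with schoolbook long division and multiplication on 32-bit limbs. The names u128, partition_size and upper_bound are invented here, and the sketch assumes n >= 2 (for n = 1 the single upper bound is simply all ones).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value: limb[0] is the most significant 32 bits. */
typedef struct {
    uint32_t limb[4];
} u128;

/* floor(2^128 / n) for n >= 2, by long division on 32-bit limbs. */
u128 partition_size(uint32_t n)
{
    u128 q;
    uint64_t rem = 1;   /* the leading 1 of 2^128; 1 / n == 0 when n >= 2 */

    assert(n >= 2);
    for (int i = 0; i < 4; i++) {
        uint64_t cur = rem << 32;          /* bring down a zero limb */
        q.limb[i] = (uint32_t)(cur / n);   /* fits: rem < n */
        rem = cur % n;
    }
    return q;
}

/* (size * (i + 1)) - 1, computed modulo 2^128. */
u128 upper_bound(u128 size, uint32_t i)
{
    u128 r;
    uint64_t mul = (uint64_t)i + 1;
    uint64_t carry = 0;

    /* Multiply limb by limb, least significant limb first. */
    for (int j = 3; j >= 0; j--) {
        uint64_t cur = (uint64_t)size.limb[j] * mul + carry;
        r.limb[j] = (uint32_t)cur;
        carry = cur >> 32;
    }

    /* Subtract 1, propagating the borrow upward through zero limbs. */
    for (int j = 3; j >= 0; j--) {
        if (r.limb[j] != 0) {
            r.limb[j]--;
            break;
        }
        r.limb[j] = 0xFFFFFFFFu;
    }
    return r;
}
```

Because the arithmetic wraps modulo 2^128, the last partition comes out as all ones when n divides 2^128 evenly. When it does not (n = 3, say), the literal formula leaves a few keys above the last bound, so you may want to force the final bound to all ones yourself.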
If you have the freedom to not use C++, Common Lisp has bigints built in. I've found that handy.
If you have representable keys...
However, when seeking some equally sized k partitions in some space a with n elements, I would approach the problem like this:
Based on tzaman's answer, here is my solution. It allows up to 255 partitions (although this could be altered). It does NOT require a power of 2 num_partitions... it'll just make the last partition take up whatever's left.
Let me know if you see any bugs... :)
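The code for this answer is not shown here. A sketch of what such a solution might look like, which is a guess at the approach described and not the poster's actual code: it assumes the partition counter lives entirely in the most significant byte, steps it by 256 / num_partitions, and gives the last partition everything that is left. The key128 name and the big-endian byte order are assumptions of the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define KEYSPACE_BYTE_SIZE 16

/* Hypothetical key type: big-endian, byte[0] is the most significant byte. */
typedef struct {
    unsigned char byte[KEYSPACE_BYTE_SIZE];
} key128;

/* Returns a malloc'd array of num_partitions upper bounds (caller frees),
 * or NULL if allocation fails. Supports 1..255 partitions. */
key128 *partition_keyspace(int num_partitions)
{
    assert(num_partitions >= 1 && num_partitions <= 255);

    key128 *partitions = malloc(sizeof(key128) * (size_t)num_partitions);
    if (partitions == NULL)
        return NULL;

    /* Width of one partition, measured in values of the top byte. */
    unsigned step = 256u / (unsigned)num_partitions;

    for (int i = 0; i < num_partitions; i++) {
        /* Every upper bound ends in ones below the top byte. */
        memset(partitions[i].byte, 0xFF, KEYSPACE_BYTE_SIZE);

        /* All but the last partition get a stepped top byte; the last one
         * keeps 0xFF so it absorbs whatever is left over. */
        if (i < num_partitions - 1)
            partitions[i].byte[0] = (unsigned char)(step * (unsigned)(i + 1) - 1);
    }
    return partitions;
}
```

For num_partitions = 3 the top bytes come out as 0x54, 0xA9 and 0xFF, so the last partition is one top-byte value wider than the other two. Working only in the top byte keeps the math trivial, at the cost of partitions that are only approximately equal when num_partitions is not a power of two.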