How do I algorithmically partition a keyspace?

Posted on 2024-09-03 12:41:36

This is related to consistent hashing and while I conceptually understand what I need to do, I'm having a hard time translating this into code.

I'm trying to divide a given keyspace (say, 128 bits) into equal sized partitions. I want the upper bound (highest key) of each partition.

Basically, how would I complete this?

#define KEYSPACE_BYTE_SIZE  16
#define KEYSPACE_BIT_SIZE   (KEYSPACE_BYTE_SIZE * 8)

typedef struct _key
{ 
    char byte[KEYSPACE_BYTE_SIZE];
} key;

key * partition_keyspace( int num_partitions )
{
    key * partitions = malloc( sizeof(key) * num_partitions );

    // ...

}

Edit:

I suppose another way of saying this is:

for (i = 0; i < num_partitions; i++)
{
    partitions[i] = ((2 ^ KEYSPACE_BIT_SIZE) / num_partitions) * i;
}

Of course the problem is 2 ^ 128 is a very large number and can't be contained in any single integer variable in C with which to do the math (hence the char[16] struct).

I really don't want to use a large number library (or any library) for this.

Edit:

Although, in actuality, the numbers I'm looking for are:

for (i = 0; i < num_partitions; i++)
{
    partitions[i] = (((2 ^ KEYSPACE_BIT_SIZE) / num_partitions) * (i + 1)) - 1;
}

Comments (3)

好倦 2024-09-10 12:41:36

The highest key in any particular partition will obviously end in all 1-bits. If you use the lower n bits for your keys and the upper m bits for your partition IDs, then all you need to do is run an m-bit counter and concatenate it with n ones.
To illustrate, assume an 8-bit keyspace with the upper 2 bits for the partitions (so num_partitions = 2^2 = 4) and the lower 6 for the keys. The highest key in each partition will be these four:

00 111111
01 111111
10 111111
11 111111

In order to generate them, all you need to do is:

for (int i = 0; i < num_partitions; i++)
    highest_key = (i << 6) | 0x3f; // where 6 is key_bits and 0x3f is six ones.

Of course, this assumes num_partitions is a power of two.

Naturally, for a key-space as large as yours it won't be as simple as the above, since you can't fit everything into a single variable. Still, the principle remains the same. As long as your num_partitions is small enough, you can fit the counter into an ordinary int variable, copy it into the upper bits, and then filling the rest with ones is trivial.
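As a rough sketch of how this might look for a 128-bit key, assuming the partition count is 2^part_bits with at most 8 partition bits and that byte[0] is the most significant byte (the names key128, highest_key_pow2 and part_bits are made up for illustration, not from the question):

```c
#include <assert.h>
#include <string.h>

#define KEYSPACE_BYTE_SIZE 16

/* Hypothetical key type; byte[0] is assumed to be the most significant byte. */
typedef struct { unsigned char byte[KEYSPACE_BYTE_SIZE]; } key128;

/* Writes the highest key of partition `index` into *out, assuming there are
   2^part_bits partitions and 1 <= part_bits <= 8. */
void highest_key_pow2(unsigned index, unsigned part_bits, key128 *out)
{
    assert(part_bits >= 1 && part_bits <= 8);
    assert(index < (1u << part_bits));

    unsigned low_bits = 8 - part_bits;          /* key bits left in the top byte */

    memset(out->byte, 0xFF, sizeof out->byte);  /* fill every bit with ones */

    /* Counter in the upper bits of the top byte, ones in the remaining bits. */
    out->byte[0] = (unsigned char)((index << low_bits) | ((1u << low_bits) - 1u));
}
```

With part_bits = 2 this reproduces the 8-bit illustration in the top byte (0x3F, 0x7F, 0xBF, 0xFF), and the other 15 bytes stay at 0xFF.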

以往的大感动 2024-09-10 12:41:36

I am not sure I understand the context of your question - I've not studied consistent hashing.


The question almost amounts to, "how can I sort without sorting".

Another approach might be to do this:

iter = seed() #initialize to the bottom of the hash keys
for(i = 0 to partitionbound)
{
   iter = nextIter(iter);
}

This is in linear time. However, it requires no a priori knowledge of the key space except that there is some order which nextIter obeys.

If you are partitioning [0, 2^128) -> {values}, e.g., you're doing some distributed computing or what have you, you're in much better luck, since integers are well-structured.

I would suggest the slightly silly idea of having 4 32-bit ints in a struct and writing your own bigint routine that solves what you need to solve.
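A minimal sketch of what such a routine might look like, assuming four 32-bit limbs with limb[0] most significant and a partition count that fits in 32 bits. The names u128 and partition_upper_bound are hypothetical. It computes floor(2^128 / n) * (i + 1) - 1 with schoolbook long division, multiplication and a borrow-propagating decrement:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value: limb[0] holds the most significant 32 bits. */
typedef struct { uint32_t limb[4]; } u128;

/* Upper bound of partition i (0-based) out of n partitions:
   floor(2^128 / n) * (i + 1) - 1, computed modulo 2^128. */
u128 partition_upper_bound(uint32_t n, uint32_t i)
{
    assert(n >= 2 && i < n);

    u128 r;

    /* Long division of 2^128 by n. The dividend is a 1 followed by four
       zero limbs, so the running remainder starts at 1. */
    uint64_t rem = 1;
    for (int k = 0; k < 4; k++) {
        uint64_t cur = rem << 32;
        r.limb[k] = (uint32_t)(cur / n);
        rem = cur % n;
    }

    /* Multiply by (i + 1), least significant limb first. A final carry
       out of the top limb is dropped, i.e. the result wraps at 2^128. */
    uint64_t carry = 0;
    for (int k = 3; k >= 0; k--) {
        uint64_t prod = (uint64_t)r.limb[k] * ((uint64_t)i + 1) + carry;
        r.limb[k] = (uint32_t)prod;
        carry = prod >> 32;
    }

    /* Subtract 1, borrowing upward through any limbs that were zero. */
    for (int k = 3; k >= 0; k--) {
        if (r.limb[k]-- != 0) break;
    }
    return r;
}
```

When n is a power of two, the last bound wraps around to all ones, which is the maximum key. When it is not, the last bound falls slightly short of the maximum key, so a caller would probably want to force the final partition's bound to all ones.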

If you have the freedom to not use C++, Common Lisp has bigints built in. I've found that handy.


If you have representable keys...

However, when seeking some equally sized k partitions in some space a with n elements, I would approach the problem like this:

if (n % k)
{
   return "not equal-sized partition!";
}
size = n / k; // elements per partition, so that there are k partitions
// could be forking/threading, whatever.
for (int i = 0; i < n; i += size)
{
   process(i, i + size - 1);
}

process(bottom, top)
{
   sort(a[bottom], a[top]);
   return a[top]; // you'll have to figure out where to dump the results.
}
凌乱心跳 2024-09-10 12:41:36

Based on tzaman's answer, here is my solution. It allows up to 255 partitions (although this could be altered). It does NOT require a power of 2 num_partitions... it'll just make the last partition take up whatever's left.

Let me know if you see any bugs... :)

key * partition_keyspace( unsigned int num_partitions )
{
    assert( num_partitions > 0 );
    assert( num_partitions <= 0xFF ); // up to 255 partitions, as described above

    key * partitions = (key *) malloc( sizeof(key) * num_partitions );

    // fill every bit
    memset( partitions, 0xFF, sizeof(key) * num_partitions );

    // calculate how many bits of the top byte needs to be filled by 1's
    unsigned char fill_bits = 0;
    while (num_partitions > (1 << fill_bits)) fill_bits++;
    fill_bits = 8 - fill_bits;

    // fill the top byte with the base number of 1's
    unsigned char fill_part = 0;
    for (unsigned int i = 0; i < fill_bits; i++) fill_part |= 1 << i;

    // last partition takes up whatever remains, so don't process it (hence the -1)
    for (unsigned char i = 0; i < num_partitions - 1; i++)
    {
        partitions[i].byte[0] = fill_part | (i << fill_bits);
    }

    return partitions;
}
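For what it's worth, here is one way the resulting array might be used: a sketch that finds which partition a given key falls into. It assumes the upper bounds are sorted in ascending order, that byte[0] is the most significant byte, and that the last bound is all 0xFF so every key lands somewhere. find_partition is a hypothetical helper, not part of the solution above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEYSPACE_BYTE_SIZE 16

typedef struct
{
    char byte[KEYSPACE_BYTE_SIZE];
} key;

/* Returns the index of the first partition whose upper bound is >= *k.
   Assumes bounds[] is ascending and the last bound is the maximum key. */
size_t find_partition(const key *bounds, size_t num_partitions, const key *k)
{
    assert(num_partitions > 0);

    size_t lo = 0, hi = num_partitions - 1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        /* memcmp compares bytes as unsigned char, which gives big-endian
           numeric order here. */
        if (memcmp(k->byte, bounds[mid].byte, KEYSPACE_BYTE_SIZE) <= 0)
            hi = mid;       /* key is at or below this bound */
        else
            lo = mid + 1;   /* key lies above this bound */
    }
    return lo;
}
```

A binary search is used only because the bounds are already sorted. With at most 255 partitions a linear scan would do just as well.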