How do I partition a keyspace algorithmically?

Posted 2024-09-03 12:41:36


This is related to consistent hashing and while I conceptually understand what I need to do, I'm having a hard time translating this into code.

I'm trying to divide a given keyspace (say, 128 bits) into equal sized partitions. I want the upper bound (highest key) of each partition.

Basically, how would I complete this?

#define KEYSPACE_BYTE_SIZE  16
#define KEYSPACE_BIT_SIZE   (KEYSPACE_BYTE_SIZE * 8)

typedef struct _key
{ 
    char byte[KEYSPACE_BYTE_SIZE];
} key;

key * partition_keyspace( int num_partitions )
{
    key * partitions = malloc( sizeof(key) * num_partitions );

    // ...

}

Edit:

I suppose another way of saying this is (in pseudocode, where ^ means exponentiation rather than C's XOR operator):

for (i = 0; i < num_partitions; i++)
{
    partitions[i] = ((2 ^ KEYSPACE_BIT_SIZE) / num_partitions) * i;
}

Of course, the problem is that 2 ^ 128 is a very large number and can't be held in any single standard C integer variable with which to do the math (hence the char[16] struct).

I really don't want to use a large number library (or any library) for this.

Edit:

Although, in actuality, the numbers I'm looking for are:

for (i = 0; i < num_partitions; i++)
{
    partitions[i] = (((2 ^ KEYSPACE_BIT_SIZE) / num_partitions) * (i + 1)) - 1;
}


好倦 2024-09-10 12:41:36


The highest key in any particular partition obviously has all of its key bits set to 1. If you have the lower n bits for your keys, and the upper m bits for your partition IDs, then all you need to do is run an m-bit counter and concatenate it with n ones.
To illustrate, assume an 8-bit keyspace with the upper 2 bits for the partitions (so num_partitions = 2^2 = 4) and the lower 6 for the keys. The highest key in each partition will be these four:

00 111111
01 111111
10 111111
11 111111

In order to generate them, all you need to do is:

for (int i = 0; i < num_partitions; i++)
    highest_key = (i << 6) | 0x3f; // where 6 is key_bits and 0x3f is six ones.

Of course, this assumes num_partitions is a power of two.

Naturally, for a key-space as large as yours it won't be as simple as the above, since you can't fit everything into a single variable. Still, the principle remains the same. As long as your num_partitions is small enough, you can fit the counter into an ordinary int variable, copy it into the upper bits, and then filling the rest with ones is trivial.
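As a rough sketch of how that could look for a 128-bit key, the C below writes an m-bit counter into the top two bytes and leaves every other bit set. It assumes byte[0] is the most significant byte and that m is at most 16. The names key128 and partition_pow2 are made up for this illustration and do not come from the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define KEYSPACE_BYTE_SIZE 16

/* Big-endian key: byte[0] is the most significant byte (an assumption). */
typedef struct {
    unsigned char byte[KEYSPACE_BYTE_SIZE];
} key128;

/* Hypothetical helper: upper bounds for 2^m equal partitions, 0 <= m <= 16.
 * Returns a malloc'd array of 2^m keys, or NULL if allocation fails. */
key128 *partition_pow2(unsigned m)
{
    assert(m <= 16);
    unsigned long count = 1UL << m;
    key128 *bounds = malloc(count * sizeof *bounds);
    if (bounds == NULL)
        return NULL;

    /* Start with every bit set: the low (128 - m) bits stay all ones. */
    memset(bounds, 0xFF, count * sizeof *bounds);

    for (unsigned long i = 0; i < count; i++) {
        /* Left-align the m-bit counter in a 16-bit field, then fill the
         * unused low bits of that field with ones. */
        unsigned long top = ((i << (16 - m)) | (0xFFFFUL >> m)) & 0xFFFFUL;
        bounds[i].byte[0] = (unsigned char)(top >> 8);
        bounds[i].byte[1] = (unsigned char)(top & 0xFF);
    }
    return bounds;
}
```

With m = 2 the first bytes come out as 0x3F, 0x7F, 0xBF and 0xFF, which matches the 8-bit illustration. The caller is responsible for calling free() on the result.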


以往的大感动 2024-09-10 12:41:36


I am not sure I understand the context of your question - I've not studied consistent hashing.

The question almost amounts to, "how can I sort without sorting".

Another approach might be to do this:

iter = seed() // initialize to the bottom of the hash keys
for(i = 0 to partitionbound)
{
   iter = nextIter(iter);
}

This is in linear time. However, it requires no a priori knowledge of the key space except that there is some order which nextIter obeys.

If you are partitioning [0, 2^128) -> {values}, e.g., you're doing some distributed computing or what have you, you're in much better luck, since integers are well-structured.

I would suggest the slightly silly idea of having 4 32-bit ints in a struct and writing your own bigint routine that solves what you need to solve.
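To make that concrete, here is one possible sketch of such a routine. It assumes the partition count fits in 32 bits and that limb[0] is the most significant limb. It computes floor((i + 1) * 2^128 / n) - 1 with schoolbook long division, so the partitions differ in size by at most one key and the last bound is always all ones. The names u128 and partition_upper_bound are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit value as four 32-bit limbs; limb[0] is the most significant. */
typedef struct {
    uint32_t limb[4];
} u128;

/* Hypothetical helper: highest key of partition i (0-based) out of n,
 * computed as floor((i + 1) * 2^128 / n) - 1. */
u128 partition_upper_bound(uint32_t i, uint32_t n)
{
    assert(n > 0 && i < n);
    u128 q;

    /* The dividend is (i + 1) followed by four zero limbs. Dividing its top
     * limb by n leaves this remainder. The quotient only overflows 128 bits
     * when i + 1 == n, and the final "- 1" wraps that case to all ones. */
    uint64_t rem = ((uint64_t)i + 1) % n;
    for (int k = 0; k < 4; k++) {
        uint64_t cur = rem << 32;          /* bring down a zero limb */
        q.limb[k] = (uint32_t)(cur / n);   /* fits because rem < n */
        rem = cur % n;
    }

    /* Subtract 1 modulo 2^128, borrowing from the least significant limb up. */
    for (int k = 3; k >= 0; k--) {
        if (q.limb[k]-- != 0)
            break;                         /* no further borrow needed */
    }
    return q;
}
```

Calling it for i = 0 .. n - 1 gives every upper bound without any library, and it does not require n to be a power of two.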

If you have the freedom to not use C++, Common Lisp has bigints built in. I've found that handy.

If you have representable keys...

However, when seeking k equally sized partitions in some space a with n elements, I would approach the problem like this:

if (n % k != 0)
{
   return "not equal-sized partition!";
}
size = n / k; // number of elements in each partition
// could be forking/threading, whatever.
for (int i = 0; i < n; i += size)
{
   process(i, i + size - 1);
}


process(bottom, top)
{
   sort(a[bottom], a[top]);
   return a[top]; //you'll have to figure out where to dump the results.
}


凌乱心跳 2024-09-10 12:41:36


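Based on tzaman's answer, here is my solution. It allows up to 255 partitions (although this could be altered). It does NOT require num_partitions to be a power of 2: it rounds up to the next power of 2 to size the partitions and just makes the last partition take up whatever's left, so in that case the last partition is larger than the others.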

Let me know if you see any bugs... :)

#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Uses the key struct from the question; byte[0] is treated as the most significant byte.
key * partition_keyspace( unsigned int num_partitions )
{
    assert( num_partitions > 0 );
    assert( num_partitions <= 0xFF );

    key * partitions = (key *) malloc( sizeof(key) * num_partitions );
    if (partitions == NULL) return NULL;

    // fill every bit
    memset( partitions, 0xFF, sizeof(key) * num_partitions );

    // calculate how many bits of the top byte need to be filled by 1's:
    // 8 minus the number of bits required to count the partitions
    unsigned char fill_bits = 0;
    while (num_partitions > (1u << fill_bits)) fill_bits++;
    fill_bits = 8 - fill_bits;

    // build a mask with the low fill_bits bits of the top byte set
    unsigned char fill_part = 0;
    for (unsigned int i = 0; i < fill_bits; i++) fill_part |= 1u << i;

    // last partition takes up whatever remains, so don't process it (hence the -1)
    for (unsigned int i = 0; i < num_partitions - 1; i++)
    {
        partitions[i].byte[0] = fill_part | (i << fill_bits);
    }

    return partitions;
}
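Once you have an array of upper bounds like the one partition_keyspace returns, a lookup could be sketched as below. This is only an illustration: it assumes the bounds are sorted in ascending order, that byte[0] is the most significant byte, and that the last bound is all 0xFF. The names part_key and find_partition are hypothetical, and part_key uses unsigned char so the snippet stands on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEYSPACE_BYTE_SIZE 16

/* Same layout as the question's key struct, renamed so this sketch is
 * self-contained. */
typedef struct {
    unsigned char byte[KEYSPACE_BYTE_SIZE];
} part_key;

/* Hypothetical helper: index of the first partition whose upper bound is
 * greater than or equal to k (binary search over the sorted bounds). */
size_t find_partition(const part_key *bounds, size_t count, const part_key *k)
{
    assert(bounds != NULL && k != NULL && count > 0);
    size_t lo = 0, hi = count - 1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        /* memcmp compares bytes as unsigned char, so with byte[0] most
         * significant it orders the keys numerically. */
        if (memcmp(k->byte, bounds[mid].byte, KEYSPACE_BYTE_SIZE) <= 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}
```

Because the comparison goes through memcmp, it does not matter whether plain char is signed on your platform.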

