带着面具搜索

发布于 2024-11-18 22:12:21 字数 449 浏览 8 评论 0原文

有大量具有以下类型的条目:

typedef struct {
    int value;
    int mask;
    int otherData;
} Entry;

我想根据提供的 int key; 尽可能快地在该数组中找到一个条目。该条目需要确保(key & mask) == value

这种数组组织的最佳方式是什么?相应的算法是什么?

编辑:数组组织没有限制;它是静态的,可以在查找之前准备好。 valuemask 可以具有任意值。

Edit2:valuemask可以具有任意值,但数组中的条目数约为10000。因此,可以提前计算某些“模式”。

查找次数很大。

There is big array of entries having the following type:

typedef struct {
    int value;
    int mask;
    int otherData;
} Entry;

I'd like to find an entry in this array according to provided int key; as fast as posible. The entry is required to ensure that (key & mask) == value.

What will be the best way for such array organization and what is the corresponding algorithm processing it?

Edit: There is no restrictions on the array organization; it is static and can be prepared before lookup. The value and mask may have arbitrary values.

Edit2: value and mask may have arbitrary values, but number of entries in the array is about 10000. So, certain "paterns" can be calculated in advance.

The number of lookups is big.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

分开我的手 2024-11-25 22:12:21

每个位都是独立的,因此在预处理阶段[*],您可以对每个条目分类 32 次(或者无论您的 int 有多大)次。每个分类存储 2 个集合:当 key 为 0 时在该位匹配的集合和当 key 为 1 时匹配的集合。

也就是说,如果 value == 1 且 mask = = 0 在该位,那么该分类根本不存储该条目,因为它与 key 的任何值都不匹配(事实上,无论您使用什么方案,此类条目都应该是在任何预处理阶段都被删除,所以分类应该存储一个条目,即使只有一位是这样的)。如果均为 0,则存储到两个集合中。否则存储到两个集合之一中。

然后,根据您的密钥,您希望找到 32 个集合的快速交集。

根据原始数组的大小,存储每个集合的最佳方式可能是一个巨大的位数组,指示数组中的每个条目是否在该集合中。然后查找交集可以一次一个字完成 - & 总共 32 个字,每个位数组中一个字。如果结果为0,则继续。如果结果非 0,则表示有匹配项,并且结果中设置的位会告诉您哪个条目是匹配项。当然,这在数组的大小上仍然是线性的,事实上,您正在执行 31 个 & 操作来检查 32 个条目是否匹配,这与简单的线性搜索大致相同通过原始数组。但比较和分支较少,并且您正在查看的数据更加压缩,因此您可能会获得更好的性能。

或者可能有更好的方法来进行交叉。

如果键倾向于重复使用,那么您应该将查找结果缓存在从键到条目的映射中。如果可能的键的数量相当小(也就是说,如果可能的输入明显少于 2^32 个键,和/或您有大量可用内存),那么您的预处理阶段可能就是:

  1. 依次获取每个条目
  2. 找出它匹配的可能的键
  3. 将其添加到这些键的映射中

[*] 在没有任何预处理的情况下,显然您所能做的就是检查每个数组成员,直到找到匹配项或者检查所有内容。

Each bit is independent, so in a preprocessing phase[*] you could classify each entry 32 (or however big your int is) times. Each classification stores 2 sets: those which match at that bit when key is 0 and those which match when key is 1.

That is, if value == 1 and mask == 0 at that bit, then that classification doesn't store that entry at all, since it doesn't match any value of key (in fact, no matter what scheme you use, such entries should be removed during any preprocessing stage, so no classification should store an entry if even one bit is like this). If both 0, store into both sets. Otherwise store into one of the two sets.

Then, given your key, you want to find a fast intersection of 32 sets.

Depending on the size of the original array, it may be that the best way to store each set is a giant bit array indicating whether each entry in the array is in the set or not. Then finding the intersection can be done a word at a time - & together 32 words, one from each bit array. If the result is 0, keep going. If the result is non-0, you have a match, and the bit that's set in the result tells you which entry is the match. This is still linear in the size of the array, of course, and in fact you're doing 31 & operations to check 32 entries for a match, which is about the same as the simple linear search through the original array. But there's less comparison and branching, and the data you're looking at is more compressed, so you might get better performance.

Or there might be a better way to do the intersection.

If keys tend to be re-used then you should cache the results of the lookup in a map from keys to entries. If the number of possible keys is reasonably small (that is, if significantly less than 2^32 keys are possible inputs, and/or you have a lot of memory available), then your preprocessing phase could just be:

  1. take each entry in turn
  2. work out which possible keys it matches
  3. add it to the map for those keys

[*] Without any preprocessing, obviously all you can do is check every array member until either you find a match or else you've checked everything.

郁金香雨 2024-11-25 22:12:21

由于您没有额外的信息(例如,数组已排序),因此您需要线性搜索 - 遍历数组并检查条件 - 伪代码:

for( size_t index = 0; index < arraySize; index++ ) {
   if( ( array[index].mask & key ) == array[index].value ) ) {
      return index;
   }
}
return -1;

Since you don't have extra information (for example, that the array is sorted) you need a linear search - traverse the array and check the condition - pseudocode:

for( size_t index = 0; index < arraySize; index++ ) {
   if( ( array[index].mask & key ) == array[index].value ) ) {
      return index;
   }
}
return -1;
陪你到最终 2024-11-25 22:12:21
  • 如果您有一个到条目的键映射,那么这将非常容易。

  • 如果你的数组是按键排序的,那么你可以通过一些小的努力来进行字典二分搜索。 [实际上,也许不是!]

  • 事实上,你'我们只需要遍历数组,直到找到您要查找的内容。也就是说,从头到尾迭代并在找到它时停止。

顺便说一句,这是一个很好的例子,说明了数据结构的选择如何影响算法的可用性。如果你一开始就选择了错误的数据结构,你就不能只是用算法来解决问题!

  • If you instead had a map of keys to Entries, then this would be really easy.

  • If your array were sorted by key, then you could do a lexicographic binary search with some small effort. [actually, maybe not!]

  • As it is, you're just going to have to traverse the array until you find what you're looking for. That is, iterate from start to end and stop when you find it.

As an aside, this is a great example of how a choice of data structure affects the availability of algorithms down the line. You can't just throw algorithms at a problem if you picked the wrong data structures in the first place!

却一份温柔 2024-11-25 22:12:21

线性搜索当然可以,但如果您需要使用相同键进行多次查找,您可以尝试首先根据 (key & mask) 对范围进行排序。如果您只有几个固定键,您可以尝试使用 boost.multi_index,每个键值有一个索引。

A linear search would of course work, but if you need many lookups with the same key, you could try sorting the range first according to (key & mask). If you only have a few, fixed keys, you could try using a boost.multi_index, with one index for each key value.

牵你手 2024-11-25 22:12:21

如果每个条目的 mask 任意变化,我看不到太多
线性搜索的替代方案。如果有重大限制
mask,这样只有少数几个值是可能的,最好
mask的每个值使用某种map,进行线性搜索
找到第一个包含您要查找的值的map
或者,如果掩码仅涉及几个位,则可能值得
使用 multimap,按 value 排序,并用所有的 and 进行掩码
mask ,并用 key 进行索引处理相同,然后是一个线性
使用完整的key 进行搜索以查找完全匹配的内容。

If the mask varies arbitrarily for each entry, I don't see much
alternative to a linear search. If there are significant constraints on
mask, such that only a few values are possible, it might be better to
use some sort of map for each value of mask, doing a linear search
to find the first map which contained the value you are looking for.
Alternatively, if the masks only concern a few bits, it may be worth
using a multimap, ordered by value masked with an and of all of
the masks, and indexed with key handled the same, then a linear
search using the full key to find the exact match.

猥︴琐丶欲为 2024-11-25 22:12:21

如果掩码中的零位数量很少,您可以为掩码中的每个“无关”位复制该条目。例如,如果 value=0mask=0xfffe,那么您需要在表中为 key=0key 添加一个条目=1。对于 value=0mask=0xfeef,在表中放入 4 个条目:key=0x0000key=0x0010 >、key=0x0100key=0x0110。现在,您可以对条目进行排序并使用二分搜索,或使用二分搜索结构,例如 std::map

If the number of zero bits in your mask is small, you could duplicate the entry for each "don't-care" bit in the mask. For example if value=0 and mask=0xfffe then you'd put an entry in the table for key=0 and key=1. For value=0 and mask=0xfeef, put 4 entries in the table: key=0x0000, key=0x0010, key=0x0100, and key=0x0110. Now you can sort the entries and use a binary search, or use a binary search structure such as std::map.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文