查找与布尔查询匹配的大型 int 数组子集的算法
假设我有一个 M 32 位整数的大数组,其中每个值的设置不超过 N 位。现在我想返回与查询 Target AND Value == Target 匹配的子集,即目标位出现在数组值中设置的值。
暴力破解很简单,只需迭代数组并提取其中 target&value == target 即可。如果 M 变得很大,这会变得太慢。有人知道如何将数组转换为更适合搜索的数据结构吗?
一种方法是存储每个位的数组或值(因此对于 32 位数组,您需要 32 个),然后仅搜索与目标值中的每个位匹配的值。除非 N 接近 32 或者目标设置接近 N 位,否则这会有一点帮助。由于我正在寻找的本质上是部分匹配,因此散列或排序似乎没有帮助。
精确正确的结果是一个要求。这必须在不访问并行硬件(例如 GPU 或使用 SIMD)的情况下工作。
我将使用 C++,但只要一些算法或想法的指针就可以了。最可能的情况是 M=100000 且 N=8 并且被频繁调用。
只是重申一下:我需要部分匹配(例如 item = 011000 匹配 target = 001000)而不是完全匹配。尽管M项是提前已知的,但目标的可能值可以是任何值。
我最终决定坚持使用暴力。对于 80,000 件物品,不值得做任何其他事情。我想如果数据集的大小更像 800,000,000 可能是值得的。
Say I have a large array of M 32 bit ints in which each value has no more than N bits set. Now I want to return the subset which matches the query Target AND Value == Target, i.e. values in which the targets bits appear set in the array's values.
Brute force is easy, simply iterator the array and extract where target&value == target. This becomes way too slow if M gets very large. Anyone have an idea of how to convert the array into a data structure that is more optimal to search?
One way is to store arrays or value for each bit (thus for 32 bit array you need 32 of these) and then only search the values that match each bit in the target value. This helps a little unless N gets close to 32 or the target has close to N bits set. Since what I am looking for is essentially a partial match, hashing or sorting doesn't appear to help.
Exact correct results are a requirement. This will have to work without access to parallel hardware (like a GPU or using SIMD).
I will be using C++ but just some pointers to algorithms or ideas is fine. The most likely case would be M=100000 and N=8 and be called frequently.
Just to reiterate: I need a partial match (like item = 011000 matching target = 001000) not an exact match. Although the M items is known ahead of time, the possible values of targets can be anything.
I finally decided to stick with brute force. For 80,000 items it's not worth doing anything else. I imagine if the size of the dataset were more like 800,000,000 it might be worth it.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
您可以构建一个按位特里树。
遍历 trie 时,对于目标中的每个 0,您需要遍历两个分支。
编辑 在测试了快速实现之后,我不会推荐这种方法。蛮力方法比这个方法快约 100 倍。
You could build a bitwise trie.
When traversing the trie, for each 0 in the target, you would need to traverse both branches.
Edit After testing a quick implementation I would NOT recommend this approach. The brute force approach was ~100 faster than this one.
从另一个角度看这个问题怎么样?...将整数集视为一维图片的集合。组织它们的方法之一是将每张图片分成两部分
A
和B
并按类别对所有图片进行排序:A
仅包含零并且B
包含一些已设置的位(至少一个)A
包含一些已设置的位,并且B
仅包含零A
> 和B
包含一些位集(1 和2)A
和B
仅包含零现在,您可以将目标/掩码以相同的方式拆分为相同的部分,并以相同的方式进行分类。之后,您可以推断出下一步(按目标/蒙版类别):
在下一个级别部分
A
和B
再次拆分(因此您有 4 个部分),依此类推。当然,我并不指望它会带来一些加速。但对于某些没有设置太多位的数据集(与基于位的树的变体相反),它可能会工作得更好。
更新:我在 Haskell 变体中加速了 34%:
源代码:
更新:C++11 代码类型。暴力破解需要 10.9444 毫秒,该算法需要 8.69286 毫秒。我通过使打开位的分布输出更加稀疏来作弊。
How about looking at this problem from another view point?.. Consider your set of integers as a collection of one-dimension pictures. One of the way to organize them is to split each picture into two parts
A
andB
and sort all pictures by categories:A
contains only zeroes andB
contains some bits is set (at least one)A
contains some bits set andB
contains only zeroesA
andB
contains some bits set (superset of 1 and 2)A
andB
contains only zeroesNow you do the same split of your target/mask into the same parts and categorize in the same way. After that you can deduce next (by target/mask category):
On the next level parts
A
andB
is splitted again (so you have 4 parts) and so on.Of course I don't expect it to give some speed-up. But for some sets of data where there is not so much bits is set (as opposite to variants with bit-based-tree) it might work better.
Update: I've got speedup for 34% in Haskell variant:
Source code:
Update: Kind of C++11 code. It gives 10.9444 ms for brute-force and 8.69286 ms for this algorithm. I've cheated by making output of distribution of turned on bits more sparse.
该解决方案将占用与 M 中“1”位的数量成比例的内存,
但应该运行得相当快。我假设
集合 M 是静态的,有许多 Target 匹配请求。
预处理:
给定集合M,将其按升序排序。接下来构建一个包含一个的数组
每比特插槽。您使用的是 32 位数字,因此需要一个 32 个槽的数组。调用此数组:MBit[0..31]。
每个插槽包含
指向链表的指针(称之为:MPtr)。链表包含来自 M 的数字,其中
相应的位被设置。为了
例如,M 中设置了位 3 的所有数字都可以在链接列表中找到:MBit[3].MPtr。
基本算法是处理每个 MBit 列表
其中相应的目标编号设置了“1”位。仅所有已处理列表共有的数字
被选中。由于每个 MPtr 列表都包含排序后的数字,我们可以向前扫描,直到找到我们要查找的数字
找到(匹配),找到更大的数字(没有匹配)或列表已用完(不可能再有匹配)。
这种方法的主要缺点是 M 中的相同数字将出现在尽可能多的
链表,因为它有“1”位。
这有点
记忆受到打击,但你必须在某个地方给予一些东西!
概要:
如上所述构建 MBit 数组。
为目标数字构建另一个数组数据结构。该数组包含 1
目标中的每一位槽(称为:TBit[0..31])。每个插槽
包含一个链表指针(称之为:MPtr)和一个布尔值(称之为:BitSet)。 BitSet表示是否对应
目标位已设置。
给定一个新目标:
我可以看到对上述大纲的许多优化,但认为这将是一个好的开始。
This solution will take memory proportional to the number of '1' bits in M,
but should run reasonably quickly. I am assuming
that the set M is static with many Target matching requests.
Preprocessing:
Given the set M, sort it into ascending order. Next construct an array containing one
slot per bit. You are using 32 bit numbers so you need an array of 32 slots. Call this array: MBit[0..31].
Each slot contains
a pointer to a linked list (call it: MPtr). The linked list contains numbers from M where
the corresponding bit is set. For
example all numbers from M having bit 3 set would be found in the linked list: MBit[3].MPtr.
The basic algorithm is to process each of the MBit lists
where the corresponding Target number has a '1' bit set. Only numbers common to all of the processed lists
are selected. Since each MPtr list contains sorted numbers we can scan forward until the number we are looking for
is found (a match), a larger number is found (no match) or the list is exhausted (no more matches possible).
The major drawback to this approach is that the same number from M will appear in as many
linked lists as it has '1' bits.
This is a bit
of a memory hit but you have to give something somewhere!
Outline:
Build the MBit array as outlined above.
Build another array data structure for the Target number. The array contains 1
slot per bit in the Target (call this: TBit[0..31]). Each slot
contains a linked list pointer (call it: MPtr) and a boolean (call it: BitSet). BitSet indicates whether the corresponding
bit of Target is set.
Given a new Target:
I can see a number of optimizations to the above outline but figured this would be a good start.
这似乎是 SQL 数据库擅长的事情。如果您在(MSB、BitsSet、Value)上放置复合索引,您的结果应该非常快。
如果您必须直接使用 C++ 进行操作,我建议尝试模拟 SQL 方法。
This seems like something a SQL database would be good at. If you put a composite index on (MSB, BitsSet, Value) your results should be very fast.
If you must do it in straight C++ I suggest trying to emulate the SQL approach.
一般方法。
按位构建树。一级节点是第一个位,二级节点是第二位,...
当你得到掩码时,你只需否定它,你就知道必须排除树的哪些部分。
可以仅快速遍历相关的槽节点。
N_bits空间解*
只需对这些整数进行适当的排序并使用二分搜索来遍历这棵树即可。
复杂度 O(N_results*N_bits))
看起来它的运行速度比暴力破解 O(N) 快了 3 倍。但这是我用 C++ 编写的第一个代码,所以我可能会错过一些东西。任何关于代码的评论也很酷。
代码如何工作?
它使用的唯一数据结构是输入的排序数组。
在每一步中,它都会使用二分搜索将数组基于绑定分成两部分
std::lower_bound();
如果 mask[深度] 为 1,则不需要继续该树的左侧部分
无论如何,它都必须走对路。
如果你设置像 0xFFFFFFFF 这样的掩码,它总是会正确运行并且会在 log(n) 时间内执行
如果你输入掩码 0x00000000 ,它将返回所有解决方案,因此它将在左右每一步执行,并且执行效果比朴素循环更差。一旦数组的大小小于 10(可以更改),它就会使用简单的方法返回输出向量中的所有解决方案。
在长度为 100k 且掩码 0x11111111(8 位)的随机输入向量上,它的运行速度比朴素循环快两倍。
General approach.
Build tree by bits. Level one node is fisrt bit, than level 2 node is 2nd bit, ...
When you get mask you just negate it and you know which parts of tree you have to exclude.
Than can quickly traverse only trough nodes that are relevant.
N_bits space solution*
Just sort this integers in place and use binary search to traverse this tree.
Complexity O(N_results*N_bits))
it looks like that it run faster by factor 3 compared to bruteforce O(N). But this is my first code in c++, so I might missed something. Any comment about code would be also cool.
How code works?
Only data structure it uses is sorted array of inputs.
At each step it splits array on two parts based on bound using binary search whit
std::lower_bound();
in case that mask[depth] is 1 it does not need to go on left part of this tree
It has to go right in any case.
If you put mask like 0xFFFFFFFF it will always go only right and will perform in log(n) time
if you put mask 0x00000000 it will return all solutions, so it will go at each step both left and right and will perform worse than naive loop. Once size of array is less than 10 (can be changed) it uses naive approach to return all solutions in output vector.
On random input vector of lenght 100k and mask 0x11111111 (8 bits) it runs twice faster than naive loop.