tbb:并行查找第一个元素
我遇到了这个问题:
- 查找列表中满足给定条件的第一个元素。
不幸的是,该列表相当长(100.000 个元素),并且使用单个线程评估每个元素的条件总共需要大约 30 秒。
有没有办法干净地并行化这个问题?我浏览了所有tbb模式,但找不到任何合适的。
更新:出于性能原因,我想在找到项目时尽早停止并停止处理列表的其余部分。这就是为什么我相信我不能使用 parallel_while
或 parallel_do
。
I have got this problem:
- Find the first element in a list, for which a given condition holds.
Unfortunately, the list is quite long (100.000 elements), and evaluation the condition for each element takes in total about 30 seconds using one single Thread.
Is there a way to cleanly parallelize this problem? I have looked through all the tbb patterns, but could not find any fitting.
UPDATE: for performance reason, I want to stop as early as possible when an item is found and stop processing the rest of the list. That's why I believe I cannot use parallel_while
or parallel_do
.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(9)
我对此不太熟悉,但只是想一想,你不能让一组线程从不同的起始点以相同的步幅进行不同的迭代吗?
假设您决定拥有
n
个线程(= 核心数或其他),每个线程都应指定一个最多n
的特定起点,因此第一个线程从 < code>begin(),它比较的下一个项目是begin() + n
,等等。第二个线程从begin()+1
开始,并且那么下一个比较也在n
中这样你就可以让一组线程并行地遍历列表,迭代本身可能并不昂贵 - 只是比较。没有节点会被比较多次,并且您可以在任何线程进行匹配时设置一些条件,并且所有线程都应该在迭代/比较之前检查此条件。
我认为实现起来非常简单(?)
I'm not too familiar with libraries for this, but just thinking aloud, could you not have a group of threads iterating at different at the same stride from different staring points?
Say you decide to have
n
threads (= number of cores or whatever), each thread should be given a specific starting point up ton
, so the first thread starts onbegin()
, the next item it compares isbegin() + n
, etc. etc. second thread starts onbegin()+1
and then it's next comparison is inn
too etc.This way you can have a group of threads iterating in parallel through the list, the iteration itself is presumably not expensive - just the comparison. No node will be compared more than once and you can have some condition which is set when a match is made by any of the threads and all should check this condition before iterating/comparing..
I think it's pretty straightforward to implement(?)
我认为用TBB解决这个问题的最好方法是parallel_pipeline。
管道中应该(至少)有两个阶段。第一阶段是串行的;它只是从列表中读取下一个元素并将其传递到第二阶段。该第二阶段是并行的;它评估给定元素的感兴趣条件。一旦满足条件,第二阶段就会设置一个标志(该标志应该是原子的或用锁保护)来指示找到解决方案。第一阶段必须检查此标志,并在找到解决方案后停止读取列表。
由于条件评估是对几个元素并行执行的,因此可能会出现找到的元素不是列表中第一个合适的元素。如果这很重要,您还需要保留元素的索引,当找到合适的解决方案时,您可以检测其索引是否小于先前已知的解决方案(如果有)的索引。
HTH。
I think the best way to solve this problem with TBB is
parallel_pipeline
.There should be (at least) two stages in the pipeline. The 1st stage is serial; it just reads the next element from the list and passes it to the 2nd stage. This 2nd stage is parallel; it evaluates the condition of interest for a given element. As soon as the condition is met, the second stage sets a flag (which should be either atomic or protected with a lock) to indicate that a solution is found. The first stage must check this flag and stop reading the list once the solution is found.
Since condition evaluation is performed in parallel for a few elements, it can happen that a found element is not the first suitable one in the list. If this is important, you also need to keep an index of the element, and when a suitable solution is found you detect whether its index is less than that of a previously known solution (if any).
HTH.
好的,我是这样做的:
tbb::concurrent_bounded_queue;元素。
tbb::concurrent_vector;结果
。并行运行的逻辑:
因此,当找到第一个元素时,所有其他线程在下次检查时将停止处理
>results.empty()
。有可能两个或多个线程正在处理
slow_and_painfull_check
返回 true 的元素,因此我只是将结果放入向量中并在并行循环之外处理它。线程组中的所有线程完成后,我检查结果中的所有元素并使用第一个元素。
ok, I have done it this way:
tbb::concurrent_bounded_queue<Element> elements
.tbb::concurrent_vector<Element> results
.boost::thread_group
, and create several threads that run this logic:logic to run in parallel:
So when the first element is found, all other threads will stop processing the next time they check
results.empty()
.It is possible that two or more threads are working on an element for which
slow_and_painfull_check
returns true, so I just put the result into a vector and deal with this outside of the parallel loop.After all threads in the thread group have finished, I check all elements in the
results
and use the one that comes first.你可以看看 http://gcc.gnu.org/onlinedocs/libstdc++ /manual/parallel_mode.html 用于并行算法实现。
特别是你需要 find_if 算法 http://www.cplusplus.com/reference/algorithm/find_if /
you can take a look at http://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html for parallel algorithms implementations.
And in particular you need find_if algorithm http://www.cplusplus.com/reference/algorithm/find_if/
我在这里看到了两种并行机会:在多个线程上评估一个元素,或者在不同线程上一次评估多个元素。
没有足够的信息来确定在多个线程上评估一个元素的难度和有效性。如果这很容易,每个元素的 30 秒时间就可以减少。
我认为 TBB 没有完全适合这个问题。列表没有随机访问迭代器、确定何时停止并保证找到第一个元素存在问题。不过,您可能可以玩一些游戏来使其发挥作用。
您也可以使用一些较低级别的线程构造来自己实现这一点,但有很多地方会返回错误的结果。为了防止此类错误,我建议使用现有算法。您可以将列表转换为数组(或具有随机访问迭代器的其他结构),并使用 user383522 引用的实验性 libstdc++ 并行模式 find_if 算法。
I see two opportunities for parallelism here: evaluating one element on multiple threads, or evaluating multiple elements at once on different threads.
There isn't enough information to determine the difficulty nor the effectiveness of evaluating one element on multiple threads. If this is easy, the 30 second per element time could be reduced.
I do not see a clean fit into TBB for this problem. There are issues with lists not having random access iterators, determining when to stop, and guaranteeing the first element is found. There may be some games you can play with the ranges to get it to work though.
You could use some lower level thread constructs to implement this yourself as well, but there are a number of places for incorrect results to be returned. To prevent such errors, I would recommend using an existing algorithm. You could convert the list to an array (or some other structure with random access iterators) and use the experimental libstdc++ Parellel Mode find_if algorithm user383522 referenced.
如果它是一个链接列表,并行搜索不会增加太多速度。然而,链表在缓存中往往表现不佳。如果您有两个线程,您可能会获得微小的性能提升:一个线程执行 find_first_element,另一个线程只是迭代列表,确保在第一个线程之前获得的数据不会超过 X(100?)。第二个线程不进行任何比较,但将确保第一个线程尽可能好地缓存项目。这可能会帮助你节省时间,或者可能没什么作用,或者可能会造成阻碍。测试一切。
If it's a linked list, A parallel search isn't going to add much speed. However, linked lists tend to perform poorly with caches. You may get a tiny performance increase if you have two threads: one does the find_first_element, and one simply iterates through the list, making sure not to get more than X (100?) ahead of the first thread. The second thread doesn't do any comparisons, but will assure that the items are cached as well as possible for the first thread. This may help your time, or it might make little difference, or it might hinder. Test everything.
你不能将列表转换为平衡树或类似的树吗?这样的数据结构更容易并行处理 - 通常你会收回你在第一次平衡时可能付出的开销......例如,如果你编写函数式代码,请检查这篇论文:函数式并行编程中的平衡树
Can't you transform the list to a balanced tree or similar? Such data structures are easier to process in parallel - usually you get back the overhead you may have paid in making it balanced in the first time... For example, if you write functional-style code, check this paper: Balanced trees inhabiting functional parallel programming
如果您使用 GCC,GNU OpenMP 提供并行 std 函数
链接
If you are using GCC, GNU OpenMP provides parallel std functions
link
我从未听说过英特尔 tbb 库,但快速打开并浏览教程后,我找到了parallel_for,这似乎可以解决问题。
I've never heard of the Intel tbb library but a quick open and scan of the Tutorial led me to
parallel_for
which seems like it will do the trick.