TBB does not affect performance, no matter the configuration
I've noticed that no matter what configuration I give to a parallel program I've written, TBB does not affect performance, either positively or negatively.
By configuration, I mean:
- Number of threads
- Input size (of an array)
- The grainsize argument of tbb::parallel_for
- The number of times the same process is repeated.
The performance looks exactly the same as that of a sequential program.
I have:
- Linked with g++, using -ltbb
- Used tbb::task_scheduler_init, and it doesn't go out of scope, since the initialization takes place in the main function
- Used tbb::parallel_for with various grainsizes
I am not using locks; the only components used from TBB are tbb::task_scheduler_init and tbb::parallel_for.
The code where tbb::parallel_for is used looks something like this:
for (size_t i = 0; i < 10; ++i) {
    // parallel_for invocation.
    // The lambda passed to parallel_for
    // does some mathematical computations
    // (potentially costly).
    if (i != 9) {  // not the last iteration of the loop
        // Calls costly function.
    }
}
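One way to check where the 10 seconds actually go is to time the two phases of this loop separately. The following is only a minimal sketch, assuming the parallel_for call and the costly function can each be wrapped in a callable; PhaseTimes, time_seconds and profile_loop are hypothetical helper names introduced for illustration and are not part of TBB.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Accumulated wall-clock time for the two phases of the loop.
// (Hypothetical helper type, not part of TBB.)
struct PhaseTimes {
    double parallel_seconds = 0.0;  // time spent in the parallel_for phase
    double serial_seconds = 0.0;    // time spent in the costly function
};

// Runs the given callable once and returns the elapsed wall-clock seconds.
inline double time_seconds(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Mirrors the loop structure from the question: every iteration runs the
// parallel phase, and every iteration except the last one also runs the
// costly function.
inline PhaseTimes profile_loop(std::size_t iterations,
                               const std::function<void()>& parallel_phase,
                               const std::function<void()>& costly_function) {
    PhaseTimes times;
    for (std::size_t i = 0; i < iterations; ++i) {
        times.parallel_seconds += time_seconds(parallel_phase);
        if (i + 1 != iterations) {
            times.serial_seconds += time_seconds(costly_function);
        }
    }
    return times;
}
```

Calling profile_loop(10, ...) with one lambda that wraps the parallel_for invocation and another that wraps the costly function gives the two totals. If serial_seconds accounts for nearly all of the running time, then changing the number of threads or the grainsize would not be expected to change the total noticeably.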
Could the problem be that tbb::task_scheduler_init is deprecated? I do get a warning message about it, but I've also tried tbb::task_arena and nothing changes.
I've tried experimenting with almost everything that can be configured, just to see some effect on performance. How is it possible that, for a given input, the running time stays fixed at approximately 10 seconds, no matter what the configuration is and no matter how many times I run the program? It feels as if TBB does no parallelization at all.