Intel Parallel Studio timing inconsistencies

Posted on 2024-10-31 13:36:48


I have some code that uses Intel TBB and I'm running on a 32 core machine. In the code, I use

parallel_for(blocked_range (2,left_image_width-2, left_image_width /32) ...

to spawn 32 threads that do concurrent work. There are no race conditions, and each thread is hopefully given the same amount of work. I'm using clock_t to measure how long my program takes. For a certain image, it takes roughly 19 seconds to complete.

Then I ran my code through Intel Parallel Studio and it ran the code in 2 seconds. This is the result I was expecting, but I can't figure out why there's such a large difference between the two. Does clock() sum the clock cycles on all the cores? Even then it doesn't make sense. Below is the snippet in question.

clock_t begin=clock();

create_threads_and_do_work();

clock_t end=clock();
double diffticks=end-begin;
double diffms=(diffticks*1000)/CLOCKS_PER_SEC;
cout<<"And the time is "<<diffms<<" ms"<<endl;

Any advice would be appreciated.


迷乱花海 2024-11-07 13:36:48

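It isn't clear whether the difference in run time comes from two different inputs (images) or simply from two different ways of measuring run time (the clock_t difference versus Intel's tooling). You also haven't shown us what happens inside create_threads_and_do_work(), and you haven't said which tool in Intel Parallel Studio you're using. Is it VTune?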

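Your clock_t difference measures processor time, not elapsed time. The C standard defines clock() as the processor time used by the program, and on platforms such as Linux that time is accumulated across all threads of the process, so a run that keeps many cores busy can report a figure several times larger than the wall-clock time. What the interval covers also depends on create_threads_and_do_work(): does it wait for all threads to finish before returning, or does it just spawn them and return immediately (before they finish processing)? parallel_for() blocks until the whole range has been processed, so if that call is all the function does, the interval does cover the work, and the remaining question is only which kind of time you are reading.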
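To see how the two kinds of measurement can diverge, here is a minimal sketch that times the same multithreaded work with both clock() and std::chrono::steady_clock. It uses plain std::thread instead of TBB so that it builds without extra libraries, and the names Timing, spin_on_threads and time_both_ways are made up for this illustration; they are not from the question's code.

```cpp
#include <chrono>
#include <ctime>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch: both readings in milliseconds.
struct Timing {
    double cpu_ms;   // from clock(): processor time charged to the program
    double wall_ms;  // from steady_clock: elapsed real time
};

// Hypothetical stand-in for create_threads_and_do_work(): starts n_threads
// workers that each busy-spin for about busy_ms, then joins all of them.
void spin_on_threads(unsigned n_threads, int busy_ms) {
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n_threads; ++i) {
        workers.emplace_back([busy_ms] {
            auto deadline = std::chrono::steady_clock::now() +
                            std::chrono::milliseconds(busy_ms);
            volatile unsigned long sink = 0;  // keeps the loop from being optimized away
            while (std::chrono::steady_clock::now() < deadline) {
                sink = sink + 1;
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // return only after every worker has finished
    }
}

// Measures the same region with both clocks.
Timing time_both_ways(unsigned n_threads, int busy_ms) {
    std::clock_t c0 = std::clock();
    auto w0 = std::chrono::steady_clock::now();

    spin_on_threads(n_threads, busy_ms);

    auto w1 = std::chrono::steady_clock::now();
    std::clock_t c1 = std::clock();

    Timing t;
    t.cpu_ms = 1000.0 * static_cast<double>(c1 - c0) / CLOCKS_PER_SEC;
    t.wall_ms = std::chrono::duration<double, std::milli>(w1 - w0).count();
    return t;
}
```

Calling time_both_ways(8, 200) and printing both fields should show wall_ms near 200. On Linux with glibc, cpu_ms tends toward 8 × 200 ms when at least 8 cores are free, because the CPU time of all threads is added up. With the Microsoft C runtime, clock() reports elapsed wall-clock time instead, so the two numbers come out about the same. For timing how long a parallel region takes, steady_clock is usually the reading you want.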

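Intel Parallel Studio includes a profiling tool called VTune. It's a powerful tool: when you run your program through it, you can view (graphically) the processing time, and the number of calls, for each function in your code. I'm fairly sure that once you do this, things will become clear.

One last thought: did the program actually run to completion under the Intel software? I ask because VTune sometimes collects data for a while and then stops without letting the program finish.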
