Any experience using Intel Threading Building Blocks?

Posted on 2024-07-05 07:12:34


Comments (10)

晌融 2024-07-12 07:12:35

I have used TBB briefly, and will probably use it more in the future. I liked using it, most importantly because you don't have to deal with macros or extensions of C++, but remain within the language. It's also pretty portable: I have used it on both Windows and Linux. One thing though: it is difficult to work with threads directly in TBB; you have to think in terms of tasks (which is actually a good thing). TBB does not encourage the use of bare locks (it makes doing so tedious). Overall, though, that is my preliminary experience.

I'd also recommend having a look at OpenMP 3.

往日 2024-07-12 07:12:35

ZThread is LGPL, so unless you are working on an open source project you are limited to using the library through dynamic linking.

The open source version of Threading Building Blocks (TBB) (there is also a new commercial version for $299; I don't know the differences yet) is licensed under the GNU General Public License version 2 with a so-called “Runtime Exception” (that is specific to use in creating free software). I've seen other Runtime Exceptions that attempt to approach the LGPL while enabling commercial use and static linking; that is now the case here.

I'm only writing this because I took the chance to examine the libraries' licenses, and licensing should also be a consideration when selecting a library, based on the use you intend for it.


Thanks, Jihn, for pointing out this update...

旧人九事 2024-07-12 07:12:35

I use TBB in one project. It seemed easier to use than raw threads.
There are tasks which can be run in parallel. A task is just a call to your parallelized subroutine. Load balancing is done automatically. That is why I regard it as a higher-level parallelization library. I achieved a 2.5x speedup without much work on a 4-core Intel processor.
There are examples, the developers answer questions on the forums, and it is both maintained and free.
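
As a rough sanity check on a figure like 2.5x on 4 cores, here is a back-of-the-envelope estimate. It assumes Amdahl's law applies and ignores scheduling overhead; the symbols S (speedup), n (cores) and p (parallelizable fraction) are introduced here for illustration and are not part of the measurement.

$$
S = \frac{1}{(1-p) + p/n}
\quad\Rightarrow\quad
p = \frac{1 - 1/S}{1 - 1/n} = \frac{1 - 0.4}{1 - 0.25} = 0.8
$$

Under those assumptions roughly 80% of the runtime was running in parallel, and the parallel efficiency is S/n = 2.5/4 = 62.5%.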

分开我的手 2024-07-12 07:12:35

It's worth being clear about what TBB (Threading Building Blocks) is for, to contrast it with other alternatives (e.g. the C++11 concurrency features). TBB is a portable and scalable library (not a compiler extension) that lets you write your code in the form of lightweight tasks, which TBB schedules to run as fast as possible on the available CPU resources. It's not designed to support threading for other purposes (e.g. pre-emption).

I've used TBB to speed up existing image processing code by converting for loops over image scan lines into parallel_for loops (with a minimum of 2-4 scan lines as the 'grain' size). This has been very successful. It does require that your loop body be (re)written to process an arbitrary index rather than assuming the iterations run sequentially (e.g. relying on pointers that are incremented between loop iterations).

This was a fairly trivial case, as there wasn't any shared storage to update. Using the more powerful features (e.g. pipeline) requires significant reimagining and/or rewriting of existing code, so they are perhaps better suited to new code.

It's a powerful advantage that this TBB-based code remains portable, doesn't seem to interfere with other code elsewhere in the same process that concurrently uses other threading strategies, and can later be combined with multiprocessing strategies at a higher or lower level (e.g. the TBB parallel_for code could be called from a filter in a TBB multiprocessing pipeline).
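
A minimal sketch of the scan-line rewrite described above, under some loud assumptions: the `Image` type and both function names are hypothetical, and plain `std::thread` stands in for TBB so the example has no external dependency. In real TBB code the manual chunking would be replaced by `tbb::parallel_for` over a `tbb::blocked_range` with a grain size. The point is only the shape of the loop body: it computes its row pointer from the index it is given instead of advancing a pointer from the previous iteration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical image type: 8-bit grayscale, row-major storage.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> pixels;
};

// Loop body written against an arbitrary row index y. It derives its own
// row pointer, so rows can be processed in any order, on any thread.
void invert_row(Image& img, std::size_t y) {
    std::uint8_t* row = img.pixels.data() + y * img.width;
    for (std::size_t x = 0; x < img.width; ++x) {
        row[x] = static_cast<std::uint8_t>(255 - row[x]);
    }
}

// Stand-in for a parallel_for: split the rows into chunks of `grain`
// scan lines and run each chunk on its own thread. (A task scheduler
// would reuse a small pool of workers instead of one thread per chunk.)
void invert_rows_parallel(Image& img, std::size_t grain) {
    grain = std::max<std::size_t>(grain, 1);
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < img.height; begin += grain) {
        const std::size_t end = std::min(begin + grain, img.height);
        workers.emplace_back([&img, begin, end] {
            for (std::size_t y = begin; y < end; ++y) {
                invert_row(img, y);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

Because no two chunks touch the same row, no locking is needed, which matches the "no shared storage to update" case.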

红ご颜醉 2024-07-12 07:12:35

I've looked into TBB but never used it in a project. I saw no advantages (for my purposes) over ZThread. A brief and somewhat dated overview can be found here.

ZThread is fairly complete, with several thread dispatch options, all the usual synchronization classes and a very handy exception-based thread "interrupt" mechanism. It's easily extendable, well written and well documented. I've used it on 20+ projects.
It also plays nicely with any *NIX that supports POSIX threads, as well as Windows.

Worth a look.

掩耳倾听 2024-07-12 07:12:35

> The Threading Building Blocks (TBB) in the open source version, (there is a new commercial version, $299, don't know the differences yet) is GNU General Public License version 2 with a so-called “Runtime Exception” (that is specific to the use only on creating free software.) I've seen other Runtime Exceptions that attempt to approach LGPL but enabling commercial use and static linking this is not the case.

According to this question, Threading Building Blocks can be used commercially without copyleft restrictions.

神经大条 2024-07-12 07:12:35

Have you looked at the Boost library and its thread API?

笑叹一世浮沉 2024-07-12 07:12:34

I've introduced it into our code base because we needed a better malloc when we moved to a 16-core machine. With 8 cores and under it wasn't a significant issue. It has worked well for us. We plan on using the fine-grained concurrent containers next. Ideally we would make use of the real meat of the product, but that requires rethinking how we build our code. I really like the ideas in TBB, but it's not easy to retrofit onto an existing code base.

You can't think of TBB as just another threading library. It has a whole new model that sits on top of threads and abstracts them away. You learn to think in terms of tasks, parallel_for-type operations and pipelines. If I were to build a new project, I would probably try to model it in this fashion.

We work in Visual Studio and it works just fine. It was originally written for Linux/pthreads, so it runs just fine there too.
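
To give a feel for what "thinking in tasks" means, here is a small sketch that uses only the standard library. It is not TBB code: `task_sum` and its `cutoff` parameter are hypothetical names, and `std::async` is a crude stand-in for a real task scheduler (TBB offers `tbb::task_group` and `tbb::parallel_reduce` for this kind of work). The idea it illustrates is that you describe how the work splits, and leave the question of which thread runs which piece to the runtime.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sum data[begin, end) by recursively splitting the range into two tasks.
// Ranges of at most `cutoff` elements are summed serially.
long long task_sum(const std::vector<int>& data,
                   std::size_t begin, std::size_t end, std::size_t cutoff) {
    cutoff = std::max<std::size_t>(cutoff, 1);  // avoid endless splitting
    if (end - begin <= cutoff) {
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
        const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
        return std::accumulate(first, last, 0LL);
    }
    const std::size_t mid = begin + (end - begin) / 2;
    // Left half becomes a separate task; the right half runs in this thread.
    auto left = std::async(std::launch::async, [&data, begin, mid, cutoff] {
        return task_sum(data, begin, mid, cutoff);
    });
    const long long right = task_sum(data, mid, end, cutoff);
    return left.get() + right;
}
```

With `std::launch::async` every split may start a new thread, so the cutoff has to stay coarse. A work-stealing scheduler removes that concern, which is a large part of what a task library buys you.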

冷︶言冷语的世界 2024-07-12 07:12:34

Portability

TBB is portable. It supports Intel and AMD (i.e. x86) processors, IBM PowerPC and POWER processors, ARM processors, and possibly others. If you look in the build directory, you can see all the configurations the build system supports, which include a wide range of operating systems (Linux, Windows, Android, macOS, iOS, FreeBSD, AIX, etc.) and compilers (GCC, Intel, Clang/LLVM, IBM XL, etc.). I have not tried TBB with the PGI C++ compiler, and I know that it does not work with the Cray C++ compiler (as of 2017).

A few years ago, I was part of the effort to port TBB to IBM Blue Gene systems. Static linking was a challenge, but is now addressed by the big_iron.inc build system helper. The other issues were supporting relatively ancient versions of GCC (4.1 and 4.4) and ensuring the PowerPC atomics were working. I expect that porting to any currently unsupported architecture would be relatively straightforward on platforms that provide or are compatible with GCC and POSIX.

Usage in community codes

I am aware of at least two HPC application frameworks that use TBB: MOOSE and MADNESS.

I do not know how MOOSE uses TBB, but MADNESS uses TBB for its task queue and memory allocator.

Performance versus other threading models

I have personally used TBB in the Parallel Research Kernels project, within which I have compared TBB to OpenMP, OpenCL, Kokkos, RAJA, C++17 Parallel STL, and other models. See the C++ subdirectory for details.

The following figure shows the relative performance of the aforementioned models on an Intel Xeon Phi 7250 processor (the details aren't important - all models used the same settings). As you can see, TBB does quite well except for smaller problem sizes, where the overhead of adaptive scheduling is more relevant. TBB has tuning knobs that will affect these results.

[Figure: PRK stencil]

Full disclosure: I work for Intel in a research/pathfinding capacity.

能怎样 2024-07-12 07:12:34

I'm not doing numerical computing, but I work with data mining (think clustering and classification), and our workloads are probably similar: all the data is static and you have it at the beginning of the program. I briefly investigated Intel's TBB and found it overkill for my needs. After starting with raw pthread-based code, I switched to OpenMP and got the right mix of readability and performance.
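
For a sense of why OpenMP reads well on this kind of static-data workload, here is a hypothetical sketch (the function name and the 1-D data are made up for illustration, not taken from any real project). It assigns each point to its nearest centroid; the only parallel-specific line is the pragma. Compiled without OpenMP support, the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Assign each 1-D point to the index of its nearest centroid.
// Every iteration writes only labels[i], so iterations are independent.
std::vector<int> assign_clusters(const std::vector<double>& points,
                                 const std::vector<double>& centroids) {
    std::vector<int> labels(points.size(), -1);
    const long n = static_cast<long>(points.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        const double x = points[static_cast<std::size_t>(i)];
        double best = std::numeric_limits<double>::infinity();
        int best_label = -1;
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            const double d = (x - centroids[c]) * (x - centroids[c]);
            if (d < best) {
                best = d;
                best_label = static_cast<int>(c);
            }
        }
        labels[static_cast<std::size_t>(i)] = best_label;
    }
    return labels;
}
```

With GCC or Clang the pragma takes effect when the code is built with `-fopenmp`.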
