What are the recommended C++ parallelization libraries for large-scale data processing?

Posted on 2024-09-26 08:44:44

Can someone recommend approaches to parallelization in C++ when the data to be acted upon is huge? I have been reading about OpenMP and Intel's TBB for parallelization in C++, but have not experimented with them yet. Which of these is better for parallel data processing? Are there any other libraries or approaches?

Comments (3)

本王不退位尔等都是臣 2024-10-03 08:44:44

"large" and "data processing" cover a lot of ground here, and it's hard to give a sensible answer without more information.

If the data processing is "embarrassingly parallel" -- if it involves doing lots and lots of calculations that are completely independent of each other -- then there are a million things that will work, and it's just a matter of finding something that matches your code and background.
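As a rough illustration of the embarrassingly parallel case, here is a minimal sketch using OpenMP (one of the options named in the question). The function names `transform_one` and `transform_all` and the per-element formula are hypothetical, chosen only for the example; the point is that each output element depends only on its own input, so the loop can be split across threads without any synchronization.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical per-element work: the result depends only on x,
// never on any other element of the data set.
inline double transform_one(double x) {
    assert(x >= 0.0);  // precondition for std::sqrt in this sketch
    return std::sqrt(x) + 1.0;
}

// Applies transform_one to every element of `in`.
// Because iterations are independent, OpenMP may hand each thread
// its own slice of the index range.
std::vector<double> transform_all(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    const long long n = static_cast<long long>(in.size());
    #pragma omp parallel for
    for (long long i = 0; i < n; ++i) {
        out[i] = transform_one(in[i]);
    }
    return out;
}
```

With GCC or Clang this would typically be built with `-fopenmp`; if the flag is omitted the pragma is ignored and the loop simply runs serially, producing the same result.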

If it isn't embarrassingly parallel, but nearly so - the computations take a big chunk of data but just distill it into a handful of numbers - there are fewer options, but still lots of them.
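The "distill a big chunk of data into a handful of numbers" pattern is usually called a reduction. The following is a minimal sketch of one way to express it with OpenMP's `reduction` clause; the `Summary` struct and `summarize` function are hypothetical names for illustration, and the `min`/`max` reduction operators assume a compiler supporting OpenMP 3.1 or later.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type: the "handful of numbers" distilled from the data.
struct Summary {
    double sum;
    double min;
    double max;
};

// Reduces a large array to its sum, minimum, and maximum.
// Each thread accumulates private copies of sum/lo/hi over its slice;
// OpenMP combines the private copies when the loop finishes.
Summary summarize(const std::vector<double>& data) {
    assert(!data.empty());
    double sum = 0.0;
    double lo = data[0];
    double hi = data[0];
    const long long n = static_cast<long long>(data.size());
    #pragma omp parallel for reduction(+:sum) reduction(min:lo) reduction(max:hi)
    for (long long i = 0; i < n; ++i) {
        sum += data[i];
        if (data[i] < lo) lo = data[i];
        if (data[i] > hi) hi = data[i];
    }
    return Summary{sum, lo, hi};
}
```

One caveat worth keeping in mind: the order in which partial sums are combined can differ with the thread count, so floating-point sums may vary in the last bits from run to run.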

If the calculation is more tightly coupled than this, where you need the processors to work in tandem on big chunks of data, then you're probably stuck with the standbys: the OpenMP features of your compiler if it will work on a single machine (there's TBB, too, but usually for number crunching OpenMP is faster and easier), or MPI if it needs several machines simultaneously. You mentioned C++; Boost has a very nice MPI layer.

But thinking about which library to use for parallelization is probably thinking about the wrong end of the problem first. In many cases, you don't necessarily need to deal with these layers directly. If the number crunching involves lots of linear algebra (for instance), then PLASMA (for multicore machines - http://icl.cs.utk.edu/plasma/ ) or PETSc, which has support for distributed memory machines, e.g., multiple computers ( http://www.mcs.anl.gov/petsc/petsc-as/ ), are good choices; they can completely hide the actual details of the parallel implementation from you. Other sorts of techniques have other libraries, too. It's probably best to think about what sort of analysis you need to do, and look to see if existing toolkits have the amount of parallelization you need. Only once you've determined the answer is no should you start to worry about how to roll your own.

忆依然 2024-10-03 08:44:44

Both OpenMP and Intel TBB are for local use as they help in writing multithreaded applications.

If you have truly huge datasets, you may need to split load over several machines -- and then libraries like Open MPI for parallel programming with MPI come into play. Open MPI has a C++ interface, but you now also face a networking component and some administrative issues you do not have with a single computer.

悟红尘 2024-10-03 08:44:44

MPI is also useful on a single local machine. It will run a job across multiple cores/CPUs; while this is probably overkill compared to threading, it does mean you can move the job to a cluster with no changes. Most MPI implementations also optimize a local job to use shared memory instead of TCP for data connections.
