Parallel computing is used more and more, and new framework features and shortcuts make it easier to use (for example, the Parallel Extensions that are directly available in .NET 4).
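For reference, the in-process counterpart already looks roughly like this with the .NET 4 Parallel Extensions (a minimal sketch; myEnumerable and the work inside the loop are placeholders):

using System.Threading.Tasks;

// In-process data parallelism: the runtime partitions myEnumerable
// across the cores of the local machine.
Parallel.ForEach(myEnumerable, item =>
{
    // CPU-bound work on each item here
});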
Now, what about parallelism across a network? I mean an abstraction of everything related to communication, creation of processes on remote machines, and so on. Something like this, in C#:
NetworkParallel.ForEach(myEnumerable, item =>
{
    // Computing and/or access to a web resource or a local-network database here
});
I understand that this is very different from multi-core parallelism. The two most obvious differences would probably be:
- The fact that such a parallel task would be limited to computation, without being able, for example, to use files stored locally (but why not a database?), or even to use local variables, because it would be two distinct applications rather than two threads of the same application,
- The very specific implementation, requiring not just a separate thread (which is quite easy), but spawning processes on different machines and then communicating with them over the local network.
Despite those differences, such parallelism is quite possible, even without getting into distributed architectures.
Do you think it will be implemented in a few years? Do you agree that it would enable developers to easily build extremely powerful things with much less pain?
Example:
Think about a business application which extracts data from a database, transforms it, and displays statistics. Let's say this application takes ten seconds to load the data, twenty seconds to transform it, and ten seconds to build the charts on a single machine in the company, using all of that machine's CPU, whereas ten other machines sit at around 5% CPU most of the time. In such a case, every step could be done in parallel, bringing the overall process down to perhaps six to ten seconds instead of forty.
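To make the arithmetic concrete: spread across the eleven available machines, forty seconds of work has an ideal lower bound of roughly 40 / 11 ≈ 4 seconds, and the six-to-ten-second estimate leaves room for communication and serialization overhead. Here is a hypothetical sketch of how just the transform stage might look with the imagined API (nothing here exists; NetworkParallel, rowBatches and TransformBatch are placeholders):

// Hypothetical: partition the 20-second transform stage across the ten
// mostly idle machines, so each handles roughly 2 seconds of work.
NetworkParallel.ForEach(rowBatches, batch =>
{
    // Runs in a separate process on a remote machine; only the batch and its
    // result cross the network, since there is no shared memory to rely on.
    TransformBatch(batch);
});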
2 Answers
This is typically handled in a different manner than in-process concurrency. The issues that arise from the architecture are much greater, and the lack of shared memory raises additional concerns.
That being said, "parallelism across a network" has been in use for a very long time. The most common option is to use the Message Passing Interface (MPI). There is even a C# library for this, MPI.NET.
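For a flavor of the programming model, here is a minimal MPI.NET-style sketch (exact method signatures may vary slightly between versions; the point is that each process has a rank and communicates through explicit messages rather than shared memory):

using System;
using MPI;

class PingPong
{
    static void Main(string[] args)
    {
        // Initializes MPI; the same executable is started on every machine.
        using (new MPI.Environment(ref args))
        {
            Intracommunicator comm = Communicator.world;
            if (comm.Rank == 0)
            {
                // Process 0 sends a message to process 1 (message tag 0).
                comm.Send("hello from rank 0", 1, 0);
            }
            else if (comm.Rank == 1)
            {
                // Process 1 receives it; the data crosses the network explicitly.
                string msg = comm.Receive<string>(0, 0);
                Console.WriteLine("Rank 1 received: " + msg);
            }
        }
    }
}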
Now, the goal of "completely abstracting away" the work of partitioning and calling out across the network has not been achieved (though MPI does handle many of these tasks in a relatively straightforward manner). I doubt this will happen soon, either, since many new concerns arise when you lose shared memory. However, I suspect that some projects, such as Axum, will eventually lead to a very highly abstracted means of accomplishing this, but I also suspect that this will be quite a few years out, since in-process, shared-memory concurrency is only now becoming more common and mainstream.
It has been tried many times before, and such abstractions usually fail because they embody the fallacies of distributed computing. The chances of a network failure occurring sometime during a calculation are far higher than those of ordinary hardware failure, so you need to use fault- and latency-tolerant patterns of communication rather than relying on procedural idioms.
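As one illustration of what a fault- and latency-tolerant call can look like, here is a minimal sketch (my own example, using only standard .NET types; RemoteCall and the retry policy are assumptions, not part of any framework) that gives each attempt of a remote operation its own timeout and a bounded number of retries instead of invoking it like a local procedure:

using System;
using System.Threading;
using System.Threading.Tasks;

static class RemoteCall
{
    // Retries a remote operation a few times, giving each attempt its own timeout,
    // because network calls can fail or stall in ways that local calls do not.
    public static async Task<T> WithRetryAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        int maxAttempts = 3,
        TimeSpan? perAttemptTimeout = null)
    {
        TimeSpan timeout = perAttemptTimeout ?? TimeSpan.FromSeconds(5);
        for (int attempt = 1; ; attempt++)
        {
            // Each attempt gets a fresh token that cancels after the timeout.
            using (var cts = new CancellationTokenSource(timeout))
            {
                try
                {
                    return await operation(cts.Token);
                }
                catch (Exception) when (attempt < maxAttempts)
                {
                    // Back off briefly before retrying a failed or timed-out attempt.
                    await Task.Delay(TimeSpan.FromMilliseconds(200 * attempt));
                }
            }
        }
    }
}

A real implementation would also distinguish transient failures from permanent ones and make sure the operation is idempotent before retrying it.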