How do Reactive Framework, PLINQ, TPL and Parallel Extensions relate to each other?

Posted 2024-08-19 08:33:22

At least since the release of .NET 4.0, Microsoft seems to have put a lot of effort into supporting parallel and asynchronous programming, and a lot of APIs and libraries around this seem to have emerged. The following fancy names in particular are constantly mentioned everywhere lately:

  • Reactive Framework,
  • PLINQ (Parallel LINQ),
  • TPL (Task Parallel Library) and
  • Parallel Extensions.

Now they all seem to be Microsoft products and they all seem to target asynchronous or parallel programming scenarios for .NET. But it is not quite clear what each of them actually is and how they are related to each other. Some might actually be the same thing.

In a few words, can anyone set the record straight on what is what?


小红帽 2024-08-26 08:33:22

PLINQ (Parallel LINQ) is simply a new way to write regular LINQ queries so that they run in parallel. In other words, the Framework automatically takes care of running your query across multiple threads so that it finishes faster (i.e. by using multiple CPU cores).

For example, let's say that you have a bunch of strings and you want to get all the ones that start with the letter "A". You could write your query like this:

var words = new[] { "Apple", "Banana", "Coconut", "Anvil" };
var myWords = words.Where(s => s.StartsWith("A"));

And this works fine. If you had 50,000 words to search, though, you might want to take advantage of the fact that each test is independent, and split this across multiple cores:

var myWords = words.AsParallel().Where(s => s.StartsWith("A"));

That's all you have to do to turn a regular query into a parallel one that runs on multiple cores. Pretty neat.
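
A slightly fuller C# sketch of the same query, assuming .NET 4.0 or later. Where (not Select) is the operator that filters, and PLINQ does not promise to keep the source order unless you ask for it with AsOrdered(). The class and method names are hypothetical. With four words, or even 50,000 cheap StartsWith checks, the cost of partitioning and merging may well outweigh the gain, so it is worth measuring before parallelizing.

```csharp
using System;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class WordQueries
{
    // Sequential LINQ: Where keeps the matching words (Select would yield bools).
    // Ordinal comparison avoids culture-sensitive matching.
    public static string[] StartingWithA(string[] words) =>
        words.Where(s => s.StartsWith("A", StringComparison.Ordinal)).ToArray();

    // PLINQ: AsParallel lets the query be partitioned across worker threads.
    // AsOrdered asks PLINQ to return results in the original source order,
    // which it does not guarantee by default.
    public static string[] StartingWithAParallel(string[] words) =>
        words.AsParallel()
             .AsOrdered()
             .Where(s => s.StartsWith("A", StringComparison.Ordinal))
             .ToArray();
}
```

For example, WordQueries.StartingWithAParallel(new[] { "Apple", "Banana", "Coconut", "Anvil" }) returns Apple and Anvil, in that order, because of AsOrdered().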


The TPL (Task Parallel Library) is sort of the complement to PLINQ, and together they make up Parallel Extensions. Whereas PLINQ is largely based on a functional style of programming with no side-effects, side-effects are precisely what the TPL is for. If you want to actually do work in parallel as opposed to just searching/selecting things in parallel, you use the TPL.
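
To make the side-effect distinction concrete, here is a hedged sketch (class and method names are made up) that computes the same sum of squares both ways. The PLINQ version is a pure query and lets PLINQ combine the partial results; the Parallel.For version mutates shared state, so it keeps a per-worker subtotal and merges it with Interlocked.Add to avoid a data race.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class SumOfSquares
{
    // PLINQ style: a side-effect-free query; PLINQ merges the partial sums itself.
    public static long WithPlinq(int n) =>
        Enumerable.Range(0, n).AsParallel().Select(i => (long)i * i).Sum();

    // TPL style: the loop body updates shared state, so it has to do so safely.
    public static long WithParallelFor(int n)
    {
        long total = 0;
        Parallel.For(0, n,
            () => 0L,                                           // localInit: per-worker subtotal
            (i, loopState, subtotal) => subtotal + (long)i * i, // body: no shared writes here
            subtotal => Interlocked.Add(ref total, subtotal));  // localFinally: merge atomically
        return total;
    }
}
```

Both methods return 332833500 for n = 1000.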

The most visible part of the TPL is the Parallel class, which exposes overloads of For, ForEach, and Invoke (underneath, the TPL is built around the Task and Task<TResult> types). Invoke is a bit like queuing up tasks in the ThreadPool, but simpler to use, because it returns only once all of the actions have completed. IMO, the more interesting bits are For and ForEach. So, for example, let's say you have a whole bunch of files you want to compress. You could write the regular sequential version:

string[] fileNames = (...);
foreach (string fileName in fileNames)
{
    byte[] data = File.ReadAllBytes(fileName);
    byte[] compressedData = Compress(data);
    string outputFileName = Path.ChangeExtension(fileName, ".zip");
    File.WriteAllBytes(outputFileName, compressedData);
}

Again, each iteration of this compression is completely independent of any other. We can speed this up by doing several of them at once:

Parallel.ForEach(fileNames, fileName =>
{
    byte[] data = File.ReadAllBytes(fileName);
    byte[] compressedData = Compress(data);
    string outputFileName = Path.ChangeExtension(fileName, ".zip");
    File.WriteAllBytes(outputFileName, compressedData);
});

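And again, that's all it takes to parallelize this operation. Now when we run the method containing this loop (call it CompressFiles, or whatever you like), it will use multiple CPU cores; if the work is CPU-bound rather than limited by disk I/O, it might finish in roughly half or a quarter of the time on a 2- or 4-core machine.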
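
Since Compress is never defined above, here is a self-contained sketch that fills it in with GZipStream from System.IO.Compression. That choice is an assumption: GZip output is not a .zip archive, so this version writes .gz files instead. FileCompressor and CompressFiles are hypothetical names.

```csharp
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class FileCompressor
{
    // Assumed stand-in for the answer's Compress helper: GZip the bytes in memory.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            } // disposing the GZipStream flushes the final compressed block
            return output.ToArray(); // ToArray still works after the stream is closed
        }
    }

    // Each file is handled independently, so the iterations can run in parallel.
    // Parallel.ForEach returns only after every output file has been written.
    public static void CompressFiles(string[] fileNames)
    {
        Parallel.ForEach(fileNames, fileName =>
        {
            byte[] data = File.ReadAllBytes(fileName);
            byte[] compressedData = Compress(data);
            string outputFileName = Path.ChangeExtension(fileName, ".gz");
            File.WriteAllBytes(outputFileName, compressedData);
        });
    }
}
```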

The advantage of this over just chucking it all in the ThreadPool is that Parallel.ForEach actually runs synchronously: the call does not return until every iteration has completed. If you used the ThreadPool instead (or just plain Thread instances), you'd have to come up with a way of finding out when all of the tasks are finished, and while this isn't terribly complicated, it's something a lot of people tend to screw up or at least have trouble with. When you use the Parallel class, you don't really have to think about it; the multi-threading is hidden from you and handled behind the scenes.
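
A minimal sketch of that difference, with hypothetical names: with the raw ThreadPool the caller has to wait for completion itself (a CountdownEvent here, which is one of several possible approaches), whereas Parallel.For simply does not return until every iteration is done.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class CompletionDemo
{
    // Raw ThreadPool: QueueUserWorkItem returns immediately, so the caller
    // must track completion itself; a CountdownEvent does that here.
    public static int[] SquaresWithThreadPool(int count)
    {
        var results = new int[count];
        using (var done = new CountdownEvent(count))
        {
            for (int i = 0; i < count; i++)
            {
                int index = i; // copy so each work item captures its own index
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    results[index] = index * index;
                    done.Signal(); // one fewer item outstanding
                });
            }
            done.Wait(); // block until every work item has signalled
        }
        return results;
    }

    // Parallel.For: the call itself returns only after all iterations finish.
    public static int[] SquaresWithParallelFor(int count)
    {
        var results = new int[count];
        Parallel.For(0, count, i => results[i] = i * i);
        return results;
    }
}
```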


Reactive Extensions (Rx, originally called the Reactive Framework) are really a different beast altogether. It's a different way of thinking about event handling. There's a lot of material to cover on this, but to make a long story short, instead of wiring up event handlers to events, Rx lets you treat sequences of events as... well, sequences: an IObservable<T>, the push-based counterpart of IEnumerable<T>, which you can query with LINQ operators. You get to process events as one composable stream instead of handling each one in isolation as it fires at some unpredictable time, where you have to keep saving state all the time in order to detect a series of events happening in a particular order.
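
Rx itself ships as a separate library (today the System.Reactive NuGet package), so the sketch below does not use it. It is a hand-rolled, hypothetical imitation that relies only on the IObservable<T>/IObserver<T> interfaces in the BCL, to show the core idea: once events are an IObservable<T>, LINQ query syntax can filter and transform them before any handler runs. Real Rx adds many more operators, schedulers, and proper error handling; none of the names below (EventStream, MiniRx, Publish) are Rx API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event source: a minimal push-based stream, similar in spirit
// to an Rx Subject<T> but hand-rolled. Not thread-safe; single-threaded use only.
public sealed class EventStream<T> : IObservable<T>
{
    private readonly List<IObserver<T>> observers = new List<IObserver<T>>();

    public IDisposable Subscribe(IObserver<T> observer)
    {
        observers.Add(observer);
        return new Subscription(() => observers.Remove(observer));
    }

    // Push a value to every current subscriber, as an event source would.
    public void Publish(T value)
    {
        foreach (var observer in observers.ToArray()) observer.OnNext(value);
    }

    private sealed class Subscription : IDisposable
    {
        private Action unsubscribe;
        public Subscription(Action unsubscribe) { this.unsubscribe = unsubscribe; }
        public void Dispose() { unsubscribe?.Invoke(); unsubscribe = null; }
    }
}

// Hypothetical LINQ-style operators over IObservable<T>; NOT the Rx API.
public static class MiniRx
{
    private sealed class DelegateObserver<T> : IObserver<T>
    {
        private readonly Action<T> onNext;
        private readonly Action<Exception> onError;
        private readonly Action onCompleted;

        public DelegateObserver(Action<T> onNext, Action<Exception> onError, Action onCompleted)
        {
            this.onNext = onNext; this.onError = onError; this.onCompleted = onCompleted;
        }

        public void OnNext(T value) => onNext(value);
        public void OnError(Exception error) => onError(error);
        public void OnCompleted() => onCompleted();
    }

    private sealed class DelegateObservable<T> : IObservable<T>
    {
        private readonly Func<IObserver<T>, IDisposable> subscribe;
        public DelegateObservable(Func<IObserver<T>, IDisposable> subscribe) { this.subscribe = subscribe; }
        public IDisposable Subscribe(IObserver<T> observer) => subscribe(observer);
    }

    // Subscribe with just an OnNext callback (errors and completion are ignored).
    public static IDisposable Subscribe<T>(this IObservable<T> source, Action<T> onNext) =>
        source.Subscribe(new DelegateObserver<T>(onNext, _ => { }, () => { }));

    // Forward only the values that satisfy the predicate; enables "where" in query syntax.
    public static IObservable<T> Where<T>(this IObservable<T> source, Func<T, bool> predicate) =>
        new DelegateObservable<T>(observer => source.Subscribe(new DelegateObserver<T>(
            value => { if (predicate(value)) observer.OnNext(value); },
            observer.OnError, observer.OnCompleted)));

    // Transform each value as it arrives; enables "select" in query syntax.
    public static IObservable<TResult> Select<T, TResult>(this IObservable<T> source, Func<T, TResult> selector) =>
        new DelegateObservable<TResult>(observer => source.Subscribe(new DelegateObserver<T>(
            value => observer.OnNext(selector(value)),
            observer.OnError, observer.OnCompleted)));
}
```

For example, subscribing to `from c in keys where char.IsDigit(c) select c - '0'` (with keys an EventStream<char>) and then publishing the characters of "a1b2c3" delivers 1, 2 and 3 to the handler; disposing the subscription stops further deliveries.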

One of the coolest examples I've found of Rx is here. Skip down to the "Linq to IObservable" section, where he implements a drag-and-drop handler, which is normally a pain in WPF, in just 4 lines of code. Rx gives you composition of events, something you don't really have with regular event handlers, and code snippets like these are also straightforward to refactor into behavior classes that you can drop in anywhere.


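And that's it. PLINQ and the TPL ship as part of .NET 4.0, while Rx is a separate library built on the IObservable<T>/IObserver<T> interfaces that .NET 4.0 introduced; together they are some of the cooler additions of that release. There are several more, of course, but these were the ones you asked about!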

看海 2024-08-26 08:33:22

I like Aaronaught's answer, but I would say Rx and TPL solve different problems. Part of what the TPL team added are the threading primitives and significant enhancements to the building blocks of the runtime like the ThreadPool. And everything you list is built on top of these primitives and runtime features.

But the TPL and Rx solve two different problems. TPL works best when the program or algorithm is 'pulling & queuing'. Rx excels when the program or algorithm needs to 'react' to data from a stream (like mouse input or when receiving a stream of related messages from an endpoint like WCF).

You'd need the 'unit of work' concept from the TPL for work like walking the file system, iterating over a collection, or walking a hierarchy like an org chart. In each of those cases the programmer can reason about the overall amount of work, the work can be broken down into chunks of a certain size (Tasks), and in the case of doing computations over a hierarchy the tasks can be 'chained' together. So certain types of work lend themselves to the TPL 'Task Hierarchy' model, and benefit from the enhancements to plumbing like cancellation (see the Channel 9 video on CancellationTokenSource). The TPL also has lots of knobs for specialized domains like near real-time data processing.
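
A hedged sketch of that 'task hierarchy' idea (names are hypothetical; Task.FromResult needs .NET 4.5, the rest is .NET 4.0 TPL API): a known amount of work is split into fixed-size chunks, each chunk runs as a Task, a continuation created with Task.Factory.ContinueWhenAll chains them into a final result, and a CancellationToken from a CancellationTokenSource lets the caller cancel the whole thing.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class TaskChainDemo
{
    public static Task<long> SumInChunks(int[] data, int chunkSize, CancellationToken token)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (data.Length == 0) return Task.FromResult(0L); // ContinueWhenAll rejects an empty array

        // Split the known amount of work into chunks, one Task per chunk.
        int chunkCount = (data.Length + chunkSize - 1) / chunkSize;
        var chunks = new Task<long>[chunkCount];
        for (int c = 0; c < chunkCount; c++)
        {
            int start = c * chunkSize;
            int end = Math.Min(start + chunkSize, data.Length);
            chunks[c] = Task.Factory.StartNew(() =>
            {
                long sum = 0;
                for (int i = start; i < end; i++)
                {
                    token.ThrowIfCancellationRequested(); // cooperative cancellation
                    sum += data[i];
                }
                return sum;
            }, token);
        }

        // Chain a continuation that runs once every chunk has finished.
        return Task.Factory.ContinueWhenAll(chunks, finished =>
        {
            long total = 0;
            foreach (var t in finished) total += t.Result; // throws if a chunk faulted or was cancelled
            return total;
        }, token);
    }
}
```

Summing 1 through 1000 in chunks of 128 gives 500500; if the token is already cancelled, waiting on the returned task throws an AggregateException and the task reports IsCanceled.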

Rx is what most developers should end up using. It is how WPF applications can 'react' to external messages such as external data (a stream of IM messages to an IM client) or external input (like the mouse-drag example linked by Aaronaught). Under the covers, Rx uses threading primitives from the TPL/BCL, thread-safe collections from the TPL/BCL, and runtime objects like the ThreadPool. In my mind, Rx is the 'highest level' of programming for expressing your intentions.

Whether the average developer can get their head wrapped around the set of intentions you can express with Rx is yet to be seen. :)

But I think that over the next couple of years TPL vs. Rx is going to be the next debate, like LINQ-to-SQL vs. Entity Framework: two flavors of API in the same domain, specialized for different scenarios but overlapping in a lot of ways. In the case of TPL and Rx, though, they are actually aware of each other, and there are built-in adapters to compose applications and use both frameworks together (like feeding results from a PLINQ query into an IObservable Rx stream). For folks who haven't done any parallel programming, there is a ton of learning to get up to speed.

Update: I've been using both TPL and RxNet in my regular work for the past 6 months (of the 18 months since my original answer). My thoughts on choosing TPL and/or RxNet in a middle-tier WCF service (enterprise LOB service): http://yzorgsoft.blogspot.com/2011/09/middle-tier-tpl-andor-rxnet.html
