Difference between LINQ and PLINQ

Posted on 2024-12-21 10:15:06

What is the difference between the two?

What is the best way to compare them?

Is PLINQ always better?

When should we use PLINQ?

风蛊 2024-12-28 10:15:06

LINQ is a collection of technologies that work together to solve a family of similar problems. In all of them you have a source of data (an XML file or files, database contents, a collection of objects in memory) and you want to retrieve some or all of that data and act on it in some way. LINQ works on the commonality of that set of problems, such that:

var birthdays = from user in users where
  user.dob.Date == DateTime.Today && user.ReceiveMails
  select new{user.Firstname, user.Lastname, user.Email};
foreach(var bdUser in birthdays)
  SendBirthdayMail(bdUser.Firstname, bdUser.Lastname, bdUser.Email);

And the equivalent (explicit use of Linq-related classes and methods with a traditional C# syntax):

var birthdays = users
  .Where(user => user.dob.Date == DateTime.Today && user.ReceiveMails)
  .Select(user => new{user.Firstname, user.Lastname, user.Email});
foreach(var bdUser in birthdays)
  SendBirthdayMail(bdUser.Firstname, bdUser.Lastname, bdUser.Email);

Both are examples of code that could work regardless of whether it is going to be turned into database calls, parsing of XML documents, or a search through an array of objects.

The only difference is what sort of object users is. If it is a list, array, or other enumerable collection, the query is LINQ to Objects; if it is a System.Data.Linq.Table, it is LINQ to SQL. The former results in in-memory operations, the latter in a SQL query whose results are deserialised to in-memory objects as late as possible.

If it is a ParallelQuery - produced by calling .AsParallel() on an in-memory enumerable collection - then the query is performed in memory and parallelised (most of the time) so as to be executed by multiple threads, ideally keeping each core busy moving the work forward.
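As a minimal sketch of that switch, assuming a plain int array as the source (the Squares class and its method names are made up for illustration), the only change between the sequential and the parallel query is the call to AsParallel(). AsOrdered() is added here so the output keeps the source order, which PLINQ does not guarantee by default.

```csharp
using System;
using System.Linq;

public static class Squares
{
    // LINQ to Objects: runs on the calling thread, in source order.
    public static int[] Sequential(int[] source) =>
        source.Where(n => n % 2 == 0).Select(n => n * n).ToArray();

    // PLINQ: AsParallel() turns the source into a ParallelQuery<int>.
    // AsOrdered() asks PLINQ to preserve source order in the results.
    public static int[] Parallel(int[] source) =>
        source.AsParallel().AsOrdered()
              .Where(n => n % 2 == 0).Select(n => n * n).ToArray();
}
```

Both methods should return the same array for the same input; only how the work is scheduled differs.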

Obviously the idea here is to be faster. When it works well, it is.

There are some downsides though.

First, there's always some overhead to getting the parallelisation going, even in cases where it ends up not being possible to parallelise. If there isn't enough work being done on the data, this overhead will outweigh any potential gains.

Second, the benefits of parallel processing depend on the cores available. With a query that doesn't end up blocking on resources on a 4-core machine, you theoretically get a 4-times speed-up (4 hyper-threaded cores might give you more or even less, but probably not 8 times, since hyper-threading's doubling of some parts of the CPU doesn't give a clear two-times increase). With the same query on a single core, or with processor affinity meaning only one core is available (e.g. a web server in "web-garden" mode), there's no speed-up. There could still be a gain if there's blocking on resources, but the benefit then depends on the machine.
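A small sketch of how the number of cores can be taken into account, assuming you want to cap how many threads a query may use (the CoreAware class and both method names are hypothetical). Environment.ProcessorCount reports the logical processors visible to the process, and WithDegreeOfParallelism sets an upper bound on the concurrent tasks PLINQ uses for the query.

```csharp
using System;
using System.Linq;

public static class CoreAware
{
    // Sums the squares of the input using at most maxThreads concurrent tasks.
    public static long SumOfSquares(int[] source, int maxThreads) =>
        source.AsParallel()
              .WithDegreeOfParallelism(maxThreads)
              .Sum(n => (long)n * n);

    // One possible policy: leave half of the logical processors for other work.
    public static int HalfTheCores() =>
        Math.Max(1, Environment.ProcessorCount / 2);
}
```

For example, CoreAware.SumOfSquares(data, CoreAware.HalfTheCores()) would keep the query from claiming every core on a machine that is already busy.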

Third, if any shared resource (maybe a collection that results are being output to) is used in a non-thread-safe way, it can go pretty badly wrong, with incorrect results, crashes, etc.
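To illustrate the kind of mistake meant here, a sketch with a hypothetical SharedState class: the first method adds to a shared List<int> from several threads at once, which List<T> does not support; the second lets PLINQ collect the results itself, so there is no shared state to corrupt.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SharedState
{
    // UNSAFE: List<T>.Add is not thread-safe, so concurrent calls from
    // ForAll can lose items or throw. Shown only as a counter-example.
    public static List<int> DoubledUnsafe(int[] source)
    {
        var results = new List<int>();
        source.AsParallel().ForAll(n => results.Add(n * 2));
        return results;
    }

    // Safer: no shared state; PLINQ merges the results into the list itself.
    public static List<int> Doubled(int[] source) =>
        source.AsParallel().Select(n => n * 2).ToList();
}
```

The unsafe version may appear to work on small inputs, which is part of what makes this class of bug hard to spot.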

Fourth, if there's a shared resource being used in a thread-safe way, and that thread safety comes from locking, there could be enough contention for the lock to become a bottleneck that undoes all the benefits of the parallelisation.
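A sketch of the contention problem and one way around it, using a hypothetical Contention class. The first method is correct but takes the same lock for every element; the second uses the PLINQ Aggregate overload that gives each partition its own accumulator and combines the subtotals at the end, so no per-element lock is needed.

```csharp
using System;
using System.Linq;

public static class Contention
{
    // Thread-safe, but every element takes the same lock, so the threads
    // spend much of their time waiting on each other.
    public static long SumWithLock(int[] source)
    {
        long total = 0;
        var gate = new object();
        source.AsParallel().ForAll(n => { lock (gate) { total += n; } });
        return total;
    }

    // Each partition accumulates a private subtotal; the subtotals are
    // combined once at the end.
    public static long SumWithLocalTotals(int[] source) =>
        source.AsParallel().Aggregate(
            () => 0L,                        // seed for each partition
            (subtotal, n) => subtotal + n,   // per-element step, no sharing
            (left, right) => left + right,   // combine partition subtotals
            total => total);                 // final projection
}
```

Both return the same value; whether the second is measurably faster depends on the data size and the machine, so it is worth timing.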

Fifth, if you have a four-core machine working on more or less the same algorithm on four different threads (perhaps in a client-server situation due to four clients, or in a desktop situation from a set of similar tasks higher up in the process), then those threads are already making the best use of the cores. Splitting the work in the algorithm up so that it is handled across all four cores means you've moved from four threads using one core each to 16 threads fighting over four cores. At best it'll be the same, and the overheads will likely make it slightly worse.

It can still be a major win in a lot of cases, but the above should make it clear that it won't always be.


季末如歌 2024-12-28 10:15:06

I also wanted to know when to use PLINQ instead of LINQ so I ran some tests.

Summary:
There are two questions to answer when deciding whether to use LINQ or PLINQ to run a query.

How many iterations are involved in running the query (how many objects are in the collection)?
How much work is involved in an iteration?

Use LINQ unless PLINQ is more performant. PLINQ can be more performant than LINQ if querying the collection involves too many iterations AND/OR each iteration involves too much work.

But then two difficult questions arise:

How many iterations are too many iterations?
How much work is too much work?

My advice is to test your query. Test once using LINQ and once using PLINQ and then compare the two results.
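One way such a comparison could be set up is sketched below. QueryTimer, TimeMs and Report are hypothetical names, and the warm-up run is an assumption of this sketch rather than part of the tests in this answer: it keeps one-off start-up costs out of the measurement, so drop it if you want those costs included.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class QueryTimer
{
    // Runs the action once as a warm-up, then times a second run.
    public static long TimeMs(Action action)
    {
        action();
        var watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    // Times the same projection with LINQ and with PLINQ over the same data.
    public static string Report(int[] data)
    {
        long linq = TimeMs(() => data.Select(x => x * 10).ToArray());
        long plinq = TimeMs(() => data.AsParallel().Select(x => x * 10).ToArray());
        return $"LINQ: {linq}ms, PLINQ: {plinq}ms";
    }
}
```

Calling Console.WriteLine(QueryTimer.Report(Enumerable.Range(0, 1000000).ToArray())) prints both timings; replace the projection with your own query to compare the two on realistic data.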

Test 1: Increasing the number of iterations in the query by increasing the number of objects in the collection.

The overhead of initialising PLINQ takes around 20ms. If PLINQ's strengths aren't utilised, this is wasted time because LINQ has 0ms overhead.

The work involved in each iteration is always the same for each test. The work is kept minimal.

Definition of work: Multiplying the int (object in the collection) by 10.

When iterating 10 million objects where each iteration involves minimal work, PLINQ is faster than LINQ; at 1 million objects LINQ is still ahead. In a professional environment, though, I've never queried or even initialised a collection of 10 million objects in memory, so this may be an unlikely scenario in which PLINQ happens to be superior to LINQ.

╔═══════════╦═══════════╦════════════╗
║ # Objects ║ LINQ (ms) ║ PLINQ (ms) ║
╠═══════════╬═══════════╬════════════╣
║ 1         ║         1 ║         20 ║
║ 10        ║         0 ║         18 ║
║ 100       ║         0 ║         20 ║
║ 1k        ║         0 ║         23 ║
║ 10k       ║         1 ║         17 ║
║ 100k      ║         4 ║         37 ║
║ 1m        ║        36 ║         76 ║
║ 10m       ║       392 ║        285 ║
║ 100m      ║      3834 ║       2596 ║
╚═══════════╩═══════════╩════════════╝

Test 2: Increasing the work involved in an iteration

I set the number of objects in the collection to always be 10 so the query involves a low number of iterations. For each test, I increased the work involved to process each iteration.

Definition of work: Multiplying the int (object in the collection) by 10.

Definition of increasing the work: Increasing the number of times the int is multiplied by 10 within a single iteration.

PLINQ became faster at querying the collection once the number of iterations inside a work iteration reached 10 million. I concluded that PLINQ is superior to LINQ when a single iteration involves this amount of work.

"# Iterations" in this table means the number of iterations inside a work iteration. See Test 2 code below.

╔══════════════╦═══════════╦════════════╗
║ # Iterations ║ LINQ (ms) ║ PLINQ (ms) ║
╠══════════════╬═══════════╬════════════╣
║ 1            ║         1 ║         22 ║
║ 10           ║         1 ║         32 ║
║ 100          ║         0 ║         25 ║
║ 1k           ║         1 ║         18 ║
║ 10k          ║         0 ║         21 ║
║ 100k         ║         3 ║         30 ║
║ 1m           ║        27 ║         52 ║
║ 10m          ║       263 ║        107 ║
║ 100m         ║      2624 ║        728 ║
║ 1b           ║     26300 ║       6774 ║
╚══════════════╩═══════════╩════════════╝

Test 1 code:

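using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class Program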
{
    private static IEnumerable<int> _numbers;

    static void Main(string[] args)
    {
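        // Change this value for each row of the Test 1 table.
        // At 1,000,000,000 the resulting int[] alone needs about 4 GB.
        const int numberOfObjectsInCollection = 1000000000;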

        _numbers = Enumerable.Range(0, numberOfObjectsInCollection);

        var watch = new Stopwatch();

        watch.Start();

        var parallelTask = Task.Run(() => ParallelTask());

        parallelTask.Wait();

        watch.Stop();

        Console.WriteLine($"Parallel: {watch.ElapsedMilliseconds}ms");

        watch.Reset();

        watch.Start();

        var sequentialTask = Task.Run(() => SequentialTask());

        sequentialTask.Wait();

        watch.Stop();

        Console.WriteLine($"Sequential: {watch.ElapsedMilliseconds}ms");

        Console.ReadKey();
    }

    private static void ParallelTask()
    {
        _numbers
            .AsParallel()
            .Select(x => DoWork(x))
            .ToArray();
    }

    private static void SequentialTask()
    {
        _numbers
            .Select(x => DoWork(x))
            .ToArray();
    }

    private static int DoWork(int @int)
    {
        return @int * 10;
    }
}

Test 2 code:

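using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class Program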
{
    private static IEnumerable<int> _numbers;

    static void Main(string[] args)
    {
        _numbers = Enumerable.Range(0, 10);

        var watch = new Stopwatch();

        watch.Start();

        var parallelTask = Task.Run(() => ParallelTask());

        parallelTask.Wait();

        watch.Stop();

        Console.WriteLine($"Parallel: {watch.ElapsedMilliseconds}ms");

        watch.Reset();

        watch.Start();

        var sequentialTask = Task.Run(() => SequentialTask());

        sequentialTask.Wait();

        watch.Stop();

        Console.WriteLine($"Sequential: {watch.ElapsedMilliseconds}ms");

        Console.ReadKey();
    }

    private static void ParallelTask()
    {
        _numbers
            .AsParallel()
            .Select(x => DoWork(x))
            .ToArray();
    }

    private static void SequentialTask()
    {
        _numbers
            .Select(x => DoWork(x))
            .ToArray();
    }

    private static int DoWork(int @int)
    {
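            // Change this value for each row of the Test 2 table.
            // The repeated multiplication overflows int; only the time taken matters here.
            const int numberOfIterations = 1000000000;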

        for (int i = 0; i < numberOfIterations; i++)
        {
            @int = @int * 10;
        }

        return @int;
    }
}
