比较 2 个 linq 应用程序：意外结果

发布于 2024-09-01 21:38:49 字数 517 浏览 6 评论 0原文

我使用 LINQ 起草了 2 个 ASP.NET 应用程序。一个连接到MS SQL Server，另一个连接到专有内存结构。这两个应用程序都使用3个int字段的表，具有500 000条记录（内存结构与SQL Server表相同）。使用的控件是常规的：GridView 和ObjectDataSource。在应用程序中，我计算每次分页点击处理所需的平均时间。

LINQ + MS SQL 应用程序要求每次页面更改0.1 秒。
LINQ + 内存结构要求每页更改需要 0.8 秒。

这是令人震惊的结果。为什么在内存中处理数据的应用程序比使用硬盘驱动器的应用程序慢8倍？谁能告诉我为什么会发生这种情况？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

薄凉少年不暖心 2024-09-08 21:38:49

主要因素可能是算法效率。 LINQ-to-Objects 使用 IEnumerable 输入和输出，这些输入和输出通常按顺序处理，而数据库可能具有可大幅提高速度的索引。

回复收藏 0 原文

最佳男配角 2024-09-08 21:38:49

我可以想到至少三个原因：

索引
缓存
特殊优化（例如 TOP N SORT）

索引

有许多类型的查询，如果在正确索引但非常慢的数据库上运行，它们将运行得非常快如果您迭代内存中的列表。例如，在数据库中按 ID（主键）查找几乎是即时的，因为结果存储在高度非常小的 B 树中。要在内存中的列表中查找相同的元素需要扫描整个列表。

缓存

您的假设是数据库总是访问磁盘。这并不总是正确的。数据库将尝试在内存中保存尽可能多的数据，因此当您向它询问数据时，它已经为您准备好了答案。特别是，它将在内存中保存常用索引，并且仅在必要时才访问磁盘。数据在磁盘和内存中的存储方式也经过仔细优化，以减少磁盘寻道和页面丢失。

优化

即使没有索引，数据库仍然知道许多可以加快速度的技巧。例如，如果您在 SQL Server 中执行以下操作：

list.OrderBy(x => x.Value).Take(1)

如果列表上有索引，则几乎是即时的，但即使没有索引，它也会使用名为 TOP N SORT 以线性时间运行。检查查询的执行计划以查看是否正在使用此优化。请注意，此优化未针对 LINQ to Objects 实现。我们可以通过运行此代码看到这一点：

Random random = new Random();
List<Foo> list = new List<Foo>();
for (int i = 0; i < 10000000; ++i)
{
    list.Add(new Foo { Id = random.Next() });
}

DateTime now = DateTime.UtcNow;
Foo smallest = list.OrderBy(foo => foo.Id).First();
Console.WriteLine(DateTime.UtcNow - now);

此代码执行大约需要 30 秒，并且随着添加更多项目，执行时间的增长速度会比线性增长慢。用此替换查询会导致它花费不到一秒的时间：

int smallestId = list.Min(foo => foo.Id);

这是因为在 LINQ to object 中 OrderBy 是使用 O(n log(n)) 算法实现的，但是Min 使用 O(n) 算法。然而，当针对 SQL Server 执行时，这两个查询将生成相同的 SQL，并且都是线性时间 - O(n)。

因此，在数据库中运行像 OrderBy(x => x.Something).Skip(50).Take(10) 这样的分页查询会更快，因为我们付出了更多的努力来确保它更快。毕竟，这种查询的速度是数据库的一个主要卖点。

I can think of at least three reasons:

indexes
caching
special optimizations (e.g. TOP N SORT)

Indexes

There are many types of queries that will run very fast if run on a database which is correctly indexed but very slow if you iterate through a list in memory. For example a lookup by ID (primary key) is almost instant in a database because the results are stored in a B-tree with very small height. To find the same element in a list in memory would require scanning the entire list.

Caching

Your assumption is that the database always hits the disk. This is not always true. The database will try to hold as much data in memory as it can, so when you ask it for data it already has the answer ready for you. In particular it will hold commonly used indexes in memory and only hit the disk when necessary. The way the data is stored on disk and in memory is also carefully optimized to reduce disk seeks and page misses.

Optimizations

Even without indexes the database still knows many tricks that could speed things up. For example if you do the following in SQL Server:

list.OrderBy(x => x.Value).Take(1)

it will be almost instant if there is an index on list, but even without the index it will use a special optimization called TOP N SORT that runs in linear time. Check the execution plan for your query to see if this optimization is being used. Note that this optimization is not implemented for LINQ to Objects. We can see this by running this code:

Random random = new Random();
List<Foo> list = new List<Foo>();
for (int i = 0; i < 10000000; ++i)
{
    list.Add(new Foo { Id = random.Next() });
}

DateTime now = DateTime.UtcNow;
Foo smallest = list.OrderBy(foo => foo.Id).First();
Console.WriteLine(DateTime.UtcNow - now);

This code takes about 30 seconds to execute, and the execution time grows slower than linearly as more items are added. Replacing the query with this results in it taking less than one second:

int smallestId = list.Min(foo => foo.Id);

This is because in LINQ to objects OrderBy is implemented using an O(n log(n)) algorithm but Min uses a O(n) algorithm. However when executed against SQL Server, both these queries will produce the same SQL and both are linear time - O(n).

So running a paging query like OrderBy(x => x.Something).Skip(50).Take(10) is faster in a database because a lot more effort has gone into making sure that it is faster. After all, the speed of this sort of query is a major selling point for databases.

回复收藏 0 原文

~没有更多了~