How much performance can using LIMIT in a SQL statement gain?

Published 2024-11-02 08:08:06


Let's suppose I have a table in my database with 1.000.000 records.

If I execute:

SELECT * FROM [Table] LIMIT 1000

Will this query take the same time as if I have that table with 1000 records and just do:

SELECT * FROM [Table]

?

I'm not asking whether it will take exactly the same time. I just want to know if the first one will take much more time to execute than the second one.

I said 1.000.000 records, but it could be 20.000.000. That was just an example.

Edit:
Of course, on the same table, a query using LIMIT should execute faster than one without it, but that's not what I'm asking...

To make it generic:

Table1: X records
Table2: Y records

(X << Y)

What I want to compare is:

SELECT * FROM Table1

and

SELECT * FROM Table2 LIMIT X
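One way to sanity-check this is to time both query shapes yourself. Below is a minimal sketch using Python's built-in sqlite3 module (not SQL Server CE, so the absolute numbers will differ, but the relative behaviour is the point); the table names and row counts are made up for the experiment:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

X, Y = 1_000, 200_000  # X << Y, stand-ins for the 1000 vs 1000000 case
cur.execute("CREATE TABLE Table1 (id INTEGER, val TEXT)")
cur.execute("CREATE TABLE Table2 (id INTEGER, val TEXT)")
cur.executemany("INSERT INTO Table1 VALUES (?, ?)", ((i, "x") for i in range(X)))
cur.executemany("INSERT INTO Table2 VALUES (?, ?)", ((i, "x") for i in range(Y)))

def run(sql):
    """Return (row count, elapsed seconds) for a query."""
    t0 = time.perf_counter()
    rows = cur.execute(sql).fetchall()
    return len(rows), time.perf_counter() - t0

n1, t1 = run("SELECT * FROM Table1")             # full scan of the small table
n2, t2 = run("SELECT * FROM Table2 LIMIT 1000")  # early-terminated scan of the big one

# Both fetch exactly 1000 rows; without an ORDER BY, the LIMIT lets the
# engine stop after the first 1000, so the timings are in the same ballpark.
print(n1, t1)
print(n2, t2)
```

The exact timings depend on the engine and storage layout, but on any reasonable engine the LIMIT query does not pay for the rows it never reads.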

Edit 2:
Here is why I'm asking this:

I have a database, with 5 tables and relationships between some of them. One of those tables will (I'm 100% sure) contain about 5.000.000 records. I'm using SQL Server CE 3.5, Entity Framework as the ORM and LINQ to SQL to make the queries.

I need to perform basically three kinds of non-simple queries, and I was thinking about showing the user a limited number of records (just like a lot of websites do). If the user wants to see more records, the option he/she has is to restrict the search further.

So, the question came up because I was thinking about doing this (limiting to X records per query), or alternatively storing only the X most recent results in the database, which would require doing some deletions, but I was just thinking...

So, that table could contain 5.000.000 records or more, and what I don't want is to show the user only 1000 or so and still have the query run as slowly as if it were returning all 5.000.000 rows.

Comments (3)

旧竹 2024-11-09 08:08:06


TAKE 1000 from a table of 1000000 records - it will be roughly 1000000/1000 (= 1000) times faster, because it only needs to look at (and return) 1000 of the 1000000 records. Since it does less, it is naturally faster.

The result will be pretty much (pseudo-)random, since you haven't specified any order in which to TAKE. However, if you do introduce an order, then one of the two cases below applies:

  1. The ORDER BY clause follows an index - the above statement is still true.
  2. The ORDER BY clause cannot use any index - it will be only marginally faster than without the TAKE, because
    • it has to inspect ALL records, and sort by ORDER BY
    • deliver only a subset (TAKE count)
    • so it is not faster in the first step, but the 2nd step involves less IO/network than ALL records

If you TAKE 1000 records from a table of 1000 records, it will be equivalent (with no significant difference) to taking 1000 records from 1 billion, as long as there is either no ORDER BY at all, or an ORDER BY that follows an index (case 1 above).
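The two ORDER BY cases above show up directly in the query planner. A small sketch using Python's sqlite3 (SQLite's plan output, not SQL Server CE's; the table and the index name `idx_created` are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER, created TEXT, body TEXT)")
cur.execute("CREATE INDEX idx_created ON t (created)")

def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN output as one string."""
    return " ".join(row[-1] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

# Case 1: ORDER BY follows an index -- rows stream out of the index
# already sorted, so the LIMIT can stop after 1000 rows.
p1 = plan("SELECT * FROM t ORDER BY created LIMIT 1000")

# Case 2: ORDER BY cannot use any index -- every row must be sorted
# first; the LIMIT only trims what is delivered afterwards.
p2 = plan("SELECT * FROM t ORDER BY body LIMIT 1000")

print(p1)  # mentions the index (idx_created)
print(p2)  # mentions a temporary sort (TEMP B-TREE FOR ORDER BY)
```

The same distinction exists in most engines, only the plan-inspection syntax differs (e.g. `EXPLAIN` in MySQL, execution plans in SQL Server).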

坦然微笑 2024-11-09 08:08:06


Assuming both tables are equivalent in terms of index, row-sizing and other structures. Also assuming that you are running that simple SELECT statement. If you have an ORDER BY clause in your SQL statements, then obviously the larger table will be slower. I suppose you're not asking that.

If X = Y, then obviously they should run in similar speed, since the query engine will be going through the records in exactly the same order -- basically a table scan -- for this simple SELECT statement. There will be no difference in query plan.

If Y > X only by a little bit, then also similar speed.

However, if Y >> X (meaning Y has many more rows than X), then the LIMIT version MAY be slower. Not because of the query plan -- again, it should be the same -- but simply because the internal structure of the data layout may have several more levels. For example, if data is stored as leaves on a tree, there may be more tree levels, so it may take slightly more time to access the same number of pages.

In other words, 1000 rows may be stored in 1 tree level across 10 pages, say, while 1000000 rows may be stored in 3-4 tree levels across 10000 pages. Even when reading only 10 of those 10000 pages, the storage engine still has to traverse 3-4 tree levels, which may take slightly longer.

Now, if the storage engine stores data pages sequentially or as a linked list, say, then there will be no difference in execution speed.

只等公子 2024-11-09 08:08:06


It would be approximately linear, as long as you specify no fields, no ordering, and all the records. But that doesn't buy you much. It falls apart as soon as your query wants to do something useful.

This would be quite a bit more interesting if you intended to draw some useful conclusion and tell us about the way it would be used to make a design choice in some context.

Thanks for the clarification.

In my experience, real applications with real users seldom have interesting or useful queries that return entire million-row tables. Users want to know about their own activity, or a specific forum thread, etc. So unless yours is an unusual case, by the time you've really got their selection criteria in hand, you'll be talking about reasonable result sizes.

In any case, users wouldn't be able to do anything useful with more than a few hundred rows: transporting them would take a long time, and they couldn't scroll through them in any reasonable way.

MySQL has the LIMIT and OFFSET (starting record #) modifiers primarily for the exact purpose of creating chunks of a list for paging, as you describe.

It's way counterproductive to start thinking about schema design and record purging until you've used up this and a bunch of other strategies. In this case don't solve problems you don't have yet. Several-million-row tables are not big, practically speaking, as long as they are correctly indexed.
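A minimal sketch of that LIMIT/OFFSET paging pattern, again using Python's sqlite3 (SQLite accepts the same `LIMIT ... OFFSET` syntax as MySQL; the `posts` table, `fetch_page` helper, and page size are all hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
cur.executemany("INSERT INTO posts (title) VALUES (?)",
                ((f"post {i}",) for i in range(10_000)))

PAGE_SIZE = 50

def fetch_page(page):
    """Fetch one page of results; page numbers start at 0.

    The ORDER BY is essential: without a stable order, LIMIT/OFFSET
    pages can overlap or skip rows between requests.
    """
    return cur.execute(
        "SELECT id, title FROM posts ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()

first = fetch_page(0)   # rows 1..50
third = fetch_page(2)   # rows 101..150
print(first[0], third[0])
```

Note that large OFFSET values still make the engine walk past all the skipped rows, so for deep paging a keyset approach (`WHERE id > last_seen ORDER BY id LIMIT n`) scales better.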
