How much can using LIMIT in a SQL statement improve performance?
Let's suppose I have a table in my database with 1.000.000 records.
If I execute:
SELECT * FROM [Table] LIMIT 1000
will this query take the same time as if the table had only 1000 records and I just did:
SELECT * FROM [Table]
?
I'm not asking whether it will take exactly the same time. I just want to know whether the first one will take much more time to execute than the second one.
I said 1.000.000 records, but it could be 20.000.000. That was just an example.
Edit:
Of course, on the same table, a query with LIMIT should execute faster than one without it, but that's not what I'm asking...
To make it generic:
Table1: X records
Table2: Y records
(X << Y)
What I want to compare is:
SELECT * FROM Table1
and
SELECT * FROM Table2 LIMIT X
Edit 2:
Here is why I'm asking this:
I have a database with 5 tables and relationships between some of them. One of those tables will (I'm 100% sure) contain about 5.000.000 records. I'm using SQL Server CE 3.5, Entity Framework as the ORM and LINQ to SQL to make the queries.
I basically need to perform three kinds of non-simple queries, and I was thinking about showing the user a limited number of records (just like a lot of websites do). If the user wants to see more records, the option he/she has is to restrict the search further.
So the question came up because I was deciding between doing this (limiting each query to X records) and storing only X results (the most recent ones) in the database, which would require doing some deletions, but I was just thinking...
So that table could contain 5.000.000 records or more, and what I don't want is to show the user 1000 or so and still have the query be as slow as if it were returning all 5.000.000 rows.
Comments (3)
TAKE 1000 from a table of 1000000 records will be 1000000/1000 (= 1000) times faster because it only needs to look at (and return) 1000/1000000 records. Since it does less, it is naturally faster.
The result will be pretty (pseudo-)random, since you haven't specified any order in which to TAKE. However, if you do introduce an order, then one of two below becomes true:
ORDER BY
If you TAKE 1000 records from a table of 1000 records, it will be equivalent (with no significant difference) to TAKE 1000 records from a table of 1 billion, as long as you are in case (1) no ORDER BY, or (2) ORDER BY against an index.
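To make the ORDER BY distinction above concrete, here is a minimal sketch that asks a database for its query plan in the three situations: no ordering, ordering by an indexed column, and ordering by an unindexed column. It uses Python's built-in sqlite3 module with an in-memory SQLite database purely as a stand-in (the question is about SQL Server CE, whose planner and syntax may differ), and the table `records`, its columns and the index name are made up for illustration.

```python
import sqlite3


def plan_for(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


# Hypothetical table: one indexed column, one unindexed column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    "id INTEGER PRIMARY KEY, indexed_col INTEGER, plain_col INTEGER)"
)
conn.execute("CREATE INDEX idx_records_indexed ON records (indexed_col)")
conn.executemany(
    "INSERT INTO records (indexed_col, plain_col) VALUES (?, ?)",
    ((i % 997, (i * 7919) % 1000) for i in range(10000)),
)

# Case 0: no ORDER BY, the engine can stop after the first 1000 rows.
no_order = plan_for(conn, "SELECT * FROM records LIMIT 1000")
# Case 1: ORDER BY follows an index, rows can be read already sorted.
indexed_order = plan_for(
    conn, "SELECT * FROM records ORDER BY indexed_col LIMIT 1000"
)
# Case 2: ORDER BY on an unindexed column, every row must be read and sorted.
unindexed_order = plan_for(
    conn, "SELECT * FROM records ORDER BY plain_col LIMIT 1000"
)

for label, plan in [
    ("no ORDER BY", no_order),
    ("ORDER BY indexed column", indexed_order),
    ("ORDER BY unindexed column", unindexed_order),
]:
    print(label, "->", plan)
```

In SQLite, the plan for the unindexed ORDER BY reports a temporary B-tree for the sort, which means the whole table is read before the first row comes back; the other two plans have no such step and can stop once 1000 rows have been produced.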
Assuming both tables are equivalent in terms of index, row-sizing and other structures. Also assuming that you are running that simple SELECT statement. If you have an ORDER BY clause in your SQL statements, then obviously the larger table will be slower. I suppose you're not asking that.
If X = Y, then obviously they should run in similar speed, since the query engine will be going through the records in exactly the same order -- basically a table scan -- for this simple SELECT statement. There will be no difference in query plan.
If Y > X only by a little bit, then also similar speed.
However, if Y >> X (meaning Y has many, many more rows than X), then the LIMIT version MAY be slower. Not because of the query plan -- again, it should be the same -- but simply because the internal structure of the data layout may have several more levels. For example, if data is stored as leaves on a tree, there may be more tree levels, so it may take slightly more time to access the same number of pages.
In other words, 1000 rows may be stored in 1 tree level in 10 pages, say. 1000000 rows may be stored in 3-4 tree levels in 10000 pages. Even when taking only 10 pages from those 10000 pages, the storage engine still has to go through 3-4 tree levels, which may take slightly longer.
Now, if the storage engine stores data pages sequentially or as a linked list, say, then there will be no difference in execution speed.
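The tree-level arithmetic above can be sketched in a few lines. This is only a back-of-the-envelope model, not how any particular storage engine works: it assumes 100 rows per page (which gives the 10 and 10000 pages mentioned above) and a fan-out of 10 child pointers per internal node, a value picked only so the numbers line up with the example. Real engines usually have a much larger fan-out, so the difference in levels is smaller. The function names are hypothetical.

```python
def pages_needed(rows, rows_per_page):
    """Number of leaf pages needed to hold the rows (ceiling division)."""
    return -(-rows // rows_per_page)


def tree_levels(leaf_pages, fanout):
    """Count the levels of internal nodes needed above the leaf pages.

    Each internal node is assumed to point at up to `fanout` children.
    """
    levels = 0
    nodes = leaf_pages
    while nodes > 1:
        nodes = -(-nodes // fanout)  # parents needed for this level
        levels += 1
    return levels


ROWS_PER_PAGE = 100  # assumption for illustration
FANOUT = 10          # assumption for illustration; real values are larger

for rows in (1000, 1000000):
    pages = pages_needed(rows, ROWS_PER_PAGE)
    print(rows, "rows:", pages, "pages,", tree_levels(pages, FANOUT), "level(s)")
```

Under these assumptions the small table needs 1 level and the large one needs 4, so reaching any given page costs a few extra hops in the large table, which is a small constant overhead rather than something proportional to the table size.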
It would be approximately linear, as long as you specify no fields, no ordering, and all the records. But that doesn't buy you much. It falls apart as soon as your query wants to do something useful.
This would be quite a bit more interesting if you intended to draw some useful conclusion and tell us about the way it would be used to make a design choice in some context.
Thanks for the clarification.
In my experience, real applications with real users seldom have interesting or useful queries that return entire million-row tables. Users want to know about their own activity, or a specific forum thread, etc. So unless yours is an unusual case, by the time you've really got their selection criteria in hand, you'll be talking about reasonable result sizes.
In any case, users wouldn't be able to do anything useful with more than several hundred rows; transporting them would take a long time, and they couldn't scroll through them in any reasonable way.
MySQL has the LIMIT and OFFSET (starting record #) modifiers primarily for the exact purpose of creating chunks of a list for paging as you describe.
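As a rough illustration of that paging pattern, here is a small sketch using Python's built-in sqlite3 module, which accepts the same `LIMIT ... OFFSET ...` form as MySQL; other engines may spell this differently. The `posts` table and the `fetch_page` helper are hypothetical names used only for this example.

```python
import sqlite3

# Hypothetical table with 100 rows to page through.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (id, title) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 101)],
)


def fetch_page(conn, page, page_size):
    """Return one page of posts; page numbers start at 1."""
    offset = (page - 1) * page_size
    # ORDER BY on the primary key keeps the pages stable between calls.
    return conn.execute(
        "SELECT id, title FROM posts ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


first_page = fetch_page(conn, 1, 10)
third_page = fetch_page(conn, 3, 10)
print([row[0] for row in third_page])
```

Ordering by an indexed column matters here: without an ORDER BY the pages are not guaranteed to be consistent, and with an unindexed ORDER BY each page request would have to sort the whole table first.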
It's way counterproductive to start thinking about schema design and record purging until you've used up this and a bunch of other strategies. In this case don't solve problems you don't have yet. Several-million-row tables are not big, practically speaking, as long as they are correctly indexed.