如何连接到第一行
我将使用一个具体但假设的例子。
每个订单通常只有一个行项目:
订单:
OrderGUID OrderNumber
========= ============
{FFB2...} STL-7442-1
{3EC6...} MPT-9931-8A
行项目:
LineItemGUID Order ID Quantity Description
============ ======== ======== =================================
{098FBE3...} 1 7 prefabulated amulite
{1609B09...} 2 32 spurving bearing
但偶尔会有一个订单有两个行项目:
LineItemID Order ID Quantity Description
========== ======== ======== =================================
{A58A1...} 6,784,329 5 pentametric fan
{0E9BC...} 6,784,329 5 differential girdlespring
通常在向用户显示订单时:
SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
INNER JOIN LineItems
ON Orders.OrderID = LineItems.OrderID
我想显示订单上的单个项目。但是,如果订单偶尔包含两个(或多个)项目,订单将出现重复:
OrderNumber Quantity Description
=========== ======== ====================
STL-7442-1 7 prefabulated amulite
MPT-9931-8A 32 spurving bearing
KSG-0619-81 5 panametric fan
KSG-0619-81 5 differential girdlespring
我真正想要的是让 SQL Server只选择一个< /em>,因为它会足够好:
OrderNumber Quantity Description
=========== ======== ====================
STL-7442-1 7 prefabulated amulite
MPT-9931-8A 32 differential girdlespring
KSG-0619-81 5 panametric fan
如果我喜欢冒险,我可能会向用户展示一个省略号来表明有多个:
OrderNumber Quantity Description
=========== ======== ====================
STL-7442-1 7 prefabulated amulite
MPT-9931-8A 32 differential girdlespring
KSG-0619-81 5 panametric fan, ...
所以问题是如何
- 消除“重复”行
- 仅连接到其中一行,以避免重复
第一次尝试
我的第一次天真的尝试是仅连接到“TOP 1”行项目:
SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
INNER JOIN (
SELECT TOP 1 LineItems.Quantity, LineItems.Description
FROM LineItems
WHERE LineItems.OrderID = Orders.OrderID) LineItems2
ON 1=1
但这给出了错误:
列或前缀“订单”不存在
与表名或别名匹配
在查询中使用。
大概是因为内部选择看不到外部表。
I'll use a concrete, but hypothetical, example.
Each Order normally has only one line item:
Orders:
OrderGUID OrderNumber
========= ============
{FFB2...} STL-7442-1
{3EC6...} MPT-9931-8A
LineItems:
LineItemGUID Order ID Quantity Description
============ ======== ======== =================================
{098FBE3...} 1 7 prefabulated amulite
{1609B09...} 2 32 spurving bearing
But occasionally there will be an order with two line items:
LineItemID Order ID Quantity Description
========== ======== ======== =================================
{A58A1...} 6,784,329 5 pentametric fan
{0E9BC...} 6,784,329 5 differential girdlespring
Normally when showing the orders to the user:
SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
INNER JOIN LineItems
ON Orders.OrderID = LineItems.OrderID
I want to show the single item on the order. But with this occasional order containing two (or more) items, the orders would appear be duplicated:
OrderNumber Quantity Description
=========== ======== ====================
STL-7442-1 7 prefabulated amulite
MPT-9931-8A 32 spurving bearing
KSG-0619-81 5 panametric fan
KSG-0619-81 5 differential girdlespring
What I really want is to have SQL Server just pick one, as it will be good enough:
OrderNumber Quantity Description
=========== ======== ====================
STL-7442-1 7 prefabulated amulite
MPT-9931-8A 32 differential girdlespring
KSG-0619-81 5 panametric fan
If I get adventurous, I might show the user, an ellipsis to indicate that there's more than one:
OrderNumber Quantity Description
=========== ======== ====================
STL-7442-1 7 prefabulated amulite
MPT-9931-8A 32 differential girdlespring
KSG-0619-81 5 panametric fan, ...
So the question is how to either
- eliminate "duplicate" rows
- only join to one of the rows, to avoid duplication
First attempt
My first naive attempt was to only join to the "TOP 1" line items:
SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
INNER JOIN (
SELECT TOP 1 LineItems.Quantity, LineItems.Description
FROM LineItems
WHERE LineItems.OrderID = Orders.OrderID) LineItems2
ON 1=1
But that gives the error:
The column or prefix 'Orders' does not
match with a table name or alias name
used in the query.
Presumably because the inner select doesn't see the outer table.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(10)
对于 2005 年之前的 SQL Server 版本,您需要使用
INNER JOIN
而不是CROSS APPLY
:请注意
TOP 1
不带>ORDER BY
不是确定性的:此查询将为每个订单提供一个订单项,但未定义它是哪一个。多次调用查询可以为同一订单提供不同的行项目,即使底层证券没有更改。
如果您想要确定性顺序,则应将
ORDER BY
子句添加到最里面的查询。示例 sqlfiddle
For SQL Server versions prior to 2005, you need to use an
INNER JOIN
instead of aCROSS APPLY
:Please note that
TOP 1
withoutORDER BY
is not deterministic: this query you will get you one line item per order, but it is not defined which one will it be.Multiple invocations of the query can give you different line items for the same order, even if the underlying did not change.
If you want deterministic order, you should add an
ORDER BY
clause to the innermost query.Example sqlfiddle
我知道这个问题不久前已经得到回答,但是在处理大型数据集时,嵌套查询的成本可能很高。这是一个不同的解决方案,其中嵌套查询仅运行一次,而不是针对返回的每一行运行。
I know this question was answered a while ago, but when dealing with large data sets, nested queries can be costly. Here is a different solution where the nested query will only be ran once, instead of for each row returned.
@Quassnoi 答案很好,在某些情况下(特别是如果外表很大),更有效的查询可能是使用窗口函数,如下所示:
有时您只需 需要测试哪个查询提供更好的性能。
@Quassnoi answer is good, in some cases (especially if the outer table is big), a more efficient query might be with using windowed functions, like this:
Sometimes you just need to test which query gives better performance.
您可以这样做:
这需要在
LineItems.LineItemID
上有一个索引(或主键),在LineItems.OrderID
上有一个索引,否则速度会很慢。You could do:
This requires an index (or primary key) on
LineItems.LineItemID
and an index onLineItems.OrderID
or it will be slow.从 SQL Server 2012 开始,我认为这可以解决问题:
From SQL Server 2012 and onwards I think this will do the trick:
,另一种使用公共表表达式的方法:
或者,最后您可能想显示所有连接的行?
此处以逗号分隔的版本:
,Another aproach using common table expression:
or, in the end maybe you would like to show all rows joined?
comma separated version here:
相关子查询是依赖于外部查询的子查询。这就像 SQL 中的 for 循环。子查询将为外部查询中的每一行运行一次:
Correlated sub queries are sub queries that depend on the outer query. It’s like a for loop in SQL. The sub-query will run once for each row in the outer query:
我最喜欢的运行此查询的方法是使用 not contains 子句。我相信这是运行此类查询的最有效方法:
但我尚未针对此处建议的其他方法测试此方法。
My favorite way to run this query is with a not exists clause. I believe this is the most efficient way to run this sort of query:
But I have not tested this method against other methods suggested here.
编辑:没关系,Quassnoi 有更好的答案。
对于 SQL2K,类似这样:
EDIT: nevermind, Quassnoi has a better answer.
For SQL2K, something like this:
尝试过交叉,效果很好,但需要更长的时间。调整行列以具有最大和添加的组,从而保持速度并删除额外的记录。
这是调整后的查询:
Tried the cross, works nicely, but takes slightly longer. Adjusted line columns to have max and added group which kept speed and dropped the extra record.
Here's the adjusted query: