Improve the speed of this string_agg?
I have data of the following shape:
BOM -- 500 rows, 4 cols
PartProject -- 2.6mm rows, 4 cols
Project -- 1000 rows, 5 cols
Part -- 200k rows, 18 cols
Yet when I try to do string_agg, my code takes well over 10 minutes to execute on 500 rows. How can I improve this query? (The data is not available.)
select
    BOM.*,
    childParentPartProjectName
into #tt2 -- tt for some testing
from #tt1 AS BOM -- tt for some testing
-- cross applys for string agg many to one
CROSS APPLY (
    SELECT childParentPartProjectName = STRING_AGG(PROJECT_childParentPart.NAME, ', ') WITHIN GROUP (ORDER BY PROJECT_childParentPart.NAME)
    FROM (
        SELECT DISTINCT PROJECT3.NAME
        FROM [dbo].[Project] PROJECT3
        LEFT JOIN [dbo].[Part] P3 on P3.ITEM_NUMBER = BOM.childParentPart
        LEFT JOIN [dbo].[PartProject] PP3 on PP3.SOURCE_ID = P3.ID
        WHERE PP3.RELATED_ID = PROJECT3.ID and P3.CURRENT = 1
    ) PROJECT_childParentPart
) PROJECT3
Comments (1)
The subquery (within a subquery) you have has a code "smell" to it: it looks like it was written with intention, but not correctly.

Firstly, you have 2 LEFT JOINs in the subquery; however, both of the tables aliased as P3 and PP3 are required to have a non-NULL value in the WHERE clause, which is impossible if no related row is found. This means the JOINs are implicit INNER JOINs.

Next, you have a DISTINCT against a single column when SELECTing from multiple tables; this seems wrong. DISTINCT is very expensive, and the fact that you are using it implies that either NAME is not unique or that, due to your implicit INNER JOINs, you are getting duplicate rows. I assume it's the latter. As a result, very likely you should actually be using an EXISTS, not JOINs.

The following is very much a guess, but I suspect it will be more performant.
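The answer's own query did not survive on this page, so what follows is only a sketch of the EXISTS rewrite described above, not the answerer's exact code. The table and column names come from the question; the aliases Prj, P, PP and Agg are made up for illustration. It is untested (no data is available), and it assumes Project.NAME is unique, which is what lets the DISTINCT be dropped. CURRENT is bracketed because it is a reserved word in T-SQL.

```sql
SELECT BOM.*,
       Agg.childParentPartProjectName
INTO #tt2
FROM #tt1 AS BOM
     CROSS APPLY (
         -- One aggregated, ordered list of project names per BOM row.
         -- A scalar aggregate still returns a row (NULL) when nothing matches,
         -- so BOM rows without projects are kept, as in the original query.
         SELECT childParentPartProjectName =
                    STRING_AGG(Prj.NAME, ', ') WITHIN GROUP (ORDER BY Prj.NAME)
         FROM dbo.Project AS Prj
         -- EXISTS only checks that a link exists, so each project is
         -- returned at most once and no DISTINCT is needed
         -- (assumption: Project.NAME is unique).
         WHERE EXISTS (SELECT 1
                       FROM dbo.Part AS P
                            INNER JOIN dbo.PartProject AS PP
                                ON PP.SOURCE_ID = P.ID
                       WHERE P.ITEM_NUMBER = BOM.childParentPart
                         AND PP.RELATED_ID = Prj.ID
                         AND P.[CURRENT] = 1)
     ) AS Agg;
```

If Project.NAME turns out not to be unique, the same name could appear more than once in the aggregated string, and the names would need de-duplicating before STRING_AGG as in the original query.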