提高这个string_agg的速度?

发布于 2025-01-11 09:04:18 字数 901 浏览 0 评论 0原文

我有以下形状的数据:

BOM -- 500 rows, 4 cols
PartProject -- 2.6mm rows, 4 cols
Project -- 1000 rows, 5 cols
Part -- 200k rows, 18 cols

然而,当我尝试执行 string_agg 时,我的代码将花费 10 多分钟才能在 500 行上执行。我该如何改进这个查询(数据不可用)。

select
    BOM.*,
    childParentPartProjectName
into #tt2 -- tt for some testing
from #tt1 AS BOM -- tt for some testing
-- cross applys for string agg many to one
CROSS APPLY (
    SELECT childParentPartProjectName = STRING_AGG(PROJECT_childParentPart.NAME, ', ') WITHIN GROUP (ORDER BY PROJECT_childParentPart.NAME)
    FROM (
        SELECT DISTINCT PROJECT3.NAME
    FROM [dbo].[Project] PROJECT3
    LEFT JOIN [dbo].[Part] P3 on P3.ITEM_NUMBER = BOM.childParentPart
    LEFT JOIN [dbo].[PartProject] PP3 on PP3.SOURCE_ID = P3.ID
    WHERE PP3.RELATED_ID = PROJECT3.ID and P3.CURRENT = 1
) PROJECT_childParentPart ) PROJECT3

I have data of the following shape:

BOM -- 500 rows, 4 cols
PartProject -- 2.6mm rows, 4 cols
Project -- 1000 rows, 5 cols
Part -- 200k rows, 18 cols

Yet when I try to do string_agg, my code will take me well over 10 minutes to execute on 500 rows. How can I improve this query (the data is not available).

select
    BOM.*,
    childParentPartProjectName
into #tt2 -- tt for some testing
from #tt1 AS BOM -- tt for some testing
-- cross applys for string agg many to one
CROSS APPLY (
    SELECT childParentPartProjectName = STRING_AGG(PROJECT_childParentPart.NAME, ', ') WITHIN GROUP (ORDER BY PROJECT_childParentPart.NAME)
    FROM (
        SELECT DISTINCT PROJECT3.NAME
    FROM [dbo].[Project] PROJECT3
    LEFT JOIN [dbo].[Part] P3 on P3.ITEM_NUMBER = BOM.childParentPart
    LEFT JOIN [dbo].[PartProject] PP3 on PP3.SOURCE_ID = P3.ID
    WHERE PP3.RELATED_ID = PROJECT3.ID and P3.CURRENT = 1
) PROJECT_childParentPart ) PROJECT3

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

紙鸢 2025-01-18 09:04:18

您拥有的子查询(子查询内)有一种代码“气味”,它是有意编写的,但不正确。

首先,子查询中有 2 个 LEFT JOIN,但是,别名为 P3PP3 的表都需要有一个非<代码>NULL值;如果没有找到相关行,这是不可能的。这意味着JOIN是隐式的INNER JOIN

接下来,当从多个表中进行SELECT时,您会对单个列进行DISTINCT;这似乎是错误的。 DISTINCT 的成本非常,而且您使用它的事实意味着 NAME 不是唯一的,或者是由于您的隐式 INNER JOIN 造成的你会得到重复的行。我认为是后者。因此,很可能您实际上应该使用 EXISTS,而不是 LEFT JOIN INNER JOIN

以下内容很大程度上是猜测,但我怀疑它的性能会更高。

SELECT BOM.*, --Replace this with an explicit list of the columns you need
       SA.childParentPartProjectName
INTO #tt2
FROM #tt1 BOM
     CROSS APPLY (SELECT STRING_AGG(Prj.NAME, ', ') WITHIN GROUP (ORDER BY Prj.NAME) AS childParentPartProjectName
                  FROM dbo.Project Prj --Don't use an alias that is longer than the object name
                  WHERE EXISTS (SELECT 1
                                FROM dbo.Part P
                                     JOIN dbo.PartProject PP ON P.ID = PP.SOURCE_ID
                                WHERE PP.Related_ID = Prg.ID
                                  AND P.ITEM_NUMBER = BOM.childParentPart
                                  AND P.Current = 1)) SA;

The subquery (within a subquery) you have has a code "smell" to it that it's been written with intention, but not correctly.

Firstly you have 2 LEFT JOINs in the subquery, however, both the tables aliased as P3 and PP3 are required to have a non-NULL value; that is impossible if no related row is found. This means the JOINs are implicit INNER JOINs.

Next you have a DISTINCT against a single column when SELECTing from multiple tables; this seems wrong. DISTINCT is very expensive and the fact you are using it implies that either NAME is not unique or that due to your implicit INNER JOINs you are getting duplicate rows. I assume it's the latter. As a results, very likely you should actually be using an EXISTS, not LEFT JOINs INNER JOINs.

The following is very much a guess, but I suspect it will be more performant.

SELECT BOM.*, --Replace this with an explicit list of the columns you need
       SA.childParentPartProjectName
INTO #tt2
FROM #tt1 BOM
     CROSS APPLY (SELECT STRING_AGG(Prj.NAME, ', ') WITHIN GROUP (ORDER BY Prj.NAME) AS childParentPartProjectName
                  FROM dbo.Project Prj --Don't use an alias that is longer than the object name
                  WHERE EXISTS (SELECT 1
                                FROM dbo.Part P
                                     JOIN dbo.PartProject PP ON P.ID = PP.SOURCE_ID
                                WHERE PP.Related_ID = Prg.ID
                                  AND P.ITEM_NUMBER = BOM.childParentPart
                                  AND P.Current = 1)) SA;
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文