雪花:在层次数据中的旗帜无限循环,而无需在雪花中使用光标

发布于 2025-01-28 20:51:34 字数 3613 浏览 3 评论 0原文

我正在努力解决SQL代码性能问题,同时试图提前识别具有大约1.5 mo行候选的数据集的循环引用。

在进入自定义解决方案之前,我通过文档检查了Snowflake Connect,但是他们当前的实现具有一些不支持的功能。

因此,我不得不使用B计划移动,该计划涉及使用光标进行雪橇,但是即使在调整了我的输入数据集后,性能也确实很糟糕,在该数据集将记录数减少一半的情况下(默认情况下,它们不是循环引用,因为它们共享,同一父id,或者它们是没有任何父级的根记录)。

我四处寻找任何不涉及光标的解决方案,但是SQL Server Code将其转移到雪花上不符合我的期望。

我仍然留下了C计划:预先对我们要管理的深度水平进行了硬编码,但这绝对不是我首选的选择。

这里的处理速度是必须拥有的,因为将来的流程每天都会在生产中运行,并且数据量超过2000万行。

-- DUMMY table
CREATE OR REPLACE TRANSIENT TABLE DUMMY_HIERARCHY COPY GRANTS
(
    TERR_CD VARCHAR(6),
    CUST_NO VARCHAR(100),
    PAR_CUST_NO VARCHAR(100),
    IS_CYCLE BOOLEAN DEFAULT FALSE
);

-- DUMMY records ingestion
INSERT OVERWRITE INTO DUMMY_HIERARCHY  (TERR_CD,CUST_NO,PAR_CUST_NO,IS_CYCLE) VALUES
('1100A','ABDD12346','ABCD12345',TRUE),
('1100A','ABCD12345','ABDD12346',TRUE),
('1200A','ABDD12346',NULL,FALSE),
('1200A','ABCD12345',NULL,FALSE),
('1300A','ABDD12346',NULL,FALSE),
('1300A','888888888',NULL,FALSE),
('3100A','ABDD12346',NULL,FALSE),
('3200B','FFF789635',NULL,FALSE),
('3200B','ABCD5698',NULL,FALSE),
('3200B','ABCD5698',NULL,FALSE),
('3200B','888888888',NULL,FALSE),
('3200B','9999998',NULL,FALSE),
('3300C','ABDD12346',NULL,FALSE),
('5500C','888888888',NULL,FALSE),
('5510C','ABDD12346',NULL,FALSE),
('5700C','12345','PPF0001',FALSE),
('5300B','88DEF8770','89DIF9990',FALSE),
('5300B','88DEF8778','89DIF9990',FALSE),
('5300B','89DIF9990',NULL,FALSE),
('5300B','88DIF9770','89DIF9990',FALSE),
('7701D','ZZZ014445','ZZZ012345',TRUE),
('7701D','ZZZ012345','ZZZ014445',TRUE),
('7701D','DFGERT345',NULL,FALSE),
('7801A','ERT47890','AP126400',TRUE),
('7801A','AP123400','AP126400',TRUE),
('7801A','AP126400','AP123400',TRUE),
('7902A','ABCDEFG','XYZ234',TRUE),
('7902A','XYZ234','ABCDEFG',TRUE),
('7851A','FFF789635',NULL,FALSE),
('7851A','100001A',NULL,FALSE),
('7908B','12345',NULL,FALSE),
('7909A','9999998',NULL,FALSE),
('8801C','12345',NULL,FALSE),
('9901D','12345',NULL,FALSE),
('9901D','FFF789635','100001A',FALSE),
('9901D','100001A','12345',FALSE),
('9901D','ABCD5698','888888888',FALSE),
('9901D','888888888','9999998',FALSE),
('9901D','9999998','FFF789635',FALSE);

-- Anonymous block to perform row by row processing without using a SPROC
DECLARE
    -- Get CH records to be processed
    cItems CURSOR FOR SELECT TERR_CD, ANY_VALUE(CUST_NO) AS CUST_NO, PAR_CUST_NO FROM DUMMY_HIERARCHY WHERE NOT IS_CYCLE AND PAR_CUST_NO IS NOT NULL GROUP BY TERR_CD, PAR_CUST_NO;
    vTCD VARCHAR(8);
    vCN VARCHAR(100);
    vPCN VARCHAR(100);
    vNextTCD VARCHAR(8);
    vNextCN VARCHAR(100);
    vNextPCN VARCHAR(100);
    vOutput VARCHAR;
BEGIN
        
        -- Row by row processing
    FOR item IN cItems DO
        vTCD := item.TERR_CD;
        vCN := item.CUST_NO;
        vPCN := item.PAR_CUST_NO;
        vOutput := 'OK';
        -- recreate ARRAY
        LOOP
            SELECT TERR_CD, ANY_VALUE(CUST_NO) AS CUST_NO, PAR_CUST_NO INTO :vNextTCD, :vNextCN, :vNextPCN FROM DUMMY_HIERARCHY WHERE NOT IS_CYCLE AND PAR_CUST_NO IS NOT NULL AND CUST_NO =:vCN AND TERR_CD =:vTCD GROUP BY TERR_CD, PAR_CUST_NO;
            IF (vNextPCN IS NOT NULL) THEN
                vCN := vNextPCN;
                vTCD := vNextTCD;
            ELSE
                vOutput := vNextTCD || '+'|| vNextCN || '+'|| COALESCE(vNextPCN,'') || ' KO';
                BREAK;
            END IF;
        END LOOP; 
    END FOR;
    RETURN vOutput;
END;

注意:我确实有一个单独的版本的代码,其中voutput是一个数组,我依靠(不是array_contains(vnextpcn :: variant,votput),vnextpcn not vnextpcn)然后) 要执行Arrey_append,以跟踪我们已经进行的CUST NOS。当发现时,我们需要退出循环并将错误消息或悲伤记录提出到单独的表中。

预先感谢您的帮助和/或建议。

丹尼尔

I'm struggling with a SQL code performance issue, while trying to identify in advance circular references of a dataset with approximately 1.5 MO rows candidates.

Before heading into a custom solution, I checked Snowflake CONNECT BY documentation but their current implementation has some unsupported features.

So I had to move with B Plan which involves using a cursor in Snowscripting, but the performance is really awful even after tweaking my input dataset WHERE condition to reduce the number of records by half (by default they're not circular references since they share the same parent id or they're a root record without any parent).

I've looked around for any solution that would not involve a cursor, but SQL Server code transpose into Snowflake did not meet my expectations.

I'm still left the C plan: Hardcoded in advance the level of depth we'll be managing, but this is definitively not my preferred option.

Processing speed here is a must have since future process will run on a daily basis in production, with a volume of data exceeding 20 million rows.

-- DUMMY table
CREATE OR REPLACE TRANSIENT TABLE DUMMY_HIERARCHY COPY GRANTS
(
    TERR_CD VARCHAR(6),
    CUST_NO VARCHAR(100),
    PAR_CUST_NO VARCHAR(100),
    IS_CYCLE BOOLEAN DEFAULT FALSE
);

-- DUMMY records ingestion
INSERT OVERWRITE INTO DUMMY_HIERARCHY  (TERR_CD,CUST_NO,PAR_CUST_NO,IS_CYCLE) VALUES
('1100A','ABDD12346','ABCD12345',TRUE),
('1100A','ABCD12345','ABDD12346',TRUE),
('1200A','ABDD12346',NULL,FALSE),
('1200A','ABCD12345',NULL,FALSE),
('1300A','ABDD12346',NULL,FALSE),
('1300A','888888888',NULL,FALSE),
('3100A','ABDD12346',NULL,FALSE),
('3200B','FFF789635',NULL,FALSE),
('3200B','ABCD5698',NULL,FALSE),
('3200B','ABCD5698',NULL,FALSE),
('3200B','888888888',NULL,FALSE),
('3200B','9999998',NULL,FALSE),
('3300C','ABDD12346',NULL,FALSE),
('5500C','888888888',NULL,FALSE),
('5510C','ABDD12346',NULL,FALSE),
('5700C','12345','PPF0001',FALSE),
('5300B','88DEF8770','89DIF9990',FALSE),
('5300B','88DEF8778','89DIF9990',FALSE),
('5300B','89DIF9990',NULL,FALSE),
('5300B','88DIF9770','89DIF9990',FALSE),
('7701D','ZZZ014445','ZZZ012345',TRUE),
('7701D','ZZZ012345','ZZZ014445',TRUE),
('7701D','DFGERT345',NULL,FALSE),
('7801A','ERT47890','AP126400',TRUE),
('7801A','AP123400','AP126400',TRUE),
('7801A','AP126400','AP123400',TRUE),
('7902A','ABCDEFG','XYZ234',TRUE),
('7902A','XYZ234','ABCDEFG',TRUE),
('7851A','FFF789635',NULL,FALSE),
('7851A','100001A',NULL,FALSE),
('7908B','12345',NULL,FALSE),
('7909A','9999998',NULL,FALSE),
('8801C','12345',NULL,FALSE),
('9901D','12345',NULL,FALSE),
('9901D','FFF789635','100001A',FALSE),
('9901D','100001A','12345',FALSE),
('9901D','ABCD5698','888888888',FALSE),
('9901D','888888888','9999998',FALSE),
('9901D','9999998','FFF789635',FALSE);

-- Anonymous block to perform row by row processing without using a SPROC
DECLARE
    -- Get CH records to be processed
    cItems CURSOR FOR SELECT TERR_CD, ANY_VALUE(CUST_NO) AS CUST_NO, PAR_CUST_NO FROM DUMMY_HIERARCHY WHERE NOT IS_CYCLE AND PAR_CUST_NO IS NOT NULL GROUP BY TERR_CD, PAR_CUST_NO;
    vTCD VARCHAR(8);
    vCN VARCHAR(100);
    vPCN VARCHAR(100);
    vNextTCD VARCHAR(8);
    vNextCN VARCHAR(100);
    vNextPCN VARCHAR(100);
    vOutput VARCHAR;
BEGIN
        
        -- Row by row processing
    FOR item IN cItems DO
        vTCD := item.TERR_CD;
        vCN := item.CUST_NO;
        vPCN := item.PAR_CUST_NO;
        vOutput := 'OK';
        -- recreate ARRAY
        LOOP
            SELECT TERR_CD, ANY_VALUE(CUST_NO) AS CUST_NO, PAR_CUST_NO INTO :vNextTCD, :vNextCN, :vNextPCN FROM DUMMY_HIERARCHY WHERE NOT IS_CYCLE AND PAR_CUST_NO IS NOT NULL AND CUST_NO =:vCN AND TERR_CD =:vTCD GROUP BY TERR_CD, PAR_CUST_NO;
            IF (vNextPCN IS NOT NULL) THEN
                vCN := vNextPCN;
                vTCD := vNextTCD;
            ELSE
                vOutput := vNextTCD || '+'|| vNextCN || '+'|| COALESCE(vNextPCN,'') || ' KO';
                BREAK;
            END IF;
        END LOOP; 
    END FOR;
    RETURN vOutput;
END;

Note: I do have a separate version of the code where vOutput is an ARRAY and I rely on IF (NOT ARRAY_CONTAINS(vNextPCN::variant,vOutput) AND (vNextPCN IS NOT NULL)) THEN
to perform an ARREY_APPEND to keep track of the Cust Nos we've procesed already. When found we need to exit the loop and raise an error message or grief record into a separate table.

Thanks a lot in advance for your help and/or suggestions.

Daniel

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。
列表为空,暂无数据
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文