How do I update one table based on another table, using values from groups of rows?

Posted on 2025-01-11 16:56:48

I have two tables:

CREATE TABLE Employee (
   Site       ????         ????,
   WorkTypeId char(2)  NOT NULL,
   Emp_NO     int      NOT NULL,
   "Date"     ????     NOT NULL  
);

CREATE TABLE PTO (
   Site       ????         ????,
   WorkTypeId char(2)      NULL,
   Emp_NO     int      NOT NULL,
   "Date"     ????     NOT NULL  
);

I would like to update values in PTO's WorkTypeId column:

  1. Emp_NO in Employee (the lookup table) and PTO should match.
  2. A single WorkTypeId value should be picked from only the first occurrence of the month.

For example, given this sample input data:

TABLE Employee:

Site | WorkTypeId | Emp_NO | Date      
---: | :--------- | -----: | :---------
5015 | MB         |   1005 | 2022-02-01
5015 | MI         |   1005 | 2022-02-04
5015 | PO         |   1005 | 2022-02-04
5015 | ME         |   2003 | 2022-01-01
5015 | TT         |   2003 | 2022-01-10

TABLE PTO:

Site | WorkTypeId | Emp_NO | Date      
---: | :--------- | -----: | :---------
5015 |            |   1005 | 2022-02-03
5015 |            |   1005 | 2022-02-14
5014 |            |   2003 | 2022-01-09

For example:

  • Given Employee with Emp_NO = 1005...
    • ...there are 3 rows for that Emp_NO in the Employee table, with 3 distinct WorkTypeId values and differing Date values.
    • So pick the WorkTypeId value for the earliest Date (2022-02-01), which is 'MB'.
    • So Emp_NO = 1005 gets WorkTypeId = 'MB'.
    • And use that single value to fill 1005's WorkTypeId cells in the PTO table.
    • But also match by month: only Employee rows from the same month as the PTO row are considered.

So the expected output in the PTO table is

Site | WorkTypeId | Emp_NO | Date      
---: | :--------- | -----: | :---------
5015 | MB         |   1005 | 2022-02-03
5015 | MB         |   1005 | 2022-02-14
5014 | ME         |   2003 | 2022-01-09

世俗缘 2025-01-18 16:56:49

Update 2022-03-05

Leaving this here for posterity, but I'd recommend reading Dai's excellent write up on different approaches to this problem.


Try a CROSS APPLY to grab the first Employee record with a matching month and year.

Note: Use OUTER APPLY to always return all PTO records, even when no matching WorkTypeId was found.

SELECT p.Site
       , e.WorkTypeId
       , p.Emp_No
       , p.[Date]
FROM  PTO p CROSS APPLY 
        (
          -- Earliest Employee row for the same employee, month and year
          SELECT TOP (1) e.WorkTypeId
          FROM   Employee e 
          WHERE  e.Emp_No = p.Emp_No
          AND    MONTH(e.[Date]) = MONTH(p.[Date])
          AND    YEAR(e.[Date]) = YEAR(p.[Date])
          ORDER BY e.[Date] ASC
        ) e

Results:

Site | WorkTypeId | Emp_No | Date      
---: | :--------- | -----: | :---------
5015 | MB         |   1005 | 2022-02-03
5015 | MB         |   1005 | 2022-02-14
5014 | ME         |   2003 | 2022-01-09

db<>fiddle here
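
The query above only selects the looked-up value, whereas the question asks for an UPDATE. The same CROSS APPLY lookup could be written as an UPDATE roughly as follows. This is a minimal sketch, assuming the Employee and PTO tables from the question. The `WHERE ... IS NULL` filter and the secondary sort on WorkTypeId (a tie-breaker for rows sharing the same date) are assumptions of this sketch, not part of the original answer.

```sql
-- Sketch: fill PTO.WorkTypeId from the earliest Employee row
-- for the same employee, month and year.
UPDATE p
SET    p.WorkTypeId = e.WorkTypeId
FROM   PTO AS p
CROSS APPLY
       (
         SELECT TOP (1) emp.WorkTypeId
         FROM   Employee AS emp
         WHERE  emp.Emp_NO = p.Emp_NO
         AND    MONTH(emp.[Date]) = MONTH(p.[Date])
         AND    YEAR(emp.[Date])  = YEAR(p.[Date])
         ORDER BY emp.[Date] ASC,
                  emp.WorkTypeId ASC   -- assumed tie-breaker, not in the original
       ) AS e
WHERE  p.WorkTypeId IS NULL;           -- assumption: only fill empty cells
```

Because CROSS APPLY drops PTO rows that have no matching Employee row, those rows are simply left untouched by this statement.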

尘世孤行 2025-01-18 16:56:48

Getting a value from a column other than the one used in a MIN/MAX expression in a GROUP BY query remains surprisingly difficult in SQL. Modern versions of the SQL language (and SQL Server) make it easier, but the techniques are non-obvious and counter-intuitive to most people: despite the conceptually simple nature of the query, it necessarily involves more advanced topics like CTEs, derived tables (aka inner queries), self-joins and window functions.

Anyway, as ever in modern SQL, there are usually 3 or 4 different ways to accomplish the same task, with a few gotchas.

Preface:

  • As Site, Date, Year, and Month are all keywords in T-SQL, I've escaped them with double-quotes, which is the ISO/ANSI SQL Standards compliant way to escape reserved words.

    • SQL Server supports this by default. If (for some ungodly reason) you have SET QUOTED_IDENTIFIER OFF then change the double-quotes to square-brackets: []
  • I assume that the Site column in both tables is just a plain ol' data column, as such:

    • It is not a PRIMARY KEY member column.
    • It should not be used in a GROUP BY.
    • It should not be used in a JOIN predicate.
  • All of the approaches below assume this database state:

CREATE TABLE "Employee" (
    "Site"     int      NOT NULL,
    WorkTypeId char(2)  NOT NULL,
    Emp_NO     int      NOT NULL,
    "Date"     date     NOT NULL  
);

CREATE TABLE "PTO" (
    "Site"     int      NOT NULL,
    WorkTypeId char(2)      NULL,
    Emp_NO     int      NOT NULL,
    "Date"     date     NOT NULL  
);

GO

INSERT INTO "Employee" ( "Site", WorkTypeId, Emp_NO, "Date" )
VALUES
( 5015, 'MB', 1005, '2022-02-01' ),
( 5015, 'MI', 1005, '2022-02-04' ),
( 5015, 'PO', 1005, '2022-02-04' ),
( 5015, 'ME', 2003, '2022-01-01' ),
( 5015, 'TT', 2003, '2022-01-10' );

INSERT INTO "PTO" ( "Site", WorkTypeId, Emp_NO, "Date" )
VALUES
( 5015, NULL, 1005, '2022-02-03' ),
( 5015, NULL, 1005, '2022-02-14' ),
( 5014, NULL, 2003, '2022-01-09' );
  • Approach 1 defines CTEs e and p that extend Employee and PTO respectively (Approach 2 defines only e) to add computed "Year" and "Month" columns, which avoids having to repeat YEAR( "Date" ) and MONTH( "Date" ) in GROUP BY and JOIN expressions.
    • I suggest you add those as computed-columns in your base tables, if you're able, as they'll be useful generally anyway. Don't forget to index them appropriately too.
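
The computed-column suggestion above might look something like the following. This is only a sketch, assuming the Employee table defined earlier. The index name `IX_Employee_EmpNo_Year_Month` and the choice of key and included columns are hypothetical, so adjust them to your own workload.

```sql
-- Sketch only: add persisted computed columns to Employee.
-- YEAR() and MONTH() over a date column are deterministic, so PERSISTED is allowed.
ALTER TABLE "Employee" ADD
    "Year"  AS YEAR( "Date" )  PERSISTED,
    "Month" AS MONTH( "Date" ) PERSISTED;
GO

-- Hypothetical index supporting per-employee, per-month lookups of the earliest row.
-- Creating an index on computed columns requires SET QUOTED_IDENTIFIER ON.
CREATE INDEX IX_Employee_EmpNo_Year_Month
    ON "Employee" ( Emp_NO, "Year", "Month", "Date" )
    INCLUDE ( WorkTypeId );
GO
```

The same pattern could be applied to the PTO table. The CTEs in the approaches below list their columns explicitly, so they keep working whether or not these columns exist in the base tables.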

Approach 1: Composed CTEs with elementary aggregates, then UPDATE:

WITH
-- Step 1: Extend both the `Employee` and `PTO` tables with YEAR and MONTH columns (this simplifies things later on):
e AS (
    SELECT
        Emp_No,
        "Site",
        WorkTypeId,
        "Date",

        YEAR( "Date" ) AS "Year",
        MONTH( "Date" ) AS "Month"
    FROM
        Employee
),
p AS (
    SELECT
        Emp_No,
        "Site",
        WorkTypeId,
        "Date",

        YEAR( "Date" ) AS "Year",
        MONTH( "Date" ) AS "Month"
    FROM
        PTO
),
-- Step 2: Get the MIN( "Date" ) value for each group:
minDatesForEachEmployeeMonthYearGroup AS (
    SELECT
        e.Emp_No,
        e."Year",
        e."Month",

        MIN( "Date" ) AS "FirstDate"
    FROM
        e
    GROUP BY
        e.Emp_No,
        e."Year",
        e."Month"
),
-- Step 3: INNER JOIN back on `e` to get the first WorkTypeId in each group:
firstWorkTypeIdForEachEmployeeMonthYearGroup AS (
    /* WARNING: If multiple rows (for the same Emp_NO, Year and Month) share the same earliest "Date" value, this query returns more than one row for that group and the UPDATE below will pick one of them arbitrarily. This can be papered-over with GROUP BY and MIN, but I don't think that's a good idea at all. */
    SELECT
        e.Emp_No,
        e."Year",
        e."Month",

        e.WorkTypeId AS FirstWorkTypeId
    FROM
        e
        INNER JOIN minDatesForEachEmployeeMonthYearGroup AS q ON
            e.Emp_NO = q.Emp_NO
            AND
            e."Date" = q.FirstDate
)
-- Step 4: Do the UPDATE.
-- *Yes*, you can UPDATE a CTE (provided the CTE is "simple" and has a 1:1 mapping back to source rows on-disk).
UPDATE
    p
SET
    p.WorkTypeId = f.FirstWorkTypeId
FROM
    p
    INNER JOIN firstWorkTypeIdForEachEmployeeMonthYearGroup AS f ON
        p.Emp_No = f.Emp_No
        AND
        p."Year" = f."Year"
        AND
        p."Month" = f."Month"
WHERE
    p.WorkTypeId IS NULL;

Here's a screenshot of SSMS showing the contents of the PTO table from before, and after, the above query runs:

[Screenshot: SSMS results showing the PTO table before and after the query runs]

Approach 2: Skip the self-JOIN with FIRST_VALUE:

This approach gives a shorter, slightly simpler query, but requires SQL Server 2012 or later (and that your database is running in compatibility-level 110 or higher).

Surprisingly, you cannot use FIRST_VALUE in a GROUP BY query, despite its obvious similarities with MIN, but an equivalent query can be built with SELECT DISTINCT:

WITH
-- Step 1: Extend the `Employee` table with YEAR and MONTH columns:
e AS (
    SELECT
        Emp_No,
        "Site",
        WorkTypeId,
        "Date",

        YEAR( "Date" ) AS "Year",
        MONTH( "Date" ) AS "Month"
    FROM
        Employee
),
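-- Step 2: Get the first WorkTypeId (ordered by "Date") for each Emp_NO/Year/Month group, using FIRST_VALUE with SELECT DISTINCT:
firstWorkTypeIdForEachEmployeeMonthYearGroup AS (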
    SELECT
        DISTINCT
        e.Emp_No,
        e."Year",
        e."Month",
        FIRST_VALUE( WorkTypeId ) OVER (
            PARTITION BY
                Emp_No,
                e."Year",
                e."Month"
            ORDER BY
                "Date" ASC
        ) AS FirstWorkTypeId
    FROM
        e
)
-- Step 3: UPDATE PTO:
UPDATE
    p
SET
    p.WorkTypeId = f.FirstWorkTypeId
FROM
    PTO AS p
    INNER JOIN firstWorkTypeIdForEachEmployeeMonthYearGroup AS f ON
        p.Emp_No = f.Emp_No
        AND
        YEAR( p."Date" ) = f."Year"
        AND
        MONTH( p."Date" ) = f."Month"
WHERE
    p.WorkTypeId IS NULL;

Doing a SELECT * FROM PTO after this runs gives me the exact same output as Approach 1.

Approach 2b, but made shorter:

Just so @SOS doesn't feel too smug about their SQL being considerably shorter than mine, the Approach 2 SQL above can be compacted down to this:

WITH empYrMoGroups AS (
    SELECT
        DISTINCT
        e.Emp_No,
        YEAR( e."Date" ) AS "Year",
        MONTH( e."Date" ) AS "Month",
        FIRST_VALUE( e.WorkTypeId ) OVER (
            PARTITION BY
                e.Emp_No,
                YEAR( e."Date" ),
                MONTH( e."Date" )
            ORDER BY
                e."Date" ASC
        ) AS FirstWorkTypeId
    FROM
        Employee AS e
)
UPDATE
    p
SET
    p.WorkTypeId = f.FirstWorkTypeId
FROM
    PTO AS p
    INNER JOIN empYrMoGroups AS f ON
        p.Emp_No = f.Emp_No
        AND
        YEAR( p."Date" ) = f."Year"
        AND
        MONTH( p."Date" ) = f."Month"
WHERE
    p.WorkTypeId IS NULL;
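
All of the approaches above rely on the earliest "Date" being unique within each employee-month (see the WARNING comment in Approach 1). In the sample data the only tie is on 2022-02-04, which is not an earliest date, so the results are unaffected. As a hedged sketch, the first query below lists groups where the earliest date is shared by several rows (it returns no rows for the sample data). The second swaps in ROW_NUMBER, a different technique from the FIRST_VALUE one above, with an assumed tie-breaker on WorkTypeId so the result stays deterministic.

```sql
-- 1. List employee-months whose earliest "Date" appears on more than one row.
SELECT
    e.Emp_NO,
    e."Date" AS FirstDate,
    COUNT(*) AS RowsOnFirstDate
FROM
    Employee AS e
    INNER JOIN (
        SELECT
            Emp_NO,
            MIN( "Date" ) AS FirstDate
        FROM
            Employee
        GROUP BY
            Emp_NO,
            YEAR( "Date" ),
            MONTH( "Date" )
    ) AS q ON
        e.Emp_NO = q.Emp_NO
        AND
        e."Date" = q.FirstDate
GROUP BY
    e.Emp_NO,
    e."Date"
HAVING
    COUNT(*) > 1;

-- 2. Deterministic variant: rank rows within each employee-month and keep rank 1.
--    Breaking ties by WorkTypeId is an assumption; use whatever rule fits your data.
WITH ranked AS (
    SELECT
        Emp_NO,
        YEAR( "Date" ) AS "Year",
        MONTH( "Date" ) AS "Month",
        WorkTypeId,
        ROW_NUMBER() OVER (
            PARTITION BY
                Emp_NO,
                YEAR( "Date" ),
                MONTH( "Date" )
            ORDER BY
                "Date" ASC,
                WorkTypeId ASC
        ) AS rn
    FROM
        Employee
)
UPDATE
    p
SET
    p.WorkTypeId = r.WorkTypeId
FROM
    PTO AS p
    INNER JOIN ranked AS r ON
        p.Emp_NO = r.Emp_NO
        AND
        YEAR( p."Date" ) = r."Year"
        AND
        MONTH( p."Date" ) = r."Month"
        AND
        r.rn = 1
WHERE
    p.WorkTypeId IS NULL;
```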
  • The execution-plans for both Approach 2 and Approach 2b are almost identical, except that Approach 2b has an extra Compute Scalar step for some reason.
  • The execution plans for Approach 1 and Approach 2 are very different, however, with Approach 1 having more branches than Approach 2 despite their similar semantics.
  • But my execution-plans won't match yours because they're very context-dependent, especially w.r.t. what indexes and PKs you have, and if there's any other columns involved, etc.
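
Since plans depend on your own schema and data, one way to compare the approaches yourself is to run each UPDATE inside a transaction that is rolled back, with I/O and timing statistics switched on. The following is a sketch under that assumption, using the Approach 2b statement as the example. In SSMS you can also enable "Include Actual Execution Plan" before running it.

```sql
-- Sketch: measure one approach without keeping its changes.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

BEGIN TRANSACTION;

    WITH empYrMoGroups AS (
        SELECT
            DISTINCT
            e.Emp_No,
            YEAR( e."Date" ) AS "Year",
            MONTH( e."Date" ) AS "Month",
            FIRST_VALUE( e.WorkTypeId ) OVER (
                PARTITION BY
                    e.Emp_No,
                    YEAR( e."Date" ),
                    MONTH( e."Date" )
                ORDER BY
                    e."Date" ASC
            ) AS FirstWorkTypeId
        FROM
            Employee AS e
    )
    UPDATE
        p
    SET
        p.WorkTypeId = f.FirstWorkTypeId
    FROM
        PTO AS p
        INNER JOIN empYrMoGroups AS f ON
            p.Emp_No = f.Emp_No
            AND
            YEAR( p."Date" ) = f."Year"
            AND
            MONTH( p."Date" ) = f."Month"
    WHERE
        p.WorkTypeId IS NULL;

-- Undo the UPDATE so the next approach starts from the same PTO contents.
ROLLBACK TRANSACTION;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The logical reads and elapsed times appear in the Messages tab. Swap in the other approaches' statements between BEGIN TRANSACTION and ROLLBACK TRANSACTION to compare them on the same data.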

Approach 1's plan looks like this:

[Image: execution plan for Approach 1]

Approach 2b's plan looks like this:

[Image: execution plan for Approach 2b]

@SOS's plan, for comparison, is a lot simpler... and I honestly don't know why, but it does show how good SQL Server's query optimizer is these days:

[Image: execution plan for @SOS's query]
