How to correctly use Row_Number() (partitioning) on my data pool

Posted on 2024-11-14 10:43:03


We have the following table (the output is already ordered and separated into partitions for readability):

| PK | FK1 | FK2 |   ActionCode |         CreationTS  | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
|  6 | 100 | 500 |       Create | 2011-01-02 00:00:00 |                  H |
----------------------------------------------------------------------------
|  3 | 100 | 500 |       Change | 2011-01-01 02:00:00 |                  Z |
|  2 | 100 | 500 |       Change | 2011-01-01 01:00:00 |                  X |
|  1 | 100 | 500 |       Create | 2011-01-01 00:00:00 |                  Y |
----------------------------------------------------------------------------
|  4 | 100 | 510 |       Create | 2011-01-01 00:30:00 |                  T |
----------------------------------------------------------------------------
|  5 | 100 | 520 | CreateSystem | 2011-01-01 00:30:00 |                  A |
----------------------------------------------------------------------------

What is ActionCode? We use it in C#, where it represents an enum value.

What do I want to achieve?

I need the following output:

| FK1 | FK2 |   ActionCode | SomeAttributeValue |
+-----+-----+--------------+--------------------+
| 100 | 500 |       Create |                  H |
| 100 | 500 |       Create |                  Z |
| 100 | 510 |       Create |                  T |
| 100 | 520 | CreateSystem |                  A |
-------------------------------------------------

So what is the actual logic?
We have logical groups identified by the composite key (FK1 + FK2). Each group can be split into partitions, each of which begins with a Create or CreateSystem row and ends with a Create, CreateSystem or Change row. For each partition, SomeAttributeValue should be the value from the partition's last row, while ActionCode (as the expected output shows) is the one from the row that opened the partition.
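To make the rule concrete, here is a minimal Python sketch of the same logic outside the database. It assumes rows can be ordered chronologically by CreationTS within each (FK1, FK2) group; the function name `collapse` is hypothetical. It takes ActionCode from the first row of each partition and SomeAttributeValue from the last row.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue)
ROWS = [
    (6, 100, 500, "Create",       "2011-01-02 00:00:00", "H"),
    (3, 100, 500, "Change",       "2011-01-01 02:00:00", "Z"),
    (2, 100, 500, "Change",       "2011-01-01 01:00:00", "X"),
    (1, 100, 500, "Create",       "2011-01-01 00:00:00", "Y"),
    (4, 100, 510, "Create",       "2011-01-01 00:30:00", "T"),
    (5, 100, 520, "CreateSystem", "2011-01-01 00:30:00", "A"),
]

STARTS = {"Create", "CreateSystem"}


def collapse(rows):
    """Return one (FK1, FK2, ActionCode, SomeAttributeValue) tuple per partition."""
    result = []
    # Order by the group key (FK1, FK2), then chronologically by CreationTS.
    ordered = sorted(rows, key=itemgetter(1, 2, 4))
    for (fk1, fk2), group in groupby(ordered, key=itemgetter(1, 2)):
        partition = []
        for row in group:
            # A Create/CreateSystem closes the running partition and opens a new one.
            if row[3] in STARTS and partition:
                result.append((fk1, fk2, partition[0][3], partition[-1][5]))
                partition = []
            partition.append(row)
        if partition:
            result.append((fk1, fk2, partition[0][3], partition[-1][5]))
    return result


for r in collapse(ROWS):
    print(r)
```

The printed order follows CreationTS ascending, so the (100, 500) partition ending in Z comes before the one holding H, whereas the question lists the newest partition first.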

It is not possible to have the following data pool:

| PK | FK1 | FK2 |   ActionCode |         CreationTS  | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
|  7 | 100 | 500 |       Change | 2011-01-02 02:00:00 |                  Z |
|  6 | 100 | 500 |       Create | 2011-01-02 00:00:00 |                  H |
|  2 | 100 | 500 |       Change | 2011-01-01 01:00:00 |                  X |
|  1 | 100 | 500 |       Create | 2011-01-01 00:00:00 |                  Y |
----------------------------------------------------------------------------

and then expect PK 7 to affect PK 2, or PK 6 to affect PK 1.

I don't even know how or where to start. How can I achieve this?
We are running on MSSQL 2005+.

EDIT:
There's a dump available:

instanceId: my PK
tenantId: FK 1
campaignId: FK 2
callId: FK 3
refillCounter: FK 4
ticketType: ActionCode (1, 4 and 6 are Create, 5 is Change, 3 must be ignored)
ticketType, profileId, contactPersonId, ownerId, handlingStartTime, handlingEndTime, memo, callWasPreselected, creatorId, creationTS, changerId and changeTS should be taken from the Create (the first row of the partition within its group)
callingState, reasonId, followUpDate, callingAttempts and callingAttemptsConsecutivelyNotReached should be taken from the last row of the partition: either the Create itself (in a one-row partition, i.e. the same row as above) or the last Change
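The EDIT's rules can be sketched the same way in Python. This is a minimal illustration under assumptions the question does not state: a group is keyed by all four FKs (tenantId, campaignId, callId, refillCounter), rows are ordered by creationTS with instanceId as a tie-breaker, any ticketType other than 1, 4, 5 and 6 is dropped, and only a few of the listed columns are carried. The helpers `collapse_tickets` and `ticket` and all sample values are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# ticketType mapping from the EDIT: 1, 4 and 6 open a partition (Create), 5 continues it (Change).
CREATE_TYPES = {1, 4, 6}
CHANGE_TYPES = {5}
KNOWN_TYPES = CREATE_TYPES | CHANGE_TYPES

# Abbreviated column lists from the EDIT (plus instanceId, added here to identify the opening row).
FIRST_ROW_COLUMNS = ("instanceId", "ticketType", "memo", "creatorId", "creationTS")
LAST_ROW_COLUMNS = ("callingState", "reasonId", "callingAttempts")

# Assumption: a group is identified by all four foreign keys.
GROUP_KEY = itemgetter("tenantId", "campaignId", "callId", "refillCounter")


def collapse_tickets(rows):
    """Return one merged dict per partition."""
    # Keep only mapped types: this drops ticketType 3 (and, by assumption, any unmapped type).
    usable = [r for r in rows if r["ticketType"] in KNOWN_TYPES]
    usable.sort(key=lambda r: (GROUP_KEY(r), r["creationTS"], r["instanceId"]))
    merged = []
    for _, group in groupby(usable, key=GROUP_KEY):
        partitions = []
        for row in group:
            # A Create starts a new partition; a Change extends the current one.
            if row["ticketType"] in CREATE_TYPES or not partitions:
                partitions.append([row])
            else:
                partitions[-1].append(row)
        for part in partitions:
            out = {col: part[0][col] for col in FIRST_ROW_COLUMNS}
            out.update({col: part[-1][col] for col in LAST_ROW_COLUMNS})
            merged.append(out)
    return merged


def ticket(instance_id, ticket_type, ts, memo, calling_state, attempts):
    """Hypothetical row builder; all rows share the same four FKs."""
    return {"instanceId": instance_id, "tenantId": 100, "campaignId": 500,
            "callId": 7, "refillCounter": 0, "ticketType": ticket_type,
            "creationTS": ts, "memo": memo, "creatorId": 42,
            "callingState": calling_state, "reasonId": None,
            "callingAttempts": attempts}


rows = [
    ticket(1, 1, "2011-01-01 00:00:00", "new", "open", 0),
    ticket(2, 5, "2011-01-01 01:00:00", "edited", "retry", 1),
    ticket(3, 3, "2011-01-01 01:30:00", "ignored", "ignored", 99),
    ticket(4, 5, "2011-01-01 02:00:00", "edited again", "reached", 2),
    ticket(5, 6, "2011-01-02 00:00:00", "refill", "open", 0),
]
for merged in collapse_tickets(rows):
    print(merged)
```

The first partition keeps memo "new" from its Create row but callingState "reached" from its last Change; the ticketType 3 row never takes part.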


Answer by 别理我, 2024-11-21 10:43:03

I'm assuming that each partition can only contain a single Create or CreateSystem; otherwise your requirements are ill-defined. The following is untested, since I don't have a sample table or sample data in an easily consumed format:

;With Partitions as (
     -- One row per partition: the start row (t1) and the next start row in the same group (t2), if any
     Select
         t1.FK1,
         t1.FK2,
         t1.ActionCode,
         t1.CreationTS as StartTS,
         t2.CreationTS as EndTS
     From
         [Table] t1
             left join
         [Table] t2
             on
                  t1.FK1 = t2.FK1 and
                  t1.FK2 = t2.FK2 and
                  t1.CreationTS < t2.CreationTS and
                  t2.ActionCode in ('Create','CreateSystem')
             left join
         [Table] t3
             on
                  t1.FK1 = t3.FK1 and
                  t1.FK2 = t3.FK2 and
                  t1.CreationTS < t3.CreationTS and
                  t3.CreationTS < t2.CreationTS and
                  t3.ActionCode in ('Create','CreateSystem')
       where
           t1.ActionCode in ('Create','CreateSystem') and
           t3.FK1 is null -- no other start row lies between t1 and t2
), PartitionRows as (
     SELECT
         t1.FK1,
         t1.FK2,
         t1.ActionCode,
         t2.SomeAttributeValue,
         -- rn = 1 marks the latest row inside each partition
         ROW_NUMBER() OVER (PARTITION BY t1.FK1, t1.FK2, t1.StartTS ORDER BY t2.CreationTS desc) as rn
     from
         Partitions t1
             inner join
         [Table] t2
             on
                t1.FK1 = t2.FK1 and
                t1.FK2 = t2.FK2 and
                t1.StartTS <= t2.CreationTS and
                (t2.CreationTS < t1.EndTS or t1.EndTS is null)
)
select FK1, FK2, ActionCode, SomeAttributeValue from PartitionRows where rn = 1

(Please note that I'm using all kinds of reserved names here; substitute your real table and column names.)

The basic logic is: the Partitions CTE defines each partition in terms of FK1, FK2, an inclusive start timestamp and an exclusive end timestamp. It does this with a triple join to the base table. The rows from t2 are selected to occur after the row from t1, and the rows from t3 are selected to occur between the matching rows from t1 and t2. Then, in the WHERE clause, we exclude every row for which a t3 match occurred, so that the remaining row from t1 and row from t2 mark the starts of two adjacent partitions. If no later start row exists, t2 is NULL and the partition is open-ended.
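One way to check the Partitions CTE in isolation is a minimal sketch that runs it through Python's `sqlite3` module, using SQLite as a stand-in for SQL Server (the join logic is plain SQL, but dialect details differ). The table name `events` is a hypothetical replacement for the reserved name Table, and the query also returns t1.ActionCode, which PartitionRows reads.

```python
import sqlite3

# Sample rows from the question: (PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue)
ROWS = [
    (6, 100, 500, "Create", "2011-01-02 00:00:00", "H"),
    (3, 100, 500, "Change", "2011-01-01 02:00:00", "Z"),
    (2, 100, 500, "Change", "2011-01-01 01:00:00", "X"),
    (1, 100, 500, "Create", "2011-01-01 00:00:00", "Y"),
    (4, 100, 510, "Create", "2011-01-01 00:30:00", "T"),
    (5, 100, 520, "CreateSystem", "2011-01-01 00:30:00", "A"),
]

# "events" is a hypothetical stand-in for the answer's reserved name Table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (PK INTEGER PRIMARY KEY, FK1 INTEGER, FK2 INTEGER, "
             "ActionCode TEXT, CreationTS TEXT, SomeAttributeValue TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# t2 = any later start row, t3 = a start row strictly between t1 and t2.
# Keeping only rows without a t3 match leaves t2 as the next start (or NULL for the last partition).
PARTITIONS_SQL = """
SELECT t1.FK1, t1.FK2, t1.ActionCode,
       t1.CreationTS AS StartTS, t2.CreationTS AS EndTS
FROM events t1
LEFT JOIN events t2
       ON t1.FK1 = t2.FK1 AND t1.FK2 = t2.FK2
      AND t1.CreationTS < t2.CreationTS
      AND t2.ActionCode IN ('Create', 'CreateSystem')
LEFT JOIN events t3
       ON t1.FK1 = t3.FK1 AND t1.FK2 = t3.FK2
      AND t1.CreationTS < t3.CreationTS
      AND t3.CreationTS < t2.CreationTS
      AND t3.ActionCode IN ('Create', 'CreateSystem')
WHERE t1.ActionCode IN ('Create', 'CreateSystem')
  AND t3.FK1 IS NULL
ORDER BY t1.FK1, t1.FK2, t1.CreationTS
"""

for row in conn.execute(PARTITIONS_SQL):
    print(row)
```

Group (100, 500) yields two partitions: one from 2011-01-01 00:00:00 up to (but excluding) 2011-01-02 00:00:00, and an open-ended one starting at 2011-01-02 00:00:00.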

The second CTE then retrieves all rows from Table for each partition and assigns a ROW_NUMBER() within each partition, based on CreationTS sorted descending, so that the row numbered 1 within each partition is the last row to occur.

Finally, within the select, we choose those rows that occur last within their respective partitions.
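Putting both CTEs together, here is a minimal end-to-end sketch in SQLite via Python's `sqlite3` (window functions need SQLite 3.25 or newer). It assumes the corrected `PARTITION BY` clause and the extra ActionCode column in Partitions. The table name `events` is hypothetical, and the final `ORDER BY` is added only to make the result order deterministic.

```python
import sqlite3

# Sample rows from the question: (PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue)
ROWS = [
    (6, 100, 500, "Create", "2011-01-02 00:00:00", "H"),
    (3, 100, 500, "Change", "2011-01-01 02:00:00", "Z"),
    (2, 100, 500, "Change", "2011-01-01 01:00:00", "X"),
    (1, 100, 500, "Create", "2011-01-01 00:00:00", "Y"),
    (4, 100, 510, "Create", "2011-01-01 00:30:00", "T"),
    (5, 100, 520, "CreateSystem", "2011-01-01 00:30:00", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (PK INTEGER PRIMARY KEY, FK1 INTEGER, FK2 INTEGER, "
             "ActionCode TEXT, CreationTS TEXT, SomeAttributeValue TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", ROWS)

QUERY = """
WITH Partitions AS (
    SELECT t1.FK1, t1.FK2, t1.ActionCode,
           t1.CreationTS AS StartTS, t2.CreationTS AS EndTS
    FROM events t1
    LEFT JOIN events t2
           ON t1.FK1 = t2.FK1 AND t1.FK2 = t2.FK2
          AND t1.CreationTS < t2.CreationTS
          AND t2.ActionCode IN ('Create', 'CreateSystem')
    LEFT JOIN events t3
           ON t1.FK1 = t3.FK1 AND t1.FK2 = t3.FK2
          AND t1.CreationTS < t3.CreationTS
          AND t3.CreationTS < t2.CreationTS
          AND t3.ActionCode IN ('Create', 'CreateSystem')
    WHERE t1.ActionCode IN ('Create', 'CreateSystem')
      AND t3.FK1 IS NULL
),
PartitionRows AS (
    -- Every row of a partition, numbered from the newest (rn = 1) backwards.
    SELECT p.FK1, p.FK2, p.ActionCode, e.SomeAttributeValue,
           ROW_NUMBER() OVER (PARTITION BY p.FK1, p.FK2, p.StartTS
                              ORDER BY e.CreationTS DESC) AS rn
    FROM Partitions p
    JOIN events e
      ON p.FK1 = e.FK1 AND p.FK2 = e.FK2
     AND p.StartTS <= e.CreationTS
     AND (e.CreationTS < p.EndTS OR p.EndTS IS NULL)
)
SELECT FK1, FK2, ActionCode, SomeAttributeValue
FROM PartitionRows
WHERE rn = 1
ORDER BY FK1, FK2, SomeAttributeValue
"""

for row in conn.execute(QUERY):
    print(row)
```

With the sample data this returns the four rows of the question's expected output, in the same order.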

This all assumes that CreationTS values are distinct within each partition. If that assumption doesn't hold, I may be able to rework it using PK as well.
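If CreationTS can repeat, one minimal adjustment, assuming a higher PK means the row was written later, is to add PK as a secondary key in the ROW_NUMBER() ordering. The sketch below (SQLite via Python's `sqlite3`, hypothetical table `events` and sample rows) shows only that tie-breaking idea on a single partition; start rows that share a timestamp would still require reworking the Partitions joins.

```python
import sqlite3

# Hypothetical partition where the two newest rows share the same CreationTS.
ROWS = [
    (10, 100, 500, "Create", "2011-01-01 00:00:00", "a"),
    (11, 100, 500, "Change", "2011-01-01 01:00:00", "b"),
    (12, 100, 500, "Change", "2011-01-01 01:00:00", "c"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (PK INTEGER PRIMARY KEY, FK1 INTEGER, FK2 INTEGER, "
             "ActionCode TEXT, CreationTS TEXT, SomeAttributeValue TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# With ORDER BY CreationTS DESC alone, PK 11 and PK 12 tie and either could get rn = 1.
# PK DESC as a secondary key makes the choice deterministic.
LAST_ROW_SQL = """
SELECT PK, SomeAttributeValue
FROM (
    SELECT PK, SomeAttributeValue,
           ROW_NUMBER() OVER (PARTITION BY FK1, FK2
                              ORDER BY CreationTS DESC, PK DESC) AS rn
    FROM events
)
WHERE rn = 1
"""

print(conn.execute(LAST_ROW_SQL).fetchall())
```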


Answer by 海螺姑娘, 2024-11-21 10:43:03

It is solvable with a recursive CTE. Here it is (assuming that rows within each group are ordered by CreationTS):

WITH partitioned AS (
  SELECT
    *,
    rn = ROW_NUMBER() OVER (PARTITION BY FK1, FK2 ORDER BY CreationTS)
  FROM data
),
subgroups AS (
  SELECT
    PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
    Subgroup = 1,
    Subrank  = 1
  FROM partitioned
  WHERE rn = 1
  UNION ALL
  SELECT
    p.PK, p.FK1, p.FK2, p.ActionCode, p.CreationTS, p.SomeAttributeValue, p.rn,
    Subgroup = s.Subgroup + CASE p.ActionCode WHEN 'Change' THEN 0 ELSE 1 END,
    Subrank  = CASE p.ActionCode WHEN 'Change' THEN s.Subrank ELSE 0 END + 1
  FROM partitioned p
    INNER JOIN subgroups s ON p.FK1 = s.FK1 AND p.FK2 = s.FK2
      AND p.rn = s.rn + 1
),
finalranks AS (
  SELECT
    PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
    Subgroup, Subrank,
    rank = ROW_NUMBER() OVER (PARTITION BY FK1, FK2, Subgroup ORDER BY Subrank DESC)
    /* or: rank = MAX(Subrank) OVER (PARTITION BY FK1, FK2, Subgroup) - Subrank + 1 */
  FROM subgroups
)
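SELECT PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue
FROM finalranks
WHERE rank = 1
OPTION (MAXRECURSION 0) -- the default limit is 100 levels; the recursion goes one level deeper per row in the largest group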
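Here is a minimal SQLite translation of the same recursive walk, run through Python's `sqlite3` (SQLite 3.25 or newer for ROW_NUMBER()). It is not the answer's exact T-SQL: SQLite needs `WITH RECURSIVE` and `expr AS alias` instead of `alias = expr`, and `rank` is renamed to `pos_from_end`. It also carries one hypothetical extra column, StartAction, so the result reports the ActionCode of the row that opened the partition, as in the question's expected output, rather than that of the last row. The table name `data` follows the answer.

```python
import sqlite3

# Sample rows from the question: (PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue)
ROWS = [
    (6, 100, 500, "Create", "2011-01-02 00:00:00", "H"),
    (3, 100, 500, "Change", "2011-01-01 02:00:00", "Z"),
    (2, 100, 500, "Change", "2011-01-01 01:00:00", "X"),
    (1, 100, 500, "Create", "2011-01-01 00:00:00", "Y"),
    (4, 100, 510, "Create", "2011-01-01 00:30:00", "T"),
    (5, 100, 520, "CreateSystem", "2011-01-01 00:30:00", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (PK INTEGER PRIMARY KEY, FK1 INTEGER, FK2 INTEGER, "
             "ActionCode TEXT, CreationTS TEXT, SomeAttributeValue TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?, ?)", ROWS)

QUERY = """
WITH RECURSIVE partitioned AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY FK1, FK2 ORDER BY CreationTS) AS rn
    FROM data
),
subgroups AS (
    -- Anchor: the first row of every (FK1, FK2) group opens subgroup 1.
    SELECT FK1, FK2, ActionCode AS StartAction, SomeAttributeValue, rn,
           1 AS Subgroup, 1 AS Subrank
    FROM partitioned
    WHERE rn = 1
    UNION ALL
    -- Step to the next row: a Change stays in the current subgroup and keeps its
    -- StartAction; anything else opens the next subgroup and restarts Subrank.
    SELECT p.FK1, p.FK2,
           CASE p.ActionCode WHEN 'Change' THEN s.StartAction ELSE p.ActionCode END,
           p.SomeAttributeValue, p.rn,
           s.Subgroup + CASE p.ActionCode WHEN 'Change' THEN 0 ELSE 1 END,
           CASE p.ActionCode WHEN 'Change' THEN s.Subrank ELSE 0 END + 1
    FROM partitioned p
    JOIN subgroups s
      ON p.FK1 = s.FK1 AND p.FK2 = s.FK2 AND p.rn = s.rn + 1
),
finalranks AS (
    SELECT FK1, FK2, StartAction, SomeAttributeValue,
           ROW_NUMBER() OVER (PARTITION BY FK1, FK2, Subgroup
                              ORDER BY Subrank DESC) AS pos_from_end
    FROM subgroups
)
SELECT FK1, FK2, StartAction AS ActionCode, SomeAttributeValue
FROM finalranks
WHERE pos_from_end = 1
ORDER BY FK1, FK2, SomeAttributeValue
"""

for row in conn.execute(QUERY):
    print(row)
```

Without the StartAction column, the (100, 500) partition ending in Z would report ActionCode Change, because the answer's final SELECT reads every column from the partition's last row.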
