How to correctly use ROW_NUMBER() (partitioning) for my data pool
We have the following table (the output is already ordered and separated for readability):
| PK | FK1 | FK2 | ActionCode | CreationTS | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
| 6 | 100 | 500 | Create | 2011-01-02 00:00:00 | H |
----------------------------------------------------------------------------
| 3 | 100 | 500 | Change | 2011-01-01 02:00:00 | Z |
| 2 | 100 | 500 | Change | 2011-01-01 01:00:00 | X |
| 1 | 100 | 500 | Create | 2011-01-01 00:00:00 | Y |
----------------------------------------------------------------------------
| 4 | 100 | 510 | Create | 2011-01-01 00:30:00 | T |
----------------------------------------------------------------------------
| 5 | 100 | 520 | CreateSystem | 2011-01-01 00:30:00 | A |
----------------------------------------------------------------------------
What is ActionCode? We use it in C#, where it represents an enum value.
What do I want to achieve?
Well, I need the following output:
| FK1 | FK2 | ActionCode | SomeAttributeValue |
+-----+-----+--------------+--------------------+
| 100 | 500 | Create | H |
| 100 | 500 | Create | Z |
| 100 | 510 | Create | T |
| 100 | 520 | CreateSystem | A |
-------------------------------------------------
Well, what is the actual logic?
We have logical groups defined by the composite key (FK1 + FK2). Each of these groups can be broken into partitions, each beginning with Create or CreateSystem. Each partition ends with Create, CreateSystem or Change. The actual value of SomeAttributeValue for each partition should be the value from the last line of the partition.
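The rule above can be written down as a small reference implementation, which is handy for checking any SQL solution against the sample data. This is only a sketch in plain Python, not SQL; the names ROWS, STARTERS and collapse_partitions are made up for illustration, and it assumes CreationTS orders the rows within a group unambiguously.

```python
from itertools import groupby

# Sample rows from the question:
# (PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue)
ROWS = [
    (1, 100, 500, "Create", "2011-01-01 00:00:00", "Y"),
    (2, 100, 500, "Change", "2011-01-01 01:00:00", "X"),
    (3, 100, 500, "Change", "2011-01-01 02:00:00", "Z"),
    (4, 100, 510, "Create", "2011-01-01 00:30:00", "T"),
    (5, 100, 520, "CreateSystem", "2011-01-01 00:30:00", "A"),
    (6, 100, 500, "Create", "2011-01-02 00:00:00", "H"),
]

# Action codes that open a new partition (assumption taken from the question).
STARTERS = {"Create", "CreateSystem"}


def collapse_partitions(rows):
    """Return one (FK1, FK2, ActionCode, SomeAttributeValue) tuple per partition."""
    result = []
    # Order by group key (FK1, FK2), then chronologically inside each group.
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[4]))
    for (fk1, fk2), group in groupby(ordered, key=lambda r: (r[1], r[2])):
        current = None  # [ActionCode of the opening row, latest value seen]
        for _pk, _fk1, _fk2, action, _ts, value in group:
            if action in STARTERS:
                # A new Create/CreateSystem closes the previous partition.
                if current is not None:
                    result.append((fk1, fk2, current[0], current[1]))
                current = [action, value]
            elif current is not None:
                # A Change overwrites the value of the open partition.
                current[1] = value
            # A Change with no preceding Create is skipped.
        if current is not None:
            result.append((fk1, fk2, current[0], current[1]))
    return result


for row in collapse_partitions(ROWS):
    print(row)
```

This yields the same four rows as the desired output, except that partitions within a group come out oldest first.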
It is not possible to have the following data pool:
| PK | FK1 | FK2 | ActionCode | CreationTS | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
| 7 | 100 | 500 | Change | 2011-01-02 02:00:00 | Z |
| 6 | 100 | 500 | Create | 2011-01-02 00:00:00 | H |
| 2 | 100 | 500 | Change | 2011-01-01 01:00:00 | X |
| 1 | 100 | 500 | Create | 2011-01-01 00:00:00 | Y |
----------------------------------------------------------------------------
and then expect PK 7 to affect PK 2 or PK 6 to affect PK 1.
I don't even know how or where to start. How can I achieve this?
We are running on SQL Server 2005+.
EDIT:
There's a dump available:
- instanceId: my PK
- tenantId: FK 1
- campaignId: FK 2
- callId: FK 3
- refillCounter: FK 4
- ticketType: ActionCode (1, 4 and 6 are Create, 5 is Change, 3 must be ignored)
- ticketType, profileId, contactPersonId, ownerId, handlingStartTime, handlingEndTime, memo, callWasPreselected, creatorId, creationTS, changerId, changeTS should be taken from the Create (first line of the partition within the group)
- callingState, reasonId, followUpDate, callingAttempts and callingAttemptsConsecutivelyNotReached should be taken from the last Create (which would then be a "one-line-partition-in-group", i.e. the same row as above) or Change (last line of the partition within the group)
Answers (2)
I'm assuming that each partition can only contain a single Create or CreateSystem; otherwise your requirements are ill-defined. The following is untested, since I don't have a sample table or sample data in an easily consumed format:
(Please note that I'm using all kinds of reserved names here.)
The basic logic is: the Partitions CTE defines each partition in terms of FK1, FK2, an inclusive start timestamp, and an exclusive end timestamp. It does this with a triple join on the base table. The rows from t2 are selected to occur after the rows from t1, then the rows from t3 are selected to occur between the matching rows from t1 and t2. Then, in the WHERE clause, we exclude any rows from the result set where a match occurred from t3, the result being that the row from t1 and the row from t2 represent the starts of two adjacent partitions.
The second CTE then retrieves all rows from Table for each partition, assigning a ROW_NUMBER() within each partition based on CreationTS, sorted descending, so that ROW_NUMBER() 1 within each partition is the last row to occur.
Finally, in the SELECT, we choose those rows that occur last within their respective partitions.
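A minimal sketch of the logic described above, reconstructed from the prose and not taken from this answer's own query. It runs the SQL through Python's sqlite3 module as a stand-in for SQL Server, so it assumes a bundled SQLite of version 3.25 or newer (for window functions). The table name ActionLog, the CTE name Ranked, and the Python names are assumptions; t2 is joined with LEFT JOIN so that the newest partition, which has no successor, gets a NULL end timestamp.

```python
import sqlite3

# Sample rows from the question:
# (PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue)
SAMPLE = [
    (1, 100, 500, "Create", "2011-01-01 00:00:00", "Y"),
    (2, 100, 500, "Change", "2011-01-01 01:00:00", "X"),
    (3, 100, 500, "Change", "2011-01-01 02:00:00", "Z"),
    (4, 100, 510, "Create", "2011-01-01 00:30:00", "T"),
    (5, 100, 520, "CreateSystem", "2011-01-01 00:30:00", "A"),
    (6, 100, 500, "Create", "2011-01-02 00:00:00", "H"),
]

TRIPLE_JOIN_QUERY = """
WITH Partitions AS (
    -- t1: start of a partition; t2: a later start in the same group;
    -- t3: any start strictly between t1 and t2.
    SELECT t1.FK1, t1.FK2, t1.ActionCode,
           t1.CreationTS AS StartTS,   -- inclusive
           t2.CreationTS AS EndTS      -- exclusive, NULL for the newest partition
    FROM ActionLog AS t1
    LEFT JOIN ActionLog AS t2
           ON t2.FK1 = t1.FK1 AND t2.FK2 = t1.FK2
          AND t2.ActionCode IN ('Create', 'CreateSystem')
          AND t2.CreationTS > t1.CreationTS
    LEFT JOIN ActionLog AS t3
           ON t3.FK1 = t1.FK1 AND t3.FK2 = t1.FK2
          AND t3.ActionCode IN ('Create', 'CreateSystem')
          AND t3.CreationTS > t1.CreationTS
          AND t3.CreationTS < t2.CreationTS
    WHERE t1.ActionCode IN ('Create', 'CreateSystem')
      AND t3.PK IS NULL                -- keep only adjacent starts
),
Ranked AS (
    -- Number the rows of each partition, newest first.
    SELECT p.FK1, p.FK2, p.StartTS, p.ActionCode, a.SomeAttributeValue,
           ROW_NUMBER() OVER (PARTITION BY p.FK1, p.FK2, p.StartTS
                              ORDER BY a.CreationTS DESC) AS rn
    FROM Partitions AS p
    JOIN ActionLog AS a
      ON a.FK1 = p.FK1 AND a.FK2 = p.FK2
     AND a.CreationTS >= p.StartTS
     AND (p.EndTS IS NULL OR a.CreationTS < p.EndTS)
)
SELECT FK1, FK2, ActionCode, SomeAttributeValue
FROM Ranked
WHERE rn = 1
ORDER BY FK1, FK2, StartTS DESC
"""


def last_row_per_partition(rows):
    """Load rows into an in-memory table and run the triple-join query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE ActionLog (PK INTEGER PRIMARY KEY, FK1 INTEGER, "
            "FK2 INTEGER, ActionCode TEXT, CreationTS TEXT, "
            "SomeAttributeValue TEXT)"
        )
        con.executemany("INSERT INTO ActionLog VALUES (?, ?, ?, ?, ?, ?)", rows)
        return con.execute(TRIPLE_JOIN_QUERY).fetchall()
    finally:
        con.close()


for row in last_row_per_partition(SAMPLE):
    print(row)
```

Timestamps are stored as ISO-formatted text here so that string comparison matches chronological order; on SQL Server the column would normally be a datetime.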
This all assumes that CreationTS values are distinct within each partition. If that assumption doesn't hold, I may be able to rework it using PK as well.
It is solvable with a recursive CTE. Here (assuming rows within partitions are ordered by CreationTS):
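One way such a recursive CTE could look is sketched here; it is an illustration under stated assumptions, not this answer's own query. It again uses Python's sqlite3 as a stand-in for SQL Server (SQLite 3.25 or newer assumed). The table name ActionLog, the CTE names Numbered, Walk and Ranked, and the Python names are hypothetical. SQLite spells the clause WITH RECURSIVE, whereas T-SQL uses plain WITH.

```python
import sqlite3

# Sample rows from the question:
# (PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue)
SAMPLE_ROWS = [
    (1, 100, 500, "Create", "2011-01-01 00:00:00", "Y"),
    (2, 100, 500, "Change", "2011-01-01 01:00:00", "X"),
    (3, 100, 500, "Change", "2011-01-01 02:00:00", "Z"),
    (4, 100, 510, "Create", "2011-01-01 00:30:00", "T"),
    (5, 100, 520, "CreateSystem", "2011-01-01 00:30:00", "A"),
    (6, 100, 500, "Create", "2011-01-02 00:00:00", "H"),
]

RECURSIVE_QUERY = """
WITH RECURSIVE
Numbered AS (
    -- Number the rows of each (FK1, FK2) group chronologically.
    SELECT FK1, FK2, ActionCode, SomeAttributeValue,
           ROW_NUMBER() OVER (PARTITION BY FK1, FK2
                              ORDER BY CreationTS) AS rn
    FROM ActionLog
),
Walk AS (
    -- Anchor: every Create/CreateSystem row opens a partition.
    SELECT FK1, FK2, rn AS StartRn, rn,
           ActionCode AS StartCode, SomeAttributeValue
    FROM Numbered
    WHERE ActionCode IN ('Create', 'CreateSystem')
    UNION ALL
    -- Step: extend the partition while the next row is a Change.
    SELECT w.FK1, w.FK2, w.StartRn, n.rn, w.StartCode, n.SomeAttributeValue
    FROM Walk AS w
    JOIN Numbered AS n
      ON n.FK1 = w.FK1 AND n.FK2 = w.FK2 AND n.rn = w.rn + 1
    WHERE n.ActionCode = 'Change'
),
Ranked AS (
    -- The row reached last in each walk is the end of the partition.
    SELECT FK1, FK2, StartRn, StartCode, SomeAttributeValue,
           ROW_NUMBER() OVER (PARTITION BY FK1, FK2, StartRn
                              ORDER BY rn DESC) AS pos
    FROM Walk
)
SELECT FK1, FK2, StartCode, SomeAttributeValue
FROM Ranked
WHERE pos = 1
ORDER BY FK1, FK2, StartRn DESC
"""


def walk_partitions(rows):
    """Load rows into an in-memory table and run the recursive query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE ActionLog (PK INTEGER PRIMARY KEY, FK1 INTEGER, "
            "FK2 INTEGER, ActionCode TEXT, CreationTS TEXT, "
            "SomeAttributeValue TEXT)"
        )
        con.executemany("INSERT INTO ActionLog VALUES (?, ?, ?, ?, ?, ?)", rows)
        return con.execute(RECURSIVE_QUERY).fetchall()
    finally:
        con.close()


for row in walk_partitions(SAMPLE_ROWS):
    print(row)
```

Each walk stops as soon as the next row in the group is not a Change, so a later Change can never reach back across a newer Create into an older partition.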