Row_number对于重复记录不是唯一的

发布于 2025-01-19 13:01:51 字数 719 浏览 4 评论 0原文

我正在尝试从每个UUID的一堆行中找到特定行的最新更新。为此，我们在分区上使用row_number（），如下所示，

"row_number"() OVER (
    PARTITION BY "uuid"
    ORDER BY "uuid" ASC,
        "status"."sequence" DESC,
        "modifiedon" DESC
) "row_id_ranked",

此后，除了'row_id_ranked = 1'以外的任何其他内容都被丢弃了。但是，当发生重试/重新传输时，“ status”。“序列”和“ modifiedon”将完全相同。这将使用row_id_ranked = 1创建utiple行。很难删除它们。根据我的理解，文档>应该是一个独特的价值。但是对于这些重复的情况，显然不是。

row_number（）→bigint＃
返回每行的唯一顺序数字，从一个开始，根据窗口分区中的行的顺序。

在这种情况下，我该如何脱颖而出？

原文

I am trying to find the latest update of a particular row from a bunch of rows per uuid.
For that we use row_number() over a partition as shown below,

"row_number"() OVER (
    PARTITION BY "uuid"
    ORDER BY "uuid" ASC,
        "status"."sequence" DESC,
        "modifiedon" DESC
) "row_id_ranked",

After this, anything other than 'row_id_ranked = 1' are discarded. But when a retry/re-transmission happens, "status"."sequence" and "modifiedon" will be exactly same. This creates mutiple rows with row_id_ranked=1. Making it difficult to remove them.
As per my understanding per the documentation, the row_number should be a unique value. But in case of these duplicates, it clearly is not.

row_number() → bigint#
Returns a unique, sequential number for each row, starting with one,
according to the ordering of rows within the window partition.

How do I de-duplicate in this case?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

梦行七里 2025-01-26 13:01:51

假设除了状态序列之外不存在第三列，并且在日期上进行修改以打破潜在的平局，并且还假设您不关心保留哪一条记录，则可以使用 RANDOM() 这里：

ROW_NUMBER() OVER (PARTITION BY uuid
                   ORDER BY "status"."sequence" DESC, modifiedon DESC, RANDOM())
    AS row_id_ranked

Assuming that there does not exist some third column beyond the status sequence and modified on date to break a potential tie, and also assuming you don't care which single record gets retained, you may use RANDOM() here:

ROW_NUMBER() OVER (PARTITION BY uuid
                   ORDER BY "status"."sequence" DESC, modifiedon DESC, RANDOM())
    AS row_id_ranked

回复收藏 0 原文

够运 2025-01-26 13:01:51

如果您在行号的“over”子句中使用“按 uuid 分区”，您将为每个不同的 uuid 获得第 1 行。也就是说，您已按 uuid 对数据进行分区，并且每个分区的编号将重新开始，如您发布的定义中所示。
我认为您只想使用 order by，它将返回 1 行号 1。

"row_number"() OVER (
    ORDER BY "uuid" ASC,
        "status"."sequence" DESC,
        "modifiedon" DESC
) "row_id_ranked",

If you use "Partition by uuid" in the "over" clause of row number you will get a row 1 for each distinct uuid. That is you have partitioned your data by uuid and the numbering will restart for each partition, as in the definition you have posted.
I think that you just want to use order by, which will return 1 row number 1.

"row_number"() OVER (
    ORDER BY "uuid" ASC,
        "status"."sequence" DESC,
        "modifiedon" DESC
) "row_id_ranked",

回复收藏 0 原文

~没有更多了~

关于作者

贱人配狗天长地久

暂无简介

文章

28 人气

关注发私信

友情链接

文江博客

Row_number对于重复记录不是唯一的

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

alipaysp_snBf0MSZIv

梦断已成空

瞎闹

凯凯我们等你回来

寄意

似梦非梦

友情链接

Row_number对于重复记录不是唯一的

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

alipaysp_snBf0MSZIv

梦断已成空

瞎闹

凯凯我们等你回来

寄意

似梦非梦

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。