如何从雪花中的数据中删除重复物?
我有此数据,必须删除以黄色突出显示的重复数据。我试图使用不同的分组,但似乎不起作用。
即使评论和价值有所不同,在这种情况下,具有CrimsStandard的价值的任何值都被认为是父母的孩子,因此被认为是重复的,因为前6个字段是相同的。在这种情况下,如何删除重复记录?
CLIENT;CLIENT_ID;QTY_TYPE;QUANTITY;AMOUNT;TRANTYPE;COMMENT;VALUE;ORDER_ID;DEV_COMM;AMT_COMM;ID
TEST_IMPL8;P-AGNIndPlusFd;A;;140000;Sell;INS;CRIMSSTANDARD;10902601;;;
TEST_IMPL8;P-AGNIndPlusFd;Q;2898;;Buy;INS;CRIMSSTANDARD;10902701;;;
TEST_IMPL8;P-AGNIndPlusFd;A;;140000;Sell;INS;CRIMSSTANDARD;10902601;;;
TEST_IMPL8;P-DepVz;A;;82000;Sell;GEN,TransferOrder;10902568;Y;0;
TEST_IMPL8;P-DepVz;A;;82000;Sell;INS;CRIMSSTANDARD;10902568;;;
TEST_IMPL8;P-DivBnd2Vv;A;;4862.09;Buy;GEN;OHPEN_14613296BD001571;10902668;;;14613296BD001571
TEST_IMPL8;P-DivBnd2Vv;A;;4862.09;Buy;INS;CRIMSSTANDARD;10902668;;;
TEST_IMPL8;P-Dyn4Vz;A;;13000;Buy;INS;CRIMSSTANDARD;10877286;;;
TEST_IMPL8;P-EmuCoInsBVv;Q;524.6892;;Sell;GEN;OHPEN_14613296BD001565;10677375;;;14613296BD001565
TEST_IMPL8;P-EmuCoInsBVv;Q;524.6892;;Sell;INS;CRIMSSTANDARD;10677375;;;
TEST_IMPL8;P-VNAMijBel;Q;10236;;Sell;INS;CRIMSSTANDARD;11402183;;;
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
第一次通过,您需要摆脱彼此重复的行(这将在数据中完全相同的数据中解决第1和第3行问题)。使用此处概述的方法:
接下来,尝试消除前6列的重复项,并在评论列中删除CrimsStandard指定的子记录:
First pass, you'll need to get rid of rows that are entire duplicates of each other (this will fix the row 1 and row 3 issue in your data where the rows are exactly the same). Use the approach outlined here: How to delete Duplicate records in snowflake database table
Next, try this to eliminate duplicates of first 6 columns, and deleting the child record designated by CRIMSSTANDARD in the COMMENT column:
雪花上没有在上没有区别,但是您可以使用
限定
:参见此处查看更多详细信息
编辑:
要保持具有
dev_comm
,AMT_COMM
和ID
non NON NON的行,您可以修改您的订单条件:您也可以拥有不同的 订单条件:优先级通过中的
顺序订购字段。例如,如果您有多个具有
下按第一个元素订购。dev_comm
列的行,但是优先考虑具有ID
的行,则需要放置iff(ID ...
首先在There is no
distinct on
in Snowflake, but you can have a similar result usingqualify
:See here to see more details
EDIT:
To keep the rows that have the
dev_comm
,amt_comm
andID
non null in priority you can modify your order condition:You can also have different priorities by ordering the fields in the
order by
. For example in case you have multiple lines that have thedev_comm
columns but you want in priority the line that have theID
, you need to put theiff(ID ...
first in yourorder by
. It order by the first element, then if there is several lines with the same first element it order by the second, etc...