Snowflake query does not raise a unique key constraint error
In Postgres, a unique key constraint error is thrown when we try to re-insert a record with the same combination of unique key values.
But in Snowflake, no error is thrown and duplicates are allowed (even after adding the ENFORCED keyword to the query):
create table DUMMY (
    ONE integer autoincrement start 1 increment 1,
    TWO integer NOT NULL,
    THREE varchar(7) NOT NULL,
    FOUR varchar(7) NOT NULL
);
ALTER TABLE DUMMY ADD CONSTRAINT UNIQUENESS UNIQUE(TWO, THREE) ENFORCED;
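For illustration, here is a minimal sketch of the kind of inserts that show the difference (the literal values are just placeholders): in Snowflake both statements succeed, while Postgres would reject the second one with a unique-violation error.

-- Both inserts succeed in Snowflake; Postgres would reject the second one
INSERT INTO DUMMY (TWO, THREE, FOUR) VALUES (1, 'A', 'X');
INSERT INTO DUMMY (TWO, THREE, FOUR) VALUES (1, 'A', 'Y'); -- same (TWO, THREE) pair, still accepted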
Comments (2)
Snowflake calls itself a database. But it isn't.
https://docs.snowflake.com/en/sql-reference/constraints-overview.html
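In other words, the constraint is recorded as metadata only; per the linked page, NOT NULL is the only constraint type Snowflake actually enforces. A rough sketch of how to confirm this on the DUMMY table from the question (table and column names as in the question):

-- The UNIQUE constraint exists as metadata...
SELECT CONSTRAINT_NAME, CONSTRAINT_TYPE
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
WHERE TABLE_NAME = 'DUMMY';

-- ...but it is not enforced, so violations have to be found manually
SELECT TWO, THREE, COUNT(*) AS CNT
FROM DUMMY
GROUP BY TWO, THREE
HAVING COUNT(*) > 1;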
Completely personal opinion:
Nothing is a free lunch. You are telling the database: I am going to put stuff in you, and every time I do, run these checks for me.
When you scale to really large tables, such checks kill performance, for two reasons: how should the DB optimally do the enforcing, and what if it does the same check more often than it needs to? In other databases you can turn off the checks, and you often do, to get performance on larger transform operations.
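For example (a sketch in Postgres syntax, assuming the same DUMMY table and UNIQUENESS constraint exist there), the usual pattern is to drop the check before the big load and re-create it afterwards:

-- Postgres: remove the check for the bulk load...
ALTER TABLE DUMMY DROP CONSTRAINT UNIQUENESS;
-- ...run the large load / transform here...
-- ...then put it back
ALTER TABLE DUMMY ADD CONSTRAINT UNIQUENESS UNIQUE (TWO, THREE);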
Whereas in Snowflake you can/have to manage it yourself. Some people think of this as a curse; I think of it as a blessing. You can de-dup your data on the "way in" and then just use it.
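A rough sketch of the "way in" idea, assuming a hypothetical staging table DUMMY_STAGE with the same columns as DUMMY:

-- Keep only one row per (TWO, THREE) pair from the incoming batch (DUMMY_STAGE is hypothetical)
INSERT INTO DUMMY (TWO, THREE, FOUR)
SELECT TWO, THREE, FOUR
FROM DUMMY_STAGE
QUALIFY ROW_NUMBER() OVER (PARTITION BY TWO, THREE ORDER BY FOUR) = 1;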
The reason the DML is allowed is that it lets existing scripts "work". The downside is that some existing processes rely on the exceptions. But to be fair, if you are loading 1B rows, how do you correctly handle 3 dups?
There are many ways to de-dup your data. In our experience, for large volumes of data, we used dedicated tables to de-dup our input data, based on knowledge of the shape of the data and of when/where in the flow duplicates happened.
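For example (again assuming a hypothetical DUMMY_STAGE staging table), a MERGE can de-dup within the batch and also skip key pairs that are already in the target:

-- Insert only rows whose (TWO, THREE) pair is not already in DUMMY (DUMMY_STAGE is hypothetical)
MERGE INTO DUMMY AS t
USING (
    SELECT TWO, THREE, FOUR
    FROM DUMMY_STAGE
    QUALIFY ROW_NUMBER() OVER (PARTITION BY TWO, THREE ORDER BY FOUR) = 1
) AS s
ON t.TWO = s.TWO AND t.THREE = s.THREE
WHEN NOT MATCHED THEN INSERT (TWO, THREE, FOUR) VALUES (s.TWO, s.THREE, s.FOUR);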