为什么在涉及唯一约束的情况下 Postgres 处理 NULL 的方式不一致?

发布于 2024-11-04 06:56:50 字数 1697 浏览 5 评论 0原文

我最近注意到 Postgres 在具有唯一约束的列中处理 NULL 的方式不一致。

考虑一个人员表:

create table People (
   pid  int  not null,
   name text not null,
   SSN  text unique,
 primary key (pid)
);

SSN 列应保持唯一。我们可以检查:

-- Add a row.
insert into People(pid, name, SSN)
values(0, 'Bob', '123');

-- Test the unique constraint.
insert into People(pid, name, SSN)
values(1, 'Carol', '123');

第二次插入失败,因为它违反了 SSN 的唯一约束。到目前为止,一切都很好。但让我们尝试一下 NULL:

insert into People(pid, name, SSN)
values(1, 'Carol', null);

这有效。

select *    
from People;

0;"Bob";"123"
1;"Carol";"<NULL>"

唯一的列将采用空值。有趣的。 Postgres 如何断言 null 在任何方面都是唯一的,或者在这方面不是唯一的?

我想知道是否可以在唯一列中添加两行为空的行。

insert into People(pid, name, SSN)
values(2, 'Ted', null);

select *    
from People;

0;"Bob";"123"
1;"Carol";"<NULL>"
2;"Ted";"<NULL>"

是的,我可以。现在,尽管 SSN 应该是唯一的,但 SSN 列中有两行为 NULL。

Postgres 文档 说,为了唯一约束,空值不被视为相等。

好的。我明白这一点。这是空处理中的一个很好的微妙之处:通过考虑唯一约束列中的所有 NULL 是不相交的,我们延迟了唯一约束的强制执行,直到有一个实际的非空值作为强制执行的基础。

太酷了。但这就是 Postgres 让我迷失的地方。如果唯一约束列中的所有 NULL 都不相等,如文档所述,那么我们应该在 select unique 查询中看到所有空值。

select distinct SSN
from People;

"<NULL>"
"123"

没有。那里只有一个空值。 Postgres 似乎有这个错误。但我想知道:还有其他解释吗?


编辑:

Postgres 文档确实指定“在此比较中空值被视为相等”。在有关 SELECT DISTINCT 的部分。虽然我不理解这个概念,但我很高兴文档中对此进行了详细说明。

I recently noticed an inconsistency in how Postgres handles NULLs in columns with a unique constraint.

Consider a table of people:

create table People (
   pid  int  not null,
   name text not null,
   SSN  text unique,
 primary key (pid)
);

The SSN column should be kept unique. We can check that:

-- Add a row.
insert into People(pid, name, SSN)
values(0, 'Bob', '123');

-- Test the unique constraint.
insert into People(pid, name, SSN)
values(1, 'Carol', '123');

The second insert fails because it violates the unique constraint on SSN. So far, so good. But let's try a NULL:

insert into People(pid, name, SSN)
values(1, 'Carol', null);

That works.

select *    
from People;

0;"Bob";"123"
1;"Carol";"<NULL>"

A unique column will take a null. Interesting. How can Postgres assert that null is in any way unique, or not unique for that matter?

I wonder if I can add two rows with null in a unique column.

insert into People(pid, name, SSN)
values(2, 'Ted', null);

select *    
from People;

0;"Bob";"123"
1;"Carol";"<NULL>"
2;"Ted";"<NULL>"

Yes I can. Now there are two rows with NULL in the SSN column even though SSN is supposed to be unique.

The Postgres documentation says, For the purpose of a unique constraint, null values are not considered equal.

Okay. I can see the point of this. It's a nice subtlety in null-handling: By considering all NULLs in a unique-constrained column to be disjoint, we delay the unique constraint enforcement until there is an actual non-null value on which to base that enforcement.

That's pretty cool. But here's where Postgres loses me. If all NULLs in a unique-constrained column are not equal, as the documentation says, then we should see all of the nulls in a select distinct query.

select distinct SSN
from People;

"<NULL>"
"123"

Nope. There's only a single null there. It seems like Postgres has this wrong. But I wonder: Is there another explanation?


Edit:

The Postgres docs do specify that "Null values are considered equal in this comparison." in the section on SELECT DISTINCT. While I do not understand that notion, I'm glad it's spelled out in the docs.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

深海少女心 2024-11-11 06:56:50

处理 null 时,这样说几乎总是错误的:

“空值在这里表现得像某某,*所以它们应该表现得像
这里如此这般”

这里< /a> 是一篇从 postgres 角度讨论该主题的优秀文章,简要概括为根据上下文对空值进行不同的处理,并且不要错误地对它们做出任何假设。

It is almost always a mistake when dealing with null to say:

"nulls behave like so-and-so here, *so they should behave like
such-and-such here"

Here is an excellent essay on the subject from a postgres perspective. Briefly summed up by saying nulls are treated differently depending on the context and don't make the mistake of making any assumptions about them.

空袭的梦i 2024-11-11 06:56:50

底线是,PostgreSQL 对 null 执行的操作是因为 SQL 标准是这么说的。

空值显然很棘手,并且可以用多种方式解释(未知值、缺失值等),因此在最初编写 SQL 标准时,作者必须在某些地方进行一些调用。我想说时间已经证明它们或多或少是正确的,但这并不意味着不可能有另一种数据库语言可以稍微(或完全)不同地处理未知和不存在的值。但 PostgreSQL 实现了 SQL,所以就是这样。

正如在另一个答案中已经提到的那样,杰夫·戴维斯(Jeff Davis)写了一些关于处理空值的好文章和演示文稿。

The bottom line is, PostgreSQL does what it does with nulls because the SQL standard says so.

Nulls are obviously tricky and can be interpreted in multiple ways (unknown value, absent value, etc.), and so when the SQL standard was initially written, the authors had to make some calls at certain places. I'd say time has proved them more or less right, but that doesn't mean that there couldn't be another database language that handles unknown and absent values slightly (or wildly) differently. But PostgreSQL implements SQL, so that's that.

As was already mentioned in a different answer, Jeff Davis has written some good articles and presentations on dealing with nulls.

三生路 2024-11-11 06:56:50

NULL 被认为是唯一的,因为 NULL 并不表示值不存在。列中的 NULL 是未知值。当您比较两个未知数时,您不知道它们是否相等,因为您不知道它们是什么。

想象一下,你有两个盒子,分别标记为 A 和 B。如果你不打开盒子,也看不到里面的情况,你永远不知道里面装的是什么。如果你被问到“这两个盒子里的东西一样吗?”你只能回答“我不知道”。

在这种情况下,PostgreSQL 也会做同样的事情。当被要求比较两个 NULL 时,它会说“我不知道”。这与 SQL 数据库中 NULL 的疯狂语义有很大关系。 另一个答案中链接的文章是理解 NULL 行为的一个很好的起点。请注意:它因供应商而异。

NULL is considered to be unique because NULL doesn't represent the absence of a value. A NULL in a column is an unknown value. When you compare two unknowns, you don't know whether or not they are equal because you don't know what they are.

Imagine that you have two boxes marked A and B. If you don't open the boxes and you can't see inside, you never know what the contents are. If you're asked "Are the contents of these two boxes the same?" you can only answer "I don't know".

In this case, PostgreSQL will do the same thing. When asked to compare two NULLs, it says "I don't know." This has a lot to do with the crazy semantics around NULL in SQL databases. The article linked to in the other answer is an excellent starting point to understanding how NULLs behave. Just beware: it varies by vendor.

惜醉颜 2024-11-11 06:56:50

唯一索引中存在多个 NULL 值是可以的,因为 x = NULL 对于所有 x 都是 false,特别是当 x 本身为 NULL 时。您还会在 WHERE 子句中遇到这种行为,您必须说 WHERE x IS NULLWHERE x IS NOT NULL 而不是 WHERE x = NULL< /code> 和 WHERE x <>空。

Multiple NULL values in a unique index are okay because x = NULL is false for all x and, in particular, when x is itself NULL. You'll also run into this behavior in WHERE clauses where you have to say WHERE x IS NULL and WHERE x IS NOT NULL rather than WHERE x = NULL and WHERE x <> NULL.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文