为什么我需要“OR NULL”在 MySQL 中计算带条件的行数时

发布于 2024-10-17 17:21:41 字数 1144 浏览 5 评论 0原文

有一个关于 MySQL 的 COUNT() 聚合函数的问题时不时地出现在我的脑海中。我想得到一些解释为什么它会这样工作。

当我开始使用 MySQL 时,我很快了解到它的 COUNT(condition) 似乎只有在条件最后还包含 OR NULL 时才能正常工作。如果出现更复杂的 COUNT 条件,则需要根据经验确定将其准确放置在何处。在MSSQL中,你不需要这个OR NULL来获得正确的结果,所以我想知道它的解释。所以,这是一个例子。

让我们有一个非常基本的表,具有以下结构和数据:

CREATE TABLE test (
  `value` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO test (value) VALUES(1);
INSERT INTO test (value) VALUES(4);
INSERT INTO test (value) VALUES(5);
INSERT INTO test (value) VALUES(6);
INSERT INTO test (value) VALUES(4);
INSERT INTO test (value) VALUES(4);
INSERT INTO test (value) VALUES(5);
INSERT INTO test (value) VALUES(2);
INSERT INTO test (value) VALUES(8);
INSERT INTO test (value) VALUES(1);

场景:我想计算值 = 4 的行数。一个明显的解决方案是使用 WHERE 过滤它并执行 COUNT(*)但我对基于 COUNT(条件) 的解决方案感兴趣。

所以,我想到的解决方案是:

SELECT COUNT(value=4) 
  FROM test

结果是10。这显然是错误的。

第二次尝试使用 OR NULL:

SELECT COUNT(value=4 OR NULL) 
  FROM test

结果是 3。这是正确的。

有人可以解释这背后的逻辑吗?这是 MySQL 中的一些错误吗?还是有一个逻辑解释为什么我需要将那个看起来奇怪的 OR NULL 添加到 COUNT 条件的末尾才能获得正确的结果?

There is a question about MySQL's COUNT() aggregate function that keeps popping into my head time to time. I would like to get some explanation to why it is working the way it is.

When I started working with MySQL I quickly learned that its COUNT(condition) seems only to work properly if condition also contains an OR NULL in the end. In case of more complicated COUNT conditions it was an empirical process to find out where to put it exactly. In MSSQL you do not need this OR NULL to get proper results, so I would like to know the explanation for it. So, here is an example.

Lets have a very basic table with the following structure and data:

CREATE TABLE test (
  `value` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO test (value) VALUES(1);
INSERT INTO test (value) VALUES(4);
INSERT INTO test (value) VALUES(5);
INSERT INTO test (value) VALUES(6);
INSERT INTO test (value) VALUES(4);
INSERT INTO test (value) VALUES(4);
INSERT INTO test (value) VALUES(5);
INSERT INTO test (value) VALUES(2);
INSERT INTO test (value) VALUES(8);
INSERT INTO test (value) VALUES(1);

Scenario: I would like to count how many rows I have where the value = 4. An obvious solution would be to filter for it using a WHERE and do a COUNT(*) but I am interested in a COUNT(condition) based solution.

So, the solution that comes to my mind is:

SELECT COUNT(value=4) 
  FROM test

The result is 10. This is obviously wrong.

Second attempt with OR NULL:

SELECT COUNT(value=4 OR NULL) 
  FROM test

The result is 3. It is correct.

Can someone explain the logic behind this? Is this some bug in MySQL or is there a logical explanation why I need to add that strange-looking OR NULL to the end of the COUNT condition to get the correct result?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

罪#恶を代价 2024-10-24 17:21:41

这应该会显示所有

SELECT 4=4, 3=4, 1 or null, 0 or null

输出

1   |   0   |   1   |   NULL

事实

  1. COUNT 将计算结果为 NOT NULL 的列/表达式相加。只要不为空,任何值都会加 1。例外是 COUNT(DISTINCT),仅当尚未计数时它才会递增。

  2. 当单独使用 BOOLEAN 表达式时,它返回 1 或 0。

  3. 当布尔值是 与 NULL 进行“或”运算,仅当它为 0(假)时才为 NULL

给其他人

是的,如果计数是唯一需要的列,则可以使用 WHERE value=4 但如果它是一个想要计算 4 的查询 < em>以及检索其他计数/聚合,则过滤器不起作用。另一种选择是 SUM(value=4),例如

SELECT sum(value=4)
  FROM test

This should reveal all

SELECT 4=4, 3=4, 1 or null, 0 or null

Output

1   |   0   |   1   |   NULL

Facts

  1. COUNT adds up the columns / expressions that evaluate to NOT NULL. Anything will increment by 1, as long as it is not null. Exception is COUNT(DISTINCT) where it increments only if it is not already counted.

  2. When a BOOLEAN expression is used on its own, it returns either 1 or 0.

  3. When a boolean is OR-ed with NULL, it is NULL only when it is 0 (false)

To others

Yes if the count is the ONLY column desired, one could use WHERE value=4 but if it is a query that wants to count the 4's as well as retrieving other counts/aggregates, then the filter doesn't work. An alternative would have been SUM(value=4), e.g.

SELECT sum(value=4)
  FROM test
红墙和绿瓦 2024-10-24 17:21:41

COUNT() 函数接受一个参数,该参数被视为 NULLNOT NULL。如果它是 NOT NULL - 那么它会增加该值,并且不会执行任何其他操作。

在您的情况下,表达式 value=4TRUEFALSE,显然都是 truefalse< /code> 不为空,这就是为什么你得到 10。

但我对基于 COUNT(condition) 的解决方案感兴趣。

基于 count 的解决方案总是会更慢(慢得多),因为它会导致表全扫描和每个值的迭代比较。

COUNT() function accepts an argument, that is treated as NULL or NOT NULL. If it is NOT NULL - then it increments the value, and doesn't do anything otherwise.

In your case expression value=4 is either TRUE or FALSE, obviously both true and false are not null, that is why you get 10.

but I am interested in a COUNT(condition) based solution.

The count-based solution will be always slower (much slower), because it will cause table fullscan and iterative comparison of each value.

梦幻的味道 2024-10-24 17:21:41

COUNT(表达式) 计算表达式不为 NULL 的行数。仅当 value 为 NULL 时,表达式 value=4 才为 NULL,否则为 TRUE (1) 或 FALSE (0),两者均被计数。

1 = 4         | FALSE
4 = 4         | TRUE
1 = 4 OR NULL | NULL
4 = 4 OR NULL | TRUE

您可以使用 SUM 来代替:

SELECT SUM(value=4) FROM test

这在您的特定示例中并不是特别有用,但如果您想使用单个表扫描来计算满足多个不同谓词的行(例如在以下查询中),那么它可能会很有用:

SELECT
    SUM(a>b) AS foo,
    SUM(b>c) AS bar,
    COUNT(*) AS total_rows
FROM test

COUNT(expression) counts the number of rows for which the expression is not NULL. The expression value=4 is only NULL if value is NULL, otherwise it is either TRUE (1) or FALSE (0), both of which are counted.

1 = 4         | FALSE
4 = 4         | TRUE
1 = 4 OR NULL | NULL
4 = 4 OR NULL | TRUE

You could use SUM instead:

SELECT SUM(value=4) FROM test

This is not particularly useful in your specific example but it can be useful if you want to count rows satisfying multiple different predicates using a single table scan such as in the following query:

SELECT
    SUM(a>b) AS foo,
    SUM(b>c) AS bar,
    COUNT(*) AS total_rows
FROM test
与往事干杯 2024-10-24 17:21:41

我建议您会发现更标准的语法在不同的数据库引擎之间移动得更好,并且总是给出正确的结果。

 select count(*)
 from test
 where value = 4

您使用的语法是 Mysql 变体吗?

I would suggest that you will find the more standard syntax moves better between different database engines and will always give the correct result.

 select count(*)
 from test
 where value = 4

Is the syntax you used a Mysql variant?

时光无声 2024-10-24 17:21:41

这是因为 COUNT(表达式) 对 VALUES 进行计数。在SQL理论中,NULL是一个STATE,而不是一个VALUE,因此它不被计算在内。 NULL 是一种状态,表示字段的值未知。

现在,当您写入“value=4”时,其计算结果为布尔值 TRUE 或 FALSE。由于 TRUE 和 FALSE 都是 VALUES,因此结果是 10。

当您添加“OR NULL”时,实际上有“TRUE OR NULL”和“FALSE OR NULL”。现在,“TRUE OR NULL”的计算结果为 TRUE,而“FALSE OR NULL”的计算结果为 NULL。因此结果是 3,因为只有 3 个值(和 7 个 NULL 状态)。

It's because COUNT(expression) counts VALUES. In SQL theory, NULL is a STATE, not a VALUE and thus is it not counted. NULL is a state that means that field's value is unknown.

Now, when you write "value=4" this evaluates to boolean TRUE or FALSE. Since both TRUE and FALSE are VALUES, the result is 10.

When you add "OR NULL", you actually have "TRUE OR NULL" and "FALSE OR NULL". Now, "TRUE OR NULL" evaluates to TRUE, while "FALSE OR NULL" evaluates to NULL. Thus the result is 3, because you only have 3 values (and seven NULL states).

↘紸啶 2024-10-24 17:21:41

这是我测试后的直观图片:

在此处输入图像描述

Here is a intuitive picture after I test it:

enter image description here

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文