Count Distinct (pl sql) 返回空值

发布于 2024-07-25 08:45:38 字数 3161 浏览 4 评论 0原文

对这些查询中无意义的表/列名先行表示歉意。如果您曾经使用过 Remedy 的数据库后端，您就会明白。

我遇到一个问题，当我怀疑实际值应该在 20 左右（我相信是 23）时，Count Distinct 返回空值。下面是一系列查询及其返回值。

SELECT count(distinct t442.c1)
      FROM t442, t658, t631
     WHERE t442.c1 = t658.c536870930
       AND t442.c200000003 = 'Network'
       AND t442.c536871139 < 2
       AND t631.c536870913 = t442.c1
       AND t658.c536870925 = 1
       AND (t442.c7 = 6 OR t442.c7 = 5)
       AND t442.c536870954 > 1141300800
       AND (t442.c240000010 = 0)

结果 = 497。

添加表 t649 并确保它有链接回表 t442 的记录：

 SELECT COUNT (DISTINCT t442.c1)
              FROM t442, t658, t631, t649
             WHERE t442.c1 = t658.c536870930
               AND t442.c200000003 = 'Network'
               AND t442.c536871139 < 2
               AND t631.c536870913 = t442.c1
               AND t658.c536870925 = 1
               AND (t442.c7 = 6 OR t442.c7 = 5)
               AND t442.c536870954 > 1141300800
               AND (t442.c240000010 = 0)
               AND t442.c1 = t649.c536870914

结果 = 263。

筛选出表 t649 中列 c536870939 <= 1 的记录：

SELECT COUNT (DISTINCT t442.c1)
          FROM t442, t658, t631, t649
         WHERE t442.c1 = t658.c536870930
           AND t442.c200000003 = 'Network'
           AND t442.c536871139 < 2
           AND t631.c536870913 = t442.c1
           AND t658.c536870925 = 1
           AND (t442.c7 = 6 OR t442.c7 = 5)
           AND t442.c536870954 > 1141300800
           AND (t442.c240000010 = 0)
           AND t442.c1 = t649.c536870914
           AND t649.c536870939 > 1

结果 = 24。

筛选 HAVING 语句：

SELECT COUNT (DISTINCT t442.c1)
          FROM t442, t658, t631, t649
         WHERE t442.c1 = t658.c536870930
           AND t442.c200000003 = 'Network'
           AND t442.c536871139 < 2
           AND t631.c536870913 = t442.c1
           AND t658.c536870925 = 1
           AND (t442.c7 = 6 OR t442.c7 = 5)
           AND t442.c536870954 > 1141300800
           AND (t442.c240000010 = 0)
           AND t442.c1 = t649.c536870914
           AND t649.c536870939 > 1
        HAVING COUNT (DISTINCT t631.c536870922) =
                                              COUNT (DISTINCT t649.c536870931)

结果 = null 。

如果我运行以下查询，我在结果列表中看不到任何可以解释为什么我没有获得任何类型的返回值的内容。即使我从 SELECT 中删除 DISTINCT，情况也是如此。（我分别返回 25 行和 4265 行数据）。

SELECT DISTINCT t442.c1, t631.c536870922, t649.c536870931
          FROM t442, t658, t631, t649
         WHERE t442.c1 = t658.c536870930
           AND t442.c200000003 = 'Network'
           AND t442.c536871139 < 2
           AND t631.c536870913 = t442.c1
           AND t658.c536870925 = 1
           AND (t442.c7 = 6 OR t442.c7 = 5)
           AND t442.c536870954 > 1141300800
           AND (t442.c240000010 = 0)
           AND t442.c1 = t649.c536870914
           AND t649.c536870939 > 1

我在其他几个地方设置了与返回空值的查询完全相同的查询，并且它工作得很好——返回正确值的可用数字。我必须假设在这种情况下任何独特的东西都与数据相关，而不是实际的查询，但我不确定在数据中寻找什么来解释它。在聚合之前，我无法在原始数据中找到任何空值。我不知道还有什么会导致这种情况。

任何帮助，将不胜感激。

原文

Preemptive apologies for the nonsensical table/column names on these queries. If you've ever worked with the DB backend of Remedy, you'll understand.

I'm having a problem where a Count Distinct is returning a null value, when I suspect the actual value should be somewhere in the 20's (23, I believe). Below is a series of queries and their return values.

SELECT count(distinct t442.c1)
      FROM t442, t658, t631
     WHERE t442.c1 = t658.c536870930
       AND t442.c200000003 = 'Network'
       AND t442.c536871139 < 2
       AND t631.c536870913 = t442.c1
       AND t658.c536870925 = 1
       AND (t442.c7 = 6 OR t442.c7 = 5)
       AND t442.c536870954 > 1141300800
       AND (t442.c240000010 = 0)

Result = 497.

Add table t649 and make sure it has records linked back to table t442:

 SELECT COUNT (DISTINCT t442.c1)
              FROM t442, t658, t631, t649
             WHERE t442.c1 = t658.c536870930
               AND t442.c200000003 = 'Network'
               AND t442.c536871139 < 2
               AND t631.c536870913 = t442.c1
               AND t658.c536870925 = 1
               AND (t442.c7 = 6 OR t442.c7 = 5)
               AND t442.c536870954 > 1141300800
               AND (t442.c240000010 = 0)
               AND t442.c1 = t649.c536870914

Result = 263.

Filter out records in table t649 where column c536870939 <= 1:

SELECT COUNT (DISTINCT t442.c1)
          FROM t442, t658, t631, t649
         WHERE t442.c1 = t658.c536870930
           AND t442.c200000003 = 'Network'
           AND t442.c536871139 < 2
           AND t631.c536870913 = t442.c1
           AND t658.c536870925 = 1
           AND (t442.c7 = 6 OR t442.c7 = 5)
           AND t442.c536870954 > 1141300800
           AND (t442.c240000010 = 0)
           AND t442.c1 = t649.c536870914
           AND t649.c536870939 > 1

Result = 24.

Filter on the HAVING statement:

SELECT COUNT (DISTINCT t442.c1)
          FROM t442, t658, t631, t649
         WHERE t442.c1 = t658.c536870930
           AND t442.c200000003 = 'Network'
           AND t442.c536871139 < 2
           AND t631.c536870913 = t442.c1
           AND t658.c536870925 = 1
           AND (t442.c7 = 6 OR t442.c7 = 5)
           AND t442.c536870954 > 1141300800
           AND (t442.c240000010 = 0)
           AND t442.c1 = t649.c536870914
           AND t649.c536870939 > 1
        HAVING COUNT (DISTINCT t631.c536870922) =
                                              COUNT (DISTINCT t649.c536870931)

Result = null.

If I run the following query, I can't see anything in the result list that would explain why I'm not getting any kind of return value. This is true even if I remove the DISTINCT from the SELECT. (I get 25 and 4265 rows of data back, respectively).

SELECT DISTINCT t442.c1, t631.c536870922, t649.c536870931
          FROM t442, t658, t631, t649
         WHERE t442.c1 = t658.c536870930
           AND t442.c200000003 = 'Network'
           AND t442.c536871139 < 2
           AND t631.c536870913 = t442.c1
           AND t658.c536870925 = 1
           AND (t442.c7 = 6 OR t442.c7 = 5)
           AND t442.c536870954 > 1141300800
           AND (t442.c240000010 = 0)
           AND t442.c1 = t649.c536870914
           AND t649.c536870939 > 1

I have several other places where I have the query set up exactly like the one that is returning the null value and it work perfectly fine--returning usable numbers that are the correct values. I have to assume that whatever is unique in this situation is related to data and not the actual query, but I'm not sure what to look for in the data to explain it. I haven't been able to find any null values in the raw data before aggregation. I don't know what else would cause this.

Any help would be appreciated.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

憧憬巴黎街头的黎明 2024-08-01 08:45:38

我现在知道了。原始查询中的问题是使用 HAVING 子句而不使用 GROUP BY 子句是非常不寻常的（如果不是的话，实际上是错误的）。答案在于查询各个部分执行的操作顺序。

在原始查询中，您执行以下操作：

SELECT COUNT(DISTINCT t442.c1)
  FROM ...
 WHERE ...
HAVING COUNT(DISTINCT t631.c536870922) = COUNT(DISTINCT t649.c536870931);

数据库将执行连接和约束，此时它将执行任何分组依据和聚合操作。在这种情况下，您没有进行分组，因此 COUNT 操作是针对整个数据集的。根据您上面发布的值， COUNT(DISTINCT t631.c536870922) = 25 和 COUNT(DISTINCT t649.c536870931) = 24。 HAVING 子句现在被应用，导致没有任何匹配 - 您要求的情况是总数的计数set（即使有多个 c1）是相等的，但它们不是。 DISTINCT 应用于空结果集，但您什么也得不到。

您真正想要做的只是您在示例中发布的内容的一个版本，该版本会输出行数：

SELECT count(*)
  FROM (SELECT t442.c1     
          FROM t442
             , t658
             , t631
             , t649
         WHERE t442.c1 = t658.c536870930
           AND t442.c200000003 = 'Network'
           AND t442.c536871139 < 2
           AND t631.c536870913 = t442.c1
           AND t658.c536870925 = 1
           AND (   t442.c7 = 6
                OR t442.c7 = 5)
           AND t442.c536870954 > 1141300800
           AND (t442.c240000010 = 0)
           AND t442.c1 = t649.c536870914
           AND t649.c536870939 > 1
         GROUP BY t442.c1
        HAVING COUNT(DISTINCT t631.c536870922) = COUNT(DISTINCT t649.c536870931)
       );

这将为您提供具有相同数量的 631 和 631 的 c1 列的列表。 649 个表条目。注意：您应该非常小心在查询中使用 DISTINCT。例如，在您发布上面结果的情况下，完全没有必要；通常，它充当一种壁纸来覆盖查询中的错误，这些错误由于 WHERE 子句中缺少约束而无法按照您想要的方式返回结果（“嗯，我的查询返回所有这些值的重复值。嗯，一个DISTINCT 将解决这个问题”）。

I understand now. Your problem in the original query is that it is highly unusual (if not, in fact, wrong) to use a HAVING clause without a GROUP BY clause. The answer lies in the order of operation the various parts of the query are performed.

In the original query, you do this:

SELECT COUNT(DISTINCT t442.c1)
  FROM ...
 WHERE ...
HAVING COUNT(DISTINCT t631.c536870922) = COUNT(DISTINCT t649.c536870931);

The database will perform your joins and constraints, at which point it would do any group by and aggregation operations. In this case, you are not grouping, so the COUNT operations are across the whole data set. Based on the values you posted above, COUNT(DISTINCT t631.c536870922) = 25 and COUNT(DISTINCT t649.c536870931) = 24. The HAVING clause now gets applied, resulting in nothing matching - your asking for cases where the count of the total set (even though there are multiple c1s) are equal, and they are not. The DISTINCT gets applied to an empty result set, and you get nothing.

What you really want to do is just a version of what you posted in the example that spit out the rows counts:

SELECT count(*)
  FROM (SELECT t442.c1     
          FROM t442
             , t658
             , t631
             , t649
         WHERE t442.c1 = t658.c536870930
           AND t442.c200000003 = 'Network'
           AND t442.c536871139 < 2
           AND t631.c536870913 = t442.c1
           AND t658.c536870925 = 1
           AND (   t442.c7 = 6
                OR t442.c7 = 5)
           AND t442.c536870954 > 1141300800
           AND (t442.c240000010 = 0)
           AND t442.c1 = t649.c536870914
           AND t649.c536870939 > 1
         GROUP BY t442.c1
        HAVING COUNT(DISTINCT t631.c536870922) = COUNT(DISTINCT t649.c536870931)
       );

This will give you a list of the c1 columns that have equal numbers of the 631 & 649 table entries. Note: You should be very careful about the use of DISTINCT in your queries. For example, in the case where you posted the results above, it is completely unnecessary; oftentimes it acts as a kind of wallpaper to cover over errors in queries that don't return results the way you want due to a missed constraint in the WHERE clause ("Hmm, my query is returning dupes for all these values. Well, a DISTINCT will fix that problem").

回复收藏 0 原文

神也荒唐 2024-08-01 08:45:38

结果是什么：

SELECT COUNT (DISTINCT t631.c536870922),
       COUNT (DISTINCT t649.c536870931)
          FROM t442, t658, t631, t649
         WHERE t442.c1 = t658.c536870930
           AND t442.c200000003 = 'Network'
           AND t442.c536871139 < 2
           AND t631.c536870913 = t442.c1
           AND t658.c536870925 = 1
           AND (t442.c7 = 6 OR t442.c7 = 5)
           AND t442.c536870954 > 1141300800
           AND (t442.c240000010 = 0)
           AND t442.c1 = t649.c536870914
           AND t649.c536870939 > 1

如果两列从来没有相等的值，那么添加 HAVING 子句将从结果集中消除所有行是有意义的。

What is the result of:

SELECT COUNT (DISTINCT t631.c536870922),
       COUNT (DISTINCT t649.c536870931)
          FROM t442, t658, t631, t649
         WHERE t442.c1 = t658.c536870930
           AND t442.c200000003 = 'Network'
           AND t442.c536871139 < 2
           AND t631.c536870913 = t442.c1
           AND t658.c536870925 = 1
           AND (t442.c7 = 6 OR t442.c7 = 5)
           AND t442.c536870954 > 1141300800
           AND (t442.c240000010 = 0)
           AND t442.c1 = t649.c536870914
           AND t649.c536870939 > 1

If the two columns there never have equal values, then it makes sense that adding the HAVING clause would eliminate all rows from the result set.

回复收藏 0 原文

傲影 2024-08-01 08:45:38

COUNT(DISTINCT column) 不计算 NULL 值：

SELECT  COUNT(DISTINCT val1)
FROM    (
        SELECT  NULL AS val1
        FROM    dual
        )

---
0

可能是这种情况吗？

COUNT(DISTINCT column) doesn't count NULL values:

SELECT  COUNT(DISTINCT val1)
FROM    (
        SELECT  NULL AS val1
        FROM    dual
        )

---
0

Could it be the case?

回复收藏 0 原文

打小就很酷 2024-08-01 08:45:38

我会尝试将 HAVING 子句条件放在 WHERE 子句中。您选择 HAVING 有什么理由吗？仅供参考，HAVING 是在返回结果集后完成的过滤器，这可能会导致意外结果。它也不用于查询的优化。如果您不必使用HAVING，我建议不要使用它。

我建议将计数添加到 SELECT 子句中，然后将它们加入到 WHERE 子句中。

回复收藏 0 原文

宁愿没拥抱 2024-08-01 08:45:38

如果我这样做：

SELECT distinct t442.c1, count(distinct t631.c536870922), 
    count (distinct t649.c536870931)
          FROM t442, t658, t631, t649
         WHERE t442.c1 = t658.c536870930
           AND t442.c200000003 = 'Network'
           AND t442.c536871139 < 2
           AND t631.c536870913 = t442.c1
           AND t658.c536870925 = 1
           AND (t442.c7 = 6 OR t442.c7 = 5)
           AND t442.c536870954 > 1141300800
           AND (t442.c240000010 = 0)
           AND t442.c1 = t649.c536870914
           AND t649.c536870939 > 1
           group by t442.c1
           having count(distinct t631.c536870922)= 
                         count (distinct t649.c536870931)

我会看到应该计算的 23 行。删除 HAVING 语句将返回 24 行，即不满足 HAVING 标准的额外行。

编辑：
查询结果，按照 Steve Broberg 的要求：

row | t442.c1         | cnt t631 | cnt 649
-------------------------------------------
1   | CHG000000230378 |    2     |    1
2   | CHG000000230846 |    1     |    1
3   | CHG000000232562 |    1     |    1
4   | CHG000000232955 |    1     |    1
5   | CHG000000232956 |    1     |    1
6   | CHG000000232958 |    1     |    1
7   | CHG000000233027 |    1     |    1
8   | CHG000000233933 |    1     |    1
9   | CHG000000233934 |    1     |    1
10  | CHG000000233997 |    1     |    1
11  | CHG000000233998 |    1     |    1
12  | CHG000000233999 |    1     |    1
13  | CHG000000234001 |    1     |    1
14  | CHG000000234005 |    1     |    1
15  | CHG000000234009 |    1     |    1
16  | CHG000000234012 |    1     |    1
17  | CHG000000234693 |    1     |    1
18  | CHG000000234696 |    1     |    1
19  | CHG000000234730 |    1     |    1
20  | CHG000000234839 |    1     |    1
21  | CHG000000235115 |    1     |    1
22  | CHG000000235224 |    1     |    1
23  | CHG000000235488 |    1     |    1
24  | CHG000000235847 |    1     |    1

如果我包含 HAVING 子句，则第一行将被正确过滤掉。

If I do this:

SELECT distinct t442.c1, count(distinct t631.c536870922), 
    count (distinct t649.c536870931)
          FROM t442, t658, t631, t649
         WHERE t442.c1 = t658.c536870930
           AND t442.c200000003 = 'Network'
           AND t442.c536871139 < 2
           AND t631.c536870913 = t442.c1
           AND t658.c536870925 = 1
           AND (t442.c7 = 6 OR t442.c7 = 5)
           AND t442.c536870954 > 1141300800
           AND (t442.c240000010 = 0)
           AND t442.c1 = t649.c536870914
           AND t649.c536870939 > 1
           group by t442.c1
           having count(distinct t631.c536870922)= 
                         count (distinct t649.c536870931)

I see the 23 rows that should be counted. Removing the HAVING statement returns 24 rows, the extra one which does not meet that HAVING criteria.

EDIT:
Results of the query, as requested per Steve Broberg:

row | t442.c1         | cnt t631 | cnt 649
-------------------------------------------
1   | CHG000000230378 |    2     |    1
2   | CHG000000230846 |    1     |    1
3   | CHG000000232562 |    1     |    1
4   | CHG000000232955 |    1     |    1
5   | CHG000000232956 |    1     |    1
6   | CHG000000232958 |    1     |    1
7   | CHG000000233027 |    1     |    1
8   | CHG000000233933 |    1     |    1
9   | CHG000000233934 |    1     |    1
10  | CHG000000233997 |    1     |    1
11  | CHG000000233998 |    1     |    1
12  | CHG000000233999 |    1     |    1
13  | CHG000000234001 |    1     |    1
14  | CHG000000234005 |    1     |    1
15  | CHG000000234009 |    1     |    1
16  | CHG000000234012 |    1     |    1
17  | CHG000000234693 |    1     |    1
18  | CHG000000234696 |    1     |    1
19  | CHG000000234730 |    1     |    1
20  | CHG000000234839 |    1     |    1
21  | CHG000000235115 |    1     |    1
22  | CHG000000235224 |    1     |    1
23  | CHG000000235488 |    1     |    1
24  | CHG000000235847 |    1     |    1

The first row is filtered out properly if I include the HAVING clause.

回复收藏 0 原文

~没有更多了~