Count Distinct (pl sql) 返回空值
对这些查询中无意义的表/列名先行表示歉意。 如果您曾经使用过 Remedy 的数据库后端,您就会明白。
我遇到一个问题,当我怀疑实际值应该在 20 左右(我相信是 23)时,Count Distinct 返回空值。 下面是一系列查询及其返回值。
SELECT count(distinct t442.c1)
FROM t442, t658, t631
WHERE t442.c1 = t658.c536870930
AND t442.c200000003 = 'Network'
AND t442.c536871139 < 2
AND t631.c536870913 = t442.c1
AND t658.c536870925 = 1
AND (t442.c7 = 6 OR t442.c7 = 5)
AND t442.c536870954 > 1141300800
AND (t442.c240000010 = 0)
结果 = 497。
添加表 t649 并确保它有链接回表 t442 的记录:
SELECT COUNT (DISTINCT t442.c1)
FROM t442, t658, t631, t649
WHERE t442.c1 = t658.c536870930
AND t442.c200000003 = 'Network'
AND t442.c536871139 < 2
AND t631.c536870913 = t442.c1
AND t658.c536870925 = 1
AND (t442.c7 = 6 OR t442.c7 = 5)
AND t442.c536870954 > 1141300800
AND (t442.c240000010 = 0)
AND t442.c1 = t649.c536870914
结果 = 263。
筛选出表 t649 中列 c536870939 <= 1 的记录:
SELECT COUNT (DISTINCT t442.c1)
FROM t442, t658, t631, t649
WHERE t442.c1 = t658.c536870930
AND t442.c200000003 = 'Network'
AND t442.c536871139 < 2
AND t631.c536870913 = t442.c1
AND t658.c536870925 = 1
AND (t442.c7 = 6 OR t442.c7 = 5)
AND t442.c536870954 > 1141300800
AND (t442.c240000010 = 0)
AND t442.c1 = t649.c536870914
AND t649.c536870939 > 1
结果 = 24。
筛选 HAVING 语句:
SELECT COUNT (DISTINCT t442.c1)
FROM t442, t658, t631, t649
WHERE t442.c1 = t658.c536870930
AND t442.c200000003 = 'Network'
AND t442.c536871139 < 2
AND t631.c536870913 = t442.c1
AND t658.c536870925 = 1
AND (t442.c7 = 6 OR t442.c7 = 5)
AND t442.c536870954 > 1141300800
AND (t442.c240000010 = 0)
AND t442.c1 = t649.c536870914
AND t649.c536870939 > 1
HAVING COUNT (DISTINCT t631.c536870922) =
COUNT (DISTINCT t649.c536870931)
结果 = null 。
如果我运行以下查询,我在结果列表中看不到任何可以解释为什么我没有获得任何类型的返回值的内容。 即使我从 SELECT 中删除 DISTINCT,情况也是如此。 (我分别返回 25 行和 4265 行数据)。
SELECT DISTINCT t442.c1, t631.c536870922, t649.c536870931
FROM t442, t658, t631, t649
WHERE t442.c1 = t658.c536870930
AND t442.c200000003 = 'Network'
AND t442.c536871139 < 2
AND t631.c536870913 = t442.c1
AND t658.c536870925 = 1
AND (t442.c7 = 6 OR t442.c7 = 5)
AND t442.c536870954 > 1141300800
AND (t442.c240000010 = 0)
AND t442.c1 = t649.c536870914
AND t649.c536870939 > 1
我在其他几个地方设置了与返回空值的查询完全相同的查询,并且它工作得很好——返回正确值的可用数字。 我必须假设在这种情况下任何独特的东西都与数据相关,而不是实际的查询,但我不确定在数据中寻找什么来解释它。 在聚合之前,我无法在原始数据中找到任何空值。 我不知道还有什么会导致这种情况。
任何帮助,将不胜感激。
Preemptive apologies for the nonsensical table/column names on these queries. If you've ever worked with the DB backend of Remedy, you'll understand.
I'm having a problem where a Count Distinct is returning a null value, when I suspect the actual value should be somewhere in the 20's (23, I believe). Below is a series of queries and their return values.
SELECT count(distinct t442.c1)
FROM t442, t658, t631
WHERE t442.c1 = t658.c536870930
AND t442.c200000003 = 'Network'
AND t442.c536871139 < 2
AND t631.c536870913 = t442.c1
AND t658.c536870925 = 1
AND (t442.c7 = 6 OR t442.c7 = 5)
AND t442.c536870954 > 1141300800
AND (t442.c240000010 = 0)
Result = 497.
Add table t649 and make sure it has records linked back to table t442:
SELECT COUNT (DISTINCT t442.c1)
FROM t442, t658, t631, t649
WHERE t442.c1 = t658.c536870930
AND t442.c200000003 = 'Network'
AND t442.c536871139 < 2
AND t631.c536870913 = t442.c1
AND t658.c536870925 = 1
AND (t442.c7 = 6 OR t442.c7 = 5)
AND t442.c536870954 > 1141300800
AND (t442.c240000010 = 0)
AND t442.c1 = t649.c536870914
Result = 263.
Filter out records in table t649 where column c536870939 <= 1:
SELECT COUNT (DISTINCT t442.c1)
FROM t442, t658, t631, t649
WHERE t442.c1 = t658.c536870930
AND t442.c200000003 = 'Network'
AND t442.c536871139 < 2
AND t631.c536870913 = t442.c1
AND t658.c536870925 = 1
AND (t442.c7 = 6 OR t442.c7 = 5)
AND t442.c536870954 > 1141300800
AND (t442.c240000010 = 0)
AND t442.c1 = t649.c536870914
AND t649.c536870939 > 1
Result = 24.
Filter on the HAVING statement:
SELECT COUNT (DISTINCT t442.c1)
FROM t442, t658, t631, t649
WHERE t442.c1 = t658.c536870930
AND t442.c200000003 = 'Network'
AND t442.c536871139 < 2
AND t631.c536870913 = t442.c1
AND t658.c536870925 = 1
AND (t442.c7 = 6 OR t442.c7 = 5)
AND t442.c536870954 > 1141300800
AND (t442.c240000010 = 0)
AND t442.c1 = t649.c536870914
AND t649.c536870939 > 1
HAVING COUNT (DISTINCT t631.c536870922) =
COUNT (DISTINCT t649.c536870931)
Result = null.
If I run the following query, I can't see anything in the result list that would explain why I'm not getting any kind of return value. This is true even if I remove the DISTINCT from the SELECT. (I get 25 and 4265 rows of data back, respectively).
SELECT DISTINCT t442.c1, t631.c536870922, t649.c536870931
FROM t442, t658, t631, t649
WHERE t442.c1 = t658.c536870930
AND t442.c200000003 = 'Network'
AND t442.c536871139 < 2
AND t631.c536870913 = t442.c1
AND t658.c536870925 = 1
AND (t442.c7 = 6 OR t442.c7 = 5)
AND t442.c536870954 > 1141300800
AND (t442.c240000010 = 0)
AND t442.c1 = t649.c536870914
AND t649.c536870939 > 1
I have several other places where I have the query set up exactly like the one that is returning the null value and it work perfectly fine--returning usable numbers that are the correct values. I have to assume that whatever is unique in this situation is related to data and not the actual query, but I'm not sure what to look for in the data to explain it. I haven't been able to find any null values in the raw data before aggregation. I don't know what else would cause this.
Any help would be appreciated.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
我现在知道了。 原始查询中的问题是使用 HAVING 子句而不使用 GROUP BY 子句是非常不寻常的(如果不是的话,实际上是错误的)。 答案在于查询各个部分执行的操作顺序。
在原始查询中,您执行以下操作:
数据库将执行连接和约束,此时它将执行任何分组依据和聚合操作。 在这种情况下,您没有进行分组,因此 COUNT 操作是针对整个数据集的。 根据您上面发布的值, COUNT(DISTINCT t631.c536870922) = 25 和 COUNT(DISTINCT t649.c536870931) = 24。 HAVING 子句现在被应用,导致没有任何匹配 - 您要求的情况是总数的计数set(即使有多个 c1)是相等的,但它们不是。 DISTINCT 应用于空结果集,但您什么也得不到。
您真正想要做的只是您在示例中发布的内容的一个版本,该版本会输出行数:
这将为您提供具有相同数量的 631 和 631 的 c1 列的列表。 649 个表条目。 注意:您应该非常小心在查询中使用 DISTINCT。 例如,在您发布上面结果的情况下,完全没有必要; 通常,它充当一种壁纸来覆盖查询中的错误,这些错误由于 WHERE 子句中缺少约束而无法按照您想要的方式返回结果(“嗯,我的查询返回所有这些值的重复值。嗯,一个DISTINCT 将解决这个问题”)。
I understand now. Your problem in the original query is that it is highly unusual (if not, in fact, wrong) to use a HAVING clause without a GROUP BY clause. The answer lies in the order of operation the various parts of the query are performed.
In the original query, you do this:
The database will perform your joins and constraints, at which point it would do any group by and aggregation operations. In this case, you are not grouping, so the COUNT operations are across the whole data set. Based on the values you posted above, COUNT(DISTINCT t631.c536870922) = 25 and COUNT(DISTINCT t649.c536870931) = 24. The HAVING clause now gets applied, resulting in nothing matching - your asking for cases where the count of the total set (even though there are multiple c1s) are equal, and they are not. The DISTINCT gets applied to an empty result set, and you get nothing.
What you really want to do is just a version of what you posted in the example that spit out the rows counts:
This will give you a list of the c1 columns that have equal numbers of the 631 & 649 table entries. Note: You should be very careful about the use of DISTINCT in your queries. For example, in the case where you posted the results above, it is completely unnecessary; oftentimes it acts as a kind of wallpaper to cover over errors in queries that don't return results the way you want due to a missed constraint in the WHERE clause ("Hmm, my query is returning dupes for all these values. Well, a DISTINCT will fix that problem").
结果是什么:
如果两列从来没有相等的值,那么添加 HAVING 子句将从结果集中消除所有行是有意义的。
What is the result of:
If the two columns there never have equal values, then it makes sense that adding the HAVING clause would eliminate all rows from the result set.
COUNT(DISTINCT column)
不计算NULL
值:可能是这种情况吗?
COUNT(DISTINCT column)
doesn't countNULL
values:Could it be the case?
我会尝试将 HAVING 子句条件放在 WHERE 子句中。 您选择 HAVING 有什么理由吗? 仅供参考,HAVING 是在返回结果集后完成的过滤器,这可能会导致意外结果。 它也不用于查询的优化。 如果您不必使用HAVING,我建议不要使用它。
我建议将计数添加到 SELECT 子句中,然后将它们加入到 WHERE 子句中。
I would try putting the HAVING clause conditions in the WHERE clause instead. Is there any reason you chose HAVING? Just FYI, HAVING is a filter that is done after the result set is returned which may cause unexpected results. Also it is not used in the optimization of the query. If you don't have to use HAVING I would suggest not using it.
I would suggest adding the counts to the SELECT clause then joining them in the WHERE clause.
如果我这样做:
我会看到应该计算的 23 行。 删除 HAVING 语句将返回 24 行,即不满足 HAVING 标准的额外行。
编辑:
查询结果,按照 Steve Broberg 的要求:
如果我包含 HAVING 子句,则第一行将被正确过滤掉。
If I do this:
I see the 23 rows that should be counted. Removing the HAVING statement returns 24 rows, the extra one which does not meet that HAVING criteria.
EDIT:
Results of the query, as requested per Steve Broberg:
The first row is filtered out properly if I include the HAVING clause.