Counting DISTINCT over multiple columns

Posted on 2024-08-05 21:33:29

Is there a better way of doing a query like this:

SELECT COUNT(*) 
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
      FROM DocumentOutputItems) AS internalQuery

I need to count the number of distinct items from this table but the distinct is over two columns.

My query works fine but I was wondering if I can get the final result using just one query (without using a sub-query)
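
For reference, the query above can be tried on a small data set. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for the asker's database, and the sample rows are made up; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the question.
rows = [(1, 10), (1, 10), (1, 11), (2, 10), (2, 10)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
conn.executemany("INSERT INTO DocumentOutputItems VALUES (?, ?)", rows)

# The query from the question: de-duplicate the pairs, then count them.
(pair_count,) = conn.execute(
    """
    SELECT COUNT(*)
    FROM (SELECT DISTINCT DocumentId, DocumentSessionId
          FROM DocumentOutputItems) AS internalQuery
    """
).fetchone()

print(pair_count)  # 3 distinct pairs: (1, 10), (1, 11), (2, 10)
```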

Comments (20)

硪扪都還晓 2024-08-12 21:33:29

If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.

Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.

I believe a distinct count of the computed column would be equivalent to your query.

长不大的小祸害 2024-08-12 21:33:29

Edit: Altered from the less-than-reliable checksum-only query
I've discovered a way to do this (in SQL Server 2005) that works pretty well for me and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable

SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
FROM DocumentOutPutItems
-小熊_ 2024-08-12 21:33:29

To run as a single query, concatenate the columns, then get the distinct count of instances of the concatenated string.

SELECT count(DISTINCT concat(DocumentId, DocumentSessionId)) FROM DocumentOutputItems;

In MySQL you can do the same thing without the concatenation step as follows:

SELECT count(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems;

This feature is mentioned in the MySQL documentation:

http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count-distinct

小ぇ时光︴ 2024-08-12 21:33:29

What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations why not try it?

It certainly works as you might expect in Oracle.

SQL> select distinct deptno, job from emp
  2  order by deptno, job
  3  /

    DEPTNO JOB
---------- ---------
        10 CLERK
        10 MANAGER
        10 PRESIDENT
        20 ANALYST
        20 CLERK
        20 MANAGER
        30 CLERK
        30 MANAGER
        30 SALESMAN

9 rows selected.


SQL> select count(*) from (
  2  select distinct deptno, job from emp
  3  )
  4  /

  COUNT(*)
----------
         9

SQL>

edit

I went down a blind alley with analytics but the answer was depressingly obvious...

SQL> select count(distinct concat(deptno,job)) from emp
  2  /

COUNT(DISTINCTCONCAT(DEPTNO,JOB))
---------------------------------
                                9

SQL>

edit 2

Given the following data the concatenating solution provided above will miscount:

col1  col2
----  ----
A     AA
AA    A

So we need to include a separator...

select col1 + '*' + col2 from t23
/

Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.
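
The miscount and the separator fix can be reproduced with a few lines. This is a sketch under assumptions: it runs against SQLite through Python's sqlite3 module rather than Oracle or SQL Server, and uses SQLite's `||` concatenation operator in place of `+`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t23 (col1 TEXT, col2 TEXT)")
# The two rows from the example above: different pairs, same concatenation.
conn.executemany("INSERT INTO t23 VALUES (?, ?)", [("A", "AA"), ("AA", "A")])


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Without a separator both rows collapse to the string 'AAA'.
plain = scalar("SELECT COUNT(DISTINCT col1 || col2) FROM t23")

# With a separator the strings are 'A*AA' and 'AA*A', so they stay distinct.
separated = scalar("SELECT COUNT(DISTINCT col1 || '*' || col2) FROM t23")

print(plain, separated)  # 1 2
```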

深爱不及久伴 2024-08-12 21:33:29

Some SQL databases can work with a tuple expression so you can just do:

SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId))
      FROM DocumentOutputItems;

If your database doesn't support this, it can be simulated as per @oncel-umut-turer's suggestion of CHECKSUM or other scalar function providing good uniqueness e.g.
COUNT(DISTINCT CONCAT(DocumentId, ':', DocumentSessionId)).

MySQL specifically supports COUNT(DISTINCT expr, expr, ...), which is not standard SQL syntax. The MySQL documentation also notes that in standard SQL you would have to concatenate all the expressions inside COUNT(DISTINCT ...).

A related use of tuples is performing IN queries such as:

SELECT * FROM DocumentOutputItems
WHERE (DocumentId, DocumentSessionId) in (('a', '1'), ('b', '2'));
忆沫 2024-08-12 21:33:29

How about something like:

select count(*)
from
  (select count(*) cnt
   from DocumentOutputItems
   group by DocumentId, DocumentSessionId) t1

It probably does the same thing as your existing query, but it avoids the DISTINCT.
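
A quick way to check that the GROUP BY form and the DISTINCT form agree is to run both on the same rows. The sketch below assumes SQLite (via Python's sqlite3 module) as a stand-in database and uses made-up sample data.

```python
import sqlite3

# Hypothetical sample rows with some repeated pairs.
rows = [(1, 10), (1, 10), (1, 11), (2, 10), (3, 12), (3, 12)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
conn.executemany("INSERT INTO DocumentOutputItems VALUES (?, ?)", rows)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


via_distinct = scalar(
    """
    SELECT COUNT(*)
    FROM (SELECT DISTINCT DocumentId, DocumentSessionId
          FROM DocumentOutputItems) AS internalQuery
    """
)

via_group_by = scalar(
    """
    SELECT COUNT(*)
    FROM (SELECT COUNT(*) AS cnt
          FROM DocumentOutputItems
          GROUP BY DocumentId, DocumentSessionId) AS t1
    """
)

print(via_distinct, via_group_by)  # 4 4
```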

清风疏影 2024-08-12 21:33:29

Here's a shorter version without the subselect:

SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems

It works fine in MySQL, and I think that the optimizer has an easier time understanding this one.

Edit: Apparently I misread MSSQL and MySQL - sorry about that, but maybe it helps anyway.

哥,最终变帅啦 2024-08-12 21:33:29

I have used this approach and it has worked for me.

SELECT COUNT(DISTINCT DocumentID || DocumentSessionId) 
FROM  DocumentOutputItems

In my case, it gives the correct result.

離人涙 2024-08-12 21:33:29

If you're working with datatypes of fixed length, you can cast to binary to do this very easily and very quickly. Assuming DocumentId and DocumentSessionId are both ints, and are therefore 4 bytes long...

SELECT COUNT(DISTINCT CAST(DocumentId as binary(4)) + CAST(DocumentSessionId as binary(4)))
FROM DocumentOutputItems

My specific problem required me to divide a SUM by the COUNT of the distinct combination of various foreign keys and a date field, grouping by another foreign key and occasionally filtering by certain values or keys. The table is very large, and using a sub-query dramatically increased the query time. And due to the complexity, statistics simply wasn't a viable option. The CHECKSUM solution was also far too slow in its conversion, particularly as a result of the various data types, and I couldn't risk its unreliability.

However, using the above solution had virtually no increase on the query time (comparing with using simply the SUM), and should be completely reliable! It should be able to help others in a similar situation so I'm posting it here.
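
The reason fixed-width encoding is safe can be illustrated outside the database. The sketch below is not the T-SQL cast itself; it uses Python's struct module to pack two 4-byte integers, on the assumption that this mirrors what concatenating two binary(4) values does. The helper name `pair_key` is hypothetical.

```python
import struct


def pair_key(document_id, session_id):
    """Pack two 32-bit signed ints into 8 bytes (hypothetical helper)."""
    return struct.pack(">ii", document_id, session_id)


# (1, 23) and (12, 3) are different pairs; (1, 23) appears twice.
pairs = [(1, 23), (12, 3), (1, 23)]

# Variable-width text concatenation maps both pairs to "123".
text_keys = {f"{a}{b}" for a, b in pairs}

# Fixed-width keys always split at byte 4, so different pairs never collide.
binary_keys = {pair_key(a, b) for a, b in pairs}

print(len(text_keys), len(binary_keys))  # 1 2
```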

乱了心跳 2024-08-12 21:33:29

There's nothing wrong with your query, but you could also do it this way:

WITH internalQuery (Amount)
AS
(
    SELECT (0)
      FROM DocumentOutputItems
  GROUP BY DocumentId, DocumentSessionId
)
SELECT COUNT(*) AS NumberOfDistinctRows
  FROM internalQuery
断肠人 2024-08-12 21:33:29

How about this,

Select DocumentId, DocumentSessionId, count(*) as c 
from DocumentOutputItems 
group by DocumentId, DocumentSessionId;

This returns one row for each combination of DocumentId and DocumentSessionId that occurs in the table, along with the number of rows that have that combination.

-残月青衣踏尘吟 2024-08-12 21:33:29

If you had only one field to "DISTINCT", you could use:

SELECT COUNT(DISTINCT DocumentId) 
FROM DocumentOutputItems

and that does return the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However you are using two fields so you could try something crazy like:

    SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId)) 
    FROM DocumentOutputItems

but you'll have issues if NULLs are involved. I'd just stick with the original query.
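
The NULL problem can be shown on a tiny table. This is a sketch assuming SQLite (through Python's sqlite3 module), where `||` returns NULL if either operand is NULL; string concatenation with `+` behaves the same way in SQL Server under its default settings. The sample rows are made up.

```python
import sqlite3

# Hypothetical rows: two of the three distinct pairs contain a NULL.
rows = [(1, None), (2, None), (1, 10)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
conn.executemany("INSERT INTO DocumentOutputItems VALUES (?, ?)", rows)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# The subquery form keeps pairs that contain NULL.
via_subquery = scalar(
    """
    SELECT COUNT(*)
    FROM (SELECT DISTINCT DocumentId, DocumentSessionId
          FROM DocumentOutputItems) AS internalQuery
    """
)

# The concatenated key is NULL whenever either column is NULL,
# and COUNT(DISTINCT ...) skips NULLs.
via_concat = scalar(
    """
    SELECT COUNT(DISTINCT CAST(DocumentId AS TEXT) || '|~|' ||
                          CAST(DocumentSessionId AS TEXT))
    FROM DocumentOutputItems
    """
)

print(via_subquery, via_concat)  # 3 1
```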

千仐 2024-08-12 21:33:29

I wish MS SQL could also do something like COUNT(DISTINCT A, B). But it can't.

At first JayTee's answer seemed like a solution to me, but after some tests CHECKSUM() failed to create unique values. A quick example: both CHECKSUM(31,467,519) and CHECKSUM(69,1120,823) give the same answer, which is 55.

Then I did some research and found that Microsoft does NOT recommend using CHECKSUM for change detection purposes. In some forums, people suggested using

SELECT COUNT(DISTINCT CHECKSUM(value1, value2, ..., valueN) + CHECKSUM(valueN, value(N-1), ..., value1))

but this is not comforting either.

You can use HASHBYTES() function as suggested in TSQL CHECKSUM conundrum. However this also has a small chance of not returning unique results.

I would suggest using

SELECT COUNT(DISTINCT CAST(DocumentId AS VARCHAR)+'-'+CAST(DocumentSessionId AS VARCHAR)) FROM DocumentOutputItems
白云悠悠 2024-08-12 21:33:29

I hope this works; I am writing it prima vista.

SELECT COUNT(*) 
FROM DocumentOutputItems 
GROUP BY DocumentId, DocumentSessionId
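
As a hedged check of what this query returns, the sketch below runs it against SQLite (via Python's sqlite3 module, an assumption; the sample rows are made up). There it yields one row per group rather than a single number, so the distinct count is the number of rows returned.

```python
import sqlite3

# Hypothetical sample rows: three distinct pairs, one of them repeated.
rows = [(1, 10), (1, 10), (1, 11), (2, 10)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
conn.executemany("INSERT INTO DocumentOutputItems VALUES (?, ?)", rows)

result = conn.execute(
    """
    SELECT COUNT(*)
    FROM DocumentOutputItems
    GROUP BY DocumentId, DocumentSessionId
    """
).fetchall()

# Each returned row holds the size of one group; sort because the
# order of groups is not specified.
group_sizes = sorted(size for (size,) in result)
distinct_pairs = len(result)

print(group_sizes, distinct_pairs)  # [1, 1, 2] 3
```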
脱离于你 2024-08-12 21:33:29

I found this question while Googling my own issue, and found that if you count DISTINCT objects, you get the correct number returned (I'm using MySQL):

SELECT COUNT(DISTINCT DocumentID) AS Count1, 
  COUNT(DISTINCT DocumentSessionId) AS Count2
  FROM DocumentOutputItems
記憶穿過時間隧道 2024-08-12 21:33:29

This works for me. In Oracle:

SELECT SUM(DECODE(COUNT(*),1,1,1))
FROM DocumentOutputItems GROUP BY DocumentId, DocumentSessionId;

In JPQL:

SELECT SUM(CASE WHEN COUNT(i)=1 THEN 1 ELSE 1 END)
FROM DocumentOutputItems i GROUP BY i.DocumentId, i.DocumentSessionId;
∝单色的世界 2024-08-12 21:33:29

I had a similar question, but my query was a sub-query with the comparison data in the main query. Something like:

Select code, id, title, name,
(select count(distinct col1) from mytable where code = a.code and length(title) >0)
from mytable a
group by code, id, title, name
--needs distinct over col2 as well as col1

Ignoring the complexities of this, I realized I couldn't get the value of a.code into the subquery with the double sub-query described in the original question:

Select count(1) from (select distinct col1, col2 from mytable where code = a.code...)
--this doesn't work because the sub-query doesn't know what "a" is

So eventually I figured out I could cheat, and combine the columns:

Select count(distinct(col1 || col2)) from mytable where code = a.code...

This is what ended up working
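
A runnable sketch of the correlated form follows. It assumes SQLite (via Python's sqlite3 module) and made-up rows, and it adds a '|' separator between the columns, which the query above does not have, to avoid the collision problem described in an earlier answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (code TEXT, col1 TEXT, col2 TEXT)")
# Hypothetical rows: code 'x' has two distinct (col1, col2) pairs, 'y' has one.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("x", "a", "1"), ("x", "a", "1"), ("x", "a", "2"), ("y", "b", "1")],
)

# The inner query refers to a.code from the outer query, so the
# distinct pairs are counted separately for each code.
result = conn.execute(
    """
    SELECT a.code,
           (SELECT COUNT(DISTINCT b.col1 || '|' || b.col2)
            FROM mytable b
            WHERE b.code = a.code) AS pair_count
    FROM mytable a
    GROUP BY a.code
    ORDER BY a.code
    """
).fetchall()

print(result)  # [('x', 2), ('y', 1)]
```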

ペ泪落弦音 2024-08-12 21:33:29

This query helps me (in MSSQL) find the unique count with more than one column.

SELECT COUNT(DISTINCT concat_column) AS unique_count
FROM (
  SELECT CONCAT(column1, '|', column2, '|', column3) AS concat_column
  FROM your_table_name_here
) AS subquery;
·深蓝 2024-08-12 21:33:29

This code applies DISTINCT to two columns and returns, for each distinct pair of values, the number of rows that have that pair. It worked like a charm for me in MySQL.

select DISTINCT DocumentId as i,  DocumentSessionId as s , count(*) 
from DocumentOutputItems   
group by i ,s;
春庭雪 2024-08-12 21:33:29

You can just use the Count Function Twice.

In this case, it would be:

SELECT COUNT (DISTINCT DocumentId), COUNT (DISTINCT DocumentSessionId) 
FROM DocumentOutputItems
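
Note that two separate counts are not the same thing as a count of distinct pairs. The sketch below compares them on made-up rows, assuming SQLite (via Python's sqlite3 module) as a stand-in database.

```python
import sqlite3

# Hypothetical rows: two ids and two session ids in all four combinations.
rows = [(1, 10), (1, 11), (2, 10), (2, 11)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentOutputItems (DocumentId INTEGER, DocumentSessionId INTEGER)"
)
conn.executemany("INSERT INTO DocumentOutputItems VALUES (?, ?)", rows)

# Counting each column on its own.
per_column = conn.execute(
    """
    SELECT COUNT(DISTINCT DocumentId), COUNT(DISTINCT DocumentSessionId)
    FROM DocumentOutputItems
    """
).fetchone()

# Counting distinct (DocumentId, DocumentSessionId) pairs, as in the question.
(pair_count,) = conn.execute(
    """
    SELECT COUNT(*)
    FROM (SELECT DISTINCT DocumentId, DocumentSessionId
          FROM DocumentOutputItems) AS internalQuery
    """
).fetchone()

print(per_column, pair_count)  # (2, 2) 4
```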