如何计算重复行?

发布于 2024-12-15 16:06:47 字数 464 浏览 1 评论 0原文

我希望能够从包含两个整数值的元组创建直方图。

这是查询:

 SELECT temp.ad_id, temp.distance  as hits FROM ( 
 'UNION ALL .join(cupound_query)' # python
) as temp GROUP BY temp.ad_id,temp.distance 

对于此输入:

(51, 5)
(51, 0)
(51, 3)
(51, 0)
(88, 2)
(88, 2)
(88, 2)
(84, 1)
(81, 9)

将是:

(88,2) : 3
(51,0) : 2
(51,3) : 1
(51,5) : 1
(84,1) : 1
(81,9) : 1

如何创建这些值的直方图?
换句话说,我如何计算一行中有多少次重复?

I want to be able to create a histogram out of a tuple containing two integers values.

Here it is the query:

 SELECT temp.ad_id, temp.distance  as hits FROM ( 
 'UNION ALL .join(cupound_query)' # python
) as temp GROUP BY temp.ad_id,temp.distance 

For this input:

(51, 5)
(51, 0)
(51, 3)
(51, 0)
(88, 2)
(88, 2)
(88, 2)
(84, 1)
(81, 9)

Would be:

(88,2) : 3
(51,0) : 2
(51,3) : 1
(51,5) : 1
(84,1) : 1
(81,9) : 1

How can I create a histogram of those values?
In other words, how can I count how many times a row has a duplicate?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

半寸时光 2024-12-22 16:06:48

这个问题留下了解释的空间。此测试用例显示 2 个嵌套步骤:

CREATE TABLE tbl (ad_id int, distance int);
INSERT INTO tbl VALUES
  (510, 0), (956, 3), (823, 3), (880, 2)
, (523, 3), (467, 0), (843, 1), (816, 9)
, (533, 4), (721, 7), (288, 3), (900, 3)
, (526, 9), (750, 7), (302, 8), (463, 6)
, (742, 8), (804, 2), (62,  7), (880, 2)
, (523, 3), (467, 0), (843, 1), (816, 9)
, (533, 4), (721, 7), (288, 3), (900, 3)
, (526, 9), (750, 7), (302, 8), (816, 9)
, (533, 4), (721, 7), (288, 3), (900, 3)
, (533, 4), (721, 7), (288, 3), (396, 5)
;

每个值有多少个重复项?

SELECT ad_id, count(*) AS ct FROM tbl GROUP BY 1;

结果:

ad_id  | ct
-------+----
62     | 1
288    | 4
302    | 2
396    | 1
...

读取:ad_id 62 存在 1x,ad_id 288 存在 4x,...

如何计算行有重复项的次数?

SELECT ct, count(*) AS ct_ct
FROM  (SELECT ad_id, count(*) AS ct FROM tbl GROUP BY 1) sub
GROUP  BY 1
ORDER  BY 1;

结果:

 ct | ct_ct
----+-------
1   | 8
2   | 7
3   | 2
4   | 3

读取:出现 8 次“ad_id 是唯一的”,出现 7 次“2 行具有相同 ad_id”,...

>db>>小提琴此处

The question leaves room for interpretation. This test case shows 2 nested steps:

CREATE TABLE tbl (ad_id int, distance int);
INSERT INTO tbl VALUES
  (510, 0), (956, 3), (823, 3), (880, 2)
, (523, 3), (467, 0), (843, 1), (816, 9)
, (533, 4), (721, 7), (288, 3), (900, 3)
, (526, 9), (750, 7), (302, 8), (463, 6)
, (742, 8), (804, 2), (62,  7), (880, 2)
, (523, 3), (467, 0), (843, 1), (816, 9)
, (533, 4), (721, 7), (288, 3), (900, 3)
, (526, 9), (750, 7), (302, 8), (816, 9)
, (533, 4), (721, 7), (288, 3), (900, 3)
, (533, 4), (721, 7), (288, 3), (396, 5)
;

How many duplicates per value?

SELECT ad_id, count(*) AS ct FROM tbl GROUP BY 1;

Result:

ad_id  | ct
-------+----
62     | 1
288    | 4
302    | 2
396    | 1
...

Read: ad_id 62 exists 1x, ad_id 288 exists 4x, ...

How to count how many times rows have duplicates?

SELECT ct, count(*) AS ct_ct
FROM  (SELECT ad_id, count(*) AS ct FROM tbl GROUP BY 1) sub
GROUP  BY 1
ORDER  BY 1;

Result:

 ct | ct_ct
----+-------
1   | 8
2   | 7
3   | 2
4   | 3

Read: 8 occurrences of "ad_id is unique", 7 occurrences of "2 rows with same ad_id", ...

db<>fiddle here

脱离于你 2024-12-22 16:06:48

只需将 count(*) 添加到您的选择中即可:

SELECT temp.ad_id, temp.distance as hits, count(*)
....

Just add count(*) to your select:

SELECT temp.ad_id, temp.distance as hits, count(*)
....
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文