How to count the occurrences of all combinations in SQL?

Posted on 2024-10-01 09:24:21 · 591 characters · 4 views · 0 comments

Is there any way to get counts of all element combinations in a single SQL query, without using temp tables or procedures?

Consider these three tables:

  • products (id, product_name)

  • transactions (id, date)

  • transaction_has_product (id, product_id, transaction_id)

Sample data

  • products

    id  product_name
    1   AAA
    2   BBB
    3   CCC

  • transactions

    id  date
    1   some_date
    2   some_date

  • transaction_has_products

    id  product_id  transaction_id
    1   1           1
    2   2           1
    3   3           1
    4   1           2
    5   2           2

Result should be:

AAA, BBB = 2   
AAA, CCC = 1   
BBB, CCC = 1   
AAA, BBB, CCC = 1
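To make the expected result concrete, here is a brute-force sketch in Python (not SQL, and not from the original post): every transaction adds one to each combination of two or more of its products. It hard-codes the sample rows above and enumerates every subset of each basket, so it only suits small baskets.

```python
from collections import Counter
from itertools import combinations

# Sample data from the question: products (id -> name) and
# transaction_has_products rows as (id, product_id, transaction_id).
products = {1: "AAA", 2: "BBB", 3: "CCC"}
links = [(1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2)]

# Collect the set of product names bought in each transaction.
baskets = {}
for _id, product_id, transaction_id in links:
    baskets.setdefault(transaction_id, set()).add(products[product_id])

# Each basket contributes one count to every combination of size >= 2 it contains.
counts = Counter()
for names in baskets.values():
    ordered = sorted(names)
    for size in range(2, len(ordered) + 1):
        for combo in combinations(ordered, size):
            counts[combo] += 1

# Print pairs first, then larger combinations, in name order.
for combo, n in sorted(counts.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(", ".join(combo), "=", n)
```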

Comments (4)

梦途 2024-10-08 09:24:21

Not easily, because the last row has a different number of matched products from the other rows. You might be able to do it with some sort of GROUP_CONCAT() operator (available in MySQL; implementable in other DBMSs such as Informix, and PostgreSQL offers string_agg()), but I'm not confident of that.

Pairwise Matching

SELECT p1.product_name AS name1, p2.product_name AS name2, COUNT(*)
  FROM (SELECT p.product_name, h.transaction_id
          FROM products AS p
          JOIN transaction_has_products AS h ON h.product_id = p.id
       ) AS p1
  JOIN (SELECT p.product_name, h.transaction_id
          FROM products AS p
          JOIN transaction_has_products AS h ON h.product_id = p.id
       ) AS p2
    ON p1.transaction_id = p2.transaction_id
   -- "<" keeps each unordered pair once and excludes a product paired with itself
   AND p1.product_name   < p2.product_name
 GROUP BY p1.product_name, p2.product_name;

Handling the triple match is non-trivial; extending it further than that is definitely rather difficult.
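A minimal sketch, assuming SQLite through Python's sqlite3 module rather than MySQL: it loads the sample data into an in-memory database, runs the pairwise query with the repeated derived table written once as a CTE, and adds a third self-join as one way to handle the triple match. It assumes each product appears at most once per transaction (otherwise COUNT(*) over-counts), and every further combination size needs yet another self-join, which is why this approach does not scale.

```python
import sqlite3

# In-memory SQLite copy of the question's sample data (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE transaction_has_products (
    id INTEGER PRIMARY KEY, product_id INTEGER, transaction_id INTEGER);
INSERT INTO products VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC');
INSERT INTO transaction_has_products VALUES
    (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2);
""")

# Shared CTE replacing the repeated derived table: one row per (name, transaction).
TP = """
WITH tp AS (
    SELECT p.product_name, h.transaction_id
    FROM products AS p
    JOIN transaction_has_products AS h ON h.product_id = p.id
)
"""

# Pairwise matching: "<" keeps each unordered pair once and drops self-pairs.
pairs = conn.execute(TP + """
SELECT p1.product_name, p2.product_name, COUNT(*)
FROM tp AS p1
JOIN tp AS p2 ON p2.transaction_id = p1.transaction_id
             AND p1.product_name < p2.product_name
GROUP BY p1.product_name, p2.product_name
ORDER BY p1.product_name, p2.product_name
""").fetchall()

# Triple matching: one more self-join, again ordered by name.
triples = conn.execute(TP + """
SELECT p1.product_name, p2.product_name, p3.product_name, COUNT(*)
FROM tp AS p1
JOIN tp AS p2 ON p2.transaction_id = p1.transaction_id
             AND p1.product_name < p2.product_name
JOIN tp AS p3 ON p3.transaction_id = p1.transaction_id
             AND p2.product_name < p3.product_name
GROUP BY p1.product_name, p2.product_name, p3.product_name
""").fetchall()

print(pairs)
print(triples)
```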

╄→承喏 2024-10-08 09:24:21

If you know what all the products will be up front, you could do it by pivoting the data like this.

If you don't know what the products will be up front, you could build this query dynamically in a stored procedure. The practicality of either approach breaks down when the number of products is large (n products give 2^n - n - 1 combinations of two or more), but I think that would be true no matter how this requirement is met.

select
    product_combination, 
    case product_combination
        when 'AAA, BBB' then aaa_bbb
        when 'AAA, CCC' then aaa_ccc
        when 'BBB, CCC' then bbb_ccc
        when 'AAA, BBB, CCC' then aaa_bbb_ccc
    end as number_of_transactions
from
(
    select 'AAA, BBB' as product_combination union all
    select 'AAA, CCC' union all
    select 'BBB, CCC' union all
    select 'AAA, BBB, CCC'
) as combination_list
cross join
(
    select
        -- "> 0" rather than "= 1", in case a product is listed twice in one transaction
        sum(case when aaa > 0 and bbb > 0 then 1 else 0 end) as aaa_bbb,
        sum(case when aaa > 0 and ccc > 0 then 1 else 0 end) as aaa_ccc,
        sum(case when bbb > 0 and ccc > 0 then 1 else 0 end) as bbb_ccc,
        sum(case when aaa > 0 and bbb > 0 and ccc > 0 then 1 else 0 end) as aaa_bbb_ccc
    from
    (
        select
            count(case when a.product_name = 'AAA' then 1 else null end) as aaa,
            count(case when a.product_name = 'BBB' then 1 else null end) as bbb,
            count(case when a.product_name = 'CCC' then 1 else null end) as ccc,
            b.transaction_id
        from
            products a
        inner join
            transaction_has_products b
        on
            a.id = b.product_id
        group by
            b.transaction_id
    ) as product_matrix
) as combination_counts

Result:

product_combination  number_of_transactions
AAA, BBB             2
AAA, CCC             1
BBB, CCC             1
AAA, BBB, CCC        1
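The answer suggests building the query dynamically when the products aren't known up front. The sketch below does that in Python application code instead of a stored procedure (a swapped-in technique, not the answer's own), running against an in-memory SQLite copy of the sample data. `combination_counts` is a hypothetical helper: it emits one flag column per product and one sum per combination of two or more products, and binds product names as parameters. With n products it generates 2^n - n - 1 sum columns, so it only suits short product lists.

```python
import sqlite3
from itertools import combinations

# In-memory SQLite copy of the question's sample data (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE transaction_has_products (
    id INTEGER PRIMARY KEY, product_id INTEGER, transaction_id INTEGER);
INSERT INTO products VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC');
INSERT INTO transaction_has_products VALUES
    (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2);
""")


def combination_counts(conn, names):
    """Hypothetical helper: build and run the pivot query for `names`."""
    # One flag column per product: how often it occurs in each transaction.
    flags = ",\n".join(
        f"count(case when a.product_name = ? then 1 end) as f{i}"
        for i in range(len(names))
    )
    # Every combination (as index tuples) of two or more products.
    combos = [
        combo
        for size in range(2, len(names) + 1)
        for combo in combinations(range(len(names)), size)
    ]
    # One sum per combination: transactions containing all of its products.
    sums = ",\n".join(
        "sum(case when " + " and ".join(f"f{i} > 0" for i in combo)
        + f" then 1 else 0 end) as c{k}"
        for k, combo in enumerate(combos)
    )
    sql = f"""
        select {sums}
        from (
            select {flags}, b.transaction_id
            from products a
            join transaction_has_products b on a.id = b.product_id
            group by b.transaction_id
        ) as product_matrix
    """
    row = conn.execute(sql, names).fetchone()
    # Label each sum with its product names, e.g. "AAA, BBB".
    return {", ".join(names[i] for i in combo): n for combo, n in zip(combos, row)}


result = combination_counts(conn, ["AAA", "BBB", "CCC"])
for label, n in result.items():
    print(label, n)
```

Returning a single row and labelling it in Python takes the place of the answer's CROSS JOIN against a hand-written list of combination names.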

再见回来 2024-10-08 09:24:21

Depending on how much control you have over the query, you could do the following (this is T-SQL; it might have to change for PostgreSQL):

SELECT COUNT(*) FROM transactions t WHERE
(
     SELECT COUNT(DISTINCT tp.product_id) 
     FROM transaction_has_products tp 
     WHERE tp.[transaction_id] = t.id and tp.product_id IN (1, 2, 3)
) = 3

where (1, 2, 3) is the list of IDs you want to check for, and 3 is the number of entries in that list. Each query counts a single combination.
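A minimal sketch of the same idea run through Python's sqlite3 module on an in-memory copy of the sample data (SQLite accepts the query without the T-SQL brackets). The hypothetical helper `transactions_containing` builds the IN list from placeholders and compares against the number of distinct IDs, so the "= 3" part can't drift from the list; covering every combination still means one query per combination.

```python
import sqlite3

# In-memory SQLite copy of the question's sample data (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER PRIMARY KEY, date TEXT);
CREATE TABLE transaction_has_products (
    id INTEGER PRIMARY KEY, product_id INTEGER, transaction_id INTEGER);
INSERT INTO transactions VALUES (1, 'some_date'), (2, 'some_date');
INSERT INTO transaction_has_products VALUES
    (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2);
""")


def transactions_containing(conn, product_ids):
    """Hypothetical helper: count transactions containing every id in product_ids."""
    ids = sorted(set(product_ids))  # de-duplicate so the count check stays correct
    placeholders = ", ".join("?" for _ in ids)
    sql = f"""
        SELECT COUNT(*) FROM transactions t WHERE (
            SELECT COUNT(DISTINCT tp.product_id)
            FROM transaction_has_products tp
            WHERE tp.transaction_id = t.id
              AND tp.product_id IN ({placeholders})
        ) = ?
    """
    return conn.execute(sql, [*ids, len(ids)]).fetchone()[0]


# One query per combination of product ids.
for combo in [(1, 2), (1, 3), (2, 3), (1, 2, 3)]:
    print(combo, transactions_containing(conn, combo))
```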

何必那么矫情 2024-10-08 09:24:21

  1. Generate all possible combinations. I based this on https://stackoverflow.com/a/9135162/2244766 (it's a little tricky and I don't fully understand the logic... but it works!)
  2. Make a subquery that aggregates transaction_has_products into an array of products per transaction_id.
  3. Join the two using the array containment operator (@>).

Following the steps above, you get something like this (PostgreSQL):

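with recursive
x1 as (
    --select id as a from products
    select distinct product_id as a from transaction_has_products
),
-- b = combination (array of ids), c = last id appended, d = combination size.
-- Each step appends an id smaller than the last one, so every set appears once.
all_combis as (
    select array[a] as b, a as c, 1 as d
    from x1
    union all
    select b || a, a, d + 1
    from x1
    join all_combis on (a < c)
),
grouped_transactions as (
    select array_agg(product_id) as products
    from transaction_has_products
    group by transaction_id
)
-- count(column) rather than count(*): a combination with no matching transaction
-- only gets the LEFT JOIN's NULL row, so it is counted as 0 instead of 1
select all_combis.b, count(grouped_transactions.products)
from all_combis
left join grouped_transactions on grouped_transactions.products @> all_combis.b
--where array_upper(b, 1) > 1 -- or whatever
group by all_combis.b
order by array_upper(b, 1) desc, count(grouped_transactions.products) desc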

You can join your products table to replace the product IDs with their names, but I guess you'll get it from here. Uncomment the WHERE line to drop the single-product "combinations", which the question doesn't ask for.
Here's the fiddle (SQL Fiddle is having a bad day today, so try this on your own database if it throws a weird error such as a timeout).
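For readers less familiar with PostgreSQL arrays, here is the same three-step idea sketched in plain Python, with sets standing in for arrays and `>=` (superset) standing in for `@>`. It is an illustration on the sample rows, not the answer's code; like the recursive CTE it also produces single-product combinations, and a combination with no matching transaction gets 0.

```python
from itertools import combinations

# Sample transaction_has_products rows as (id, product_id, transaction_id).
links = [(1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 2), (5, 2, 2)]

# Step 1: every combination of the distinct product ids, sizes 1..n
# (the recursive CTE also emits single-product "combinations").
product_ids = sorted({product_id for _id, product_id, _tx in links})
all_combis = [
    frozenset(combo)
    for size in range(1, len(product_ids) + 1)
    for combo in combinations(product_ids, size)
]

# Step 2: aggregate the products of each transaction (like array_agg ... GROUP BY).
grouped = {}
for _id, product_id, transaction_id in links:
    grouped.setdefault(transaction_id, set()).add(product_id)

# Step 3: "join" on containment (like products @> combination) and count matches.
counts = {
    combo: sum(1 for products in grouped.values() if products >= combo)
    for combo in all_combis
}

# Largest combinations first, then by count, as in the SQL's ORDER BY.
for combo, n in sorted(counts.items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    print(sorted(combo), n)
```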

GL, HF :D
