获取每个类别的前 10 名产品

发布于 2024-10-16 20:00:56 字数 1642 浏览 10 评论 0原文

我有一个类似这样的查询

SELECT 
t.category, 
tc.product, 
tc.sub-product,
 count(*) as sales 
 FROM tg t, ttc tc
 WHERE t.value = tc.value
 GROUP BY t.category, tc.product, tc.sub-product;

现在在我的查询中我想获得每个类别的前 10 个产品（按销售额排名最高），对于每个类别我需要前 5 个子类别（按销售额排名最高）

您可以假设问题陈述为像这样：

获取每个类别的销售额排名前 10 的产品，以及每个产品获取销售额排名前 5 的子产品。

这里的类别可以是书籍
产品可以是 Harry Porter 图书
子产品可以是 HarryPorter 系列 5

示例输入数据格式

category |product |subproduct |Sales [count (*)]

abc   test1    test11     120

abc   test1    test11     100

abc   test1    test11     10

abc   test1    test11     10

abc   test1    test11     10

abc   test1    test11     10

abc   test1    test12     10

abc   test1    test13     8

abc   test1    test14     6

abc   test1    test15     5

abc   test2    test21     80

abc   test2    test22     60

abc   test3    test31     50

abc   test3    test32     40

abc   test4    test41     30

abc   test4    test42     20

abc   test5    test51     10

abc   test5    test52     5 

abc   test6    test61     5 

|

|

|

bcd   test2    test22     10 

xyz   test3    test31     5 

xyz   test3    test32     3 

xyz   test4    test41     2

输出将是“

top 5 rf for (abc) -> abc,test1(289) abc,test2 (140), abc test3 (90), abc test4(50) , abc test5 (15)

top 5 rfm for (abc,test1) -> test11(260),test12(10),test13(8),test14(6),test15(5) and so on

我的查询失败，因为结果真的很大。我正在阅读有关 Oracle 分析函数（如排名）的内容。有人可以帮我修改吗？” 也可以工作。

使用分析函数的任何其他方法 .com/node/55。但是无法获得正确的 sql 查询。

任何帮助将不胜感激。我在这个问题上被困了 2 天:(

原文

I have a query which is something like this

SELECT 
t.category, 
tc.product, 
tc.sub-product,
 count(*) as sales 
 FROM tg t, ttc tc
 WHERE t.value = tc.value
 GROUP BY t.category, tc.product, tc.sub-product;

Now in my query I want to get top 10 products for every category (top by sales ) and for every category I need top 5 sub category (top by sales)

You can assume the problem statement as something like this :

Get top 10 products for each category by sales and for each product get top 5 sub-products by sales .

Here category can be Books
Product can be Harry Porter book
sub productcan be HarryPorter series 5

Sample input data format

category |product |subproduct |Sales [count (*)]

abc   test1    test11     120

abc   test1    test11     100

abc   test1    test11     10

abc   test1    test11     10

abc   test1    test11     10

abc   test1    test11     10

abc   test1    test12     10

abc   test1    test13     8

abc   test1    test14     6

abc   test1    test15     5

abc   test2    test21     80

abc   test2    test22     60

abc   test3    test31     50

abc   test3    test32     40

abc   test4    test41     30

abc   test4    test42     20

abc   test5    test51     10

abc   test5    test52     5 

abc   test6    test61     5 

|

|

|

bcd   test2    test22     10 

xyz   test3    test31     5 

xyz   test3    test32     3 

xyz   test4    test41     2

Output would be "

top 5 rf for (abc) -> abc,test1(289) abc,test2 (140), abc test3 (90), abc test4(50) , abc test5 (15)

top 5 rfm for (abc,test1) -> test11(260),test12(10),test13(8),test14(6),test15(5) and so on

My query is failing because results are really huge . I am reading about oracle analytic functions like rank. Can someone help me modifying this query using analytical functions. Any other approach can also work.

I am referring to this http://www.orafaq.com/node/55. But unable to get a right sql query for this.

Any help would be appreciated..I am like stuck for 2 days on this :(

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

与往事干杯 2024-10-23 20:00:56

可能有一些原因不使用分析函数，而是单独使用分析函数：

select am, rf, rfm, rownum_rf2, rownum_rfm
from
(
    -- the 3nd level takes the subproduct ranks, and for each equally ranked
    -- subproduct, it produces the product ranking
    select am, rf, rfm, rownum_rfm,
      row_number() over (partition by rownum_rfm order by rownum_rf) rownum_rf2
    from
    (
        -- the 2nd level ranks (without ties) the products within
        -- categories, and subproducts within products simultaneosly
        select am, rf, rfm,
          row_number() over (partition by am order by count_rf desc) rownum_rf,
          row_number() over (partition by am, rf order by count_rfm desc) rownum_rfm
        from
        (
            -- inner most query counts the records by subproduct
            -- using regular group-by. at the same time, it uses
            -- the analytical sum() over to get the counts by product
            select tg.am, ttc.rf, ttc.rfm,
              count(*) count_rfm,
              sum(count(*)) over (partition by tg.am, ttc.rf) count_rf
            from tg inner join ttc on tg.value = ttc.value
            group by tg.am, ttc.rf, ttc.rfm
        ) X
    ) Y
    -- at level 3, we drop all but the top 5 subproducts per product
    where rownum_rfm <= 5   -- top  5 subproducts
) Z
-- the filter on the final query retains only the top 10 products
where rownum_rf2 <= 10  -- top 10 products
order by am, rownum_rf2, rownum_rfm;

我使用 rownum 而不是rank，这样你就不会得到平局，或者换句话说，平局将是随机决定的。如果数据不够密集（前 10 个产品中的任何一个产品中的子产品少于 5 个 - 它可能会显示其他产品的子产品），这也不起作用。但如果数据密集（大型数据库），查询应该可以正常工作。

The below makes two passes of the data, but returns correct results in each case. Again, this is a rank-without-ties query.

select am, rf, rfm, count_rf, count_rfm, rownum_rf, rownum_rfm
from
(
    -- next join the top 10 products to the data again to get
    -- the subproduct counts
    select tg.am, tg.rf, ttc.rfm, tg.count_rf, tg.rownum_rf, count(*) count_rfm,
        ROW_NUMBER() over (partition by tg.am, tg.rf order by 1 desc) rownum_rfm
    from (
        -- first rank all the products
        select tg.am, tg.value, ttc.rf, count(*) count_rf,
            ROW_NUMBER() over (order by 1 desc) rownum_rf
        from tg
        inner join ttc on tg.value = ttc.value
        group by tg.am, tg.value, ttc.rf
        order by count_rf desc
        ) tg
    inner join ttc on tg.value = ttc.value and tg.rf = ttc.rf
    -- filter the inner query for the top 10 products only
    where rownum_rf <= 10
    group by tg.am, tg.rf, ttc.rfm, tg.count_rf, tg.rownum_rf
) X
-- filter where the subproduct rank is in top 5
where rownum_rfm <= 5
order by am, rownum_rf, rownum_rfm;

列：

count_rf : count of sales by product
count_rfm : count of sales by subproduct
rownum_rf : product rank within category (rownumber - without ties)
rownum_rfm : subproduct rank within product (without ties)

There are probably reasons not to use analytical functions, but using analytical functions alone:

select am, rf, rfm, rownum_rf2, rownum_rfm
from
(
    -- the 3nd level takes the subproduct ranks, and for each equally ranked
    -- subproduct, it produces the product ranking
    select am, rf, rfm, rownum_rfm,
      row_number() over (partition by rownum_rfm order by rownum_rf) rownum_rf2
    from
    (
        -- the 2nd level ranks (without ties) the products within
        -- categories, and subproducts within products simultaneosly
        select am, rf, rfm,
          row_number() over (partition by am order by count_rf desc) rownum_rf,
          row_number() over (partition by am, rf order by count_rfm desc) rownum_rfm
        from
        (
            -- inner most query counts the records by subproduct
            -- using regular group-by. at the same time, it uses
            -- the analytical sum() over to get the counts by product
            select tg.am, ttc.rf, ttc.rfm,
              count(*) count_rfm,
              sum(count(*)) over (partition by tg.am, ttc.rf) count_rf
            from tg inner join ttc on tg.value = ttc.value
            group by tg.am, ttc.rf, ttc.rfm
        ) X
    ) Y
    -- at level 3, we drop all but the top 5 subproducts per product
    where rownum_rfm <= 5   -- top  5 subproducts
) Z
-- the filter on the final query retains only the top 10 products
where rownum_rf2 <= 10  -- top 10 products
order by am, rownum_rf2, rownum_rfm;

I used rownum instead of rank so you don't ever get ties, or in other words, ties will be randomly decided. This also doesn't work if the data is not dense enough (less than 5 subproducts in any of the top 10 products - it may show subproducts from some other products instead). But if the data is dense (large established database), the query should work fine.

The below makes two passes of the data, but returns correct results in each case. Again, this is a rank-without-ties query.

select am, rf, rfm, count_rf, count_rfm, rownum_rf, rownum_rfm
from
(
    -- next join the top 10 products to the data again to get
    -- the subproduct counts
    select tg.am, tg.rf, ttc.rfm, tg.count_rf, tg.rownum_rf, count(*) count_rfm,
        ROW_NUMBER() over (partition by tg.am, tg.rf order by 1 desc) rownum_rfm
    from (
        -- first rank all the products
        select tg.am, tg.value, ttc.rf, count(*) count_rf,
            ROW_NUMBER() over (order by 1 desc) rownum_rf
        from tg
        inner join ttc on tg.value = ttc.value
        group by tg.am, tg.value, ttc.rf
        order by count_rf desc
        ) tg
    inner join ttc on tg.value = ttc.value and tg.rf = ttc.rf
    -- filter the inner query for the top 10 products only
    where rownum_rf <= 10
    group by tg.am, tg.rf, ttc.rfm, tg.count_rf, tg.rownum_rf
) X
-- filter where the subproduct rank is in top 5
where rownum_rfm <= 5
order by am, rownum_rf, rownum_rfm;

columns:

count_rf : count of sales by product
count_rfm : count of sales by subproduct
rownum_rf : product rank within category (rownumber - without ties)
rownum_rfm : subproduct rank within product (without ties)

回复收藏 0 原文

临走之时 2024-10-23 20:00:56

这是猜测，但您可能可以从以下内容开始：

drop table category_sales;

一些测试数据：

create table category_sales (
   category    varchar2(14),
   product     varchar2(14),
   subproduct  varchar2(14),
   sales       number
);

begin

  for cate in 1 .. 10 loop
  for prod in 1 .. 20 loop
  for subp in 1 .. 30 loop

      insert into category_sales values (
             'Cat '  || cate,
             'Prod ' || cate||prod,
             'Subp ' || cate||prod||subp,
              trunc(dbms_random.value(1,30 + cate - prod + subp))
      );

  end loop; end loop; end loop;

end;
/

实际查询：

select * from (
  select 
    category,
    product,
    subproduct,
    sales,
    category_sales,
    product_sales,
    top_subproduct,
    -- Finding best products within category:
    dense_rank () over (
      partition by category
      order     by product_sales desc
    ) top_product 
  from (
    select 
      -- Finding the best Subproducts within
      -- category and product:
      dense_rank () over (
         partition by category, 
                      product 
         order     by sales desc
      )                             top_subproduct,
      -- Finding the sum(sales) within a 
      -- category and prodcut
      sum(sales) over (
         partition by category, 
                      product
      )                             product_sales,
      -- Finding the sum(sales) within 
      -- category
      sum(sales) over (
         partition by category
      )                             category_sales,
      category,
      product,
      subproduct,
      sales
    from
      category_sales
  )
)
where 
--    Only best 10 Products
      top_product       <= 10 and
--    Only best 5 subproducts:
      top_subproduct    <= 5
-- "Best" categories first:
order by 
      category_sales desc,
      top_product    desc,
      top_subproduct desc;

在该查询中，category_sales 列返回该类别的销售额总和它被返回到谁的记录中。这意味着，同一类别的每条记录都具有相同的 category_sales。需要此列来首先对结果集进行排序，其中最好的（销售）类别（order by ...category_sales desc）。

同样，product_sales 是类别-产品组合的销售额总和。此列用于查找每个类别中最好的n（此处：10） 个产品（其中 top_product <= 10）。

top_product 列是使用 dense_rank() over... 分析函数“创建”的。对于类别中最好的产品，它是 1，对于第二好的产品，它是 2，依此类推（因此 where top_product <= 10。

top_suproduct 列的计算公式为类似的方式如 top_product （即使用 dense_rank）。

It's guesswork, but you could probably start from something like this:

drop table category_sales;

Some test data:

create table category_sales (
   category    varchar2(14),
   product     varchar2(14),
   subproduct  varchar2(14),
   sales       number
);

begin

  for cate in 1 .. 10 loop
  for prod in 1 .. 20 loop
  for subp in 1 .. 30 loop

      insert into category_sales values (
             'Cat '  || cate,
             'Prod ' || cate||prod,
             'Subp ' || cate||prod||subp,
              trunc(dbms_random.value(1,30 + cate - prod + subp))
      );

  end loop; end loop; end loop;

end;
/

The actual query:

select * from (
  select 
    category,
    product,
    subproduct,
    sales,
    category_sales,
    product_sales,
    top_subproduct,
    -- Finding best products within category:
    dense_rank () over (
      partition by category
      order     by product_sales desc
    ) top_product 
  from (
    select 
      -- Finding the best Subproducts within
      -- category and product:
      dense_rank () over (
         partition by category, 
                      product 
         order     by sales desc
      )                             top_subproduct,
      -- Finding the sum(sales) within a 
      -- category and prodcut
      sum(sales) over (
         partition by category, 
                      product
      )                             product_sales,
      -- Finding the sum(sales) within 
      -- category
      sum(sales) over (
         partition by category
      )                             category_sales,
      category,
      product,
      subproduct,
      sales
    from
      category_sales
  )
)
where 
--    Only best 10 Products
      top_product       <= 10 and
--    Only best 5 subproducts:
      top_subproduct    <= 5
-- "Best" categories first:
order by 
      category_sales desc,
      top_product    desc,
      top_subproduct desc;

In that query, the column category_sales returns the sum of sales of the category in whose record it is returned. That means, every record of the same category has the same category_sales. This column is needed to order the result set with the best (sales) categories first (order by ... category_sales desc).

Similarly, product_sales is the sum of sales for a category-product combination. This column is used to find the best n (here:10) products in each category (where top_product <= 10).

The column top_product is "created" with the dense_rank() over... analytical function. For the best product in a category,it's 1, for the second best it's 2 and so on (hence the where top_product <= 10.

The columntop_suproduct is calculated in a similar fashion like top_product (that is with dense_rank).

回复收藏 0 原文

~没有更多了~