具有多个表连接的每组前 N 个

发布于 2024-10-22 08:09:43 字数 1281 浏览 4 评论 0原文

根据我的研究,这是一个非常常见的问题,通常有一个相当简单的解决方案。我的任务是将几个查询从获取所有结果更改为获取每组前3个。起初进展顺利,我使用了该网站的一些建议和答案来实现这一目标(浏览次数最多的产品)。然而,由于多次加入,我在最后一个“最畅销产品”方面遇到了困难。

基本上,我需要按每个产品的最高销售额(其中每个供应商的最大产品数为 3)按顺序获取所有产品 我已经连接了多个表来创建原始查询,每次我尝试使用变量生成排名会产生无效结果。以下内容应该有助于更好地理解该问题(为简洁起见,我删除了不必要的字段):

产品表

productid | vendorid | approved | active | deleted

供应商表 订单

vendorid | approved | active | deleted

orderid | `status` | deleted

订单项目表

orderitemid | orderid | productid | price

现在,我获取所有结果的原始查询如下:

SELECT COUNT(oi.price) AS `NumSales`, 
       p.productid, 
       p.vendorid
FROM products p
INNER JOIN vendors v ON (p.vendorid = v.vendorid)
INNER JOIN orders_items oi ON (p.productid = oi.productid)
INNER JOIN orders o ON (oi.orderid = o.orderid)
WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
AND o.`Status` = 'SETTLED'
AND o.Deleted = 0
GROUP BY oi.productid
ORDER BY COUNT(oi.price) DESC
LIMIT 100;

最后, (这就是我被难住的地方),我试图改变上述声明,以便我只收到每个供应商的前 3 个产品(按销售数量计算)。我想添加到目前为止我所拥有的内容,但我很尴尬这样做,而且这个问题已经是一堵文字墙了。我尝试过变量但不断得到无效结果。任何帮助将不胜感激。

Based on my research, this is a very common problem which generally has a fairly simple solution. My task is to alter several queries from get all results into get top 3 per group. At first this was going well and I used several recommendations and answers from this site to achieve this (Most Viewed Products). However, I'm running into difficulty with my last one "Best Selling Products" because of multiple joins.

Basically, I need to get all products in order by # highest sales per product in which the maximum products per vendor is 3 I've got multiple tables being joined to create the original query, and each time I attempt to use the variables to generate rankings it produces invalid results. The following should help better understand the issue (I've removed unnecessary fields for brevity):

Product Table

productid | vendorid | approved | active | deleted

Vendor Table

vendorid | approved | active | deleted

Order Table

orderid | `status` | deleted

Order Items Table

orderitemid | orderid | productid | price

Now, my original query to get all results is as follows:

SELECT COUNT(oi.price) AS `NumSales`, 
       p.productid, 
       p.vendorid
FROM products p
INNER JOIN vendors v ON (p.vendorid = v.vendorid)
INNER JOIN orders_items oi ON (p.productid = oi.productid)
INNER JOIN orders o ON (oi.orderid = o.orderid)
WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
AND o.`Status` = 'SETTLED'
AND o.Deleted = 0
GROUP BY oi.productid
ORDER BY COUNT(oi.price) DESC
LIMIT 100;

Finally, (and here's where I'm stumped), I'm trying to alter the above statement such that I received only the top 3 product (by # sold) per vendor. I'd add what I have so far, but I'm embarrassed to do so and this question is already a wall of text. I've tried variables but keep getting invalid results. Any help would be greatly appreciated.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

烂柯人 2024-10-29 08:09:43

即使您指定 LIMIT 100,这种类型的查询也需要进行完整扫描并构建表,然后检查每条记录并对行进行编号,最后过滤出您要显示的 100 条记录。

select
    vendorid, productid, NumSales
from
(
    select
        vendorid, productid, NumSales,
        @r := IF(@g=vendorid,@r+1,1) RowNum,
        @g := vendorid
    from (select @g:=null) initvars
    CROSS JOIN 
    (
        SELECT COUNT(oi.price) AS NumSales, 
               p.productid, 
               p.vendorid
        FROM products p
        INNER JOIN vendors v ON (p.vendorid = v.vendorid)
        INNER JOIN orders_items oi ON (p.productid = oi.productid)
        INNER JOIN orders o ON (oi.orderid = o.orderid)
        WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
        AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
        AND o.`Status` = 'SETTLED'
        AND o.Deleted = 0
        GROUP BY p.vendorid, p.productid
        ORDER BY p.vendorid, NumSales DESC
    ) T
) U
WHERE RowNum <= 3
ORDER BY NumSales DESC
LIMIT 100;

这里的方法是

  1. 分组以获取 NumSales
  2. 使用变量对每个供应商/产品的销售额进行行编号
  3. 过滤编号数据集以允许每个供应商最多 3 个
  4. 按 NumSales DESC 排序剩余部分并仅返回 100

Even though you specify LIMIT 100, this type of query will require a full scan and table to be built up, then every record inspected and row numbered before finally filtering for the 100 that you want to display.

select
    vendorid, productid, NumSales
from
(
    select
        vendorid, productid, NumSales,
        @r := IF(@g=vendorid,@r+1,1) RowNum,
        @g := vendorid
    from (select @g:=null) initvars
    CROSS JOIN 
    (
        SELECT COUNT(oi.price) AS NumSales, 
               p.productid, 
               p.vendorid
        FROM products p
        INNER JOIN vendors v ON (p.vendorid = v.vendorid)
        INNER JOIN orders_items oi ON (p.productid = oi.productid)
        INNER JOIN orders o ON (oi.orderid = o.orderid)
        WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
        AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
        AND o.`Status` = 'SETTLED'
        AND o.Deleted = 0
        GROUP BY p.vendorid, p.productid
        ORDER BY p.vendorid, NumSales DESC
    ) T
) U
WHERE RowNum <= 3
ORDER BY NumSales DESC
LIMIT 100;

The approach here is

  1. Group by to get NumSales
  2. Use variables to row number the sales per vendor/product
  3. Filter the numbered dataset to allow for a max of 3 per vendor
  4. Order the remaining by NumSales DESC and return only 100
腹黑女流氓 2024-10-29 08:09:43

我喜欢这个优雅的解决方案,但是当我在我的开发机器上运行经过调整但类似的查询时,我得到了一个不确定的结果集。我相信这是由于 MySql 优化器在同一语句中处理分配和读取用户变量的方式造成的。

来自文档

作为一般规则,您永远不应该为用户变量赋值并在同一语句中读取该值。您可能会得到预期的结果,但这并不能保证。涉及用户变量的表达式的求值顺序是未定义的,并且可能会根据给定语句中包含的元素而改变;此外,不保证 MySQL 服务器版本之间的顺序相同。

只需在此处添加此注释,以防其他人遇到这种奇怪的行为。

I like this elegant solution, however when I run an adapted but similar query on my dev machine I get a non-deterministic result-set returned. I believe this is due to the way the MySql optimiser deals with assigning and reading user variables within the same statement.

From the docs:

As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement; in addition, this order is not guaranteed to be the same between releases of the MySQL Server.

Just adding this note here in case someone else comes across this weird behaviour.

和影子一齐双人舞 2024-10-29 08:09:43

@RichardTheKiwi 给出的答案非常有效,让我完成了 99% 的工作!我正在使用 MySQL,并且只获取每个组的第一行标记有行号,而其余行仍然为 NULL。这导致查询仅返回每个组的热门命中,而不是前三行。为了解决这个问题,我必须在 initvars 子查询中初始化 @r。 更改

我将from (select @g:=null) initvars

from (select @g:=null, @r:=null) initvars

您还可以初始化 @r 为 0,效果是一样的。对于那些不太熟悉这种类型语法的人来说,附加部分是读取每个排序组,以及一行是否与前一行具有相同的 vendorid(使用 @g 进行跟踪) 变量,它增加行号,该行号存储在变量 @r 中。当此进程到达具有新 vendorid 的下一组时,IF 语句将不再计算为 true,并且 @r 变量(从而RowNum) 将重置为 1。

The answer given by @RichardTheKiwi worked great and got me 99% of the way there! I am using MySQL and was only getting the first row of each group marked with a row number, while the rest of the rows remained NULL. This resulted in the query returning only the top hit for each group rather than the first three rows. To fix this, I had to initialize @r in the initvars subquery. I changed,

from (select @g:=null) initvars

to

from (select @g:=null, @r:=null) initvars

You could also initialize @r to 0 and it would work the same. And for those less familiar with this type of syntax, the additional section is reading through each sorted group and if a row has the same vendorid as the previous row, which is tracked with the @g variable, it increments the row number, which is stored in the variable @r. When this process reaches the next group with a new vendorid, the IF statement will no longer evaluate as true and the @r variable (and thereby the RowNum) will be reset to 1.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文