具有多个表连接的每组前 N 个
根据我的研究,这是一个非常常见的问题,通常有一个相当简单的解决方案。我的任务是将几个查询从获取所有结果更改为获取每组前3个。起初进展顺利,我使用了该网站的一些建议和答案来实现这一目标(浏览次数最多的产品)。然而,由于多次加入,我在最后一个“最畅销产品”方面遇到了困难。
基本上,我需要按每个产品的最高销售额(其中每个供应商的最大产品数为 3)按顺序获取所有产品 我已经连接了多个表来创建原始查询,每次我尝试使用变量生成排名会产生无效结果。以下内容应该有助于更好地理解该问题(为简洁起见,我删除了不必要的字段):
产品表
productid | vendorid | approved | active | deleted
供应商表 订单
vendorid | approved | active | deleted
表
orderid | `status` | deleted
订单项目表
orderitemid | orderid | productid | price
现在,我获取所有结果的原始查询如下:
SELECT COUNT(oi.price) AS `NumSales`,
p.productid,
p.vendorid
FROM products p
INNER JOIN vendors v ON (p.vendorid = v.vendorid)
INNER JOIN orders_items oi ON (p.productid = oi.productid)
INNER JOIN orders o ON (oi.orderid = o.orderid)
WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
AND o.`Status` = 'SETTLED'
AND o.Deleted = 0
GROUP BY oi.productid
ORDER BY COUNT(oi.price) DESC
LIMIT 100;
最后, (这就是我被难住的地方),我试图改变上述声明,以便我只收到每个供应商的前 3 个产品(按销售数量计算)。我想添加到目前为止我所拥有的内容,但我很尴尬这样做,而且这个问题已经是一堵文字墙了。我尝试过变量但不断得到无效结果。任何帮助将不胜感激。
Based on my research, this is a very common problem which generally has a fairly simple solution. My task is to alter several queries from get all results into get top 3 per group. At first this was going well and I used several recommendations and answers from this site to achieve this (Most Viewed Products). However, I'm running into difficulty with my last one "Best Selling Products" because of multiple joins.
Basically, I need to get all products in order by # highest sales per product in which the maximum products per vendor is 3 I've got multiple tables being joined to create the original query, and each time I attempt to use the variables to generate rankings it produces invalid results. The following should help better understand the issue (I've removed unnecessary fields for brevity):
Product Table
productid | vendorid | approved | active | deleted
Vendor Table
vendorid | approved | active | deleted
Order Table
orderid | `status` | deleted
Order Items Table
orderitemid | orderid | productid | price
Now, my original query to get all results is as follows:
SELECT COUNT(oi.price) AS `NumSales`,
p.productid,
p.vendorid
FROM products p
INNER JOIN vendors v ON (p.vendorid = v.vendorid)
INNER JOIN orders_items oi ON (p.productid = oi.productid)
INNER JOIN orders o ON (oi.orderid = o.orderid)
WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
AND o.`Status` = 'SETTLED'
AND o.Deleted = 0
GROUP BY oi.productid
ORDER BY COUNT(oi.price) DESC
LIMIT 100;
Finally, (and here's where I'm stumped), I'm trying to alter the above statement such that I received only the top 3 product (by # sold) per vendor. I'd add what I have so far, but I'm embarrassed to do so and this question is already a wall of text. I've tried variables but keep getting invalid results. Any help would be greatly appreciated.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
即使您指定 LIMIT 100,这种类型的查询也需要进行完整扫描并构建表,然后检查每条记录并对行进行编号,最后过滤出您要显示的 100 条记录。
这里的方法是
Even though you specify LIMIT 100, this type of query will require a full scan and table to be built up, then every record inspected and row numbered before finally filtering for the 100 that you want to display.
The approach here is
我喜欢这个优雅的解决方案,但是当我在我的开发机器上运行经过调整但类似的查询时,我得到了一个不确定的结果集。我相信这是由于 MySql 优化器在同一语句中处理分配和读取用户变量的方式造成的。
来自文档:
只需在此处添加此注释,以防其他人遇到这种奇怪的行为。
I like this elegant solution, however when I run an adapted but similar query on my dev machine I get a non-deterministic result-set returned. I believe this is due to the way the MySql optimiser deals with assigning and reading user variables within the same statement.
From the docs:
Just adding this note here in case someone else comes across this weird behaviour.
@RichardTheKiwi 给出的答案非常有效,让我完成了 99% 的工作!我正在使用 MySQL,并且只获取每个组的第一行标记有行号,而其余行仍然为 NULL。这导致查询仅返回每个组的热门命中,而不是前三行。为了解决这个问题,我必须在
initvars
子查询中初始化@r
。 更改我将
from (select @g:=null) initvars
为
from (select @g:=null, @r:=null) initvars
您还可以初始化
@r
为 0,效果是一样的。对于那些不太熟悉这种类型语法的人来说,附加部分是读取每个排序组,以及一行是否与前一行具有相同的vendorid
(使用@g 进行跟踪)
变量,它增加行号,该行号存储在变量@r
中。当此进程到达具有新vendorid
的下一组时,IF
语句将不再计算为 true,并且@r
变量(从而RowNum
) 将重置为 1。The answer given by @RichardTheKiwi worked great and got me 99% of the way there! I am using MySQL and was only getting the first row of each group marked with a row number, while the rest of the rows remained NULL. This resulted in the query returning only the top hit for each group rather than the first three rows. To fix this, I had to initialize
@r
in theinitvars
subquery. I changed,from (select @g:=null) initvars
to
from (select @g:=null, @r:=null) initvars
You could also initialize
@r
to 0 and it would work the same. And for those less familiar with this type of syntax, the additional section is reading through each sorted group and if a row has the samevendorid
as the previous row, which is tracked with the@g
variable, it increments the row number, which is stored in the variable@r
. When this process reaches the next group with a newvendorid
, theIF
statement will no longer evaluate as true and the@r
variable (and thereby theRowNum
) will be reset to 1.