查询顾客与列出的产品一起购买了什么
我正在尝试优化一个非常旧的查询,我无法理解它。我想要存档的结果是,我想向网上商店的访问者推荐其他客户感兴趣的内容,即他们与访问者正在查看的产品一起购买的其他内容。
我有一个子查询,但它非常慢,大约需要 15 秒处理大约 8 000 000 行。
布局是,放入用户购物篮中的所有产品都保存在表 wsBasket
中,并由 basketid
分隔(在另一个表中与成员关联)。
在这个例子中,我想列出用户与productid 427一起购买的所有最受欢迎的产品,但不列出productid 427本身。
SELECT productid, SUM(quantity) AS qty
FROM wsBasket
WHERE basketid IN
(SELECT basketid
FROM wsBasket
WHERE productid=427) AND productid!=427
GROUP by productid
ORDER BY qty
DESC LIMIT 0,4;
非常感谢任何帮助!希望这至少对某人有意义:)
更新1: 感谢您的评论,伙计们,这是我的答案,它们不适合评论字段。
在上面的查询上使用 EXPLAIN 我得到了以下结果。请注意,我的表上没有任何索引(id
字段上的主键除外),我想修改查询以从索引中受益并将索引放置在正确的键上。
+----+--------------------+----------+------+---------------+------+---------+------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+------+---------------+------+---------+------+------+----------------------------------------------+
| 1 | PRIMARY | wsBasket | ALL | NULL | NULL | NULL | NULL | 2821 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | wsBasket | ALL | NULL | NULL | NULL | NULL | 2821 | Using where |
+----+--------------------+----------+------+---------------+------+---------+------+------+----------------------------------------------+
i'm trying to get optimize an very old query that i can't wrap my head around. the result that i want to archive is that i want to recommend the visitor on a web shop what other customers have shown interest in, i.e. what else they have bought together with the product that the visitor is looking at.
i have a subquery but it's very slow, takes ~15s on ~8 000 000 rows.
the layout is that all products that are put in a users basket are kept in a table wsBasket
and separated by a basketid
(which in another table is associated with a member).
in this example i want to list all the most popular products that users have bought together with productid 427, but not list the productid 427 itself.
SELECT productid, SUM(quantity) AS qty
FROM wsBasket
WHERE basketid IN
(SELECT basketid
FROM wsBasket
WHERE productid=427) AND productid!=427
GROUP by productid
ORDER BY qty
DESC LIMIT 0,4;
any help is much appreciated! hope this makes any sense at all to at least someone :)
UPDATE 1:
thanks for your comments guys here are my answers, they didn't fit in the comments-field.
Using EXPLAIN on the above query i got the fllowing. Please note, I do not have any indexes on the table (except for primary key on the id
-field), i want to modify the query to benefit from indexes and place indexes on the right keys.
+----+--------------------+----------+------+---------------+------+---------+------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+------+---------------+------+---------+------+------+----------------------------------------------+
| 1 | PRIMARY | wsBasket | ALL | NULL | NULL | NULL | NULL | 2821 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | wsBasket | ALL | NULL | NULL | NULL | NULL | 2821 | Using where |
+----+--------------------+----------+------+---------------+------+---------+------+------+----------------------------------------------+
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
需要添加两个明显的索引:一个在篮子 ID 上,第二个在产品 ID 上:然后重试查询和一个新的 EXPLAIN 以查看索引是否正在使用
Two obvious indexes to add: one on basketid and a second on productid: then retry the query and a new EXPLAIN to see that the indexes are being used
除了确保
productid
和basketid
上存在合适的索引之外,将查询构造为简单联接而不是子查询通常也会使您受益匪浅,尤其是在 MySQL 中。对我来说,在可能相似的数据集上,连接在
EXPLAIN
输出中产生了两个select_type: SIMPLE
行,而子查询方法则产生了可怕的性能问题依赖子查询
。因此,连接速度快了一个数量级以上。As well as ensuring that suitable indexes exist on
productid
andbasketid
, you will often benefit from structuring your query as a simple join rather than a subquery, especially in MySQL.For me, on a possibly-similar dataset, the join resulted in two
select_type: SIMPLE
rows in theEXPLAIN
output, whereas the subquery method spat out a horrible-for-performanceDEPENDENT SUBQUERY
. Consequently the join was well over an order of magnitude faster.在此查询中主要用于搜索的两个字段是productid 和basketid。
当您搜索productid等于427的记录时,数据库不知道在哪里可以找到该记录。它甚至不知道如果确实找到了一个匹配项,则不会再有另一个匹配项,因此它必须查找整个表,可能有数千条记录。
索引是一个已排序的单独文件,并且仅包含您感兴趣的排序字段。因此创建索引可以节省大量时间!
The two fields you mainly use for searching in this query are productid and basketid.
When you search for records having productid equal to 427, Database has no clue where to find this record. It doesn't even know that if it does find one matching, that there will not be another matching one, so it has to look through the entire table, potentially thousands of records.
An index is a separate file that is sorted, and contains only the field/s you're interested in sorting on. so creating an index saves a immense amount of time!