MySQL slow query: JOIN + multiple WHEREs + ORDER BY

Posted on 2024-09-25 11:35:03

Long-time lurker, first question!

I am struggling to optimize this query, which selects the lowest priced items that match the chosen filters:

SELECT product_info.*, MIN(product_all.sale_price) as sale_price, product_all.buy_link
FROM product_info
NATURAL JOIN (SELECT * FROM product_all WHERE product_all.date = '2010-09-30') as product_all
WHERE (product_info.category = 2  
AND product_info.gender = 'W' )
GROUP BY product_all.prod_id
ORDER BY MIN(product_all.sale_price) ASC LIMIT 13

Its EXPLAIN output:

| id | select_type | table        | type   | possible_keys                                         | key     | key_len | ref                 | rows   | Extra                           |
+----+-------------+--------------+--------+-------------------------------------------------------+---------+---------+---------------------+--------+---------------------------------+
|  1 | PRIMARY     | <derived2>   | ALL    | NULL                                                  | NULL    | NULL    | NULL                | 89801  | Using temporary; Using filesort |
|  1 | PRIMARY     | product_info | eq_ref | PRIMARY,category_prod_id_retail_price,category_ret... | PRIMARY | 4       | product_all.prod_id | 1      | Using where                     |
|  2 | DERIVED     | product_all  | ref    | date_2                                                | date_2  | 3       |                     | 144107 |                                 |

I've tried eliminating the subquery, which intuitively seems better but in practice takes even longer:

SELECT product_info.*, MIN(product_all.sale_price) as sale_price, product_all.buy_link
FROM product_info
NATURAL JOIN product_all
WHERE (product_all.date = '2010-09-30'
AND product_info.category = 2 
AND product_info.gender = 'W' )
GROUP BY product_all.prod_id
ORDER BY MIN(product_all.sale_price) ASC LIMIT 13

And its EXPLAIN output:

| id | select_type | table        | type | possible_keys                                         | key                   | key_len | ref                               | rows | Extra                                        |
+----+-------------+--------------+------+-------------------------------------------------------+-----------------------+---------+-----------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | product_info | ref  | PRIMARY,category_prod_id_retail_price,category_ret... | category_retail_price | 5       | const                             | 269  | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | product_all  | ref  | PRIMARY,prod_id,date_2                                | prod_id               | 4       | equipster_db.product_info.prod_id | 141  | Using where                                  |

Here are the tables:

CREATE TABLE `product_all` (
`prod_id` INT( 10 ) NOT NULL ,
`ref_id` INT( 10) NOT NULL ,
`date` DATE NOT NULL ,
`buy_link` BLOB NOT NULL ,
`sale_price` FLOAT NOT NULL ,
-- a table has only one PRIMARY KEY, so the two key columns are declared together
PRIMARY KEY (`prod_id`, `ref_id`)
) ENGINE = MYISAM ;


CREATE TABLE `product_info` (
`prod_id` INT( 10 ) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`prod_name` VARCHAR( 200 ) NOT NULL,
`brand` VARCHAR( 50 ) NOT NULL,
`retail_price` FLOAT NOT NULL,
`category` INT( 3 ) NOT NULL,
`gender` VARCHAR( 1 ) NOT NULL,
`type` VARCHAR( 10 ) NOT NULL
) ENGINE = MYISAM ;
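The EXPLAIN output above lists indexes (date_2, prod_id, category_retail_price, category_prod_id_retail_price) that the posted CREATE TABLE statements do not declare. The real definitions are therefore worth checking before adding more. The sketch below assumes both tables live in the current database.

```sql
-- One row per indexed column, in index order, for both tables.
-- SUB_PART is non-NULL for prefix indexes; NON_UNIQUE = 0 marks unique ones.
SELECT TABLE_NAME, INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME, SUB_PART, NON_UNIQUE
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('product_all', 'product_info')
ORDER BY TABLE_NAME, INDEX_NAME, SEQ_IN_INDEX;

-- Full definitions, including any indexes added after the tables were created.
SHOW CREATE TABLE product_all;
SHOW CREATE TABLE product_info;
```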

My questions:
- Which query structure seems optimal?
- What indices would optimize this query?
- Less importantly: how does the indexing approach change when adding or removing WHERE clauses, or when using a different ORDER BY, such as sorting by % off:

ORDER BY (1-(MIN(product_all.sale_price)/product_info.retail_price)) DESC  
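This sort key mixes an aggregate over product_all with a product_info column, so no index can deliver rows in that order. MySQL has to compute the value for every matching product and then filesort; indexes can still help with the filtering and grouping.

The sketch below runs against the question's tables. It computes the minimum per product first and then sorts on a named alias. `pct_off` is an illustrative name, not something from the question.

```sql
SELECT pi.prod_id, pi.prod_name, pi.retail_price,
       m.min_price,
       1 - m.min_price / pi.retail_price AS pct_off   -- fraction off retail
FROM product_info AS pi
JOIN (
    -- cheapest offer per product on the chosen date
    SELECT prod_id, MIN(sale_price) AS min_price
    FROM product_all
    WHERE `date` = '2010-09-30'
    GROUP BY prod_id
) AS m ON m.prod_id = pi.prod_id
WHERE pi.category = 2
  AND pi.gender = 'W'
ORDER BY pct_off DESC   -- computed per product, so this step is always a filesort
LIMIT 13;
```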

Edit: in both queries the NATURAL JOIN matches on prod_id (one record in product_info can have multiple rows in product_all, which is why the results need to be grouped).
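Both queries have a correctness problem: product_all.buy_link is neither grouped nor aggregated. MySQL therefore returns it from an arbitrary row of each group, not necessarily the row that holds the minimum price. Servers running with ONLY_FULL_GROUP_BY, the default since MySQL 5.7.5, reject the query outright.

A common fix is the groupwise-minimum pattern: compute the minimum per product, then join back to the row or rows carrying that price. The sketch below runs against the question's tables. If two offers for one product tie on price, that product appears once per tied offer.

```sql
SELECT pi.*, pa.sale_price, pa.buy_link
FROM (
    -- step 1: cheapest price per product on the chosen date
    SELECT prod_id, MIN(sale_price) AS min_price
    FROM product_all
    WHERE `date` = '2010-09-30'
    GROUP BY prod_id
) AS m
JOIN product_info AS pi
  ON pi.prod_id = m.prod_id
-- step 2: join back to the offer row(s) that actually carry that price
JOIN product_all AS pa
  ON  pa.prod_id    = m.prod_id
  AND pa.`date`     = '2010-09-30'
  AND pa.sale_price = m.min_price
WHERE pi.category = 2
  AND pi.gender = 'W'
ORDER BY m.min_price ASC
LIMIT 13;
```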


Comments (5)

冷…雨湿花 2024-10-02 11:35:03

Indices make a massive difference in MySQL: one query that took 15 minutes with the wrong set of indices took 0.2 seconds with the right ones, but finding the right balance is generally the hard part. Naturally, without some sample data it's really hard to say whether the solution below will save you any time, but in theory it should.

To answer your questions, I would redesign the tables like so:

CREATE TABLE `product_all` ( 
`prod_id` INT( 10 ) NOT NULL, 
`ref_id` INT( 10) NOT NULL, 
`date` DATE NOT NULL , 
`buy_link` BLOB NOT NULL , 
`sale_price` FLOAT NOT NULL,
PRIMARY KEY (prod_id, ref_id) ,
INDEX date_Index (`date` ASC),
-- not UNIQUE: several offers for the same product can share a sale_price
INDEX prod_price_Index (prod_id ASC, sale_price ASC)
) ENGINE = MYISAM ; 


CREATE TABLE `product_info` ( 
`prod_id` INT( 10 ) NOT NULL AUTO_INCREMENT, 
`prod_name` VARCHAR( 200 ) NOT NULL, 
`brand` VARCHAR( 50 ) NOT NULL, 
`retail_price` FLOAT NOT NULL, 
`category` INT( 3 ) NOT NULL, 
`gender` VARCHAR( 1 ) NOT NULL, 
`type` VARCHAR( 10 ) NOT NULL,
PRIMARY KEY (prod_id) ,
UNIQUE INDEX prod_id_name_Index (prod_id ASC, prod_name ASC),
INDEX category_Index (category ASC),
INDEX gender_Index (gender ASC)
) ENGINE = MYISAM ;

SELECT product_info.*, MIN(product_all.sale_price) as sale_price, product_all.buy_link         
FROM product_info         
NATURAL JOIN (SELECT * FROM product_all WHERE product_all.date = '2010-09-30') as product_all         
WHERE (product_info.category = 2           
AND product_info.gender = 'W' )         
GROUP BY product_all.prod_id         
ORDER BY MIN(product_all.sale_price) ASC LIMIT 13        

The performance gain here comes from indexing the main fields that are joined on and that appear in the WHERE clause. Personally I would go with your first query since, when you think about it, it should perform better.

As far as I understand what's happening in the first and second queries:

  • The first query filters product_all in a subquery before doing the natural join, which means it only joins the resulting rows and not the whole table.
  • The second query joins the entire second table and then filters the resulting rows back down to what you want.
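One option goes slightly beyond the indexes suggested above: give the first query's derived table a composite index it can work from entirely. The index starts with the date filter and then carries the grouping column and the aggregated column. In the sketch below, the index name `date_prod_price` is hypothetical.

```sql
-- Equality column first (date), then the GROUP BY column, then the MIN() column,
-- so the filter, the grouping and the MIN can all be resolved from the index.
ALTER TABLE product_all
  ADD INDEX date_prod_price (`date`, prod_id, sale_price);

-- Check the plan for the derived table on its own.
EXPLAIN
SELECT prod_id, MIN(sale_price) AS min_price
FROM product_all
WHERE `date` = '2010-09-30'
GROUP BY prod_id;
```

If the index is picked up, look for "Using index" and the absence of "Using temporary" in the Extra column. Within one date, the index already stores rows in prod_id order. Selecting buy_link in this derived table would undo the benefit, because the BLOB is not in the index.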

As a rule of thumb normally you want to add indices on your major joining fields and also the fields that you use the most in where clauses. I've also put some unique indices on some of the fields that you will want to query regularly, such as prod_id_name_Index.

If this doesn't improve your performance, post some dummy data to play with and I might be able to find a faster solution that I can benchmark.

Here is an article that goes through indexing for performance in MySQL; worth a read if you want to know more.

Good luck!

EDIT: I missed your final question the first time. The answer is that if you're indexing the main join fields, then changes to the WHERE clause will only slightly affect overall performance, and the unique indices I've put on the tables should cover most of the things you'll want to base queries on. The main thing to remember is that if you query or join on a field frequently, it should really be indexed; minor query changes and changes to the ORDER BY are nothing to worry about in terms of realigning your indexing strategy.

千仐 2024-10-02 11:35:03

Performance-wise, it's never a good thing to use

select *

You should use the individual column names instead.

select column1,column2 etc...
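Applied to the question's query, that could look like the sketch below. It assumes a listing page only needs the product_info columns named here, and it leaves the BLOB buy_link out of the grouped query.

That matters more than usual here. Before MySQL 8.0.13, a BLOB column in an internal temporary table forces that table onto disk, and both of the question's plans show "Using temporary".

```sql
SELECT pi.prod_id, pi.prod_name, pi.brand, pi.retail_price,
       MIN(pa.sale_price) AS min_price
FROM product_info AS pi
JOIN product_all AS pa
  ON pa.prod_id = pi.prod_id
WHERE pa.`date` = '2010-09-30'
  AND pi.category = 2
  AND pi.gender = 'W'
-- every non-aggregated column is listed, so ONLY_FULL_GROUP_BY is satisfied
GROUP BY pi.prod_id, pi.prod_name, pi.brand, pi.retail_price
ORDER BY min_price ASC
LIMIT 13;
```

The buy_link values for those 13 products can then be fetched with a second, small query.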

回心转意 2024-10-02 11:35:03

Personally I'm a SQL minimalist and avoid any kind of subquery, or any join that can't be done index-to-index on indexed columns.

If that's not really possible, I'll probably run the subqueries individually to collect my keys, sort them client side, and then build a WHERE ... IN (...) clause.
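Here is a sketch of that two-step approach against the question's tables. Step 1 collects the matching keys with a query that only touches product_info. Step 2 feeds them back as a literal IN list, which the client is assumed to build. The IDs 101, 102 and 103 are placeholders for whatever step 1 returns.

```sql
-- Step 1: keys only, resolved from product_info alone.
SELECT prod_id
FROM product_info
WHERE category = 2
  AND gender = 'W';

-- Step 2: the client pastes the keys from step 1 into the IN list
-- (101, 102, 103 are placeholders, not real IDs).
SELECT prod_id, MIN(sale_price) AS min_price
FROM product_all
WHERE `date` = '2010-09-30'
  AND prod_id IN (101, 102, 103)
GROUP BY prod_id
ORDER BY min_price ASC
LIMIT 13;
```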

JohnVD makes a lot of good points, but if you're going to need a unique key that includes product_name, you should really see whether the name can be normalized out into an integer ID.

Indexing VARCHAR columns is something to steer away from at all costs if possible. Each index entry is as big as the maximum size of the column, even if the values are usually only a fraction of that. And if you're using a multi-byte charset like utf8, the size is up to ~3 × maxlen bytes.
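The per-character cost is easy to see in information_schema, which reports both the declared length and the maximum byte length of each column. Here is a sketch for the question's string columns.

```sql
-- CHARACTER_MAXIMUM_LENGTH is the declared length in characters;
-- CHARACTER_OCTET_LENGTH is the worst case in bytes for the column's charset
-- (e.g. 600 for VARCHAR(200) in utf8, 800 in utf8mb4).
SELECT COLUMN_NAME, CHARACTER_SET_NAME,
       CHARACTER_MAXIMUM_LENGTH, CHARACTER_OCTET_LENGTH
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'product_info'
  AND COLUMN_NAME IN ('prod_name', 'brand');
```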

With your LIMIT, it seems the ORDER BY is needed. But just as an FYI: when you're doing a GROUP BY and you're going to consume the whole result set, tack on an ORDER BY NULL. Run the two variants through EXPLAIN to see why: ORDER BY NULL eliminates the implied filesort, and you can sort client side instead. (This isn't possible if you're doing GROUP BY ... WITH ROLLUP, though. Also, this applies to MySQL 5.x; since MySQL 8.0, GROUP BY no longer sorts implicitly.)
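Here is the comparison described above, sketched against the question's product_all table. On MySQL 5.x, the first plan is expected to show "Using filesort" when the grouping is not resolved through an index, and the second plan should not. On MySQL 8.0 the two plans should match.

```sql
-- Implicitly sorted by prod_id on MySQL 5.x (look for "Using filesort").
EXPLAIN
SELECT prod_id, MIN(sale_price) AS min_price
FROM product_all
WHERE `date` = '2010-09-30'
GROUP BY prod_id;

-- Same query with the implied sort suppressed.
EXPLAIN
SELECT prod_id, MIN(sale_price) AS min_price
FROM product_all
WHERE `date` = '2010-09-30'
GROUP BY prod_id
ORDER BY NULL;
```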

笑红尘 2024-10-02 11:35:03

You should stick with the second query. Put an index on the column that reduces the number of affected rows the most; in this case it might be the date. If the filter conditions always involve more than one column, you should try a multicolumn index, because MySQL will generally use only one index per table in a query.
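One way to find out which filter reduces the affected rows the most is simply to count what each one matches. The column with the smaller count is the better leading column for an index. Here is a sketch against the question's tables. Note that the first query counts offer rows in product_all, while the second counts products in product_info.

```sql
-- Offer rows matched by the date filter vs. all rows in product_all.
SELECT SUM(`date` = '2010-09-30') AS date_matches,
       COUNT(*)                   AS total_rows
FROM product_all;

-- Products matched by each product_info filter, alone and combined.
SELECT SUM(category = 2)                  AS category_matches,
       SUM(gender = 'W')                  AS gender_matches,
       SUM(category = 2 AND gender = 'W') AS both_matches,
       COUNT(*)                           AS total_products
FROM product_info;
```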

岁月静好 2024-10-02 11:35:03

As Mitch stated, finding the criterion that naturally matches the fewest records will definitely win for performance. If category + gender is very common, make that an index on BOTH columns. Additionally, once you find that optimum criterion, you might alter the following query to match it better. STRAIGHT_JOIN tells MySQL to join the tables in the order you list them, instead of letting it choose which table to use as the driving table and join the other one to it. I don't know which of category, gender, or date is the most selective here. If date is going to match fewer records, I would swap THAT table to be first in the FROM clause and move its date criterion to the first position in the WHERE clause (that's just me personally, to keep the WHERE clause visually in sync with the tables). I've seen STRAIGHT_JOIN improve performance significantly in MANY situations that otherwise appeared to be simple queries.

SELECT STRAIGHT_JOIN
      product_info.*, 
      MIN(product_all.sale_price) as sale_price, 
      product_all.buy_link 
   FROM 
      product_info,
      product_all 
   where 
          product_info.category = 2   
      AND product_info.gender = 'W'
      and product_info.prod_id = product_all.prod_id
      AND product_all.date = '2010-09-30'
   GROUP BY 
      product_info.prod_id 
   ORDER BY 
      MIN(product_all.sale_price) ASC 
   LIMIT 13 
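The sketch below covers both options from the answer above. For the original order, where product_info drives the join, it adds a two-column index so category = 2 AND gender = 'W' is found in one lookup. For the case where date is more selective, it lists product_all first so STRAIGHT_JOIN reads it first.

The index name `category_gender` is hypothetical. buy_link is left out because, as a non-aggregated column, its value would come from an arbitrary row of each group.

```sql
-- Option 1: product_info drives (the order in the answer above).
-- A two-column index covers category = 2 AND gender = 'W' in one lookup.
ALTER TABLE product_info
  ADD INDEX category_gender (category, gender);

-- Option 2: product_all drives, when the date filter is more selective.
-- STRAIGHT_JOIN joins tables in FROM-clause order, so product_all is read first
-- and each row is matched to product_info through its primary key.
SELECT STRAIGHT_JOIN
      product_info.*,
      MIN(product_all.sale_price) AS sale_price
   FROM
      product_all,
      product_info
   WHERE
          product_all.`date` = '2010-09-30'
      AND product_all.prod_id = product_info.prod_id
      AND product_info.category = 2
      AND product_info.gender = 'W'
   GROUP BY
      product_info.prod_id
   ORDER BY
      MIN(product_all.sale_price) ASC
   LIMIT 13;
```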