优化在同一个表上使用多个左联接的查询
我遇到过一个花费“太长时间”的查询。该查询在 10 个左右的表之间有 50 多个左连接。为了简要概述数据库模型,连接的表是存储特定数据类型(例如:date_fields、integer_fields、text_fields 等)数据的表,每个表都有一个值列,即“数据字段”id,和票证 ID。该查询是基于“票据”与其“数据字段”之间的关联表以编程方式构建的。
连接语句如下所示:
...FROM tickets t
LEFT JOIN ticket_text_fields t001 ON(t.id=t001.ticket_id AND t001.textfield_id=7)
...
LEFT JOIN ticket_date_fields t056 ON(t.id=t056.ticket_id AND t056.datafield_id=434)
当对查询使用解释时显示以下内容:
1 SIMPLE t ref idx_dataset_id idx_dataset_id 5 const 2871 Using where; Using temporary; Using filesort
1 SIMPLE t001 ref idx_ticket_id,idx_datafield_id idx_ticket_id 5 t.id 5
...
1 SIMPLE t056 ref idx_ticket_id,idx_datafield_id idx_ticket_id 5 t.id 8
我可以采取什么方向来调整此查询?所有索引似乎都已就位。也许t表(门票)的行数(2871)应该减少。多少个左连接太多了?数据字段表是否应该只连接一次,然后查询每个表以获取所需的数据?
I've come across a query that is taking "too long". The query has 50+ left joins between 10 or so tables. To give a brief overview of the database model, the tables joined are tables that store data for a particular data type (ex: date_fields, integer_fields, text_fields, etc.) and each has a column for the value, a "datafield" id, and a ticket id. The query is built programmatically based on an association table between a "ticket" and its "data fields".
The join statements look something like the following:
...FROM tickets t
LEFT JOIN ticket_text_fields t001 ON(t.id=t001.ticket_id AND t001.textfield_id=7)
...
LEFT JOIN ticket_date_fields t056 ON(t.id=t056.ticket_id AND t056.datafield_id=434)
When using explain on the query shows the following:
1 SIMPLE t ref idx_dataset_id idx_dataset_id 5 const 2871 Using where; Using temporary; Using filesort
1 SIMPLE t001 ref idx_ticket_id,idx_datafield_id idx_ticket_id 5 t.id 5
...
1 SIMPLE t056 ref idx_ticket_id,idx_datafield_id idx_ticket_id 5 t.id 8
What direction can I take to tune this query? All the indexes seem to be in place. Perhaps the t table (tickets) row number (2871) should be reduced. How many left joins is too much? Should the datafield tables be joined only once and then queried each for the data that is required?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您正在使用可怕的反模式的变体,称为Entity-Attribute-Value。您将属性存储在单独的行上,因此如果您想要重建看起来像传统数据行的内容,则需要为每个属性创建一个联接。
这会创建一个具有 50 个连接的查询,这并不奇怪。对于大多数数据库来说,这太多了,无法有效运行(您还没有确定您正在使用哪个数据库)。最终,您将需要更多属性,并且可能会超出数据库可以执行的联接数量的某些架构限制。
解决方案是:不要在 SQL 中重建行。
而是将属性作为多行进行查询,而不是尝试将它们组合到单行中。
然后,您必须在应用程序中编写一个函数来循环生成的行集,并将属性一一收集到应用程序空间中的对象中,这样您就可以像使用单个实体一样使用它。
You're using a variation of the terrible antipattern called Entity-Attribute-Value. You're storing attributes on separate rows, so if you want to reconstruct something that looks like a conventional row of data, you need to make one join per attribute.
It's not surprising this creates a query with 50 joins. This is far too many for most databases to run efficiently (you haven't identified which database you're using). Eventually you'll want a few more attributes and you might exceed some architectural limit of the database on the number of joins it can do.
The solution is: don't reconstruct the row in SQL.
Instead, query the attributes as multiple rows, instead of trying to combine them onto a single row.
Then you have to write a function in your application to loop over the resulting rowset, and collect the attributes one by one into an object in application space, so then you can use it as if it's a single entity.
为了更清晰的查询,我会使用这样的东西:
连接可能会是左侧的,但这取决于数据的结构。
查询中没有 union,只有两个 join
for the clearer query i would use something like this:
Joins would be probably LEFT, but it depends on the structure of your data.
There is no union in the query and there are only two joins