优化在同一个表上使用多个左联接的查询

发布于 2024-09-27 03:56:12 字数 892 浏览 11 评论 0原文

我遇到过一个花费“太长时间”的查询。该查询在 10 个左右的表之间有 50 多个左连接。为了简要概述数据库模型，连接的表是存储特定数据类型（例如：date_fields、integer_fields、text_fields 等）数据的表，每个表都有一个值列，即“数据字段”id，和票证 ID。该查询是基于“票据”与其“数据字段”之间的关联表以编程方式构建的。

连接语句如下所示：

...FROM tickets t
LEFT JOIN ticket_text_fields t001 ON(t.id=t001.ticket_id AND t001.textfield_id=7)
...
LEFT JOIN ticket_date_fields t056 ON(t.id=t056.ticket_id AND t056.datafield_id=434)

当对查询使用解释时显示以下内容：

1   SIMPLE   t       ref   idx_dataset_id                   idx_dataset_id  5   const   2871   Using where; Using temporary; Using filesort
1   SIMPLE   t001   ref   idx_ticket_id,idx_datafield_id   idx_ticket_id   5   t.id   5   
... 
1   SIMPLE   t056   ref   idx_ticket_id,idx_datafield_id   idx_ticket_id   5   t.id   8

我可以采取什么方向来调整此查询？所有索引似乎都已就位。也许t表（门票）的行数（2871）应该减少。多少个左连接太多了？数据字段表是否应该只连接一次，然后查询每个表以获取所需的数据？

原文

I've come across a query that is taking "too long". The query has 50+ left joins between 10 or so tables. To give a brief overview of the database model, the tables joined are tables that store data for a particular data type (ex: date_fields, integer_fields, text_fields, etc.) and each has a column for the value, a "datafield" id, and a ticket id. The query is built programmatically based on an association table between a "ticket" and its "data fields".

The join statements look something like the following:

...FROM tickets t
LEFT JOIN ticket_text_fields t001 ON(t.id=t001.ticket_id AND t001.textfield_id=7)
...
LEFT JOIN ticket_date_fields t056 ON(t.id=t056.ticket_id AND t056.datafield_id=434)

When using explain on the query shows the following:

1   SIMPLE   t       ref   idx_dataset_id                   idx_dataset_id  5   const   2871   Using where; Using temporary; Using filesort
1   SIMPLE   t001   ref   idx_ticket_id,idx_datafield_id   idx_ticket_id   5   t.id   5   
... 
1   SIMPLE   t056   ref   idx_ticket_id,idx_datafield_id   idx_ticket_id   5   t.id   8

What direction can I take to tune this query? All the indexes seem to be in place. Perhaps the t table (tickets) row number (2871) should be reduced. How many left joins is too much? Should the datafield tables be joined only once and then queried each for the data that is required?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

￡噩梦荏苒 2024-10-04 03:56:12

您正在使用可怕的反模式的变体，称为Entity-Attribute-Value。您将属性存储在单独的行上，因此如果您想要重建看起来像传统数据行的内容，则需要为每个属性创建一个联接。

这会创建一个具有 50 个连接的查询，这并不奇怪。对于大多数数据库来说，这太多了，无法有效运行（您还没有确定您正在使用哪个数据库）。最终，您将需要更多属性，并且可能会超出数据库可以执行的联接数量的某些架构限制。

解决方案是：不要在 SQL 中重建行。

而是将属性作为多行进行查询，而不是尝试将它们组合到单行中。

SELECT ... FROM tickets t
INNER JOIN ticket_text_fields f ON t.id=f.ticket_id
WHERE f.textfield_id IN (7, 8, 9, ...)
UNION ALL
SELECT ... FROM tickets t
INNER JOIN ticket_date_fields d ON t.id=d.ticket_id
WHERE d.datafield_id IN (434, 435, 436, ...)

然后，您必须在应用程序中编写一个函数来循环生成的行集，并将属性一一收集到应用程序空间中的对象中，这样您就可以像使用单个实体一样使用它。

You're using a variation of the terrible antipattern called Entity-Attribute-Value. You're storing attributes on separate rows, so if you want to reconstruct something that looks like a conventional row of data, you need to make one join per attribute.

It's not surprising this creates a query with 50 joins. This is far too many for most databases to run efficiently (you haven't identified which database you're using). Eventually you'll want a few more attributes and you might exceed some architectural limit of the database on the number of joins it can do.

The solution is: don't reconstruct the row in SQL.

Instead, query the attributes as multiple rows, instead of trying to combine them onto a single row.

SELECT ... FROM tickets t
INNER JOIN ticket_text_fields f ON t.id=f.ticket_id
WHERE f.textfield_id IN (7, 8, 9, ...)
UNION ALL
SELECT ... FROM tickets t
INNER JOIN ticket_date_fields d ON t.id=d.ticket_id
WHERE d.datafield_id IN (434, 435, 436, ...)

Then you have to write a function in your application to loop over the resulting rowset, and collect the attributes one by one into an object in application space, so then you can use it as if it's a single entity.

回复收藏 0 原文

§对你不离不弃 2024-10-04 03:56:12

为了更清晰的查询，我会使用这样的东西：

SELECT ... FROM tickets as t  
JOIN ticket_text_fields as txt ON t.id = txt.ticket_id  
JOIN ticket_date_fields as dt ON t.id = dt.ticket_id  
WHERE txt.textfield_id IN (...)
AND dt.datefield_id IN (...)

连接可能会是左侧的，但这取决于数据的结构。
查询中没有 union，只有两个 join

for the clearer query i would use something like this:

SELECT ... FROM tickets as t  
JOIN ticket_text_fields as txt ON t.id = txt.ticket_id  
JOIN ticket_date_fields as dt ON t.id = dt.ticket_id  
WHERE txt.textfield_id IN (...)
AND dt.datefield_id IN (...)

Joins would be probably LEFT, but it depends on the structure of your data.
There is no union in the query and there are only two joins

回复收藏 0 原文

~没有更多了~