Help me alter this single complex query to use temporary tables
About the system:
- There are tutors who create classes and packs.
- A tag-based search approach is used. Tag relations are created when new tutors register and when tutors create packs (this makes tutors and packs searchable). For details, see the section "How tags work in this system?" below.
The query in question is below.
Can anybody suggest an approach using temporary tables? We have indexed all the relevant fields, and it looks like this is the least time possible with this approach:
SELECT SUM(DISTINCT( t.tag LIKE "%Dictatorship%"
                     OR tt.tag LIKE "%Dictatorship%"
                     OR ttt.tag LIKE "%Dictatorship%" )) AS key_1_total_matches,
       SUM(DISTINCT( t.tag LIKE "%democracy%"
                     OR tt.tag LIKE "%democracy%"
                     OR ttt.tag LIKE "%democracy%" )) AS key_2_total_matches,
       COUNT(DISTINCT( od.id_od )) AS tutor_popularity,
       CASE
         WHEN ( IF(( wc.id_wc > 0 ),
                   ( wc.wc_api_status = 1
                     AND wc.wc_type = 0
                     AND wc.class_date > '2010-06-01 22:00:56'
                     AND wccp.status = 1
                     AND ( wccp.country_code = 'IE'
                           OR wccp.country_code IN ( 'INT' ) ) ),
                   0) ) THEN 1
         ELSE 0
       END AS 'classes_published',
       CASE
         WHEN ( IF(( lp.id_lp > 0 ),
                   ( lp.id_status = 1
                     AND lp.published = 1
                     AND lpcp.status = 1
                     AND ( lpcp.country_code = 'IE'
                           OR lpcp.country_code IN ( 'INT' ) ) ),
                   0) ) THEN 1
         ELSE 0
       END AS 'packs_published',
       td.*,
       u.*
FROM   tutor_details AS td
       JOIN users AS u
         ON u.id_user = td.id_user
       LEFT JOIN learning_packs_tag_relations AS lptagrels
         ON td.id_tutor = lptagrels.id_tutor
       LEFT JOIN learning_packs AS lp
         ON lptagrels.id_lp = lp.id_lp
       LEFT JOIN learning_packs_categories AS lpc
         ON lpc.id_lp_cat = lp.id_lp_cat
       LEFT JOIN learning_packs_categories AS lpcp
         ON lpcp.id_lp_cat = lpc.id_parent
       LEFT JOIN learning_pack_content AS lpct
         ON ( lp.id_lp = lpct.id_lp )
       LEFT JOIN webclasses_tag_relations AS wtagrels
         ON td.id_tutor = wtagrels.id_tutor
       LEFT JOIN webclasses AS wc
         ON wtagrels.id_wc = wc.id_wc
       LEFT JOIN learning_packs_categories AS wcc
         ON wcc.id_lp_cat = wc.id_wp_cat
       LEFT JOIN learning_packs_categories AS wccp
         ON wccp.id_lp_cat = wcc.id_parent
       LEFT JOIN order_details AS od
         ON td.id_tutor = od.id_author
       LEFT JOIN orders AS o
         ON od.id_order = o.id_order
       LEFT JOIN tutors_tag_relations AS ttagrels
         ON td.id_tutor = ttagrels.id_tutor
       LEFT JOIN tags AS t
         ON t.id_tag = ttagrels.id_tag
       LEFT JOIN tags AS tt
         ON tt.id_tag = lptagrels.id_tag
       LEFT JOIN tags AS ttt
         ON ttt.id_tag = wtagrels.id_tag
WHERE  ( u.country = 'IE'
         OR u.country IN ( 'INT' ) )
       AND CASE
             WHEN ( ( tt.id_tag = lptagrels.id_tag )
                    AND ( lp.id_lp > 0 ) ) THEN lp.id_status = 1
                                                AND lp.published = 1
                                                AND lpcp.status = 1
                                                AND ( lpcp.country_code = 'IE'
                                                      OR lpcp.country_code IN ( 'INT' ) )
             ELSE 1
           END
       AND CASE
             WHEN ( ( ttt.id_tag = wtagrels.id_tag )
                    AND ( wc.id_wc > 0 ) ) THEN wc.wc_api_status = 1
                                                AND wc.wc_type = 0
                                                AND wc.class_date > '2010-06-01 22:00:56'
                                                AND wccp.status = 1
                                                AND ( wccp.country_code = 'IE'
                                                      OR wccp.country_code IN ( 'INT' ) )
             ELSE 1
           END
       AND CASE
             WHEN ( od.id_od > 0 ) THEN od.id_author = td.id_tutor
                                        AND o.order_status = 'paid'
                                        AND CASE
                                              WHEN ( od.id_wc > 0 ) THEN od.can_attend_class = 1
                                              ELSE 1
                                            END
             ELSE 1
           END
       AND ( t.tag LIKE "%Dictatorship%"
             OR t.tag LIKE "%democracy%"
             OR tt.tag LIKE "%Dictatorship%"
             OR tt.tag LIKE "%democracy%"
             OR ttt.tag LIKE "%Dictatorship%"
             OR ttt.tag LIKE "%democracy%" )
GROUP  BY td.id_tutor
HAVING key_1_total_matches = 1
       AND key_2_total_matches = 1
ORDER  BY tutor_popularity DESC,
          u.surname ASC,
          u.name ASC
LIMIT  0, 20
The problem
The results returned by the above query are correct (the AND logic works as expected), but the time taken rises alarmingly as the data grows. With my current data it takes about 10 seconds, against normal query timings on the order of 0.0002 to 0.005 seconds, which makes it totally unusable.
Somebody suggested the following in my previous question:
- create a temporary table and insert here all relevant data that might end up in the final result set
- run several updates on this table, joining the required tables one at a time instead of all of them at the same time
- finally perform a query on this temporary table to extract the end result
All this was done in a stored procedure; the end result passed unit tests and is blazing fast.
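The three steps above might be sketched in MySQL roughly as follows. This is only an outline under assumptions: the temporary table name tmp_tutor_search and its column types are made up for illustration, only the tutor and pack tag sources are shown, and the publication filters from the original query are reduced to the pack status checks.

```sql
-- Step 1: seed a temporary table with every tutor that could appear in the result.
-- tmp_tutor_search and its column types are assumptions, not part of the real schema.
DROP TEMPORARY TABLE IF EXISTS tmp_tutor_search;
CREATE TEMPORARY TABLE tmp_tutor_search (
    id_tutor            INT NOT NULL PRIMARY KEY,
    key_1_total_matches TINYINT NOT NULL DEFAULT 0,
    key_2_total_matches TINYINT NOT NULL DEFAULT 0,
    tutor_popularity    INT NOT NULL DEFAULT 0
);

INSERT INTO tmp_tutor_search (id_tutor)
SELECT DISTINCT td.id_tutor
FROM   tutor_details AS td
       JOIN users AS u ON u.id_user = td.id_user
WHERE  u.country IN ('IE', 'INT');

-- Step 2a: flag each keyword from the tutor's own tags, one UPDATE per keyword.
UPDATE tmp_tutor_search AS s
       JOIN tutors_tag_relations AS ttagrels ON ttagrels.id_tutor = s.id_tutor
       JOIN tags AS t ON t.id_tag = ttagrels.id_tag
SET    s.key_1_total_matches = 1
WHERE  t.tag LIKE '%Dictatorship%';

UPDATE tmp_tutor_search AS s
       JOIN tutors_tag_relations AS ttagrels ON ttagrels.id_tutor = s.id_tutor
       JOIN tags AS t ON t.id_tag = ttagrels.id_tag
SET    s.key_2_total_matches = 1
WHERE  t.tag LIKE '%democracy%';

-- Step 2b: flag keyword 1 from tags on published packs.
-- Repeat for '%democracy%' and for webclasses_tag_relations in the same way.
UPDATE tmp_tutor_search AS s
       JOIN learning_packs_tag_relations AS lptagrels ON lptagrels.id_tutor = s.id_tutor
       JOIN learning_packs AS lp ON lp.id_lp = lptagrels.id_lp
       JOIN tags AS tt ON tt.id_tag = lptagrels.id_tag
SET    s.key_1_total_matches = 1
WHERE  lp.id_status = 1
  AND  lp.published = 1
  AND  tt.tag LIKE '%Dictatorship%';

-- Step 2c: popularity = number of distinct paid order lines per tutor.
UPDATE tmp_tutor_search AS s
       JOIN (SELECT od.id_author,
                    COUNT(DISTINCT od.id_od) AS paid_lines
             FROM   order_details AS od
                    JOIN orders AS o ON o.id_order = od.id_order
             WHERE  o.order_status = 'paid'
               AND  (COALESCE(od.id_wc, 0) <= 0 OR od.can_attend_class = 1)
             GROUP  BY od.id_author) AS p ON p.id_author = s.id_tutor
SET    s.tutor_popularity = p.paid_lines;

-- Step 3: read the final page of results from the small temporary table.
SELECT td.*, u.*, s.tutor_popularity
FROM   tmp_tutor_search AS s
       JOIN tutor_details AS td ON td.id_tutor = s.id_tutor
       JOIN users AS u ON u.id_user = td.id_user
WHERE  s.key_1_total_matches = 1
  AND  s.key_2_total_matches = 1
ORDER  BY s.tutor_popularity DESC, u.surname ASC, u.name ASC
LIMIT  0, 20;
```

Each UPDATE joins only one branch of the original query, so the row multiplication caused by joining packs, classes, orders and tags all at once is avoided. The keyword flags are set in separate statements because a multi-table UPDATE in MySQL changes each target row only once per statement.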
I have never worked with temporary tables until now. I would appreciate some hints or a schematic outline to get started.
Is there something faulty with the query?
What could be the reason behind the 10+ second execution time?
How tags work in this system?
- When a tutor registers, tags are entered and tag relations are created from the tutor's details, such as name and surname.
- When a tutor creates a pack, tags are again entered and tag relations are created from the pack's details, such as pack name and description.
- Tag relations for tutors are stored in tutors_tag_relations and those for packs in learning_packs_tag_relations. All individual tags are stored in the tags table.
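As an illustration of the relations described above, a lookup of tutors by one exact tag might look like the following. The column names are taken from the query earlier in the question; the literal tag value is just an example.

```sql
-- Tutors directly tagged 'democracy' (via tutors_tag_relations).
SELECT ttagrels.id_tutor
FROM   tags AS t
       JOIN tutors_tag_relations AS ttagrels ON ttagrels.id_tag = t.id_tag
WHERE  t.tag = 'democracy'
UNION
-- Tutors who own a pack tagged 'democracy' (via learning_packs_tag_relations).
SELECT lptagrels.id_tutor
FROM   tags AS t
       JOIN learning_packs_tag_relations AS lptagrels ON lptagrels.id_tag = t.id_tag
WHERE  t.tag = 'democracy';
```

UNION (rather than UNION ALL) removes tutors that match through both routes.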
Comments (1)
Temporary tables are not a silver bullet. The fundamental problem with your query lies in patterns such as t.tag LIKE "%Dictatorship%".
Wildcarding the left side of a LIKE comparison guarantees that an index cannot be used for the lookup. Effectively, you're table scanning all three tables involved...
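One way to see this for yourself is to compare the plans MySQL reports for a leading-wildcard pattern and a prefix pattern. The sketch below assumes an ordinary index on tags.tag; the index name idx_tags_tag is hypothetical.

```sql
-- Assumed index (skip this if tags.tag is already indexed):
-- CREATE INDEX idx_tags_tag ON tags (tag);

-- Leading wildcard: the optimizer cannot seek into the index, so the plan
-- typically shows a scan of the whole table or the whole index.
EXPLAIN SELECT id_tag FROM tags WHERE tag LIKE '%democracy%';

-- Prefix match: the index can typically be used for a range scan.
EXPLAIN SELECT id_tag FROM tags WHERE tag LIKE 'democracy%';
```

In the original query the same kind of scan happens for each of the three tags aliases (t, tt and ttt), on top of the row multiplication from the joins.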
You need to leverage full-text search (FTS), either MySQL's native FTS or a third-party engine such as Sphinx. Every FTS implementation I know of includes a score/rank value indicating the strength of the match; you can read the MySQL documentation for the algorithm details. But the score/rank is not the same as what you get from SUM(DISTINCT ... LIKE ...). You could get the same result using something like:
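One possible way to get the same 0/1 flag per keyword with MySQL's native full-text search is sketched here; it is not necessarily the answerer's exact code. It assumes a FULLTEXT index can be created on tags.tag (MyISAM, or InnoDB from MySQL 5.6 onward), uses a hypothetical index name, and covers only the tutor tags. Note that full-text search matches whole words, so it will not find a keyword embedded inside a longer word the way "%keyword%" does.

```sql
-- Assumption: the storage engine of tags supports FULLTEXT indexes.
ALTER TABLE tags ADD FULLTEXT INDEX ft_tags_tag (tag);

SELECT ttagrels.id_tutor,
       -- MAX() over a 0/1 comparison gives the same flag as SUM(DISTINCT ...).
       MAX(MATCH(t.tag) AGAINST ('Dictatorship' IN BOOLEAN MODE) > 0) AS key_1_total_matches,
       MAX(MATCH(t.tag) AGAINST ('democracy' IN BOOLEAN MODE) > 0)    AS key_2_total_matches
FROM   tutors_tag_relations AS ttagrels
       JOIN tags AS t ON t.id_tag = ttagrels.id_tag
-- Without operators, BOOLEAN MODE matches rows containing either word.
WHERE  MATCH(t.tag) AGAINST ('Dictatorship democracy' IN BOOLEAN MODE)
GROUP  BY ttagrels.id_tutor
HAVING key_1_total_matches = 1
       AND key_2_total_matches = 1;
```

The flags are computed per tutor across all of that tutor's tag rows, so a tutor qualifies when one tag matches the first keyword and another (or the same) tag matches the second.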