MYSQL 三重联接性能帮助,复制到 Tmp 表
我正在对一个新闻网站进行查询,该网站将找到要在主页上显示的FeaturedContent。以这种方式标记的内容被标记为“FeaturedContent”,并按“homepage”在特色表中排序。我目前已获得所需的输出,但查询运行时间超过 3 秒,我需要缩短该时间。如何优化像下面这样的查询?
编辑:按照建议每分钟具体化一次视图,缩短至 0.4 秒:
SELECT f.position, s.item_id, s.item_type, s.title, s.caption, s.date
FROM live.search_all s
INNER JOIN live.tags t
ON s.item_id = t.item_id AND s.item_type = t.item_type AND t.tag = 'FeaturedContent'
LEFT OUTER JOIN live.featured f
ON s.item_id = f.item_id AND s.item_type = f.item_type AND f.feature_type = 'homepage'
ORDER BY position IS NULL, position ASC, date
这将按顺序返回所有主页功能,然后是按日期排序的其他特色内容。
解释看起来像这样:
|-id---|-select_type-|-table-|-type---|-possible_keys---------|-key--------|-key_len-|-ref---------------------------------------|-rows--|-Extra-------------------------------------------------------------|
|-1----|-SIMPLE------|-t2----|-ref----|-PRIMARY,tag_index-----|-tag_index--|-303-----|-const-------------------------------------|-2-----|-Using where; Using index; Using temporary; Using filesort;--------|
|-1----|-SIMPLE------|-t-----|-ref----|-PRIMARY---------------|-PRIMARY----|-4-------|-newswires.t2.id---------------------------|-1974--|-Using index-------------------------------------------------------|
|-1----|-SIMPLE------|-s-----|-eq_ref-|-PRIMARY, search_index-|-PRIMARY----|-124-----|-newswires.t.item_id,newswires.t.item_type-|-1-----|-------------------------------------------------------------------|
|-1----|-SIMPLE------|-f-----|-index--|-NULL------------------|-PRIMARY----|-190-----|-NULL--------------------------------------|-13----|-Using index-------------------------------------------------------|
配置文件如下:
|-Status---------------|-Time-----|
|-starting-------------|-0.000091-|
|-Opening tables-------|-0.000756-|
|-System lock----------|-0.000005-|
|-Table lock-----------|-0.000008-|
|-init-----------------|-0.000004-|
|-checking permissions-|-0.000001-|
|-checking permissions-|-0.000001-|
|-checking permissions-|-0.000043-|
|-optimizing-----------|-0.000019-|
|-statistics-----------|-0.000127-|
|-preparing------------|-0.000023-|
|-Creating tmp table---|-0.001802-|
|-executing------------|-0.000001-|
|-Copying to tmp table-|-0.311445-|
|-Sorting result-------|-0.014819-|
|-Sending data---------|-0.000227-|
|-end------------------|-0.000002-|
|-removing tmp table---|-0.002010-|
|-end------------------|-0.000005-|
|-query end------------|-0.000001-|
|-freeing items--------|-0.000296-|
|-logging slow query---|-0.000001-|
|-cleaning up----------|-0.000007-|
我是阅读解释输出的新手,所以我不确定是否有更好的可用排序,或者可以采取任何相当简单的方法来加快速度。
search_all表是物化视图表,定期更新,而tags和Featured表是视图。这些视图不是可选的,也不能解决。
标签视图结合了标签和关系表,根据 item_type 和 item_id 返回标签列表,但其他视图都是一个表的简单视图。
编辑:对于物化视图,最大的瓶颈似乎是“复制到临时表”步骤。如果不排序输出,则需要 0.0025 秒(好多了!),但最终输出确实需要排序。有什么方法可以提高该步骤的性能或解决该问题吗?
抱歉,如果格式难以阅读,我是新手,不确定它是如何定期完成的。
感谢您的帮助!如果还需要什么,请告诉我!
编辑:桌子尺寸,供参考:
标签关系:197,411
标签:16,897
故事数:51,801
图片:28,383
视频:2,408
精选:13
I'm working on a query for a news site, which will find FeaturedContent for display on the main homepage. Content marked this way is tagged as 'FeaturedContent', and ordered in a featured table by 'homepage'. I currently have the desired output, but the query runs in over 3 seconds, which I need to cut down on. How does one optimize a query like the one which follows?
EDIT: Materialized the view every minute as suggested, down to .4 seconds:
SELECT f.position, s.item_id, s.item_type, s.title, s.caption, s.date
FROM live.search_all s
INNER JOIN live.tags t
ON s.item_id = t.item_id AND s.item_type = t.item_type AND t.tag = 'FeaturedContent'
LEFT OUTER JOIN live.featured f
ON s.item_id = f.item_id AND s.item_type = f.item_type AND f.feature_type = 'homepage'
ORDER BY position IS NULL, position ASC, date
This returns all the homepage features in order, followed by other featured content ordered by date.
The explain looks like this:
|-id---|-select_type-|-table-|-type---|-possible_keys---------|-key--------|-key_len-|-ref---------------------------------------|-rows--|-Extra-------------------------------------------------------------|
|-1----|-SIMPLE------|-t2----|-ref----|-PRIMARY,tag_index-----|-tag_index--|-303-----|-const-------------------------------------|-2-----|-Using where; Using index; Using temporary; Using filesort;--------|
|-1----|-SIMPLE------|-t-----|-ref----|-PRIMARY---------------|-PRIMARY----|-4-------|-newswires.t2.id---------------------------|-1974--|-Using index-------------------------------------------------------|
|-1----|-SIMPLE------|-s-----|-eq_ref-|-PRIMARY, search_index-|-PRIMARY----|-124-----|-newswires.t.item_id,newswires.t.item_type-|-1-----|-------------------------------------------------------------------|
|-1----|-SIMPLE------|-f-----|-index--|-NULL------------------|-PRIMARY----|-190-----|-NULL--------------------------------------|-13----|-Using index-------------------------------------------------------|
And the Profile is as follows:
|-Status---------------|-Time-----|
|-starting-------------|-0.000091-|
|-Opening tables-------|-0.000756-|
|-System lock----------|-0.000005-|
|-Table lock-----------|-0.000008-|
|-init-----------------|-0.000004-|
|-checking permissions-|-0.000001-|
|-checking permissions-|-0.000001-|
|-checking permissions-|-0.000043-|
|-optimizing-----------|-0.000019-|
|-statistics-----------|-0.000127-|
|-preparing------------|-0.000023-|
|-Creating tmp table---|-0.001802-|
|-executing------------|-0.000001-|
|-Copying to tmp table-|-0.311445-|
|-Sorting result-------|-0.014819-|
|-Sending data---------|-0.000227-|
|-end------------------|-0.000002-|
|-removing tmp table---|-0.002010-|
|-end------------------|-0.000005-|
|-query end------------|-0.000001-|
|-freeing items--------|-0.000296-|
|-logging slow query---|-0.000001-|
|-cleaning up----------|-0.000007-|
I'm new to reading the EXPLAIN output, so I'm unsure if I have a better ordering available, or anything rather simple that could be done to speed these along.
The search_all table is the materialized view table which is periodically updated, while the tags and featured tables are views. These views are not optional, and cannot be worked around.
The tags view combines tags and a relational table to get back a listing of tags according to item_type and item_id, but the other views are all simple views of one table.
EDIT: With the materialized view, the biggest bottleneck seems to be the 'Copying to temp table' step. Without ordering the output, it takes .0025 seconds (much better!) but the final output does need ordered. Is there any way to enhance the performance of that step, or work around it?
Sorry if the formatting is difficult to read, I'm new and unsure how it is regularly done.
Thanks for your help! If anything else is needed, please let me know!
EDIT: Table sizes, for reference:
Tag Relations: 197,411
Tags: 16,897
Stories: 51,801
Images: 28,383
Videos: 2,408
Featured: 13
我认为单独优化您的查询不会很有用。首先想到的是,连接子查询本身就是由 UNION 组成的,这本身就是性能的双重瓶颈。
如果您可以选择更改数据库结构,那么我建议将 3 个表
stories
、images
和videos
合并为一个表,如果它们看起来非常相似(为它们添加一个类型 ENUM('story', 'image', 'video')
)以区分记录;这将删除子查询和联合。此外,您对
故事
和视频
的看法似乎没有使用索引字段来过滤内容。您正在查询索引列吗?如果不知道完整的表结构和数据的重新分区,这是一个非常棘手的问题!
另一种选择不涉及对现有数据库进行修改(特别是如果它已经在生产中),可以将此信息“缓存”到另一个表中,该表将由 cron 作业定期刷新。
缓存可以在不同级别上完成,可以在完整查询上进行,也可以在其子部分上进行(独立视图,或将 3 个联合合并到单个缓存表中,等等)。
此选项的可行性取决于它是否可以接受显示稍微过时的数据,或不显示。对于数据的某些部分来说,这可能是可以接受的,这可能意味着您将仅缓存查询中涉及的表/视图的子集。
I think optimizing your query alone won't be very useful. First thoughts are that joining a subquery, itself made of UNIONs, is alone a double bottleneck for performance.
If you have the option to change your database structure, then I would suggest to merge the 3 tables
stories
,images
andvideos
into one, if they are, as it looks like, very similar (adding them atype ENUM('story', 'image', 'video')
) to differentiate the records; this would remove both the subquery and the union.Also, it looks like your views on
stories
andvideos
, are not using an indexed field to filter content. Are you querying an indexed column?It's a pretty tricky problem without knowing your full table structure and the repartition of your data!
Another option, which would not involve bringing modifications to your existing database (especially if it is already in production), would be to "cache" this information into another table, which would be periodically refreshed by a cron job.
The caching can be done at different levels, either on the full query, or on subparts of it (independent views, or the 3 unions merged into a single cache table, etc.)
The viability of this option depends on whether it is acceptable to display slightly outdated data, or not. It might be acceptable for just some parts of your data, which may imply that you will cache just a subset of the tables/views involved in the query.