Converting a UNION query in MySQL
I have a very large table (8 GB) with information about files, and I need to run a report against it that looks something like this:
(select * from fs_walk_scan where file_path like '\\\\server1\\groot$\\%' order by file_size desc limit 0,30)
UNION ALL
(select * from fs_walk_scan where file_path like '\\\\server1\\hroot$\\%' order by file_size desc limit 0,30)
UNION ALL
(select * from fs_walk_scan where file_path like '\\\\server1\\iroot$\\%' order by file_size desc limit 0,30)
UNION ALL
(select * from fs_walk_scan where file_path like '\\\\server2\\froot$\\%' order by file_size desc limit 0,30)
UNION ALL
(select * from fs_walk_scan where file_path like '\\\\server2\\groot$\\%' order by file_size desc limit 0,30)
UNION ALL
(select * from fs_walk_scan where file_path like '\\\\server3\\hroot$\\%' order by file_size desc limit 0,30)
UNION ALL
(select * from fs_walk_scan where file_path like '\\\\server4\\iroot$\\%' order by file_size desc limit 0,30)
UNION ALL
(select * from fs_walk_scan where file_path like '\\\\server5\\iroot$\\%' order by file_size desc limit 0,30)
[...]
order by substring_index(file_path,'\\',4), file_size desc
This method accomplishes what I need: a list of the 30 biggest files for each volume. However, it is deathly slow, and the 'like' patterns are hardcoded even though they live in another table and could be read from there.
What I'm looking for is a way to do this without going through the huge table several times. Anyone have any ideas?
Thanks.
P.S. I can't change the structure of the huge source table in any way.
Update: There are indexes on file_path and file_size, but each of those sub(?)queries still takes about 10 minutes, and I have to run at least 22 of them.
You could use a regexp:
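The answer's code was not preserved; a minimal sketch of the REGEXP idea might look like the following (server and share names are illustrative). Note the heavy escaping: each literal backslash is `\\` in the regex, and each of those is doubled again inside the string literal.

```sql
-- Sketch only: one REGEXP predicate replaces the many hardcoded LIKEs.
-- Caveat: MySQL cannot use an index to satisfy REGEXP, so this still
-- scans the table, and it does not by itself limit to 30 rows per volume.
SELECT *
FROM fs_walk_scan
WHERE file_path REGEXP '^\\\\\\\\server[0-9]+\\\\[fghi]root\\$\\\\'
ORDER BY file_size DESC;
```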
Otherwise if you can modify your table structure, add two columns to hold the server name and base path (and index them), so that you can create a simpler query:
You can either set up a trigger to initialise the fields when you insert the record, or else do a bulk update afterwards to fill in the two extra columns.
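The asker later says the table cannot be altered, so this only applies if that constraint is relaxed; column and index names here are illustrative. A sketch of the two-column approach with a one-off bulk update:

```sql
-- Add derived columns and an index covering them plus file_size.
ALTER TABLE fs_walk_scan
    ADD COLUMN server_name VARCHAR(64),
    ADD COLUMN base_path   VARCHAR(255),
    ADD INDEX idx_server_base_size (server_name, base_path, file_size);

-- Bulk-populate them from file_path ('\\' in a string literal is one
-- backslash, matching the separators in paths like \\server1\groot$\...).
UPDATE fs_walk_scan
SET base_path   = SUBSTRING_INDEX(file_path, '\\', 4),
    server_name = SUBSTRING_INDEX(SUBSTRING_INDEX(file_path, '\\', 3), '\\', -1);

-- Each per-volume query then becomes a simple indexed lookup:
SELECT * FROM fs_walk_scan
WHERE server_name = 'server1' AND base_path = '\\\\server1\\groot$'
ORDER BY file_size DESC
LIMIT 30;
```

A `BEFORE INSERT` trigger computing the same two expressions would keep the columns current for new rows.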
What kind of indexes do you have on that table? This index:
CREATE INDEX fs_search_idx ON fs_walk_scan(file_path, file_size desc)
would speed this query up significantly... if you don't already have one like it.
Update:
You said there are already indexes on file_path and file_size... are they individual indexes? Or is there one single index with both columns indexed together? The difference would be huge for this query. Even with 22 subqueries, if indexed right, this should be blazing fast.
You can use grouping and a self-join to achieve this.
It is still an O(n) query (n being the number of groups), but it is more flexible and shorter.
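The answer's code was lost; a sketch of the self-join idea, assuming file_path is unique per row: for each row, count the rows in the same volume (first four path components) with an equal or larger file_size, and keep rows ranked 30 or better.

```sql
-- Sketch: rank each file within its volume by counting its peers with
-- size >= its own, then keep the top 30 per volume.
SELECT a.*
FROM fs_walk_scan a
JOIN fs_walk_scan b
  ON  SUBSTRING_INDEX(b.file_path, '\\', 4) = SUBSTRING_INDEX(a.file_path, '\\', 4)
  AND b.file_size >= a.file_size
GROUP BY a.file_path
HAVING COUNT(*) <= 30
ORDER BY SUBSTRING_INDEX(a.file_path, '\\', 4), a.file_size DESC;
```

Within each group the join is quadratic in the group size, so this trades the 22 hardcoded scans for one larger join.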
Edit:
Another approach is using variables. Feasibility for your purpose will depend on how you are going to run this query.
I haven't tried this code yet, but the concept of variables can be used like this for your purpose.
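In that spirit, a sketch using MySQL user variables (the classic pre-8.0 ranking trick): number the rows within each volume as they stream past in sorted order, then keep rank <= 30. Left-to-right evaluation of the variable assignments is not guaranteed by the SQL standard, which is why the answerer hedges; on MySQL 8+ a ROW_NUMBER() window function is the reliable replacement.

```sql
-- Sketch: @vol tracks the current volume prefix, @rank restarts at 1
-- whenever the prefix changes. Relies on the subquery's ORDER BY.
SELECT file_path, file_size
FROM (
    SELECT file_path,
           file_size,
           @rank := IF(@vol = SUBSTRING_INDEX(file_path, '\\', 4), @rank + 1, 1) AS rnk,
           @vol  := SUBSTRING_INDEX(file_path, '\\', 4) AS vol
    FROM fs_walk_scan,
         (SELECT @vol := '', @rank := 0) init
    ORDER BY SUBSTRING_INDEX(file_path, '\\', 4), file_size DESC
) ranked
WHERE rnk <= 30;
```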
How about something like this (haven't tested it, but looks close):
This way you're doing a pair of comparisons on the individual field, which will match what you've described. It may be possible to use a regex too, but I've not done that.
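The answer's code did not survive; a sketch of the pair-of-comparisons idea is to replace each `LIKE 'prefix%'` with an explicit range predicate that an index on file_path can satisfy as a range scan (MySQL can often do this for a constant-prefix LIKE anyway, but the explicit form makes the range unmistakable):

```sql
-- Sketch for one volume: ']' sorts just after '\' in ASCII, so the two
-- comparisons bound exactly the paths under \\server1\groot$\ .
-- (The upper-bound trick assumes an ASCII-compatible collation.)
SELECT *
FROM fs_walk_scan
WHERE file_path >= '\\\\server1\\groot$\\'
  AND file_path <  '\\\\server1\\groot$]'
ORDER BY file_size DESC
LIMIT 30;
```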
Try this.
You want to get every record where there are fewer than 30 records with greater file size and the same file path.
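The code was not preserved; the described query would be a correlated subquery along these lines (taking "the same file path" to mean the same volume prefix, as in the question's final ORDER BY):

```sql
-- Sketch: keep each row for which fewer than 30 rows in the same volume
-- have a greater file_size. The subquery runs once per outer row.
SELECT *
FROM fs_walk_scan t
WHERE 30 > (
    SELECT COUNT(*)
    FROM fs_walk_scan x
    WHERE SUBSTRING_INDEX(x.file_path, '\\', 4) = SUBSTRING_INDEX(t.file_path, '\\', 4)
      AND x.file_size > t.file_size
)
ORDER BY SUBSTRING_INDEX(file_path, '\\', 4), file_size DESC;
```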
Edit:
Apparently this performs like a dog. So... How about this looping syntax?
The idea here is to
1. get a list of the file paths
2. loop, doing a query for each path which will get the 30 largest file sizes.
(I did look up the syntax, but I'm not very hot on MySQL, so apologies if it's not quite right. Feel free to edit/comment)
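The looping code itself was lost; the two steps above might be sketched as a stored procedure that cursors over the distinct volume prefixes and collects each top 30 into a temporary table (procedure and table names are illustrative):

```sql
DELIMITER //
CREATE PROCEDURE get_top_files()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE v_prefix VARCHAR(255);
    -- Step 1: the list of file paths (volume prefixes).
    DECLARE cur CURSOR FOR
        SELECT DISTINCT SUBSTRING_INDEX(file_path, '\\', 4) FROM fs_walk_scan;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    CREATE TEMPORARY TABLE IF NOT EXISTS top_files LIKE fs_walk_scan;
    OPEN cur;
    read_loop: LOOP
        FETCH cur INTO v_prefix;
        IF done THEN LEAVE read_loop; END IF;
        -- Step 2: one cheap indexed query per prefix. Backslashes must be
        -- doubled for LIKE, since '\' is LIKE's default escape character.
        INSERT INTO top_files
            SELECT * FROM fs_walk_scan
            WHERE file_path LIKE CONCAT(REPLACE(v_prefix, '\\', '\\\\'), '\\\\%')
            ORDER BY file_size DESC
            LIMIT 30;
    END LOOP;
    CLOSE cur;
    SELECT * FROM top_files
    ORDER BY SUBSTRING_INDEX(file_path, '\\', 4), file_size DESC;
END //
DELIMITER ;
```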
You could do something like this... assuming fs_list has a list of your "LIKE" searches:
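The example was not preserved; assuming a table like fs_list(path_pattern) holding the LIKE patterns (the column name is illustrative), the join might be sketched as:

```sql
-- Sketch: drive the search from the pattern table instead of hardcoding
-- 22 LIKEs. This removes the hardcoding in one pass, though the
-- top-30-per-volume cut still has to be applied on top, e.g. with a
-- per-group counter.
SELECT s.*
FROM fs_list l
JOIN fs_walk_scan s
  ON s.file_path LIKE l.path_pattern
ORDER BY SUBSTRING_INDEX(s.file_path, '\\', 4), s.file_size DESC;
```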