Windows Desktop Search - SQL unbelievably slow with '%search%'
I am trying to query the Windows Desktop Search API using SQL.
I have to say I really HATE the Windows 7 search GUI, so I decided to write my own. I have a lot of files indexed (approx. 1,000,000), and I want to search by name. Something like: show me every name which contains "bunny".
But here I run into a performance problem. Searching for
SELECT "System.ItemPathDisplay"
FROM "SystemIndex"
WHERE System.FileName LIKE 'egon%'
is really fast. So is the '%egon' alternative. But '%egon%' takes forever. I am not sure if this is in the nature of the index (I understand that the number of possibilities increases enormously) or if I am doing something wrong.
The questions are:
- Is it correct that the Windows index is just one big SQL database?
- If so, where can I find exact information about the structure of the DB (primary keys, indexes)?
If I had that, it would basically just be a matter of optimizing SQL.
Alternative question: Does anybody know a fast SQL statement to find all files with egon somewhere in the name?
Edit: Why I do not like the search GUI
Well, it's just not intuitive compared to XP. If you disable the dog and use the old XP interface, you can create a search query like:
- All files older than 1 month
- bigger than 10 MB
- name pattern
*_homework_*.docx
Try this in Windows 7 without "learning" the syntax. And hell, I do not want to learn another syntax just to find one file.
The other main problem is maybe my search habits. Most of the time I somehow know the file name (or parts of it) and simply want the location. If you use the search this way, you run into several problems:
- First of all, you always have to prefix it with name:
- Then the folder name layout is stupid (it orders by parent folder, not full path, I think, because.. tada... see next point)
- Then, even more annoying, if you have a list of results and you try to sort them, it takes forever
And now I really think my system has a bug. I tried to quickly check it: I searched in some average-size folder for "test" and it found some files. Then I tried to sort them by folder (to verify my second point) and now it is just searching forever... I mean really, while I am typing it tries to find the word "hello"... oh, finished - it found approx. 20 files. So, now, let's try something.... OK, now it seems like it has recovered.. But still, too slow for my taste...
So, enough cursing about search :-)
Comments (3)
It looks like they're building an index on the name, so it can use the index as long as you've specified the beginning of the string, but if you haven't, it has to use a table scan.
Assuming they're using Microsoft's full-text search engine, then try using something like:
... WHERE system.filename CONTAINS 'egon'
There are basically two possible outcomes: it'll be rejected as invalid (i.e. this SQL interface doesn't support their full-text search extension) or else it'll be quite a bit faster.
EDIT: Oops -- the syntax should be "contains(system.filename, 'egon')". Sorry 'bout that.
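To compare the variants side by side, a small helper can generate the query strings. This is only a sketch: the function name build_filename_query and its mode values are made up for illustration, and whether the CONTAINS form is accepted by the search provider is exactly what has to be tested. Also keep in mind that full-text predicates typically match whole words rather than arbitrary substrings, so the CONTAINS result set may differ from what LIKE '%egon%' returns.

```python
def build_filename_query(term, mode):
    """Build a query string for one of the predicates discussed in this thread.

    Hypothetical helper: 'mode' is "prefix", "substring" or "contains".
    """
    # Refuse quotes and wildcards so the term cannot break out of the literal.
    if not term or any(ch in term for ch in "'\"%*"):
        raise ValueError("term must be non-empty and free of quotes and wildcards")

    select = 'SELECT "System.ItemPathDisplay" FROM "SystemIndex" WHERE '
    if mode == "prefix":
        return select + f"System.FileName LIKE '{term}%'"
    if mode == "substring":
        return select + f"System.FileName LIKE '%{term}%'"
    if mode == "contains":
        # Corrected syntax from the EDIT above.
        return select + f"CONTAINS(System.FileName, '{term}')"
    raise ValueError(f"unknown mode: {mode}")


contains_query = build_filename_query("egon", "contains")
substring_query = build_filename_query("egon", "substring")
```

Running both generated statements against the same index and timing them would show whether the full-text route is actually available and faster on a given machine.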
Maybe try
This is slow because you are unable to use an index. The reason is that you are searching for a match anywhere in the string rather than at the start of the string, which means you must scan the entire table for the contents.
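The difference can be illustrated with a toy model in Python. This is not how the Windows index is actually implemented (that is unknown here); a sorted list simply stands in for an ordered index, and the function names are invented for the example. A prefix lookup can binary-search to the first candidate and stop at the first non-match, while a substring lookup has to examine every entry.

```python
from bisect import bisect_left


def build_index(names):
    """Lower-case and sort the names; the sorted list stands in for an ordered index."""
    return sorted(name.lower() for name in names)


def prefix_search(index, prefix):
    """Like LIKE 'prefix%': binary-search to the start, then read a contiguous run."""
    prefix = prefix.lower()
    i = bisect_left(index, prefix)
    matches = []
    # All entries sharing the prefix are adjacent in sorted order,
    # so the loop can stop at the first entry that does not match.
    while i < len(index) and index[i].startswith(prefix):
        matches.append(index[i])
        i += 1
    return matches


def substring_search(index, needle):
    """Like LIKE '%needle%': a match can sit anywhere, so every entry is checked."""
    needle = needle.lower()
    return [name for name in index if needle in name]


names = ["egon.txt", "Egon_report.docx", "bunny.png", "my_egon_notes.txt", "zebra.jpg"]
index = build_index(names)
prefix_hits = prefix_search(index, "egon")
substring_hits = substring_search(index, "egon")
```

With a million entries the prefix lookup touches roughly twenty entries to find its starting point plus the matches themselves, whereas the substring lookup always touches all million.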