T-SQL 索引服务 SQL openquery 优化
场景:
我正在使用 T-SQL 存储过程 (Sql Server Management Studio) 返回使用 MS 索引服务的文本文档的搜索匹配项,并且此(简化的)查询:
SELECT *
FROM openquery(
filesystem2,
'SELECT
Path, Rank, Filename
FROM
SCOPE('' "e:\test\documents" '')
WHERE
CONTAINS('' FORMSOF (INFLECTIONAL, "test" ) '')
') b
此查询在几个小时内无法正常工作几天前。 虽然没有完全证实,但属性缓存和主索引之间的交互似乎无法正常工作,因为我可以通过以下任一方式找到所需的文档:
1)删除 SCOPE 参数(即仅使用“FROM SCOPE()”作为FROM 子句
2) 删除 WHERE 子句(并保持 SCOPE 函数不变)
因此,我可以仅通过内容或仅通过语言环境“查找”所需的文档,但不能同时使用两者。
一种选择是重新索引目录,但目前重新索引只是最后的选择。
话虽这么说,我重写了查询以排除指定的 SCOPE 并包含一个附加的 WHERE 子句:
SELECT *
FROM openquery(
filesystem2,
'SELECT
Path, Rank, Filename
FROM
SCOPE()
WHERE
CONTAINS('' FORMSOF (INFLECTIONAL, "test" ) '') and
Path like ''%e:\test\documents%''
') b
此查询在搜索时返回正确的文档。 但是,我担心使用 LIKE 关键字可能会影响性能。 因此,我研究了每个查询的执行计划,但它们完全相同......这告诉我两件事之一:
1)索引服务的查询组件以使它们相等的方式优化两个查询。
2)当没有引用数据库表时,查询分析器无法为远程查询提供准确的反馈。
问题(排名不分先后)。 有人对以下内容有任何见解吗?:
1)什么可能导致上述场景中描述的属性缓存和主索引之间的原始问题的行为?
2) 关于执行计划,
a) Would the Querying Component process/optimize both queries the same?
b) Can Sql Server Management Studio provide execution plan feedback for openquery queries that do not reference any DB tables?
3) 最后,哪个查询更高效/更快,为什么?
a) i.e. should I use the second one because it solves my problem?
谢谢你!
Scenario:
I am using a T-SQL stored proc (Sql Server Management Studio) to return search matches for text documents using the MS Indexing Service and this (simplified) query:
SELECT *
FROM openquery(
filesystem2,
'SELECT
Path, Rank, Filename
FROM
SCOPE('' "e:\test\documents" '')
WHERE
CONTAINS('' FORMSOF (INFLECTIONAL, "test" ) '')
') b
This query stopped working properly a few days ago. Though not fully substantiated, it seems that the interaction between the Property Cache and the Master Index is not working properly because I can find the desired documents by either,
1) removing the SCOPE parameter (i.e. just using "FROM SCOPE()" as the FROM clause
2) removing the WHERE clause (and keeping the SCOPE function as is)
So, I can "find" the desired documents by just content or by just locale, but not by using both together.
One option would be to reindex the catalog, but reindexing is, for now, only an option of last resort.
That being said, I rewrote the query to exclude the specified SCOPE and include an additional WHERE clause:
SELECT *
FROM openquery(
filesystem2,
'SELECT
Path, Rank, Filename
FROM
SCOPE()
WHERE
CONTAINS('' FORMSOF (INFLECTIONAL, "test" ) '') and
Path like ''%e:\test\documents%''
') b
This query returns the proper documents when searching. However, I was/am concerned about a potential performance hit using the LIKE keyword. So, I investigated each query's execution plan, but they were exactly the same...which tells me 1 of two things:
1) the Querying Component of the Indexing Service optimizes both queries in such a way as to make them equal.
2) The query analyzer does not provide accurate feedback for remote queries when no DB tables are referenced.
Questions (in no particular order). Does anyone have any insight into the following?:
1) What could cause the behavior of the original problem between the Property Cache and the Master Index described in the scenario above?
2) Regarding the Execution plan,
a) Would the Querying Component process/optimize both queries the same?
b) Can Sql Server Management Studio provide execution plan feedback for openquery queries that do not reference any DB tables?
3) Finally, Which query is more efficient/faster, and why?
a) i.e. should I use the second one because it solves my problem?
Thank you!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
null 值可能是一个问题。 我不确定这个确切的情况,但有时包含“
where xxx is not null
”可能会产生真正的影响。有时另一种选择是在打开查询之后在表上放置 where 条件
从 openquery(.....) 中选择 aaa, bbb,其中 aaa = zzz
。 (为了更好的风格,选择你需要的列而不是*。至于哪个更高效或更快,你可能必须用一个简单的计时过程包装查询,并自行判断是否不能使用 。
最后,只要您的查询有效并且不违反您为项目设置的任何标准,就可以使用它
null values might be a problem. I'm not sure about this exact case, but sometimes including "
where xxx is not null
" can make a real difference.Another option sometimes is to put where conditions on the table after the open query
select aaa, bbb from openquery(.....) where aaa = zzz
. (For better style, select the columns you need instead of *.As for which is more efficient or faster, you may have to wrap the query with a simple timing process and judge for yourself if you can't use the metrics provided by the SQL Management default messages.
In the end, as long as your query works and does not break any standards that you have set for your project, yes - use it.