FreeText 查询很慢 - 包括 TOP 和 Order By

发布于 2024-08-31 02:32:27 字数 762 浏览 11 评论 0原文

Product 表中有 700K 条记录。查询:

<代码> 选择前 1 个 ID, 姓名 来自产品 WHERE contains(Name, '“白色连衣裙”') ORDER BY DateMadeNew 描述

运行大约需要 1 分钟。 DateMadeNew 上有非聚集索引,Name 上有 FreeText 索引。

如果我删除 TOP 1 或 Order By - 运行时间不到 1 秒。

这是执行计划的链接。 http://screencast.com/t/ZDczMzg5N

看起来 FullTextMatch 的执行次数超过 40 万次。为什么会发生这种情况?怎样才能让它更快呢?

更新 5/3/2010

看来多字 FreeText 搜索的基数不正常:

优化器估计有 28K 条记录匹配“White Dress”,而实际上只有 1 条。 http://screencast.com/t/NjM3ZjE4NjAt

如果我将“白色连衣裙”替换为“白色” ,估计数量是“27,951”,而实际数量是“28,487”,这要好得多。

优化器似乎只使用搜索基数的短语中的第一个单词。

The Product table has 700K records in it. The query:


SELECT TOP 1 ID,
Name
FROM Product
WHERE contains(Name, '"White Dress"')
ORDER BY DateMadeNew desc

takes about 1 minute to run. There is an non-clustered index on DateMadeNew and FreeText index on Name.

If I remove TOP 1 or Order By - it takes less then 1 second to run.

Here is the link to execution plan.
http://screencast.com/t/ZDczMzg5N

Looks like FullTextMatch has over 400K executions. Why is this happening? How can it be made faster?

UPDATE 5/3/2010

Looks like cardinality is out of whack on multi word FreeText searches:

Optimizer estimates that there are 28K records matching 'White Dress', while in reality there is only 1.
http://screencast.com/t/NjM3ZjE4NjAt

If I replace 'White Dress' with 'White', estimated number is '27,951', while actual number is '28,487' which is a lot better.

It seems like Optimizer is using only the first word in phrase being searched for cardinality.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

合久必婚 2024-09-07 02:32:27

看起来 FullTextMatch 的执行次数超过 400K。为什么会发生这种情况?

由于您有一个与 TOP 1 组合的索引,优化器认为最好遍历索引,检查条目的每条记录。

怎样才能让它更快?

如果更新统计信息没有帮助,请尝试向您的查询添加提示:

SELECT  TOP 1 *
FROM    product pt
WHERE   CONTAINS(name, '"test1"')
ORDER BY
        datemadenew DESC
OPTION (HASH JOIN)

这将强制引擎使用 HASH JOIN 算法来连接您的表和全文查询的输出。

全文查询被视为返回由 FULLTEXT INDEX 定义中提供的 KEY INDEX 索引的值集的远程源。

更新

如果您的 ORM 使用参数化查询,您可以创建计划指南。

  • 使用 Profiler 拦截 ORM 发送的查询逐字
  • 使用提示在 SSMS 中生成正确的计划并将其保存为 XML code>
  • sp_create_plan_guideOPTION USE PLAN 结合使用,强制优化器始终使用此计划。

Looks like FullTextMatch has over 400K executions. Why is this happening?

Since you have an index combined with TOP 1, optimizer thinks that it will be better to traverse the index, checking each record for the entry.

How can it be made faster?

If updating the statistics does not help, try adding a hint to your query:

SELECT  TOP 1 *
FROM    product pt
WHERE   CONTAINS(name, '"test1"')
ORDER BY
        datemadenew DESC
OPTION (HASH JOIN)

This will force the engine to use a HASH JOIN algorithm to join your table and the output of the fulltext query.

Fulltext query is regarded as a remote source returning the set of values indexed by KEY INDEX provided in the FULLTEXT INDEX definition.

Update:

If your ORM uses parametrized queries, you can create a plan guide.

  • Use Profiler to intercept the query that the ORM sends verbatim
  • Generate a correct plan in SSMS using hints and save it as XML
  • Use sp_create_plan_guide with an OPTION USE PLAN to force the optimizer always use this plan.
泛滥成性 2024-09-07 02:32:27

编辑

来自http://technet.microsoft .com/en-us/library/cc721269.aspx#_Toc202506240

最重要的是
选择正确的连接类型
全文查询。基数
FulltextMatch STVF 的估计
对于正确的计划非常重要。
所以首先要检查的是
FulltextMatch 基数估计。
这是估计的点击数
在全文搜索的索引中
细绳。例如,在查询中
图 3 这应该接近
包含以下内容的文档数量
术语“词”。大多数情况下应该
非常准确,但如果估计
距离很远,你可以
制定糟糕的计划。估计为
单一条款通常非常好,
但估计多个项,例如
短语或 AND 查询更复杂
因为不可能知道什么
索引中术语的交集
将基于频率
索引中的术语。如果基数
估计是好的,计划是坏的
可能是由查询引起的
优化器成本模型。唯一的方法是
修复计划问题是使用查询
提示强制某种类型的连接
或优化。

因此,它根本无法从存储的信息中得知这两个搜索词是否可能是完全独立的,还是经常一起出现。也许您应该有 2 个单独的过程,一个用于单词查询,让优化器执行其操作,另一个用于多词过程,您强制执行“足够好”的计划(如果您不想要一个“足够好”的计划,sys.dm_fts_index_keywords 可能会有所帮助)一种尺寸适合所有计划)。

注意:您的单字程序可能需要使用“WITH RECOMPILE”选项来查看本文的这一部分。

在 SQL Server 2008 全文搜索中,我们能够更改根据所用搜索词的基数估计生成的计划。如果查询计划是固定的(因为它位于存储过程内的参数化查询中),则不会执行此步骤。因此,编译后的计划始终为该查询提供服务,即使该计划对于给定的搜索词并不理想。

原始答案

不过,您的新计划看起来仍然很糟糕。看起来它只从全文查询部分返回 1 行,但扫描了 Product 表中的所有 770159 行。

这表现如何?

CREATE TABLE #tempResults
(
ID int primary key,
Name varchar(200),
DateMadeNew datetime
)

INSERT INTO #tempResults
SELECT 
      ID, Name, DateMadeNew 
      FROM Product 
      WHERE contains(Name, '"White Dress"')


SELECT TOP 1
    *
    FROM #tempResults
    ORDER BY DateMadeNew desc

Edit

From http://technet.microsoft.com/en-us/library/cc721269.aspx#_Toc202506240

The most important thing is that the
correct join type is picked for
full-text query. Cardinality
estimation on the FulltextMatch STVF
is very important for the right plan.
So the first thing to check is the
FulltextMatch cardinality estimation.
This is the estimated number of hits
in the index for the full-text search
string. For example, in the query in
Figure 3 this should be close to the
number of documents containing the
term ‘word’. In most cases it should
be very accurate but if the estimate
was off by a long way, you could
generate bad plans. The estimation for
single terms is normally very good,
but estimating multiple terms such as
phrases or AND queries is more complex
since it is not possible to know what
the intersection of terms in the index
will be based on the frequency of the
terms in the index. If the cardinality
estimation is good, a bad plan
probably is caused by the query
optimizer cost model. The only way to
fix the plan issue is to use a query
hint to force a certain kind of join
or OPTIMIZE FOR.

So it simply cannot know from the information it stores whether the 2 search terms together are likely to be quite independent or commonly found together. Maybe you should have 2 separate procedures one for single word queries that you let the optimiser do its stuff on and one for multi word procedures that you force a "good enough" plan on (sys.dm_fts_index_keywords might help if you don't want a one size fits all plan).

NB: Your single word procedure would likely need the WITH RECOMPILE option looking at this bit of the article.

In SQL Server 2008 full-text search we have the ability to alter the plan that is generated based on a cardinality estimation of the search term used. If the query plan is fixed (as it is in a parameterized query inside a stored procedure), this step does not take place. Therefore, the compiled plan always serves this query, even if this plan is not ideal for a given search term.

Original Answer

Your new plan still looks pretty bad though. It looks like it is only returning 1 row from the full text query part but scanning all 770159 rows in the Product table.

How does this perform?

CREATE TABLE #tempResults
(
ID int primary key,
Name varchar(200),
DateMadeNew datetime
)

INSERT INTO #tempResults
SELECT 
      ID, Name, DateMadeNew 
      FROM Product 
      WHERE contains(Name, '"White Dress"')


SELECT TOP 1
    *
    FROM #tempResults
    ORDER BY DateMadeNew desc
浅暮の光 2024-09-07 02:32:27

我看不到链接的执行计划,网络警察正在阻止它,所以这只是一个猜测......

如果没有 TOPORDER BY 它运行得很快,尝试这样做:

SELECT TOP 1
    *
    FROM (SELECT 
              ID, Name, DateMadeNew 
              FROM Product 
              WHERE contains(Name, '"White Dress"')
         ) dt
    ORDER BY DateMadeNew desc

I can't see the linked execution plan, network police are blocking that, so this is just a guess...

if it is running fast without the TOP and ORDER BY, try doing this:

SELECT TOP 1
    *
    FROM (SELECT 
              ID, Name, DateMadeNew 
              FROM Product 
              WHERE contains(Name, '"White Dress"')
         ) dt
    ORDER BY DateMadeNew desc
堇色安年 2024-09-07 02:32:27

对此有几点想法:

1)您是否更新了“产品”表上的统计信息?查看操作的估计行数和实际行数也很有用。

2) 您使用的 SQL Server 版本是什么?我在使用 SQL Server 2008 时遇到了类似的问题,结果只是没有安装 Service Pack 1。安装 SP1 后,原本需要几分钟的 FreeText 查询(由于与实际情况相比有大量的实际执行)减少到只需要一秒钟。

A couple of thoughts on this one:

1) Have you updated the statistics on the Product table? It would be useful to see the estimates and actual number of rows on the operations there too.

2) What version of SQL Server are you using? I had a similar issue with SQL Server 2008 that turned out to be nothing more than not having Service Pack 1 installed. Install SP1 and a FreeText query that was taking a couple of minutes (due to a huge number of actual executions against actual) went down to taking a second.

别忘他 2024-09-07 02:32:27

我之前也遇到过同样的问题。

性能取决于您选择用于全文索引的唯一索引。
我的表格有两个独特的列 - IDarticle_number

查询:

select top 50 id, article_number, name, ... 
from ARTICLE 
CONTAINS(*,'"BLACK*" AND "WHITE*"')
ORDER BY ARTICLE_NUMBER

如果全文索引连接到ID,那么根据搜索的单词,它会很慢。
如果全文索引连接到 ARTICLE_NUMBER UNIQUE 索引,那么它总是很快。

I had the same problem earlier.

The performance depends on which unique index you choose for full text indexing.
My table has two unique columns - ID and article_number.

The query:

select top 50 id, article_number, name, ... 
from ARTICLE 
CONTAINS(*,'"BLACK*" AND "WHITE*"')
ORDER BY ARTICLE_NUMBER

If the full text index is connected to ID then it is slow depending on the searched words.
If the full text index is connected to ARTICLE_NUMBER UNIQUE index then it was always fast.

一萌ing 2024-09-07 02:32:27

我有更好的解决方案。

I. 让我们首先概述建议的解决方案,因为它们也可能在某些情况下使用:

  1. 选项(散列连接) - 不好,因为您可能会收到错误“由于提示,查询处理器无法生成查询计划重新提交查询,而不指定任何提示,也不使用 SET FORCEPLAN。”

  2. SELECT TOP 1 * FROM (ORIGINAL_SELECT) ORDER BY ... - 不好,当您需要使用 ORIGINAL_SELECT 的分页结果时

  3. sp_create_plan_guide - 不好好,至于使用 plan_guide 你必须保存特定 sql 语句的计划,这对于动态 sql 语句(例如由 ORM 生成)不起作用

II。我的解决方案包含两部分
1.用于全文搜索的自连接表
2. 使用 MS SQL HASH 连接提示 MSDN 连接提示

您的 SQL :

SELECT TOP 1 ID, Name FROM Product WHERE contains(Name, '"White Dress"') 
ORDER BY DateMadeNew desc

应重写为:

SELECT TOP 1 p.ID, p.Name FROM Product p INNER HASH JOIN Product fts ON fts.ID = p.ID
WHERE contains(fts.Name, '"White Dress"') 
ORDER BY p.DateMadeNew desc

如果您使用带/不带 Castle Active Records 的 NHibernate,我已在 post 如何编写拦截器来修改查询以用 INNER HASH JOIN 替换 INNER JOIN

I have better solution.

I. Let's first overview proposed solutions as they also may be used in some cases:

  1. OPTION (HASH JOIN) - is not good as you may get error "Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN."

  2. SELECT TOP 1 * FROM (ORIGINAL_SELECT) ORDER BY ... - is not good, when you need to use paginating results from you ORIGINAL_SELECT

  3. sp_create_plan_guide - is not good, as to use plan_guide you have to save plan for specific sql statement, this won't work for dynamic sql statements (e.g. generated by ORM)

II. My Solution contains of two parts
1. Self join table used for Full Text search
2. Use MS SQL HASH Join Hints MSDN Join Hints

Your SQL :

SELECT TOP 1 ID, Name FROM Product WHERE contains(Name, '"White Dress"') 
ORDER BY DateMadeNew desc

Should be rewritten as :

SELECT TOP 1 p.ID, p.Name FROM Product p INNER HASH JOIN Product fts ON fts.ID = p.ID
WHERE contains(fts.Name, '"White Dress"') 
ORDER BY p.DateMadeNew desc

If you are using NHibernate with/without Castle Active Records, I've replied in post how to write interceptor to modify your query to replace INNER JOIN by INNER HASH JOIN

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文