Full Text Search: Searching for Noise Words
I have a SQL Server 2008 database with Full Text Search indexes. I have defined the stopword 'al' in the stoplist. However, when I search for any phrase containing the keyword 'al', the word 'al' is still used in ranking.
This might be related to the fact that I am breaking up the search terms and reconstructing them. I then search across multiple fields and rank the results: http://pastebin.com/fdce11ff. The function breaks up a search such as
'al hamra'
into
("*al*" ~ "*hamra*") OR ("*al*" OR "*hamra*")
for the Full Text Search.
Imagine this scenario:
Name: Al Hamra, Author: Jack Brown, Genre: Fiction
Name: Al Karawan, Author: Al Hanz, Genre: Romance
Now a search for 'al hamra' will return 'Al Karawan', even though 'al' is in the stoplist. Why is this? I thought words in a stoplist would lose their weight in ranking?
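For reference, the scenario above could be reproduced with a query along these lines. This is only a sketch: the table dbo.Books, its columns Id, Name, Author and Genre, and the full-text index over them are hypothetical names chosen for illustration, and the actual function in the pastebin link may rank results differently.

```sql
-- Hypothetical table: dbo.Books(Id, Name, Author, Genre) with a full-text
-- index on Name, Author and Genre, keyed on Id. These names are assumptions.
DECLARE @search nvarchar(4000) =
    N'("*al*" ~ "*hamra*") OR ("*al*" OR "*hamra*")';

-- CONTAINSTABLE returns one row per matching key together with its RANK.
SELECT b.Name, b.Author, b.Genre, ft.[RANK]
FROM CONTAINSTABLE(dbo.Books, (Name, Author, Genre), @search) AS ft
JOIN dbo.Books AS b ON b.Id = ft.[KEY]
ORDER BY ft.[RANK] DESC;
```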
Comments (1)
Noise words (stopwords) are specific to a language; have you added yours under the right one? You can test this with sys.dm_fts_parser, which might also work better than your manual word breaking in code (or not).
Assuming you are using 1033 (strictly a language ID, the LCID for US English, rather than a code page): if your noise word is registered for the language you expect, the parser output should flag it as a noise word.
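A minimal sketch of such a check follows. The stoplist name MyStoplist is a placeholder, not something taken from the question; substitute the name of the stoplist attached to your full-text index. The first query lists where 'al' is registered, and the second runs the phrase through the word breaker with that stoplist applied.

```sql
-- 1. Confirm that 'al' is registered, and under which stoplist and language.
SELECT sl.stoplist_id, sl.name, sw.stopword, sw.language_id
FROM sys.fulltext_stoplists AS sl
JOIN sys.fulltext_stopwords AS sw
    ON sw.stoplist_id = sl.stoplist_id
WHERE sw.stopword = N'al';

-- 2. Parse the phrase using that stoplist.
--    'MyStoplist' is a hypothetical name; 1033 is the LCID assumed above.
DECLARE @stoplist_id int =
    (SELECT stoplist_id FROM sys.fulltext_stoplists WHERE name = N'MyStoplist');

-- Arguments: query string, LCID, stoplist id, accent sensitivity (0 = insensitive).
SELECT keyword, display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"al hamra"', 1033, @stoplist_id, 0);
```

In the second result set, the special_term column should read 'Noise Word' for a term the stoplist covers and 'Exact Match' for an ordinary term. If 'al' shows up as 'Exact Match', it is probably not registered for the language the index column uses.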