SQL查询匹配关键字?
我有一个表,其中有一列为 nvarchar(max)
,其中包含从 Word 文档中提取的文本。如何创建一个选择查询,将另一个关键字列表作为参数传递,并返回按匹配数排序的行?
也许可以通过全文搜索来实现?
I have a table with a column as nvarchar(max)
with text extracted from word documents in it. How can I create a select query that I'll pass another a list of keywords as parameter and return the rows ordered by the number of matches?
Maybe it is possible with full text search?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
是的,可以通过全文搜索,并且可能是最佳答案。对于直接的 T-SQL 解决方案,您可以使用拆分函数和联接,例如假设一个名为 dbo.Numbers 的数字表(您可能需要决定不同的上限):
以及使用该数字表的拆分函数:
然后你可以简单地说:
作为一个过程:
然后你可以传入
@List
(例如EXEC dbo.Search @List = N'foo,bar,splunge'
) 来自 C#。这不会非常快,但我确信它会比将所有数据提取到 C# 中并手动进行双嵌套循环更快。
Yes, possible with full text search, and likely the best answer. For a straight T-SQL solution, you could use a split function and join, e.g. assuming a table of numbers called dbo.Numbers (you may need to decide on a different upper limit):
And a splitting function that uses that table of numbers:
Then you can simply say:
As a procedure:
Then you can just pass in
@List
(e.g.EXEC dbo.Search @List = N'foo,bar,splunge'
) from C#.This won't be super fast, but I'm sure it will be quicker than pulling all the data out into C# and double-nested loop it manually.
如何...返回按[全文]匹配数排序的行
我自己没有使用过它,但相信 SQL Server 2008 支持 CONTAINSTABLE 匹配的加权可能对您有帮助:
如果您没有一个返回按点击次数加权的结果的引擎...
您可以编写一个接受两个输入并返回一个整数的 UDF:大文本值是第一个输入,而您正在寻找的单词第二个是逗号分隔的字符串。该函数返回一个整数,表示在文本中实际至少找到一次的不同查找单词的数量,或者查找查找单词的总次数。实施——如何衡量——取决于你。例如,您可能希望按照从最重要到最不重要的顺序排列查找的单词,并为重要的单词命中赋予比不太重要的单词命中更高的权重。
然后,您可以使用全文搜索引擎查找至少包含一个单词(您对它们进行 OR 运算)的所有记录,然后通过 UDF 标量函数运行此结果集:
how to ... return the rows ordered by the number of [full-text] matches
I have not used it myself but believe SQL Server 2008 supports weighting the CONTAINSTABLE matches which might be of help to you:
If you don't have an engine that returns results weighted by the number of hits ...
You could write a UDF that takes two inputs and returns an integer: the big textvalue is the first input and the words you're looking for as a comma-delimited string is the second. The function returns an integer representing either the number of distinct looked-for words that were actually found at least once in the text, or the total number of times the looked-for words were found. Implementation --how to weight-- is up to you. Maybe, for example, you'd want to arrange the looked-for words in most-important to least-important order, and give an important word hit more weight than a less important word hit.
You could then use your full text search engine to find all records that contain at least one of the words (you'd OR them), and you'd run this result set through your UDF scalar function: