当前位置：文江博客话题详情

SQL查询匹配关键字？

发布于 2024-12-02 07:17:16 字数 130 浏览 0 评论 0原文

我有一个表，其中有一列为 nvarchar(max) ，其中包含从 Word 文档中提取的文本。如何创建一个选择查询，将另一个关键字列表作为参数传递，并返回按匹配数排序的行？

也许可以通过全文搜索来实现？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

梦幻之岛 2024-12-09 07:17:16

是的，可以通过全文搜索，并且可能是最佳答案。对于直接的 T-SQL 解决方案，您可以使用拆分函数和联接，例如假设一个名为 dbo.Numbers 的数字表（您可能需要决定不同的上限）：

SET NOCOUNT ON;
DECLARE @UpperLimit INT;
SET @UpperLimit = 200000;

WITH n AS
(
    SELECT
        rn = ROW_NUMBER() OVER
        (ORDER BY s1.[object_id])
    FROM sys.objects AS s1
    CROSS JOIN sys.objects AS s2
    CROSS JOIN sys.objects AS s3
)
SELECT [Number] = rn - 1
INTO dbo.Numbers
FROM n
WHERE rn <= @UpperLimit + 1;

CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers([Number]);

以及使用该数字表的拆分函数：

CREATE FUNCTION dbo.SplitStrings
(
    @List NVARCHAR(MAX)
)
RETURNS TABLE
AS
    RETURN
    (
        SELECT DISTINCT
            [Value] = LTRIM(RTRIM(
                SUBSTRING(@List, [Number],
                CHARINDEX(N',', @List + N',', [Number]) - [Number])))
        FROM
            dbo.Numbers
        WHERE
            Number <= LEN(@List)
            AND SUBSTRING(N',' + @List, [Number], 1) = N','
    );
GO

然后你可以简单地说：

SELECT key, NvarcharColumn /*, other cols */
FROM dbo.table AS outerT
WHERE EXISTS
(
  SELECT 1 
    FROM dbo.table AS t 
    INNER JOIN dbo.SplitStrings(N'list,of,words') AS s
    ON t.NvarcharColumn LIKE '%' + s.Item + '%'
    WHERE t.key = outerT.key
);

作为一个过程：

CREATE PROCEDURE dbo.Search
    @List NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT key, NvarcharColumn /*, other cols */
    FROM dbo.table AS outerT
    WHERE EXISTS
    (
      SELECT 1 
        FROM dbo.table AS t 
        INNER JOIN dbo.SplitStrings(@List) AS s
        ON t.NvarcharColumn LIKE '%' + s.Item + '%'
        WHERE t.key = outerT.key
    );
END
GO

然后你可以传入 @List （例如 EXEC dbo.Search @List = N'foo,bar,splunge' ) 来自 C#。

这不会非常快，但我确信它会比将所有数据提取到 C# 中并手动进行双嵌套循环更快。

Yes, possible with full text search, and likely the best answer. For a straight T-SQL solution, you could use a split function and join, e.g. assuming a table of numbers called dbo.Numbers (you may need to decide on a different upper limit):

SET NOCOUNT ON;
DECLARE @UpperLimit INT;
SET @UpperLimit = 200000;

WITH n AS
(
    SELECT
        rn = ROW_NUMBER() OVER
        (ORDER BY s1.[object_id])
    FROM sys.objects AS s1
    CROSS JOIN sys.objects AS s2
    CROSS JOIN sys.objects AS s3
)
SELECT [Number] = rn - 1
INTO dbo.Numbers
FROM n
WHERE rn <= @UpperLimit + 1;

CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers([Number]);

And a splitting function that uses that table of numbers:

CREATE FUNCTION dbo.SplitStrings
(
    @List NVARCHAR(MAX)
)
RETURNS TABLE
AS
    RETURN
    (
        SELECT DISTINCT
            [Value] = LTRIM(RTRIM(
                SUBSTRING(@List, [Number],
                CHARINDEX(N',', @List + N',', [Number]) - [Number])))
        FROM
            dbo.Numbers
        WHERE
            Number <= LEN(@List)
            AND SUBSTRING(N',' + @List, [Number], 1) = N','
    );
GO

Then you can simply say:

SELECT key, NvarcharColumn /*, other cols */
FROM dbo.table AS outerT
WHERE EXISTS
(
  SELECT 1 
    FROM dbo.table AS t 
    INNER JOIN dbo.SplitStrings(N'list,of,words') AS s
    ON t.NvarcharColumn LIKE '%' + s.Item + '%'
    WHERE t.key = outerT.key
);

As a procedure:

CREATE PROCEDURE dbo.Search
    @List NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT key, NvarcharColumn /*, other cols */
    FROM dbo.table AS outerT
    WHERE EXISTS
    (
      SELECT 1 
        FROM dbo.table AS t 
        INNER JOIN dbo.SplitStrings(@List) AS s
        ON t.NvarcharColumn LIKE '%' + s.Item + '%'
        WHERE t.key = outerT.key
    );
END
GO

Then you can just pass in @List (e.g. EXEC dbo.Search @List = N'foo,bar,splunge') from C#.

This won't be super fast, but I'm sure it will be quicker than pulling all the data out into C# and double-nested loop it manually.

回复收藏 0 原文

熟人话多 2024-12-09 07:17:16

如何...返回按[全文]匹配数排序的行

我自己没有使用过它，但相信 SQL Server 2008 支持 CONTAINSTABLE 匹配的加权可能对您有帮助：

            http://msdn.microsoft.com/en-us/library/ms189760.aspx

如果您没有一个返回按点击次数加权的结果的引擎...

您可以编写一个接受两个输入并返回一个整数的 UDF：大文本值是第一个输入，而您正在寻找的单词第二个是逗号分隔的字符串。该函数返回一个整数，表示在文本中实际至少找到一次的不同查找单词的数量，或者查找查找单词的总次数。实施——如何衡量——取决于你。例如，您可能希望按照从最重要到最不重要的顺序排列查找的单词，并为重要的单词命中赋予比不太重要的单词命中更高的权重。

然后，您可以使用全文搜索引擎查找至少包含一个单词（您对它们进行 OR 运算）的所有记录，然后通过 UDF 标量函数运行此结果集：

             pseudo code

            select title,  weightfunction(summary, 'word1,word2,word3....wordN')
            from docs
            where summary contains ( word1 or word2 or word3 ... or wordN)
            order by weightfunction(summary, 'word1,word2,word3....wordN') desc

how to ... return the rows ordered by the number of [full-text] matches

I have not used it myself but believe SQL Server 2008 supports weighting the CONTAINSTABLE matches which might be of help to you:

            http://msdn.microsoft.com/en-us/library/ms189760.aspx

If you don't have an engine that returns results weighted by the number of hits ...

You could write a UDF that takes two inputs and returns an integer: the big textvalue is the first input and the words you're looking for as a comma-delimited string is the second. The function returns an integer representing either the number of distinct looked-for words that were actually found at least once in the text, or the total number of times the looked-for words were found. Implementation --how to weight-- is up to you. Maybe, for example, you'd want to arrange the looked-for words in most-important to least-important order, and give an important word hit more weight than a less important word hit.

You could then use your full text search engine to find all records that contain at least one of the words (you'd OR them), and you'd run this result set through your UDF scalar function:

             pseudo code

            select title,  weightfunction(summary, 'word1,word2,word3....wordN')
            from docs
            where summary contains ( word1 or word2 or word3 ... or wordN)
            order by weightfunction(summary, 'word1,word2,word3....wordN') desc

回复收藏 0 原文

~没有更多了~