如何在 SQL 中识别文本字段中的多个单词标签

发布于 2025-01-08 12:11:24 字数 813 浏览 0 评论 0原文

我想要一个可以传递文本字符串的 SQL 过程,它将从关键字表中识别文本中的特定关键字(标签)。

到目前为止,我有以下内容,对于单个单词来说非常有效;

INSERT INTO #tags SELECT Word 
FROM dbo.SplitWords('some colours are blue, green, red and light blue')

SELECT Word
FROM    #tags
INTERSECT
SELECT  Tag
FROM    dbo.Tags

DROP TABLE #tags

如果我的标签表包含“绿色”、“红色”和“蓝色”条目,它们将按照您的预期返回。

我想知道的是如何最好地实现类似的结果,但对于多个单词的标签...例如,“浅蓝色”

我意识到上面代码的问题是我正在分割源文本分成单独的单词,所以“浅蓝色”永远不会匹配,但是我可以采取不同的路线,不会涉及光标等吗?

感谢您的帮助


刚刚意识到以下内容可以实现我所需要的

DECLARE @Text as nvarchar(max)

SELECT @Text = 'some colours are blue, green, red and light blue'

SELECT  TagID, 
    Tag 
FROM    Tags 
WHERE   @Text LIKE '% ' + Tag + ' %' 
ORDER BY Tag

但我担心如果我的表有几千行并且我正在搜索的文本字符串很长,它可能会效率低下。

有人对如何更有效地完成相同的过程有任何想法吗?

I would like to have a SQL procedure which can be passed a text string and it will identify specific keywords (tags) in the text from a table of keywords.

So far I have the following which works really nicely for single words;

INSERT INTO #tags SELECT Word 
FROM dbo.SplitWords('some colours are blue, green, red and light blue')

SELECT Word
FROM    #tags
INTERSECT
SELECT  Tag
FROM    dbo.Tags

DROP TABLE #tags

If my tags tables has entries for 'green', 'red' and 'blue' they are returned as you might expect.

What I would like to know is how best I can achieve a similar result but for tags which are multiple words... for example, 'light blue'

I realise the problem I have with the code above is that I am spliting the source text into individual words so 'light blue' would never match but is there a different route I could take which won't invovle cursors etc?

Thanks for your help


Just realized the following would achieve what I need

DECLARE @Text as nvarchar(max)

SELECT @Text = 'some colours are blue, green, red and light blue'

SELECT  TagID, 
    Tag 
FROM    Tags 
WHERE   @Text LIKE '% ' + Tag + ' %' 
ORDER BY Tag

But I'm worried it could be inefficient if my table has several thousand rows and the text string I'm searching is very long.

Anyone have any thoughts on how I could do the same process more efficiently?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

睫毛上残留的泪 2025-01-15 12:11:24

查看我对 TSQL - A join using full-text CONTAINS 的回答

它利用了@Conrad 的想法,但让你超越了 CONTAINS 的变量限制

Check out my answer on TSQL - A join using full-text CONTAINS

It leverages @Conrad's idea, but gets you past the variable restriction of CONTAINS

爱殇璃 2025-01-15 12:11:24

好吧,我决定采用之前的解决方案,直到我有时间重新思考/等待它成为问题。
旧方法现在执行时间太长,迫使我寻找替代解决方案。

经过一些播放后,我发现的最佳解决方案(就执行时间的长度而言)是光标......

因此,我为我想要搜索文本的短语设置了一个光标,并快速通过它填充临时表与我匹配的短语 ID。
关键是使用 FAST_FORWARD 和 FORWARD_ONLY 设置来最大限度地提高性能。

完成后,我只需将临时表连接回数据库表即可返回我需要的短语的任何详细信息。

示例代码如下:

DECLARE @PageText nvarchar(max) -- 页面文本的参数

CREATE TABLE #Matches (PhraseID int)

DECLARE @PhraseID int
声明 @PhraseText nvarchar(100)

声明 curMatchingPhrases CURSOR FAST_FORWARD FORWARD_ONLY FOR
SELECT p.PhraseID,
p.短语文本
FROM 短语 p

打开 curMatchingPhrases

从 curMatchingPhrases 中获取下一个 INTO @PhraseID、@PhraseText

WHILE @@FETCH_STATUS = 0
BEGIN

IF EXISTS (SELECT 'match' WHERE @PageText LIKE '% ' + @PhraseText + ' %')
BEGIN
    INSERT #Matches SELECT  @PhraseID
    WHERE   @PhraseID NOT IN (SELECT PhraseID FROM #Matches)
END

FETCH NEXT FROM curMatchingPhrases INTO @PhraseID, @PhraseText 

END

CLOSE curMatchingPhrases
DEALLOCATE curMatchingPhrases

SELECT * FROM #Matches

DROP TABLE #Matches

我相信这里的其他人将能够找到更优雅的解决方案,但光标已将我的 6+sec SP 减少到 0 - 1sec,所以我很高兴目前。

莫乔

OK, I settled for my previous solution until I had time to have a re-think / wait for it to become a problem.
The old method is now taking too long to execute forcing me to find an alternative solution.

After some playing the best solution I have found (in terms of length of execution time) is to cursor it....

I therefore set up a cursor for the phrase I want to search the text for and blitz through it populating a temp table with the phrase IDs which match as I go.
Key thing to use FAST_FORWARD and FORWARD_ONLY settings to maximize performance.

Once done, I simply join my temp table back to my db tables to return whatever details of the phrases I need.

Example code below:

DECLARE @PageText nvarchar(max) -- parameter of page text

CREATE TABLE #Matches (PhraseID int)

DECLARE @PhraseID int
DECLARE @PhraseText nvarchar(100)

DECLARE curMatchingPhrases CURSOR FAST_FORWARD FORWARD_ONLY FOR
SELECT p.PhraseID,
p.PhraseText
FROM Phrases p

OPEN curMatchingPhrases

FETCH NEXT FROM curMatchingPhrases INTO @PhraseID, @PhraseText

WHILE @@FETCH_STATUS = 0
BEGIN

IF EXISTS (SELECT 'match' WHERE @PageText LIKE '% ' + @PhraseText + ' %')
BEGIN
    INSERT #Matches SELECT  @PhraseID
    WHERE   @PhraseID NOT IN (SELECT PhraseID FROM #Matches)
END

FETCH NEXT FROM curMatchingPhrases INTO @PhraseID, @PhraseText 

END

CLOSE curMatchingPhrases
DEALLOCATE curMatchingPhrases

SELECT * FROM #Matches

DROP TABLE #Matches

I'm sure others on here will be able to find more elegant solutions, but the cursor has reduced a 6+sec SP down to 0 - 1sec for me, so I'm happy for now.

Mojo

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文