如何在 SQL 中识别文本字段中的多个单词标签
我想要一个可以传递文本字符串的 SQL 过程,它将从关键字表中识别文本中的特定关键字(标签)。
到目前为止,我有以下内容,对于单个单词来说非常有效;
INSERT INTO #tags SELECT Word
FROM dbo.SplitWords('some colours are blue, green, red and light blue')
SELECT Word
FROM #tags
INTERSECT
SELECT Tag
FROM dbo.Tags
DROP TABLE #tags
如果我的标签表包含“绿色”、“红色”和“蓝色”条目,它们将按照您的预期返回。
我想知道的是如何最好地实现类似的结果,但对于多个单词的标签...例如,“浅蓝色”
我意识到上面代码的问题是我正在分割源文本分成单独的单词,所以“浅蓝色”永远不会匹配,但是我可以采取不同的路线,不会涉及光标等吗?
感谢您的帮助
刚刚意识到以下内容可以实现我所需要的
DECLARE @Text as nvarchar(max)
SELECT @Text = 'some colours are blue, green, red and light blue'
SELECT TagID,
Tag
FROM Tags
WHERE @Text LIKE '% ' + Tag + ' %'
ORDER BY Tag
但我担心如果我的表有几千行并且我正在搜索的文本字符串很长,它可能会效率低下。
有人对如何更有效地完成相同的过程有任何想法吗?
I would like to have a SQL procedure which can be passed a text string and it will identify specific keywords (tags) in the text from a table of keywords.
So far I have the following which works really nicely for single words;
INSERT INTO #tags SELECT Word
FROM dbo.SplitWords('some colours are blue, green, red and light blue')
SELECT Word
FROM #tags
INTERSECT
SELECT Tag
FROM dbo.Tags
DROP TABLE #tags
If my tags tables has entries for 'green', 'red' and 'blue' they are returned as you might expect.
What I would like to know is how best I can achieve a similar result but for tags which are multiple words... for example, 'light blue'
I realise the problem I have with the code above is that I am spliting the source text into individual words so 'light blue' would never match but is there a different route I could take which won't invovle cursors etc?
Thanks for your help
Just realized the following would achieve what I need
DECLARE @Text as nvarchar(max)
SELECT @Text = 'some colours are blue, green, red and light blue'
SELECT TagID,
Tag
FROM Tags
WHERE @Text LIKE '% ' + Tag + ' %'
ORDER BY Tag
But I'm worried it could be inefficient if my table has several thousand rows and the text string I'm searching is very long.
Anyone have any thoughts on how I could do the same process more efficiently?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
查看我对 TSQL - A join using full-text CONTAINS 的回答
它利用了@Conrad 的想法,但让你超越了 CONTAINS 的变量限制
Check out my answer on TSQL - A join using full-text CONTAINS
It leverages @Conrad's idea, but gets you past the variable restriction of CONTAINS
好吧,我决定采用之前的解决方案,直到我有时间重新思考/等待它成为问题。
旧方法现在执行时间太长,迫使我寻找替代解决方案。
经过一些播放后,我发现的最佳解决方案(就执行时间的长度而言)是光标......
因此,我为我想要搜索文本的短语设置了一个光标,并快速通过它填充临时表与我匹配的短语 ID。
关键是使用 FAST_FORWARD 和 FORWARD_ONLY 设置来最大限度地提高性能。
完成后,我只需将临时表连接回数据库表即可返回我需要的短语的任何详细信息。
示例代码如下:
DECLARE @PageText nvarchar(max) -- 页面文本的参数
CREATE TABLE #Matches (PhraseID int)
DECLARE @PhraseID int
声明 @PhraseText nvarchar(100)
声明 curMatchingPhrases CURSOR FAST_FORWARD FORWARD_ONLY FOR
SELECT p.PhraseID,
p.短语文本
FROM 短语 p
打开 curMatchingPhrases
从 curMatchingPhrases 中获取下一个 INTO @PhraseID、@PhraseText
WHILE @@FETCH_STATUS = 0
BEGIN
END
CLOSE curMatchingPhrases
DEALLOCATE curMatchingPhrases
SELECT * FROM #Matches
DROP TABLE #Matches
我相信这里的其他人将能够找到更优雅的解决方案,但光标已将我的 6+sec SP 减少到 0 - 1sec,所以我很高兴目前。
莫乔
OK, I settled for my previous solution until I had time to have a re-think / wait for it to become a problem.
The old method is now taking too long to execute forcing me to find an alternative solution.
After some playing the best solution I have found (in terms of length of execution time) is to cursor it....
I therefore set up a cursor for the phrase I want to search the text for and blitz through it populating a temp table with the phrase IDs which match as I go.
Key thing to use FAST_FORWARD and FORWARD_ONLY settings to maximize performance.
Once done, I simply join my temp table back to my db tables to return whatever details of the phrases I need.
Example code below:
DECLARE @PageText nvarchar(max) -- parameter of page text
CREATE TABLE #Matches (PhraseID int)
DECLARE @PhraseID int
DECLARE @PhraseText nvarchar(100)
DECLARE curMatchingPhrases CURSOR FAST_FORWARD FORWARD_ONLY FOR
SELECT p.PhraseID,
p.PhraseText
FROM Phrases p
OPEN curMatchingPhrases
FETCH NEXT FROM curMatchingPhrases INTO @PhraseID, @PhraseText
WHILE @@FETCH_STATUS = 0
BEGIN
END
CLOSE curMatchingPhrases
DEALLOCATE curMatchingPhrases
SELECT * FROM #Matches
DROP TABLE #Matches
I'm sure others on here will be able to find more elegant solutions, but the cursor has reduced a 6+sec SP down to 0 - 1sec for me, so I'm happy for now.
Mojo