Matching multiple input variations to one SQL row
After much searching, I would like to know how to match different variations of input to one SQL row using standard T-SQL. Here is the scenario:
I have the following text in my SQL row: I love
I then have the following three inputs, all of which should return a match to this row:
"I want to tell you we all love StackOverflow"
"I'm totally in love with StackOverflow"
"I really love StackOverflow"
In each input, the words that cause the match are I and love. The I in I'm is deliberately matched too, so it would be good if matches could include that.
I thought about splitting the input string, which I did using the following T-SQL:
-- Create a space delimited string for testing
declare @str varchar(max)
select @str = 'I want to tell you we all love StackOverflow'
-- XML tag the string by replacing spaces with </x><x> tags
declare @xml xml
select @xml = cast('<x><![CDATA['+ replace(@str,' ',']]></x><x><![CDATA[') + ']]></x>' as xml)
-- Finally select values from nodes <x> and trim at the same time
select ltrim(rtrim(mynode.value('.[1]', 'nvarchar(100)'))) as Code -- nvarchar(12) would truncate 'StackOverflow' (13 characters)
from (select @xml doc) xx
cross apply doc.nodes('/x') (mynode)
This gets me all the words as separate rows, but I could not work out how to write the query that matches them.
Any help from this point, or any alternative way of matching as required, would be greatly appreciated!
UPDATE:
@freefaller pointed me to the RegEx route, and by creating a function I have been able to get a bit further (so +1 @freefaller). However, I now need to know how to make it look at all my table rows rather than the hard-coded input of 'I love'. I currently have the following select statements:
SELECT * FROM dbo.FindWordsInContext('i love','I want to tell you we all love StackOverflow',30)
SELECT * FROM dbo.FindWordsInContext('i love','I''m totally in love with StackOverflow',30)
SELECT * FROM dbo.FindWordsInContext('i love','I really love StackOverflow',30)
Each statement returns the number of matches and the context of the matched string, so the first select above returns:
Hits Context
1 ...I want to tell you we all love StackOv...
Given the above, can anyone tell me how to make this function check every row in the table and return the row or rows that match?
One option would be to use Regular Expressions via SQLCLR objects as explained here.
I have never created SQLCLR objects myself, so I cannot comment on how easy this method is. I am, however, a great fan of regular expressions and would recommend them for most text search and manipulation.
Edit: In response to the comment, I have no experience of SQLCLR, but assuming you get that working, something like the following simple untested TSQL might work...
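One way to apply such a function to every row is CROSS APPLY, which calls a table-valued function once per row of the outer table. The sketch below is untested and rests on assumptions: the phrases live in a hypothetical table dbo.Phrases with columns PhraseId and PhraseText, and dbo.FindWordsInContext takes its arguments in the order shown in the question and returns the Hits and Context columns.

```sql
-- Hypothetical table: dbo.Phrases(PhraseId int, PhraseText nvarchar(200)).
DECLARE @input nvarchar(max) = N'I want to tell you we all love StackOverflow';

SELECT p.PhraseId, p.PhraseText, f.Hits, f.Context
FROM dbo.Phrases AS p
-- Call the function once per row, using that row's text as the search words.
CROSS APPLY dbo.FindWordsInContext(p.PhraseText, @input, 30) AS f
WHERE f.Hits > 0;   -- keep only rows whose phrase was found in the input
```

If the function returns no row at all when nothing matches, CROSS APPLY already drops those rows and the WHERE clause is merely a safeguard.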
I have managed to come up with an answer to my own question, so I thought I would post it here in case anyone else has similar requirements in the future. Basically, it relies on SQL-CLR regular expression functionality and runs with minimal impact on performance.
First, enable SQL-CLR on your server if it is not already enabled (you need to be a sysadmin):
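CLR integration is normally switched on through the 'clr enabled' server configuration option. A minimal version of that step looks like this:

```sql
-- Turn on CLR integration for the instance (server-wide setting).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
```

On SQL Server 2017 and later, the 'clr strict security' option is on by default, so an unsigned assembly may also need to be signed or added to the trusted assembly list before it will load.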
Then create the assembly in SQL Server. Remember to change the path from D:\SqlRegEx.dll to your own, and use the SAFE permission set, as it is the most restrictive and safest set of permissions (I won't go into detail here):
Now create the actual function you will call:
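As a rough sketch of these two steps: the assembly name SqlRegEx and the path come from the text above, while the function name dbo.RegExIsMatch, its parameters, and the class and method names in EXTERNAL NAME are hypothetical and must be replaced with whatever your DLL actually exposes.

```sql
-- Register the compiled DLL; adjust the path to where your DLL lives.
CREATE ASSEMBLY SqlRegEx
FROM 'D:\SqlRegEx.dll'
WITH PERMISSION_SET = SAFE;
GO

-- Hypothetical names: the class (UserDefinedFunctions) and the method
-- (RegExIsMatch) are placeholders for the ones defined in the assembly.
CREATE FUNCTION dbo.RegExIsMatch (@pattern nvarchar(4000), @input nvarchar(max))
RETURNS bit
AS EXTERNAL NAME SqlRegEx.UserDefinedFunctions.RegExIsMatch;
GO
```

The GO separators matter in SSMS or sqlcmd because CREATE FUNCTION has to be the first statement in its batch.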
Finally, to answer my own question, we can run the following T-SQL:
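A query along the following lines would check every row against one input. It is a sketch under assumptions, not the exact statement: dbo.Phrases (PhraseId, PhraseText) is a hypothetical table holding rows such as 'I love', and dbo.RegExIsMatch(pattern, input) is a hypothetical scalar SQL-CLR function returning 1 when the .NET regular expression matches.

```sql
DECLARE @input nvarchar(max) = N'I''m totally in love with StackOverflow';

SELECT p.PhraseId, p.PhraseText
FROM dbo.Phrases AS p
-- Turn 'I love' into the pattern '(?i)I.*love': the words must appear in
-- order, with anything in between, and (?i) makes the match case-insensitive.
WHERE dbo.RegExIsMatch(N'(?i)' + REPLACE(p.PhraseText, N' ', N'.*'), @input) = 1;
```

This builds the pattern straight from the row text, so any regex metacharacters stored in the row would need escaping first.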
I hope this helps someone in the future with what should be a simple search.