How to find strings similar to a given string in SQL Server?

Posted 2024-12-23 01:58:10

I have a SQL Server table that contains several string columns. I need to write an application that takes a string and searches the table for similar strings.

For example, if I give "مختار" or "مختر" as the input string, I should get these rows from the table:

1 - مختاری
2 - شهاب مختاری
3 - شهاب الدین مختاری

I've searched the net for a solution but found nothing useful. I've read this question, but it won't help me because:

  1. I am using MS SQL Server, not MySQL
  2. my table contents are in Persian, so I can't use Levenshtein distance and similar methods
  3. I prefer a SQL Server-only solution, not an indexing- or daemon-based one.

The best solution would be one that lets me sort the results by similarity, but that's optional.

Do you have any suggestions?

Thanks

Answers (6)

街角迷惘 2024-12-30 01:58:10

MSSQL supports LIKE, which seems like it should work. Is there a reason it's not suitable for your program?

SELECT * FROM [table] WHERE input LIKE N'%مختار%'; -- the N prefix keeps the Persian literal as Unicode (nvarchar)
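The LIKE query above only finds rows that contain the search text as one contiguous piece. The question's second input, "مختر", would therefore not match "مختاری".

Below is a minimal sketch of one workaround, assuming SQL Server 2008 or later. It puts % between every character of the input, so the letters may appear with gaps between them (a subsequence match). Some caveats:

- The table variable @names and its rows are hypothetical stand-ins for the real table.
- The sketch does not escape the LIKE wildcards %, _ and [ in the input.
- Short inputs can produce loose matches.

```sql
-- Hypothetical sample data standing in for the real table.
DECLARE @names TABLE (name nvarchar(100));
INSERT INTO @names (name)
VALUES (N'مختاری'), (N'شهاب مختاری'), (N'شهاب الدین مختاری');

DECLARE @search nvarchar(100) = N'مختر';

-- Build N'%م%خ%ت%ر%' from N'مختر' so the letters may have gaps between them.
-- Note: LIKE wildcards (%, _, [) inside @search are not escaped here.
DECLARE @pattern nvarchar(401) = N'%';
DECLARE @i int = 1;
WHILE @i <= LEN(@search)
BEGIN
    SET @pattern += SUBSTRING(@search, @i, 1) + N'%';
    SET @i += 1;
END;

SELECT name
FROM @names
WHERE name LIKE @pattern;
```

With @search set to N'مختر', the pattern becomes N'%م%خ%ت%ر%' and all three sample rows match.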
冰魂雪魄 2024-12-30 01:58:10

Hmm... considering that you've read the other post, you probably already know about the LIKE operator. Maybe your problem is "getting the string and searching for something similar"?

-- This part finds the string you want to search for
-- (nvarchar rather than varchar, so Persian text is not lost)
DECLARE @MyString nvarchar(max);

-- Placeholder: replace 1 = 1 with the LOGIC TO FIND THE STRING
SET @MyString = (SELECT TOP (1) [column] FROM [table]
                 WHERE 1 = 1);

-- This part searches for that string and ranks matches by length difference
SELECT searchColumn, ABS(LEN(searchColumn) - LEN(@MyString)) AS Similarity
FROM [table]
WHERE searchColumn LIKE N'%' + @MyString + N'%'
ORDER BY Similarity, searchColumn;

The Similarity column is roughly what you asked for: the "more similar" two strings are (here, the closer their lengths), the higher they appear in the results.
The ABS() call can obviously be avoided, but I added it just in case.

Hope that helps =-)
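This is a self-contained version of the length-based ranking above, so it can be run in any SQL Server database. It uses a hypothetical table variable holding the three names from the question. ABS() is dropped here because every row that passes the LIKE filter is at least as long as the search text.

```sql
-- Hypothetical sample data; in practice this is your own table and column.
DECLARE @names TABLE (searchColumn nvarchar(100));
INSERT INTO @names (searchColumn)
VALUES (N'شهاب الدین مختاری'), (N'مختاری'), (N'شهاب مختاری');

DECLARE @MyString nvarchar(100) = N'مختار';

-- Rows containing the search text, closest in length first.
SELECT searchColumn,
       LEN(searchColumn) - LEN(@MyString) AS Similarity
FROM @names
WHERE searchColumn LIKE N'%' + @MyString + N'%'
ORDER BY Similarity, searchColumn;
```

This returns the rows in this order:

- مختاری (Similarity 1)
- شهاب مختاری (6)
- شهاب الدین مختاری (12)

Length difference is only a rough proxy: it favors short matches, not close spellings.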

默嘫て 2024-12-30 01:58:10

Besides the LIKE operator, you can use the condition WHERE CHARINDEX(search, columnname) > 0. CHARINDEX is SQL Server's counterpart of INSTR, with the arguments in the opposite order. However, this is generally slower. It returns the starting position of one string within another: searching ABCDEFG for CD returns 3, and since 3 > 0, the record is returned. Still, in the case you've described, LIKE seems to be the best solution.
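Here is a short sketch of the CHARINDEX form, using a hypothetical table variable with the names from the question. Besides filtering, the returned start position can serve as a crude ranking key, putting matches nearer the start of the string first.

```sql
-- CHARINDEX(expressionToFind, expressionToSearch): 1-based start position, 0 if not found.
SELECT CHARINDEX(N'CD', N'ABCDEFG') AS Position;   -- 3

-- Hypothetical sample data standing in for the real table.
DECLARE @names TABLE (name nvarchar(100));
INSERT INTO @names (name)
VALUES (N'مختاری'), (N'شهاب مختاری'), (N'شهاب الدین مختاری');

DECLARE @search nvarchar(100) = N'مختار';

-- Keep rows containing @search and order them by where the match starts.
SELECT name, CHARINDEX(@search, name) AS FoundAt
FROM @names
WHERE CHARINDEX(@search, name) > 0
ORDER BY FoundAt, name;
```

The FoundAt values for the three sample rows are 1, 6 and 12.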

月棠 2024-12-30 01:58:10

The general problem is that in languages where the same letter has different written forms at the beginning, middle and end of a word, and thus different codes, we can try specific Persian collations, but in general this will not help.
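In Unicode text, the initial, medial and final shapes of a letter are normally one code point, with the shape chosen at display time. A frequent source of mismatches in Persian data is different: the same letter may be typed with different code points. Examples are Arabic Yeh ي (U+064A) versus Farsi Yeh ی (U+06CC), and Arabic Kaf ك (U+0643) versus Keheh ک (U+06A9).

Whether a given collation treats these as equal varies. The sketch below therefore normalizes both sides with REPLACE before comparing. Assumptions:

- The mapping covers only these two pairs.
- The sample row and table variable are hypothetical.
- Latin1_General_BIN2 is used so that REPLACE compares exact code points.

```sql
-- Hypothetical sample row stored with Arabic Yeh (U+064A = NCHAR(1610)) at the end.
DECLARE @names TABLE (name nvarchar(100));
INSERT INTO @names (name) VALUES (N'شهاب مختار' + NCHAR(1610));

-- Search text typed with Farsi Yeh (U+06CC = NCHAR(1740)) at the end.
DECLARE @search nvarchar(100) = N'مختار' + NCHAR(1740);

-- Normalize: Arabic Yeh -> Farsi Yeh, Arabic Kaf (U+0643) -> Keheh (U+06A9).
-- The binary collation makes REPLACE match exact code points only.
DECLARE @norm nvarchar(100) =
    REPLACE(REPLACE(@search COLLATE Latin1_General_BIN2, NCHAR(1610), NCHAR(1740)), NCHAR(1603), NCHAR(1705));

-- Apply the same normalization to the stored column before matching.
SELECT name
FROM @names
WHERE REPLACE(REPLACE(name COLLATE Latin1_General_BIN2, NCHAR(1610), NCHAR(1740)), NCHAR(1603), NCHAR(1705))
      LIKE N'%' + @norm + N'%';
```

In a real table, you would more likely store a normalized copy of the column once rather than normalize it in every query.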

The second option is to use SQL Server Full-Text Search (FTS), but again, if it has no dedicated language module for Persian, it is much less useful.

The most general way is to do your own language processing, which is a very complex task. Searching for the following keywords can help you understand the size of the problem: DLP, words and terms, bigrams, n-grams, grammar and morphological inflection.
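Character n-grams are one of the techniques named above. This minimal sketch is not a language-processing pipeline. It compares two strings by the Jaccard similarity of their distinct character bigrams: shared bigrams divided by all distinct bigrams. Assumptions:

- Strings are at most 101 characters long.
- sys.all_objects is used only as a convenient row source for position numbers.
- Bigrams are compared under Latin1_General_BIN2, so only identical code points count as equal.

```sql
DECLARE @a nvarchar(100) = N'مختر';
DECLARE @b nvarchar(100) = N'مختاری';
DECLARE @score float;

WITH n AS (
    -- Positions 1..100 (assumes strings of at most 101 characters).
    SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects
),
-- Distinct overlapping 2-character grams of each string, compared by code point.
ga AS (SELECT DISTINCT SUBSTRING(@a, i, 2) COLLATE Latin1_General_BIN2 AS g FROM n WHERE i < LEN(@a)),
gb AS (SELECT DISTINCT SUBSTRING(@b, i, 2) COLLATE Latin1_General_BIN2 AS g FROM n WHERE i < LEN(@b))
-- Jaccard = shared grams / all distinct grams (NULL if neither string has a bigram).
SELECT @score = CAST(COUNT(CASE WHEN ga.g IS NOT NULL AND gb.g IS NOT NULL THEN 1 END) AS float)
                / NULLIF(COUNT(*), 0)
FROM ga FULL OUTER JOIN gb ON ga.g = gb.g;

SELECT @score AS BigramJaccard;
```

For مختر and مختاری, the shared bigrams are مخ and خت out of six distinct ones, so the score is 2/6, about 0.33. Unlike LIKE, this gives the misspelled input a nonzero score that can be used to sort candidates.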

后知后觉 2024-12-30 01:58:10

Try the built-in SOUNDEX() and DIFFERENCE() functions. I hope they work well for Persian.
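Before relying on these functions, you could run a quick check on your own server. SOUNDEX is documented as encoding how a string sounds when spoken in English. DIFFERENCE compares two SOUNDEX codes and returns 0 (weak match) to 4 (strong match). It is worth verifying whether the codes for Persian input are meaningful; no output is claimed here for the Persian pair. An English pair is included for comparison.

```sql
-- Persian pair from the question: inspect the codes on your own server.
SELECT SOUNDEX(N'مختار')             AS Code1,
       SOUNDEX(N'مختر')              AS Code2,
       DIFFERENCE(N'مختار', N'مختر') AS Score;

-- English pair of the kind SOUNDEX was designed for.
SELECT SOUNDEX('Smith')             AS Code1,   -- S530
       SOUNDEX('Smyth')             AS Code2,   -- S530
       DIFFERENCE('Smith', 'Smyth') AS Score;   -- 4
```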

Look at the following reference:
http://blog.hoegaerden.be/2011/02/05/finding-similar-strings-with-fuzzy-logic-functions-built-into-mds/

The Similarity() function helps you sort results by similarity (as you asked in your question). It can also use algorithms other than Levenshtein edit distance, depending on the value of @method:

0 The Levenshtein edit distance algorithm

1 The Jaccard similarity coefficient algorithm

2 A form of the Jaro-Winkler distance algorithm

3 Longest common subsequence algorithm
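Here is a sketch of calling Similarity() as described in the linked post. Assumptions:

- The Master Data Services database is the current database.
- The function is mdq.Similarity(input1, input2, method, containmentBias, minScoreHint) and returns a score where higher means more similar.
- The candidate strings are sample values only.

Check the signature against your MDS installation before relying on it.

```sql
-- Run inside the Master Data Services database (assumed to expose mdq.Similarity).
DECLARE @input nvarchar(4000) = N'مختار';

SELECT v.candidate,
       mdq.Similarity(@input, v.candidate, 0, 0.0, 0.0) AS Levenshtein,
       mdq.Similarity(@input, v.candidate, 1, 0.0, 0.0) AS Jaccard,
       mdq.Similarity(@input, v.candidate, 2, 0.0, 0.0) AS JaroWinkler,
       mdq.Similarity(@input, v.candidate, 3, 0.0, 0.0) AS LongestCommonSubsequence
FROM (VALUES (N'مختاری'), (N'شهاب مختاری'), (N'شهاب الدین مختاری')) AS v(candidate)
ORDER BY Levenshtein DESC;   -- most similar first under method 0
```

In a real query, the VALUES list would be replaced by the searched column, typically with a filter on the score.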

陌若浮生 2024-12-30 01:58:10

The LIKE operator may not do what the asker wants. For example, suppose a record in my database has the value 'please, i want to ask a question', and in my query I want to find a similar match for 'Can i ask a question, please'. LIKE can be written as LIKE '%your sentence' or LIKE 'your sentence%'. However, it is not advisable for string similarity, because sentences may change and the LIKE logic may fail to fetch the matching records. A better option is naive Bayes text classification, which assigns labels to your sentences. Alternatively, you can try the Semantic Search feature in SQL Server.
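To make the word-order point concrete, here is a sketch using the two example sentences. They are lowercased with punctuation removed, for simplicity. The sketch first shows that LIKE finds no match. It then compares the sentences as bags of words by counting the distinct words they share.

This word-overlap count is a much simpler technique than the naive Bayes classification or Semantic Search mentioned above and only illustrates the idea. STRING_SPLIT requires SQL Server 2016 or later at database compatibility level 130 or higher.

```sql
DECLARE @stored nvarchar(200) = N'please i want to ask a question';
DECLARE @query  nvarchar(200) = N'can i ask a question please';

-- LIKE needs the query text to appear contiguously inside the stored text.
SELECT CASE WHEN @stored LIKE N'%' + @query + N'%' THEN 1 ELSE 0 END AS LikeMatches;

-- Bag-of-words comparison: count distinct words present in both sentences.
SELECT COUNT(*) AS SharedWords
FROM (SELECT DISTINCT value FROM STRING_SPLIT(@stored, N' ')) AS s
JOIN (SELECT DISTINCT value FROM STRING_SPLIT(@query, N' ')) AS q
  ON s.value = q.value;
```

LikeMatches is 0, while SharedWords is 5 (please, i, ask, a, question).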
