将相似名称转换为 SQL 中的规范形式
我有一个书籍数据库。在某些情况下,作者可以使用其姓名的变体作为作者。
例如。 Albert Einstein 可以写成 Albert E. 或 A. Einstein
现在我需要创建一个方法,例如 SQL 数据库,并根据这种情况告诉我作者的 ID。
是否有内置方法可以将名称的多个变体转换为 SQL 语言(例如 PostgreSQL)中的单一类型?
或者我应该使用其他语言(例如 C、C++ 或 Python)更改名称,将名称转换为特定形式,然后发送到数据库?
我宁愿在 SQL 中使用方法而不是使用其他语言
I have a database of books. There can be situations where an author can use variations of their names as author.
For eg. Albert Einstein can be written as Albert E. or A. Einstein
Now I need to create a method such that SQL database and tell me the ID of the author based on such situations.
Is there a built in way to convert these multiple variations of a name into a single type within SQL language such as PostgreSQL?
Or Should I change the name using some another language such as C, C++ or Python to convert the name into a particular form and then send to database?
I would rather have a method within SQL instead of using some another language
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
您可以将 PostgreSQL 全文搜索与 同义词词典。配置文件将包含
You could use PostgreSQL full text search with a thesaurus dictionary. The configuration file would then contain
维基百科说:
Soundex 是一种语音算法,用于按英语发音对名称进行索引。目标是将同音词编码为相同的表示形式,以便尽管拼写存在微小差异,但它们仍可以匹配。[1]该算法主要对辅音进行编码;元音除非是第一个字母,否则不会被编码。 Soundex 是所有语音算法中最广为人知的算法(部分原因是它是流行数据库软件的标准功能,例如 DB2、PostgreSQL、[2] MySQL、[3] SQLite、[4] Ingres、MS SQL Server、[5] ] Oracle。[6] 和 SAP ASE。[7]) Soundex 的改进是许多现代语音算法的基础
https://en.wikipedia.org/wiki/Soundex
Wikipedia says:
Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.[1] The algorithm mainly encodes consonants; a vowel will not be encoded unless it is the first letter. Soundex is the most widely known of all phonetic algorithms (in part because it is a standard feature of popular database software such as DB2, PostgreSQL,[2] MySQL,[3] SQLite,[4] Ingres, MS SQL Server,[5] Oracle.[6] and SAP ASE.[7]) Improvements to Soundex are the basis for many modern phonetic algorithms
https://en.wikipedia.org/wiki/Soundex
当执行这样的操作时:
unnest()
从名称Albert Einstein
返回部分Albert
和Einstein
,并计算分数。分数越高,最有可能匹配。它返回:
有关完整的 SQL,请参阅 DBFIDDLE
当然,此脚本还有改进的空间,因为像“Albert Camus”或“Siegfried Einstein”这样的作者在此代码中也获得了 1 分。
When doing something like this:
The
unnest()
returns the partsAlbert
andEinstein
from the nameAlbert Einstein
, and a score s counted. Higher the scores are most likely a match.It returns:
For complete SQL, see DBFIDDLE
There is, of course, room for improvement in this script because an author like 'Albert Camus' or 'Siegfried Einstein' also get a score of 1 in this code.