将相似名称转换为 SQL 中的规范形式

发布于 2025-01-09 05:21:32 字数 296 浏览 2 评论 0原文

我有一个书籍数据库。在某些情况下,作者可以使用其姓名的变体作为作者。

例如。 Albert Einstein 可以写成 Albert E. 或 A. Einstein

现在我需要创建一个方法,例如 SQL 数据库,并根据这种情况告诉我作者的 ID。

是否有内置方法可以将名称的多个变体转换为 SQL 语言(例如 PostgreSQL)中的单一类型?

或者我应该使用其他语言(例如 C、C++ 或 Python)更改名称,将名称转换为特定形式,然后发送到数据库?

我宁愿在 SQL 中使用方法而不是使用其他语言

I have a database of books. There can be situations where an author can use variations of their names as author.

For eg. Albert Einstein can be written as Albert E. or A. Einstein

Now I need to create a method such that SQL database and tell me the ID of the author based on such situations.

Is there a built in way to convert these multiple variations of a name into a single type within SQL language such as PostgreSQL?

Or Should I change the name using some another language such as C, C++ or Python to convert the name into a particular form and then send to database?

I would rather have a method within SQL instead of using some another language

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

左岸枫 2025-01-16 05:21:32

您可以将 PostgreSQL 全文搜索与 同义词词典。配置文件将包含

Albert E. : Albert Einstein
A. Einstein : Albert Einstein

You could use PostgreSQL full text search with a thesaurus dictionary. The configuration file would then contain

Albert E. : Albert Einstein
A. Einstein : Albert Einstein
夏の忆 2025-01-16 05:21:32

维基百科说:

Soundex 是一种语音算法,用于按英语发音对名称进行索引。目标是将同音词编码为相同的表示形式,以便尽管拼写存在微小差异,但它们仍可以匹配。[1]该算法主要对辅音进行编码;元音除非是第一个字母,否则不会被编码。 Soundex 是所有语音算法中最广为人知的算法(部分原因是它是流行数据库软件的标准功能,例如 DB2、PostgreSQL、[2] MySQL、[3] SQLite、[4] Ingres、MS SQL Server、[5] ] Oracle。[6] 和 SAP ASE。[7]) Soundex 的改进是许多现代语音算法的基础

https://en.wikipedia.org/wiki/Soundex

Wikipedia says:

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.[1] The algorithm mainly encodes consonants; a vowel will not be encoded unless it is the first letter. Soundex is the most widely known of all phonetic algorithms (in part because it is a standard feature of popular database software such as DB2, PostgreSQL,[2] MySQL,[3] SQLite,[4] Ingres, MS SQL Server,[5] Oracle.[6] and SAP ASE.[7]) Improvements to Soundex are the basis for many modern phonetic algorithms

https://en.wikipedia.org/wiki/Soundex

温柔女人霸气范 2025-01-16 05:21:32

当执行这样的操作时:

select 
   id, 
   (Author) as Author,
   sum(case when length(name)>2 
            then 1 
            else 0 end) score
from 
   books 
   left join unnest(pg_catalog.string_to_array('Albert Einstein', ' ')) name
         on Author like '%'||name||'%'
group by id
order by score DESC

unnest() 从名称 Albert Einstein 返回部分 AlbertEinstein,并计算分数。分数越高,最有可能匹配。

它返回:

id作者分数
1Albert Einstein2
2A. Einstein1
3Albert E.1
4Anne Ellit0
6Arthur Eddington0
5Allan W. Eckert0

有关完整的 SQL,请参阅 DBFIDDLE

当然,此脚本还有改进的空间,因为像“Albert Camus”或“Siegfried Einstein”这样的作者在此代码中也获得了 1 分。

When doing something like this:

select 
   id, 
   (Author) as Author,
   sum(case when length(name)>2 
            then 1 
            else 0 end) score
from 
   books 
   left join unnest(pg_catalog.string_to_array('Albert Einstein', ' ')) name
         on Author like '%'||name||'%'
group by id
order by score DESC

The unnest() returns the parts Albert and Einstein from the name Albert Einstein, and a score s counted. Higher the scores are most likely a match.

It returns:

idauthorscore
1Albert Einstein2
2A. Einstein1
3Albert E.1
4Anne Ellit0
6Arthur Eddington0
5Allan W. Eckert0

For complete SQL, see DBFIDDLE

There is, of course, room for improvement in this script because an author like 'Albert Camus' or 'Siegfried Einstein' also get a score of 1 in this code.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文