SQLite3 中带变音符号的排序顺序

发布于 2024-09-24 12:44:41 字数 219 浏览 10 评论 0原文

我正在使用 python/django 使用 sqlite 数据库。我需要对包含德语元音变音 (ä,ö,ü) 的结果进行正确排序(最后不使用元音变音)。在网上阅读不同的文章我什至不确定这是否可能。 因此,如有任何建议/说明,我们将不胜感激。我已经研究了 create_collat​​ion 等的文档,但我找不到任何对“初学者”有用的示例。此外,如果可能的话,我想知道如何对现有的表应用必要的修改!

I'm working with a sqlite database, using python/django. I would need to have my results, which contain german umlauts (ä,ö,ü) to be correctly sorted (not with the umlauts at the end). Reading different articles on the web I'm not even sure if it is possible or not.
So any advise/instructions on that are appreciated. I already studied the docs for create_collation etc. but I couldn't find any helpful examples for "beginners". Furthermore, if it is possible I'd like to know how to apply the necessary modifications on already existing tables!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

等风来 2024-10-01 12:44:41

所以任何建议/说明都是
赞赏。我已经研究过
create_collat​​ion 等的文档,但我
找不到任何有用的例子
“初学者”。

要使用 sqlite3 创建排序规则,您需要一个类似于 C 的 strcmp

def stricmp(str1, str2):
    str1 = str1.lower()
    str2 = str2.lower()
    if str1 == str2:
        return 0
    elif str1 < str2:
        return -1
    else:
        return 1

db = sqlite3.connect(':memory:')
# SQLite's default NOCASE collation is ASCII-only
# Override it with a (mostly) Unicode-aware version
db.create_collation('NOCASE', stricmp)

请注意,虽然此排序规则将正确处理 'ü' == 'Ü',但它仍然会具有 'ü' > 'v',因为字母在大小写折叠后仍然按 Unicode 代码点顺序排序。编写一个对德语友好的排序函数留给读者作为练习。或者更好的是,对于现有 Unicode 库的作者。

此外,如果可能的话我会
想知道如何应用
已经进行了必要的修改
现有表!

仅当您的索引使用已覆盖的排序规则时,才需要修改数据库。 删除该索引并重新创建它。

请注意,任何具有 UNIQUE(或 PRIMARY KEY)约束的列都将具有隐式索引。

So any advise/instructions on that are
appreciated. I already studied the
docs for create_collation etc. but I
couldn't find any helpful examples for
"beginners".

To create a collation with sqlite3, you need a function that works like C's strcmp.

def stricmp(str1, str2):
    str1 = str1.lower()
    str2 = str2.lower()
    if str1 == str2:
        return 0
    elif str1 < str2:
        return -1
    else:
        return 1

db = sqlite3.connect(':memory:')
# SQLite's default NOCASE collation is ASCII-only
# Override it with a (mostly) Unicode-aware version
db.create_collation('NOCASE', stricmp)

Note that although this collation will correctly handle 'ü' == 'Ü', it will still have 'ü' > 'v', because the letters still sort in Unicode code point order after case folding. Writing a German-friendly collation function is left as an exercise to the reader. Or better, to the author of an existing Unicode library.

Furthermore, if it is possible I'd
like to know how to apply the
necessary modifications on already
existing tables!

You only need to modify the DB if you have an index that uses a collation you've overridden. Drop that index and re-create it.

Note that any column with a UNIQUE (or PRIMARY KEY) constraint will have an implicit index.

彼岸花ソ最美的依靠 2024-10-01 12:44:41

一年前,这里曾提出过类似的问题

正如该问题的OP所述,答案对你来说可能有点过分了。不过,我确实推荐 James Tauber 的 Unicode 排序算法。

他的网页上有一个例子:

from pyuca import Collator

c = Collator("allkeys.txt") 
sorted_words = sorted(words, key=c.sort_key)

A similar question was asked on here 1 year ago.

The answer may be overkill for you, as stated by the OP of that question. I do, however, recommend James Tauber's Unicode Collation Algorithm.

An example is right on his webpage:

from pyuca import Collator

c = Collator("allkeys.txt") 
sorted_words = sorted(words, key=c.sort_key)
我也只是我 2024-10-01 12:44:41
import locale
locale.setlocale(locale.LC_ALL, 'cs_CZ.utf8')
con = sqlite3.connect(....)
con.create_collation("czech", locale.strcoll)
cur = con.cursor()
cur.execute("select name from people order by name collate czech")
import locale
locale.setlocale(locale.LC_ALL, 'cs_CZ.utf8')
con = sqlite3.connect(....)
con.create_collation("czech", locale.strcoll)
cur = con.cursor()
cur.execute("select name from people order by name collate czech")
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文