Sorting values with a specific collation in Ruby/Rails

Posted 2024-10-27 03:37:15


Is it possible to sort an array of values using a specific collation in Ruby? I have a need to sort according to the da_DK collation.

Given the array %w(Aarhus Aalborg Assens) I would like to have ['Assens', 'Aalborg', 'Aarhus'] back which is the correct order in Danish.

The standard sort method

%w(Aarhus Aalborg Assens).sort

returns something that looks like ASCII order (at least it is not the Danish order):

["Aalborg", "Aarhus", "Assens"]
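Here is a quick check (a small sketch added here, not part of the original question). The default order matches sorting on each string's raw bytes, because `String#<=>` compares strings byte by byte:

```ruby
names = %w(Aarhus Aalborg Assens)

# Default sort: String#<=> compares the underlying bytes.
default_order = names.sort

# Sort explicitly on the byte values of each string
# (bytes.to_a because String#bytes returns an Enumerator on Ruby 1.9).
by_bytes = names.sort_by { |w| w.bytes.to_a }

p default_order  # => ["Aalborg", "Aarhus", "Assens"]
p by_bytes       # => ["Aalborg", "Aarhus", "Assens"]
```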

The environment is both Snow Leopard and Linux, running Ruby 1.9.2 and Rails 3.0.5.


始终不够爱げ你 2024-11-03 03:37:15


According to Wikipedia:

"In the Danish and Norwegian alphabets, the same extra vowels as in Swedish (see below) are also present but in a different order and with different glyphs (..., X, Y, Z, Æ, Ø, Å). Also, "Aa" collates as an equivalent to "Å". The Danish alphabet has traditionally seen "W" as a variant of "V", but today "W" is considered a separate letter."

This throws off Ruby's default sort, which compares strings byte by byte and knows nothing about these rules.

Do this to fix the problem:

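# encoding: utf-8
# (on Ruby 1.9 the magic comment above must be the file's first line, because of the literal 'Å')
names = %w(Aarhus Aalborg Assens)
names.sort_by { |w| w.gsub('Aa', 'Å') } # => ["Assens", "Aalborg", "Aarhus"]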

Do something similar for any other letter combinations that should be converted to a single character before sorting.
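The gsub trick has one caveat. It relies on 'Å' sorting after the ASCII letters, which holds in UTF-8. However, UTF-8 byte order puts Å (U+00C5) before Æ (U+00C6) and Ø (U+00D8), whereas Danish orders them Æ, Ø, Å. Below is a sketch of a more general sort key. It assumes a simplified alphabet (a to z, then æ, ø, å), case folding, every "aa" read as å, and no special handling of W. `DANISH_ALPHABET`, `DANISH_RANK` and `danish_key` are hypothetical names, not a library API:

```ruby
# encoding: utf-8
# Hypothetical helper: simplified Danish alphabet, a..z followed by æ, ø, å.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå".chars.to_a
DANISH_RANK = Hash[DANISH_ALPHABET.each_with_index.to_a]

def danish_key(word)
  # Fold case (tr covers Æ/Ø/Å, which String#downcase ignores before Ruby 2.4)
  # and treat the digraph "aa" as the letter å.
  folded = word.downcase.tr("ÆØÅ", "æøå").gsub("aa", "å")
  # Map each character to its alphabet position; characters outside the
  # alphabet sort after å, ordered by code point (a simplification).
  folded.each_char.map { |c| DANISH_RANK.fetch(c) { DANISH_ALPHABET.size + c.ord } }
end

cities  = %w(Aarhus Aalborg Assens).sort_by { |w| danish_key(w) }
samples = %w(Åby Ølby Ærø Vejle)

p cities                                  # => ["Assens", "Aalborg", "Aarhus"]
p samples.sort_by { |w| danish_key(w) }   # => ["Vejle", "Ærø", "Ølby", "Åby"]
p samples.sort                            # => ["Vejle", "Åby", "Ærø", "Ølby"] (byte order)
```

Because the keys are arrays of integers, `Array#<=>` compares them position by position. A word that is a prefix of another sorts first. Real Danish collation has more rules than this sketch covers.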

The reason the gsub version works is that sort_by performs a Schwartzian transform: it sorts by the value returned from the block, which in this case is the name with 'Aa' replaced by 'Å'. The replacement is temporary and is discarded once the array is sorted.
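To make the Schwartzian transform concrete, here is the same sort written out as explicit decorate, sort and undecorate steps. This is a conceptual sketch of what sort_by does, not its actual implementation:

```ruby
# encoding: utf-8
names = %w(Aarhus Aalborg Assens)

# Decorate: pair each name with its sort key (a temporary 'Aa' -> 'Å' copy).
decorated = names.map { |w| [w.gsub('Aa', 'Å'), w] }

# Sort: compare only the keys.
decorated = decorated.sort { |a, b| a[0] <=> b[0] }

# Undecorate: drop the keys and keep the original strings.
sorted = decorated.map { |_key, w| w }

p sorted  # => ["Assens", "Aalborg", "Aarhus"]
p names   # => ["Aarhus", "Aalborg", "Assens"] (gsub returned copies; originals untouched)
```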

sort_by is very powerful, but it has some overhead, because it builds a temporary array of keys before sorting. For a simple sort you should use sort, because it's faster. For sorts where you're comparing two simple values at the top level of an object, it's a wash whether you use sort or sort_by. If you have to do more complex calculations or dig around in an object, sort_by can prove faster, because it computes each key once per element instead of once per comparison. There isn't a hard-and-fast way to know which is better. If you have to sort large arrays or deal with objects, I strongly recommend benchmarking: the difference can be large, and sometimes sort is the better choice.
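Below is a minimal benchmark sketch using Ruby's standard `benchmark` library. The data set is synthetic (random mixed-case words), and timings vary by machine and Ruby version, so no numbers are shown:

```ruby
require 'benchmark'

# Synthetic data, for timing only: 20_000 random 8-letter mixed-case words.
LETTERS = ('a'..'z').to_a + ('A'..'Z').to_a
words = Array.new(20_000) { Array.new(8) { LETTERS.sample }.join }

Benchmark.bm(16) do |x|
  # Plain sort: no block, String#<=> comparisons run in C.
  x.report("sort")            { words.sort }
  # Block comparator: downcase is recomputed on every comparison.
  x.report("sort with block") { words.sort { |a, b| a.downcase <=> b.downcase } }
  # sort_by: downcase runs once per element, then the keys are sorted.
  x.report("sort_by")         { words.sort_by { |w| w.downcase } }
end
```

The last two reports produce the same case-insensitive order. Words that differ only in case may come out in either order, since neither sort is stable.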

EDIT:

Ruby, by itself, isn't going to do what you want, because it has no knowledge of the sort order of every character set out there. There's a discussion regarding incorporating IBM's ICU that explains why that is. If you want ICU's abilities, you could look into ICU4R. I haven't played with it, but it sounds like your only real solution in Ruby.

You might be able to do something with a database like Postgres. Databases support various collation options, though older versions usually force you to declare the collation when you create the database (PostgreSQL 9.1 added per-column and per-expression COLLATE clauses). Anyway, that'd be an option, though it would be a pain.

娇俏 2024-11-03 03:37:15


I found ffi-locale on GitHub, and as far as I can see it solves my problem.

It allows the following code:

FFILocale::setlocale FFILocale::LC_COLLATE, 'da_DK.UTF-8'
%w(Aarhus Aalborg Assens).sort { |a,b| FFILocale::strcoll(a, b) }

Which returns the correct result:

=> ["Assens", "Aalborg", "Aarhus"]

I haven't investigated performance yet, but it calls out to native code, so it ought to be faster than the Ruby character-replacement code...

Update
It is not perfect :( It does not work properly on Snow Leopard. It seems that the strcoll function is broken on OS X and has been for some time. That is annoying to me, but the main deployment platform is Linux, where it works, so it is currently my preferred solution.
