如何在 ruby 中对字母数字数组进行排序

发布于 2024-10-27 13:17:35 字数 640 浏览 6 评论 0原文

如何在 ruby 中按字母数字顺序对数组数据进行排序？

假设我的数组是 a = [test_0_1, test_0_2, test_0_3, test_0_4, test_0_5, test_0_6, test_0_7, test_0_8, test_0_9, test_1_0, test_1_1, test_1_2, test_1_3, test_1_4, test_1_5, test_1_6, test_1_7, test_1_8, 、测试_1_10、 test_1_11, test_1_12, test_1_13, test_1_14, ..........test_1_121.........................]

我希望我的输出是：

.
.
.
test_1_121
.
.
.
test_1_14
test_1_13
test_1_12
test_1_11
test_1_10
test_1_9
test_1_8
test_1_7
test_1_6
test_1_5
test_1_4
test_1_3
test_1_2
test_1_1
test_0_10
test_0_9
test_0_8
test_0_7
test_0_6
test_0_5
test_0_4
test_0_3
test_0_2
test_0_1

原文

How I can sort array data alphanumerically in ruby?

Suppose my array is a = [test_0_1, test_0_2, test_0_3, test_0_4, test_0_5, test_0_6, test_0_7, test_0_8, test_0_9, test_1_0, test_1_1, test_1_2, test_1_3, test_1_4, test_1_5, test_1_6, test_1_7, test_1_8, test_1_9, test_1_10, test_1_11, test_1_12, test_1_13, test_1_14, ...........test_1_121...............]

I want my output to be:

.
.
.
test_1_121
.
.
.
test_1_14
test_1_13
test_1_12
test_1_11
test_1_10
test_1_9
test_1_8
test_1_7
test_1_6
test_1_5
test_1_4
test_1_3
test_1_2
test_1_1
test_0_10
test_0_9
test_0_8
test_0_7
test_0_6
test_0_5
test_0_4
test_0_3
test_0_2
test_0_1

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

嘿哥们儿 2024-11-03 13:17:35

一种通用算法，用于对在任意位置包含非填充序列号的字符串进行排序。

padding = 4
list.sort{|a,b|
  a,b = [a,b].map{|s| s.gsub(/\d+/){|m| "0"*(padding - m.size) + m } }
  a<=>b
}

其中 padding 是您希望数字在比较期间具有的字段长度。字符串中找到的任何数字如果包含少于“填充”数量的数字，则在比较之前将用零填充，从而产生预期排序顺序。

要产生用户682932要求的结果，只需在排序块后添加.reverse，这会将自然排序（升序）翻转为降序。

通过对字符串进行预循环，您当然可以动态地找到字符串列表中的最大位数，您可以使用它来代替硬编码一些任意填充长度，但这将需要更多处理（更慢）和更多代码。例如

padding = list.reduce(0){|max,s| 
  x = s.scan(/\d+/).map{|m|m.size}.max
  (x||0) > max ? x : max
}

A generic algorithm for sorting strings that contain non-padded sequence numbers at arbitrary positions.

padding = 4
list.sort{|a,b|
  a,b = [a,b].map{|s| s.gsub(/\d+/){|m| "0"*(padding - m.size) + m } }
  a<=>b
}

where padding is the field length you want the numbers to have during comparison. Any number found in a string will be zero padded before comparison if it consists of less than "padding" number of digits, which yields the expected sorting order.

To yield the result asked for by the user682932, simply add .reverse after the sort block, which will flip the natural ordering (ascending) into a descending order.

With a pre-loop over the strings you can of course dynamically find the maximum number of digits in the list of strings, which you can use instead of hard-coding some arbitrary padding length, but that would require more processing (slower) and a bit more code. E.g.

padding = list.reduce(0){|max,s| 
  x = s.scan(/\d+/).map{|m|m.size}.max
  (x||0) > max ? x : max
}

回复收藏 0 原文

薔薇婲 2024-11-03 13:17:35

例如，如果您只是按字符串排序，则将无法获得“test_2”和“test_10”之间的正确顺序。所以这样做：

sort_by{|s| s.scan(/\d+/).map{|s| s.to_i}}.reverse

If you simply sort as string, you will not get the correct ordering between 'test_2' and 'test_10', for example. So do:

sort_by{|s| s.scan(/\d+/).map{|s| s.to_i}}.reverse

回复收藏 0 原文

过期情话 2024-11-03 13:17:35

您可以将块传递给排序函数以对其进行自定义排序。在您的情况下，您会遇到问题，因为您的数字没有用零填充，因此此方法对数字部分进行零填充，然后对它们进行排序，从而得到您想要的排序顺序。

a.sort { |a,b|
  ap = a.split('_')
  a = ap[0] + "%05d" % ap[1] + "%05d" % ap[2]
  bp = b.split('_')
  b = bp[0] + "%05d" % bp[1] + "%05d" % bp[2]
  b <=> a
}

You can pass a block to the sort function to custom sort it. In your case you will have a problem because your numbers aren't zero padded, so this method zero pads the numerical parts, then sorts them, resulting in your desired sort order.

a.sort { |a,b|
  ap = a.split('_')
  a = ap[0] + "%05d" % ap[1] + "%05d" % ap[2]
  bp = b.split('_')
  b = bp[0] + "%05d" % bp[1] + "%05d" % bp[2]
  b <=> a
}

回复收藏 0 原文

许你一世情深 2024-11-03 13:17:35

排序例程的处理时间可能差异很大。排序的基准测试变体可以快速找到最快的做事方式：

#!/usr/bin/env ruby

ary = %w[
    test_0_1  test_0_2   test_0_3 test_0_4 test_0_5  test_0_6  test_0_7
    test_0_8  test_0_9   test_1_0 test_1_1 test_1_2  test_1_3  test_1_4  test_1_5
    test_1_6  test_1_7   test_1_8 test_1_9 test_1_10 test_1_11 test_1_12 test_1_13
    test_1_14 test_1_121
]

require 'ap'
ap ary.sort_by { |v| a,b,c = v.split(/_+/); [a, b.to_i, c.to_i] }.reverse

及其输出：

>> [
>>     [ 0] "test_1_121",
>>     [ 1] "test_1_14",
>>     [ 2] "test_1_13",
>>     [ 3] "test_1_12",
>>     [ 4] "test_1_11",
>>     [ 5] "test_1_10",
>>     [ 6] "test_1_9",
>>     [ 7] "test_1_8",
>>     [ 8] "test_1_7",
>>     [ 9] "test_1_6",
>>     [10] "test_1_5",
>>     [11] "test_1_4",
>>     [12] "test_1_3",
>>     [13] "test_1_2",
>>     [14] "test_1_1",
>>     [15] "test_1_0",
>>     [16] "test_0_9",
>>     [17] "test_0_8",
>>     [18] "test_0_7",
>>     [19] "test_0_6",
>>     [20] "test_0_5",
>>     [21] "test_0_4",
>>     [22] "test_0_3",
>>     [23] "test_0_2",
>>     [24] "test_0_1"
>> ]

测试算法的速度显示：

require 'benchmark'

n = 50_000
Benchmark.bm(8) do |x|
  x.report('sort1') { n.times { ary.sort { |a,b| b <=> a }         } }
  x.report('sort2') { n.times { ary.sort { |a,b| a <=> b }.reverse } }
  x.report('sort3') { n.times { ary.sort { |a,b|
                                  ap = a.split('_')
                                  a = ap[0] + "%05d" % ap[1] + "%05d" % ap[2]
                                  bp = b.split('_')
                                  b = bp[0] + "%05d" % bp[1] + "%05d" % bp[2]
                                  b <=> a
                                } } }

  x.report('sort_by1') { n.times { ary.sort_by { |s| s                                               }         } }
  x.report('sort_by2') { n.times { ary.sort_by { |s| s                                               }.reverse } }
  x.report('sort_by3') { n.times { ary.sort_by { |s| s.scan(/\d+/).map{ |s| s.to_i }                 }.reverse } }
  x.report('sort_by4') { n.times { ary.sort_by { |v| a = v.split(/_+/); [a[0], a[1].to_i, a[2].to_i] }.reverse } }
  x.report('sort_by5') { n.times { ary.sort_by { |v| a,b,c = v.split(/_+/); [a, b.to_i, c.to_i]      }.reverse } }
end


>>               user     system      total        real
>> sort1     0.900000   0.010000   0.910000 (  0.919115)
>> sort2     0.880000   0.000000   0.880000 (  0.893920)
>> sort3    43.840000   0.070000  43.910000 ( 45.970928)
>> sort_by1  0.870000   0.010000   0.880000 (  1.077598)
>> sort_by2  0.820000   0.000000   0.820000 (  0.858309)
>> sort_by3  7.060000   0.020000   7.080000 (  7.623183)
>> sort_by4  6.800000   0.000000   6.800000 (  6.827472)
>> sort_by5  6.730000   0.000000   6.730000 (  6.762403)
>>

Sort1 和 sort2 以及 sort_by1 和 sort_by2 帮助建立 sort 的基线，sort_by 以及两个带有 reverse 的排序。

排序 sort3 和 sort_by3 是此页面上的另外两个答案。 Sort_by4 和 sort_by5 是我如何做的两个旋转，sort_by5 是我经过几分钟的修改后想出的最快的。

这显示了算法中的微小差异如何对最终输出产生影响。如果有更多的迭代，或者对更大的数组进行排序，则差异会更加极端。

Sort routines can have greatly varying processing times. Benchmarking variations of the sort can quickly home in on the fastest way to do things:

#!/usr/bin/env ruby

ary = %w[
    test_0_1  test_0_2   test_0_3 test_0_4 test_0_5  test_0_6  test_0_7
    test_0_8  test_0_9   test_1_0 test_1_1 test_1_2  test_1_3  test_1_4  test_1_5
    test_1_6  test_1_7   test_1_8 test_1_9 test_1_10 test_1_11 test_1_12 test_1_13
    test_1_14 test_1_121
]

require 'ap'
ap ary.sort_by { |v| a,b,c = v.split(/_+/); [a, b.to_i, c.to_i] }.reverse

And its output:

>> [
>>     [ 0] "test_1_121",
>>     [ 1] "test_1_14",
>>     [ 2] "test_1_13",
>>     [ 3] "test_1_12",
>>     [ 4] "test_1_11",
>>     [ 5] "test_1_10",
>>     [ 6] "test_1_9",
>>     [ 7] "test_1_8",
>>     [ 8] "test_1_7",
>>     [ 9] "test_1_6",
>>     [10] "test_1_5",
>>     [11] "test_1_4",
>>     [12] "test_1_3",
>>     [13] "test_1_2",
>>     [14] "test_1_1",
>>     [15] "test_1_0",
>>     [16] "test_0_9",
>>     [17] "test_0_8",
>>     [18] "test_0_7",
>>     [19] "test_0_6",
>>     [20] "test_0_5",
>>     [21] "test_0_4",
>>     [22] "test_0_3",
>>     [23] "test_0_2",
>>     [24] "test_0_1"
>> ]

Testing the algorithms for speed shows:

require 'benchmark'

n = 50_000
Benchmark.bm(8) do |x|
  x.report('sort1') { n.times { ary.sort { |a,b| b <=> a }         } }
  x.report('sort2') { n.times { ary.sort { |a,b| a <=> b }.reverse } }
  x.report('sort3') { n.times { ary.sort { |a,b|
                                  ap = a.split('_')
                                  a = ap[0] + "%05d" % ap[1] + "%05d" % ap[2]
                                  bp = b.split('_')
                                  b = bp[0] + "%05d" % bp[1] + "%05d" % bp[2]
                                  b <=> a
                                } } }

  x.report('sort_by1') { n.times { ary.sort_by { |s| s                                               }         } }
  x.report('sort_by2') { n.times { ary.sort_by { |s| s                                               }.reverse } }
  x.report('sort_by3') { n.times { ary.sort_by { |s| s.scan(/\d+/).map{ |s| s.to_i }                 }.reverse } }
  x.report('sort_by4') { n.times { ary.sort_by { |v| a = v.split(/_+/); [a[0], a[1].to_i, a[2].to_i] }.reverse } }
  x.report('sort_by5') { n.times { ary.sort_by { |v| a,b,c = v.split(/_+/); [a, b.to_i, c.to_i]      }.reverse } }
end


>>               user     system      total        real
>> sort1     0.900000   0.010000   0.910000 (  0.919115)
>> sort2     0.880000   0.000000   0.880000 (  0.893920)
>> sort3    43.840000   0.070000  43.910000 ( 45.970928)
>> sort_by1  0.870000   0.010000   0.880000 (  1.077598)
>> sort_by2  0.820000   0.000000   0.820000 (  0.858309)
>> sort_by3  7.060000   0.020000   7.080000 (  7.623183)
>> sort_by4  6.800000   0.000000   6.800000 (  6.827472)
>> sort_by5  6.730000   0.000000   6.730000 (  6.762403)
>>

Sort1 and sort2 and sort_by1 and sort_by2 help establish baselines for sort, sort_by and both of those with reverse.

Sorts sort3 and sort_by3 are two other answers on this page. Sort_by4 and sort_by5 are two spins on how I'd do it, with sort_by5 being the fastest I came up with after a few minutes of tinkering.

This shows how minor differences in the algorithm can make a difference in the final output. If there were more iterations, or larger arrays being sorted the differences would be more extreme.

回复收藏 0 原文

誰認得朕 2024-11-03 13:17:35

与 @ctcherry 答案类似，但速度更快：

a.sort_by {|s| "%s%05i%05i" % s.split('_') }.reverse

编辑：我的测试：

require 'benchmark'
ary = []
100_000.times { ary << "test_#{rand(1000)}_#{rand(1000)}" }
ary.uniq!; puts "Size: #{ary.size}"

Benchmark.bm(5) do |x|
  x.report("sort1") do
    ary.sort_by {|e| "%s%05i%05i" % e.split('_') }.reverse
  end
  x.report("sort2") do
    ary.sort { |a,b|
      ap = a.split('_')
      a = ap[0] + "%05d" % ap[1] + "%05d" % ap[2]
      bp = b.split('_')
      b = bp[0] + "%05d" % bp[1] + "%05d" % bp[2]
      b <=> a
    } 
  end
  x.report("sort3") do
    ary.sort_by { |v| a, b, c = v.split(/_+/); [a, b.to_i, c.to_i] }.reverse
  end
end

输出：

Size: 95166

           user     system      total        real
sort1  3.401000   0.000000   3.401000 (  3.394194)
sort2 94.880000   0.624000  95.504000 ( 95.722475)
sort3  3.494000   0.000000   3.494000 (  3.501201)

Similar to @ctcherry answer, but faster:

a.sort_by {|s| "%s%05i%05i" % s.split('_') }.reverse

EDIT: My tests:

require 'benchmark'
ary = []
100_000.times { ary << "test_#{rand(1000)}_#{rand(1000)}" }
ary.uniq!; puts "Size: #{ary.size}"

Benchmark.bm(5) do |x|
  x.report("sort1") do
    ary.sort_by {|e| "%s%05i%05i" % e.split('_') }.reverse
  end
  x.report("sort2") do
    ary.sort { |a,b|
      ap = a.split('_')
      a = ap[0] + "%05d" % ap[1] + "%05d" % ap[2]
      bp = b.split('_')
      b = bp[0] + "%05d" % bp[1] + "%05d" % bp[2]
      b <=> a
    } 
  end
  x.report("sort3") do
    ary.sort_by { |v| a, b, c = v.split(/_+/); [a, b.to_i, c.to_i] }.reverse
  end
end

Output:

Size: 95166

           user     system      total        real
sort1  3.401000   0.000000   3.401000 (  3.394194)
sort2 94.880000   0.624000  95.504000 ( 95.722475)
sort3  3.494000   0.000000   3.494000 (  3.501201)

回复收藏 0 原文

沙与沫 2024-11-03 13:17:35

在此发布一种在 Ruby 中执行自然十进制排序的更通用方法。
以下内容受到我的“像 Xcode”排序代码的启发，来自 https://github.com/CocoaPods/Xcodeproj/blob/ca7b41deb38f43c14d066f62a55edcd53876cd07/lib/xcodeproj/project/object/helpers/sort_helper.rb，本身松散地受到 https://rosettacode.org/wiki/Natural_sorting#Ruby。

即使很明显我们希望“10”位于“2”之后进行自然十进制排序，但对于所需的多种可能的替代行为，还需要考虑其他方面：

我们如何对待“001”/“01”等相等性：我们保持原始数组顺序还是有后备逻辑？（下面，选择在第一遍相等的情况下进行第二遍，并具有严格的排序逻辑）
我们是否忽略连续空格进行排序，还是每个空格字符都计数？（下面，选择在第一次传递时忽略连续空格，并对相等传递进行严格比较）
其他特殊字符的问题相同。（下面，选择将任何非空格和非数字字符单独计数）
我们是否忽略大小写？ “a”是在“A”之前还是之后？（下面，选择在第一次传递时忽略大小写，并且在相等传递中我们在“A”之前有“a”）

考虑到这些因素：

这意味着我们几乎肯定应该使用 scan 而不是split，因为我们可能会比较三种子字符串（数字、空格、所有其他）。
这意味着我们几乎肯定应该使用 Comparable 类和 def <=>(other)，因为不可能简单地 map > 每个子字符串到其他东西，根据上下文（第一遍和相等遍），这些东西将有两种不同的行为。

这会导致实现有点冗长，但它非常适合边缘情况：

  # Wrapper for a string that performs a natural decimal sort (alphanumeric).
  # @example
  #   arrayOfFilenames.sort_by { |s| NaturalSortString.new(s) }
  class NaturalSortString
    include Comparable
    attr_reader :str_fallback, :ints_and_strings, :ints_and_strings_fallback, :str_pattern

    def initialize(str)
      # fallback pass: case is inverted
      @str_fallback = str.swapcase
      # first pass: digits are used as integers, spaces are compacted, case is ignored
      @ints_and_strings = str.scan(/\d+|\s+|[^\d\s]+/).map do |s|
        case s
        when /\d/ then Integer(s, 10)
        when /\s/ then ' '
        else s.downcase
        end
      end
      # second pass: digits are inverted, case is inverted
      @ints_and_strings_fallback = @str_fallback.scan(/\d+|\D+/).map do |s|
        case s
        when /\d/ then Integer(s.reverse, 10)
        else s
        end
      end
      # comparing patterns
      @str_pattern = @ints_and_strings.map { |el| el.is_a?(Integer) ? :i : :s }.join
    end

    def <=>(other)
      if str_pattern.start_with?(other.str_pattern) || other.str_pattern.start_with?(str_pattern)
        compare = ints_and_strings <=> other.ints_and_strings
        if compare != 0
          # we sort naturally (literal ints, spaces simplified, case ignored)
          compare
        else
          # natural equality, we use the fallback sort (int reversed, case swapped)
          ints_and_strings_fallback <=> other.ints_and_strings_fallback
        end
      else
        # type mismatch, we sort alphabetically (case swapped)
        str_fallback <=> other.str_fallback
      end
    end
  end

使用

示例 1：

arrayOfFilenames.sort_by { |s| NaturalSortString.new(s) }

示例 2：

arrayOfFilenames.sort! do |x, y|
  NaturalSortString.new(x) <=> NaturalSortString.new(y)
end

您可以在 https://github.com/CocoaPods/Xcodeproj/blob/ca7b41deb38f43c14d066f62a55edcd53876cd07/spec/project/object/helpers/sort_helper_spec.rb，我使用此参考进行订购：
[
'一',
'一',
'0.1.1',
'0.1.01',
'0.1.2',
'0.1.10',
'1',
'01',
'1a',
'2',
'2a',
'10',
'一个'，
'一个'，
'一',
'a 2',
'a1',
'A1B001',
'A01B1',
】

当然，现在就可以随意定制您自己的排序逻辑。

Posting here a more general way to perform a natural decimal sort in Ruby.
The following is inspired by my code for sorting "like Xcode" from https://github.com/CocoaPods/Xcodeproj/blob/ca7b41deb38f43c14d066f62a55edcd53876cd07/lib/xcodeproj/project/object/helpers/sort_helper.rb, itself loosely inspired by https://rosettacode.org/wiki/Natural_sorting#Ruby.

Even if it's clear that we want "10" to be after "2" for a natural decimal sort, there are other aspects to consider with multiple possible alternative behaviors wanted:

How do we treat equality like "001"/"01": do we keep the original array order or do we have a fallback logic? (Below, choice is made to have a second pass with a strict ordering logic in case of equality during first pass)
Do we ignore consecutive spaces for sorting, or does each space character count? (Below, choice is made to ignore consecutive spaces on first pass, and have a strict comparison on the equality pass)
Same question for other special characters. (Below, choice is made to make any non-space and non-digit character count individually)
Do we ignore case or not; is "a" before or after "A"? (Below, choice is made to ignore case on first pass, and we have "a" before "A" on the equality pass)

With those considerations:

It means that we should almost certainly use scan instead of split, because we're going to have potentially three kinds of substrings to compare (digits, spaces, all-the-rest).
It means that we should almost certainly work with a Comparable class and with def <=>(other) because it's not possible to simply map each substring to something else that would have two distinct behaviors depending on context (the first pass and the equality pass).

This results in a bit lengthy implementation, but it works nicely for edge situations:

  # Wrapper for a string that performs a natural decimal sort (alphanumeric).
  # @example
  #   arrayOfFilenames.sort_by { |s| NaturalSortString.new(s) }
  class NaturalSortString
    include Comparable
    attr_reader :str_fallback, :ints_and_strings, :ints_and_strings_fallback, :str_pattern

    def initialize(str)
      # fallback pass: case is inverted
      @str_fallback = str.swapcase
      # first pass: digits are used as integers, spaces are compacted, case is ignored
      @ints_and_strings = str.scan(/\d+|\s+|[^\d\s]+/).map do |s|
        case s
        when /\d/ then Integer(s, 10)
        when /\s/ then ' '
        else s.downcase
        end
      end
      # second pass: digits are inverted, case is inverted
      @ints_and_strings_fallback = @str_fallback.scan(/\d+|\D+/).map do |s|
        case s
        when /\d/ then Integer(s.reverse, 10)
        else s
        end
      end
      # comparing patterns
      @str_pattern = @ints_and_strings.map { |el| el.is_a?(Integer) ? :i : :s }.join
    end

    def <=>(other)
      if str_pattern.start_with?(other.str_pattern) || other.str_pattern.start_with?(str_pattern)
        compare = ints_and_strings <=> other.ints_and_strings
        if compare != 0
          # we sort naturally (literal ints, spaces simplified, case ignored)
          compare
        else
          # natural equality, we use the fallback sort (int reversed, case swapped)
          ints_and_strings_fallback <=> other.ints_and_strings_fallback
        end
      else
        # type mismatch, we sort alphabetically (case swapped)
        str_fallback <=> other.str_fallback
      end
    end
  end

Usage

Example 1:

arrayOfFilenames.sort_by { |s| NaturalSortString.new(s) }

Example 2:

arrayOfFilenames.sort! do |x, y|
  NaturalSortString.new(x) <=> NaturalSortString.new(y)
end

You may find my test case at https://github.com/CocoaPods/Xcodeproj/blob/ca7b41deb38f43c14d066f62a55edcd53876cd07/spec/project/object/helpers/sort_helper_spec.rb, where I used this reference for ordering:
[
' a',
' a',
'0.1.1',
'0.1.01',
'0.1.2',
'0.1.10',
'1',
'01',
'1a',
'2',
'2 a',
'10',
'a',
'A',
'a ',
'a 2',
'a1',
'A1B001',
'A01B1',
]

Of course, feel free to customize your own sorting logic now.

回复收藏 0 原文

偷得浮生 2024-11-03 13:17:35

我检查了 Unix sort 函数的 Wikipedia 页面，该函数的 GNU 版本有一个 -V 标志，可以对“版本字符串”进行一般排序。（我认为这意味着数字和非数字的混合，您希望数字部分按数字排序，非数字部分按词法排序）。

文章指出：

GNU 实现有一个 -V --version-sort 选项，它是文本中（版本）数字的自然排序。要比较的两个文本字符串被分成字母块和数字块。字母块按字母数字进行比较，数字块按数字进行比较（即，跳过前导零，数字越多意味着越大，否则最左边不同的数字决定结果）。从左到右比较块，循环中的第一个不相等的块决定哪个文本更大。这恰好适用于 IP 地址、Debian 软件包版本字符串以及在字符串中嵌入可变长度数字的类似任务。

sawa 的解决方案有点像这样，但不按非数字部分排序。

因此，在 Coeur 和 sawa 的，其工作方式类似于 GNU sort -V

a.sort_by do |r|
  # Split the field into array of [<string>, nil] or [nil, <number>] pairs
  r.to_s.scan(/(\D+)|(\d+)/).map do |m|
    s,n = m
    n ? n.to_i : s.to_s # Convert number strings to integers
  end.to_a
end

在我的例子中，我想按这样的字段对 TSV 文件进行排序，所以作为奖励，这也是这种情况的脚本：（

require 'csv'

# Sorts a tab-delimited file input on STDIN, sortin

opts = {
  headers:true,
  col_sep: "\t",
  liberal_parsing: true,
}

table = CSV.new($stdin, **opts)


# Emulate unix's sort -V: split each field into an array of string or
# numeric values, and sort by those in turn. So for example, A10
# sorts above A100.
sorted_ary = table.sort_by do |r|
  r.fields.map do |f|
    # Split the field into array of [<string>, nil] or [nil, <number>] values
    f.to_s.scan(/(\D+)|(\d+)/).map do |m|
      s,n = m
      n ? n.to_i : s.to_s # Convert number strings to integers
    end.to_a
  end
end

puts CSV::Table.new(sorted_ary).to_csv(**opts)

旁白：另一个解决方案这里使用 Gem::Version 进行排序，但这似乎只适用于格式良好的 Gem 版本字符串。）

I checked the Wikipedia page for the Unix sort function, the GNU version of which has a -V flag which sorts "version strings" generically. (I take this to mean mixtures of digits and non-digits, where you want the numeric parts to be sorted numerically, and the non-numeric parts lexically).

The article states that:

The GNU implementation has a -V --version-sort option which is a natural sort of (version) numbers within text. Two text strings that are to be compared are split into blocks of letters and blocks of digits. Blocks of letters are compared alpha-numerically, and blocks of digits are compared numerically (i.e., skipping leading zeros, more digits means larger, otherwise the leftmost digits that differ determine the result). Blocks are compared left-to-right and the first non-equal block in that loop decides which text is larger. This happens to work for IP addresses, Debian package version strings and similar tasks where numbers of variable length are embedded in strings.

sawa's solution works somewhat like this but doesn't sort by non-numeric parts.

So it seems useful to post a solution somewhere between Coeur and sawa's, which works like GNU sort -V

a.sort_by do |r|
  # Split the field into array of [<string>, nil] or [nil, <number>] pairs
  r.to_s.scan(/(\D+)|(\d+)/).map do |m|
    s,n = m
    n ? n.to_i : s.to_s # Convert number strings to integers
  end.to_a
end

In my case, I wanted to sort a TSV file by its fields like this, so as a bonus, here's the script for this case too:

require 'csv'

# Sorts a tab-delimited file input on STDIN, sortin

opts = {
  headers:true,
  col_sep: "\t",
  liberal_parsing: true,
}

table = CSV.new($stdin, **opts)


# Emulate unix's sort -V: split each field into an array of string or
# numeric values, and sort by those in turn. So for example, A10
# sorts above A100.
sorted_ary = table.sort_by do |r|
  r.fields.map do |f|
    # Split the field into array of [<string>, nil] or [nil, <number>] values
    f.to_s.scan(/(\D+)|(\d+)/).map do |m|
      s,n = m
      n ? n.to_i : s.to_s # Convert number strings to integers
    end.to_a
  end
end

puts CSV::Table.new(sorted_ary).to_csv(**opts)

(Aside: another solution here sorts using Gem::Version, but that only seems to work with well formed Gem version strings.)

回复收藏 0 原文

紅太極 2024-11-03 13:17:35

从表面上看，您想要使用排序函数和/或反向函数。

ruby-1.9.2-p136 :009 > a = ["abc_1", "abc_11", "abc_2", "abc_3", "abc_22"]
 => ["abc_1", "abc_11", "abc_2", "abc_3", "abc_22"] 

ruby-1.9.2-p136 :010 > a.sort
 => ["abc_1", "abc_11", "abc_2", "abc_22", "abc_3"] 
ruby-1.9.2-p136 :011 > a.sort.reverse
 => ["abc_3", "abc_22", "abc_2", "abc_11", "abc_1"]

From the looks of it, you want to use the sort function and/or the reverse function.

ruby-1.9.2-p136 :009 > a = ["abc_1", "abc_11", "abc_2", "abc_3", "abc_22"]
 => ["abc_1", "abc_11", "abc_2", "abc_3", "abc_22"] 

ruby-1.9.2-p136 :010 > a.sort
 => ["abc_1", "abc_11", "abc_2", "abc_22", "abc_3"] 
ruby-1.9.2-p136 :011 > a.sort.reverse
 => ["abc_3", "abc_22", "abc_2", "abc_11", "abc_1"]

回复收藏 0 原文