What is the best way to chop a string into chunks of a given length in Ruby?

Posted 2024-07-17 07:49:52

I have been looking for an elegant and efficient way to chunk a string into substrings of a given length in Ruby.

So far, the best I could come up with is this:

def chunk(string, size)
  (0..(string.length-1)/size).map{|i|string[i*size,size]}
end

>> chunk("abcdef",3)
=> ["abc", "def"]
>> chunk("abcde",3)
=> ["abc", "de"]
>> chunk("abc",3)
=> ["abc"]
>> chunk("ab",3)
=> ["ab"]
>> chunk("",3)
=> []

You might want chunk("", n) to return [""] instead of []. If so, just add this as the first line of the method:

return [""] if string.empty?

Would you recommend any better solution?

Edit

Thanks to Jeremy Ruten for this elegant and efficient solution: [edit: NOT efficient!]

def chunk(string, size)
    string.scan(/.{1,#{size}}/)
end

Edit

The string.scan solution takes about 60 seconds to chop 512k into 1k chunks 10000 times, compared with the original slice-based solution which only takes 2.4 seconds.
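
To reproduce a comparison like this one, a minimal benchmark sketch using Ruby's standard Benchmark module might look as follows. The method names chunk_slice and chunk_scan, the test data, and the reduced run count are assumptions made for illustration; absolute timings will differ by Ruby version and machine.

```ruby
require 'benchmark'

# Slice-based version from the question (renamed for the comparison).
def chunk_slice(string, size)
  (0..(string.length - 1) / size).map { |i| string[i * size, size] }
end

# Regex-based version from the question (renamed for the comparison).
def chunk_scan(string, size)
  string.scan(/.{1,#{size}}/)
end

data = 'x' * (512 * 1024) # 512k of single-byte, newline-free data (assumed input)
runs = 20                 # assumed; far fewer than the 10000 runs quoted above

Benchmark.bm(6) do |bm|
  bm.report('slice') { runs.times { chunk_slice(data, 1024) } }
  bm.report('scan')  { runs.times { chunk_scan(data, 1024) } }
end
```

Both methods return the same chunks for this input, so only the timings differ.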

Comments (10)

月隐月明月朦胧 2024-07-24 07:49:52

Use String#scan:

>> 'abcdefghijklmnopqrstuvwxyz'.scan(/.{4}/)
=> ["abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx"]
>> 'abcdefghijklmnopqrstuvwxyz'.scan(/.{1,4}/)
=> ["abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx", "yz"]
>> 'abcdefghijklmnopqrstuvwxyz'.scan(/.{1,3}/)
=> ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"]
咽泪装欢 2024-07-24 07:49:52

Here is another way to do it:

"abcdefghijklmnopqrstuvwxyz".chars.to_a.each_slice(3).to_a.map {|s| s.join }

Or, more concisely:

"abcdefghijklmnopqrstuvwxyz".chars.each_slice(3).map(&:join)

Either way, the result is:

=> ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz"]
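
Because this approach works on characters rather than bytes, it should keep multi-byte characters intact. A small sketch, where the helper name chunk_chars and the sample text are assumptions for illustration:

```ruby
# Hypothetical helper: chunks by characters, so multi-byte
# characters are never split in the middle.
def chunk_chars(string, size)
  string.chars.each_slice(size).map(&:join)
end

p chunk_chars("héllo wörld", 4) # => ["héll", "o wö", "rld"]
p chunk_chars("", 4)            # => []
```

The trade-off is that an intermediate array of single-character strings is built first, which may be costly for large inputs.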
是你 2024-07-24 07:49:52

I think this is the most efficient solution if you know your string length is a multiple of the chunk size:

def chunk(string, size)
    (string.length / size).times.collect { |i| string[i * size, size] }
end

And for splitting into a given number of parts:

def parts(string, count)
    size = string.length / count
    count.times.collect { |i| string[i * size, size] }
end
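
Note that the integer division drops any remainder when the length is not an exact multiple. The sketch below contrasts that behaviour with a ceiling-division variant; the names chunk_exact and chunk_ceil are assumptions for illustration and not part of the answer above.

```ruby
# Same logic as the answer's chunk: assumes an exact multiple.
def chunk_exact(string, size)
  (string.length / size).times.collect { |i| string[i * size, size] }
end

# Variant using ceiling division so a shorter final chunk is kept.
def chunk_ceil(string, size)
  count = (string.length + size - 1) / size
  count.times.collect { |i| string[i * size, size] }
end

p chunk_exact("abcde", 3) # => ["abc"]
p chunk_ceil("abcde", 3)  # => ["abc", "de"]
p chunk_ceil("abcdef", 3) # => ["abc", "def"]
```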
腻橙味 2024-07-24 07:49:52

I made a little test that chops about 593MB of data into 18991 pieces of 32KB each.
Your slice+map version ran for at least 15 minutes at 100% CPU before I pressed Ctrl+C. This version using String#unpack finished in 3.6 seconds:

def chunk(string, size)
  string.unpack("a#{size}" * (string.size/size.to_f).ceil)
end
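
One thing to keep in mind is that the "a" directive of String#unpack counts bytes, not characters. The sketch below illustrates this; the helper name chunk_bytes is hypothetical, and it uses bytesize instead of size so that the number of directives matches the byte length, which is an assumption of this sketch rather than the code above.

```ruby
# Hypothetical byte-oriented variant of the unpack approach.
def chunk_bytes(string, size)
  string.unpack("a#{size}" * (string.bytesize / size.to_f).ceil)
end

p chunk_bytes("abcdefgh", 3)              # => ["abc", "def", "gh"]

# "héllo" is 5 characters but 6 bytes in UTF-8, so it yields
# three 2-byte pieces, and the "é" is cut in half.
p chunk_bytes("héllo", 2).map(&:bytesize) # => [2, 2, 2]
```

For binary data or pure ASCII text this distinction does not matter.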
血之狂魔 2024-07-24 07:49:52

Here is another solution for a slightly different case: processing large strings when there is no need to store all the chunks at once. It holds only a single chunk at a time and performs much faster than slicing strings:

require 'stringio'

# string, chunk_size and do_something are supplied by the caller
io = StringIO.new(string)
until io.eof?
  chunk = io.read(chunk_size) # reads up to chunk_size bytes
  do_something(chunk)
end
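
The same idea can be wrapped in a method that yields each chunk, or returns an Enumerator when no block is given. This is only a sketch; the method name each_chunk is an assumption for illustration.

```ruby
require 'stringio'

# Hypothetical helper: yields successive chunks read through StringIO.
# Returns an Enumerator when called without a block.
def each_chunk(string, chunk_size)
  return enum_for(__method__, string, chunk_size) unless block_given?

  io = StringIO.new(string)
  yield io.read(chunk_size) until io.eof?
end

each_chunk("abcdefgh", 3) { |chunk| puts chunk }
p each_chunk("abcdefgh", 3).to_a # => ["abc", "def", "gh"]
```

As with the loop above, StringIO#read with a length counts bytes, so multi-byte characters may be split.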
云淡月浅 2024-07-24 07:49:52
test.split(/(...)/).reject {|v| v.empty?}

The reject is necessary because otherwise the result includes the empty strings between the captured groups. My regex-fu isn't quite up to seeing how to fix that off the top of my head.
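
To see where those empty strings come from, here is a short sketch; the sample value assigned to test is an assumption for illustration.

```ruby
test = "abcdefgh"

# With a capture group, split keeps the matched text, but it also
# emits the (empty) field found before each match.
raw = test.split(/(...)/)
p raw                          # => ["", "abc", "", "def", "gh"]
p raw.reject { |v| v.empty? }  # => ["abc", "def", "gh"]
```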

凉薄对峙 2024-07-24 07:49:52

A better solution which takes into account the last part of the string which could be less than the chunk size:

def chunk(inStr, sz)
  return [inStr] if inStr.length < sz
  m = inStr.length % sz # length of the last, shorter part (0 if none)
  partial = (inStr.length / sz).times.collect { |i| inStr[i * sz, sz] }
  partial << inStr[-m..-1] if m != 0 # add the last part
  partial
end
真心难拥有 2024-07-24 07:49:52

Are there some other constraints you have in mind? Otherwise I'd be awfully tempted to do something simple like

(0..10).map { |i|
   str[(i*w),w]
}
假装不在乎 2024-07-24 07:49:52

Just text.scan(/.{1,4}/m) solves the problem. The m flag lets . match newlines as well.
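
The difference the m flag makes can be seen in a short sketch; the sample text is an assumption for illustration.

```ruby
text = "ab\ncdef"

# Without /m, "." does not match "\n", so the newline is silently dropped
# and chunks never span a line break.
p text.scan(/.{1,4}/)  # => ["ab", "cdef"]

# With /m, "." matches any character, so every chunk has exactly
# 4 characters except possibly the last.
p text.scan(/.{1,4}/m) # => ["ab\nc", "def"]
```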

£烟消云散 2024-07-24 07:49:52

I personally followed the idea of user8556428, to avoid the costly intermediate values that most proposals introduce, and to avoid modifying the input string. And I want to be able to use it as a generator (for instance to use s.each_slice.with_index).

My use case is really about bytes, not characters. In the case of character-size, strscan is a great solution.

class String
    # Slices of fixed byte-length.  May cut multi-byte characters.
    def each_slice(n = 1000)
        # Without a block, return an Enumerator so the method can be chained.
        return enum_for(__method__, n) unless block_given?
        return if empty?

        # bytesize/byteslice count bytes, matching the comment above.
        last = (bytesize - 1) / n
        (0 .. last).each do |i|
            yield byteslice(i * n, n)
        end
    end
end


p "abcdef".each_slice(3).to_a # => ["abc", "def"]
p "abcde".each_slice(3).to_a  # => ["abc", "de"]
p "abc".each_slice(3).to_a    # => ["abc"]
p "ab".each_slice(3).to_a     # => ["ab"]
p "".each_slice(3).to_a       # => []