What is the best way to chop a string into chunks of a given length in Ruby?
I have been looking for an elegant and efficient way to chunk a string into substrings of a given length in Ruby.
So far, the best I could come up with is this:
def chunk(string, size)
  (0..(string.length - 1) / size).map { |i| string[i * size, size] }
end
>> chunk("abcdef",3)
=> ["abc", "def"]
>> chunk("abcde",3)
=> ["abc", "de"]
>> chunk("abc",3)
=> ["abc"]
>> chunk("ab",3)
=> ["ab"]
>> chunk("",3)
=> []
You might want chunk("", n) to return [""] instead of []. If so, just add this as the first line of the method:
return [""] if string.empty?
Would you recommend any better solution?
Edit
Thanks to Jeremy Ruten for this elegant and efficient solution: [edit: NOT efficient!]
def chunk(string, size)
  string.scan(/.{1,#{size}}/)
end
Edit
The string.scan solution takes about 60 seconds to chop a 512k string into 1k chunks 10,000 times, whereas the original slice-based solution takes only 2.4 seconds.
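To check a timing comparison like this on your own machine, a minimal benchmark sketch might look like the following. The method names chunk_slice and chunk_scan, the payload, and the iteration count are assumptions chosen for illustration, and the /m flag is added so that . also matches newlines. Timings depend heavily on the Ruby version and hardware, so no expected numbers are given.

```ruby
require "benchmark"

# Slice-based version from the question.
def chunk_slice(string, size)
  (0..(string.length - 1) / size).map { |i| string[i * size, size] }
end

# Regex-based version; /m lets "." match newlines as well.
def chunk_scan(string, size)
  string.scan(/.{1,#{size}}/m)
end

data = "x" * (512 * 1024) # hypothetical 512k payload
iterations = 20           # kept small; raise it for steadier numbers

Benchmark.bm(6) do |bm|
  bm.report("slice") { iterations.times { chunk_slice(data, 1024) } }
  bm.report("scan")  { iterations.times { chunk_scan(data, 1024) } }
end
```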
Comments (10)
Use String#scan:
Here is another way to do it:
Or,
Either:
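One approach in this spirit, offered as an assumption rather than as any answerer's exact code, is to group the characters with Enumerable#each_slice and join each group back into a string. The method name chunk_each_slice is hypothetical.

```ruby
# Split into characters, group them size at a time, and rejoin each group.
def chunk_each_slice(string, size)
  string.each_char.each_slice(size).map(&:join)
end

p chunk_each_slice("abcdefgh", 3)
```

This builds one small string per character before joining, so it is likely to be slower than slicing for large inputs.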
I think this is the most efficient solution if you know your string length is a multiple of the chunk size:
and for parts:
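A sketch of what such a pair of helpers could look like, assuming the string length divides evenly; the names chunk_exact and parts are illustrative and not taken from the answer. Integer division means any trailing remainder is silently dropped, which is why the multiple-of-size precondition matters.

```ruby
# Chunks of exactly `size` characters; assumes length is a multiple of size.
def chunk_exact(string, size)
  (string.length / size).times.map { |i| string[i * size, size] }
end

# Split into `count` equal parts; assumes length is a multiple of count.
def parts(string, count)
  size = string.length / count
  count.times.map { |i| string[i * size, size] }
end

p chunk_exact("abcdef", 3)
p parts("abcdef", 3)
```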
I made a little test that chops about 593MB of data into 18991 32KB pieces.
Your slice+map version ran for at least 15 minutes using 100% CPU before I pressed ctrl+C. This version using String#unpack finished in 3.6 seconds:
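A minimal sketch of a String#unpack-based chunker, assuming the "a" directive repeated once per chunk; the name chunk_unpack is hypothetical. Note that unpack works on bytes, not characters, and the returned pieces are binary (ASCII-8BIT) strings, so multibyte characters may be split across chunks.

```ruby
# Build a format like "a3a3a3" and let unpack cut the string in one pass.
def chunk_unpack(string, size)
  count = (string.bytesize + size - 1) / size # ceiling division
  string.unpack("a#{size}" * count)
end

p chunk_unpack("abcdefgh", 3)
```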
Here is another solution for a slightly different case: processing large strings when there is no need to store all the chunks at once. It holds a single chunk at a time and performs much faster than slicing strings:
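One way to process a single chunk at a time is to wrap the string in a StringIO and read fixed-size pieces from it. This is a sketch under that assumption; the method name each_chunk is hypothetical. StringIO#read with a length counts bytes and returns binary strings.

```ruby
require "stringio"

# Yield successive chunks of up to `size` bytes without building an array.
def each_chunk(string, size)
  return enum_for(:each_chunk, string, size) unless block_given?

  io = StringIO.new(string)
  yield io.read(size) until io.eof?
end

each_chunk("abcdefgh", 3) { |piece| puts piece }
```

Returning an Enumerator when no block is given lets callers chain methods such as with_index or take.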
The reject is necessary because it otherwise includes the blank space between sets. My regex-fu isn't quite up to seeing how to fix that right off the top of my head.
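The behavior described here can be reproduced with String#split and a capturing group: split returns the captured chunks interleaved with the (empty) text between them, so the empty strings have to be filtered out. The sketch below assumes that is the technique in question; chunk_split is a hypothetical name, and the /m flag is added so newlines are treated like any other character.

```ruby
# split keeps captured groups in its result, with empty strings in between.
def chunk_split(string, size)
  string.split(/(.{#{size}})/m).reject(&:empty?)
end

p "abcdefgh".split(/(.{3})/m) # raw result, including the empty strings
p chunk_split("abcdefgh", 3)
```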
A better solution which takes into account the last part of the string which could be less than the chunk size:
Are there some other constraints you have in mind? Otherwise I'd be awfully tempted to do something simple like
Just text.scan(/.{1,4}/m) resolves the problem.
I personally followed the idea of user8556428, to avoid the costly intermediate values that most proposals introduce, and to avoid modifying the input string. I also want to be able to use it as a generator (for instance, to use s.each_slice.with_index).
My use case is really about bytes, not characters. For character-sized chunks, strscan is a great solution.
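A byte-oriented generator along these lines might be sketched as follows. The name each_byte_slice is an assumption made for illustration; it is not a built-in String method. It relies on String#byteslice, leaves the input string unmodified, and returns an Enumerator when called without a block.

```ruby
# Yield successive byte slices of up to `size` bytes from `string`.
def each_byte_slice(string, size)
  return enum_for(:each_byte_slice, string, size) unless block_given?

  offset = 0
  while offset < string.bytesize
    yield string.byteslice(offset, size)
    offset += size
  end
end

each_byte_slice("abcde", 2).with_index do |piece, index|
  puts "#{index}: #{piece}"
end
```

Because the slices are cut on byte boundaries, a multibyte character can be split between two chunks, which is acceptable for binary data but not for text.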