Split a UTF-8 string regardless of Ruby version
str = "é-du-Marché"
I get the first character via
str.split(//).first
How can I get the rest of the string, regardless of my Ruby version?
Answer 1:
String does not have a method first, so you also need a split. When you split in Unicode mode (UTF-8, to be exact), you get access to the first character (and the others).
My solution (tested with Ruby 1.9.2 and Ruby 1.8.6):
With first and last you get your results:
str.split(//u, 2).first
is the first character, and
str.split(//u, 2).last
is the string after the first character.
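A minimal runnable sketch of the split-based approach above, assuming the source file itself is saved as UTF-8 (the magic comment on the first line is needed by Ruby 1.9 and harmless in other versions):

```ruby
# encoding: utf-8
# Split off the first character of a UTF-8 string with a regexp split.
# The u flag marks the empty regexp as UTF-8, so each split boundary
# falls between characters rather than between bytes.
str = "é-du-Marché"

# A limit of 2 yields at most two pieces: the first character and the rest.
first, rest = str.split(//u, 2)

puts first # prints é
puts rest  # prints -du-Marché
```

Using multiple assignment instead of calling split twice avoids doing the same split for first and last.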
Answer 2:
str[1..-1]
should normally return everything after the first character. In the range 1..-1, the first number is the starting index, set to 1 to skip the first character; the second is the end index, set to -1, which Ruby counts from the back, so the range runs through the last character.
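A short sketch of the range indexing described above, for Ruby 1.9 and later, where String#[] counts characters. The length and bytesize calls are only there to make the character/byte difference visible:

```ruby
# encoding: utf-8
str = "é-du-Marché"

# In Ruby 1.9+, String#[] works on characters, not bytes.
first = str[0, 1]   # start index 0, length 1
rest  = str[1..-1]  # from index 1 through the last character (-1)

puts first          # prints é
puts rest           # prints -du-Marché
puts str.length     # prints 11 (characters)
puts str.bytesize   # prints 13 (each é takes two bytes in UTF-8)
```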
Note: this only works with multibyte characters in Ruby 1.9, where strings are indexed by character. If you want to mimic this behavior in older versions, you'll have to loop over the bytes yourself and work out what needs to be removed from the data, because Ruby 1.8 indexes strings by byte.
UPDATE:
You could try this as well, but I can't guarantee that it will work for every multibyte char:
mb_chars returns a proxy object (ActiveSupport::Multibyte::Chars) that directs calls to the appropriate implementation when dealing with UTF-8, UTF-32 or UTF-16 encoded characters (e.g. multibyte characters).
More detailed info can be found here: http://api.rubyonrails.org/classes/ActiveSupport/Multibyte/Chars.html
But I do not know whether this exists in older Rails versions.
UPDATE2:
Ruby 1.8 treats any string as just a sequence of bytes; calling size() on it returns the number of bytes used to store the data. To determine the characters regardless of the encoding, try this:
This should normally do the trick. Take a look at http://blog.grayproductions.net/articles/bytes_and_characters_in_ruby_18, which explains how to handle multibyte data in older Ruby versions.
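As an illustration of the byte/character distinction above, one approach that works on both Ruby 1.8 and 1.9+ is to decode the UTF-8 bytes into codepoints with unpack("U*") and re-encode the pieces with pack. This is a hedged sketch, not necessarily the snippet the answer originally showed:

```ruby
# encoding: utf-8
str = "é-du-Marché"

# unpack("U*") decodes the UTF-8 bytes into an array of Unicode codepoints,
# one integer per character, however many bytes each character uses.
codepoints = str.unpack("U*")

char_count = codepoints.length          # number of characters
first = [codepoints.first].pack("U")    # re-encode the first codepoint
rest  = codepoints[1..-1].pack("U*")    # re-encode the remaining codepoints

puts char_count # prints 11
puts first      # prints é
puts rest       # prints -du-Marché
```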
EDIT3:
Playing around with scan and join brings me closer to a solution. I honestly don't have the time right now to get the full solution working, but if you experiment with scan(/./mu), which splits the string into UTF-8 characters, you should get there; that works in all Ruby versions.
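A sketch of the scan-and-join idea from EDIT3, under the same UTF-8 source-file assumption as the other examples: scan(/./mu) returns one array element per character, and join glues the remaining characters back into a string. The variable names are illustrative:

```ruby
# encoding: utf-8
str = "é-du-Marché"

# /./mu: . matches a single character, m lets it match "\n" as well,
# and u makes the regexp UTF-8 aware, so é is one match rather than two bytes.
chars = str.scan(/./mu)

first = chars.first
rest  = chars[1..-1].join

puts first # prints é
puts rest  # prints -du-Marché
```

The m flag matters only if the string can contain newlines; without it, scan would silently drop them.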