Ruby 1.9 regex encoding
I am parsing this feed http://www.sixapart.com/labs/update/developers/ with nokogiri and then running some regexes on the contents of some tags. The content is mostly UTF-8, but is occasionally corrupt. For my case I don't really care and just need to pass the right parts of the content through, so I'm happy to treat the data as binary/ASCII-8BIT. The problem is that the regexes in my script are treated as either UTF-8 or ASCII, no matter what I set the encoding comment to or how I create the regex.
Is there a solution to this? Can I force the regex to binary? Can I do a gsub without a regex easily? (I am just replacing & with &amp;)
Comments (2)
You need to encode the initial string and use the Regexp::FIXEDENCODING option.
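A minimal sketch of what this could look like, assuming "the initial string" means the pattern source passed to Regexp.new. The sample content and the variable names are made up for illustration; \xFF stands in for a corrupt byte from the feed.

```ruby
# Build the pattern from a binary (ASCII-8BIT) string and pin that encoding
# on the regex with Regexp::FIXEDENCODING.
pattern_source = "&".dup.force_encoding(Encoding::BINARY)
bin_amp = Regexp.new(pattern_source, Regexp::FIXEDENCODING)

# Hypothetical feed content containing a byte that is not valid UTF-8,
# relabelled as binary so no UTF-8 validation is applied to it.
content = "Tom & Jerry \xFF".dup.force_encoding(Encoding::BINARY)

escaped = content.gsub(bin_amp, "&amp;")

puts bin_amp.encoding         # encoding the regex is fixed to
puts bin_amp.fixed_encoding?  # whether the encoding is pinned
puts escaped.inspect
```

Both the regex and the subject string are binary here, so the match works on raw bytes and the corrupt byte is passed through untouched.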
Strings have an encoding property. Try calling String#force_encoding before applying the regex.
UPD: To make your regexp ASCII, see the accepted answer to "Ruby 1.9: Regular Expressions with unknown input encoding".
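As a rough sketch of the force_encoding approach (the sample content is made up, with \xFF standing in for a corrupt byte): relabel the string as binary, then run the substitution. An ASCII-only regex literal can match a binary string, and gsub also accepts a plain string pattern, which covers the "gsub without a regex" part of the question.

```ruby
# Hypothetical tag content; \xFF is not a valid UTF-8 byte.
content = "Tom & Jerry \xFF".dup

# force_encoding only changes the encoding label; the bytes stay the same.
content.force_encoding(Encoding::BINARY)

# An ASCII-only regex literal works against the binary string.
with_regex = content.gsub(/&/, "&amp;")

# A plain string pattern avoids the regex entirely.
with_string = content.gsub("&", "&amp;")

puts with_regex.inspect
puts with_string.inspect
```

Since nothing is transcoded, the output can be relabelled as UTF-8 afterwards with force_encoding if the rest of the script expects that, with the corrupt bytes still in place.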