Ruby 1.9 正则表达式编码

发布于 2024-09-30 07:19:43 字数 414 浏览 3 评论 0原文

I am parsing this feed http://www.sixapart.com/labs/update/developers/ with nokogiri and then running some regex on the contents of some tags. The content is UTF-8 mostly, but is occasionally corrupt. However, for my case I don't really care and just need to pass the right parts of the content through, so I'm happy to treat the data as binary/ASCII-8BIT. The problem is that no matter what I do, regexes in my script are treated as either UTF-8 or ASCII. No matter what I set the encoding comment to, or what I do to create the regex.

Is there a solution to this? Can I force the regex to binary? Can I do a gsub without a regex easily? (I am just replacing & with &)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

愚人国度 2024-10-07 07:19:43

您需要对初始字符串进行编码并使用 FIXEDENCODING 选项。

1.9.3-head :018 > r = Regexp.new("chars".force_encoding("binary"), Regexp::FIXEDENCODING)
=> /chars/
1.9.3-head :019 > r.encoding
=> #<Encoding:ASCII-8BIT>

You need to encode the initial string and use the FIXEDENCODING option.

1.9.3-head :018 > r = Regexp.new("chars".force_encoding("binary"), Regexp::FIXEDENCODING)
=> /chars/
1.9.3-head :019 > r.encoding
=> #<Encoding:ASCII-8BIT>
才能让你更想念 2024-10-07 07:19:43

字符串具有编码属性。在应用正则表达式之前尝试使用方法String#force_encoding

UPD:要使您的正则表达式为ascii,请在此处查看已接受的答案: Ruby 1.9:输入编码未知的正则表达式

def get_regex(pattern, encoding='ASCII', options=0)
  Regexp.new(pattern.encode(encoding),options)
end

Strings have a property of encoding. Try to use method String#force_encoding before applying regex.

UPD: To make your regexp be ascii, look on accepted answer here: Ruby 1.9: Regular Expressions with unknown input encoding

def get_regex(pattern, encoding='ASCII', options=0)
  Regexp.new(pattern.encode(encoding),options)
end
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文