Ruby: looking for some kind of "Regexp unescape" method

Posted on 2024-08-27 19:33:58

I have a bunch of strings with special escape codes that I want to store unescaped. For example, the interpreter shows

"\\014\"\\000\"\\016smoothing\"\\011mean\"\\022color\"\\011zero@\\016"
but I want it to show (when inspected) as
"\014\"\000\"\016smoothing\"\011mean\"\022color\"\011zero@\016"

What's the method to unescape them? I imagine I could write a regex that removes one backslash from every run of n consecutive backslashes, but I don't have much regex experience, and it seems there ought to be a more elegant way to do it.
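The two inspected strings shown above are not the same data displayed two ways; they contain different bytes. A quick check on a shortened copy of the example (the variable names are illustrative):

```ruby
# What the interpreter shows as "\\014" is four characters: a backslash and three digits.
escaped = "\\014\"\\000\""
# What is wanted is "\014": a single byte with value 12 (octal 014).
wanted = "\014\"\000\""

puts escaped.length        # 10
puts wanted.length         # 4
p escaped.bytes.first(4)   # [92, 48, 49, 52]
p wanted.bytes.first(2)    # [12, 34]
```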

For example, when I call puts MyString it displays the output I'd like, but I don't know how to capture that into a variable.
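Capturing what puts prints would not help here: puts writes the string's characters as they are, while irb displays values with inspect, which adds the surrounding quotes and doubles each backslash. A small sketch that temporarily redirects $stdout into a StringIO buffer (my_string and buffer are illustrative names) shows the captured text is just the original string plus a newline:

```ruby
require "stringio"

# Single quotes keep the backslashes literal, like the stored data.
my_string = '\014"\000"\016smoothing'

# Point $stdout at an in-memory buffer while puts runs, then restore it.
buffer = StringIO.new
original_stdout = $stdout
begin
  $stdout = buffer
  puts my_string
ensure
  $stdout = original_stdout
end
captured = buffer.string

# puts only appended a newline; the literal backslashes are still there.
p captured == my_string + "\n"   # true
p captured.include?("\\")        # true
```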

Thanks!

Edited to add context: I have this class that is used to marshal / restore some stuff, but when I restore some old strings it raises a TypeError, which I've determined is because they weren't (for some inexplicable reason) stored as Base64. Instead they appear to have just been escaped, which I don't want, because trying to restore them also gives the TypeError
TypeError: incompatible marshal file format (can't be read)
format version 4.8 required; 92.48 given

because Marshal looks at the first characters of the string to determine the format.
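The numbers in that message can be decoded: Marshal reads the first two bytes as the major and minor format version, and 92 and 48 are the byte values of a backslash and the digit 0. That fits stored text that begins with a literal \004\010 (an escaped form of Marshal's 4.8 header) instead of the raw bytes. A sketch reproducing the error with a made-up escaped header rather than real data:

```ruby
# Real Marshal output starts with the version bytes 4 and 8.
dumped = Marshal.dump("smoothing")
p dumped.bytes.first(2)    # [4, 8]

# An escaped copy starts with the characters '\' and '0' instead.
escaped = '\004\010'
p escaped.bytes.first(2)   # [92, 48]

begin
  Marshal.load(escaped)
rescue TypeError => e
  # The message ends with "format version 4.8 required; 92.48 given".
  puts e.message
end
```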

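require 'base64'

# Stores an arbitrary Ruby object as Base64-encoded Marshal data
class MarshaledStuff < ActiveRecord::Base

  validates_presence_of :marshaled_obj

  # Decode the Base64 text, then rebuild the object from the Marshal bytes
  def contents
    Marshal.restore(Base64.decode64(marshaled_obj))
  end

  # Serialize with Marshal, then Base64-encode so the binary output is stored as plain text
  def contents=(newcontents)
    self.marshaled_obj = Base64.encode64(Marshal.dump(newcontents))
  end
end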
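If the old rows really contain only three-digit octal escapes, as in the example string (the same assumption behind the answer's regex), one way to read both kinds of row is to detect the escaped Marshal header and unescape before loading. This is a sketch outside ActiveRecord; restore_marshaled and LEGACY_PREFIX are hypothetical names, and rows using other escapes (\\, \n, ...) would need more cases. It uses String#unpack1("m"), which is equivalent to Base64.decode64, so it runs without the base64 library.

```ruby
# Hypothetical: literal backslash escapes of Marshal's 4.8 header bytes.
LEGACY_PREFIX = '\004\010'

# Hypothetical helper: loads either a Base64 row or a legacy octal-escaped row.
def restore_marshaled(text)
  bytes =
    if text.start_with?(LEGACY_PREFIX)
      # Legacy row: turn each \NNN octal escape back into the byte it names.
      text.b.gsub(/\\([0-3][0-7]{2})/) { $1.to_i(8).chr }
    else
      # Normal row: decode Base64 (same result as Base64.decode64).
      text.unpack1("m")
    end
  Marshal.load(bytes)
end

legacy_row = '\004\010I"\016smoothing\006:\006ET'
p restore_marshaled(legacy_row)                               # "smoothing"
p restore_marshaled([Marshal.dump("smoothing")].pack("m"))    # "smoothing"
```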

Edit 2: Changed wording. I was thinking they were "double-escaped", but they were only single-escaped. Whoops!

Answer by 凉薄对峙, 2024-09-03 19:33:58

If your strings give you the correct output when you print them, then they already contain the right characters. The extra backslashes you see are probably there because you are displaying them in the interactive interpreter, which shows values with inspect and escapes backslashes so the output is unambiguous.

> x
=> "\\"
> puts x
\
=> nil
> x.length
=> 1

Note that even though x looks like it contains two backslashes, the length of the string is one. The extra backslash is added by the interpreter's display and is not really part of the string.

If you still think there's a problem, please be more specific about how you are displaying the strings that you mentioned in your question.


Edit: In your example, the only things that need unescaping are the octal escape codes. You could try this:

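# Replace each backslash followed by three octal digits (\000 to \277) with the character it encodes
x = x.gsub(/\\[0-2][0-7]{2}/){ |c| c[1,3].to_i(8).chr }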
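The one-liner above only matches three-digit codes from \000 to \277, and it works on the string in its original encoding, so mixing the resulting bytes above 127 into a UTF-8 string may raise an encoding compatibility error. A slightly broader sketch (unescape_octal is a hypothetical name) also accepts one- and two-digit codes up to \377 plus \\ and \", and works on a binary copy via String#b:

```ruby
# Hypothetical helper: undoes octal escapes (\0 to \377) plus \\ and \",
# working on a binary copy so bytes above 127 cannot clash with UTF-8.
def unescape_octal(str)
  str.b.gsub(/\\([0-3]?[0-7]{1,2}|\\|")/) do
    code = $1
    # Digits become the byte they encode; \\ and \" become the bare character.
    code.match?(/\A[0-7]+\z/) ? code.to_i(8).chr : code
  end
end

escaped = '\014"\000"\016smoothing"\011mean"\022color"\011zero@\016'
wanted  = "\014\"\000\"\016smoothing\"\011mean\"\022color\"\011zero@\016"

# The answer's one-liner and the helper agree on the example string.
one_liner = escaped.gsub(/\\[0-2][0-7]{2}/) { |c| c[1, 3].to_i(8).chr }
p one_liner == wanted                     # true
p unescape_octal(escaped) == wanted       # true
p unescape_octal('a\\\\b\"c\377').bytes   # [97, 92, 98, 34, 99, 255]
```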