Best way to escape and unescape strings in Ruby?

Posted on 2024-12-22 20:28:03

Does Ruby have any built-in method for escaping and unescaping strings? In the past, I've used regular expressions; however, it occurs to me that Ruby probably does such conversions internally all the time. Perhaps this functionality is exposed somewhere.

So far I've come up with these functions. They work, but they seem a bit hacky:

def escape(s)
  s.inspect[1..-2]
end

def unescape(s)
  eval %Q{"#{s}"}
end

Is there a better way?

Comments (7)

阳光下慵懒的猫 2024-12-29 20:28:03

Ruby 2.5 added String#undump as a complement to String#dump:

$ irb
irb(main):001:0> dumped_newline = "\n".dump
=> "\"\\n\""
irb(main):002:0> undumped_newline = dumped_newline.undump
=> "\n"

With it:

def escape(s)
  s.dump[1..-2]
end

def unescape(s)
  "\"#{s}\"".undump
end

$ irb
irb(main):001:0> escape("\n \" \\")
=> "\\n \\\" \\\\"
irb(main):002:0> unescape("\\n \\\" \\\\")
=> "\n \" \\"
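
As a rough check of the helpers above, the following sketch round-trips a few sample strings and shows what happens with malformed input. The `try_unescape` wrapper and the sample strings are hypothetical additions for illustration, and the sketch assumes Ruby 2.5 or later, where `String#undump` is available and raises `RuntimeError` on strings that are not valid dump output.

```ruby
# Helpers from the answer above, repeated so the example is self-contained.
def escape(s)
  s.dump[1..-2]
end

def unescape(s)
  "\"#{s}\"".undump
end

# Hypothetical wrapper: returns nil instead of raising on malformed input.
def try_unescape(s)
  unescape(s)
rescue RuntimeError
  nil
end

# Sample strings chosen for illustration (control characters, quotes, non-ASCII).
samples = ["plain", "tab\there", "quote \" and backslash \\", "caf\u00E9"]
round_trip_ok = samples.all? { |s| unescape(escape(s)) == s }
puts round_trip_ok

# An unescaped double quote is not valid inside a dumped string,
# so the wrapper returns nil here rather than a string.
malformed = try_unescape('a"b')
p malformed
```

Because `undump` is strict about its input, wrapping it like this is one way to handle untrusted strings without resorting to `eval`.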

勿忘心安 2024-12-29 20:28:03

There are a bunch of escaping methods; here are some of them:

# Regexp escaping
>> Regexp.escape('\*?{}.')
=> \\\*\?\{\}\.
# URL escaping (URI.escape was removed in Ruby 3.0; CGI.escape is still available)
>> URI.escape("test=100%")
=> "test=100%25"
>> CGI.escape("test=100%")
=> "test%3D100%25"

So it really depends on the problem you need to solve. But I would avoid using inspect for escaping.

Update: there is also String#dump, which behaves much like inspect but also escapes non-ASCII characters, and it looks like it is what you need:

>> "\n\t".dump
=> "\"\\n\\t\""
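
To make the "it depends on the problem" point concrete, here is a small sketch that runs several standard-library escapers side by side. The variable names and sample inputs are made up for illustration, and `URI.encode_www_form_component` is shown as one possible stand-in for the removed `URI.escape`, not as an exact equivalent.

```ruby
require 'cgi'
require 'uri'

text = "caf\u00E9\n"

# dump escapes non-printing and non-ASCII characters, so the result is plain ASCII.
dumped = text.dump
# inspect keeps characters that are printable in the output encoding, so its
# result can vary with the environment.
inspected = text.inspect

regexp_escaped = Regexp.escape("1+1=2?")
query_escaped = CGI.escape("test=100%")
form_escaped = URI.encode_www_form_component("a b&c")

puts dumped          # "caf\u00E9\n"
puts inspected
puts regexp_escaped  # 1\+1=2\?
puts query_escaped   # test%3D100%25
puts form_escaped    # a+b%26c
```

Each of these targets a different consumer (Ruby source, a regular expression, a URL), so they are not interchangeable.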

是你 2024-12-29 20:28:03

Caleb's function was the nearest thing to the reverse of String#inspect that I was able to find; however, it contained two bugs:

  • \\ was not handled correctly.
  • \x.. retained the backslash.

I fixed the above bugs and this is the updated version:

UNESCAPES = {
    'a' => "\x07", 'b' => "\x08", 't' => "\x09",
    'n' => "\x0a", 'v' => "\x0b", 'f' => "\x0c",
    'r' => "\x0d", 'e' => "\x1b", "\\\\" => "\x5c",
    "\"" => "\x22", "'" => "\x27"
}

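def unescape(str)
  # Unescape all recognized sequences: single characters, \uXXXX, and \xXX (or \0xXX)
  str.gsub(/\\(?:([#{UNESCAPES.keys.join}])|u([\da-fA-F]{4}))|\\0?x([\da-fA-F]{2})/) {
    if $1
      # A literal backslash is special-cased; everything else comes from the table
      if $1 == '\\' then '\\' else UNESCAPES[$1] end
    elsif $2 # \uXXXX: Unicode code point
      ["#$2".hex].pack('U*')
    elsif $3 # \xXX or \0xXX: single byte
      [$3].pack('H2')
    end
  }
end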

# To test it: read lines from standard input until EOF
while (line = STDIN.gets)
    puts unescape(line)
end
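
The two bugs mentioned above (handling of `\\` and of `\x..`) can be checked with a compact, self-contained variant. This is only a sketch, not the author's code: the names `SIMPLE_ESCAPES`, `ESCAPE_PATTERN`, and `unescape_sketch` are hypothetical, unknown escapes are left untouched by assumption, and `\x..` is only exercised with ASCII values here because bytes above 0x7F raise encoding questions this sketch does not address.

```ruby
# Assumed table of single-character escapes and the characters they stand for.
SIMPLE_ESCAPES = {
  'a' => "\a", 'b' => "\b", 't' => "\t", 'n' => "\n", 'v' => "\v",
  'f' => "\f", 'r' => "\r", 'e' => "\e",
  '\\' => '\\', '"' => '"', "'" => "'"
}.freeze

# A backslash followed by uXXXX, xXX, or any single character.
ESCAPE_PATTERN = /\\(?:u(\h{4})|x(\h{2})|(.))/m

def unescape_sketch(str)
  str.gsub(ESCAPE_PATTERN) do
    m = Regexp.last_match
    if m[1]
      [m[1].hex].pack('U')   # \uXXXX: Unicode code point
    elsif m[2]
      m[2].hex.chr           # \xXX: single byte
    else
      # Known single-character escape, or the original text if unknown.
      SIMPLE_ESCAPES.fetch(m[3]) { "\\#{m[3]}" }
    end
  end
end

newline   = unescape_sketch('a\nb')     # backslash-n becomes a real newline
backslash = unescape_sketch('\\\\n')    # escaped backslash followed by a plain "n"
hex_byte  = unescape_sketch('\x41')     # no backslash left behind
unicode   = unescape_sketch('\u00e9')
unknown   = unescape_sketch('\q')       # unknown escape is passed through

p newline, backslash, hex_byte, unicode, unknown
```

Matching the escaped backslash in the same pass as the other escapes is what keeps `\\n` from being misread as a newline.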

娇俏 2024-12-29 20:28:03

Update: I no longer agree with my own answer, but I'd prefer not to delete it, since I suspect that others may go down this wrong path, and there has already been a lot of discussion of this answer and its alternatives. I think it still contributes to the conversation, but please don't use this answer in real code.

If you don't want to use eval, but are willing to use the YAML module, you can use it instead:

require 'yaml'

def unescape(s)
  YAML.load(%Q(---\n"#{s}"\n))
end

The advantage to YAML over eval is that it is presumably safer. cane disallows all usage of eval. I've seen recommendations to use $SAFE along with eval, but that is not available via JRuby currently.

For what it is worth, Python does have native support for unescaping backslashes.

扛起拖把扫天下 2024-12-29 20:28:03

Ruby's inspect can help:

    "a\nb".inspect
=> "\"a\\nb\""

Normally if we print a string with an embedded line-feed, we'd get:

puts "a\nb"
a
b

If we print the inspected version:

puts "a\nb".inspect
"a\nb"

Assign the inspected version to a variable and you'll have the escaped version of the string.

To undo the escaping, eval the string:

puts eval("a\nb".inspect)
a
b

I don't really like doing it this way. It's more of a curiosity than something I'd do in practice.

执笔绘流年 2024-12-29 20:28:03

YAML's ::unescape doesn't seem to escape quote characters, e.g. ' and ". I'm guessing this is by design, but it makes me sad.

You definitely do not want to use eval on arbitrary or client-supplied data.
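
To illustrate why, here is a deliberately harmless sketch. It reproduces the eval-based helper from the question next to an undump-based one (Ruby 2.5+); the helper names and the sample input are hypothetical. With eval, string interpolation in the input is executed as Ruby code, whereas undump only interprets escape sequences.

```ruby
# The question's eval-based helper, reproduced for illustration only.
def eval_unescape(s)
  eval %Q{"#{s}"}
end

# undump-based alternative (Ruby 2.5+); it does not evaluate any code.
def undump_unescape(s)
  "\"#{s}\"".undump
end

# Single quotes: no interpolation happens at this point.
input = '#{1 + 1}'

evaluated = eval_unescape(input)    # the embedded expression is executed
literal   = undump_unescape(input)  # the text is returned unchanged

puts evaluated  # 2
puts literal
```

Here the injected expression is just `1 + 1`, but with eval it could be any Ruby code.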

This is what I use. Handles everything I've seen and doesn't introduce any dependencies.

UNESCAPES = {
    'a' => "\x07", 'b' => "\x08", 't' => "\x09",
    'n' => "\x0a", 'v' => "\x0b", 'f' => "\x0c",
    'r' => "\x0d", 'e' => "\x1b", "\\\\" => "\x5c",
    "\"" => "\x22", "'" => "\x27"
}

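def unescape(str)
  # Unescape all recognized sequences: single characters, \uXXXX, and \xXX (or \0xXX)
  str.gsub(/\\(?:([#{UNESCAPES.keys.join}])|u([\da-fA-F]{4}))|\\0?x([\da-fA-F]{2})/) {
    if $1
      # A literal backslash is special-cased; everything else comes from the table
      if $1 == '\\' then '\\' else UNESCAPES[$1] end
    elsif $2 # \uXXXX: Unicode code point
      ["#$2".hex].pack('U*')
    elsif $3 # \xXX or \0xXX: single byte
      [$3].pack('H2')
    end
  }
end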