Ruby strings: escaping and unescaping a custom character

Posted on 2024-12-12 17:20:16

Suppose I consider the £ character dangerous, and I want to be able to protect any string and to unprotect it again.

Example 1:

"Foobar £ foobar foobar foobar."  # => dangerous string
"Foobar \£ foobar foobar foobar." # => protected string

Example 2:

"Foobar £ foobar £££££££foobar foobar."         # => dangerous string
"Foobar \£ foobar \£\£\£\£\£\£\£foobar foobar." # => protected string

Example 3:

"Foobar \£ foobar \\£££££££foobar foobar."        # => dangerous string
"Foobar \£ foobar \\\£\£\£\£\£\£\£foobar foobar." # => protected string

Is there an easy way, with Ruby, to escape (and unescape) a given character (such as £ in my example) from a string?

Edit: here is an explanation of the behavior behind this question.

First of all, thanks for your answers. I have a Rails app with a Tweet model having a content field. Example of tweet:

tweet = Tweet.create(content: "Hello @bob")

Inside the model, there's a serialization process that converts the string like this:

dump('Hello @bob') # => '["Hello £", 42]'
                   # ... where 42 is the id of bob username

Then, I'm able to deserialize and display its tweet like this:

load('["Hello £", 42]') # => 'Hello @bob'

In the same way, it's also possible to do so with more than one username:

dump('Hello @bob and @joe!')        # => '["Hello £ and £!", 42, 185]'
load('["Hello £ and £!", 42, 185]') # => 'Hello @bob and @joe!'
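The serialization described above could be sketched roughly as follows. This is only an illustration under assumptions: the method names `dump_tweet` and `load_tweet`, the `USER_IDS` lookup table, and the choice of JSON as the container are all hypothetical, since the question does not show the real implementation. It does not yet handle a £ that appears in the original content.

```ruby
require "json"

# Hypothetical lookup table standing in for the real user records.
USER_IDS = { "bob" => 42, "joe" => 185 }.freeze

# Replace each @username with £ and collect the matching ids in order.
def dump_tweet(content)
  ids = []
  template = content.gsub(/@(\w+)/) do
    ids << USER_IDS.fetch($1) # raises KeyError for an unknown username
    "£"
  end
  JSON.generate([template, *ids])
end

# Put the usernames back, consuming the ids from left to right.
def load_tweet(serialized)
  template, *ids = JSON.parse(serialized)
  names = USER_IDS.invert
  template.gsub("£") { "@#{names.fetch(ids.shift)}" }
end

puts load_tweet(dump_tweet("Hello @bob and @joe!")) # Hello @bob and @joe!
```

Note that `JSON.generate` does not put spaces after the commas, so the serialized text differs slightly in layout from the examples above.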

That's the goal :)

But this find-and-replace could be hard to perform with something like:

tweet = Tweet.create(content: "£ Hello @bob")

because here we also have to escape the £ character. I think your solution works well for this, so the result becomes:

dump('£ Hello @bob')       # => '["\£ Hello £", 42]'
load('["\£ Hello £", 42]') # => '£ Hello @bob'

Just perfect. <3 <3

Now, if there is this:

tweet = Tweet.create(content: "\£ Hello @bob")

I think we first should escape every \, and then escape every £, like:

dump('\£ Hello @bob')       # => '["\\£ Hello £", 42]'
load('["\\£ Hello £", 42]') # => '£ Hello @bob'
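The idea of escaping every \ and every £ can be sketched in a single pass, which avoids counting backslashes altogether. This is a minimal sketch, not the method used in the question: the helper names `protect` and `unprotect` are hypothetical, and it assumes that a backslash is itself escaped by doubling it.

```ruby
# Hypothetical helper names, chosen only for this illustration.

# Prefix every backslash and every £ with a backslash, in one pass.
def protect(str)
  str.gsub(/[\\£]/) { |ch| "\\#{ch}" }
end

# Drop the backslash in front of each escaped backslash or £.
def unprotect(str)
  str.gsub(/\\([\\£])/) { $1 }
end

# Eighteen backslashes followed by £, similar to the long example.
original = "\\" * 18 + "£ Hello @bob"
puts unprotect(protect(original)) == original # true
```

Because every special character gets exactly one escaping backslash, any £ left without a backslash in front of it after `protect` can only be a placeholder inserted later by the serialization step.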

However, how can we handle a case like this:

tweet = Tweet.create(content: "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\£ Hello @bob")

...where tweet.content.gsub(/(?<!\\)(?=(?:\\\\)*£)/, "\\") does not seem to work.

Comments (3)

三生殊途 2024-12-19 17:20:16

Hopefully your version of Ruby supports lookbehinds. If it doesn't, my solution will not work for you.

Escape characters:

str = str.gsub(/(?<!\\)(?=(?:\\\\)*£)/, "\\")

Unescape characters:

str = str.gsub(/(?<!\\)((?:\\\\)*)\\£/, '\1£') # single quotes: "\1" in double quotes is an octal character escape, not a backreference

Both regexes work regardless of the number of backslashes. They complement each other.

Escape explanation :

"
(?<!        # Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
   \\          # Match the character “\” literally
)
(?=         # Assert that the regex below can be matched, starting at this position (positive lookahead)
   (?:         # Match the regular expression below
      \\          # Match the character “\” literally
      \\          # Match the character “\” literally
   )*          # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   £           # Match the character “£” literally
)
"

Note that I am matching a position; no text is consumed at all. Once I have pinpointed the position I want, I insert a \.

Explanation of unescape :

"
(?<!        # Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
   \\          # Match the character “\” literally
)
(           # Match the regular expression below and capture its match into backreference number 1
   (?:         # Match the regular expression below
      \\          # Match the character “\” literally
      \\          # Match the character “\” literally
   )*          # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
\\          # Match the character “\” literally
£           # Match the character “£” literally
"

Here I capture all the backslashes except the last one, then replace the whole match with the captured backslashes followed by the special character. Tricky stuff :)
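As a quick sanity check, the two substitutions can be wrapped in helper methods and run against a few inputs. This is a sketch only: the method names `escape_pound` and `unescape_pound` are made up for illustration, the escape uses the block form of gsub to sidestep replacement-string escaping, and the unescape replacement is written in single quotes so that `\1` reaches gsub as a backreference.

```ruby
# Hypothetical helper names; the regexes are the ones from this answer.

def escape_pound(str)
  # Zero-width match: insert a backslash before any run of an even number
  # of backslashes that is followed by £ and not preceded by a backslash.
  str.gsub(/(?<!\\)(?=(?:\\\\)*£)/) { "\\" }
end

def unescape_pound(str)
  # Keep the captured pairs of backslashes and drop the one escaping £.
  str.gsub(/(?<!\\)((?:\\\\)*)\\£/, '\1£')
end

plain = "Foobar £ foobar"
escaped = escape_pound(plain)
puts escaped                          # Foobar \£ foobar
puts unescape_pound(escaped) == plain # true
```

One consequence of this design is that a £ already preceded by an odd number of backslashes is treated as escaped and left alone, so `escape_pound` does not add a second layer of escaping to such input.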

坚持沉默 2024-12-19 17:20:16

If you are using Ruby 1.9, which has lookbehind, then FailedDev's answer should work quite well. If you are using Ruby 1.8, which does not have lookbehind (I think), a different approach may work. Give this a try:

text.gsub!(/(\\.)|£/m) do
    if ($1 != nil)  # If escaped anything
        $1          # replace with self.
    else            # Otherwise escape the
        "\\£"       # unescaped £.
    end
end

Note that I am not a Ruby programmer and this snippet is untested. In particular, I'm not sure whether the if ($1 != nil) test is correct; it may need to be if ($1 != "") or if ($1). I do know that the general technique (using code in place of a simple replacement string) works. I recently used the same technique in my JavaScript solution to a similar question about finding unescaped asterisks.
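The same block-based technique can be sketched as a pair of methods that need no lookbehind. The names `escape_without_lookbehind` and `unescape_without_lookbehind` are hypothetical, and the unescape half is an addition that this answer does not cover; it assumes escaped pairs are read left to right, two characters at a time.

```ruby
# Hypothetical helper names; no lookbehind is used in either regex.

def escape_without_lookbehind(text)
  # Consume backslash-escaped pairs first so that an already escaped £
  # is skipped; any £ reached on its own gets a backslash prefix.
  text.gsub(/(\\.)|£/m) { $1 ? $1 : "\\£" }
end

def unescape_without_lookbehind(text)
  # Walk the escaped pairs; only the pair made of \ and £ loses its backslash.
  text.gsub(/\\(.)/m) { $1 == "£" ? "£" : $& }
end

puts escape_without_lookbehind("a £ b")    # a \£ b
puts unescape_without_lookbehind("a \\£ b") # a £ b
```

Using the non-mutating `gsub` here avoids the `nil` that `gsub!` returns when nothing was replaced.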

你是年少的欢喜 2024-12-19 17:20:16

I'm not sure if this is what you want, but I think you can do a simple find-and-replace:

str = str.gsub("£", "\\£") # to escape
str = str.gsub("\\£", "£") # to unescape

Note that I changed \ to \\ because you have to escape the backslash in a double-quoted string.
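The literal replacement has a limitation that is easy to see with a concrete input. The sketch below assumes that a backslash is itself escaped by doubling it, which the question implies but does not spell out.

```ruby
# Two backslashes followed by £: an escaped backslash, then a plain £.
s = "\\\\£"

# The literal replacement strips a backslash that was not escaping the £.
naive = s.gsub("\\£", "£")
puts naive # \£
```

The result now reads as an escaped £, so the meaning of the string has changed. Handling this correctly requires looking at how many backslashes precede the £.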


Edit: I think what you want is a regex that matches an odd number of backslashes:

str = str.gsub(/(^|[^\\])((?:\\\\)*)\\£/, "\\1\\2£")

It performs the following transformations:

"£"       #=> "£"
"\\£"     #=> "£"
"\\\\£"   #=> "\\\\£"
"\\\\\\£" #=> "\\\\£"